Source: http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/ (SELECT - ClickHouse Documentation; last crawled 2026-03-21, first indexed 2020-05-22)
# SELECT Queries Syntax

`SELECT` performs data retrieval.

```sql
SELECT [DISTINCT] expr_list
    [FROM [db.]table | (subquery) | table_function]
    [FINAL]
    [SAMPLE sample_coeff]
    [ARRAY JOIN ...]
    [GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
    [PREWHERE expr]
    [WHERE expr]
    [GROUP BY expr_list]
    [WITH TOTALS]
    [HAVING expr]
    [ORDER BY expr_list]
    [LIMIT [n, ]m]
    [UNION ALL ...]
    [INTO OUTFILE filename]
    [FORMAT format]
    [LIMIT n BY columns]
```
All the clauses are optional, except for the required list of expressions immediately after SELECT.
The clauses below are described in almost the same order as in the query execution conveyor.

If the query omits the `DISTINCT`, `GROUP BY` and `ORDER BY` clauses and the `IN` and `JOIN` subqueries, the query will be completely stream processed, using O(1) amount of RAM.
Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified: `max_memory_usage`, `max_rows_to_group_by`, `max_rows_to_sort`, `max_rows_in_distinct`, `max_bytes_in_distinct`, `max_rows_in_set`, `max_bytes_in_set`, `max_rows_in_join`, `max_bytes_in_join`, `max_bytes_before_external_sort`, `max_bytes_before_external_group_by`. For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation. The system does not have "merge join".
## FROM Clause

If the FROM clause is omitted, data will be read from the `system.one` table.
The `system.one` table contains exactly one row (this table fulfills the same purpose as the DUAL table found in other DBMSs).
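As a quick illustration (a sketch assuming a running ClickHouse server), a query with no FROM clause is equivalent to reading from `system.one`:

```sql
-- Both queries return one row; system.one has a single UInt8 column
-- named `dummy` whose value is 0.
SELECT 1 + 1;
SELECT dummy FROM system.one;
```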
The FROM clause specifies the table to read data from, or a subquery, or a table function; ARRAY JOIN and the regular JOIN may also be included (see below).
Instead of a table, the SELECT subquery may be specified in brackets.
In this case, the subquery processing pipeline will be built into the processing pipeline of an external query.
In contrast to standard SQL, a synonym does not need to be specified after a subquery. For compatibility, it is possible to write 'AS name' after a subquery, but the specified name isn't used anywhere.
A table function may be specified instead of a table. For more information, see the section "Table functions".
To execute a query, all the columns listed in the query are extracted from the appropriate table. Any columns not needed for the external query are thrown out of the subqueries.
If a query does not list any columns (for example, SELECT count() FROM t), some column is extracted from the table anyway (the smallest one is preferred), in order to calculate the number of rows.
The FINAL modifier can be used only for a SELECT from a CollapsingMergeTree table. When you specify FINAL, data is selected fully "collapsed". Keep in mind that using FINAL leads to a selection that includes columns related to the primary key, in addition to the columns specified in the SELECT. Additionally, the query will be executed in a single stream, and data will be merged during query execution. This means that when using FINAL, the query is processed more slowly. In most cases, you should avoid using FINAL. For more information, see the section "CollapsingMergeTree engine".
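A minimal sketch of the FINAL modifier; the `uactions` CollapsingMergeTree table and its columns are hypothetical:

```sql
-- FINAL collapses rows at query time, so not-yet-merged row pairs do not
-- distort the sums; the trade-off is single-stream execution.
SELECT UserID, sum(Duration) AS total
FROM uactions FINAL
GROUP BY UserID
```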
## SAMPLE Clause

The `SAMPLE` clause allows for approximated query processing. Approximated query processing is only supported by the tables in the `MergeTree` family, and only if the sampling expression was specified during table creation (see MergeTree engine).

The features of data sampling are listed below:

- Data sampling is a deterministic mechanism. The result of the same `SELECT .. SAMPLE` query is always the same.
- Sampling works consistently for different tables. For tables with a single sampling key, a sample with the same coefficient always selects the same subset of possible data. For example, a sample of user IDs takes rows with the same subset of all the possible user IDs from different tables. This allows using the sample in subqueries in the `IN` clause, as well as for manually correlating results of different queries with samples.
- Sampling allows reading less data from a disk. Note that for this you must specify the sampling key correctly. For more details, see Creating a MergeTree Table.

The `SAMPLE` clause can be specified in several ways:

- `SAMPLE k`, where `k` is a decimal number from 0 to 1. The query is executed on the fraction `k` of the data. For example, `SAMPLE 0.1` runs the query on 10% of data.
- `SAMPLE n`, where `n` is a sufficiently large integer. The query is executed on a sample of at least `n` rows (but not significantly more than this). For example, `SAMPLE 10000000` runs the query on a minimum of 10,000,000 rows.
- `SAMPLE k OFFSET m`, where `k` and `m` are numbers from 0 to 1. The query is executed on a sample of the fraction `k` of the data. The data used for the sample is offset by the fraction `m`.
### SAMPLE k

In a `SAMPLE k` clause, `k` is the fraction of the data that the sample is taken from. An example is shown below:

```sql
SELECT
    Title,
    count() * 10 AS PageViews
FROM hits_distributed
SAMPLE 0.1
WHERE
    CounterID = 34
GROUP BY Title
ORDER BY PageViews DESC LIMIT 1000
```

In this example, the query is executed on a sample from 0.1 (10%) of data. Values of aggregate functions are not corrected automatically, so to get an approximate result, the value `count()` is manually multiplied by 10.
### SAMPLE n

In this case, the query is executed on a sample of at least `n` rows, where `n` is a sufficiently large integer. For example, `SAMPLE 10000000`.

Since the minimum unit for data reading is one granule (its size is set by the `index_granularity` setting), it makes sense to set a sample that is much larger than the size of the granule.

When using the `SAMPLE n` clause, the relative coefficient is calculated dynamically. Since you do not know which relative percent of data was processed, you do not know the coefficient the aggregate functions should be multiplied by (for example, you do not know if `SAMPLE 1000000` was taken from a set of 10,000,000 rows or from a set of 1,000,000,000 rows). In this case, use the `_sample_factor` column to get the approximate result.

`_sample_factor` is a virtual column that ClickHouse stores relative coefficients in. This column is created automatically when you create a table with the specified sampling key. A usage example is shown below:

```sql
SELECT sum(Duration * _sample_factor)
FROM visits
SAMPLE 10000000
```

If you need to get the approximate count of rows in a `SELECT .. SAMPLE n` query, get the sum() of the `_sample_factor` column instead of counting the `count(column * _sample_factor)` value. For example:

```sql
SELECT sum(_sample_factor)
FROM visits
SAMPLE 10000000
```

Note that to calculate the average in a `SELECT .. SAMPLE n` query, you do not need to use the `_sample_factor` column:

```sql
SELECT avg(Duration)
FROM visits
SAMPLE 10000000
```
### SAMPLE k OFFSET m

You can specify the `SAMPLE k OFFSET m` clause, where `k` and `m` are numbers from 0 to 1. Examples are shown below.

Example 1.

```sql
SAMPLE 1/10
```

In this example, the sample is 1/10th of all data:

`[++------------------]`

Example 2.

```sql
SAMPLE 1/10 OFFSET 1/2
```

Here, a sample of 10% is taken from the second half of the data.

`[----------++--------]`
## ARRAY JOIN Clause

Allows executing JOIN with an array or nested data structure. The intent is similar to the `arrayJoin` function, but its functionality is broader.

`ARRAY JOIN` is essentially `INNER JOIN` with an array. Example:
```text
:) CREATE TABLE arrays_test (s String, arr Array(UInt8)) ENGINE = Memory

CREATE TABLE arrays_test
(
    s String,
    arr Array(UInt8)
) ENGINE = Memory

Ok.

0 rows in set. Elapsed: 0.001 sec.

:) INSERT INTO arrays_test VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', [])

INSERT INTO arrays_test VALUES

Ok.

3 rows in set. Elapsed: 0.001 sec.

:) SELECT * FROM arrays_test

SELECT *
FROM arrays_test

┌─s───────┬─arr─────┐
│ Hello   │ [1,2]   │
│ World   │ [3,4,5] │
│ Goodbye │ []      │
└─────────┴─────────┘

3 rows in set. Elapsed: 0.001 sec.

:) SELECT s, arr FROM arrays_test ARRAY JOIN arr

SELECT s, arr
FROM arrays_test
ARRAY JOIN arr

┌─s─────┬─arr─┐
│ Hello │   1 │
│ Hello │   2 │
│ World │   3 │
│ World │   4 │
│ World │   5 │
└───────┴─────┘

5 rows in set. Elapsed: 0.001 sec.
```
An alias can be specified for an array in the ARRAY JOIN clause. In this case, an array item can be accessed by this alias, but the array itself by the original name. Example:
```text
:) SELECT s, arr, a FROM arrays_test ARRAY JOIN arr AS a

SELECT s, arr, a
FROM arrays_test
ARRAY JOIN arr AS a

┌─s─────┬─arr─────┬─a─┐
│ Hello │ [1,2]   │ 1 │
│ Hello │ [1,2]   │ 2 │
│ World │ [3,4,5] │ 3 │
│ World │ [3,4,5] │ 4 │
│ World │ [3,4,5] │ 5 │
└───────┴─────────┴───┘

5 rows in set. Elapsed: 0.001 sec.
```
Multiple arrays of the same size can be comma-separated in the ARRAY JOIN clause. In this case, JOIN is performed with them simultaneously (the direct sum, not the direct product). Example:
```text
:) SELECT s, arr, a, num, mapped FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped

SELECT s, arr, a, num, mapped
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(lambda(tuple(x), plus(x, 1)), arr) AS mapped

┌─s─────┬─arr─────┬─a─┬─num─┬─mapped─┐
│ Hello │ [1,2]   │ 1 │   1 │      2 │
│ Hello │ [1,2]   │ 2 │   2 │      3 │
│ World │ [3,4,5] │ 3 │   1 │      4 │
│ World │ [3,4,5] │ 4 │   2 │      5 │
│ World │ [3,4,5] │ 5 │   3 │      6 │
└───────┴─────────┴───┴─────┴────────┘

5 rows in set. Elapsed: 0.002 sec.
```
```text
:) SELECT s, arr, a, num, arrayEnumerate(arr) FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num

SELECT s, arr, a, num, arrayEnumerate(arr)
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num

┌─s─────┬─arr─────┬─a─┬─num─┬─arrayEnumerate(arr)─┐
│ Hello │ [1,2]   │ 1 │   1 │ [1,2]               │
│ Hello │ [1,2]   │ 2 │   2 │ [1,2]               │
│ World │ [3,4,5] │ 3 │   1 │ [1,2,3]             │
│ World │ [3,4,5] │ 4 │   2 │ [1,2,3]             │
│ World │ [3,4,5] │ 5 │   3 │ [1,2,3]             │
└───────┴─────────┴───┴─────┴─────────────────────┘

5 rows in set. Elapsed: 0.002 sec.
```
ARRAY JOIN also works with nested data structures. Example:
```text
:) CREATE TABLE nested_test (s String, nest Nested(x UInt8, y UInt32)) ENGINE = Memory

CREATE TABLE nested_test
(
    s String,
    nest Nested(
    x UInt8,
    y UInt32)
) ENGINE = Memory

Ok.

0 rows in set. Elapsed: 0.006 sec.

:) INSERT INTO nested_test VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], [])

INSERT INTO nested_test VALUES

Ok.

3 rows in set. Elapsed: 0.001 sec.

:) SELECT * FROM nested_test

SELECT *
FROM nested_test

┌─s───────┬─nest.x──┬─nest.y─────┐
│ Hello   │ [1,2]   │ [10,20]    │
│ World   │ [3,4,5] │ [30,40,50] │
│ Goodbye │ []      │ []         │
└─────────┴─────────┴────────────┘

3 rows in set. Elapsed: 0.001 sec.

:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest

SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest

┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │      1 │     10 │
│ Hello │      2 │     20 │
│ World │      3 │     30 │
│ World │      4 │     40 │
│ World │      5 │     50 │
└───────┴────────┴────────┘

5 rows in set. Elapsed: 0.001 sec.
```
When specifying names of nested data structures in ARRAY JOIN, the meaning is the same as ARRAY JOIN with all the array elements that it consists of. Example:
```text
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x, nest.y

SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`, `nest.y`

┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │      1 │     10 │
│ Hello │      2 │     20 │
│ World │      3 │     30 │
│ World │      4 │     40 │
│ World │      5 │     50 │
└───────┴────────┴────────┘

5 rows in set. Elapsed: 0.001 sec.
```
This variation also makes sense:
```text
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x

SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`

┌─s─────┬─nest.x─┬─nest.y─────┐
│ Hello │      1 │ [10,20]    │
│ Hello │      2 │ [10,20]    │
│ World │      3 │ [30,40,50] │
│ World │      4 │ [30,40,50] │
│ World │      5 │ [30,40,50] │
└───────┴────────┴────────────┘

5 rows in set. Elapsed: 0.001 sec.
```
An alias may be used for a nested data structure, in order to select either the JOIN result or the source array. Example:
```text
:) SELECT s, n.x, n.y, nest.x, nest.y FROM nested_test ARRAY JOIN nest AS n

SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest AS n

┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┐
│ Hello │   1 │  10 │ [1,2]   │ [10,20]    │
│ Hello │   2 │  20 │ [1,2]   │ [10,20]    │
│ World │   3 │  30 │ [3,4,5] │ [30,40,50] │
│ World │   4 │  40 │ [3,4,5] │ [30,40,50] │
│ World │   5 │  50 │ [3,4,5] │ [30,40,50] │
└───────┴─────┴─────┴─────────┴────────────┘

5 rows in set. Elapsed: 0.001 sec.
```
Example of using the arrayEnumerate function:
```text
:) SELECT s, n.x, n.y, nest.x, nest.y, num FROM nested_test ARRAY JOIN nest AS n, arrayEnumerate(nest.x) AS num

SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`, num
FROM nested_test
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num

┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┬─num─┐
│ Hello │   1 │  10 │ [1,2]   │ [10,20]    │   1 │
│ Hello │   2 │  20 │ [1,2]   │ [10,20]    │   2 │
│ World │   3 │  30 │ [3,4,5] │ [30,40,50] │   1 │
│ World │   4 │  40 │ [3,4,5] │ [30,40,50] │   2 │
│ World │   5 │  50 │ [3,4,5] │ [30,40,50] │   3 │
└───────┴─────┴─────┴─────────┴────────────┴─────┘

5 rows in set. Elapsed: 0.002 sec.
```
The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
## JOIN Clause

Joins the data in the normal SQL JOIN sense.

Note: Not related to `ARRAY JOIN`.

```sql
SELECT <expr_list>
FROM <left_subquery>
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN <right_subquery>
(ON <expr_list>)|(USING <column_list>) ...
```
Table names can be specified instead of `<left_subquery>` and `<right_subquery>`. This is equivalent to the `SELECT * FROM table` subquery, except in a special case when the table has the `Join` engine (an array prepared for joining).

**Supported types of `JOIN`**

- `INNER JOIN` (or `JOIN`)
- `LEFT JOIN` (or `LEFT OUTER JOIN`)
- `RIGHT JOIN` (or `RIGHT OUTER JOIN`)
- `FULL JOIN` (or `FULL OUTER JOIN`)
- `CROSS JOIN` (or `,`)

See the standard SQL JOIN description.

**ANY or ALL strictness**

If `ALL` is specified and the right table has several matching rows, the data will be multiplied by the number of these rows. This is the normal `JOIN` behavior for standard SQL.
If `ANY` is specified and the right table has several matching rows, only the first one found is joined. If the right table has only one matching row, the results of `ANY` and `ALL` are the same.

To set the default strictness value, use the session configuration parameter `join_default_strictness`.
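For example, the default strictness can be set per session (a sketch):

```sql
-- A bare JOIN will now behave as ALL JOIN for the rest of this session.
SET join_default_strictness = 'ALL';
```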
**GLOBAL JOIN**

When using a normal `JOIN`, the query is sent to remote servers. Subqueries are run on each of them in order to make the right table, and the join is performed with this table. In other words, the right table is formed on each server separately.

When using `GLOBAL ... JOIN`, first the requestor server runs a subquery to calculate the right table. This temporary table is passed to each remote server, and queries are run on them using the temporary data that was transmitted.

Be careful when using `GLOBAL`. For more information, see the section Distributed subqueries.
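A sketch of the difference, assuming hypothetical distributed tables `hits_distributed` and `visits_distributed`:

```sql
-- With GLOBAL, the right-hand subquery runs once on the requestor server
-- and its result is shipped to every remote server, instead of being
-- recomputed on each of them.
SELECT UserID, visits
FROM hits_distributed
GLOBAL ANY LEFT JOIN
(
    SELECT UserID, count() AS visits
    FROM visits_distributed
    GROUP BY UserID
) USING UserID
```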
**Usage Recommendations**

All columns that are not needed for the `JOIN` are deleted from the subquery.

When running a `JOIN`, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in `WHERE` and before aggregation. In order to explicitly set the processing order, we recommend running a `JOIN` subquery with a subquery.

Example:

```sql
SELECT
    CounterID,
    hits,
    visits
FROM
(
    SELECT
        CounterID,
        count() AS hits
    FROM test.hits
    GROUP BY CounterID
) ANY LEFT JOIN
(
    SELECT
        CounterID,
        sum(Sign) AS visits
    FROM test.visits
    GROUP BY CounterID
) USING CounterID
ORDER BY hits DESC
LIMIT 10
```

```text
┌─CounterID─┬───hits─┬─visits─┐
│   1143050 │ 523264 │  13665 │
│    731962 │ 475698 │ 102716 │
│    722545 │ 337212 │ 108187 │
│    722889 │ 252197 │  10547 │
│   2237260 │ 196036 │   9522 │
│  23057320 │ 147211 │   7689 │
│    722818 │  90109 │  17847 │
│     48221 │  85379 │   4652 │
│  19762435 │  77807 │   7026 │
│    722884 │  77492 │  11056 │
└───────────┴────────┴────────┘
```
Subqueries don't allow you to set names or use them for referencing a column from a specific subquery.
The columns specified in `USING` must have the same names in both subqueries, and the other columns must be named differently. You can use aliases to change the names of columns in subqueries (the example uses the aliases `hits` and `visits`).

The `USING` clause specifies one or more columns to join, which establishes the equality of these columns. The list of columns is set without brackets. More complex join conditions are not supported.

The right table (the subquery result) resides in RAM. If there isn't enough memory, you can't run a `JOIN`.

Each time a query is run with the same `JOIN`, the subquery is run again because the result is not cached. To avoid this, use the special `Join` table engine, which is a prepared array for joining that is always in RAM.

In some cases, it is more efficient to use `IN` instead of `JOIN`.
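For instance, when the right table is only needed for filtering, a sketch of the `IN` form (reusing the `test.hits` and `test.visits` tables from the example above):

```sql
-- IN avoids materializing a full right-hand table for a join;
-- only the set of matching keys is built.
SELECT CounterID, count() AS hits
FROM test.hits
WHERE CounterID IN
(
    SELECT CounterID
    FROM test.visits
)
GROUP BY CounterID
```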
Among the various types of `JOIN`, the most efficient is `ANY LEFT JOIN`, then `ANY INNER JOIN`. The least efficient are `ALL LEFT JOIN` and `ALL INNER JOIN`.

If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of `JOIN`. For more information, see the section External dictionaries.
### Processing of Empty or NULL Cells

While joining tables, empty cells may appear. The `join_use_nulls` setting defines how ClickHouse fills these cells.

If the `JOIN` keys are `Nullable` fields, the rows where at least one of the keys has the value `NULL` are not joined.
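A sketch of the setting; the tables `t_left` and `t_right` and their columns are hypothetical:

```sql
-- With join_use_nulls = 1, cells with no match on the right side are
-- filled with NULL instead of the column type's default value (0, '').
SET join_use_nulls = 1;

SELECT k, x, y
FROM t_left
ANY LEFT JOIN t_right USING k
```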
## WHERE Clause

If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators.
This expression will be used for filtering data before all other transformations.

If indexes are supported by the database table engine, the expression is evaluated to determine whether indexes can be used.
## PREWHERE Clause

This clause has the same meaning as the WHERE clause. The difference is in which data is read from the table.
When using PREWHERE, first only the columns necessary for executing PREWHERE are read. Then the other columns are read that are needed for running the query, but only those blocks where the PREWHERE expression is true.

It makes sense to use PREWHERE if there are filtration conditions that are used by a minority of the columns in the query, but that provide strong data filtration. This reduces the volume of data to read.

For example, it is useful to write PREWHERE for queries that extract a large number of columns, but that only have filtration for a few columns.

PREWHERE is only supported by tables from the `*MergeTree` family.

A query may simultaneously specify PREWHERE and WHERE. In this case, PREWHERE precedes WHERE.

If the `optimize_move_to_prewhere` setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
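A sketch of the pattern; the wide columns here are hypothetical:

```sql
-- Only CounterID is read to evaluate PREWHERE; the remaining columns
-- are read only for the blocks that pass the filter.
SELECT URL, Title, Referer
FROM hits
PREWHERE CounterID = 34
WHERE URL LIKE '%/shop/%'
```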
## GROUP BY Clause

This is one of the most important parts of a column-oriented DBMS.

If there is a GROUP BY clause, it must contain a list of expressions. Each expression will be referred to here as a "key".
All the expressions in the SELECT, HAVING, and ORDER BY clauses must be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions.

If a query contains only table columns inside aggregate functions, the GROUP BY clause can be omitted, and aggregation by an empty set of keys is assumed.

Example:

```sql
SELECT
    count(),
    median(FetchTiming > 60 ? 60 : FetchTiming),
    count() - sum(Refresh)
FROM hits
```

However, in contrast to standard SQL, if the table doesn't have any rows (either there aren't any at all, or there aren't any after using WHERE to filter), an empty result is returned, and not the result from one of the rows containing the initial values of aggregate functions.
As opposed to MySQL (and conforming to standard SQL), you can't get some value of some column that is not in a key or aggregate function (except constant expressions). To work around this, you can use the `any` aggregate function (get the first encountered value) or `min`/`max`.

Example:

```sql
SELECT
    domainWithoutWWW(URL) AS domain,
    count(),
    any(Title) AS title -- getting the first occurred page header for each domain.
FROM hits
GROUP BY domain
```

For every different key value encountered, GROUP BY calculates a set of aggregate function values.

GROUP BY is not supported for array columns.

A constant can't be specified as an argument for aggregate functions. Example: `sum(1)`. Instead of this, you can get rid of the constant. Example: `count()`.
### NULL Processing

For grouping, ClickHouse interprets `NULL` as a value, and `NULL=NULL`.

Here's an example to show what this means.

Assume you have this table:

```text
┌─x─┬────y─┐
│ 1 │    2 │
│ 2 │ ᴺᵁᴸᴸ │
│ 3 │    2 │
│ 3 │    3 │
│ 3 │ ᴺᵁᴸᴸ │
└───┴──────┘
```

The query `SELECT sum(x), y FROM t_null_big GROUP BY y` results in:

```text
┌─sum(x)─┬────y─┐
│      4 │    2 │
│      3 │    3 │
│      5 │ ᴺᵁᴸᴸ │
└────────┴──────┘
```

You can see that `GROUP BY` for `y = NULL` summed up `x`, as if `NULL` were this value.

If you pass several keys to `GROUP BY`, the result will give you all the combinations of the selection, as if `NULL` were a specific value.
### WITH TOTALS Modifier

If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty strings), and columns of aggregate functions with the values calculated across all the rows (the "total" values).

This extra row is output in JSON*, TabSeparated*, and Pretty* formats, separately from the other rows. In the other formats, this row is not output.

In JSON* formats, this row is output as a separate 'totals' field. In TabSeparated* formats, the row comes after the main result, preceded by an empty row (after the other data). In Pretty* formats, the row is output as a separate table after the main result.

`WITH TOTALS` can be run in different ways when HAVING is present. The behavior depends on the `totals_mode` setting.
By default, `totals_mode = 'before_having'`. In this case, 'totals' is calculated across all rows, including the ones that don't pass through HAVING and `max_rows_to_group_by`.

The other alternatives include only the rows that pass through HAVING in 'totals', and behave differently with the setting `max_rows_to_group_by` and `group_by_overflow_mode = 'any'`:

- `after_having_exclusive`: don't include rows that didn't pass through `max_rows_to_group_by`. In other words, 'totals' will have less than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
- `after_having_inclusive`: include all the rows that didn't pass through `max_rows_to_group_by` in 'totals'. In other words, 'totals' will have more than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
- `after_having_auto`: count the number of rows that passed through HAVING. If it is more than a certain amount (by default, 50%), include all the rows that didn't pass through `max_rows_to_group_by` in 'totals'. Otherwise, do not include them.
- `totals_auto_threshold`: by default, 0.5. The coefficient for `after_having_auto`.

If `max_rows_to_group_by` and `group_by_overflow_mode = 'any'` are not used, all variations of `after_having` are the same, and you can use any of them (for example, `after_having_auto`).

You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause (in this case, the respective total values are combined).
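A minimal sketch, reusing the `domain` expression from the earlier GROUP BY example:

```sql
-- An extra "totals" row with count() over all rows is returned
-- separately from the per-domain groups.
SELECT domainWithoutWWW(URL) AS domain, count() AS cnt
FROM hits
GROUP BY domain WITH TOTALS
```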
### GROUP BY in External Memory

You can enable dumping temporary data to the disk to restrict memory usage during GROUP BY.
The `max_bytes_before_external_group_by` setting determines the threshold RAM consumption for dumping GROUP BY temporary data to the file system. If set to 0 (the default), it is disabled.

When using `max_bytes_before_external_group_by`, we recommend that you set `max_memory_usage` about twice as high. This is necessary because there are two stages to aggregation: reading the data and forming intermediate data (1), and merging the intermediate data (2). Dumping data to the file system can only occur during stage 1. If the temporary data wasn't dumped, then stage 2 might require up to the same amount of memory as in stage 1.

For example, if `max_memory_usage` was set to 10000000000 and you want to use external aggregation, it makes sense to set `max_bytes_before_external_group_by` to 10000000000, and `max_memory_usage` to 20000000000. When external aggregation is triggered (if there was at least one dump of temporary data), maximum consumption of RAM is only slightly more than `max_bytes_before_external_group_by`.

With distributed query processing, external aggregation is performed on remote servers. In order for the requestor server to use only a small amount of RAM, set `distributed_aggregation_memory_efficient` to 1.

Merging data flushed to the disk, as well as merging results from remote servers when the `distributed_aggregation_memory_efficient` setting is enabled, consumes up to 1/256 * the number of threads from the total amount of RAM.

When external aggregation is enabled, if there was less than `max_bytes_before_external_group_by` of data (i.e. data was not flushed), the query runs just as fast as without external aggregation. If any temporary data was flushed, the run time will be several times longer (approximately three times).

If you have an ORDER BY with a small LIMIT after GROUP BY, then the ORDER BY clause will not use significant amounts of RAM.
But if the ORDER BY doesn't have LIMIT, don't forget to enable external sorting (`max_bytes_before_external_sort`).
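The recommendation above can be sketched as session settings (the values follow the text's example):

```sql
-- Allow GROUP BY and ORDER BY to spill to disk at 10 GB, with the
-- overall per-query memory limit set about twice as high.
SET max_bytes_before_external_group_by = 10000000000;
SET max_bytes_before_external_sort = 10000000000;
SET max_memory_usage = 20000000000;
```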
## LIMIT N BY Clause

`LIMIT N BY COLUMNS` selects the top N rows for each group of COLUMNS. LIMIT N BY is not related to LIMIT; they can both be used in the same query. The key for LIMIT N BY can contain any number of columns or expressions.

Example:

```sql
SELECT
    domainWithoutWWW(URL) AS domain,
    domainWithoutWWW(REFERRER_URL) AS referrer,
    device_type,
    count() cnt
FROM hits
GROUP BY domain, referrer, device_type
ORDER BY cnt DESC
LIMIT 5 BY domain, device_type
LIMIT 100
```

The query will select the top 5 referrers for each `domain, device_type` pair, but not more than 100 rows (`LIMIT n BY + LIMIT`).
## HAVING Clause

Allows filtering the result received after GROUP BY, similar to the WHERE clause.
WHERE and HAVING differ in that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it.
If aggregation is not performed, HAVING can't be used.
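A minimal sketch (the threshold is arbitrary):

```sql
-- cnt is only known after aggregation, so the condition belongs in
-- HAVING rather than WHERE.
SELECT domainWithoutWWW(URL) AS domain, count() AS cnt
FROM hits
GROUP BY domain
HAVING cnt > 100
```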
## ORDER BY Clause

The ORDER BY clause contains a list of expressions, which can each be assigned DESC or ASC (the sorting direction). If the direction is not specified, ASC is assumed. ASC is sorted in ascending order, and DESC in descending order. The sorting direction applies to a single expression, not to the entire list. Example: `ORDER BY Visits DESC, SearchPhrase`.

For sorting by String values, you can specify collation (comparison). Example: `ORDER BY SearchPhrase COLLATE 'tr'` sorts by keyword in ascending order, using the Turkish alphabet, case insensitive, assuming that strings are UTF-8 encoded. COLLATE can be specified or not for each expression in ORDER BY independently. If ASC or DESC is specified, COLLATE is specified after it. When using COLLATE, sorting is always case-insensitive.

We only recommend using COLLATE for final sorting of a small number of rows, since sorting with COLLATE is less efficient than normal sorting by bytes.

Rows that have identical values for the list of sorting expressions are output in an arbitrary order, which can also be nondeterministic (different each time).
If the ORDER BY clause is omitted, the order of the rows is also undefined, and may be nondeterministic as well.

`NaN` and `NULL` sorting order:

- With the modifier `NULLS FIRST`: first `NULL`, then `NaN`, then other values.
- With the modifier `NULLS LAST`: first the values, then `NaN`, then `NULL`.
- Default: the same as with the `NULLS LAST` modifier.
Example:

For the table

```text
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    2 │
│ 1 │  nan │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │  nan │
│ 7 │ ᴺᵁᴸᴸ │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```

Run the query `SELECT * FROM t_null_nan ORDER BY y NULLS FIRST` to get:

```text
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 7 │ ᴺᵁᴸᴸ │
│ 1 │  nan │
│ 6 │  nan │
│ 2 │    2 │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```
When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. In other words, for ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than the rest.
Less RAM is used if a small enough LIMIT is specified in addition to ORDER BY. Otherwise, the amount of memory spent is proportional to the volume of data for sorting. For distributed query processing, if GROUP BY is omitted, sorting is partially done on remote servers, and the results are merged on the requestor server. This means that for distributed sorting, the volume of data to sort can be greater than the amount of memory on a single server.
If there is not enough RAM, it is possible to perform sorting in external memory (creating temporary files on a disk). Use the setting `max_bytes_before_external_sort` for this purpose. If it is set to 0 (the default), external sorting is disabled. If it is enabled, when the volume of data to sort reaches the specified number of bytes, the collected data is sorted and dumped into a temporary file. After all data is read, all the sorted files are merged and the results are output. Files are written to the /var/lib/clickhouse/tmp/ directory by default (you can use the `tmp_path` server configuration parameter to change this).

Running a query may use more memory than `max_bytes_before_external_sort`. For this reason, this setting must have a value significantly smaller than `max_memory_usage`. As an example, if your server has 128 GB of RAM and you need to run a single query, set `max_memory_usage` to 100 GB, and `max_bytes_before_external_sort` to 80 GB.
External sorting works much less effectively than sorting in RAM.
## SELECT Clause

The expressions specified in the SELECT clause are analyzed after the calculations for all the clauses listed above are completed.
More specifically, if there are any aggregate functions, the expressions above the aggregate functions are analyzed.
The aggregate functions and everything below them are calculated during aggregation (GROUP BY).
These expressions work as if they are applied to separate rows in the result.
DISTINCT Clause
If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result.
The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:
- DISTINCT can be applied together with GROUP BY.
- When ORDER BY is omitted and LIMIT is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
- DISTINCT is not supported if SELECT has at least one array column.
`DISTINCT` works with `NULL` as if `NULL` were a specific value, and `NULL = NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once.
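A minimal sketch of this NULL behavior, assuming a hypothetical table `t` with a Nullable column `y`:

```sql
-- Hypothetical data: t contains rows (1, NULL), (1, NULL), (2, 3)
SELECT DISTINCT x, y FROM t;
-- The two (1, NULL) rows collapse into one, since DISTINCT treats
-- NULL as equal to NULL.
```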
LIMIT Clause
`LIMIT m` allows you to select the first `m` rows from the result.
`LIMIT n, m` allows you to select the first `m` rows from the result after skipping the first `n` rows.
`n` and `m` must be non-negative integers.
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
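For example, combining ORDER BY with LIMIT n, m gives deterministic paging (the table name is illustrative):

```sql
-- Skip the first 10 rows of the sorted result, then return the next 5
SELECT EventDate
FROM hits
ORDER BY EventDate
LIMIT 10, 5;
```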
UNION ALL Clause
You can use UNION ALL to combine any number of queries. Example:
```sql
SELECT CounterID, 1 AS table, toInt64(count()) AS c
    FROM test.hits
    GROUP BY CounterID

UNION ALL

SELECT CounterID, 2 AS table, sum(Sign) AS c
    FROM test.visits
    GROUP BY CounterID
    HAVING c > 0
```
Only UNION ALL is supported. The regular UNION (UNION DISTINCT) is not supported. If you need UNION DISTINCT, you can write SELECT DISTINCT from a subquery containing UNION ALL.
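The workaround reads as follows (table and column names are illustrative):

```sql
-- Emulate UNION DISTINCT with DISTINCT over a UNION ALL subquery
SELECT DISTINCT x
FROM
(
    SELECT x FROM table_a
    UNION ALL
    SELECT x FROM table_b
)
```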
Queries that are parts of UNION ALL can be run simultaneously, and their results can be mixed together.
The structure of results (the number and type of columns) must match for the queries. But the column names can differ. In this case, the column names for the final result will be taken from the first query. Type casting is performed for unions. For example, if two queries being combined have the same field with non-`Nullable` and `Nullable` types from a compatible type, the resulting `UNION ALL` has a `Nullable` type field.
Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
INTO OUTFILE Clause
Add the `INTO OUTFILE filename` clause (where `filename` is a string literal) to redirect query output to the specified file.
In contrast to MySQL, the file is created on the client side. The query will fail if a file with the same filename already exists.
This functionality is available in the command-line client and clickhouse-local (a query sent via HTTP interface will fail).
The default output format is TabSeparated (the same as in the command-line client batch mode).
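For example, run in the command-line client (the file name is illustrative; the query fails if the file already exists):

```sql
-- Writes a TabSeparated file on the client side
SELECT number
FROM system.numbers
LIMIT 10
INTO OUTFILE 'numbers.tsv';
```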
FORMAT Clause
Specify 'FORMAT format' to get data in any specified format.
You can use this for convenience, or for creating dumps.
For more information, see the section "Formats".
If the FORMAT clause is omitted, the default format is used, which depends on both the settings and the interface used for accessing the DB. For the HTTP interface and the command-line client in batch mode, the default format is TabSeparated. For the command-line client in interactive mode, the default format is PrettyCompact (it has attractive and compact tables).
When using the command-line client, data is passed to the client in an internal efficient format. The client independently interprets the FORMAT clause of the query and formats the data itself (thus relieving the network and the server from the load).
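For example, the same query can be rendered in different output formats:

```sql
-- Same data, two output formats
SELECT 1 AS x, 'hello' AS s FORMAT JSON;
SELECT 1 AS x, 'hello' AS s FORMAT CSV;
```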
IN Operators
The `IN`, `NOT IN`, `GLOBAL IN`, and `GLOBAL NOT IN` operators are covered separately, since their functionality is quite rich.
The left side of the operator is either a single column or a tuple.
Examples:
```sql
SELECT UserID IN (123, 456) FROM ...
SELECT (CounterID, UserID) IN ((34, 123), (101500, 456)) FROM ...
```
If the left side is a single column that is in the index, and the right side is a set of constants, the system uses the index for processing the query.
Don't list too many values explicitly (e.g. millions). If a data set is large, put it in a temporary table (for example, see the section "External data for query processing"), then use a subquery.
The right side of the operator can be a set of constant expressions, a set of tuples with constant expressions (shown in the examples above), or the name of a database table or SELECT subquery in brackets.
If the right side of the operator is the name of a table (for example, `UserID IN users`), this is equivalent to the subquery `UserID IN (SELECT * FROM users)`. Use this when working with external data that is sent along with the query. For example, the query can be sent together with a set of user IDs loaded to the 'users' temporary table, which should be filtered.
If the right side of the operator is a table name that has the Set engine (a prepared data set that is always in RAM), the data set will not be created over again for each query.
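A sketch of the Set-engine case described above (all names are illustrative):

```sql
-- A prepared data set that is always kept in RAM
CREATE TABLE users_set (UserID UInt64) ENGINE = Set;
INSERT INTO users_set VALUES (123), (456);

-- The set is not rebuilt for each query
SELECT count() FROM hits WHERE UserID IN users_set;
```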
The subquery may specify more than one column for filtering tuples.
Example:
```sql
SELECT (CounterID, UserID) IN (SELECT CounterID, UserID FROM ...) FROM ...
```
The columns to the left and right of the IN operator should have the same type.
The IN operator and subquery may occur in any part of the query, including in aggregate functions and lambda functions.
Example:
```sql
SELECT
    EventDate,
    avg(UserID IN
    (
        SELECT UserID
        FROM test.hits
        WHERE EventDate = toDate('2014-03-17')
    )) AS ratio
FROM test.hits
GROUP BY EventDate
ORDER BY EventDate ASC
```

```text
┌──EventDate─┬────ratio─┐
│ 2014-03-17 │        1 │
│ 2014-03-18 │ 0.807696 │
│ 2014-03-19 │ 0.755406 │
│ 2014-03-20 │ 0.723218 │
│ 2014-03-21 │ 0.697021 │
│ 2014-03-22 │ 0.647851 │
│ 2014-03-23 │ 0.648416 │
└────────────┴──────────┘
```
For each day after March 17th, count the percentage of pageviews made by users who visited the site on March 17th.
A subquery in the IN clause is always run just one time on a single server. There are no dependent subqueries.
NULL processing
During request processing, the IN operator assumes that the result of an operation with `NULL` is always equal to `0`, regardless of whether `NULL` is on the right or left side of the operator. `NULL` values are not included in any dataset, do not correspond to each other, and cannot be compared.

Here is an example with the `t_null` table:

```text
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘
```
Running the query `SELECT x FROM t_null WHERE y IN (NULL, 3)` gives you the following result:

```text
┌─x─┐
│ 2 │
└───┘
```
You can see that the row in which `y = NULL` is thrown out of the query results. This is because ClickHouse can't decide whether `NULL` is included in the `(NULL, 3)` set, returns `0` as the result of the operation, and `SELECT` excludes this row from the final output.

```sql
SELECT y IN (NULL, 3)
FROM t_null
```

```text
┌─in(y, tuple(NULL, 3))─┐
│                     0 │
│                     1 │
└───────────────────────┘
```
Distributed Subqueries
There are two options for IN operators with subqueries (similar to JOINs): normal `IN`/`JOIN` and `GLOBAL IN`/`GLOBAL JOIN`. They differ in how they are run for distributed query processing.

Attention: Remember that the algorithms described below may work differently depending on the `distributed_product_mode` setting.
When using the regular IN, the query is sent to remote servers, and each of them runs the subqueries in the `IN` or `JOIN` clause.

When using `GLOBAL IN` / `GLOBAL JOIN`, first all the subqueries are run, and the results are collected in temporary tables. Then the temporary tables are sent to each remote server, where the queries are run using this temporary data.

For a non-distributed query, use the regular `IN` / `JOIN`.

Be careful when using subqueries in the `IN` / `JOIN` clauses for distributed query processing.
Let's look at some examples. Assume that each server in the cluster has a normal `local_table`. Each server also has a `distributed_table` table of the `Distributed` type, which looks at all the servers in the cluster.

For a query to the `distributed_table`, the query will be sent to all the remote servers and run on them using the `local_table`.
For example, the query

```sql
SELECT uniq(UserID) FROM distributed_table
```

will be sent to all remote servers as

```sql
SELECT uniq(UserID) FROM local_table
```

and run on each of them in parallel, until it reaches the stage where intermediate results can be combined. Then the intermediate results will be returned to the requestor server and merged on it, and the final result will be sent to the client.
Now let's examine a query with IN:

```sql
SELECT uniq(UserID)
FROM distributed_table
WHERE CounterID = 101500
    AND UserID IN
    (
        SELECT UserID
        FROM local_table
        WHERE CounterID = 34
    )
```

This calculates the intersection of the audiences of two sites.
This query will be sent to all remote servers as

```sql
SELECT uniq(UserID)
FROM local_table
WHERE CounterID = 101500
    AND UserID IN
    (
        SELECT UserID
        FROM local_table
        WHERE CounterID = 34
    )
```
In other words, the data set in the IN clause will be collected on each server independently, only across the data that is stored locally on each of the servers.
This will work correctly and optimally if you are prepared for this case and have spread data across the cluster servers such that the data for a single UserID resides entirely on a single server. In this case, all the necessary data will be available locally on each server. Otherwise, the result will be inaccurate. We refer to this variation of the query as "local IN".
To correct how the query works when data is spread randomly across the cluster servers, you could specify `distributed_table` inside a subquery. The query would look like this:

```sql
SELECT uniq(UserID)
FROM distributed_table
WHERE CounterID = 101500
    AND UserID IN
    (
        SELECT UserID
        FROM distributed_table
        WHERE CounterID = 34
    )
```
This query will be sent to all remote servers as

```sql
SELECT uniq(UserID)
FROM local_table
WHERE CounterID = 101500
    AND UserID IN
    (
        SELECT UserID
        FROM distributed_table
        WHERE CounterID = 34
    )
```
The subquery will begin running on each remote server. Since the subquery uses a distributed table, the subquery that is on each remote server will be resent to every remote server as

```sql
SELECT UserID
FROM local_table
WHERE CounterID = 34
```
For example, if you have a cluster of 100 servers, executing the entire query will require 10,000 elementary requests, which is generally considered unacceptable.
In such cases, you should always use GLOBAL IN instead of IN. Let's look at how it works for the query

```sql
SELECT uniq(UserID)
FROM distributed_table
WHERE CounterID = 101500
    AND UserID GLOBAL IN
    (
        SELECT UserID
        FROM distributed_table
        WHERE CounterID = 34
    )
```
The requestor server will run the subquery

```sql
SELECT UserID
FROM distributed_table
WHERE CounterID = 34
```

and the result will be put in a temporary table in RAM. Then the request will be sent to each remote server as

```sql
SELECT uniq(UserID)
FROM local_table
WHERE CounterID = 101500
    AND UserID GLOBAL IN _data1
```

and the temporary table `_data1` will be sent to every remote server with the query (the name of the temporary table is implementation-defined).
This is more optimal than using the normal IN. However, keep the following points in mind:
- When creating a temporary table, data is not made unique. To reduce the volume of data transmitted over the network, specify DISTINCT in the subquery. (You don't need to do this for a normal IN.)
- The temporary table will be sent to all the remote servers. Transmission does not account for network topology. For example, if 10 remote servers reside in a datacenter that is very remote in relation to the requestor server, the data will be sent 10 times over the channel to the remote datacenter. Try to avoid large data sets when using GLOBAL IN.
- When transmitting data to remote servers, restrictions on network bandwidth are not configurable. You might overload the network.
- Try to distribute data across servers so that you don't need to use GLOBAL IN on a regular basis.
- If you need to use GLOBAL IN often, plan the location of the ClickHouse cluster so that a single group of replicas resides in no more than one data center with a fast network between them, so that a query can be processed entirely within a single data center.
It also makes sense to specify a local table in the `GLOBAL IN` clause, in case this local table is only available on the requestor server and you want to use data from it on remote servers.
Extreme Values
In addition to results, you can also get minimum and maximum values for the results columns. To do this, set the `extremes` setting to 1. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
Two extra rows are calculated: the minimums and maximums, respectively. These extra two rows are output in JSON*, TabSeparated*, and Pretty* formats, separate from the other rows. They are not output for other formats.
In JSON* formats, the extreme values are output in a separate 'extremes' field. In TabSeparated* formats, the row comes after the main result, and after 'totals' if present. It is preceded by an empty row (after the other data). In Pretty* formats, the row is output as a separate table after the main result, and after 'totals' if present.
Extreme values are calculated for rows that have passed through LIMIT. However, when using 'LIMIT offset, size', the rows before 'offset' are included in 'extremes'. In stream requests, the result may also include a small number of rows that passed through LIMIT.
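For example (a sketch; the extra rows appear only in JSON*, TabSeparated*, and Pretty* formats):

```sql
SET extremes = 1;

-- In Pretty format, an additional "Extremes" block is printed after
-- the main result, with the per-column minimums and maximums.
SELECT number FROM system.numbers LIMIT 5 FORMAT Pretty;
```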
Notes
The `GROUP BY` and `ORDER BY` clauses do not support positional arguments. This contradicts MySQL, but conforms to standard SQL. For example, `GROUP BY 1, 2` will be interpreted as grouping by constants (i.e. aggregation of all rows into one).
You can use synonyms (`AS` aliases) in any part of a query.

You can put an asterisk in any part of a query instead of an expression. When the query is analyzed, the asterisk is expanded to a list of all table columns (excluding the `MATERIALIZED` and `ALIAS` columns). There are only a few cases when using an asterisk is justified:

- When creating a table dump.
- For tables containing just a few columns, such as system tables.
- For getting information about what columns are in a table. In this case, set `LIMIT 1`. But it is better to use the `DESC TABLE` query.
- When there is strong filtration on a small number of columns using `PREWHERE`.
- In subqueries (since columns that aren't needed for the external query are excluded from subqueries).

In all other cases, we don't recommend using the asterisk, since it only gives you the drawbacks of a columnar DBMS instead of the advantages.
- [file](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/file/ "file")
- [merge](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/merge/ "merge")
- [numbers](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/numbers/ "numbers")
- [remote](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/remote/ "remote")
- [url](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/url/ "url")
- [jdbc](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/table_functions/jdbc/ "jdbc")
- Dictionaries
Dictionaries
- [Introduction](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/ "Introduction")
- External Dictionaries
External Dictionaries
- [General Description](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts/ "General Description")
- [Configuring an External Dictionary](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts_dict/ "Configuring an External Dictionary")
- [Storing Dictionaries in Memory](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts_dict_layout/ "Storing Dictionaries in Memory")
- [Dictionary Updates](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts_dict_lifetime/ "Dictionary Updates")
- [Sources of External Dictionaries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts_dict_sources/ "Sources of External Dictionaries")
- [Dictionary Key and Fields](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts_dict_structure/ "Dictionary Key and Fields")
- [Internal Dictionaries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/internal_dicts/ "Internal Dictionaries")
- [Operators](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/operators/ "Operators")
- [General Syntax](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/ "General Syntax")
- Operations
Operations
- [Introduction](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/ "Introduction")
- [Requirements](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/requirements/ "Requirements")
- [Monitoring](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/monitoring/ "Monitoring")
- [Troubleshooting](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/troubleshooting/ "Troubleshooting")
- [Usage Recommendations](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/tips/ "Usage Recommendations")
- [ClickHouse Update](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/update/ "ClickHouse Update")
- [Access Rights](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/access_rights/ "Access Rights")
- [Data Backup](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/backup/ "Data Backup")
- [Configuration Files](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/configuration_files/ "Configuration Files")
- [Quotas](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/quotas/ "Quotas")
- [System Tables](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/system_tables/ "System Tables")
- Server Configuration Parameters
Server Configuration Parameters
- [Introduction](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/server_settings/ "Introduction")
- [Server Settings](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/server_settings/settings/ "Server Settings")
- Settings
Settings
- [Introduction](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/ "Introduction")
- [Permissions for Queries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/permissions_for_queries/ "Permissions for Queries")
- [Restrictions on Query Complexity](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/query_complexity/ "Restrictions on Query Complexity")
- [Settings](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/ "Settings")
- [Settings Profiles](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings_profiles/ "Settings Profiles")
- Utilities
Utilities
- [Overview](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/utils/ "Overview")
- [clickhouse-copier](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/utils/clickhouse-copier/ "clickhouse-copier")
- [clickhouse-local](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/utils/clickhouse-local/ "clickhouse-local")
- F.A.Q.
F.A.Q.
- [General Questions](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/faq/general/ "General Questions")
- Development
Development
- [Overview of ClickHouse Architecture](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/architecture/ "Overview of ClickHouse Architecture")
- [How to Build ClickHouse on Linux](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/build/ "How to Build ClickHouse on Linux")
- [How to Build ClickHouse on Mac OS X](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/build_osx/ "How to Build ClickHouse on Mac OS X")
- [How to Write C++ code](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/style/ "How to Write C++ code")
- [How to Run ClickHouse Tests](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/tests/ "How to Run ClickHouse Tests")
- [Third-Party Libraries Used](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/development/contrib/ "Third-Party Libraries Used")
- What's New
What's New
- [Roadmap](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/roadmap/ "Roadmap")
- [Changelog](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/changelog/ "Changelog")
- [Security Changelog](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/security_changelog/ "Security Changelog")
- [Single page version](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/single/)
- [PDF version](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/single/clickhouse_en.pdf)
- [ClickHouse source code](https://github.com/yandex/ClickHouse/)
Table of contents
- [FROM Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#from-clause "FROM Clause")
- [SAMPLE Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-clause "SAMPLE Clause")
- [SAMPLE k](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-k "SAMPLE k")
- [SAMPLE n](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-n "SAMPLE n")
- [SAMPLE k OFFSET m](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-offset "SAMPLE k OFFSET m")
- [ARRAY JOIN Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-array-join-clause "ARRAY JOIN Clause")
- [JOIN Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-join "JOIN Clause")
- [Processing of Empty or NULL Cells](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#processing-of-empty-or-null-cells "Processing of Empty or NULL Cells")
- [WHERE Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#where-clause "WHERE Clause")
- [PREWHERE Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#prewhere-clause "PREWHERE Clause")
- [GROUP BY Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-clause "GROUP BY Clause")
- [NULL processing](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing "NULL processing")
- [WITH TOTALS Modifier](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#with-totals-modifier "WITH TOTALS Modifier")
- [GROUP BY in External Memory](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-in-external-memory "GROUP BY in External Memory")
- [LIMIT N BY Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-n-by-clause "LIMIT N BY Clause")
- [HAVING Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#having-clause "HAVING Clause")
- [ORDER BY Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#order-by-clause "ORDER BY Clause")
- [SELECT Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-clause "SELECT Clause")
- [DISTINCT Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distinct "DISTINCT Clause")
- [LIMIT Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-clause "LIMIT Clause")
- [UNION ALL Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#union-all-clause "UNION ALL Clause")
- [INTO OUTFILE Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#into-outfile-clause "INTO OUTFILE Clause")
- [FORMAT Clause](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#format-clause "FORMAT Clause")
- [IN Operators](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-in-operators "IN Operators")
- [NULL processing](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing_1 "NULL processing")
- [Distributed Subqueries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distributed-subqueries "Distributed Subqueries")
- [Extreme Values](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#extreme-values "Extreme Values")
- [Notes](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#notes "Notes")
# SELECT Queries Syntax[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-queries-syntax "Permanent link")
`SELECT` performs data retrieval.
```
SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list]
[LIMIT [n, ]m]
[UNION ALL ...]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT n BY columns]
```
All the clauses are optional, except for the required list of expressions immediately after SELECT. The clauses below are described in almost the same order as in the query execution pipeline.
If the query omits the `DISTINCT`, `GROUP BY` and `ORDER BY` clauses and the `IN` and `JOIN` subqueries, the query is completely stream processed, using O(1) amount of RAM. Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified: `max_memory_usage`, `max_rows_to_group_by`, `max_rows_to_sort`, `max_rows_in_distinct`, `max_bytes_in_distinct`, `max_rows_in_set`, `max_bytes_in_set`, `max_rows_in_join`, `max_bytes_in_join`, `max_bytes_before_external_sort`, `max_bytes_before_external_group_by`. For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation. The system does not support merge join.
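These restrictions can be enabled per session. A minimal sketch with purely illustrative values (not recommendations; tune them to your hardware):

```
-- Illustrative, hypothetical limits: cap query memory and spill large operations to disk
SET max_memory_usage = 10000000000;                   -- fail queries needing more than ~10 GB
SET max_bytes_before_external_group_by = 5000000000;  -- spill GROUP BY state to disk past ~5 GB
SET max_bytes_before_external_sort = 5000000000;      -- likewise for ORDER BY
```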
### FROM Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#from-clause "Permanent link")
If the FROM clause is omitted, data is read from the `system.one` table. The `system.one` table contains exactly one row (it fulfills the same purpose as the DUAL table found in other DBMSs).
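As a quick illustration, a constant expression without FROM is implicitly evaluated against this one-row table:

```
SELECT 1 + 1
-- equivalent to:
SELECT 1 + 1 FROM system.one
```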
The FROM clause specifies the table to read data from, or a subquery, or a table function; ARRAY JOIN and the regular JOIN may also be included (see below).
Instead of a table, the SELECT subquery may be specified in brackets. In this case, the subquery processing pipeline will be built into the processing pipeline of an external query. In contrast to standard SQL, a synonym does not need to be specified after a subquery. For compatibility, it is possible to write 'AS name' after a subquery, but the specified name isn't used anywhere.
A table function may be specified instead of a table. For more information, see the section "Table functions".
To execute a query, all the columns listed in the query are extracted from the appropriate table. Any columns not needed for the external query are thrown out of the subqueries. If a query does not list any columns (for example, `SELECT count() FROM t`), some column is extracted from the table anyway (the smallest one is preferred), in order to calculate the number of rows.
The FINAL modifier can be used only for a SELECT from a CollapsingMergeTree table. When you specify FINAL, data is selected fully "collapsed". Keep in mind that using FINAL leads to a selection that includes columns related to the primary key, in addition to the columns specified in the SELECT. Additionally, the query will be executed in a single stream, and data will be merged during query execution. This means that when using FINAL, the query is processed more slowly. In most cases, you should avoid using FINAL. For more information, see the section "CollapsingMergeTree engine".
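A minimal sketch, assuming a hypothetical CollapsingMergeTree table `visits_collapsing`:

```
-- Hypothetical table; FINAL returns fully "collapsed" rows,
-- at the cost of single-stream execution and merging during the query
SELECT UserID, sum(Duration)
FROM visits_collapsing
FINAL
GROUP BY UserID
```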
### SAMPLE Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-clause "Permanent link")
The `SAMPLE` clause allows for approximated query processing. Approximated query processing is only supported by the tables in the `MergeTree` family, and only if the sampling expression was specified during table creation (see [MergeTree engine](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/mergetree/)).
The features of data sampling are listed below:
- Data sampling is a deterministic mechanism. The result of the same `SELECT .. SAMPLE` query is always the same.
- Sampling works consistently for different tables. For tables with a single sampling key, a sample with the same coefficient always selects the same subset of possible data. For example, a sample of user IDs takes rows with the same subset of all the possible user IDs from different tables. This allows using the sample in subqueries in the `IN` clause, as well as for manually correlating results of different queries with samples.
- Sampling allows reading less data from a disk. Note that for this you must specify the sampling key correctly. For more details see [Creating a MergeTree Table](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/mergetree/#table_engine-mergetree-creating-a-table).
The `SAMPLE` clause can be specified in several ways:
- `SAMPLE k`, where `k` is a number from 0 to 1. The query is executed on a `k` fraction of the data. For example, `SAMPLE 0.1` runs the query on 10% of the data. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-k)
- `SAMPLE n`, where `n` is a sufficiently large integer. The query is executed on a sample of at least `n` rows (but not significantly more than this). For example, `SAMPLE 10000000` runs the query on a minimum of 10,000,000 rows. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-n)
- `SAMPLE k OFFSET m`, where `k` and `m` are numbers from 0 to 1. The query is executed on a `k` fraction of the data, skipping the first `m` fraction. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-offset)
#### SAMPLE k[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-k "Permanent link")
In a `SAMPLE k` clause, `k` is the fraction (from 0 to 1) of the data that the sample is taken from. For example:
```
SELECT
Title,
count() * 10 AS PageViews
FROM hits_distributed
SAMPLE 0.1
WHERE
CounterID = 34
GROUP BY Title
ORDER BY PageViews DESC LIMIT 1000
```
In this example, the query is executed on a sample of 0.1 (10%) of the data. Values of aggregate functions are not corrected automatically, so to get an approximate result, the `count()` value is manually multiplied by 10.
#### SAMPLE n[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-n "Permanent link")
In this case, the query is executed on a sample of at least `n` rows, where `n` is a sufficiently large integer. For example, `SAMPLE 10000000`.
Since the minimum unit for data reading is one granule (its size is set by the `index_granularity` setting), it makes sense to set a sample that is much larger than the size of the granule.
When using the `SAMPLE n` clause, the relative coefficient is calculated dynamically. Since you do not know what fraction of the data was processed, you do not know the coefficient the aggregate functions should be multiplied by (for example, you do not know whether `SAMPLE 1000000` was taken from a set of 10,000,000 rows or from a set of 1,000,000,000 rows). In this case, use the `_sample_factor` column to get an approximate result.
`_sample_factor` is a virtual column in which ClickHouse stores the relative coefficient. It is created automatically when you create a table with a sampling key. A usage example:
```
SELECT sum(Duration * _sample_factor)
FROM visits
SAMPLE 10000000
```
If you need the approximate count of rows in a `SELECT .. SAMPLE n` query, take the `sum()` of the `_sample_factor` column rather than a `count()` multiplied by `_sample_factor`. For example:
```
SELECT sum(_sample_factor)
FROM visits
SAMPLE 10000000
```
Note that to calculate an average in a `SELECT .. SAMPLE n` query, you do not need the `_sample_factor` column, because the coefficient cancels out in the numerator and denominator:
```
SELECT avg(Duration)
FROM visits
SAMPLE 10000000
```
#### SAMPLE k OFFSET m[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-offset "Permanent link")
You can specify the `SAMPLE k OFFSET m` clause, where `k` and `m` are numbers from 0 to 1. Examples are shown below.
Example 1.
```
SAMPLE 1/10
```
In this example, the sample is 1/10th of all the data:
`[++------------------]`
Example 2.
```
SAMPLE 1/10 OFFSET 1/2
```
Here, a sample of 10% is taken from the second half of the data.
`[----------++--------]`
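Because the offsets are deterministic, this can be used to split work into disjoint slices. A sketch, assuming a table `visits` created with a sampling key; each of these queries processes a different tenth of the data:

```
-- Hypothetical: three workers each take a non-overlapping 10% slice
SELECT count() FROM visits SAMPLE 1/10 OFFSET 0/10
SELECT count() FROM visits SAMPLE 1/10 OFFSET 1/10
SELECT count() FROM visits SAMPLE 1/10 OFFSET 2/10
```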
### ARRAY JOIN Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-array-join-clause "Permanent link")
Allows executing JOIN with an array or nested data structure. The intent is similar to the `arrayJoin` function, but its functionality is broader.
`ARRAY JOIN` is essentially `INNER JOIN` with an array. Example:
```
:) CREATE TABLE arrays_test (s String, arr Array(UInt8)) ENGINE = Memory
CREATE TABLE arrays_test
(
s String,
arr Array(UInt8)
) ENGINE = Memory
Ok.
0 rows in set. Elapsed: 0.001 sec.
:) INSERT INTO arrays_test VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', [])
INSERT INTO arrays_test VALUES
Ok.
3 rows in set. Elapsed: 0.001 sec.
:) SELECT * FROM arrays_test
SELECT *
FROM arrays_test
ββsββββββββ¬βarrββββββ
β Hello β [1,2] β
β World β [3,4,5] β
β Goodbye β [] β
βββββββββββ΄ββββββββββ
3 rows in set. Elapsed: 0.001 sec.
:) SELECT s, arr FROM arrays_test ARRAY JOIN arr
SELECT s, arr
FROM arrays_test
ARRAY JOIN arr
ββsββββββ¬βarrββ
β Hello β 1 β
β Hello β 2 β
β World β 3 β
β World β 4 β
β World β 5 β
βββββββββ΄ββββββ
5 rows in set. Elapsed: 0.001 sec.
```
An alias can be specified for an array in the ARRAY JOIN clause. In this case, an array item can be accessed by this alias, while the array itself is accessed by the original name. Example:
```
:) SELECT s, arr, a FROM arrays_test ARRAY JOIN arr AS a
SELECT s, arr, a
FROM arrays_test
ARRAY JOIN arr AS a
ββsββββββ¬βarrββββββ¬βaββ
β Hello β [1,2] β 1 β
β Hello β [1,2] β 2 β
β World β [3,4,5] β 3 β
β World β [3,4,5] β 4 β
β World β [3,4,5] β 5 β
βββββββββ΄ββββββββββ΄ββββ
5 rows in set. Elapsed: 0.001 sec.
```
Multiple arrays of the same size can be comma-separated in the ARRAY JOIN clause. In this case, JOIN is performed with them simultaneously (the direct sum, not the direct product). Example:
```
:) SELECT s, arr, a, num, mapped FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped
SELECT s, arr, a, num, mapped
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(lambda(tuple(x), plus(x, 1)), arr) AS mapped
ββsββββββ¬βarrββββββ¬βaββ¬βnumββ¬βmappedββ
β Hello β [1,2] β 1 β 1 β 2 β
β Hello β [1,2] β 2 β 2 β 3 β
β World β [3,4,5] β 3 β 1 β 4 β
β World β [3,4,5] β 4 β 2 β 5 β
β World β [3,4,5] β 5 β 3 β 6 β
βββββββββ΄ββββββββββ΄ββββ΄ββββββ΄βββββββββ
5 rows in set. Elapsed: 0.002 sec.
:) SELECT s, arr, a, num, arrayEnumerate(arr) FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num
SELECT s, arr, a, num, arrayEnumerate(arr)
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num
ββsββββββ¬βarrββββββ¬βaββ¬βnumββ¬βarrayEnumerate(arr)ββ
β Hello β [1,2] β 1 β 1 β [1,2] β
β Hello β [1,2] β 2 β 2 β [1,2] β
β World β [3,4,5] β 3 β 1 β [1,2,3] β
β World β [3,4,5] β 4 β 2 β [1,2,3] β
β World β [3,4,5] β 5 β 3 β [1,2,3] β
βββββββββ΄ββββββββββ΄ββββ΄ββββββ΄ββββββββββββββββββββββ
5 rows in set. Elapsed: 0.002 sec.
```
ARRAY JOIN also works with nested data structures. Example:
```
:) CREATE TABLE nested_test (s String, nest Nested(x UInt8, y UInt32)) ENGINE = Memory
CREATE TABLE nested_test
(
s String,
nest Nested(
x UInt8,
y UInt32)
) ENGINE = Memory
Ok.
0 rows in set. Elapsed: 0.006 sec.
:) INSERT INTO nested_test VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], [])
INSERT INTO nested_test VALUES
Ok.
3 rows in set. Elapsed: 0.001 sec.
:) SELECT * FROM nested_test
SELECT *
FROM nested_test
ββsββββββββ¬βnest.xβββ¬βnest.yββββββ
β Hello β [1,2] β [10,20] β
β World β [3,4,5] β [30,40,50] β
β Goodbye β [] β [] β
βββββββββββ΄ββββββββββ΄βββββββββββββ
3 rows in set. Elapsed: 0.001 sec.
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest
ββsββββββ¬βnest.xββ¬βnest.yββ
β Hello β 1 β 10 β
β Hello β 2 β 20 β
β World β 3 β 30 β
β World β 4 β 40 β
β World β 5 β 50 β
βββββββββ΄βββββββββ΄βββββββββ
5 rows in set. Elapsed: 0.001 sec.
```
When specifying names of nested data structures in ARRAY JOIN, the meaning is the same as ARRAY JOIN with all the array elements that it consists of. Example:
```
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x, nest.y
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`, `nest.y`
ββsββββββ¬βnest.xββ¬βnest.yββ
β Hello β 1 β 10 β
β Hello β 2 β 20 β
β World β 3 β 30 β
β World β 4 β 40 β
β World β 5 β 50 β
βββββββββ΄βββββββββ΄βββββββββ
5 rows in set. Elapsed: 0.001 sec.
```
This variation also makes sense:
```
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`
ββsββββββ¬βnest.xββ¬βnest.yββββββ
β Hello β 1 β [10,20] β
β Hello β 2 β [10,20] β
β World β 3 β [30,40,50] β
β World β 4 β [30,40,50] β
β World β 5 β [30,40,50] β
βββββββββ΄βββββββββ΄βββββββββββββ
5 rows in set. Elapsed: 0.001 sec.
```
An alias may be used for a nested data structure, in order to select either the JOIN result or the source array. Example:
```
:) SELECT s, n.x, n.y, nest.x, nest.y FROM nested_test ARRAY JOIN nest AS n
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest AS n
ββsββββββ¬βn.xββ¬βn.yββ¬βnest.xβββ¬βnest.yββββββ
β Hello β 1 β 10 β [1,2] β [10,20] β
β Hello β 2 β 20 β [1,2] β [10,20] β
β World β 3 β 30 β [3,4,5] β [30,40,50] β
β World β 4 β 40 β [3,4,5] β [30,40,50] β
β World β 5 β 50 β [3,4,5] β [30,40,50] β
βββββββββ΄ββββββ΄ββββββ΄ββββββββββ΄βββββββββββββ
5 rows in set. Elapsed: 0.001 sec.
```
Example of using the arrayEnumerate function:
```
:) SELECT s, n.x, n.y, nest.x, nest.y, num FROM nested_test ARRAY JOIN nest AS n, arrayEnumerate(nest.x) AS num
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`, num
FROM nested_test
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num
ββsββββββ¬βn.xββ¬βn.yββ¬βnest.xβββ¬βnest.yββββββ¬βnumββ
β Hello β 1 β 10 β [1,2] β [10,20] β 1 β
β Hello β 2 β 20 β [1,2] β [10,20] β 2 β
β World β 3 β 30 β [3,4,5] β [30,40,50] β 1 β
β World β 4 β 40 β [3,4,5] β [30,40,50] β 2 β
β World β 5 β 50 β [3,4,5] β [30,40,50] β 3 β
βββββββββ΄ββββββ΄ββββββ΄ββββββββββ΄βββββββββββββ΄ββββββ
5 rows in set. Elapsed: 0.002 sec.
```
The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
### JOIN Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-join "Permanent link")
Joins the data in the normal [SQL JOIN](https://en.wikipedia.org/wiki/Join_\(SQL\)) sense.
Note
Not related to [ARRAY JOIN](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-array-join-clause).
```
SELECT <expr_list>
FROM <left_subquery>
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN <right_subquery>
(ON <expr_list>)|(USING <column_list>) ...
```
The table names can be specified instead of `<left_subquery>` and `<right_subquery>`. This is equivalent to the `SELECT * FROM table` subquery, except in a special case when the table has the [Join](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/join/) engine β an array prepared for joining.
**Supported types of `JOIN`**
- `INNER JOIN` (or `JOIN`)
- `LEFT JOIN` (or `LEFT OUTER JOIN`)
- `RIGHT JOIN` (or `RIGHT OUTER JOIN`)
- `FULL JOIN` (or `FULL OUTER JOIN`)
- `CROSS JOIN` (or `,` )
See standard [SQL JOIN](https://en.wikipedia.org/wiki/Join_\(SQL\)) description.
**ANY or ALL strictness**
If `ALL` is specified and the right table has several matching rows, the data will be multiplied by the number of these rows. This is the normal `JOIN` behavior for standard SQL. If `ANY` is specified and the right table has several matching rows, only the first one found is joined. If the right table has only one matching row, the results of `ANY` and `ALL` are the same.
To set the default strictness value, use the session configuration parameter [join\_default\_strictness](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/#settings-join_default_strictness).
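For example, to avoid writing the strictness keyword in every query, the default can be set per session (shown here with `ALL`; choose the value that matches your intent):

```
SET join_default_strictness = 'ALL'
```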
**GLOBAL JOIN**
When using a normal `JOIN`, the query is sent to remote servers. Subqueries are run on each of them in order to make the right table, and the join is performed with this table. In other words, the right table is formed on each server separately.
When using `GLOBAL ... JOIN`, first the requestor server runs a subquery to calculate the right table. This temporary table is passed to each remote server, and queries are run on them using the temporary data that was transmitted.
Be careful when using `GLOBAL`. For more information, see the section [Distributed subqueries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distributed-subqueries).
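A sketch, assuming a hypothetical Distributed table `hits_all` and a small right-side subquery; with `GLOBAL`, the right side is computed once on the requestor server and shipped to every shard instead of being re-run on each:

```
-- Hypothetical tables; the subquery runs once, and its result is sent to all shards
SELECT CounterID, count()
FROM hits_all
GLOBAL ANY INNER JOIN
(
    SELECT CounterID
    FROM counters_all
    WHERE IsActive = 1
) USING CounterID
GROUP BY CounterID
```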
**Usage Recommendations**
All columns that are not needed for the `JOIN` are deleted from the subquery.
When running a `JOIN`, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in `WHERE` and before aggregation. To set the processing order explicitly, we recommend wrapping both sides of the `JOIN` in subqueries, so that filtering and aggregation happen inside each side before the join.
Example:
```
SELECT
CounterID,
hits,
visits
FROM
(
SELECT
CounterID,
count() AS hits
FROM test.hits
GROUP BY CounterID
) ANY LEFT JOIN
(
SELECT
CounterID,
sum(Sign) AS visits
FROM test.visits
GROUP BY CounterID
) USING CounterID
ORDER BY hits DESC
LIMIT 10
```
```
ββCounterIDββ¬βββhitsββ¬βvisitsββ
β 1143050 β 523264 β 13665 β
β 731962 β 475698 β 102716 β
β 722545 β 337212 β 108187 β
β 722889 β 252197 β 10547 β
β 2237260 β 196036 β 9522 β
β 23057320 β 147211 β 7689 β
β 722818 β 90109 β 17847 β
β 48221 β 85379 β 4652 β
β 19762435 β 77807 β 7026 β
β 722884 β 77492 β 11056 β
βββββββββββββ΄βββββββββ΄βββββββββ
```
Subqueries cannot be given names, so you cannot reference a column from a specific subquery. The columns specified in `USING` must have the same names in both subqueries, and the other columns must be named differently. You can use aliases to change the names of columns in subqueries (the example uses the aliases 'hits' and 'visits').
The `USING` clause specifies one or more columns to join, which establishes the equality of these columns. The list of columns is set without brackets. More complex join conditions are not supported.
The right table (the subquery result) resides in RAM. If there isn't enough memory, you can't run a `JOIN`.
Each time a query is run with the same `JOIN`, the subquery is run again because the result is not cached. To avoid this, use the special [Join](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/join/) table engine, which is a prepared array for joining that is always in RAM.
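As a sketch (table and column names are hypothetical), a pre-built right side for repeated `ANY LEFT JOIN ... USING CounterID` queries could look like:

```
-- Hypothetical: the Join engine keeps the join-ready structure permanently in RAM
CREATE TABLE counter_visits
(
    CounterID UInt32,
    visits Int32
) ENGINE = Join(ANY, LEFT, CounterID)
```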
In some cases, it is more efficient to use `IN` instead of `JOIN`. Among the various types of `JOIN`, the most efficient is `ANY LEFT JOIN`, then `ANY INNER JOIN`. The least efficient are `ALL LEFT JOIN` and `ALL INNER JOIN`.
If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of `JOIN`. For more information, see the section [External dictionaries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts/).
#### Processing of Empty or NULL Cells[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#processing-of-empty-or-null-cells "Permanent link")
When joining tables, empty cells may appear. The [join\_use\_nulls](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/#settings-join_use_nulls) setting defines how ClickHouse fills these cells.
If the `JOIN` keys are [Nullable](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/data_types/nullable/) fields, the rows where at least one of the keys has the value [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/#null-literal) are not joined.
### WHERE Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#where-clause "Permanent link")
If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators. This expression will be used for filtering data before all other transformations.
If the table engine supports indexes, the expression is evaluated to determine whether the indexes can be used.
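A minimal illustration (reusing the `hits` table from the examples above); the filtering expression must produce a `UInt8` value:

```
SELECT count()
FROM hits
WHERE CounterID = 101500 AND EventDate >= toDate('2014-03-17')
```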
### PREWHERE Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#prewhere-clause "Permanent link")
This clause has the same meaning as the WHERE clause. The difference is in which data is read from the table. When using PREWHERE, first only the columns necessary for executing PREWHERE are read. Then the other columns are read that are needed for running the query, but only those blocks where the PREWHERE expression is true.
It makes sense to use PREWHERE if there are filtration conditions that are used by a minority of the columns in the query, but that provide strong data filtration. This reduces the volume of data to read.
For example, it is useful to write PREWHERE for queries that extract a large number of columns, but that only have filtration for a few columns.
PREWHERE is only supported by tables from the `*MergeTree` family.
A query may simultaneously specify PREWHERE and WHERE. In this case, PREWHERE precedes WHERE.
If the 'optimize\_move\_to\_prewhere' setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
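A sketch of the pattern described above (column names are illustrative): the cheap, selective condition goes in `PREWHERE`, so the wide columns are read only for blocks that pass it.

```
SELECT URL, Title, EventTime
FROM hits
PREWHERE CounterID = 101500
WHERE URL LIKE '%/example/%'
```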
### GROUP BY Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-clause "Permanent link")
This is one of the most important parts of a column-oriented DBMS.
If there is a GROUP BY clause, it must contain a list of expressions. Each expression will be referred to here as a "key". All the expressions in the SELECT, HAVING, and ORDER BY clauses must be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions.
If a query contains only table columns inside aggregate functions, the GROUP BY clause can be omitted, and aggregation by an empty set of keys is assumed.
Example:
```
SELECT
count(),
median(FetchTiming > 60 ? 60 : FetchTiming),
count() - sum(Refresh)
FROM hits
```
However, in contrast to standard SQL, if the table doesn't have any rows (either there aren't any at all, or there aren't any after using WHERE to filter), an empty result is returned, and not the result from one of the rows containing the initial values of aggregate functions.
As opposed to MySQL (and conforming to standard SQL), you can't get some value of some column that is not in a key or aggregate function (except constant expressions). To work around this, you can use the 'any' aggregate function (get the first encountered value) or 'min/max'.
Example:
```
SELECT
domainWithoutWWW(URL) AS domain,
count(),
any(Title) AS title -- getting the first occurred page header for each domain.
FROM hits
GROUP BY domain
```
For every different key value encountered, GROUP BY calculates a set of aggregate function values.
GROUP BY is not supported for array columns.
Constants can't be specified as arguments for aggregate functions (for example, `sum(1)`). Instead, get rid of the constant and use `count()`.
#### NULL processing[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing "Permanent link")
For grouping, ClickHouse interprets [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) as a value, and `NULL=NULL`.
Here's an example to show what this means.
Assume you have this table:
```
┌─x─┬────y─┐
│ 1 │    2 │
│ 2 │ ᴺᵁᴸᴸ │
│ 3 │    2 │
│ 3 │    3 │
│ 3 │ ᴺᵁᴸᴸ │
└───┴──────┘
```
The query `SELECT sum(x), y FROM t_null_big GROUP BY y` results in:
```
┌─sum(x)─┬────y─┐
│      4 │    2 │
│      3 │    3 │
│      5 │ ᴺᵁᴸᴸ │
└────────┴──────┘
```
You can see that `GROUP BY` for `y = NULL` summed up `x`, as if `NULL` were a specific value.
If you pass several keys to `GROUP BY`, the result will give you all the combinations of the selection, as if `NULL` were a specific value.
#### WITH TOTALS Modifier[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#with-totals-modifier "Permanent link")
If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty lines), and columns of aggregate functions with the values calculated across all the rows (the "total" values).
This extra row is output in JSON\*, TabSeparated\*, and Pretty\* formats, separately from the other rows. In the other formats, this row is not output.
In JSON\* formats, this row is output as a separate 'totals' field. In TabSeparated\* formats, the row comes after the main result, preceded by an empty row (after the other data). In Pretty\* formats, the row is output as a separate table after the main result.
`WITH TOTALS` can be run in different ways when HAVING is present. The behavior depends on the 'totals\_mode' setting. By default, `totals_mode = 'before_having'`. In this case, 'totals' is calculated across all rows, including the ones that don't pass through HAVING and 'max\_rows\_to\_group\_by'.
The other alternatives include only the rows that pass through HAVING in 'totals', and behave differently with the setting `max_rows_to_group_by` and `group_by_overflow_mode = 'any'`.
`after_having_exclusive` β Don't include rows that didn't pass through `max_rows_to_group_by`. In other words, 'totals' will have less than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
`after_having_inclusive` β Include all the rows that didn't pass through 'max\_rows\_to\_group\_by' in 'totals'. In other words, 'totals' will have more than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
`after_having_auto` β Count the number of rows that passed through HAVING. If it is more than a certain amount (by default, 50%), include all the rows that didn't pass through 'max\_rows\_to\_group\_by' in 'totals'. Otherwise, do not include them.
`totals_auto_threshold` β By default, 0.5. The coefficient for `after_having_auto`.
If `max_rows_to_group_by` and `group_by_overflow_mode = 'any'` are not used, all variations of `after_having` are the same, and you can use any of them (for example, `after_having_auto`).
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause (in this case, the respective total values are combined).
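A minimal sketch of the modifier (columns taken from the earlier examples); the extra totals row carries `count()` across all groups:

```
SELECT
    domainWithoutWWW(URL) AS domain,
    count() AS hits
FROM hits
GROUP BY domain
    WITH TOTALS
```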
#### GROUP BY in External Memory[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-in-external-memory "Permanent link")
You can enable dumping temporary data to the disk to restrict memory usage during GROUP BY. The `max_bytes_before_external_group_by` setting determines the threshold RAM consumption for dumping GROUP BY temporary data to the file system. If set to 0 (the default), it is disabled.
When using `max_bytes_before_external_group_by`, we recommend that you set `max_memory_usage` about twice as high. This is necessary because there are two stages to aggregation: reading the data and forming intermediate data (1), and merging the intermediate data (2). Dumping data to the file system can only occur during stage 1. If the temporary data wasn't dumped, then stage 2 might require up to the same amount of memory as in stage 1.
For example, if `max_memory_usage` was set to 10000000000 and you want to use external aggregation, it makes sense to set `max_bytes_before_external_group_by` to 10000000000, and max\_memory\_usage to 20000000000. When external aggregation is triggered (if there was at least one dump of temporary data), maximum consumption of RAM is only slightly more than `max_bytes_before_external_group_by`.
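The sizing advice above can be sketched as session settings (the byte values are the ones from the example in the text; the query is illustrative):

```
SET max_bytes_before_external_group_by = 10000000000;
SET max_memory_usage = 20000000000;

SELECT UserID, count()
FROM hits
GROUP BY UserID
```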
With distributed query processing, external aggregation is performed on remote servers. In order for the requestor server to use only a small amount of RAM, set `distributed_aggregation_memory_efficient` to 1.
Merging data flushed to the disk, as well as merging results from remote servers when the `distributed_aggregation_memory_efficient` setting is enabled, consumes up to `1/256 * <number of threads>` of the total amount of RAM.
When external aggregation is enabled, if there was less than `max_bytes_before_external_group_by` of data (i.e. data was not flushed), the query runs just as fast as without external aggregation. If any temporary data was flushed, the run time will be several times longer (approximately three times).
If you have an ORDER BY with a small LIMIT after GROUP BY, then the ORDER BY CLAUSE will not use significant amounts of RAM. But if the ORDER BY doesn't have LIMIT, don't forget to enable external sorting (`max_bytes_before_external_sort`).
### LIMIT N BY Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-n-by-clause "Permanent link")
LIMIT N BY COLUMNS selects the top N rows for each group of COLUMNS. LIMIT N BY is not related to LIMIT; they can both be used in the same query. The key for LIMIT N BY can contain any number of columns or expressions.
Example:
```
SELECT
domainWithoutWWW(URL) AS domain,
domainWithoutWWW(REFERRER_URL) AS referrer,
device_type,
count() cnt
FROM hits
GROUP BY domain, referrer, device_type
ORDER BY cnt DESC
LIMIT 5 BY domain, device_type
LIMIT 100
```
The query will select the top 5 referrers for each `domain, device_type` pair, but not more than 100 rows (`LIMIT n BY + LIMIT`).
### HAVING Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#having-clause "Permanent link")
Allows filtering the result received after GROUP BY, similar to the WHERE clause. WHERE and HAVING differ in that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it. If aggregation is not performed, HAVING can't be used.
### ORDER BY Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#order-by-clause "Permanent link")
The ORDER BY clause contains a list of expressions, which can each be assigned DESC or ASC (the sorting direction). If the direction is not specified, ASC is assumed. ASC is sorted in ascending order, and DESC in descending order. The sorting direction applies to a single expression, not to the entire list. Example: `ORDER BY Visits DESC, SearchPhrase`
For sorting by String values, you can specify collation (comparison). Example: `ORDER BY SearchPhrase COLLATE 'tr'` - for sorting by keyword in ascending order, using the Turkish alphabet, case insensitive, assuming that strings are UTF-8 encoded. COLLATE can be specified or not for each expression in ORDER BY independently. If ASC or DESC is specified, COLLATE is specified after it. When using COLLATE, sorting is always case-insensitive.
We only recommend using COLLATE for final sorting of a small number of rows, since sorting with COLLATE is less efficient than normal sorting by bytes.
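A sketch of the recommended usage: apply `COLLATE` only to a small, already-limited final result (assuming `SearchPhrase` is a UTF-8 `String` column, as in the other examples):

```
SELECT SearchPhrase
FROM hits
ORDER BY SearchPhrase ASC COLLATE 'tr'
LIMIT 10
```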
Rows that have identical values for the list of sorting expressions are output in an arbitrary order, which can also be nondeterministic (different each time). If the ORDER BY clause is omitted, the order of the rows is also undefined, and may be nondeterministic as well.
`NaN` and `NULL` sorting order:
- With the modifier `NULLS FIRST` β First `NULL`, then `NaN`, then other values.
- With the modifier `NULLS LAST` β First the values, then `NaN`, then `NULL`.
- Default β The same as with the `NULLS LAST` modifier.
Example:
For the table
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    2 │
│ 1 │  nan │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │  nan │
│ 7 │ ᴺᵁᴸᴸ │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```
Run the query `SELECT * FROM t_null_nan ORDER BY y NULLS FIRST` to get:
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 7 │ ᴺᵁᴸᴸ │
│ 1 │  nan │
│ 6 │  nan │
│ 2 │    2 │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```
When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. In other words, for ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than the rest.
Less RAM is used if a small enough LIMIT is specified in addition to ORDER BY. Otherwise, the amount of memory spent is proportional to the volume of data for sorting. For distributed query processing, if GROUP BY is omitted, sorting is partially done on remote servers, and the results are merged on the requestor server. This means that for distributed sorting, the volume of data to sort can be greater than the amount of memory on a single server.
If there is not enough RAM, it is possible to perform sorting in external memory (creating temporary files on a disk). Use the setting `max_bytes_before_external_sort` for this purpose. If it is set to 0 (the default), external sorting is disabled. If it is enabled, when the volume of data to sort reaches the specified number of bytes, the collected data is sorted and dumped into a temporary file. After all data is read, all the sorted files are merged and the results are output. By default, files are written to the `/var/lib/clickhouse/tmp/` directory; you can use the `tmp_path` server configuration parameter to change this.
Running a query may use more memory than 'max\_bytes\_before\_external\_sort'. For this reason, this setting must have a value significantly smaller than 'max\_memory\_usage'. As an example, if your server has 128 GB of RAM and you need to run a single query, set 'max\_memory\_usage' to 100 GB, and 'max\_bytes\_before\_external\_sort' to 80 GB.
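Following the sizing example above, the settings might be applied like this (the byte values are the ones from the text; the query is illustrative):

```
SET max_memory_usage = 100000000000;
SET max_bytes_before_external_sort = 80000000000;

SELECT *
FROM hits
ORDER BY EventTime
```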
External sorting works much less effectively than sorting in RAM.
### SELECT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-clause "Permanent link")
The expressions specified in the SELECT clause are analyzed after the calculations for all the clauses listed above are completed. More specifically, expressions are analyzed that are above the aggregate functions, if there are any aggregate functions. The aggregate functions and everything below them are calculated during aggregation (GROUP BY). These expressions work as if they are applied to separate rows in the result.
### DISTINCT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distinct "Permanent link")
If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result. The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:
- DISTINCT can be applied together with GROUP BY.
- When ORDER BY is omitted and LIMIT is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
DISTINCT is not supported if SELECT has at least one array column.
`DISTINCT` works with [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
### LIMIT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-clause "Permanent link")
`LIMIT m` allows you to select the first `m` rows from the result. `LIMIT n, m` allows you to select the first `m` rows from the result after skipping the first `n` rows.
`n` and `m` must be non-negative integers.
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
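For example (a hypothetical query), `LIMIT 10, 5` skips the first 10 rows of the sorted result and returns rows 11 through 15:

```
SELECT CounterID, count() AS c
FROM hits
GROUP BY CounterID
ORDER BY c DESC
LIMIT 10, 5
```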
### UNION ALL Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#union-all-clause "Permanent link")
You can use UNION ALL to combine any number of queries. Example:
```
SELECT CounterID, 1 AS table, toInt64(count()) AS c
FROM test.hits
GROUP BY CounterID
UNION ALL
SELECT CounterID, 2 AS table, sum(Sign) AS c
FROM test.visits
GROUP BY CounterID
HAVING c > 0
```
Only UNION ALL is supported. The regular UNION (UNION DISTINCT) is not supported. If you need UNION DISTINCT, you can write SELECT DISTINCT from a subquery containing UNION ALL.
Queries that are parts of UNION ALL can be run simultaneously, and their results can be mixed together.
The structure of results (the number and type of columns) must match for the queries. But the column names can differ. In this case, the column names for the final result will be taken from the first query. Type casting is performed for unions. For example, if two queries being combined have the same field with non-`Nullable` and `Nullable` types from a compatible type, the resulting `UNION ALL` has a `Nullable` type field.
Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
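A sketch of the workaround described above: to sort and limit the combined result, wrap the `UNION ALL` in a subquery (reusing the earlier example):

```
SELECT CounterID, table, c
FROM
(
    SELECT CounterID, 1 AS table, toInt64(count()) AS c
    FROM test.hits
    GROUP BY CounterID
    UNION ALL
    SELECT CounterID, 2 AS table, sum(Sign) AS c
    FROM test.visits
    GROUP BY CounterID
)
ORDER BY c DESC
LIMIT 10
```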
### INTO OUTFILE Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#into-outfile-clause "Permanent link")
Add the `INTO OUTFILE filename` clause (where filename is a string literal) to redirect query output to the specified file. In contrast to MySQL, the file is created on the client side. The query will fail if a file with the same filename already exists. This functionality is available in the command-line client and clickhouse-local (a query sent via HTTP interface will fail).
The default output format is TabSeparated (the same as in the command-line client batch mode).
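A minimal sketch (the filename is illustrative); run it in the command-line client, which creates the file on the client side:

```
SELECT CounterID, count() AS c
FROM test.hits
GROUP BY CounterID
INTO OUTFILE 'counters.tsv'
FORMAT TabSeparated
```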
### FORMAT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#format-clause "Permanent link")
Specify 'FORMAT format' to get data in any specified format. You can use this for convenience, or for creating dumps. For more information, see the section "Formats". If the FORMAT clause is omitted, the default format is used, which depends on both the settings and the interface used for accessing the DB. For the HTTP interface and the command-line client in batch mode, the default format is TabSeparated. For the command-line client in interactive mode, the default format is PrettyCompact (it has attractive and compact tables).
When using the command-line client, data is passed to the client in an internal efficient format. The client independently interprets the FORMAT clause of the query and formats the data itself (thus relieving the network and the server from the load).
### IN Operators[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-in-operators "Permanent link")
The `IN`, `NOT IN`, `GLOBAL IN`, and `GLOBAL NOT IN` operators are covered separately, since their functionality is quite rich.
The left side of the operator is either a single column or a tuple.
Examples:
```
SELECT UserID IN (123, 456) FROM ...
SELECT (CounterID, UserID) IN ((34, 123), (101500, 456)) FROM ...
```
If the left side is a single column that is in the index, and the right side is a set of constants, the system uses the index for processing the query.
Don't list too many values explicitly (i.e. millions). If a data set is large, put it in a temporary table (for example, see the section "External data for query processing"), then use a subquery.
The right side of the operator can be a set of constant expressions, a set of tuples with constant expressions (shown in the examples above), or the name of a database table or SELECT subquery in brackets.
If the right side of the operator is the name of a table (for example, `UserID IN users`), this is equivalent to the subquery `UserID IN (SELECT * FROM users)`. Use this when working with external data that is sent along with the query. For example, the query can be sent together with a set of user IDs loaded to the 'users' temporary table, which should be filtered.
If the right side of the operator is a table name that has the Set engine (a prepared data set that is always in RAM), the data set will not be created over again for each query.
The subquery may specify more than one column for filtering tuples. Example:
```
SELECT (CounterID, UserID) IN (SELECT CounterID, UserID FROM ...) FROM ...
```
The columns to the left and right of the IN operator should have the same type.
The IN operator and subquery may occur in any part of the query, including in aggregate functions and lambda functions. Example:
```
SELECT
EventDate,
avg(UserID IN
(
SELECT UserID
FROM test.hits
WHERE EventDate = toDate('2014-03-17')
)) AS ratio
FROM test.hits
GROUP BY EventDate
ORDER BY EventDate ASC
```
```
┌──EventDate─┬────ratio─┐
│ 2014-03-17 │        1 │
│ 2014-03-18 │ 0.807696 │
│ 2014-03-19 │ 0.755406 │
│ 2014-03-20 │ 0.723218 │
│ 2014-03-21 │ 0.697021 │
│ 2014-03-22 │ 0.647851 │
│ 2014-03-23 │ 0.648416 │
└────────────┴──────────┘
```
For each day after March 17th, count the percentage of pageviews made by users who visited the site on March 17th. A subquery in the IN clause is always run just one time on a single server. There are no dependent subqueries.
#### NULL processing[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing_1 "Permanent link")
During request processing, the IN operator assumes that the result of an operation with [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) is always equal to `0`, regardless of whether `NULL` is on the right or left side of the operator. `NULL` values are not included in any dataset, do not correspond to each other and cannot be compared.
Here is an example with the `t_null` table:
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘
```
Running the query `SELECT x FROM t_null WHERE y IN (NULL,3)` gives you the following result:
```
┌─x─┐
│ 2 │
└───┘
```
You can see that the row in which `y = NULL` is thrown out of the query results. This is because ClickHouse can't decide whether `NULL` is included in the `(NULL,3)` set, returns `0` as the result of the operation, and `SELECT` excludes this row from the final output.
```
SELECT y IN (NULL, 3)
FROM t_null

┌─in(y, tuple(NULL, 3))─┐
│                     0 │
│                     1 │
└───────────────────────┘
```
#### Distributed Subqueries[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distributed-subqueries "Permanent link")
There are two options for IN-s with subqueries (similar to JOINs): normal `IN` / `JOIN` and `GLOBAL IN` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
Attention
Remember that the algorithms described below may work differently depending on the [distributed\_product\_mode](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/) setting.
When using the regular IN, the query is sent to remote servers, and each of them runs the subqueries in the `IN` or `JOIN` clause.
When using `GLOBAL IN` / `GLOBAL JOIN`, first all the subqueries for `GLOBAL IN` / `GLOBAL JOIN` are run, and the results are collected in temporary tables. Then the temporary tables are sent to each remote server, where the queries are run using this temporary data.
For a non-distributed query, use the regular `IN` / `JOIN`.
Be careful when using subqueries in the `IN` / `JOIN` clauses for distributed query processing.
Let's look at some examples. Assume that each server in the cluster has a normal **local\_table**. Each server also has a **distributed\_table** table with the **Distributed** type, which looks at all the servers in the cluster.
For a query to the **distributed\_table**, the query will be sent to all the remote servers and run on them using the **local\_table**.
For example, the query
```
SELECT uniq(UserID) FROM distributed_table
```
will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table
```
and run on each of them in parallel, until it reaches the stage where intermediate results can be combined. Then the intermediate results will be returned to the requestor server and merged on it, and the final result will be sent to the client.
Now let's examine a query with IN:
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM local_table WHERE CounterID = 34)
```
This query calculates the intersection of the audiences of two sites.
This query will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM local_table WHERE CounterID = 34)
```
In other words, the data set in the IN clause will be collected on each server independently, only across the data that is stored locally on each of the servers.
This will work correctly and optimally if you are prepared for this case and have spread data across the cluster servers such that the data for a single UserID resides entirely on a single server. In this case, all the necessary data will be available locally on each server. Otherwise, the result will be inaccurate. We refer to this variation of the query as "local IN".
To correct how the query works when data is spread randomly across the cluster servers, you could specify **distributed\_table** inside a subquery. The query would look like this:
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
This query will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
The subquery will begin running on each remote server. Since the subquery uses a distributed table, the subquery that is on each remote server will be resent to every remote server as
```
SELECT UserID FROM local_table WHERE CounterID = 34
```
For example, if you have a cluster of 100 servers, executing the entire query will require 10,000 elementary requests, which is generally considered unacceptable.
In such cases, you should always use GLOBAL IN instead of IN. Let's look at how it works for the query
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID GLOBAL IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
The requestor server will run the subquery
```
SELECT UserID FROM distributed_table WHERE CounterID = 34
```
and the result will be put in a temporary table in RAM. Then the request will be sent to each remote server as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID GLOBAL IN _data1
```
and the temporary table `_data1` will be sent to every remote server with the query (the name of the temporary table is implementation-defined).
This is more optimal than using the normal IN. However, keep the following points in mind:
1. When creating a temporary table, data is not made unique. To reduce the volume of data transmitted over the network, specify DISTINCT in the subquery. (You don't need to do this for a normal IN.)
2. The temporary table will be sent to all the remote servers. Transmission does not account for network topology. For example, if 10 remote servers reside in a datacenter that is very remote in relation to the requestor server, the data will be sent 10 times over the channel to the remote datacenter. Try to avoid large data sets when using GLOBAL IN.
3. When transmitting data to remote servers, restrictions on network bandwidth are not configurable. You might overload the network.
4. Try to distribute data across servers so that you don't need to use GLOBAL IN on a regular basis.
5. If you need to use GLOBAL IN often, plan the location of the ClickHouse cluster so that a single group of replicas resides in no more than one data center with a fast network between them, so that a query can be processed entirely within a single data center.
It also makes sense to specify a local table in the `GLOBAL IN` clause, in case this local table is only available on the requestor server and you want to use data from it on remote servers.
### Extreme Values[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#extreme-values "Permanent link")
In addition to results, you can also get minimum and maximum values for the results columns. To do this, set the **extremes** setting to 1. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
An extra two rows are calculated β the minimums and maximums, respectively. These extra two rows are output in JSON\*, TabSeparated\*, and Pretty\* formats, separate from the other rows. They are not output for other formats.
In JSON\* formats, the extreme values are output in a separate 'extremes' field. In TabSeparated\* formats, the row comes after the main result, and after 'totals' if present. It is preceded by an empty row (after the other data). In Pretty\* formats, the row is output as a separate table after the main result, and after 'totals' if present.
Extreme values are calculated for rows that have passed through LIMIT. However, when using 'LIMIT offset, size', the rows before 'offset' are included in 'extremes'. In stream requests, the result may also include a small number of rows that passed through LIMIT.
### Notes[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#notes "Permanent link")
The `GROUP BY` and `ORDER BY` clauses do not support positional arguments. This contradicts MySQL, but conforms to standard SQL. For example, `GROUP BY 1, 2` will be interpreted as grouping by constants (i.e. aggregation of all rows into one).
You can use synonyms (`AS` aliases) in any part of a query.
You can put an asterisk in any part of a query instead of an expression. When the query is analyzed, the asterisk is expanded to a list of all table columns (excluding the `MATERIALIZED` and `ALIAS` columns). There are only a few cases when using an asterisk is justified:
- When creating a table dump.
- For tables containing just a few columns, such as system tables.
- For getting information about what columns are in a table. In this case, set `LIMIT 1`. But it is better to use the `DESC TABLE` query.
- When there is strong filtration on a small number of columns using `PREWHERE`.
- In subqueries (since columns that aren't needed for the external query are excluded from subqueries).
In all other cases, we don't recommend using the asterisk, since it only gives you the drawbacks of a columnar DBMS instead of the advantages.
[Original article](https://clickhouse.yandex/docs/en/query_language/select/)
©2016–2019 Yandex LLC
## SELECT Queries Syntax[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-queries-syntax "Permanent link")
`SELECT` performs data retrieval.
```
SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list]
[LIMIT [n, ]m]
[UNION ALL ...]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT n BY columns]
```
All the clauses are optional, except for the required list of expressions immediately after SELECT. The clauses below are described in almost the same order as in the query execution conveyor.
If the query omits the `DISTINCT`, `GROUP BY` and `ORDER BY` clauses and the `IN` and `JOIN` subqueries, the query will be completely stream processed, using O(1) amount of RAM. Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified: `max_memory_usage`, `max_rows_to_group_by`, `max_rows_to_sort`, `max_rows_in_distinct`, `max_bytes_in_distinct`, `max_rows_in_set`, `max_bytes_in_set`, `max_rows_in_join`, `max_bytes_in_join`, `max_bytes_before_external_sort`, `max_bytes_before_external_group_by`. For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation. The system does not have "merge join".
### FROM Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#from-clause "Permanent link")
If the FROM clause is omitted, data will be read from the `system.one` table. The `system.one` table contains exactly one row (it fulfills the same purpose as the DUAL table found in other DBMSs).
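A minimal illustration: a `SELECT` without a FROM clause reads from `system.one`, so the following two queries return the same result:

```sql
SELECT 1;                 -- FROM clause omitted: reads from system.one
SELECT 1 FROM system.one; -- explicit form of the same query
```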
The FROM clause specifies the table to read data from, or a subquery, or a table function; ARRAY JOIN and the regular JOIN may also be included (see below).
Instead of a table, the SELECT subquery may be specified in brackets. In this case, the subquery processing pipeline will be built into the processing pipeline of an external query. In contrast to standard SQL, a synonym does not need to be specified after a subquery. For compatibility, it is possible to write 'AS name' after a subquery, but the specified name isn't used anywhere.
A table function may be specified instead of a table. For more information, see the section "Table functions".
To execute a query, all the columns listed in the query are extracted from the appropriate table. Any columns not needed for the external query are thrown out of the subqueries. If a query does not list any columns (for example, SELECT count() FROM t), some column is extracted from the table anyway (the smallest one is preferred), in order to calculate the number of rows.
The FINAL modifier can be used only for a SELECT from a CollapsingMergeTree table. When you specify FINAL, data is selected fully "collapsed". Keep in mind that using FINAL leads to a selection that includes columns related to the primary key, in addition to the columns specified in the SELECT. Additionally, the query will be executed in a single stream, and data will be merged during query execution. This means that when using FINAL, the query is processed more slowly. In most cases, you should avoid using FINAL. For more information, see the section "CollapsingMergeTree engine".
### SAMPLE Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-clause "Permanent link")
The `SAMPLE` clause allows for approximated query processing. Approximated query processing is only supported by the tables in the `MergeTree` family, and only if the sampling expression was specified during table creation (see [MergeTree engine](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/mergetree/)).
The features of data sampling are listed below:
- Data sampling is a deterministic mechanism. The result of the same `SELECT .. SAMPLE` query is always the same.
- Sampling works consistently for different tables. For tables with a single sampling key, a sample with the same coefficient always selects the same subset of possible data. For example, a sample of user IDs takes rows with the same subset of all the possible user IDs from different tables. This allows using the sample in subqueries in the `IN` clause, as well as for manually correlating results of different queries with samples.
- Sampling allows reading less data from disk. Note that you must specify the sampling key correctly for this. For more details see [Creating a MergeTree Table](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/mergetree/#table_engine-mergetree-creating-a-table).
The `SAMPLE` clause can be specified in several ways:
- `SAMPLE k`, where `k` is a number from 0 to 1. The query is executed on the fraction `k` of the data. For example, `SAMPLE 0.1` runs the query on 10% of the data. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-k)
- `SAMPLE n`, where `n` is a sufficiently large integer. The query is executed on a sample of at least `n` rows (but not significantly more than this). For example, `SAMPLE 10000000` runs the query on a minimum of 10,000,000 rows. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-n)
- `SAMPLE k OFFSET m`, where `k` and `m` are numbers from 0 to 1. The query is executed on a sample of the fraction `k` of the data, offset by the fraction `m`. [Read more](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-offset)
#### SAMPLE k[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-k "Permanent link")
In a `SAMPLE k` clause, `k` is the fraction of the data that the sample is taken from. An example is shown below:
```
SELECT
Title,
count() * 10 AS PageViews
FROM hits_distributed
SAMPLE 0.1
WHERE
CounterID = 34
GROUP BY Title
ORDER BY PageViews DESC LIMIT 1000
```
In this example, the query is executed on a sample from 0.1 (10%) of data. Values of aggregate functions are not corrected automatically, so to get an approximate result, the value 'count()' is manually multiplied by 10.
#### SAMPLE n[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-n "Permanent link")
In this case, the query is executed on a sample of at least `n` rows, where `n` is a sufficiently large integer. For example, `SAMPLE 10000000`.
Since the minimum unit for data reading is one granule (its size is set by the `index_granularity` setting), it makes sense to set a sample that is much larger than the size of the granule.
When using the `SAMPLE n` clause, the relative coefficient is calculated dynamically. Since you do not know which relative percent of data was processed, you do not know the coefficient the aggregate functions should be multiplied by (for example, you do not know if the `SAMPLE 1000000` was taken from a set of 10,000,000 rows or from a set of 1,000,000,000 rows). In this case, use the `_sample_factor` column to get the approximate result.
The `_sample_factor` is a virtual column that ClickHouse stores relative coefficients in. This column is created automatically when you create a table with the specified sampling key. The usage example is shown below:
```
SELECT sum(Duration * _sample_factor)
FROM visits
SAMPLE 10000000
```
If you need to get the approximate count of rows in a `SELECT .. SAMPLE n` query, take the `sum()` of the `_sample_factor` column instead of counting the `count(column * _sample_factor)` value. For example:
```
SELECT sum(_sample_factor)
FROM visits
SAMPLE 10000000
```
Note that to calculate the average in a `SELECT .. SAMPLE n` query, you do not need to use `_sample_factor` column:
```
SELECT avg(Duration)
FROM visits
SAMPLE 10000000
```
#### SAMPLE k OFFSET m[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-sample-offset "Permanent link")
You can specify the `SAMPLE k OFFSET m` clause, where `k` and `m` are numbers from 0 to 1. Examples are shown below.
Example 1.
```
SAMPLE 1/10
```
In this example, the sample is the 1/10th of all data:
`[++------------------]`
Example 2.
```
SAMPLE 1/10 OFFSET 1/2
```
Here, the sample of 10% is taken from the second half of data.
`[----------++--------]`
### ARRAY JOIN Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-array-join-clause "Permanent link")
Allows executing JOIN with an array or nested data structure. The intent is similar to the 'arrayJoin' function, but its functionality is broader.
`ARRAY JOIN` is essentially `INNER JOIN` with an array. Example:
```
:) CREATE TABLE arrays_test (s String, arr Array(UInt8)) ENGINE = Memory
CREATE TABLE arrays_test
(
s String,
arr Array(UInt8)
) ENGINE = Memory
Ok.
0 rows in set. Elapsed: 0.001 sec.
:) INSERT INTO arrays_test VALUES ('Hello', [1,2]), ('World', [3,4,5]), ('Goodbye', [])
INSERT INTO arrays_test VALUES
Ok.
3 rows in set. Elapsed: 0.001 sec.
:) SELECT * FROM arrays_test
SELECT *
FROM arrays_test
┌─s───────┬─arr─────┐
│ Hello   │ [1,2]   │
│ World   │ [3,4,5] │
│ Goodbye │ []      │
└─────────┴─────────┘
3 rows in set. Elapsed: 0.001 sec.
:) SELECT s, arr FROM arrays_test ARRAY JOIN arr
SELECT s, arr
FROM arrays_test
ARRAY JOIN arr
┌─s─────┬─arr─┐
│ Hello │   1 │
│ Hello │   2 │
│ World │   3 │
│ World │   4 │
│ World │   5 │
└───────┴─────┘
5 rows in set. Elapsed: 0.001 sec.
```
An alias can be specified for an array in the ARRAY JOIN clause. In this case, an array item can be accessed by this alias, but the array itself by the original name. Example:
```
:) SELECT s, arr, a FROM arrays_test ARRAY JOIN arr AS a
SELECT s, arr, a
FROM arrays_test
ARRAY JOIN arr AS a
┌─s─────┬─arr─────┬─a─┐
│ Hello │ [1,2]   │ 1 │
│ Hello │ [1,2]   │ 2 │
│ World │ [3,4,5] │ 3 │
│ World │ [3,4,5] │ 4 │
│ World │ [3,4,5] │ 5 │
└───────┴─────────┴───┘
5 rows in set. Elapsed: 0.001 sec.
```
Multiple arrays of the same size can be comma-separated in the ARRAY JOIN clause. In this case, JOIN is performed with them simultaneously (the direct sum, not the direct product). Example:
```
:) SELECT s, arr, a, num, mapped FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped
SELECT s, arr, a, num, mapped
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(lambda(tuple(x), plus(x, 1)), arr) AS mapped
┌─s─────┬─arr─────┬─a─┬─num─┬─mapped─┐
│ Hello │ [1,2]   │ 1 │   1 │      2 │
│ Hello │ [1,2]   │ 2 │   2 │      3 │
│ World │ [3,4,5] │ 3 │   1 │      4 │
│ World │ [3,4,5] │ 4 │   2 │      5 │
│ World │ [3,4,5] │ 5 │   3 │      6 │
└───────┴─────────┴───┴─────┴────────┘
5 rows in set. Elapsed: 0.002 sec.
:) SELECT s, arr, a, num, arrayEnumerate(arr) FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num
SELECT s, arr, a, num, arrayEnumerate(arr)
FROM arrays_test
ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num
┌─s─────┬─arr─────┬─a─┬─num─┬─arrayEnumerate(arr)─┐
│ Hello │ [1,2]   │ 1 │   1 │ [1,2]               │
│ Hello │ [1,2]   │ 2 │   2 │ [1,2]               │
│ World │ [3,4,5] │ 3 │   1 │ [1,2,3]             │
│ World │ [3,4,5] │ 4 │   2 │ [1,2,3]             │
│ World │ [3,4,5] │ 5 │   3 │ [1,2,3]             │
└───────┴─────────┴───┴─────┴─────────────────────┘
5 rows in set. Elapsed: 0.002 sec.
```
ARRAY JOIN also works with nested data structures. Example:
```
:) CREATE TABLE nested_test (s String, nest Nested(x UInt8, y UInt32)) ENGINE = Memory
CREATE TABLE nested_test
(
s String,
nest Nested(
x UInt8,
y UInt32)
) ENGINE = Memory
Ok.
0 rows in set. Elapsed: 0.006 sec.
:) INSERT INTO nested_test VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], [])
INSERT INTO nested_test VALUES
Ok.
3 rows in set. Elapsed: 0.001 sec.
:) SELECT * FROM nested_test
SELECT *
FROM nested_test
┌─s───────┬─nest.x──┬─nest.y─────┐
│ Hello   │ [1,2]   │ [10,20]    │
│ World   │ [3,4,5] │ [30,40,50] │
│ Goodbye │ []      │ []         │
└─────────┴─────────┴────────────┘
3 rows in set. Elapsed: 0.001 sec.
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest
┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │      1 │     10 │
│ Hello │      2 │     20 │
│ World │      3 │     30 │
│ World │      4 │     40 │
│ World │      5 │     50 │
└───────┴────────┴────────┘
5 rows in set. Elapsed: 0.001 sec.
```
When specifying names of nested data structures in ARRAY JOIN, the meaning is the same as ARRAY JOIN with all the array elements that it consists of. Example:
```
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x, nest.y
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`, `nest.y`
┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │      1 │     10 │
│ Hello │      2 │     20 │
│ World │      3 │     30 │
│ World │      4 │     40 │
│ World │      5 │     50 │
└───────┴────────┴────────┘
5 rows in set. Elapsed: 0.001 sec.
```
This variation also makes sense:
```
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN `nest.x`
┌─s─────┬─nest.x─┬─nest.y─────┐
│ Hello │      1 │ [10,20]    │
│ Hello │      2 │ [10,20]    │
│ World │      3 │ [30,40,50] │
│ World │      4 │ [30,40,50] │
│ World │      5 │ [30,40,50] │
└───────┴────────┴────────────┘
5 rows in set. Elapsed: 0.001 sec.
```
An alias may be used for a nested data structure, in order to select either the JOIN result or the source array. Example:
```
:) SELECT s, n.x, n.y, nest.x, nest.y FROM nested_test ARRAY JOIN nest AS n
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest AS n
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┐
│ Hello │   1 │  10 │ [1,2]   │ [10,20]    │
│ Hello │   2 │  20 │ [1,2]   │ [10,20]    │
│ World │   3 │  30 │ [3,4,5] │ [30,40,50] │
│ World │   4 │  40 │ [3,4,5] │ [30,40,50] │
│ World │   5 │  50 │ [3,4,5] │ [30,40,50] │
└───────┴─────┴─────┴─────────┴────────────┘
5 rows in set. Elapsed: 0.001 sec.
```
Example of using the arrayEnumerate function:
```
:) SELECT s, n.x, n.y, nest.x, nest.y, num FROM nested_test ARRAY JOIN nest AS n, arrayEnumerate(nest.x) AS num
SELECT s, `n.x`, `n.y`, `nest.x`, `nest.y`, num
FROM nested_test
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┬─num─┐
│ Hello │   1 │  10 │ [1,2]   │ [10,20]    │   1 │
│ Hello │   2 │  20 │ [1,2]   │ [10,20]    │   2 │
│ World │   3 │  30 │ [3,4,5] │ [30,40,50] │   1 │
│ World │   4 │  40 │ [3,4,5] │ [30,40,50] │   2 │
│ World │   5 │  50 │ [3,4,5] │ [30,40,50] │   3 │
└───────┴─────┴─────┴─────────┴────────────┴─────┘
5 rows in set. Elapsed: 0.002 sec.
```
The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
### JOIN Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-join "Permanent link")
Joins the data in the normal [SQL JOIN](https://en.wikipedia.org/wiki/Join_\(SQL\)) sense.
Note: not related to [ARRAY JOIN](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-array-join-clause).
```
SELECT <expr_list>
FROM <left_subquery>
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN <right_subquery>
(ON <expr_list>)|(USING <column_list>) ...
```
The table names can be specified instead of `<left_subquery>` and `<right_subquery>`. This is equivalent to the `SELECT * FROM table` subquery, except in a special case when the table has the [Join](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/join/) engine, which is an array prepared for joining.
**Supported types of `JOIN`**
- `INNER JOIN` (or `JOIN`)
- `LEFT JOIN` (or `LEFT OUTER JOIN`)
- `RIGHT JOIN` (or `RIGHT OUTER JOIN`)
- `FULL JOIN` (or `FULL OUTER JOIN`)
- `CROSS JOIN` (or `,` )
See standard [SQL JOIN](https://en.wikipedia.org/wiki/Join_\(SQL\)) description.
**ANY or ALL strictness**
If `ALL` is specified and the right table has several matching rows, the data will be multiplied by the number of these rows. This is the normal `JOIN` behavior for standard SQL. If `ANY` is specified and the right table has several matching rows, only the first one found is joined. If the right table has only one matching row, the results of `ANY` and `ALL` are the same.
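A sketch of the difference, using hypothetical tables `orders` and `prices` that share a `UserID` column:

```sql
-- ALL: if prices has several rows per UserID, each of them is joined,
-- multiplying the matching rows from orders (standard SQL behavior).
SELECT UserID, Price
FROM orders
ALL INNER JOIN prices USING UserID;

-- ANY: at most one matching row from prices is joined per row of orders.
SELECT UserID, Price
FROM orders
ANY INNER JOIN prices USING UserID;
```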
To set the default strictness value, use the session configuration parameter [join\_default\_strictness](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/#settings-join_default_strictness).
**GLOBAL JOIN**
When using a normal `JOIN`, the query is sent to remote servers. Subqueries are run on each of them in order to make the right table, and the join is performed with this table. In other words, the right table is formed on each server separately.
When using `GLOBAL ... JOIN`, first the requestor server runs a subquery to calculate the right table. This temporary table is passed to each remote server, and queries are run on them using the temporary data that was transmitted.
Be careful when using `GLOBAL`. For more information, see the section [Distributed subqueries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distributed-subqueries).
**Usage Recommendations**
All columns that are not needed for the `JOIN` are deleted from the subquery.
When running a `JOIN`, there is no optimization of the order of execution in relation to other stages of the query. The join (a search in the right table) is run before filtering in `WHERE` and before aggregation. To set the processing order explicitly, we recommend running the `JOIN` on subqueries.
Example:
```
SELECT
CounterID,
hits,
visits
FROM
(
SELECT
CounterID,
count() AS hits
FROM test.hits
GROUP BY CounterID
) ANY LEFT JOIN
(
SELECT
CounterID,
sum(Sign) AS visits
FROM test.visits
GROUP BY CounterID
) USING CounterID
ORDER BY hits DESC
LIMIT 10
```
```
┌─CounterID─┬───hits─┬─visits─┐
│   1143050 │ 523264 │  13665 │
│    731962 │ 475698 │ 102716 │
│    722545 │ 337212 │ 108187 │
│    722889 │ 252197 │  10547 │
│   2237260 │ 196036 │   9522 │
│  23057320 │ 147211 │   7689 │
│    722818 │  90109 │  17847 │
│     48221 │  85379 │   4652 │
│  19762435 │  77807 │   7026 │
│    722884 │  77492 │  11056 │
└───────────┴────────┴────────┘
```
Subqueries don't allow you to set names or use them for referencing a column from a specific subquery. The columns specified in `USING` must have the same names in both subqueries, and the other columns must be named differently. You can use aliases to change the names of columns in subqueries (the example uses the aliases 'hits' and 'visits').
The `USING` clause specifies one or more columns to join, which establishes the equality of these columns. The list of columns is set without brackets. More complex join conditions are not supported.
The right table (the subquery result) resides in RAM. If there isn't enough memory, you can't run a `JOIN`.
Each time a query is run with the same `JOIN`, the subquery is run again because the result is not cached. To avoid this, use the special [Join](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/table_engines/join/) table engine, which is a prepared array for joining that is always in RAM.
In some cases, it is more efficient to use `IN` instead of `JOIN`. Among the various types of `JOIN`, the most efficient is `ANY LEFT JOIN`, then `ANY INNER JOIN`. The least efficient are `ALL LEFT JOIN` and `ALL INNER JOIN`.
If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of `JOIN`. For more information, see the section [External dictionaries](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/dicts/external_dicts/).
#### Processing of Empty or NULL Cells[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#processing-of-empty-or-null-cells "Permanent link")
When joining tables, empty cells may appear. The [join\_use\_nulls](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/#settings-join_use_nulls) setting defines how ClickHouse fills these cells.
If the `JOIN` keys are [Nullable](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/data_types/nullable/) fields, the rows where at least one of the keys has the value [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/#null-literal) are not joined.
### WHERE Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#where-clause "Permanent link")
If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators. This expression will be used for filtering data before all other transformations.
If the database table engine supports indexes, the expression is evaluated to determine whether indexes can be used.
### PREWHERE Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#prewhere-clause "Permanent link")
This clause has the same meaning as the WHERE clause. The difference is in which data is read from the table. When using PREWHERE, first only the columns necessary for executing PREWHERE are read. Then the other columns are read that are needed for running the query, but only those blocks where the PREWHERE expression is true.
It makes sense to use PREWHERE if there are filtration conditions that are used by a minority of the columns in the query, but that provide strong data filtration. This reduces the volume of data to read.
For example, it is useful to write PREWHERE for queries that extract a large number of columns, but that only have filtration for a few columns.
PREWHERE is only supported by tables from the `*MergeTree` family.
A query may simultaneously specify PREWHERE and WHERE. In this case, PREWHERE precedes WHERE.
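For instance, a hypothetical query that extracts several columns but filters strongly on a single one might read as follows (table and column names are illustrative):

```sql
SELECT UserID, URL, Title, Referer, Duration
FROM hits
PREWHERE CounterID = 34   -- only CounterID is read during the first pass
WHERE Duration > 1000     -- evaluated on the blocks that survived PREWHERE
```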
If the 'optimize\_move\_to\_prewhere' setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
### GROUP BY Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-clause "Permanent link")
This is one of the most important parts of a column-oriented DBMS.
If there is a GROUP BY clause, it must contain a list of expressions. Each expression will be referred to here as a "key". All the expressions in the SELECT, HAVING, and ORDER BY clauses must be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions.
If a query contains only table columns inside aggregate functions, the GROUP BY clause can be omitted, and aggregation by an empty set of keys is assumed.
Example:
```
SELECT
count(),
median(FetchTiming > 60 ? 60 : FetchTiming),
count() - sum(Refresh)
FROM hits
```
However, in contrast to standard SQL, if the table doesn't have any rows (either there aren't any at all, or there aren't any after using WHERE to filter), an empty result is returned, and not the result from one of the rows containing the initial values of aggregate functions.
In contrast to MySQL (and in conformance with standard SQL), you can't get a value of a column that is not in a key or an aggregate function (except constant expressions). To work around this, you can use the 'any' aggregate function (which returns the first encountered value) or 'min'/'max'.
Example:
```
SELECT
domainWithoutWWW(URL) AS domain,
count(),
any(Title) AS title -- getting the first occurred page header for each domain.
FROM hits
GROUP BY domain
```
For every different key value encountered, GROUP BY calculates a set of aggregate function values.
GROUP BY is not supported for array columns.
Constants can't be specified as arguments to aggregate functions. Example: `sum(1)`. Instead, you can get rid of the constant. Example: `count()`.
#### NULL processing[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing "Permanent link")
For grouping, ClickHouse interprets [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) as a value, and `NULL=NULL`.
Here's an example to show what this means.
Assume you have this table:
```
┌─x─┬────y─┐
│ 1 │    2 │
│ 2 │ ᴺᵁᴸᴸ │
│ 3 │    2 │
│ 3 │    3 │
│ 3 │ ᴺᵁᴸᴸ │
└───┴──────┘
```
The query `SELECT sum(x), y FROM t_null_big GROUP BY y` results in:
```
┌─sum(x)─┬────y─┐
│      4 │    2 │
│      3 │    3 │
│      5 │ ᴺᵁᴸᴸ │
└────────┴──────┘
```
You can see that `GROUP BY` for `y = NULL` summed up `x`, as if `NULL` were a regular value.
If you pass several keys to `GROUP BY`, the result will give you all the combinations of the selection, as if `NULL` were a specific value.
#### WITH TOTALS Modifier[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#with-totals-modifier "Permanent link")
If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty strings), and columns of aggregate functions with the values calculated across all the rows (the "total" values).
This extra row is output in JSON\*, TabSeparated\*, and Pretty\* formats, separately from the other rows. In the other formats, this row is not output.
In JSON\* formats, this row is output as a separate 'totals' field. In TabSeparated\* formats, the row comes after the main result, preceded by an empty row (after the other data). In Pretty\* formats, the row is output as a separate table after the main result.
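A minimal sketch, reusing the hypothetical `hits` table from the earlier examples:

```sql
-- The result includes one extra "totals" row: domain holds its default
-- value (an empty string) and pageviews is the count across all rows.
SELECT
    domainWithoutWWW(URL) AS domain,
    count() AS pageviews
FROM hits
GROUP BY domain WITH TOTALS
```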
`WITH TOTALS` can be run in different ways when HAVING is present. The behavior depends on the 'totals\_mode' setting. By default, `totals_mode = 'before_having'`. In this case, 'totals' is calculated across all rows, including the ones that don't pass through HAVING and 'max\_rows\_to\_group\_by'.
The other alternatives include only the rows that pass through HAVING in 'totals', and behave differently with the setting `max_rows_to_group_by` and `group_by_overflow_mode = 'any'`.
- `after_having_exclusive`: Don't include rows that didn't pass through `max_rows_to_group_by`. In other words, 'totals' will have less than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
- `after_having_inclusive`: Include all the rows that didn't pass through `max_rows_to_group_by` in 'totals'. In other words, 'totals' will have more than or the same number of rows as it would if `max_rows_to_group_by` were omitted.
- `after_having_auto`: Count the number of rows that passed through HAVING. If it is more than a certain amount (by default, 50%), include all the rows that didn't pass through `max_rows_to_group_by` in 'totals'. Otherwise, do not include them.
- `totals_auto_threshold`: By default, 0.5. The coefficient for `after_having_auto`.
If `max_rows_to_group_by` and `group_by_overflow_mode = 'any'` are not used, all variations of `after_having` are the same, and you can use any of them (for example, `after_having_auto`).
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause (in this case, the respective total values are combined).
#### GROUP BY in External Memory[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-group-by-in-external-memory "Permanent link")
You can enable dumping temporary data to the disk to restrict memory usage during GROUP BY. The `max_bytes_before_external_group_by` setting determines the threshold RAM consumption for dumping GROUP BY temporary data to the file system. If set to 0 (the default), it is disabled.
When using `max_bytes_before_external_group_by`, we recommend that you set `max_memory_usage` about twice as high. This is necessary because there are two stages to aggregation: reading the data and forming intermediate data (1), and merging the intermediate data (2). Dumping data to the file system can only occur during stage 1. If the temporary data wasn't dumped, then stage 2 might require up to the same amount of memory as in stage 1.
For example, if `max_memory_usage` was set to 10000000000 and you want to use external aggregation, it makes sense to set `max_bytes_before_external_group_by` to 10000000000, and max\_memory\_usage to 20000000000. When external aggregation is triggered (if there was at least one dump of temporary data), maximum consumption of RAM is only slightly more than `max_bytes_before_external_group_by`.
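The settings from this example can be applied per session; the values mirror the text above:

```sql
-- Allow external aggregation once GROUP BY state reaches ~10 GB ...
SET max_bytes_before_external_group_by = 10000000000;
-- ... and leave roughly the same amount of headroom for the merge stage.
SET max_memory_usage = 20000000000;
```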
With distributed query processing, external aggregation is performed on remote servers. In order for the requestor server to use only a small amount of RAM, set `distributed_aggregation_memory_efficient` to 1.
Merging data flushed to the disk, as well as merging results from remote servers when the `distributed_aggregation_memory_efficient` setting is enabled, consumes up to `1/256 * the_number_of_threads` of the total amount of RAM.
When external aggregation is enabled, if there was less than `max_bytes_before_external_group_by` of data (i.e. data was not flushed), the query runs just as fast as without external aggregation. If any temporary data was flushed, the run time will be several times longer (approximately three times).
If you have an ORDER BY with a small LIMIT after GROUP BY, the ORDER BY clause will not use significant amounts of RAM. But if the ORDER BY doesn't have LIMIT, don't forget to enable external sorting (`max_bytes_before_external_sort`).
### LIMIT N BY Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-n-by-clause "Permanent link")
LIMIT N BY COLUMNS selects the top N rows for each group of COLUMNS. LIMIT N BY is not related to LIMIT; they can both be used in the same query. The key for LIMIT N BY can contain any number of columns or expressions.
Example:
```
SELECT
domainWithoutWWW(URL) AS domain,
domainWithoutWWW(REFERRER_URL) AS referrer,
device_type,
count() cnt
FROM hits
GROUP BY domain, referrer, device_type
ORDER BY cnt DESC
LIMIT 5 BY domain, device_type
LIMIT 100
```
The query will select the top 5 referrers for each `domain, device_type` pair, but not more than 100 rows (`LIMIT n BY + LIMIT`).
### HAVING Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#having-clause "Permanent link")
Allows filtering the result received after GROUP BY, similar to the WHERE clause. WHERE and HAVING differ in that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it. If aggregation is not performed, HAVING can't be used.
### ORDER BY Clause[¶](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#order-by-clause "Permanent link")
The ORDER BY clause contains a list of expressions, which can each be assigned DESC or ASC (the sorting direction). If the direction is not specified, ASC is assumed. ASC is sorted in ascending order, and DESC in descending order. The sorting direction applies to a single expression, not to the entire list. Example: `ORDER BY Visits DESC, SearchPhrase`
For sorting by String values, you can specify collation (comparison). Example: `ORDER BY SearchPhrase COLLATE 'tr'` sorts by keyword in ascending order using the Turkish alphabet, case-insensitively, assuming the strings are UTF-8 encoded. COLLATE can be specified independently for each expression in ORDER BY. If ASC or DESC is specified, COLLATE is specified after it. When using COLLATE, sorting is always case-insensitive.
We only recommend using COLLATE for final sorting of a small number of rows, since sorting with COLLATE is less efficient than normal sorting by bytes.
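A fuller sketch of the inline example above (the `phrases` table is hypothetical):

```
SELECT SearchPhrase
FROM phrases
ORDER BY SearchPhrase ASC COLLATE 'tr'
LIMIT 100
```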
Rows that have identical values for the list of sorting expressions are output in an arbitrary order, which can also be nondeterministic (different each time). If the ORDER BY clause is omitted, the order of the rows is also undefined, and may be nondeterministic as well.
`NaN` and `NULL` sorting order:
- With the modifier `NULLS FIRST`: first `NULL`, then `NaN`, then the other values.
- With the modifier `NULLS LAST`: first the values, then `NaN`, then `NULL`.
- Default: the same as with the `NULLS LAST` modifier.
Example:
For the table
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    2 │
│ 1 │  nan │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │  nan │
│ 7 │ ᴺᵁᴸᴸ │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```
Run the query `SELECT * FROM t_null_nan ORDER BY y NULLS FIRST` to get:
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 7 │ ᴺᵁᴸᴸ │
│ 1 │  nan │
│ 6 │  nan │
│ 2 │    2 │
│ 2 │    2 │
│ 3 │    4 │
│ 5 │    6 │
│ 6 │    7 │
│ 8 │    9 │
└───┴──────┘
```
When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. In other words, for ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than the rest.
Less RAM is used if a small enough LIMIT is specified in addition to ORDER BY. Otherwise, the amount of memory spent is proportional to the volume of data for sorting. For distributed query processing, if GROUP BY is omitted, sorting is partially done on remote servers, and the results are merged on the requestor server. This means that for distributed sorting, the volume of data to sort can be greater than the amount of memory on a single server.
If there is not enough RAM, it is possible to perform sorting in external memory (creating temporary files on disk). Use the setting `max_bytes_before_external_sort` for this purpose. If it is set to 0 (the default), external sorting is disabled. If it is enabled, when the volume of data to sort reaches the specified number of bytes, the collected data is sorted and dumped into a temporary file. After all the data is read, the sorted files are merged and the result is output. By default, the files are written to the /var/lib/clickhouse/tmp/ directory; use the 'tmp\_path' config parameter to change this.
Running a query may use more memory than 'max\_bytes\_before\_external\_sort'. For this reason, this setting must have a value significantly smaller than 'max\_memory\_usage'. As an example, if your server has 128 GB of RAM and you need to run a single query, set 'max\_memory\_usage' to 100 GB, and 'max\_bytes\_before\_external\_sort' to 80 GB.
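The 128 GB example corresponds to session settings along these lines (values in bytes, rounded decimal approximations):

```
SET max_memory_usage = 100000000000;              -- ~100 GB
SET max_bytes_before_external_sort = 80000000000; -- ~80 GB
```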
External sorting works much less effectively than sorting in RAM.
### SELECT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-clause "Permanent link")
The expressions specified in the SELECT clause are analyzed after the calculations for all the clauses listed above are completed. More specifically, expressions are analyzed that are above the aggregate functions, if there are any aggregate functions. The aggregate functions and everything below them are calculated during aggregation (GROUP BY). These expressions work as if they are applied to separate rows in the result.
### DISTINCT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distinct "Permanent link")
If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result. The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:
- DISTINCT can be applied together with GROUP BY.
- When ORDER BY is omitted and LIMIT is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
DISTINCT is not supported if SELECT has at least one array column.
`DISTINCT` works with [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
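For instance, given a hypothetical table `t` whose `Nullable` column `y` holds the values `2`, `NULL`, `2`, `NULL`:

```
SELECT DISTINCT y FROM t
```

would return exactly two rows, `2` and `NULL`, because the two `NULL` values are treated as equal.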
### LIMIT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#limit-clause "Permanent link")
`LIMIT m` allows you to select the first `m` rows from the result. `LIMIT n, m` allows you to select the first `m` rows from the result after skipping the first `n` rows.
`n` and `m` must be non-negative integers.
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
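For example, using the built-in `system.numbers` table (which yields 0, 1, 2, ...):

```
SELECT number FROM system.numbers LIMIT 5, 3
```

returns the rows 5, 6, and 7: the first 5 rows are skipped, then 3 rows are taken.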
### UNION ALL Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#union-all-clause "Permanent link")
You can use UNION ALL to combine any number of queries. Example:
```
SELECT CounterID, 1 AS table, toInt64(count()) AS c
FROM test.hits
GROUP BY CounterID
UNION ALL
SELECT CounterID, 2 AS table, sum(Sign) AS c
FROM test.visits
GROUP BY CounterID
HAVING c > 0
```
Only UNION ALL is supported. The regular UNION (UNION DISTINCT) is not supported. If you need UNION DISTINCT, you can write SELECT DISTINCT from a subquery containing UNION ALL.
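The workaround can be sketched as follows (`t1` and `t2` are placeholder tables with a compatible column `x`):

```
SELECT DISTINCT x
FROM
(
    SELECT x FROM t1
    UNION ALL
    SELECT x FROM t2
)
```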
Queries that are parts of UNION ALL can be run simultaneously, and their results can be mixed together.
The structure of results (the number and type of columns) must match for the queries. But the column names can differ. In this case, the column names for the final result will be taken from the first query. Type casting is performed for unions. For example, if two queries being combined have the same field with non-`Nullable` and `Nullable` types from a compatible type, the resulting `UNION ALL` has a `Nullable` type field.
Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
### INTO OUTFILE Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#into-outfile-clause "Permanent link")
Add the `INTO OUTFILE filename` clause (where filename is a string literal) to redirect query output to the specified file. In contrast to MySQL, the file is created on the client side. The query will fail if a file with the same filename already exists. This functionality is available in the command-line client and clickhouse-local (a query sent via HTTP interface will fail).
The default output format is TabSeparated (the same as in the command-line client batch mode).
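A sketch for the command-line client (the filename is arbitrary; the query fails if the file already exists):

```
SELECT * FROM hits
INTO OUTFILE 'hits_dump.tsv'
FORMAT TabSeparated
```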
### FORMAT Clause[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#format-clause "Permanent link")
Specify 'FORMAT format' to get data in any specified format. You can use this for convenience, or for creating dumps. For more information, see the section "Formats". If the FORMAT clause is omitted, the default format is used, which depends on both the settings and the interface used for accessing the DB. For the HTTP interface and the command-line client in batch mode, the default format is TabSeparated. For the command-line client in interactive mode, the default format is PrettyCompact (it has attractive and compact tables).
When using the command-line client, data is passed to the client in an internal efficient format. The client independently interprets the FORMAT clause of the query and formats the data itself (thus relieving the network and the server from the load).
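For example, to force CSV output regardless of the interface defaults:

```
SELECT EventDate, count() AS cnt
FROM test.hits
GROUP BY EventDate
FORMAT CSV
```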
### IN Operators[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-in-operators "Permanent link")
The `IN`, `NOT IN`, `GLOBAL IN`, and `GLOBAL NOT IN` operators are covered separately, since their functionality is quite rich.
The left side of the operator is either a single column or a tuple.
Examples:
```
SELECT UserID IN (123, 456) FROM ...
SELECT (CounterID, UserID) IN ((34, 123), (101500, 456)) FROM ...
```
If the left side is a single column that is in the index, and the right side is a set of constants, the system uses the index for processing the query.
Don't list too many values explicitly (e.g. millions). If the data set is large, put it in a temporary table (for example, see the section "External data for query processing"), then use a subquery.
The right side of the operator can be a set of constant expressions, a set of tuples with constant expressions (shown in the examples above), or the name of a database table or SELECT subquery in brackets.
If the right side of the operator is the name of a table (for example, `UserID IN users`), this is equivalent to the subquery `UserID IN (SELECT * FROM users)`. Use this when working with external data that is sent along with the query. For example, the query can be sent together with a set of user IDs loaded to the 'users' temporary table, which should be filtered.
If the right side of the operator is a table name that has the Set engine (a prepared data set that is always in RAM), the data set will not be created over again for each query.
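A sketch of such a prepared set (the table name and values are hypothetical):

```
CREATE TABLE allowed_users (UserID UInt64) ENGINE = Set;
INSERT INTO allowed_users VALUES (123), (456);

SELECT count() FROM hits WHERE UserID IN allowed_users
```

The Set table keeps its contents in RAM, so the `IN` check does not rebuild the data set on every query.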
The subquery may specify more than one column for filtering tuples. Example:
```
SELECT (CounterID, UserID) IN (SELECT CounterID, UserID FROM ...) FROM ...
```
The columns to the left and right of the IN operator should have the same type.
The IN operator and subquery may occur in any part of the query, including in aggregate functions and lambda functions. Example:
```
SELECT
    EventDate,
    avg(UserID IN
    (
        SELECT UserID
        FROM test.hits
        WHERE EventDate = toDate('2014-03-17')
    )) AS ratio
FROM test.hits
GROUP BY EventDate
ORDER BY EventDate ASC
```
```
┌──EventDate─┬────ratio─┐
│ 2014-03-17 │        1 │
│ 2014-03-18 │ 0.807696 │
│ 2014-03-19 │ 0.755406 │
│ 2014-03-20 │ 0.723218 │
│ 2014-03-21 │ 0.697021 │
│ 2014-03-22 │ 0.647851 │
│ 2014-03-23 │ 0.648416 │
└────────────┴──────────┘
```
For each day after March 17th, count the percentage of pageviews made by users who visited the site on March 17th. A subquery in the IN clause is always run just one time on a single server. There are no dependent subqueries.
#### NULL processing[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#null-processing_1 "Permanent link")
During request processing, the IN operator assumes that the result of an operation with [NULL](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/syntax/) is always equal to `0`, regardless of whether `NULL` is on the right or left side of the operator. `NULL` values are not included in any dataset, do not correspond to each other and cannot be compared.
Here is an example with the `t_null` table:
```
┌─x─┬────y─┐
│ 1 │ ᴺᵁᴸᴸ │
│ 2 │    3 │
└───┴──────┘
```
Running the query `SELECT x FROM t_null WHERE y IN (NULL,3)` gives you the following result:
```
┌─x─┐
│ 2 │
└───┘
```
You can see that the row in which `y = NULL` is thrown out of the query results. This is because ClickHouse can't decide whether `NULL` is included in the `(NULL,3)` set, returns `0` as the result of the operation, and `SELECT` excludes this row from the final output.
```
SELECT y IN (NULL, 3)
FROM t_null
┌─in(y, tuple(NULL, 3))─┐
│                     0 │
│                     1 │
└───────────────────────┘
```
#### Distributed Subqueries[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#select-distributed-subqueries "Permanent link")
There are two options for IN-s with subqueries (similar to JOINs): normal `IN` / `JOIN` and `GLOBAL IN` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
Attention
Remember that the algorithms described below may work differently depending on the [`distributed_product_mode`](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/operations/settings/settings/) setting.
When using the regular IN, the query is sent to remote servers, and each of them runs the subqueries in the `IN` or `JOIN` clause.
When using `GLOBAL IN` / `GLOBAL JOIN`, first all the subqueries are run, and the results are collected in temporary tables. Then the temporary tables are sent to each remote server, where the queries are run using this temporary data.
For a non-distributed query, use the regular `IN` / `JOIN`.
Be careful when using subqueries in the `IN` / `JOIN` clauses for distributed query processing.
Let's look at some examples. Assume that each server in the cluster has a normal **local\_table**. Each server also has a **distributed\_table** table with the **Distributed** type, which looks at all the servers in the cluster.
For a query to the **distributed\_table**, the query will be sent to all the remote servers and run on them using the **local\_table**.
For example, the query
```
SELECT uniq(UserID) FROM distributed_table
```
will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table
```
and run on each of them in parallel, until it reaches the stage where intermediate results can be combined. Then the intermediate results will be returned to the requestor server and merged on it, and the final result will be sent to the client.
Now let's examine a query with IN:
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM local_table WHERE CounterID = 34)
```
This query calculates the intersection of the audiences of two sites.
This query will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM local_table WHERE CounterID = 34)
```
In other words, the data set in the IN clause will be collected on each server independently, only across the data that is stored locally on each of the servers.
This will work correctly and optimally if you are prepared for this case and have spread data across the cluster servers such that the data for a single UserID resides entirely on a single server. In this case, all the necessary data will be available locally on each server. Otherwise, the result will be inaccurate. We refer to this variation of the query as "local IN".
To correct how the query works when data is spread randomly across the cluster servers, you could specify **distributed\_table** inside a subquery. The query would look like this:
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
This query will be sent to all remote servers as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
The subquery will begin running on each remote server. Since the subquery uses a distributed table, the subquery that is on each remote server will be resent to every remote server as
```
SELECT UserID FROM local_table WHERE CounterID = 34
```
For example, if you have a cluster of 100 servers, executing the entire query will require 10,000 elementary requests, which is generally considered unacceptable.
In such cases, you should always use GLOBAL IN instead of IN. Let's look at how it works for the query
```
SELECT uniq(UserID) FROM distributed_table WHERE CounterID = 101500 AND UserID GLOBAL IN (SELECT UserID FROM distributed_table WHERE CounterID = 34)
```
The requestor server will run the subquery
```
SELECT UserID FROM distributed_table WHERE CounterID = 34
```
and the result will be put in a temporary table in RAM. Then the request will be sent to each remote server as
```
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID GLOBAL IN _data1
```
and the temporary table `_data1` will be sent to every remote server with the query (the name of the temporary table is implementation-defined).
This is more optimal than using the normal IN. However, keep the following points in mind:
1. When creating a temporary table, data is not made unique. To reduce the volume of data transmitted over the network, specify DISTINCT in the subquery. (You don't need to do this for a normal IN.)
2. The temporary table will be sent to all the remote servers. Transmission does not account for network topology. For example, if 10 remote servers reside in a datacenter that is very remote in relation to the requestor server, the data will be sent 10 times over the channel to the remote datacenter. Try to avoid large data sets when using GLOBAL IN.
3. When transmitting data to remote servers, restrictions on network bandwidth are not configurable. You might overload the network.
4. Try to distribute data across servers so that you don't need to use GLOBAL IN on a regular basis.
5. If you need to use GLOBAL IN often, plan the location of the ClickHouse cluster so that a single group of replicas resides in no more than one data center with a fast network between them, so that a query can be processed entirely within a single data center.
It also makes sense to specify a local table in the `GLOBAL IN` clause, in case this local table is only available on the requestor server and you want to use data from it on remote servers.
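A sketch of this last case (assuming `requestor_only_table` exists only on the requestor server):

```
SELECT uniq(UserID)
FROM distributed_table
WHERE UserID GLOBAL IN (SELECT UserID FROM requestor_only_table)
```

With plain `IN`, the remote servers would fail to find `requestor_only_table`; with `GLOBAL IN`, its contents are materialized on the requestor server and shipped to them.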
### Extreme Values[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#extreme-values "Permanent link")
In addition to results, you can also get minimum and maximum values for the results columns. To do this, set the **extremes** setting to 1. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
An extra two rows are calculated: the minimums and maximums, respectively. These extra two rows are output in JSON\*, TabSeparated\*, and Pretty\* formats, separate from the other rows. They are not output for other formats.
In JSON\* formats, the extreme values are output in a separate 'extremes' field. In TabSeparated\* formats, the row comes after the main result, and after 'totals' if present. It is preceded by an empty row (after the other data). In Pretty\* formats, the row is output as a separate table after the main result, and after 'totals' if present.
Extreme values are calculated for rows that have passed through LIMIT. However, when using 'LIMIT offset, size', the rows before 'offset' are included in 'extremes'. In stream requests, the result may also include a small number of rows that passed through LIMIT.
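A sketch of enabling the setting for one session:

```
SET extremes = 1;

SELECT EventDate, count() AS cnt
FROM test.hits
GROUP BY EventDate
FORMAT PrettyCompact
```

In Pretty\* formats, the minimums and maximums appear as a separate small table after the main result.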
### Notes[ΒΆ](http://devdoc.net/database/ClickhouseDocs_19.4.1.3-docs/query_language/select/#notes "Permanent link")
The `GROUP BY` and `ORDER BY` clauses do not support positional arguments. This contradicts MySQL, but conforms to standard SQL. For example, `GROUP BY 1, 2` will be interpreted as grouping by constants (i.e. aggregation of all rows into one).
You can use synonyms (`AS` aliases) in any part of a query.
You can put an asterisk in any part of a query instead of an expression. When the query is analyzed, the asterisk is expanded to a list of all table columns (excluding the `MATERIALIZED` and `ALIAS` columns). There are only a few cases when using an asterisk is justified:
- When creating a table dump.
- For tables containing just a few columns, such as system tables.
- For getting information about what columns are in a table. In this case, set `LIMIT 1`. But it is better to use the `DESC TABLE` query.
- When there is strong filtration on a small number of columns using `PREWHERE`.
- In subqueries (since columns that aren't needed for the external query are excluded from subqueries).
In all other cases, we don't recommend using the asterisk, since it only gives you the drawbacks of a columnar DBMS instead of the advantages.
[Original article](https://clickhouse.yandex/docs/en/query_language/select/)