โน๏ธ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.3 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/ |
| Last Crawled | 2026-04-06 07:16:24 (8 days ago) |
| First Indexed | 2022-06-23 07:53:42 (3 years ago) |
| HTTP Status Code | 200 |
| Meta Title | GROUP BY tricks | Altinityยฎ Knowledge Base for ClickHouseยฎ |
| Meta Description | Tricks for GROUP BY memory usage optimization |
| Meta Canonical | null |
| Boilerpipe Text | Tricks for GROUP BY memory usage optimization
Tricks
Testing dataset
CREATE
TABLE
sessions
(
`
app
`
LowCardinality
(
String
),
`
user_id
`
String
,
`
created_at
`
DateTime
,
`
platform
`
LowCardinality
(
String
),
`
clicks
`
UInt32
,
`
session_id
`
UUID
)
ENGINE
=
MergeTree
PARTITION
BY
toYYYYMM
(
created_at
)
ORDER
BY
(
app
,
user_id
,
session_id
,
created_at
)
INSERT
INTO
sessions
WITH
CAST
(
number
%
4
,
'Enum8(\'
Orange
\
' = 0, \'
Melon
\
' = 1, \'
Red
\
' = 2, \'
Blue
\
' = 3)'
)
AS
app
,
concat
(
'UID: '
,
leftPad
(
toString
(
number
%
20000000
),
8
,
'0'
))
AS
user_id
,
toDateTime
(
'2021-10-01 10:11:12'
)
+
(
number
/
300
)
AS
created_at
,
CAST
((
number
+
14
)
%
3
,
'Enum8(\'
Bat
\
' = 0, \'
Mice
\
' = 1, \'
Rat
\
' = 2)'
)
AS
platform
,
number
%
17
AS
clicks
,
generateUUIDv4
()
AS
session_id
SELECT
app
,
user_id
,
created_at
,
platform
,
clicks
,
session_id
FROM
numbers_mt
(
1000000000
)
0
rows
in
set
.
Elapsed
:
46
.
078
sec
.
Processed
1
.
00
billion
rows
,
8
.
00
GB
(
21
.
70
million
rows
/
s
.,
173
.
62
MB
/
s
.)
โโ
database
โโฌโ
table
โโโโโฌโ
column
โโโโโโฌโ
type
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโ
rows
โโฌโ
compressed_bytes
โโฌโ
compressed
โโฌโ
uncompressed
โโฌโโโโโโโโโโโโโโ
ratio
โโฌโ
codec
โโ
โ
default
โ
sessions
โ
session_id
โ
UUID
โ
1000000000
โ
16065918103
โ
14
.
96
GiB
โ
14
.
90
GiB
โ
0
.
9958970223439835
โ
โ
โ
default
โ
sessions
โ
user_id
โ
String
โ
1000000000
โ
3056977462
โ
2
.
85
GiB
โ
13
.
04
GiB
โ
4
.
57968701896828
โ
โ
โ
default
โ
sessions
โ
clicks
โ
UInt32
โ
1000000000
โ
1859359032
โ
1
.
73
GiB
โ
3
.
73
GiB
โ
2
.
151278979023993
โ
โ
โ
default
โ
sessions
โ
created_at
โ
DateTime
โ
1000000000
โ
1332089630
โ
1
.
24
GiB
โ
3
.
73
GiB
โ
3
.
0028009451586226
โ
โ
โ
default
โ
sessions
โ
platform
โ
LowCardinality
(
String
)
โ
1000000000
โ
329702248
โ
314
.
43
MiB
โ
956
.
63
MiB
โ
3
.
042446801879252
โ
โ
โ
default
โ
sessions
โ
app
โ
LowCardinality
(
String
)
โ
1000000000
โ
4825544
โ
4
.
60
MiB
โ
956
.
63
MiB
โ
207
.
87333386660654
โ
โ
โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโ
All queries and datasets are unique, so in different situations different hacks could work better or worse.
PreFilter values before GROUP BY
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
created_at
>
'2021-11-01 00:00:00'
GROUP
BY
user_id
HAVING
(
argMax
(
clicks
,
created_at
)
=
16
)
AND
(
argMax
(
platform
,
created_at
)
=
'Rat'
)
FORMAT
`
Null
`
<
Debug
>
MemoryTracker
:
Peak
memory
usage
(
for
query
):
18
.
36
GiB
.
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
user_id
IN
(
SELECT
user_id
FROM
sessions
WHERE
(
platform
=
'Rat'
)
AND
(
clicks
=
16
)
AND
(
created_at
>
'2021-11-01 00:00:00'
)
-- So we will select user_id which could potentially match our HAVING clause in OUTER query.
)
AND
(
created_at
>
'2021-11-01 00:00:00'
)
GROUP
BY
user_id
HAVING
(
argMax
(
clicks
,
created_at
)
=
16
)
AND
(
argMax
(
platform
,
created_at
)
=
'Rat'
)
FORMAT
`
Null
`
<
Debug
>
MemoryTracker
:
Peak
memory
usage
(
for
query
):
4
.
43
GiB
.
Use Fixed-width data types instead of String
For example, you have 2 strings which has values in special form like this
โABX 1412312312313โ
You can just remove the first 4 characters and convert the rest to UInt64
toUInt64(substr(โABX 1412312312313โ,5))
And you packed 17 bytes in 8, more than 2 times the improvement of size!
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_id
,
platform
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
serialized
<
Debug
>
MemoryTracker
:
Peak
memory
usage
(
for
query
):
28
.
19
GiB
.
Elapsed
:
7
.
375
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
135
.
60
million
rows
/
s
.,
3
.
66
GB
/
s
.)
WITH
CAST
(
user_id
,
'FixedString(14)'
)
AS
user_fx
,
CAST
(
platform
,
'FixedString(4)'
)
AS
platform_fx
SELECT
user_fx
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_fx
,
platform_fx
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
keys256
MemoryTracker
:
Peak
memory
usage
(
for
query
):
22
.
24
GiB
.
Elapsed
:
6
.
637
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
150
.
67
million
rows
/
s
.,
4
.
07
GB
/
s
.)
WITH
CAST
(
user_id
,
'FixedString(14)'
)
AS
user_fx
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 0)'
)
AS
platform_enum
SELECT
user_fx
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_fx
,
platform_enum
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
keys128
MemoryTracker
:
Peak
memory
usage
(
for
query
):
14
.
14
GiB
.
Elapsed
:
5
.
335
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
187
.
43
million
rows
/
s
.,
5
.
06
GB
/
s
.)
WITH
toUInt32OrZero
(
trim
(
LEADING
'0'
FROM
substr
(
user_id
,
6
)))
AS
user_int
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 0)'
)
AS
platform_enum
SELECT
user_int
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_int
,
platform_enum
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
keys64
MemoryTracker
:
Peak
memory
usage
(
for
query
):
10
.
14
GiB
.
Elapsed
:
8
.
549
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
116
.
97
million
rows
/
s
.,
3
.
16
GB
/
s
.)
WITH
toUInt32
(
'1'
||
substr
(
user_id
,
6
))
AS
user_int
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 0)'
)
AS
platform_enum
SELECT
user_int
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_int
,
platform_enum
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
keys64
Peak
memory
usage
(
for
query
):
10
.
14
GiB
.
Elapsed
:
6
.
247
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
160
.
09
million
rows
/
s
.,
4
.
32
GB
/
s
.)
It can be especially useful when you tries to do GROUP BY lc_column_1, lc_column_2 and ClickHouseยฎ falls back to serialized algorithm.
Two LowCardinality Columns in GROUP BY
SELECT
app
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
app
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
low_cardinality_key_string
MemoryTracker
:
Peak
memory
usage
(
for
query
):
43
.
81
MiB
.
Elapsed
:
0
.
545
sec
.
Processed
1
.
00
billion
rows
,
5
.
00
GB
(
1
.
83
billion
rows
/
s
.,
9
.
17
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
app
,
platform
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
serialized
-- Slowest method!
MemoryTracker
:
Peak
memory
usage
(
for
query
):
222
.
86
MiB
.
Elapsed
:
2
.
923
sec
.
Processed
1
.
00
billion
rows
,
6
.
00
GB
(
342
.
11
million
rows
/
s
.,
2
.
05
GB
/
s
.)
SELECT
CAST
(
app
,
'FixedString(6)'
)
AS
app_fx
,
CAST
(
platform
,
'FixedString(4)'
)
AS
platform_fx
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
app_fx
,
platform_fx
FORMAT
`
Null
`
Aggregator
:
Aggregation
method
:
keys128
MemoryTracker
:
Peak
memory
usage
(
for
query
):
160
.
23
MiB
.
Elapsed
:
1
.
632
sec
.
Processed
1
.
00
billion
rows
,
6
.
00
GB
(
612
.
63
million
rows
/
s
.,
3
.
68
GB
/
s
.)
Split your query in multiple smaller queries and execute them one BY one
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_id
,
platform
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
28
.
19
GiB
.
Elapsed
:
7
.
375
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
135
.
60
million
rows
/
s
.,
3
.
66
GB
/
s
.)
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
(
cityHash64
(
user_id
)
%
4
)
=
0
GROUP
BY
user_id
,
platform
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
8
.
16
GiB
.
Elapsed
:
2
.
910
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
343
.
64
million
rows
/
s
.,
9
.
28
GB
/
s
.)
Shard your data by one of common high cardinal GROUP BY key
So on each shard you will have 1/N of all unique combination and this will result in smaller hash tables.
Letโs create 2 distributed tables with different distribution: rand() and by user_id
CREATE
TABLE
sessions_distributed
AS
sessions
ENGINE
=
Distributed
(
'distr-groupby'
,
default
,
sessions
,
rand
());
INSERT
INTO
sessions_distributed
WITH
CAST
(
number
%
4
,
'Enum8(\'
Orange
\
' = 0, \'
Melon
\
' = 1, \'
Red
\
' = 2, \'
Blue
\
' = 3)'
)
AS
app
,
concat
(
'UID: '
,
leftPad
(
toString
(
number
%
20000000
),
8
,
'0'
))
AS
user_id
,
toDateTime
(
'2021-10-01 10:11:12'
)
+
(
number
/
300
)
AS
created_at
,
CAST
((
number
+
14
)
%
3
,
'Enum8(\'
Bat
\
' = 0, \'
Mice
\
' = 1, \'
Rat
\
' = 2)'
)
AS
platform
,
number
%
17
AS
clicks
,
generateUUIDv4
()
AS
session_id
SELECT
app
,
user_id
,
created_at
,
platform
,
clicks
,
session_id
FROM
numbers_mt
(
1000000000
);
CREATE
TABLE
sessions_2
ON
CLUSTER
'distr-groupby'
(
`
app
`
LowCardinality
(
String
),
`
user_id
`
String
,
`
created_at
`
DateTime
,
`
platform
`
LowCardinality
(
String
),
`
clicks
`
UInt32
,
`
session_id
`
UUID
)
ENGINE
=
MergeTree
PARTITION
BY
toYYYYMM
(
created_at
)
ORDER
BY
(
app
,
user_id
,
session_id
,
created_at
);
CREATE
TABLE
sessions_distributed_2
AS
sessions
ENGINE
=
Distributed
(
'distr-groupby'
,
default
,
sessions_2
,
cityHash64
(
user_id
));
INSERT
INTO
sessions_distributed_2
WITH
CAST
(
number
%
4
,
'Enum8(\'
Orange
\
' = 0, \'
Melon
\
' = 1, \'
Red
\
' = 2, \'
Blue
\
' = 3)'
)
AS
app
,
concat
(
'UID: '
,
leftPad
(
toString
(
number
%
20000000
),
8
,
'0'
))
AS
user_id
,
toDateTime
(
'2021-10-01 10:11:12'
)
+
(
number
/
300
)
AS
created_at
,
CAST
((
number
+
14
)
%
3
,
'Enum8(\'
Bat
\
' = 0, \'
Mice
\
' = 1, \'
Rat
\
' = 2)'
)
AS
platform
,
number
%
17
AS
clicks
,
generateUUIDv4
()
AS
session_id
SELECT
app
,
user_id
,
created_at
,
platform
,
clicks
,
session_id
FROM
numbers_mt
(
1000000000
);
SELECT
app
,
platform
,
sum
(
clicks
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
clicks
FROM
sessions_distributed
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
;
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
0
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
12
.
02
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
1
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
12
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
2
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
12
.
05
GiB
.
MemoryTracker
:
Peak
memory
usage
(
for
query
):
12
.
20
GiB
.
12
rows
in
set
.
Elapsed
:
28
.
345
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
35
.
28
million
rows
/
s
.,
1
.
13
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
clicks
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
clicks
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
;
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
0
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
1
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
2
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
MemoryTracker
:
Peak
memory
usage
(
for
query
):
5
.
61
GiB
.
12
rows
in
set
.
Elapsed
:
11
.
952
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
83
.
66
million
rows
/
s
.,
2
.
68
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
clicks
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
clicks
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
SETTINGS
optimize_distributed_group_by_sharding_key
=
1
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
0
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
1
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
2
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
MemoryTracker
:
Peak
memory
usage
(
for
query
):
5
.
61
GiB
.
12
rows
in
set
.
Elapsed
:
11
.
916
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
83
.
92
million
rows
/
s
.,
2
.
69
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
clicks
)
FROM
cluster
(
'distr-groupby'
,
view
(
SELECT
app
,
platform
,
sum
(
clicks
)
as
clicks
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
clicks
FROM
sessions_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
))
GROUP
BY
app
,
platform
;
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
0
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
1
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
[
chi
-
distr
-
groupby
-
distr
-
groupby
-
2
-
0
-
0
]
MemoryTracker
:
Current
memory
usage
(
for
query
):
5
.
05
GiB
.
MemoryTracker
:
Peak
memory
usage
(
for
query
):
5
.
55
GiB
.
12
rows
in
set
.
Elapsed
:
9
.
491
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
105
.
36
million
rows
/
s
.,
3
.
37
GB
/
s
.)
Query with bigger state:
SELECT
app
,
platform
,
sum
(
last_click
)
as
sum
,
max
(
max_clicks
)
as
max
,
min
(
min_clicks
)
as
min
,
max
(
max_time
)
as
max_time
,
min
(
min_time
)
as
min_time
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
last_click
,
max
(
clicks
)
AS
max_clicks
,
min
(
clicks
)
AS
min_clicks
,
max
(
created_at
)
AS
max_time
,
min
(
created_at
)
AS
min_time
FROM
sessions_distributed
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
19
.
95
GiB
.
12
rows
in
set
.
Elapsed
:
34
.
339
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
29
.
12
million
rows
/
s
.,
932
.
03
MB
/
s
.)
SELECT
app
,
platform
,
sum
(
last_click
)
as
sum
,
max
(
max_clicks
)
as
max
,
min
(
min_clicks
)
as
min
,
max
(
max_time
)
as
max_time
,
min
(
min_time
)
as
min_time
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
last_click
,
max
(
clicks
)
AS
max_clicks
,
min
(
clicks
)
AS
min_clicks
,
max
(
created_at
)
AS
max_time
,
min
(
created_at
)
AS
min_time
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
10
.
09
GiB
.
12
rows
in
set
.
Elapsed
:
13
.
220
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
75
.
64
million
rows
/
s
.,
2
.
42
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
last_click
)
AS
sum
,
max
(
max_clicks
)
AS
max
,
min
(
min_clicks
)
AS
min
,
max
(
max_time
)
AS
max_time
,
min
(
min_time
)
AS
min_time
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
last_click
,
max
(
clicks
)
AS
max_clicks
,
min
(
clicks
)
AS
min_clicks
,
max
(
created_at
)
AS
max_time
,
min
(
created_at
)
AS
min_time
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
SETTINGS
optimize_distributed_group_by_sharding_key
=
1
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
10
.
09
GiB
.
12
rows
in
set
.
Elapsed
:
13
.
361
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
74
.
85
million
rows
/
s
.,
2
.
40
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
last_click
)
AS
sum
,
max
(
max_clicks
)
AS
max
,
min
(
min_clicks
)
AS
min
,
max
(
max_time
)
AS
max_time
,
min
(
min_time
)
AS
min_time
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
last_click
,
max
(
clicks
)
AS
max_clicks
,
min
(
clicks
)
AS
min_clicks
,
max
(
created_at
)
AS
max_time
,
min
(
created_at
)
AS
min_time
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
SETTINGS
distributed_group_by_no_merge
=
2
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
10
.
02
GiB
.
12
rows
in
set
.
Elapsed
:
9
.
789
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
102
.
15
million
rows
/
s
.,
3
.
27
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
sum
),
max
(
max
),
min
(
min
),
max
(
max_time
)
AS
max_time
,
min
(
min_time
)
AS
min_time
FROM
cluster
(
'distr-groupby'
,
view
(
SELECT
app
,
platform
,
sum
(
last_click
)
AS
sum
,
max
(
max_clicks
)
AS
max
,
min
(
min_clicks
)
AS
min
,
max
(
max_time
)
AS
max_time
,
min
(
min_time
)
AS
min_time
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
argMax
(
clicks
,
created_at
)
AS
last_click
,
max
(
clicks
)
AS
max_clicks
,
min
(
clicks
)
AS
min_clicks
,
max
(
created_at
)
AS
max_time
,
min
(
created_at
)
AS
min_time
FROM
sessions_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
))
GROUP
BY
app
,
platform
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
10
.
09
GiB
.
12
rows
in
set
.
Elapsed
:
9
.
525
sec
.
Processed
1
.
00
billion
rows
,
32
.
00
GB
(
104
.
98
million
rows
/
s
.,
3
.
36
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
sessions
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
uniq
(
session_id
)
as
sessions
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
MemoryTracker
:
Peak
memory
usage
(
for
query
):
14
.
57
GiB
.
12
rows
in
set
.
Elapsed
:
37
.
730
sec
.
Processed
1
.
00
billion
rows
,
44
.
01
GB
(
26
.
50
million
rows
/
s
.,
1
.
17
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
sessions
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
uniq
(
session_id
)
as
sessions
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
SETTINGS
optimize_distributed_group_by_sharding_key
=
1
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
14
.
56
GiB
.
12
rows
in
set
.
Elapsed
:
37
.
792
sec
.
Processed
1
.
00
billion
rows
,
44
.
01
GB
(
26
.
46
million
rows
/
s
.,
1
.
16
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
sessions
)
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
uniq
(
session_id
)
as
sessions
FROM
sessions_distributed_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
SETTINGS
distributed_group_by_no_merge
=
2
;
MemoryTracker
:
Peak
memory
usage
(
for
query
):
14
.
54
GiB
.
12
rows
in
set
.
Elapsed
:
17
.
762
sec
.
Processed
1
.
00
billion
rows
,
44
.
01
GB
(
56
.
30
million
rows
/
s
.,
2
.
48
GB
/
s
.)
SELECT
app
,
platform
,
sum
(
sessions
)
FROM
cluster
(
'distr-groupby'
,
view
(
SELECT
app
,
platform
,
sum
(
sessions
)
as
sessions
FROM
(
SELECT
argMax
(
app
,
created_at
)
AS
app
,
argMax
(
platform
,
created_at
)
AS
platform
,
user_id
,
uniq
(
session_id
)
as
sessions
FROM
sessions_2
GROUP
BY
user_id
)
GROUP
BY
app
,
platform
))
GROUP
BY
app
,
platform
MemoryTracker
:
Peak
memory
usage
(
for
query
):
14
.
55
GiB
.
12
rows
in
set
.
Elapsed
:
17
.
574
sec
.
Processed
1
.
00
billion
rows
,
44
.
01
GB
(
56
.
90
million
rows
/
s
.,
2
.
50
GB
/
s
.)
Reduce number of threads
Because each thread uses an independent hash table, if you lower thread amount it will reduce number of hash tables as well and lower memory usage at the cost of slower query execution.
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_id
,
platform
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
28
.
19
GiB
.
Elapsed
:
7
.
375
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
135
.
60
million
rows
/
s
.,
3
.
66
GB
/
s
.)
SET
max_threads
=
2
;
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
user_id
,
platform
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
13
.
26
GiB
.
Elapsed
:
62
.
014
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
16
.
13
million
rows
/
s
.,
435
.
41
MB
/
s
.)
UNION ALL
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
GROUP
BY
app
,
user_id
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
24
.
19
GiB
.
Elapsed
:
5
.
043
sec
.
Processed
1
.
00
billion
rows
,
27
.
00
GB
(
198
.
29
million
rows
/
s
.,
5
.
35
GB
/
s
.)
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Orange'
GROUP
BY
user_id
UNION
ALL
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Red'
GROUP
BY
user_id
UNION
ALL
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Melon'
GROUP
BY
user_id
UNION
ALL
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Blue'
GROUP
BY
user_id
FORMAT
Null
MemoryTracker
:
Peak
memory
usage
(
for
query
):
7
.
99
GiB
.
Elapsed
:
2
.
852
sec
.
Processed
1
.
00
billion
rows
,
27
.
01
GB
(
350
.
74
million
rows
/
s
.,
9
.
47
GB
/
s
.)
aggregation_in_order
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Orange'
GROUP
BY
user_id
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
969
.
33
MiB
.
Elapsed
:
2
.
494
sec
.
Processed
250
.
09
million
rows
,
6
.
75
GB
(
100
.
27
million
rows
/
s
.,
2
.
71
GB
/
s
.)
SET
optimize_aggregation_in_order
=
1
;
SELECT
user_id
,
sum
(
clicks
)
FROM
sessions
WHERE
app
=
'Orange'
GROUP
BY
app
,
user_id
FORMAT
`
Null
`
AggregatingInOrderTransform
:
Aggregating
in
order
MemoryTracker
:
Peak
memory
usage
(
for
query
):
169
.
24
MiB
.
Elapsed
:
4
.
925
sec
.
Processed
250
.
09
million
rows
,
6
.
75
GB
(
50
.
78
million
rows
/
s
.,
1
.
37
GB
/
s
.)
Reduce dimensions from GROUP BY with functions like sumMap, *Resample
One
SELECT
user_id
,
toDate
(
created_at
)
AS
day
,
sum
(
clicks
)
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
,
'Red'
,
'Blue'
))
GROUP
BY
user_id
,
day
FORMAT
`
Null
`
MemoryTracker
:
Peak
memory
usage
(
for
query
):
50
.
74
GiB
.
Elapsed
:
22
.
671
sec
.
Processed
594
.
39
million
rows
,
18
.
46
GB
(
26
.
22
million
rows
/
s
.,
814
.
41
MB
/
s
.)
SELECT
user_id
,
(
toDate
(
'2021-10-01'
)
+
date_diff
)
-
1
AS
day
,
clicks
FROM
(
SELECT
user_id
,
sumResample
(
0
,
31
,
1
)(
clicks
,
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
AS
clicks_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
,
'Red'
,
'Blue'
))
GROUP
BY
user_id
)
ARRAY
JOIN
clicks_arr
AS
clicks
,
arrayEnumerate
(
clicks_arr
)
AS
date_diff
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
8
.
24
GiB
.
Elapsed
:
5
.
191
sec
.
Processed
594
.
39
million
rows
,
18
.
46
GB
(
114
.
50
million
rows
/
s
.,
3
.
56
GB
/
s
.)
Multiple
SELECT
user_id
,
platform
,
toDate
(
created_at
)
AS
day
,
sum
(
clicks
)
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
AND
user_id
=
'UID: 08525196'
GROUP
BY
user_id
,
platform
,
day
ORDER
BY
user_id
,
day
,
platform
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
29
.
50
GiB
.
Elapsed
:
8
.
181
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
24
.
22
million
rows
/
s
.,
775
.
14
MB
/
s
.)
WITH
arrayJoin
(
arrayZip
(
clicks_arr_lvl_2
,
range
(
3
)))
AS
clicks_res
SELECT
user_id
,
CAST
(
clicks_res
.
2
+
1
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
)
AS
platform
,
(
toDate
(
'2021-10-01'
)
+
date_diff
)
-
1
AS
day
,
clicks_res
.
1
AS
clicks
FROM
(
SELECT
user_id
,
sumResampleResample
(
1
,
4
,
1
,
0
,
31
,
1
)(
clicks
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
),
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
AS
clicks_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
)
ARRAY
JOIN
clicks_arr
AS
clicks_arr_lvl_2
,
range
(
31
)
AS
date_diff
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
9
.
92
GiB
.
Elapsed
:
4
.
170
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
47
.
52
million
rows
/
s
.,
1
.
52
GB
/
s
.)
WITH
arrayJoin
(
arrayZip
(
clicks_arr_lvl_2
,
range
(
3
)))
AS
clicks_res
SELECT
user_id
,
CAST
(
clicks_res
.
2
+
1
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
)
AS
platform
,
(
toDate
(
'2021-10-01'
)
+
date_diff
)
-
1
AS
day
,
clicks_res
.
1
AS
clicks
FROM
(
SELECT
user_id
,
sumResampleResample
(
1
,
4
,
1
,
0
,
31
,
1
)(
clicks
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
),
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
AS
clicks_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
)
ARRAY
JOIN
clicks_arr
AS
clicks_arr_lvl_2
,
range
(
31
)
AS
date_diff
WHERE
clicks
>
0
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
10
.
14
GiB
.
Elapsed
:
9
.
533
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
20
.
78
million
rows
/
s
.,
665
.
20
MB
/
s
.)
SELECT
user_id
,
CAST
(
plat
+
1
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
)
AS
platform
,
(
toDate
(
'2021-10-01'
)
+
date_diff
)
-
1
AS
day
,
clicks
FROM
(
WITH
(
SELECT
flatten
(
arrayMap
(
x
->
range
(
3
)
AS
platforms
,
range
(
31
)
as
days
)))
AS
platform_arr
,
(
SELECT
flatten
(
arrayMap
(
x
->
[
x
,
x
,
x
],
range
(
31
)
as
days
)))
AS
days_arr
SELECT
user_id
,
flatten
(
sumResampleResample
(
1
,
4
,
1
,
0
,
31
,
1
)(
clicks
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
),
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
)))
AS
clicks_arr
,
platform_arr
,
days_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
)
ARRAY
JOIN
clicks_arr
AS
clicks
,
platform_arr
AS
plat
,
days_arr
AS
date_diff
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
9
.
95
GiB
.
Elapsed
:
3
.
095
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
64
.
02
million
rows
/
s
.,
2
.
05
GB
/
s
.)
SELECT
user_id
,
CAST
(
plat
+
1
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
)
AS
platform
,
(
toDate
(
'2021-10-01'
)
+
date_diff
)
-
1
AS
day
,
clicks
FROM
(
WITH
(
SELECT
flatten
(
arrayMap
(
x
->
range
(
3
)
AS
platforms
,
range
(
31
)
as
days
)))
AS
platform_arr
,
(
SELECT
flatten
(
arrayMap
(
x
->
[
x
,
x
,
x
],
range
(
31
)
as
days
)))
AS
days_arr
SELECT
user_id
,
sumResampleResample
(
1
,
4
,
1
,
0
,
31
,
1
)(
clicks
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
),
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
AS
clicks_arr
,
arrayFilter
(
x
->
((
x
.
1
)
>
0
),
arrayZip
(
flatten
(
clicks_arr
),
platform_arr
,
days_arr
))
AS
result
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
)
ARRAY
JOIN
result
.
1
AS
clicks
,
result
.
2
AS
plat
,
result
.
3
AS
date_diff
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
9
.
93
GiB
.
Elapsed
:
4
.
717
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
42
.
00
million
rows
/
s
.,
1
.
34
GB
/
s
.)
SELECT
user_id
,
CAST
(
range
%
3
,
'Enum8(\'
Rat
\
' = 0, \'
Mice
\
' = 1, \'
Bat
\
' = 2)'
)
AS
platform
,
toDate
(
'2021-10-01'
)
+
intDiv
(
range
,
3
)
AS
day
,
clicks
FROM
(
WITH
(
SELECT
range
(
93
)
)
AS
range_arr
SELECT
user_id
,
sumResample
(
0
,
93
,
1
)(
clicks
,
((
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
*
3
)
+
toUInt8
(
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 0, \'
Mice
\
' = 1, \'
Bat
\
' = 2)'
)))
AS
clicks_arr
,
range_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
)
ARRAY
JOIN
clicks_arr
AS
clicks
,
range_arr
AS
range
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
8
.
24
GiB
.
Elapsed
:
4
.
838
sec
.
Processed
198
.
14
million
rows
,
6
.
36
GB
(
40
.
95
million
rows
/
s
.,
1
.
31
GB
/
s
.)
SELECT
user_id
,
sumResampleResample
(
1
,
4
,
1
,
0
,
31
,
1
)(
clicks
,
CAST
(
platform
,
'Enum8(\'
Rat
\
' = 1, \'
Mice
\
' = 2, \'
Bat
\
' = 3)'
),
toDate
(
created_at
)
-
toDate
(
'2021-10-01'
))
AS
clicks_arr
FROM
sessions
WHERE
(
created_at
>=
toDate
(
'2021-10-01'
))
AND
(
created_at
<
toDate
(
'2021-11-01'
))
AND
(
app
IN
(
'Orange'
))
GROUP
BY
user_id
FORMAT
`
Null
`
Peak
memory
usage
(
for
query
):
5
.
19
GiB
.
0
rows
in
set
.
Elapsed
:
1
.
160
sec
.
Processed
198
.
14
million
rows
,
6
.
34
GB
(
170
.
87
million
rows
/
s
.,
5
.
47
GB
/
s
.)
ARRAY JOIN can be expensive
https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/
sumMap, *Resample
https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/
Play with two-level
Disable:
SET
group_by_two_level_threshold
=
0
,
group_by_two_level_threshold_bytes
=
0
;
From 22.4 ClickHouse can predict, when it make sense to initialize aggregation with two-level from start, instead of rehashing on fly.
It can improve query time.
https://github.com/ClickHouse/ClickHouse/pull/33439
GROUP BY in external memory
Slow!
Use hash function for GROUP BY keys
GROUP BY cityHash64(โxxxxโ)
Can lead to incorrect results as hash functions is not 1 to 1 mapping.
Performance bugs
https://github.com/ClickHouse/ClickHouse/issues/15005
https://github.com/ClickHouse/ClickHouse/issues/29131
https://github.com/ClickHouse/ClickHouse/issues/31120
https://github.com/ClickHouse/ClickHouse/issues/35096
Fixed in 22.7 |
| Markdown | [](https://altinity.com/)
- [Products](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- ##### PRODUCTS
Altinity.Cloud
[Managed cloud](https://altinity.com/managed-clickhouse/)
Deploy on Altinityโs cloud
[Bring your own cloud](https://altinity.com/managed-clickhouse/bring-your-own-cloud/)
Any region, any environment
[Vendor comparison](https://altinity.com/altinity-cloud-vs-clickhouse-cloud-faq)
[Pricing](https://altinity.com/clickhouse-pricing/)
Support for ClickHouseยฎ
[24/7 Support](https://altinity.com/clickhouse-support/)
[POC and Evaluative Support](https://altinity.com/poc-evaluation-support/)
[Training for ClickHouseยฎ](https://altinity.com/clickhouse-training/)
##### OPEN SOURCE SOFTWARE
[NEW Project Antalya Builds](https://altinity.com/project-antalya-real-time-data-lakes/)
Scale ClickHouse queries infinitely with 10X cheaper data lakes
[Altinity Stable Builds for ClickHouseยฎ](https://altinity.com/altinity-stable/)
LTS-certified binaries with 3 yearsโ support
[Altinity Kubernetes Operator for ClickHouseยฎ](https://altinity.com/kubernetes-operator/)
Manage ClickHouse clusters on Kubernetes effortlessly
[Altinity Backup for ClickHouseยฎ](https://altinity.com/altinity-backup-for-clickhouse/)
Simple backup and restore tool for ClickHouse
[See more \>](https://altinity.com/ecosystem/)
##### CLICKHOUSEยฎ SOLUTIONS
[Observability & Logging](https://altinity.com/observability-and-logging/)
Manage high-volume observability data efficiently
[Security Information and Event Management (SIEM)](https://altinity.com/security-information-and-event-management/)
Efficiently process and analyze high-volume security events
[Trading Analytics](https://altinity.com/trading-systems/)
Make faster market decisions
[Customer stories](https://altinity.com/customer-stories/)
See why our customers love us
[Altinity Plans and Features](https://altinity.com/plans-and-features-clickhouse/)
Managed service or support? Compare plans and features
- [Resources](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- Developer Center
[Documentation](https://docs.altinity.com/)
Product guides and tutorials
[Knowledge Base](https://kb.altinity.com/)
Answers to common questions and issues
[Past Webinars](https://altinity.com/blog/?cat=webinarspage)
Watch anytime, on demand.
[More Technical Content](https://altinity.com/developer-resources/)
How-to guides and videos
Learning
[What is ClickHouse?](https://altinity.com/clickhouse-database/)
New to ClickHouse? Start here
[Kubernetes Operator](https://altinity.com/kubernetes-operator/)
Learn how it works and who uses it
[Unevenly Distributed](https://altinity.com/unevenly-distributed/)
A Thought Leadership Series on Real-Time Data Lakes
[Altinity Stable Builds](https://altinity.com/altinity-stable/)
Learn why you should be using them in your prod environments
Community
[Open Source at Altinity](https://altinity.com/ecosystem/)
Explore our open source projects and contributions
[Events](https://altinity.com/events/)
Upcoming conferences, events, and webinars
[Slack Community](https://altinity.com/slack/)
Get help for any ClickHouse issue from experts
[Monthly Office Hours](https://altinity.com/office-hours-for-clickhouse/)
Meet our engineers live and ask your questions
- [Blog](https://altinity.com/blog)
- [Company](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- [About the company](https://altinity.com/about-us/)
[Leadership](https://altinity.com/about-us/#leadership)
[Press Releases](https://altinity.com/about-us/press-releases/)
[Partners](https://altinity.com/partners/)
[Customer Stories](https://altinity.com/customer-stories/)
[Careers](https://altinity.com/careers/)
##### Get in touch with ClickHouse experts.
[Contact Us](https://altinity.com/contact)
- [Contact](https://altinity.com/contact)
- [Tech Support](https://altinity.com/free-clickhouse-consultation/) [ClickHouseยฎ in Cloud](https://acm.altinity.cloud/signup)
- [Products]()
- [Altinity.Cloud](https://altinity.com/managed-clickhouse/)
- [Managed cloud](https://altinity.com/managed-clickhouse/)
- [Bring your own cloud](https://altinity.com/managed-clickhouse/bring-your-own-cloud/)
- [Vendor comparison](https://altinity.com/altinity-cloud-vs-clickhouse-cloud-faq/)
- [Pricing](https://altinity.com/clickhouse-pricing/)
- [Support for ClickHouseยฎ](https://altinity.com/clickhouse-support/)
- [24/7 Support](https://altinity.com/clickhouse-support/)
- [POC and evaluative support](https://altinity.com/poc-evaluation-support/)
- [Training for ClickHouse](https://altinity.com/clickhouse-training/)
- [Open Source Software](https://altinity.com/ecosystem/)
- [NEW Project Antalya Builds](https://altinity.com/project-antalya-real-time-data-lakes/)
- [Altinity Stable Builds for ClickHouseยฎ](https://altinity.com/altinity-stable/)
- [Altinity Kubernetes Operator for ClickHouseยฎ](https://altinity.com/kubernetes-operator/)
- [Altinity Backup for ClickHouseยฎ](https://altinity.com/altinity-backup-for-clickhouse/)
- [Customer Stories](https://altinity.com/customer-stories/)
- [Altinity Plans and Features](https://altinity.com/plans-and-features-clickhouse/)
- [Resources]()
- [Developer Center](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- [Documentation](https://docs.altinity.com/)
- [Knowledge Base](https://kb.altinity.com/)
- [Past Webinars](https://altinity.com/blog/?cat=webinarspage)
- [More Technical Content](https://altinity.com/developer-resources/)
- [Learning](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- [What is ClickHouse?](https://altinity.com/clickhouse-database/)
- [Kubernetes Operator](https://altinity.com/what-is-kubernetes/)
- [Unevenly Distributed](https://altinity.com/unevenly-distributed/)
- [Altinity Stable Builds](https://altinity.com/altinity-stable/)
- [Community](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- [Open Source at Altinity](https://altinity.com/ecosystem/)
- [Events](https://altinity.com/events)
- [Slack Community](http://www.localhost:10008/slack)
- [Monthly Office Hours](https://altinity.com/office-hours-for-clickhouse/)
- [Blog](https://altinity.com/blog)
- [Company]()
- [About the company](https://altinity.com/about-us/)
- [Leadership](https://altinity.com/about-us/#leadership)
- [Press Releases](https://altinity.com/about-us/press-releases/)
- [Partners](https://altinity.com/partners/)
- [Customer Stories](https://altinity.com/customer-stories/)
- [Careers](https://altinity.com/careers/)
- [Contact](https://altinity.com/contact)
- [Tech Support](https://altinity.com/free-clickhouse-consultation)
- [ClickHouseยฎ in Cloud](https://acm.altinity.cloud/signup?__hstc=173202428.421802398cbec3b5d6457748d00d825f.1702395478849.1769622795179.1769630357065.395&__hssc=173202428.9.1769630357065&__hsfp=e5063c5ad56e06035f046e27f2ea897a)
- [Altinityยฎ Knowledge Base for ClickHouseยฎ](https://kb.altinity.com/)
- [Engines](https://kb.altinity.com/engines/)
- [Atomic Database Engine](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/ "ClickHouseยฎ Atomic Database Engine")
- [How to Convert Ordinary to Atomic](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/how-to-convert-ordinary-to-atomic/)
- [How to Convert Atomic to Ordinary](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/altinity-kb-how-to-convert-atomic-to-ordinary/)
- [EmbeddedRocksDB & dictionary](https://kb.altinity.com/engines/altinity-kb-embeddedrocksdb-and-dictionary/)
- [MergeTree table engine family](https://kb.altinity.com/engines/mergetree-table-engine-family/)
- [CollapsingMergeTree vs ReplacingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/collapsing-vs-replacing/)
- [Part names & MVCC](https://kb.altinity.com/engines/mergetree-table-engine-family/part-naming-and-mvcc/)
- [Properly ordering and partitioning MergeTree tables](https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/ "How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for the MergeTree family table")
- [AggregatingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/aggregatingmergetree/ "ClickHouseยฎ AggregatingMergeTree")
- [index & column files](https://kb.altinity.com/engines/mergetree-table-engine-family/index-and-column-files/)
- [Merge performance and OPTIMIZE FINAL](https://kb.altinity.com/engines/mergetree-table-engine-family/merge-performance-final-optimize-by/)
- [Nulls in order by](https://kb.altinity.com/engines/mergetree-table-engine-family/altinity-kb-nulls-in-order-by/)
- [ReplacingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/)
- [ReplacingMergeTree does not collapse duplicates](https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/altinity-kb-replacingmergetree-does-not-collapse-duplicates/)
- [Skip index](https://kb.altinity.com/engines/mergetree-table-engine-family/skip-index/)
- [SummingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/summingmergetree/)
- [VersionedCollapsingMT](https://kb.altinity.com/engines/mergetree-table-engine-family/versioned-collapsing-mergetree/ "UPSERT by VersionedCollapsingMergeTree")
- [Queries & Syntax](https://kb.altinity.com/altinity-kb-queries-and-syntax/)
- [GROUP BY](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/)
- [GROUP BY tricks](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/)
- [Adjustable table partitioning](https://kb.altinity.com/altinity-kb-queries-and-syntax/variable-partitioning/)
- [DateTime64](https://kb.altinity.com/altinity-kb-queries-and-syntax/datetime64/)
- [DISTINCT & GROUP BY & LIMIT 1 BY what the difference](https://kb.altinity.com/altinity-kb-queries-and-syntax/distinct-vs-group-by-vs-limit-by/)
- [Imprecise literal Decimal or Float64 values](https://kb.altinity.com/altinity-kb-queries-and-syntax/literal-decimal-or-float/ "Imprecise parsing of literal Decimal or Float64")
- [Multiple aligned date columns in PARTITION BY expression](https://kb.altinity.com/altinity-kb-queries-and-syntax/multiple-date-column-in-partition-key/)
- [Row policies overhead](https://kb.altinity.com/altinity-kb-queries-and-syntax/row_policy_using_dictionary/ "Row policies overhead (hiding 'removed' tenants)")
- [Slow \`SELECT count()\`](https://kb.altinity.com/altinity-kb-queries-and-syntax/slow_select_count/ "Why is simple `SELECT count()` Slow in ClickHouseยฎ?")
- [trace\_log](https://kb.altinity.com/altinity-kb-queries-and-syntax/trace_log/ "Collecting query execution flamegraphs using system.trace_log")
- [Using array functions to mimic window-functions alike behavior](https://kb.altinity.com/altinity-kb-queries-and-syntax/array-functions-as-window/)
- [\-State & -Merge combinators](https://kb.altinity.com/altinity-kb-queries-and-syntax/state-and-merge-combinators/)
- [ALTER MODIFY COLUMN is stuck, the column is inaccessible.](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-alter-modify-column-is-stuck-the-column-is-inaccessible/)
- [ANSI SQL mode](https://kb.altinity.com/altinity-kb-queries-and-syntax/ansi-sql-mode/)
- [Async INSERTs](https://kb.altinity.com/altinity-kb-queries-and-syntax/async-inserts/)
- [Atomic insert](https://kb.altinity.com/altinity-kb-queries-and-syntax/atomic-insert/)
- [ClickHouse Projections](https://kb.altinity.com/altinity-kb-queries-and-syntax/projections-examples/ "ClickHouseยฎ Projections")
- [Cumulative Anything](https://kb.altinity.com/altinity-kb-queries-and-syntax/cumulative-unique/)
- [Data types on disk and in RAM](https://kb.altinity.com/altinity-kb-queries-and-syntax/data-types-on-disk-and-in-ram/)
- [DELETE via tombstone column](https://kb.altinity.com/altinity-kb-queries-and-syntax/delete-via-tombstone-column/)
- [EXPLAIN query](https://kb.altinity.com/altinity-kb-queries-and-syntax/explain-query/)
- [Fill missing values at query time](https://kb.altinity.com/altinity-kb-queries-and-syntax/fill-missing-values-at-query-time/)
- [FINAL clause speed](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-final-clause-speed/)
- [Join with Calendar using Arrays](https://kb.altinity.com/altinity-kb-queries-and-syntax/join-with-calendar-using-arrays/)
- [JOINs](https://kb.altinity.com/altinity-kb-queries-and-syntax/joins/)
- [JOIN optimization tricks](https://kb.altinity.com/altinity-kb-queries-and-syntax/joins/joins-tricks/)
- [JSONExtract to parse many attributes at a time](https://kb.altinity.com/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/)
- [KILL QUERY](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-kill-query/)
- [Lag / Lead](https://kb.altinity.com/altinity-kb-queries-and-syntax/lag-lead/)
- [Machine learning in ClickHouse](https://kb.altinity.com/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse/)
- [Mutations](https://kb.altinity.com/altinity-kb-queries-and-syntax/mutations/)
- [OPTIMIZE vs OPTIMIZE FINAL](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-optimize-vs-optimize-final/)
- [Parameterized views](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-parameterized-views/)
- [partial-projection-optimization](https://kb.altinity.com/altinity-kb-queries-and-syntax/partial-projection-optimization/ "Use both projection and raw data in single query")
- [PIVOT / UNPIVOT](https://kb.altinity.com/altinity-kb-queries-and-syntax/pivot-unpivot/)
- [Possible deadlock avoided. Client should retry](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-possible-deadlock-avoided.-client-should-retry/)
- [Roaring bitmaps for calculating retention](https://kb.altinity.com/altinity-kb-queries-and-syntax/roaring-bitmaps-for-calculating-retention/)
- [SAMPLE by](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-sample-by/)
- [Sampling Example](https://kb.altinity.com/altinity-kb-queries-and-syntax/sampling-example/)
- [Simple aggregate functions & combinators](https://kb.altinity.com/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions/)
- [Skip indexes](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/)
- [Example: minmax](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/minmax/)
- [Skip index bloom\_filter Example](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column/)
- [Skip indexes examples](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/skip-indexes-examples/)
- [Time zones](https://kb.altinity.com/altinity-kb-queries-and-syntax/time-zones/)
- [Time-series alignment with interpolation](https://kb.altinity.com/altinity-kb-queries-and-syntax/ts-interpolation/)
- [Top N & Remain](https://kb.altinity.com/altinity-kb-queries-and-syntax/top-n-and-remain/)
- [Troubleshooting](https://kb.altinity.com/altinity-kb-queries-and-syntax/troubleshooting/)
- [TTL](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/)
- [MODIFY (ADD) TTL](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/modify-ttl/ "MODIFY (ADD) TTL in ClickHouseยฎ")
- [What are my TTL settings](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/what-are-my-ttls/ "What are my TTL settings?")
- [TTL GROUP BY Examples](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples/)
- [TTL Recompress example](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example/)
- [UPDATE via Dictionary](https://kb.altinity.com/altinity-kb-queries-and-syntax/update-via-dictionary/)
- [Values mapping](https://kb.altinity.com/altinity-kb-queries-and-syntax/values-mapping/)
- [Window functions](https://kb.altinity.com/altinity-kb-queries-and-syntax/window-functions/)
- [Functions](https://kb.altinity.com/altinity-kb-functions/)
- [Encoding and Decoding of quantileTDigest states](https://kb.altinity.com/altinity-kb-functions/how-to-encode-decode-quantiletdigest-state/ "How to encode/decode quantileTDigest states from/to list of centroids")
- [kurt & skew](https://kb.altinity.com/altinity-kb-functions/kurt_skew_statistics/ "kurt & skew statistical functions in ClickHouseยฎ ")
- [\-Resample vs -If vs -Map vs Subquery](https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/)
- [arrayFold](https://kb.altinity.com/altinity-kb-functions/arrayfold/)
- [arrayMap, arrayJoin or ARRAY JOIN memory usage](https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/)
- [assumeNotNull and friends](https://kb.altinity.com/altinity-kb-functions/assumenotnull-and-friends/)
- [Encrypt](https://kb.altinity.com/altinity-kb-functions/encrypt/)
- [sequenceMatch](https://kb.altinity.com/altinity-kb-functions/altinity-kb-sequencematch/)
- [Integrations](https://kb.altinity.com/altinity-kb-integrations/)
- [Altinity Cloud Access Management](https://kb.altinity.com/altinity-kb-integrations/altinity-cloud/)
- [ClickHouseยฎ python drivers](https://kb.altinity.com/altinity-kb-integrations/clickhouse_python_drivers/)
- [Integrating ClickHouseยฎ with MySQL](https://kb.altinity.com/altinity-kb-integrations/mysql-clickhouse/ "MySQL")
- [ODBC Driver for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-integrations/clickhouse-odbc/)
- [Spark](https://kb.altinity.com/altinity-kb-integrations/spark/ "ClickHouseยฎ + Spark")
- [BI Tools](https://kb.altinity.com/altinity-kb-integrations/bi-tools/)
- [CatBoost / MindsDB / Fast.ai](https://kb.altinity.com/altinity-kb-integrations/catboost-mindsdb-fast.ai/)
- [Google S3 (GCS)](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-google-s3-gcs/)
- [Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/)
- [Kafka virtual columns](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-virtual-columns/ "Kafka engine Virtual columns")
- [Schema Inference for Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/kafka-schema-inference/ "Inferring Schema from AvroConfluent Messages in Kafka for ClickHouseยฎ")
- [Setting the background message broker schedule pool size](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/background_message_broker_schedule_pool_size/)
- [Adjusting librdkafka settings](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-adjusting-librdkafka-settings/)
- [Error handling](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/error-handling/)
- [Exactly once semantics](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-exactly-once-semantics/)
- [Kafka main parsing loop](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-main-parsing-loop/)
- [Kafka parallel consuming](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-parallel-consuming/)
- [Multiple MVs attached to Kafka table](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-mv-consuming/)
- [Rewind / fast-forward / replay](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-rewind-fast-forward-replay/)
- [SELECTs from engine=Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-selects-from-engine-kafka/)
- [RabbitMQ](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-rabbitmq/)
- [RabbitMQ Error handling](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-rabbitmq/error-handling/)
- [Setup & maintenance](https://kb.altinity.com/altinity-kb-setup-and-maintenance/)
- [S3 & object storage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/)
- [AWS S3 Recipes](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/aws-s3-recipes/)
- [Clean up orphaned objects on s3](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/clean-up-orphaned-objects-on-s3.md/)
- [s3 and mutations](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3_and_mutations/ "How much data are written to S3 during mutations")
- [s3 cached table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3_cache_example/ "Example of the table at s3 with cache")
- [S3Disk](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3disk/)
- [AggregateFunction(uniq, UUID) doubled after ClickHouseยฎ upgrade](https://kb.altinity.com/altinity-kb-setup-and-maintenance/uniq-uuid-doubled-clickhouse-upgrade/)
- [Can not connect to my ClickHouseยฎ server](https://kb.altinity.com/altinity-kb-setup-and-maintenance/connection-problems/)
- [cgroups and k8s](https://kb.altinity.com/altinity-kb-setup-and-maintenance/cgroups_k8s/ "cgroups and kubernetes cloud providers")
- [ClickHouse logs and Vector.dev](https://kb.altinity.com/altinity-kb-setup-and-maintenance/ch-logs-2-json-vectordev/ "Transforming ClickHouse logs to ndjson using Vector.dev")
- [ClickHouse operator](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-operator/ "Altinity Kubernetes Operator For ClickHouseยฎ")
- [ClickHouseยฎ and different filesystems](https://kb.altinity.com/altinity-kb-setup-and-maintenance/filesystems/)
- [ClickHouseยฎ RBAC example](https://kb.altinity.com/altinity-kb-setup-and-maintenance/rbac/ "ClickHouseยฎ Access Control and Account Management (RBAC)")
- [Client Timeouts](https://kb.altinity.com/altinity-kb-setup-and-maintenance/client-timeouts/)
- [Compatibility layer for the Altinity Kubernetes Operator for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/monitoring-operator-exporter-compatibility/)
- [Convert uniqExact to uniq(Combined)](https://kb.altinity.com/altinity-kb-setup-and-maintenance/uniqexact-to-uniq-combined/ "How to convert uniqExact states to approximate uniq functions states")
- [Custom Settings](https://kb.altinity.com/altinity-kb-setup-and-maintenance/custom_settings/)
- [Description of asynchronous\_metrics](https://kb.altinity.com/altinity-kb-setup-and-maintenance/asynchronous_metrics_descr/)
- [disk encryption](https://kb.altinity.com/altinity-kb-setup-and-maintenance/disk_encryption/ "ClickHouseยฎ data/disk encryption (at rest)")
- [DR two DC](https://kb.altinity.com/altinity-kb-setup-and-maintenance/dr-two-dc/)
- [How ALTERs work in ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/alters/)
- [How to recreate a table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/how_to_recreate_table/ "How to recreate a table in case of total corruption of the replication queue")
- [http\_handlers](https://kb.altinity.com/altinity-kb-setup-and-maintenance/http_handlers/ "http handler example")
- [Jemalloc heap profiling](https://kb.altinity.com/altinity-kb-setup-and-maintenance/jemalloc_heap_profiling/)
- [Logging](https://kb.altinity.com/altinity-kb-setup-and-maintenance/logging/)
- [Merge Memory in metric\_log](https://kb.altinity.com/altinity-kb-setup-and-maintenance/metric_log_ram/ "High Memory Usage During Merge in system.metric_log")
- [Precreate parts using clickhouse-local](https://kb.altinity.com/altinity-kb-setup-and-maintenance/precreate_parts_using_clickhouse_local.sh/)
- [Recovery after complete data loss](https://kb.altinity.com/altinity-kb-setup-and-maintenance/recovery-after-complete-data-loss/)
- [Replicate RBAC with Keeper](https://kb.altinity.com/altinity-kb-setup-and-maintenance/users_in_keeper/ "How to Replicate ClickHouse RBAC Users and Grants with ZooKeeper/Keeper")
- [Replication: Can not resolve host of another ClickHouseยฎ server](https://kb.altinity.com/altinity-kb-setup-and-maintenance/change-me/)
- [source parts sizeis greater than the current maximum](https://kb.altinity.com/altinity-kb-setup-and-maintenance/source-pars-size-is-greater-than-maximum/ "source parts size is greater than the current maximum")
- [Successful ClickHouseยฎ deployment plan](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-deployment-plan/)
- [sysall database](https://kb.altinity.com/altinity-kb-setup-and-maintenance/sysall/ "sysall database (system tables on a cluster level)")
- [Timeouts during OPTIMIZE FINAL](https://kb.altinity.com/altinity-kb-setup-and-maintenance/timeouts-during-optimize-final/)
- [Use an executable dictionary as cron task](https://kb.altinity.com/altinity-kb-setup-and-maintenance/executable-dictionary/)
- [Useful settings to turn on](https://kb.altinity.com/altinity-kb-setup-and-maintenance/useful-setting-to-turn-on/ "Useful settings to turn on/Defaults that should be reconsidered")
- [Who ate my CPU](https://kb.altinity.com/altinity-kb-setup-and-maintenance/who-ate-my-cpu/)
- [Zookeeper session has expired](https://kb.altinity.com/altinity-kb-setup-and-maintenance/zookeeper-session-expired/)
- [Server config files](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-server-config-files/ "Server configuration files")
- [Aggressive merges](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-aggressive_merges/)
- [Altinity Backup for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-backup/)
- [Altinity packaging compatibility \>21.x and earlier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-packaging-compatibility-greater-than-21.x-and-earlier/)
- [AWS EC2 Storage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/aws-ec2-storage/)
- [ClickHouseยฎ in Docker](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-clickhouse-in-docker/)
- [ClickHouseยฎ Monitoring](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-monitoring/)
- [ClickHouseยฎ versions](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-versions/)
- [Configure ClickHouseยฎ for low memory environments](https://kb.altinity.com/altinity-kb-setup-and-maintenance/configure_clickhouse_for_low_mem_envs/)
- [Converting MergeTree to Replicated](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-converting-mergetree-to-replicated/)
- [Data Migration](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/)
- [Export from MSSQL to ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/mssql-clickhouse/ "MSSQL bcp pipe to clickhouse-client")
- [add\_remove\_replica](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/add_remove_replica/ "Add/Remove a new replica to a ClickHouseยฎ cluster")
- [clickhouse-copier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/)
- [clickhouse-copier 20.3 and earlier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-20.3-and-earlier/)
- [clickhouse-copier 20.4 - 21.6](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-20.4_21.6/)
- [Kubernetes job for clickhouse-copier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-kubernetes-job/)
- [Distributed table to cluster](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/distributed-table-cluster/ "Distributed table to ClickHouseยฎ Cluster")
- [Fetch Alter Table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/fetch_alter_table/)
- [Remote table function](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/remote-table-function/)
- [rsync](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/rsync/ "Moving ClickHouse to Another Server")
- [DDLWorker and DDL queue problems](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-ddlworker/)
- [There are N unfinished hosts (0 of them are currently active).](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-ddlworker/there-are-n-unfinished-hosts-0-of-them-are-currently-active/)
- [Desharding](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-desharding/ "Merge Shards")
- [differential backups using clickhouse-backup](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-backup-diff/)
- [High CPU usage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/high-cpu-usage/ "High CPU usage in ClickHouseยฎ")
- [Load balancers](https://kb.altinity.com/altinity-kb-setup-and-maintenance/load-balancers/)
- [memory configuration settings](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-memory-configuration-settings/)
- [Useful queries](https://kb.altinity.com/altinity-kb-useful-queries/)
- [Check table metadata in zookeeper](https://kb.altinity.com/altinity-kb-useful-queries/table-meta-in-zookeeper/)
- [Compare query\_log for 2 intervals](https://kb.altinity.com/altinity-kb-useful-queries/compare_query_log_for_2_intervals/)
- [Debug hanging thing](https://kb.altinity.com/altinity-kb-useful-queries/debug-hang/)
- [Handy queries for system.query\_log](https://kb.altinity.com/altinity-kb-useful-queries/query_log/)
- [Ingestion metrics from system.part\_log](https://kb.altinity.com/altinity-kb-useful-queries/ingestion-rate-part_log/)
- [Remove block numbers from zookeeper for removed partitions](https://kb.altinity.com/altinity-kb-useful-queries/remove_unneeded_block_numbers/)
- [Removing tasks in the replication queue related to empty partitions](https://kb.altinity.com/altinity-kb-useful-queries/remove_empty_partitions_from_rq/)
- [Can detached parts be dropped?](https://kb.altinity.com/altinity-kb-useful-queries/detached-parts/ "Can detached parts in ClickHouseยฎ be dropped?")
- [Database Size - Table - Column size](https://kb.altinity.com/altinity-kb-useful-queries/altinity-kb-database-size-table-column-size/)
- [Notes on Various Errors with respect to replication and distributed connections](https://kb.altinity.com/altinity-kb-useful-queries/connection-issues-distributed-parts/)
- [Number of active parts in a partition](https://kb.altinity.com/altinity-kb-useful-queries/altinity-kb-number-of-active-parts-in-a-partition/)
- [Parts consistency](https://kb.altinity.com/altinity-kb-useful-queries/parts-consistency/)
- [Schema design](https://kb.altinity.com/altinity-kb-schema-design/)
- [ClickHouse limitations](https://kb.altinity.com/altinity-kb-schema-design/how-much-is-too-much/ "ClickHouseยฎ limitations")
- [ClickHouseยฎ row-level deduplication](https://kb.altinity.com/altinity-kb-schema-design/row-level-deduplication/)
- [Column backfilling from dictionary](https://kb.altinity.com/altinity-kb-schema-design/backfill_column/ "Column backfilling with alter/update using a dictionary")
- [Functions to count uniqs](https://kb.altinity.com/altinity-kb-schema-design/uniq-functions/)
- [How to change ORDER BY](https://kb.altinity.com/altinity-kb-schema-design/change-order-by/)
- [Ingestion of AggregateFunction](https://kb.altinity.com/altinity-kb-schema-design/ingestion-aggregate-function/)
- [insert deduplication](https://kb.altinity.com/altinity-kb-schema-design/insert_deduplication/ "Insert Deduplication / Insert Idempotency")
- [JSONEachRow, tuple, map and MVs](https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-jsoneachrow-tuples-and-mvs/ "JSONEachRow, Tuples, Maps and Materialized Views")
- [Pre-Aggregation approaches](https://kb.altinity.com/altinity-kb-schema-design/preaggregations/)
- [SnowflakeID](https://kb.altinity.com/altinity-kb-schema-design/snowflakeid/ "SnowflakeID for Efficient Primary Keys ")
- [Two columns indexing](https://kb.altinity.com/altinity-kb-schema-design/two-columns-indexing/)
- [Best schema for storing many metrics registered from the single source](https://kb.altinity.com/altinity-kb-schema-design/best-schema-for-storing-many-metrics-registered-from-the-single-source/)
- [Codecs](https://kb.altinity.com/altinity-kb-schema-design/codecs/)
- [Codecs on array columns](https://kb.altinity.com/altinity-kb-schema-design/codecs/codecs-on-array-columns/)
- [Codecs speed](https://kb.altinity.com/altinity-kb-schema-design/codecs/codecs-speed/)
- [How to test different compression codecs](https://kb.altinity.com/altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs/)
- [Dictionaries vs LowCardinality](https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-dictionaries-vs-lowcardinality/)
- [Flattened table](https://kb.altinity.com/altinity-kb-schema-design/flattened-table/)
- [Floats vs Decimals](https://kb.altinity.com/altinity-kb-schema-design/floats-vs-decimals/)
- [Ingestion performance and formats](https://kb.altinity.com/altinity-kb-schema-design/ingestion-performance-and-formats/)
- [IPs/masks](https://kb.altinity.com/altinity-kb-schema-design/how-to-store-ips/)
- [JSONAsString and Mat. View as JSON parser](https://kb.altinity.com/altinity-kb-schema-design/altinity-kb-jsonasstring-and-mat.-view-as-json-parser/)
- [LowCardinality](https://kb.altinity.com/altinity-kb-schema-design/lowcardinality/)
- [Materialized Views](https://kb.altinity.com/altinity-kb-schema-design/materialized-views/ "ClickHouseยฎ Materialized Views")
- [Idempotent insert MV](https://kb.altinity.com/altinity-kb-schema-design/materialized-views/idempotent_inserts_mv/ "Idempotent inserts into a materialized view")
- [Backfill/populate MV in a controlled manner](https://kb.altinity.com/altinity-kb-schema-design/materialized-views/backfill-populate-mv-in-a-controlled-manner/)
- [Using the Altinity Kubernetes Operator for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-kubernetes/)
- [Istio Issues](https://kb.altinity.com/altinity-kb-kubernetes/altinity-kb-istio-user-issue-k8s/)
- [Interfaces](https://kb.altinity.com/altinity-kb-interfaces/)
- [clickhouse-client](https://kb.altinity.com/altinity-kb-interfaces/altinity-kb-clickhouse-client/)
- [Upgrade](https://kb.altinity.com/upgrade/)
- [Vulnerabilities](https://kb.altinity.com/upgrade/vulnerabilities/)
- [ClickHouseยฎ Function/Engines/Settings Report](https://kb.altinity.com/upgrade/clickhouse-feature-report/)
- [Removing empty parts](https://kb.altinity.com/upgrade/removing-empty-parts/)
- [Removing lost parts](https://kb.altinity.com/upgrade/removing-lost-parts/)
- [Dictionaries](https://kb.altinity.com/altinity-kb-dictionaries/)
- [Dictionaries & arrays](https://kb.altinity.com/altinity-kb-dictionaries/dictionaries-and-arrays/)
- [Dictionary on the top of several tables using VIEW](https://kb.altinity.com/altinity-kb-dictionaries/dictionary-on-top-tables/)
- [Dimension table design](https://kb.altinity.com/altinity-kb-dictionaries/dimension_table_desing/)
- [Example of PostgreSQL dictionary](https://kb.altinity.com/altinity-kb-dictionaries/example-of-postgresql-dictionary/)
- [MySQL8 source for dictionaries](https://kb.altinity.com/altinity-kb-dictionaries/mysql8-source-for-dictionaries/)
- [Partial updates](https://kb.altinity.com/altinity-kb-dictionaries/partial-updates/)
- [range\_hashed example - open intervals](https://kb.altinity.com/altinity-kb-dictionaries/altinity-kb-range_hashed-example-open-intervals/)
- [Security named collections](https://kb.altinity.com/altinity-kb-dictionaries/security-named-collections/)
- [SPARSE\_HASHED VS HASHED vs HASHED\_ARRAY](https://kb.altinity.com/altinity-kb-dictionaries/altinity-kb-sparse_hashed-vs-hashed/)
- [Using This Knowledge Base](https://kb.altinity.com/using-this-knowledgebase/)
- [Mermaid Example](https://kb.altinity.com/using-this-knowledgebase/mermaid_example/)
- [ClickHouseยฎ Upgrade eBook](https://hubs.la/Q03ccpmq0 "Upgrade eBook")
- [ClickHouse TTLs in Practice (eBook)](https://altinity.com/clickhouse-ttl/?utm_campaign=34417448-CH_TTL_eBook&utm_source=KB&utm_medium=CTA_Sidebar)
[Edit this page](https://github.com/Altinity/altinityknowledgebase/edit/main/content/en/altinity-kb-queries-and-syntax/group-by/tricks.md) [Create child page](https://github.com/Altinity/altinityknowledgebase/new/main/content/en/altinity-kb-queries-and-syntax/group-by/tricks.md?filename=change-me.md&value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+the+top+of+the+page+%28weight+controls+how+its+ordered+amongst+other+pages+in+the+same+directory%3B+lowest+number+first%29.%0A%2A+Add+a+good+commit+message+at+the+bottom+of+the+page+%28%3C80+characters%3B+use+the+extended+description+field+for+more+detail%29.%0A%2A+Create+a+new+branch+so+you+can+preview+your+new+file+and+request+a+review+via+Pull+Request.%0A) [Create documentation issue](https://github.com/Altinity/altinityknowledgebase/issues/new?title=/altinity-kb-queries-and-syntax/group-by/tricks/) [Print entire section](http://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/printview/)
- [Tricks](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#tricks)
- [PreFilter values before GROUP BY](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#prefilter-values-before-group-by)
- [Use Fixed-width data types instead of String](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#use-fixed-width-data-types-instead-of-string)
- [Two LowCardinality Columns in GROUP BY](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#two-lowcardinality-columns-in-group-by)
- [Split your query in multiple smaller queries and execute them one BY one](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#split-your-query-in-multiple-smaller-queries-and-execute-them-one-by-one)
- [Shard your data by one of common high cardinal GROUP BY key](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#shard-your-data-by-one-of-common-high-cardinal-group-by-key)
- [Reduce number of threads](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#reduce-number-of-threads)
- [UNION ALL](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#union-all)
- [aggregation\_in\_order](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#aggregation_in_order)
- [Reduce dimensions from GROUP BY with functions like sumMap, \*Resample](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#reduce-dimensions-from-group-by-with-functions-like-summap-resample)
- [Play with two-level](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#play-with-two-level)
- [GROUP BY in external memory](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#group-by-in-external-memory)
- [Use hash function for GROUP BY keys](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#use-hash-function-for-group-by-keys)
- [Performance bugs](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/#performance-bugs)
Altinityยฎ, Altinity.Cloudยฎ, and Altinity Stableยฎ are registered trademarks of Altinity, Inc. ClickHouseยฎ is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.
Run open-source ClickHouseยฎ better
Blogs, webinars, office hours, in-person events: [check out the latest technical content from Altinity.](https://hubs.la/Q0498sPf0)
1. [Queries & Syntax](https://kb.altinity.com/altinity-kb-queries-and-syntax/)
2. [GROUP BY](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/)
3. GROUP BY tricks
# GROUP BY tricks
Tricks for GROUP BY memory usage optimization
## Tricks
Testing dataset
```
CREATE TABLE sessions
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at)
INSERT INTO sessions WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000)
0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)
โโdatabaseโโฌโtableโโโโโฌโcolumnโโโโโโฌโtypeโโโโโโโโโโโโโโโโโโโโฌโโโโโโโrowsโโฌโcompressed_bytesโโฌโcompressedโโฌโuncompressedโโฌโโโโโโโโโโโโโโratioโโฌโcodecโโ
โ default โ sessions โ session_id โ UUID โ 1000000000 โ 16065918103 โ 14.96 GiB โ 14.90 GiB โ 0.9958970223439835 โ โ
โ default โ sessions โ user_id โ String โ 1000000000 โ 3056977462 โ 2.85 GiB โ 13.04 GiB โ 4.57968701896828 โ โ
โ default โ sessions โ clicks โ UInt32 โ 1000000000 โ 1859359032 โ 1.73 GiB โ 3.73 GiB โ 2.151278979023993 โ โ
โ default โ sessions โ created_at โ DateTime โ 1000000000 โ 1332089630 โ 1.24 GiB โ 3.73 GiB โ 3.0028009451586226 โ โ
โ default โ sessions โ platform โ LowCardinality(String) โ 1000000000 โ 329702248 โ 314.43 MiB โ 956.63 MiB โ 3.042446801879252 โ โ
โ default โ sessions โ app โ LowCardinality(String) โ 1000000000 โ 4825544 โ 4.60 MiB โ 956.63 MiB โ 207.87333386660654 โ โ
โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโ
```
All queries and datasets are unique, so in different situations different hacks could work better or worse.
### PreFilter values before GROUP BY
```
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 18.36 GiB.
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE user_id IN (
SELECT user_id
FROM sessions
WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00') -- So we will select user_id which could potentially match our HAVING clause in OUTER query.
) AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 4.43 GiB.
```
### Use Fixed-width data types instead of String
For example, you have 2 strings which has values in special form like this
โABX 1412312312313โ
You can just remove the first 4 characters and convert the rest to UInt64
toUInt64(substr(โABX 1412312312313โ,5))
And you packed 17 bytes in 8, more than 2 times the improvement of size\!
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized
<Debug> MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'FixedString(4)') AS platform_fx
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys256
MemoryTracker: Peak memory usage (for query): 22.24 GiB.
Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 14.14 GiB.
Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)
WITH
toUInt32OrZero(trim( LEADING '0' FROM substr(user_id,6))) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
MemoryTracker: Peak memory usage (for query): 10.14 GiB.
Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)
WITH
toUInt32('1' || substr(user_id,6)) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
Peak memory usage (for query): 10.14 GiB.
Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
```
It can be especially useful when you tries to do GROUP BY lc\_column\_1, lc\_column\_2 and ClickHouseยฎ falls back to serialized algorithm.
### Two LowCardinality Columns in GROUP BY
```
SELECT
app,
sum(clicks)
FROM sessions
GROUP BY app
FORMAT `Null`
Aggregator: Aggregation method: low_cardinality_key_string
MemoryTracker: Peak memory usage (for query): 43.81 MiB.
Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM sessions
GROUP BY
app,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized -- Slowest method!
MemoryTracker: Peak memory usage (for query): 222.86 MiB.
Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)
SELECT
CAST(app, 'FixedString(6)') AS app_fx,
CAST(platform, 'FixedString(4)') AS platform_fx,
sum(clicks)
FROM sessions
GROUP BY
app_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 160.23 MiB.
Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
```
### Split your query in multiple smaller queries and execute them one BY one
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 8.16 GiB.
Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
```
### Shard your data by one of common high cardinal GROUP BY key
So on each shard you will have 1/N of all unique combination and this will result in smaller hash tables.
Letโs create 2 distributed tables with different distribution: rand() and by user\_id
```
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());
INSERT INTO sessions_distributed WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);
CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));
INSERT INTO sessions_distributed_2 WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
```
```
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
MemoryTracker: Peak memory usage (for query): 12.20 GiB.
12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(clicks) as clicks
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.55 GiB.
12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
```
Query with bigger state:
```
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 19.95 GiB.
12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge=2;
MemoryTracker: Peak memory usage (for query): 10.02 GiB.
12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)
SELECT
app,
platform,
sum(sum),
max(max),
min(min),
max(max_time) AS max_time,
min(min_time) AS min_time
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
```
```
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.57 GiB.
12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 14.56 GiB.
12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge = 2;
MemoryTracker: Peak memory usage (for query): 14.54 GiB.
12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(sessions) as sessions
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform))
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.55 GiB.
12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
```
### Reduce number of threads
Because each thread uses an independent hash table, if you lower thread amount it will reduce number of hash tables as well and lower memory usage at the cost of slower query execution.
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SET max_threads = 2;
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 13.26 GiB.
Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
```
### UNION ALL
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
app,
user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 24.19 GiB.
Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Orange'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Red'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Melon'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Blue'
GROUP BY
user_id
FORMAT Null
MemoryTracker: Peak memory usage (for query): 7.99 GiB.
Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
```
### aggregation\_in\_order
```
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 969.33 MiB.
Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)
SET optimize_aggregation_in_order = 1;
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY
app,
user_id
FORMAT `Null`
AggregatingInOrderTransform: Aggregating in order
MemoryTracker: Peak memory usage (for query): 169.24 MiB.
Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
```
### Reduce dimensions from GROUP BY with functions like sumMap, \*Resample
One
```
SELECT
user_id,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY
user_id,
day
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 50.74 GiB.
Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)
SELECT
user_id,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
SELECT
user_id,
sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
arrayEnumerate(clicks_arr) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
```
Multiple
```
SELECT
user_id,
platform,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND user_id ='UID: 08525196'
GROUP BY
user_id,
platform,
day
ORDER BY user_id,
day,
platform
FORMAT `Null`
Peak memory usage (for query): 29.50 GiB.
Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.92 GiB.
Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
WHERE clicks > 0
FORMAT `Null`
Peak memory usage (for query): 10.14 GiB.
Elapsed: 9.533 sec. Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
platform_arr,
days_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
platform_arr AS plat,
days_arr AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.95 GiB.
Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
result.1 AS clicks,
result.2 AS plat,
result.3 AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.93 GiB.
Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)
SELECT
user_id,
CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
toDate('2021-10-01') + intDiv(range, 3) AS day,
clicks
FROM
(
WITH (
SELECT range(93)
) AS range_arr
SELECT
user_id,
sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
range_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
range_arr AS range
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT `Null`
Peak memory usage (for query): 5.19 GiB.
0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
```
ARRAY JOIN can be expensive
<https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/>
sumMap, \*Resample
<https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/>
### Play with two-level
Disable:
```
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
```
From 22.4 ClickHouse can predict, when it make sense to initialize aggregation with two-level from start, instead of rehashing on fly. It can improve query time. <https://github.com/ClickHouse/ClickHouse/pull/33439>
### GROUP BY in external memory
Slow\!
### Use hash function for GROUP BY keys
GROUP BY cityHash64(โxxxxโ)
Can lead to incorrect results as hash functions is not 1 to 1 mapping.
### Performance bugs
<https://github.com/ClickHouse/ClickHouse/issues/15005>
<https://github.com/ClickHouse/ClickHouse/issues/29131>
<https://github.com/ClickHouse/ClickHouse/issues/31120>
<https://github.com/ClickHouse/ClickHouse/issues/35096> Fixed in 22.7
Last modified 2024.07.29: [Site cleanup, mostly minor changes (3e41a19)](https://github.com/Altinity/altinityknowledgebase/commit/3e41a19644b66d46db743db20321bd5b94b545df)
[](https://altinity.com/)
- **PRODUCT**
- [Altinity.Cloud](https://altinity.com/managed-clickhouse/)
- [Support for ClickHouse](https://altinity.com/clickhouse-support/)
- [Training for ClickHouse](https://altinity.com/clickhouse-training)
- [Altinity.Cloud Pricing](https://altinity.com/clickhouse-pricing/)
- [Altinity Plans and Features](https://altinity.com/plans-and-features-clickhouse/)

- **RESOURCES**
- [Blog](https://altinity.com/blog)
- [Documentation](https://docs.altinity.com/)
- [Knowledge Base](https://kb.altinity.com/)
- [Altinity Stable Builds](https://altinity.com/releases)
- [Kubernetes Operator](https://hubs.la/Q02pLTV20)
- [Altinity Open Source Projects](https://altinity.com/ecosystem/)
- [Events](https://altinity.com/events/)
- **COMPANY**
- [About Altinity](https://altinity.com/about-us/)
- [Press Releases](https://altinity.com/about-us/press-releases)
- [Partners](https://altinity.com/partners/)
- [Customer Stories](https://altinity.com/customer-stories/)
- [Careers](https://altinity.com/careers/)
- [Contact Us](https://altinity.com/contact/)
Get the latest ClickHouse news straight to your inbox every month.
[Sign Up](https://altinity.com/newsletter/)
ยฉ 2026 Altinity Inc. Altinityยฎ, Altinity.Cloudยฎ, and Altinity Stableยฎ are registered trademarks of Altinity, Inc. ClickHouseยฎ is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kafka, Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners. All Rights Reserved
[Privacy Policy](https://altinity.com/privacy-policy/)
 |
| Readable Markdown | Tricks for GROUP BY memory usage optimization
## Tricks
Testing dataset
```
CREATE TABLE sessions
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at)
INSERT INTO sessions WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000)
0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)
โโdatabaseโโฌโtableโโโโโฌโcolumnโโโโโโฌโtypeโโโโโโโโโโโโโโโโโโโโฌโโโโโโโrowsโโฌโcompressed_bytesโโฌโcompressedโโฌโuncompressedโโฌโโโโโโโโโโโโโโratioโโฌโcodecโโ
โ default โ sessions โ session_id โ UUID โ 1000000000 โ 16065918103 โ 14.96 GiB โ 14.90 GiB โ 0.9958970223439835 โ โ
โ default โ sessions โ user_id โ String โ 1000000000 โ 3056977462 โ 2.85 GiB โ 13.04 GiB โ 4.57968701896828 โ โ
โ default โ sessions โ clicks โ UInt32 โ 1000000000 โ 1859359032 โ 1.73 GiB โ 3.73 GiB โ 2.151278979023993 โ โ
โ default โ sessions โ created_at โ DateTime โ 1000000000 โ 1332089630 โ 1.24 GiB โ 3.73 GiB โ 3.0028009451586226 โ โ
โ default โ sessions โ platform โ LowCardinality(String) โ 1000000000 โ 329702248 โ 314.43 MiB โ 956.63 MiB โ 3.042446801879252 โ โ
โ default โ sessions โ app โ LowCardinality(String) โ 1000000000 โ 4825544 โ 4.60 MiB โ 956.63 MiB โ 207.87333386660654 โ โ
โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโ
```
All queries and datasets are unique, so in different situations different hacks could work better or worse.
### PreFilter values before GROUP BY
```
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 18.36 GiB.
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE user_id IN (
SELECT user_id
FROM sessions
WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00') -- So we will select user_id which could potentially match our HAVING clause in OUTER query.
) AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`
<Debug> MemoryTracker: Peak memory usage (for query): 4.43 GiB.
```
### Use Fixed-width data types instead of String
For example, you have 2 strings which has values in special form like this
โABX 1412312312313โ
You can just remove the first 4 characters and convert the rest to UInt64
toUInt64(substr(โABX 1412312312313โ,5))
And you packed 17 bytes in 8, more than 2 times the improvement of size\!
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized
<Debug> MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'FixedString(4)') AS platform_fx
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys256
MemoryTracker: Peak memory usage (for query): 22.24 GiB.
Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)
WITH
CAST(user_id, 'FixedString(14)') AS user_fx,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_fx,
sum(clicks)
FROM sessions
GROUP BY
user_fx,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 14.14 GiB.
Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)
WITH
toUInt32OrZero(trim( LEADING '0' FROM substr(user_id,6))) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
MemoryTracker: Peak memory usage (for query): 10.14 GiB.
Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)
WITH
toUInt32('1' || substr(user_id,6)) AS user_int,
CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
user_int,
sum(clicks)
FROM sessions
GROUP BY
user_int,
platform_enum
FORMAT `Null`
Aggregator: Aggregation method: keys64
Peak memory usage (for query): 10.14 GiB.
Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
```
It can be especially useful when you tries to do GROUP BY lc\_column\_1, lc\_column\_2 and ClickHouseยฎ falls back to serialized algorithm.
### Two LowCardinality Columns in GROUP BY
```
SELECT
app,
sum(clicks)
FROM sessions
GROUP BY app
FORMAT `Null`
Aggregator: Aggregation method: low_cardinality_key_string
MemoryTracker: Peak memory usage (for query): 43.81 MiB.
Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM sessions
GROUP BY
app,
platform
FORMAT `Null`
Aggregator: Aggregation method: serialized -- Slowest method!
MemoryTracker: Peak memory usage (for query): 222.86 MiB.
Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)
SELECT
CAST(app, 'FixedString(6)') AS app_fx,
CAST(platform, 'FixedString(4)') AS platform_fx,
sum(clicks)
FROM sessions
GROUP BY
app_fx,
platform_fx
FORMAT `Null`
Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 160.23 MiB.
Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
```
### Split your query in multiple smaller queries and execute them one BY one
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 8.16 GiB.
Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
```
### Shard your data by one of common high cardinal GROUP BY key
So on each shard you will have 1/N of all unique combination and this will result in smaller hash tables.
Letโs create 2 distributed tables with different distribution: rand() and by user\_id
```
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());
INSERT INTO sessions_distributed WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
`app` LowCardinality(String),
`user_id` String,
`created_at` DateTime,
`platform` LowCardinality(String),
`clicks` UInt32,
`session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);
CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));
INSERT INTO sessions_distributed_2 WITH
CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
number % 17 AS clicks,
generateUUIDv4() AS session_id
SELECT
app,
user_id,
created_at,
platform,
clicks,
session_id
FROM numbers_mt(1000000000);
```
```
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
MemoryTracker: Peak memory usage (for query): 12.20 GiB.
12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)
SELECT
app,
platform,
sum(clicks)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(clicks) as clicks
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS clicks
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.55 GiB.
12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
```
Query with bigger state:
```
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 19.95 GiB.
12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)
SELECT
app,
platform,
sum(last_click) as sum,
max(max_clicks) as max,
min(min_clicks) as min,
max(max_time) as max_time,
min(min_time) as min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge=2;
MemoryTracker: Peak memory usage (for query): 10.02 GiB.
12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)
SELECT
app,
platform,
sum(sum),
max(max),
min(min),
max(max_time) AS max_time,
min(min_time) AS min_time
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(last_click) AS sum,
max(max_clicks) AS max,
min(min_clicks) AS min,
max(max_time) AS max_time,
min(min_time) AS min_time
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
argMax(clicks, created_at) AS last_click,
max(clicks) AS max_clicks,
min(clicks) AS min_clicks,
max(created_at) AS max_time,
min(created_at) AS min_time
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform
))
GROUP BY
app,
platform;
MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
```
```
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.57 GiB.
12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;
MemoryTracker: Peak memory usage (for query): 14.56 GiB.
12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_distributed_2
GROUP BY user_id
)
GROUP BY
app,
platform
SETTINGS distributed_group_by_no_merge = 2;
MemoryTracker: Peak memory usage (for query): 14.54 GiB.
12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)
SELECT
app,
platform,
sum(sessions)
FROM cluster('distr-groupby', view(
SELECT
app,
platform,
sum(sessions) as sessions
FROM
(
SELECT
argMax(app, created_at) AS app,
argMax(platform, created_at) AS platform,
user_id,
uniq(session_id) as sessions
FROM sessions_2
GROUP BY user_id
)
GROUP BY
app,
platform))
GROUP BY
app,
platform
MemoryTracker: Peak memory usage (for query): 14.55 GiB.
12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
```
### Reduce number of threads
Because each thread uses an independent hash table, if you lower thread amount it will reduce number of hash tables as well and lower memory usage at the cost of slower query execution.
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)
SET max_threads = 2;
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
user_id,
platform
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 13.26 GiB.
Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
```
### UNION ALL
```
SELECT
user_id,
sum(clicks)
FROM sessions
GROUP BY
app,
user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 24.19 GiB.
Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Orange'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Red'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Melon'
GROUP BY
user_id
UNION ALL
SELECT
user_id,
sum(clicks)
FROM sessions WHERE app = 'Blue'
GROUP BY
user_id
FORMAT Null
MemoryTracker: Peak memory usage (for query): 7.99 GiB.
Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
```
### aggregation\_in\_order
```
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 969.33 MiB.
Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)
SET optimize_aggregation_in_order = 1;
SELECT
user_id,
sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY
app,
user_id
FORMAT `Null`
AggregatingInOrderTransform: Aggregating in order
MemoryTracker: Peak memory usage (for query): 169.24 MiB.
Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
```
### Reduce dimensions from GROUP BY with functions like sumMap, \*Resample
One
```
SELECT
user_id,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY
user_id,
day
FORMAT `Null`
MemoryTracker: Peak memory usage (for query): 50.74 GiB.
Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)
SELECT
user_id,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
SELECT
user_id,
sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
arrayEnumerate(clicks_arr) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
```
Multiple
```
SELECT
user_id,
platform,
toDate(created_at) AS day,
sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND user_id ='UID: 08525196'
GROUP BY
user_id,
platform,
day
ORDER BY user_id,
day,
platform
FORMAT `Null`
Peak memory usage (for query): 29.50 GiB.
Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.92 GiB.
Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)
WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
user_id,
CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks_res.1 AS clicks
FROM
(
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks_arr_lvl_2,
range(31) AS date_diff
WHERE clicks > 0
FORMAT `Null`
Peak memory usage (for query): 10.14 GiB.
Elapsed: 9.533 sec. Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
platform_arr,
days_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
platform_arr AS plat,
days_arr AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.95 GiB.
Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)
SELECT
user_id,
CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
(toDate('2021-10-01') + date_diff) - 1 AS day,
clicks
FROM
(
WITH
(SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) as days))) AS platform_arr,
(SELECT flatten(arrayMap(x -> [x, x, x], range(31) as days))) AS days_arr
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
result.1 AS clicks,
result.2 AS plat,
result.3 AS date_diff
FORMAT `Null`
Peak memory usage (for query): 9.93 GiB.
Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)
SELECT
user_id,
CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
toDate('2021-10-01') + intDiv(range, 3) AS day,
clicks
FROM
(
WITH (
SELECT range(93)
) AS range_arr
SELECT
user_id,
sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
range_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
)
ARRAY JOIN
clicks_arr AS clicks,
range_arr AS range
FORMAT `Null`
Peak memory usage (for query): 8.24 GiB.
Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)
SELECT
user_id,
sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT `Null`
Peak memory usage (for query): 5.19 GiB.
0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
```
ARRAY JOIN can be expensive
<https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/>
sumMap, \*Resample
<https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/>
### Play with two-level
Disable:
```
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
```
From 22.4 ClickHouse can predict, when it make sense to initialize aggregation with two-level from start, instead of rehashing on fly. It can improve query time. <https://github.com/ClickHouse/ClickHouse/pull/33439>
### GROUP BY in external memory
Slow\!
### Use hash function for GROUP BY keys
GROUP BY cityHash64(โxxxxโ)
Can lead to incorrect results as hash functions is not 1 to 1 mapping.
### Performance bugs
<https://github.com/ClickHouse/ClickHouse/issues/15005>
<https://github.com/ClickHouse/ClickHouse/issues/29131>
<https://github.com/ClickHouse/ClickHouse/issues/31120>
<https://github.com/ClickHouse/ClickHouse/issues/35096> Fixed in 22.7 |
| Shard | 195 (laksa) |
| Root Hash | 3533317439195346595 |
| Unparsed URL | com,altinity!kb,/altinity-kb-queries-and-syntax/group-by/tricks/ s443 |