๐Ÿ•ท๏ธ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 195 (from laksa165)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

โ„น๏ธ Skipped - page is already crawled

📄 INDEXABLE
✅ CRAWLED (8 days ago)
🤖 ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | `download_http_code = 200` | HTTP 200 |
| Age cutoff | PASS | `download_stamp > now() - 6 MONTH` | 0.3 months ago |
| History drop | PASS | `isNull(history_drop_reason)` | No drop reason |
| Spam/ban | PASS | `fh_dont_index != 1 AND ml_spam_score = 0` | ml_spam_score=0 |
| Canonical | PASS | `meta_canonical IS NULL OR = '' OR = src_unparsed` | Not set |

Page Details

| Property | Value |
|---|---|
| URL | https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/ |
| Last Crawled | 2026-04-06 07:16:24 (8 days ago) |
| First Indexed | 2022-06-23 07:53:42 (3 years ago) |
| HTTP Status Code | 200 |
| Meta Title | GROUP BY tricks \| Altinity® Knowledge Base for ClickHouse® |
| Meta Description | Tricks for GROUP BY memory usage optimization |
| Meta Canonical | null |
Boilerpipe Text
Tricks for GROUP BY memory usage optimization

Testing dataset

```sql
CREATE TABLE sessions
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);

INSERT INTO sessions
WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);

-- 0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)
```

| database | table | column | type | rows | compressed_bytes | compressed | uncompressed | ratio |
|---|---|---|---|---|---|---|---|---|
| default | sessions | session_id | UUID | 1000000000 | 16065918103 | 14.96 GiB | 14.90 GiB | 1.00 |
| default | sessions | user_id | String | 1000000000 | 3056977462 | 2.85 GiB | 13.04 GiB | 4.58 |
| default | sessions | clicks | UInt32 | 1000000000 | 1859359032 | 1.73 GiB | 3.73 GiB | 2.15 |
| default | sessions | created_at | DateTime | 1000000000 | 1332089630 | 1.24 GiB | 3.73 GiB | 3.00 |
| default | sessions | platform | LowCardinality(String) | 1000000000 | 329702248 | 314.43 MiB | 956.63 MiB | 3.04 |
| default | sessions | app | LowCardinality(String) | 1000000000 | 4825544 | 4.60 MiB | 956.63 MiB | 207.87 |

Every query and dataset is unique, so in different situations different tricks can work better or worse.

PreFilter values before GROUP BY

```sql
SELECT user_id, sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 18.36 GiB.

SELECT user_id, sum(clicks)
FROM sessions
WHERE user_id IN
(
    SELECT user_id
    FROM sessions
    WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00')
    -- Select only user_ids that could potentially match the HAVING clause of the outer query.
)
AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 4.43 GiB.
```
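The pre-filter trick can be sketched outside ClickHouse. The toy model below (plain Python dictionaries standing in for aggregation hash tables; synthetic data, not the sessions dataset) shows why memory drops: the aggregation state only has to hold keys that survive the inner filter, instead of every key in the scan.

```python
# Toy model of the pre-filter trick: restrict the key set first, then
# aggregate. Dictionaries stand in for GROUP BY hash tables.
import random

random.seed(0)
# 1000 users, 5 rows each; clicks in 0..16, loosely echoing the dataset above.
rows = [(f"UID:{n % 1000:04d}", random.randrange(17)) for n in range(5_000)]

# Naive GROUP BY ... HAVING: one hash-table entry per distinct user_id.
naive = {}
for user, clicks in rows:
    naive[user] = naive.get(user, 0) + clicks

# Pre-filter: keep only user_ids that could possibly satisfy the final
# predicate (here: ever recorded exactly 16 clicks), then aggregate those.
candidates = {user for user, clicks in rows if clicks == 16}
filtered = {}
for user, clicks in rows:
    if user in candidates:
        filtered[user] = filtered.get(user, 0) + clicks

print(len(naive), len(filtered))  # the filtered hash table is much smaller
```

The sums for surviving keys are identical to the naive run; only the number of keys held in memory shrinks.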
Use fixed-width data types instead of String

For example, suppose you have strings in a fixed form like 'ABX 1412312312313'. You can drop the first 4 characters and convert the rest to UInt64: `toUInt64(substr('ABX 1412312312313', 5))`. That packs 17 bytes into 8, more than a 2x size improvement!

```sql
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

-- Aggregator: Aggregation method: serialized
-- MemoryTracker: Peak memory usage (for query): 28.19 GiB.
-- Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx
SELECT user_fx, sum(clicks)
FROM sessions
GROUP BY user_fx, platform_fx
FORMAT `Null`

-- Aggregator: Aggregation method: keys256
-- MemoryTracker: Peak memory usage (for query): 22.24 GiB.
-- Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT user_fx, sum(clicks)
FROM sessions
GROUP BY user_fx, platform_enum
FORMAT `Null`

-- Aggregator: Aggregation method: keys128
-- MemoryTracker: Peak memory usage (for query): 14.14 GiB.
-- Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)

WITH
    toUInt32OrZero(trim(LEADING '0' FROM substr(user_id, 6))) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT user_int, sum(clicks)
FROM sessions
GROUP BY user_int, platform_enum
FORMAT `Null`

-- Aggregator: Aggregation method: keys64
-- MemoryTracker: Peak memory usage (for query): 10.14 GiB.
-- Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)

WITH
    toUInt32('1' || substr(user_id, 6)) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT user_int, sum(clicks)
FROM sessions
GROUP BY user_int, platform_enum
FORMAT `Null`

-- Aggregator: Aggregation method: keys64
-- Peak memory usage (for query): 10.14 GiB.
-- Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
```

This is especially useful when you try to GROUP BY lc_column_1, lc_column_2 and ClickHouse® falls back to the serialized algorithm.

Two LowCardinality columns in GROUP BY

```sql
SELECT app, sum(clicks)
FROM sessions
GROUP BY app
FORMAT `Null`

-- Aggregator: Aggregation method: low_cardinality_key_string
-- MemoryTracker: Peak memory usage (for query): 43.81 MiB.
-- Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)

SELECT app, platform, sum(clicks)
FROM sessions
GROUP BY app, platform
FORMAT `Null`

-- Aggregator: Aggregation method: serialized  -- slowest method!
-- MemoryTracker: Peak memory usage (for query): 222.86 MiB.
-- Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)

SELECT
    CAST(app, 'FixedString(6)') AS app_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx,
    sum(clicks)
FROM sessions
GROUP BY app_fx, platform_fx
FORMAT `Null`

-- Aggregator: Aggregation method: keys128
-- MemoryTracker: Peak memory usage (for query): 160.23 MiB.
-- Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
```

Split your query into multiple smaller queries and execute them one by one

```sql
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 28.19 GiB.
-- Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SELECT user_id, sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY user_id, platform
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 8.16 GiB.
-- Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
```

Shard your data by one of the common high-cardinality GROUP BY keys

Each shard then holds roughly 1/N of all unique key combinations, which results in smaller hash tables. Let's create two distributed tables with different sharding: by rand() and by user_id.

```sql
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());

INSERT INTO sessions_distributed
WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);

CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);

CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));

INSERT INTO sessions_distributed_2
WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);
```

```sql
SELECT app, platform, sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY app, platform;

-- [chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
-- [chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
-- [chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
-- MemoryTracker: Peak memory usage (for query): 12.20 GiB.
-- 12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)

SELECT app, platform, sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform;

-- [chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- [chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- [chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- MemoryTracker: Peak memory usage (for query): 5.61 GiB.
-- 12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)

-- The same query with SETTINGS optimize_distributed_group_by_sharding_key = 1:
-- MemoryTracker: Peak memory usage (for query): 5.61 GiB.
-- 12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)

SELECT app, platform, sum(clicks)
FROM cluster('distr-groupby', view(
    SELECT app, platform, sum(clicks) AS clicks
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS clicks
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform;

-- [chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- [chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- [chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
-- MemoryTracker: Peak memory usage (for query): 5.55 GiB.
-- 12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
```
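Why sharding by user_id shrinks per-node memory can be sketched in a few lines of Python (a toy model, not ClickHouse internals; hashlib.md5 stands in for cityHash64): a deterministic hash sends every occurrence of a key to the same shard, so each shard's GROUP BY sees only about 1/N of the distinct keys.

```python
# Toy model of hash-sharding a GROUP BY key across N shards.
import hashlib

N = 3  # shard count, matching the 3-node cluster in the example above

def shard_of(key: str) -> int:
    # Deterministic 64-bit hash from md5, reduced modulo the shard count.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "little") % N

user_ids = [f"UID: {n:08d}" for n in range(20_000)]
shards = [set() for _ in range(N)]
for uid in user_ids:
    shards[shard_of(uid)].add(uid)

# Keys are disjoint across shards and each shard holds roughly 1/N of them,
# so each per-shard aggregation hash table is proportionally smaller.
print([len(s) for s in shards])
```

With rand() sharding, by contrast, every shard can see every user_id, so each node's hash table still approaches the full key cardinality, which matches the 12 GiB vs 5 GiB per-node numbers above.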
Query with a bigger state:

```sql
SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY app, platform;

-- MemoryTracker: Peak memory usage (for query): 19.95 GiB.
-- 12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)

-- The same query against sessions_distributed_2 (sharded by cityHash64(user_id)):
-- MemoryTracker: Peak memory usage (for query): 10.09 GiB.
-- 12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)

-- Same, with SETTINGS optimize_distributed_group_by_sharding_key = 1:
-- MemoryTracker: Peak memory usage (for query): 10.09 GiB.
-- 12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)

-- Same, with SETTINGS distributed_group_by_no_merge = 2:
-- MemoryTracker: Peak memory usage (for query): 10.02 GiB.
-- 12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)

SELECT
    app,
    platform,
    sum(sum),
    max(max),
    min(min),
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM cluster('distr-groupby', view(
    SELECT
        app,
        platform,
        sum(last_click) AS sum,
        max(max_clicks) AS max,
        min(min_clicks) AS min,
        max(max_time) AS max_time,
        min(min_time) AS min_time
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS last_click,
            max(clicks) AS max_clicks,
            min(clicks) AS min_clicks,
            max(created_at) AS max_time,
            min(created_at) AS min_time
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform;

-- MemoryTracker: Peak memory usage (for query): 10.09 GiB.
-- 12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
```

With a uniq state:

```sql
SELECT app, platform, sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform;

-- MemoryTracker: Peak memory usage (for query): 14.57 GiB.
-- 12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)

-- Same, with SETTINGS optimize_distributed_group_by_sharding_key = 1:
-- MemoryTracker: Peak memory usage (for query): 14.56 GiB.
-- 12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)

-- Same, with SETTINGS distributed_group_by_no_merge = 2:
-- MemoryTracker: Peak memory usage (for query): 14.54 GiB.
-- 12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)

SELECT app, platform, sum(sessions)
FROM cluster('distr-groupby', view(
    SELECT app, platform, sum(sessions) AS sessions
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            uniq(session_id) AS sessions
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform;

-- MemoryTracker: Peak memory usage (for query): 14.55 GiB.
-- 12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
```

Reduce the number of threads

Each thread uses an independent hash table, so lowering the thread count reduces the number of hash tables and lowers memory usage, at the cost of slower query execution.

```sql
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 28.19 GiB.
-- Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SET max_threads = 2;

SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 13.26 GiB.
-- Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
```

UNION ALL

```sql
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY app, user_id
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 24.19 GiB.
-- Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)

SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Orange' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Red' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Melon' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Blue' GROUP BY user_id
FORMAT Null

-- MemoryTracker: Peak memory usage (for query): 7.99 GiB.
-- Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
```

aggregation_in_order

```sql
SELECT user_id, sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 969.33 MiB.
-- Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)

SET optimize_aggregation_in_order = 1;

SELECT user_id, sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY app, user_id
FORMAT `Null`

-- AggregatingInOrderTransform: Aggregating in order
-- MemoryTracker: Peak memory usage (for query): 169.24 MiB.
-- Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
```

Reduce dimensions from GROUP BY with functions like sumMap, *Resample

One dimension:

```sql
SELECT user_id, toDate(created_at) AS day, sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY user_id, day
FORMAT `Null`

-- MemoryTracker: Peak memory usage (for query): 50.74 GiB.
-- Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)
```
```sql
SELECT
    user_id,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    SELECT
        user_id,
        sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    arrayEnumerate(clicks_arr) AS date_diff
FORMAT `Null`

-- Peak memory usage (for query): 8.24 GiB.
-- Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
```

Multiple dimensions:

```sql
SELECT user_id, platform, toDate(created_at) AS day, sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND user_id = 'UID: 08525196'
GROUP BY user_id, platform, day
ORDER BY user_id, day, platform
FORMAT `Null`

-- Peak memory usage (for query): 29.50 GiB.
-- Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)

WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
    user_id,
    CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks_res.1 AS clicks
FROM
(
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks_arr_lvl_2,
    range(31) AS date_diff
FORMAT `Null`

-- Peak memory usage (for query): 9.92 GiB.
-- Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)

-- The same query with an extra WHERE clicks > 0 after the ARRAY JOIN:
-- Peak memory usage (for query): 10.14 GiB.
-- Elapsed: 9.533 sec. Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
        platform_arr,
        days_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    platform_arr AS plat,
    days_arr AS date_diff
FORMAT `Null`

-- Peak memory usage (for query): 9.95 GiB.
-- Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
        arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    result.1 AS clicks,
    result.2 AS plat,
    result.3 AS date_diff
FORMAT `Null`

-- Peak memory usage (for query): 9.93 GiB.
-- Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)

SELECT
    user_id,
    CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
    toDate('2021-10-01') + intDiv(range, 3) AS day,
    clicks
FROM
(
    WITH (SELECT range(93)) AS range_arr
    SELECT
        user_id,
        sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
        range_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    range_arr AS range
FORMAT `Null`

-- Peak memory usage (for query): 8.24 GiB.
-- Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)

SELECT
    user_id,
    sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT `Null`

-- Peak memory usage (for query): 5.19 GiB.
-- 0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
```

ARRAY JOIN can be expensive: https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/
sumMap, *Resample: https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/

Play with two-level aggregation

Disable it:

```sql
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
```

Since 22.4, ClickHouse® can predict when it makes sense to initialize the aggregation as two-level from the start, instead of rehashing on the fly, which can improve query time: https://github.com/ClickHouse/ClickHouse/pull/33439

GROUP BY in external memory

Slow!

Use a hash function for GROUP BY keys

GROUP BY cityHash64('xxxx') can lead to incorrect results, because hash functions are not a 1-to-1 mapping.

Performance bugs

https://github.com/ClickHouse/ClickHouse/issues/15005
https://github.com/ClickHouse/ClickHouse/issues/29131
https://github.com/ClickHouse/ClickHouse/issues/31120
https://github.com/ClickHouse/ClickHouse/issues/35096

Fixed in 22.7.
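The hash-key caveat above can be made concrete with a toy model in Python: grouping by a hash of the key silently merges any distinct keys that collide. A truncated 16-bit hash (hashlib.md5 here, standing in for cityHash64) makes collisions certain at this cardinality; with a full 64-bit hash they are merely rare, not impossible.

```python
# Why GROUP BY someHash(key) can be wrong: colliding keys become one group.
import hashlib

def h16(key: str) -> int:
    # Deliberately tiny 16-bit hash so collisions are easy to observe.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "little")

keys = [f"UID: {n:08d}" for n in range(100_000)]
groups = {h16(k) for k in keys}

# 100000 distinct keys collapse into at most 2**16 = 65536 hash groups,
# so per-group aggregates would mix unrelated keys together.
print(len(keys), len(groups))
```

The same failure mode exists with cityHash64, just with far lower probability; it grows with key cardinality per the birthday bound.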
Distributed](https://altinity.com/unevenly-distributed/) - [Altinity Stable Builds](https://altinity.com/altinity-stable/) - [Community](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/) - [Open Source at Altinity](https://altinity.com/ecosystem/) - [Events](https://altinity.com/events) - [Slack Community](http://www.localhost:10008/slack) - [Monthly Office Hours](https://altinity.com/office-hours-for-clickhouse/) - [Blog](https://altinity.com/blog) - [Company]() - [About the company](https://altinity.com/about-us/) - [Leadership](https://altinity.com/about-us/#leadership) - [Press Releases](https://altinity.com/about-us/press-releases/) - [Partners](https://altinity.com/partners/) - [Customer Stories](https://altinity.com/customer-stories/) - [Careers](https://altinity.com/careers/) - [Contact](https://altinity.com/contact) - [Tech Support](https://altinity.com/free-clickhouse-consultation) - [ClickHouseยฎ in Cloud](https://acm.altinity.cloud/signup?__hstc=173202428.421802398cbec3b5d6457748d00d825f.1702395478849.1769622795179.1769630357065.395&__hssc=173202428.9.1769630357065&__hsfp=e5063c5ad56e06035f046e27f2ea897a) - [Altinityยฎ Knowledge Base for ClickHouseยฎ](https://kb.altinity.com/) - [Engines](https://kb.altinity.com/engines/) - [Atomic Database Engine](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/ "ClickHouseยฎ Atomic Database Engine") - [How to Convert Ordinary to Atomic](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/how-to-convert-ordinary-to-atomic/) - [How to Convert Atomic to Ordinary](https://kb.altinity.com/engines/altinity-kb-atomic-database-engine/altinity-kb-how-to-convert-atomic-to-ordinary/) - [EmbeddedRocksDB & dictionary](https://kb.altinity.com/engines/altinity-kb-embeddedrocksdb-and-dictionary/) - [MergeTree table engine family](https://kb.altinity.com/engines/mergetree-table-engine-family/) - [CollapsingMergeTree vs 
ReplacingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/collapsing-vs-replacing/) - [Part names & MVCC](https://kb.altinity.com/engines/mergetree-table-engine-family/part-naming-and-mvcc/) - [Properly ordering and partitioning MergeTree tables](https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/ "How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for the MergeTree family table") - [AggregatingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/aggregatingmergetree/ "ClickHouseยฎ AggregatingMergeTree") - [index & column files](https://kb.altinity.com/engines/mergetree-table-engine-family/index-and-column-files/) - [Merge performance and OPTIMIZE FINAL](https://kb.altinity.com/engines/mergetree-table-engine-family/merge-performance-final-optimize-by/) - [Nulls in order by](https://kb.altinity.com/engines/mergetree-table-engine-family/altinity-kb-nulls-in-order-by/) - [ReplacingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/) - [ReplacingMergeTree does not collapse duplicates](https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/altinity-kb-replacingmergetree-does-not-collapse-duplicates/) - [Skip index](https://kb.altinity.com/engines/mergetree-table-engine-family/skip-index/) - [SummingMergeTree](https://kb.altinity.com/engines/mergetree-table-engine-family/summingmergetree/) - [VersionedCollapsingMT](https://kb.altinity.com/engines/mergetree-table-engine-family/versioned-collapsing-mergetree/ "UPSERT by VersionedCollapsingMergeTree") - [Queries & Syntax](https://kb.altinity.com/altinity-kb-queries-and-syntax/) - [GROUP BY](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/) - [GROUP BY tricks](https://kb.altinity.com/altinity-kb-queries-and-syntax/group-by/tricks/) - [Adjustable table partitioning](https://kb.altinity.com/altinity-kb-queries-and-syntax/variable-partitioning/) - 
[DateTime64](https://kb.altinity.com/altinity-kb-queries-and-syntax/datetime64/) - [DISTINCT & GROUP BY & LIMIT 1 BY what the difference](https://kb.altinity.com/altinity-kb-queries-and-syntax/distinct-vs-group-by-vs-limit-by/) - [Imprecise literal Decimal or Float64 values](https://kb.altinity.com/altinity-kb-queries-and-syntax/literal-decimal-or-float/ "Imprecise parsing of literal Decimal or Float64") - [Multiple aligned date columns in PARTITION BY expression](https://kb.altinity.com/altinity-kb-queries-and-syntax/multiple-date-column-in-partition-key/) - [Row policies overhead](https://kb.altinity.com/altinity-kb-queries-and-syntax/row_policy_using_dictionary/ "Row policies overhead (hiding 'removed' tenants)") - [Slow \`SELECT count()\`](https://kb.altinity.com/altinity-kb-queries-and-syntax/slow_select_count/ "Why is simple `SELECT count()` Slow in ClickHouseยฎ?") - [trace\_log](https://kb.altinity.com/altinity-kb-queries-and-syntax/trace_log/ "Collecting query execution flamegraphs using system.trace_log") - [Using array functions to mimic window-functions alike behavior](https://kb.altinity.com/altinity-kb-queries-and-syntax/array-functions-as-window/) - [\-State & -Merge combinators](https://kb.altinity.com/altinity-kb-queries-and-syntax/state-and-merge-combinators/) - [ALTER MODIFY COLUMN is stuck, the column is inaccessible.](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-alter-modify-column-is-stuck-the-column-is-inaccessible/) - [ANSI SQL mode](https://kb.altinity.com/altinity-kb-queries-and-syntax/ansi-sql-mode/) - [Async INSERTs](https://kb.altinity.com/altinity-kb-queries-and-syntax/async-inserts/) - [Atomic insert](https://kb.altinity.com/altinity-kb-queries-and-syntax/atomic-insert/) - [ClickHouse Projections](https://kb.altinity.com/altinity-kb-queries-and-syntax/projections-examples/ "ClickHouseยฎ Projections") - [Cumulative Anything](https://kb.altinity.com/altinity-kb-queries-and-syntax/cumulative-unique/) - [Data types on 
disk and in RAM](https://kb.altinity.com/altinity-kb-queries-and-syntax/data-types-on-disk-and-in-ram/) - [DELETE via tombstone column](https://kb.altinity.com/altinity-kb-queries-and-syntax/delete-via-tombstone-column/) - [EXPLAIN query](https://kb.altinity.com/altinity-kb-queries-and-syntax/explain-query/) - [Fill missing values at query time](https://kb.altinity.com/altinity-kb-queries-and-syntax/fill-missing-values-at-query-time/) - [FINAL clause speed](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-final-clause-speed/) - [Join with Calendar using Arrays](https://kb.altinity.com/altinity-kb-queries-and-syntax/join-with-calendar-using-arrays/) - [JOINs](https://kb.altinity.com/altinity-kb-queries-and-syntax/joins/) - [JOIN optimization tricks](https://kb.altinity.com/altinity-kb-queries-and-syntax/joins/joins-tricks/) - [JSONExtract to parse many attributes at a time](https://kb.altinity.com/altinity-kb-queries-and-syntax/jsonextract-to-parse-many-attributes-at-a-time/) - [KILL QUERY](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-kill-query/) - [Lag / Lead](https://kb.altinity.com/altinity-kb-queries-and-syntax/lag-lead/) - [Machine learning in ClickHouse](https://kb.altinity.com/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse/) - [Mutations](https://kb.altinity.com/altinity-kb-queries-and-syntax/mutations/) - [OPTIMIZE vs OPTIMIZE FINAL](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-optimize-vs-optimize-final/) - [Parameterized views](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-parameterized-views/) - [partial-projection-optimization](https://kb.altinity.com/altinity-kb-queries-and-syntax/partial-projection-optimization/ "Use both projection and raw data in single query") - [PIVOT / UNPIVOT](https://kb.altinity.com/altinity-kb-queries-and-syntax/pivot-unpivot/) - [Possible deadlock avoided. 
Client should retry](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-possible-deadlock-avoided.-client-should-retry/) - [Roaring bitmaps for calculating retention](https://kb.altinity.com/altinity-kb-queries-and-syntax/roaring-bitmaps-for-calculating-retention/) - [SAMPLE by](https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-sample-by/) - [Sampling Example](https://kb.altinity.com/altinity-kb-queries-and-syntax/sampling-example/) - [Simple aggregate functions & combinators](https://kb.altinity.com/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions/) - [Skip indexes](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/) - [Example: minmax](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/minmax/) - [Skip index bloom\_filter Example](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column/) - [Skip indexes examples](https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/skip-indexes-examples/) - [Time zones](https://kb.altinity.com/altinity-kb-queries-and-syntax/time-zones/) - [Time-series alignment with interpolation](https://kb.altinity.com/altinity-kb-queries-and-syntax/ts-interpolation/) - [Top N & Remain](https://kb.altinity.com/altinity-kb-queries-and-syntax/top-n-and-remain/) - [Troubleshooting](https://kb.altinity.com/altinity-kb-queries-and-syntax/troubleshooting/) - [TTL](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/) - [MODIFY (ADD) TTL](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/modify-ttl/ "MODIFY (ADD) TTL in ClickHouseยฎ") - [What are my TTL settings](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/what-are-my-ttls/ "What are my TTL settings?") - [TTL GROUP BY Examples](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples/) - [TTL Recompress 
example](https://kb.altinity.com/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example/) - [UPDATE via Dictionary](https://kb.altinity.com/altinity-kb-queries-and-syntax/update-via-dictionary/) - [Values mapping](https://kb.altinity.com/altinity-kb-queries-and-syntax/values-mapping/) - [Window functions](https://kb.altinity.com/altinity-kb-queries-and-syntax/window-functions/) - [Functions](https://kb.altinity.com/altinity-kb-functions/) - [Encoding and Decoding of quantileTDigest states](https://kb.altinity.com/altinity-kb-functions/how-to-encode-decode-quantiletdigest-state/ "How to encode/decode quantileTDigest states from/to list of centroids") - [kurt & skew](https://kb.altinity.com/altinity-kb-functions/kurt_skew_statistics/ "kurt & skew statistical functions in ClickHouseยฎ ") - [\-Resample vs -If vs -Map vs Subquery](https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/) - [arrayFold](https://kb.altinity.com/altinity-kb-functions/arrayfold/) - [arrayMap, arrayJoin or ARRAY JOIN memory usage](https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/) - [assumeNotNull and friends](https://kb.altinity.com/altinity-kb-functions/assumenotnull-and-friends/) - [Encrypt](https://kb.altinity.com/altinity-kb-functions/encrypt/) - [sequenceMatch](https://kb.altinity.com/altinity-kb-functions/altinity-kb-sequencematch/) - [Integrations](https://kb.altinity.com/altinity-kb-integrations/) - [Altinity Cloud Access Management](https://kb.altinity.com/altinity-kb-integrations/altinity-cloud/) - [ClickHouseยฎ python drivers](https://kb.altinity.com/altinity-kb-integrations/clickhouse_python_drivers/) - [Integrating ClickHouseยฎ with MySQL](https://kb.altinity.com/altinity-kb-integrations/mysql-clickhouse/ "MySQL") - [ODBC Driver for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-integrations/clickhouse-odbc/) - [Spark](https://kb.altinity.com/altinity-kb-integrations/spark/ "ClickHouseยฎ + Spark") - [BI 
Tools](https://kb.altinity.com/altinity-kb-integrations/bi-tools/) - [CatBoost / MindsDB / Fast.ai](https://kb.altinity.com/altinity-kb-integrations/catboost-mindsdb-fast.ai/) - [Google S3 (GCS)](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-google-s3-gcs/) - [Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/) - [Kafka virtual columns](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-virtual-columns/ "Kafka engine Virtual columns") - [Schema Inference for Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/kafka-schema-inference/ "Inferring Schema from AvroConfluent Messages in Kafka for ClickHouseยฎ") - [Setting the background message broker schedule pool size](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/background_message_broker_schedule_pool_size/) - [Adjusting librdkafka settings](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-adjusting-librdkafka-settings/) - [Error handling](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/error-handling/) - [Exactly once semantics](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-exactly-once-semantics/) - [Kafka main parsing loop](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-main-parsing-loop/) - [Kafka parallel consuming](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-parallel-consuming/) - [Multiple MVs attached to Kafka table](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-kafka-mv-consuming/) - [Rewind / fast-forward / replay](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-rewind-fast-forward-replay/) - [SELECTs from engine=Kafka](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-selects-from-engine-kafka/) - 
[RabbitMQ](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-rabbitmq/) - [RabbitMQ Error handling](https://kb.altinity.com/altinity-kb-integrations/altinity-kb-rabbitmq/error-handling/) - [Setup & maintenance](https://kb.altinity.com/altinity-kb-setup-and-maintenance/) - [S3 & object storage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/) - [AWS S3 Recipes](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/aws-s3-recipes/) - [Clean up orphaned objects on s3](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/clean-up-orphaned-objects-on-s3.md/) - [s3 and mutations](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3_and_mutations/ "How much data are written to S3 during mutations") - [s3 cached table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3_cache_example/ "Example of the table at s3 with cache") - [S3Disk](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-s3-object-storage/s3disk/) - [AggregateFunction(uniq, UUID) doubled after ClickHouseยฎ upgrade](https://kb.altinity.com/altinity-kb-setup-and-maintenance/uniq-uuid-doubled-clickhouse-upgrade/) - [Can not connect to my ClickHouseยฎ server](https://kb.altinity.com/altinity-kb-setup-and-maintenance/connection-problems/) - [cgroups and k8s](https://kb.altinity.com/altinity-kb-setup-and-maintenance/cgroups_k8s/ "cgroups and kubernetes cloud providers") - [ClickHouse logs and Vector.dev](https://kb.altinity.com/altinity-kb-setup-and-maintenance/ch-logs-2-json-vectordev/ "Transforming ClickHouse logs to ndjson using Vector.dev") - [ClickHouse operator](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-operator/ "Altinity Kubernetes Operator For ClickHouseยฎ") - [ClickHouseยฎ and different 
filesystems](https://kb.altinity.com/altinity-kb-setup-and-maintenance/filesystems/) - [ClickHouseยฎ RBAC example](https://kb.altinity.com/altinity-kb-setup-and-maintenance/rbac/ "ClickHouseยฎ Access Control and Account Management (RBAC)") - [Client Timeouts](https://kb.altinity.com/altinity-kb-setup-and-maintenance/client-timeouts/) - [Compatibility layer for the Altinity Kubernetes Operator for ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/monitoring-operator-exporter-compatibility/) - [Convert uniqExact to uniq(Combined)](https://kb.altinity.com/altinity-kb-setup-and-maintenance/uniqexact-to-uniq-combined/ "How to convert uniqExact states to approximate uniq functions states") - [Custom Settings](https://kb.altinity.com/altinity-kb-setup-and-maintenance/custom_settings/) - [Description of asynchronous\_metrics](https://kb.altinity.com/altinity-kb-setup-and-maintenance/asynchronous_metrics_descr/) - [disk encryption](https://kb.altinity.com/altinity-kb-setup-and-maintenance/disk_encryption/ "ClickHouseยฎ data/disk encryption (at rest)") - [DR two DC](https://kb.altinity.com/altinity-kb-setup-and-maintenance/dr-two-dc/) - [How ALTERs work in ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/alters/) - [How to recreate a table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/how_to_recreate_table/ "How to recreate a table in case of total corruption of the replication queue") - [http\_handlers](https://kb.altinity.com/altinity-kb-setup-and-maintenance/http_handlers/ "http handler example") - [Jemalloc heap profiling](https://kb.altinity.com/altinity-kb-setup-and-maintenance/jemalloc_heap_profiling/) - [Logging](https://kb.altinity.com/altinity-kb-setup-and-maintenance/logging/) - [Merge Memory in metric\_log](https://kb.altinity.com/altinity-kb-setup-and-maintenance/metric_log_ram/ "High Memory Usage During Merge in system.metric_log") - [Precreate parts using 
clickhouse-local](https://kb.altinity.com/altinity-kb-setup-and-maintenance/precreate_parts_using_clickhouse_local.sh/) - [Recovery after complete data loss](https://kb.altinity.com/altinity-kb-setup-and-maintenance/recovery-after-complete-data-loss/) - [Replicate RBAC with Keeper](https://kb.altinity.com/altinity-kb-setup-and-maintenance/users_in_keeper/ "How to Replicate ClickHouse RBAC Users and Grants with ZooKeeper/Keeper") - [Replication: Can not resolve host of another ClickHouseยฎ server](https://kb.altinity.com/altinity-kb-setup-and-maintenance/change-me/) - [source parts sizeis greater than the current maximum](https://kb.altinity.com/altinity-kb-setup-and-maintenance/source-pars-size-is-greater-than-maximum/ "source parts size is greater than the current maximum") - [Successful ClickHouseยฎ deployment plan](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-deployment-plan/) - [sysall database](https://kb.altinity.com/altinity-kb-setup-and-maintenance/sysall/ "sysall database (system tables on a cluster level)") - [Timeouts during OPTIMIZE FINAL](https://kb.altinity.com/altinity-kb-setup-and-maintenance/timeouts-during-optimize-final/) - [Use an executable dictionary as cron task](https://kb.altinity.com/altinity-kb-setup-and-maintenance/executable-dictionary/) - [Useful settings to turn on](https://kb.altinity.com/altinity-kb-setup-and-maintenance/useful-setting-to-turn-on/ "Useful settings to turn on/Defaults that should be reconsidered") - [Who ate my CPU](https://kb.altinity.com/altinity-kb-setup-and-maintenance/who-ate-my-cpu/) - [Zookeeper session has expired](https://kb.altinity.com/altinity-kb-setup-and-maintenance/zookeeper-session-expired/) - [Server config files](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-server-config-files/ "Server configuration files") - [Aggressive merges](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-aggressive_merges/) - [Altinity Backup for 
ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-backup/) - [Altinity packaging compatibility \>21.x and earlier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-packaging-compatibility-greater-than-21.x-and-earlier/) - [AWS EC2 Storage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/aws-ec2-storage/) - [ClickHouseยฎ in Docker](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-clickhouse-in-docker/) - [ClickHouseยฎ Monitoring](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-monitoring/) - [ClickHouseยฎ versions](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-versions/) - [Configure ClickHouseยฎ for low memory environments](https://kb.altinity.com/altinity-kb-setup-and-maintenance/configure_clickhouse_for_low_mem_envs/) - [Converting MergeTree to Replicated](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-converting-mergetree-to-replicated/) - [Data Migration](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/) - [Export from MSSQL to ClickHouseยฎ](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/mssql-clickhouse/ "MSSQL bcp pipe to clickhouse-client") - [add\_remove\_replica](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/add_remove_replica/ "Add/Remove a new replica to a ClickHouseยฎ cluster") - [clickhouse-copier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/) - [clickhouse-copier 20.3 and earlier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-20.3-and-earlier/) - [clickhouse-copier 20.4 - 
21.6](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-20.4_21.6/) - [Kubernetes job for clickhouse-copier](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/altinity-kb-clickhouse-copier/altinity-kb-clickhouse-copier-kubernetes-job/) - [Distributed table to cluster](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/distributed-table-cluster/ "Distributed table to ClickHouseยฎ Cluster") - [Fetch Alter Table](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/fetch_alter_table/) - [Remote table function](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/remote-table-function/) - [rsync](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/rsync/ "Moving ClickHouse to Another Server") - [DDLWorker and DDL queue problems](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-ddlworker/) - [There are N unfinished hosts (0 of them are currently active).](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-ddlworker/there-are-n-unfinished-hosts-0-of-them-are-currently-active/) - [Desharding](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-desharding/ "Merge Shards") - [differential backups using clickhouse-backup](https://kb.altinity.com/altinity-kb-setup-and-maintenance/clickhouse-backup-diff/) - [High CPU usage](https://kb.altinity.com/altinity-kb-setup-and-maintenance/high-cpu-usage/ "High CPU usage in ClickHouseยฎ") - [Load balancers](https://kb.altinity.com/altinity-kb-setup-and-maintenance/load-balancers/) - [memory configuration settings](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-memory-configuration-settings/) - [Useful queries](https://kb.altinity.com/altinity-kb-useful-queries/) - [Check table metadata in 
zookeeper](https://kb.altinity.com/altinity-kb-useful-queries/table-meta-in-zookeeper/) - [Compare query\_log for 2 intervals](https://kb.altinity.com/altinity-kb-useful-queries/compare_query_log_for_2_intervals/) - [Debug hanging thing](https://kb.altinity.com/altinity-kb-useful-queries/debug-hang/) - [Handy queries for system.query\_log](https://kb.altinity.com/altinity-kb-useful-queries/query_log/) - [Ingestion metrics from system.part\_log](https://kb.altinity.com/altinity-kb-useful-queries/ingestion-rate-part_log/) - [Remove block numbers from zookeeper for removed partitions](https://kb.altinity.com/altinity-kb-useful-queries/remove_unneeded_block_numbers/) - [Removing tasks in the replication queue related to empty partitions](https://kb.altinity.com/altinity-kb-useful-queries/remove_empty_partitions_from_rq/) - [Can detached parts be dropped?](https://kb.altinity.com/altinity-kb-useful-queries/detached-parts/ "Can detached parts in ClickHouseยฎ be dropped?") - [Database Size - Table - Column size](https://kb.altinity.com/altinity-kb-useful-queries/altinity-kb-database-size-table-column-size/) - [Notes on Various Errors with respect to replication and distributed connections](https://kb.altinity.com/altinity-kb-useful-queries/connection-issues-distributed-parts/) - [Number of active parts in a partition](https://kb.altinity.com/altinity-kb-useful-queries/altinity-kb-number-of-active-parts-in-a-partition/) - [Parts consistency](https://kb.altinity.com/altinity-kb-useful-queries/parts-consistency/) - [Schema design](https://kb.altinity.com/altinity-kb-schema-design/) - [ClickHouse limitations](https://kb.altinity.com/altinity-kb-schema-design/how-much-is-too-much/ "ClickHouseยฎ limitations") - [ClickHouseยฎ row-level deduplication](https://kb.altinity.com/altinity-kb-schema-design/row-level-deduplication/) - [Column backfilling from dictionary](https://kb.altinity.com/altinity-kb-schema-design/backfill_column/ "Column backfilling with alter/update using a 
# GROUP BY tricks

Tricks for GROUP BY memory usage optimization

## Tricks

Testing dataset:

```
CREATE TABLE sessions
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at)

INSERT INTO sessions WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000)

0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)

โ”Œโ”€databaseโ”€โ”ฌโ”€tableโ”€โ”€โ”€โ”€โ”ฌโ”€columnโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€typeโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€rowsโ”€โ”ฌโ”€compressed_bytesโ”€โ”ฌโ”€compressedโ”€โ”ฌโ”€uncompressedโ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ratioโ”€โ”ฌโ”€codecโ”€โ”
โ”‚ default  โ”‚ sessions โ”‚ session_id โ”‚ UUID                   โ”‚ 1000000000 โ”‚      16065918103 โ”‚ 14.96 GiB  โ”‚ 14.90 GiB    โ”‚ 0.9958970223439835 โ”‚       โ”‚
โ”‚ default  โ”‚ sessions โ”‚ user_id    โ”‚ String                 โ”‚ 1000000000 โ”‚       3056977462 โ”‚ 2.85 GiB   โ”‚ 13.04 GiB    โ”‚   4.57968701896828 โ”‚       โ”‚
โ”‚ default  โ”‚ sessions โ”‚ clicks     โ”‚ UInt32                 โ”‚ 1000000000 โ”‚       1859359032 โ”‚ 1.73 GiB   โ”‚ 3.73 GiB     โ”‚  2.151278979023993 โ”‚       โ”‚
โ”‚ default  โ”‚ sessions โ”‚ created_at โ”‚ DateTime               โ”‚ 1000000000 โ”‚       1332089630 โ”‚ 1.24 GiB   โ”‚ 3.73 GiB     โ”‚ 3.0028009451586226 โ”‚       โ”‚
โ”‚ default  โ”‚ sessions โ”‚ platform   โ”‚ LowCardinality(String) โ”‚ 1000000000 โ”‚        329702248 โ”‚ 314.43 MiB โ”‚ 956.63 MiB   โ”‚  3.042446801879252 โ”‚       โ”‚
โ”‚ default  โ”‚ sessions โ”‚ app        โ”‚ LowCardinality(String) โ”‚ 1000000000 โ”‚          4825544 โ”‚ 4.60 MiB   โ”‚ 956.63 MiB   โ”‚ 207.87333386660654 โ”‚       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
```

Every query and dataset is unique, so in different situations different tricks may work better or worse.

### PreFilter values before GROUP BY

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT Null

<Debug> MemoryTracker: Peak memory usage (for query): 18.36 GiB.

SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE user_id IN
(
    SELECT user_id
    FROM sessions
    WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00')
    -- So we select user_id values which could potentially match the HAVING clause of the outer query.
) AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT Null

<Debug> MemoryTracker: Peak memory usage (for query): 4.43 GiB.
```

### Use Fixed-width data types instead of String

For example, suppose you have strings whose values follow a fixed pattern such as 'ABX 1412312312313'. You can drop the constant first 4 characters and convert the rest to UInt64:

```
toUInt64(substr('ABX 1412312312313', 5))
```

That packs 17 bytes into 8: more than a 2x reduction in key size.
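The same packing idea can be sketched outside ClickHouse. A minimal Python illustration (the `ABX ` prefix and the `pack`/`unpack` helper names are just for this example):

```python
# Illustration of the fixed-width packing trick: keys such as
# 'ABX 1412312312313' share a constant 4-character prefix, so only the
# numeric tail needs to be kept as the GROUP BY key, and it fits in a
# UInt64 (8 bytes) instead of a 17-byte string.
def pack(key: str) -> int:
    # equivalent of toUInt64(substr(key, 5)): drop the 4-char prefix
    return int(key[4:])

def unpack(packed: int) -> str:
    # the prefix is constant, so the original key is fully recoverable
    return "ABX " + str(packed)

key = "ABX 1412312312313"
packed = pack(key)
assert packed < 2**64          # fits a UInt64 GROUP BY key
assert unpack(packed) == key   # lossless round trip
```

Because the transformation is reversible, you can group on the integer and reconstruct the original string afterwards if you need it in the output.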
```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT Null

Aggregator: Aggregation method: serialized
<Debug> MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx
SELECT
    user_fx,
    sum(clicks)
FROM sessions
GROUP BY
    user_fx,
    platform_fx
FORMAT Null

Aggregator: Aggregation method: keys256
MemoryTracker: Peak memory usage (for query): 22.24 GiB.
Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_fx,
    sum(clicks)
FROM sessions
GROUP BY
    user_fx,
    platform_enum
FORMAT Null

Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 14.14 GiB.
Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)

WITH
    toUInt32OrZero(trim(LEADING '0' FROM substr(user_id, 6))) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_int,
    sum(clicks)
FROM sessions
GROUP BY
    user_int,
    platform_enum
FORMAT Null

Aggregator: Aggregation method: keys64
MemoryTracker: Peak memory usage (for query): 10.14 GiB.
Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)

WITH
    toUInt32('1' || substr(user_id, 6)) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_int,
    sum(clicks)
FROM sessions
GROUP BY
    user_int,
    platform_enum
FORMAT Null

Aggregator: Aggregation method: keys64
Peak memory usage (for query): 10.14 GiB.
Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
```

This can be especially useful when you try to do GROUP BY `lc_column_1, lc_column_2` and ClickHouseยฎ falls back to the serialized algorithm.

### Two LowCardinality Columns in GROUP BY

```
SELECT
    app,
    sum(clicks)
FROM sessions
GROUP BY app
FORMAT Null

Aggregator: Aggregation method: low_cardinality_key_string
MemoryTracker: Peak memory usage (for query): 43.81 MiB.
Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)

SELECT
    app,
    platform,
    sum(clicks)
FROM sessions
GROUP BY
    app,
    platform
FORMAT Null

Aggregator: Aggregation method: serialized -- Slowest method!
MemoryTracker: Peak memory usage (for query): 222.86 MiB.
Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)

SELECT
    CAST(app, 'FixedString(6)') AS app_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx,
    sum(clicks)
FROM sessions
GROUP BY
    app_fx,
    platform_fx
FORMAT Null

Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 160.23 MiB.
Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
```

### Split your query in multiple smaller queries and execute them one BY one

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT Null

MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY
    user_id,
    platform
FORMAT Null

MemoryTracker: Peak memory usage (for query): 8.16 GiB.
Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
```

### Shard your data by one of common high cardinal GROUP BY key

This way each shard holds only 1/N of all unique key combinations, which results in smaller hash tables.
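The effect can be sketched in plain Python (hypothetical toy data; Python's built-in `hash` stands in for `cityHash64`): routing rows by a high-cardinality GROUP BY key partitions the unique keys across shards instead of duplicating them.

```python
# Sketch: routing rows by hash(user_id) % N sends all rows for a given
# user to exactly one shard, so each shard's aggregation hash table
# holds only ~1/N of the unique keys.
from collections import defaultdict

N = 4  # number of shards
rows = [(f"UID: {i % 1000:08d}", i % 17) for i in range(10_000)]

shard_keys = defaultdict(set)
for user_id, _clicks in rows:
    shard_keys[hash(user_id) % N].add(user_id)

total_unique = len({user_id for user_id, _ in rows})     # 1000 unique users
per_shard = [len(keys) for keys in shard_keys.values()]

assert sum(per_shard) == total_unique   # keys are partitioned, never duplicated
assert max(per_shard) < total_unique    # each shard aggregates only a fraction
```

Compare this with a `rand()` sharding key, where every shard would end up seeing almost all unique users and each shard's hash table would be nearly full-size.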
Letโ€™s create 2 distributed tables with different distribution: by rand() and by user\_id.

```
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());

INSERT INTO sessions_distributed WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);

CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);

CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));

INSERT INTO sessions_distributed_2 WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);
```

```
SELECT
    app,
    platform,
    sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY
    app,
    platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
MemoryTracker: Peak memory usage (for query): 12.20 GiB.
12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)

SELECT
    app,
    platform,
    sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)

SELECT
    app,
    platform,
    sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform
SETTINGS optimize_distributed_group_by_sharding_key = 1

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)
SELECT
    app,
    platform,
    sum(clicks)
FROM cluster('distr-groupby', view(
    SELECT
        app,
        platform,
        sum(clicks) AS clicks
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS clicks
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY
        app,
        platform
))
GROUP BY
    app,
    platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.55 GiB.
12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
```

Query with bigger state:

```
SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY
    app,
    platform;

MemoryTracker: Peak memory usage (for query): 19.95 GiB.
12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform
SETTINGS distributed_group_by_no_merge = 2;

MemoryTracker: Peak memory usage (for query): 10.02 GiB.
12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)
SELECT
    app,
    platform,
    sum(sum),
    max(max),
    min(min),
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM cluster('distr-groupby', view(
    SELECT
        app,
        platform,
        sum(last_click) AS sum,
        max(max_clicks) AS max,
        min(min_clicks) AS min,
        max(max_time) AS max_time,
        min(min_time) AS min_time
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS last_click,
            max(clicks) AS max_clicks,
            min(clicks) AS min_clicks,
            max(created_at) AS max_time,
            min(created_at) AS min_time
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY
        app,
        platform
))
GROUP BY
    app,
    platform;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
```

```
SELECT
    app,
    platform,
    sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform

MemoryTracker: Peak memory usage (for query): 14.57 GiB.
12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)

SELECT
    app,
    platform,
    sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;

MemoryTracker: Peak memory usage (for query): 14.56 GiB.
12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)

SELECT
    app,
    platform,
    sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY
    app,
    platform
SETTINGS distributed_group_by_no_merge = 2;

MemoryTracker: Peak memory usage (for query): 14.54 GiB.
12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)

SELECT
    app,
    platform,
    sum(sessions)
FROM cluster('distr-groupby', view(
    SELECT
        app,
        platform,
        sum(sessions) AS sessions
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            uniq(session_id) AS sessions
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY
        app,
        platform
))
GROUP BY
    app,
    platform

MemoryTracker: Peak memory usage (for query): 14.55 GiB.
12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
```

### Reduce number of threads

Because each thread uses an independent hash table, lowering the number of threads also reduces the number of hash tables, which lowers memory usage at the cost of slower query execution.

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT Null

MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SET max_threads = 2;

SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT Null

MemoryTracker: Peak memory usage (for query): 13.26 GiB.
Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
```

### UNION ALL

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    app,
    user_id
FORMAT Null

MemoryTracker: Peak memory usage (for query): 24.19 GiB.
Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)

SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Orange' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Red' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Melon' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Blue' GROUP BY user_id
FORMAT Null

MemoryTracker: Peak memory usage (for query): 7.99 GiB.
Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
```

### aggregation\_in\_order

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT Null

MemoryTracker: Peak memory usage (for query): 969.33 MiB.
Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)

SET optimize_aggregation_in_order = 1;

SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY
    app,
    user_id
FORMAT Null

AggregatingInOrderTransform: Aggregating in order
MemoryTracker: Peak memory usage (for query): 169.24 MiB.
Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
```

### Reduce dimensions from GROUP BY with functions like sumMap, \*Resample

One dimension:

```
SELECT
    user_id,
    toDate(created_at) AS day,
    sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY
    user_id,
    day
FORMAT Null

MemoryTracker: Peak memory usage (for query): 50.74 GiB.
Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)

SELECT
    user_id,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    SELECT
        user_id,
        sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    arrayEnumerate(clicks_arr) AS date_diff
FORMAT Null

Peak memory usage (for query): 8.24 GiB.
Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
```

Multiple dimensions:

```
SELECT
    user_id,
    platform,
    toDate(created_at) AS day,
    sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND (user_id = 'UID: 08525196')
GROUP BY
    user_id,
    platform,
    day
ORDER BY
    user_id,
    day,
    platform
FORMAT Null

Peak memory usage (for query): 29.50 GiB.
Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)

WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
    user_id,
    CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks_res.1 AS clicks
FROM
(
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks_arr_lvl_2,
    range(31) AS date_diff
FORMAT Null

Peak memory usage (for query): 9.92 GiB.
Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)

WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
    user_id,
    CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks_res.1 AS clicks
FROM
(
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks_arr_lvl_2,
    range(31) AS date_diff
WHERE clicks > 0
FORMAT Null

Peak memory usage (for query): 10.14 GiB.
Elapsed: 9.533 sec.
Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
        platform_arr,
        days_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    platform_arr AS plat,
    days_arr AS date_diff
FORMAT Null

Peak memory usage (for query): 9.95 GiB.
Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
        arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    result.1 AS clicks,
    result.2 AS plat,
    result.3 AS date_diff
FORMAT Null

Peak memory usage (for query): 9.93 GiB.
Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)
SELECT
    user_id,
    CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
    toDate('2021-10-01') + intDiv(range, 3) AS day,
    clicks
FROM
(
    WITH (SELECT range(93)) AS range_arr
    SELECT
        user_id,
        sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
        range_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    range_arr AS range
FORMAT Null

Peak memory usage (for query): 8.24 GiB.
Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)

SELECT
    user_id,
    sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT Null

Peak memory usage (for query): 5.19 GiB.
0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
```

ARRAY JOIN can be expensive: <https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/>

sumMap, \*Resample: <https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/>

### Play with two-level

Disable:

```
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
```

Since 22.4, ClickHouse can predict when it makes sense to initialize aggregation as two-level from the start, instead of rehashing on the fly; this can improve query time: <https://github.com/ClickHouse/ClickHouse/pull/33439>

### GROUP BY in external memory

Slow!

### Use hash function for GROUP BY keys

GROUP BY cityHash64('xxxx')

This can lead to incorrect results, because hash functions are not a 1-to-1 mapping: two distinct keys can collide into the same hash value and be merged into one group.
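To get a feel for the collision risk, the standard birthday-problem approximation gives the chance that at least two of n distinct keys share the same hash value. A rough Python sketch (the formula is a textbook approximation for an ideal hash, not anything ClickHouse-specific):

```python
import math

# Probability that at least two of n distinct keys collide under an
# ideal b-bit hash (birthday-problem approximation):
#   p ~ 1 - exp(-n * (n - 1) / (2 * 2^b))
def collision_probability(n: int, bits: int = 64) -> float:
    return 1.0 - math.exp(-n * (n - 1) / (2 * 2**bits))

# Negligible for millions of keys, but almost certain at ten billion:
assert collision_probability(10**6) < 1e-6
assert collision_probability(10**10) > 0.9
```

So the trick may be acceptable when key cardinality is modest and approximate counts are tolerable, but it is unsafe for billions of distinct keys.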
### Performance bugs

<https://github.com/ClickHouse/ClickHouse/issues/15005>

<https://github.com/ClickHouse/ClickHouse/issues/29131>

<https://github.com/ClickHouse/ClickHouse/issues/31120>

<https://github.com/ClickHouse/ClickHouse/issues/35096> Fixed in 22.7

Last modified 2024.07.29: [Site cleanup, mostly minor changes (3e41a19)](https://github.com/Altinity/altinityknowledgebase/commit/3e41a19644b66d46db743db20321bd5b94b545df)
Readable Markdown
# GROUP BY tricks

Tricks for GROUP BY memory usage optimization.

## Tricks

### Testing dataset

```
CREATE TABLE sessions
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at)

INSERT INTO sessions WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT
    app,
    user_id,
    created_at,
    platform,
    clicks,
    session_id
FROM numbers_mt(1000000000)

0 rows in set. Elapsed: 46.078 sec. Processed 1.00 billion rows, 8.00 GB (21.70 million rows/s., 173.62 MB/s.)

┌─database─┬─table────┬─column─────┬─type───────────────────┬───────rows─┬─compressed_bytes─┬─compressed─┬─uncompressed─┬──────────────ratio─┬─codec─┐
│ default  │ sessions │ session_id │ UUID                   │ 1000000000 │      16065918103 │ 14.96 GiB  │ 14.90 GiB    │ 0.9958970223439835 │       │
│ default  │ sessions │ user_id    │ String                 │ 1000000000 │       3056977462 │ 2.85 GiB   │ 13.04 GiB    │   4.57968701896828 │       │
│ default  │ sessions │ clicks     │ UInt32                 │ 1000000000 │       1859359032 │ 1.73 GiB   │ 3.73 GiB     │  2.151278979023993 │       │
│ default  │ sessions │ created_at │ DateTime               │ 1000000000 │       1332089630 │ 1.24 GiB   │ 3.73 GiB     │ 3.0028009451586226 │       │
│ default  │ sessions │ platform   │ LowCardinality(String) │ 1000000000 │        329702248 │ 314.43 MiB │ 956.63 MiB   │  3.042446801879252 │       │
│ default  │ sessions │ app        │ LowCardinality(String) │ 1000000000 │          4825544 │ 4.60 MiB   │ 956.63 MiB   │ 207.87333386660654 │       │
└──────────┴──────────┴────────────┴────────────────────────┴────────────┴──────────────────┴────────────┴──────────────┴────────────────────┴───────┘
```

Every query and dataset is different, so in different situations different tricks can work better or worse.

### PreFilter values before GROUP BY

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE created_at > '2021-11-01 00:00:00'
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`

<Debug> MemoryTracker: Peak memory usage (for query): 18.36 GiB.

SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE user_id IN
(
    SELECT user_id
    FROM sessions
    WHERE (platform = 'Rat') AND (clicks = 16) AND (created_at > '2021-11-01 00:00:00')
    -- So we will select user_id values which could potentially match the HAVING clause of the OUTER query.
) AND (created_at > '2021-11-01 00:00:00')
GROUP BY user_id
HAVING (argMax(clicks, created_at) = 16) AND (argMax(platform, created_at) = 'Rat')
FORMAT `Null`

<Debug> MemoryTracker: Peak memory usage (for query): 4.43 GiB.
```

### Use Fixed-width data types instead of String

For example, suppose you have strings that hold values in a special form like 'ABX 1412312312313'. You can just remove the first 4 characters and convert the rest to UInt64:

`toUInt64(substr('ABX 1412312312313', 5))`

That packs 17 bytes into 8: more than a 2x reduction in size!
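As a quick illustration of the packing itself (a minimal sketch, not part of the original benchmark; it assumes every value carries the same constant 4-character prefix):

```
-- Sketch: pack a 17-byte string key into a UInt64, and restore it afterwards.
SELECT
    toUInt64(substr('ABX 1412312312313', 5)) AS packed,   -- 8-byte integer key
    concat('ABX ', toString(packed)) AS unpacked          -- original string rebuilt
```

If some rows might be malformed, `toUInt64OrZero` keeps the conversion from throwing an exception.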
```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT `Null`

Aggregator: Aggregation method: serialized
<Debug> MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx
SELECT
    user_fx,
    sum(clicks)
FROM sessions
GROUP BY
    user_fx,
    platform_fx
FORMAT `Null`

Aggregator: Aggregation method: keys256
MemoryTracker: Peak memory usage (for query): 22.24 GiB.
Elapsed: 6.637 sec. Processed 1.00 billion rows, 27.00 GB (150.67 million rows/s., 4.07 GB/s.)

WITH
    CAST(user_id, 'FixedString(14)') AS user_fx,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_fx,
    sum(clicks)
FROM sessions
GROUP BY
    user_fx,
    platform_enum
FORMAT `Null`

Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 14.14 GiB.
Elapsed: 5.335 sec. Processed 1.00 billion rows, 27.00 GB (187.43 million rows/s., 5.06 GB/s.)

WITH
    toUInt32OrZero(trim(LEADING '0' FROM substr(user_id, 6))) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_int,
    sum(clicks)
FROM sessions
GROUP BY
    user_int,
    platform_enum
FORMAT `Null`

Aggregator: Aggregation method: keys64
MemoryTracker: Peak memory usage (for query): 10.14 GiB.
Elapsed: 8.549 sec. Processed 1.00 billion rows, 27.00 GB (116.97 million rows/s., 3.16 GB/s.)

WITH
    toUInt32('1' || substr(user_id, 6)) AS user_int,
    CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 0)') AS platform_enum
SELECT
    user_int,
    sum(clicks)
FROM sessions
GROUP BY
    user_int,
    platform_enum
FORMAT `Null`

Aggregator: Aggregation method: keys64
Peak memory usage (for query): 10.14 GiB.
Elapsed: 6.247 sec. Processed 1.00 billion rows, 27.00 GB (160.09 million rows/s., 4.32 GB/s.)
```

This can be especially useful when you try to GROUP BY lc_column_1, lc_column_2 and ClickHouse® falls back to the serialized algorithm.

### Two LowCardinality Columns in GROUP BY

```
SELECT
    app,
    sum(clicks)
FROM sessions
GROUP BY app
FORMAT `Null`

Aggregator: Aggregation method: low_cardinality_key_string
MemoryTracker: Peak memory usage (for query): 43.81 MiB.
Elapsed: 0.545 sec. Processed 1.00 billion rows, 5.00 GB (1.83 billion rows/s., 9.17 GB/s.)

SELECT
    app,
    platform,
    sum(clicks)
FROM sessions
GROUP BY
    app,
    platform
FORMAT `Null`

Aggregator: Aggregation method: serialized -- Slowest method!
MemoryTracker: Peak memory usage (for query): 222.86 MiB.
Elapsed: 2.923 sec. Processed 1.00 billion rows, 6.00 GB (342.11 million rows/s., 2.05 GB/s.)

SELECT
    CAST(app, 'FixedString(6)') AS app_fx,
    CAST(platform, 'FixedString(4)') AS platform_fx,
    sum(clicks)
FROM sessions
GROUP BY
    app_fx,
    platform_fx
FORMAT `Null`

Aggregator: Aggregation method: keys128
MemoryTracker: Peak memory usage (for query): 160.23 MiB.
Elapsed: 1.632 sec. Processed 1.00 billion rows, 6.00 GB (612.63 million rows/s., 3.68 GB/s.)
```

### Split your query into multiple smaller queries and execute them one by one

```
SELECT
    user_id,
    sum(clicks)
FROM sessions
GROUP BY
    user_id,
    platform
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SELECT
    user_id,
    sum(clicks)
FROM sessions
WHERE (cityHash64(user_id) % 4) = 0
GROUP BY
    user_id,
    platform
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 8.16 GiB.
Elapsed: 2.910 sec. Processed 1.00 billion rows, 27.00 GB (343.64 million rows/s., 9.28 GB/s.)
```

### Shard your data by one of the common high-cardinality GROUP BY keys

That way each shard holds only 1/N of all unique key combinations, which results in smaller hash tables.
Let's create two distributed tables with different sharding keys: rand() and cityHash64(user_id).

```
CREATE TABLE sessions_distributed AS sessions
ENGINE = Distributed('distr-groupby', default, sessions, rand());

INSERT INTO sessions_distributed WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);

CREATE TABLE sessions_2 ON CLUSTER 'distr-groupby'
(
    `app` LowCardinality(String),
    `user_id` String,
    `created_at` DateTime,
    `platform` LowCardinality(String),
    `clicks` UInt32,
    `session_id` UUID
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (app, user_id, session_id, created_at);

CREATE TABLE sessions_distributed_2 AS sessions
ENGINE = Distributed('distr-groupby', default, sessions_2, cityHash64(user_id));

INSERT INTO sessions_distributed_2 WITH
    CAST(number % 4, 'Enum8(\'Orange\' = 0, \'Melon\' = 1, \'Red\' = 2, \'Blue\' = 3)') AS app,
    concat('UID: ', leftPad(toString(number % 20000000), 8, '0')) AS user_id,
    toDateTime('2021-10-01 10:11:12') + (number / 300) AS created_at,
    CAST((number + 14) % 3, 'Enum8(\'Bat\' = 0, \'Mice\' = 1, \'Rat\' = 2)') AS platform,
    number % 17 AS clicks,
    generateUUIDv4() AS session_id
SELECT app, user_id, created_at, platform, clicks, session_id
FROM numbers_mt(1000000000);
```

```
SELECT app, platform, sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY app, platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 12.02 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 12.05 GiB.
MemoryTracker: Peak memory usage (for query): 12.20 GiB.
12 rows in set. Elapsed: 28.345 sec. Processed 1.00 billion rows, 32.00 GB (35.28 million rows/s., 1.13 GB/s.)

SELECT app, platform, sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.952 sec. Processed 1.00 billion rows, 32.00 GB (83.66 million rows/s., 2.68 GB/s.)

SELECT app, platform, sum(clicks)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS clicks
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform
SETTINGS optimize_distributed_group_by_sharding_key = 1

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.61 GiB.
12 rows in set. Elapsed: 11.916 sec. Processed 1.00 billion rows, 32.00 GB (83.92 million rows/s., 2.69 GB/s.)
SELECT app, platform, sum(clicks)
FROM cluster('distr-groupby', view(
    SELECT app, platform, sum(clicks) AS clicks
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS clicks
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform;

[chi-distr-groupby-distr-groupby-0-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-1-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
[chi-distr-groupby-distr-groupby-2-0-0] MemoryTracker: Current memory usage (for query): 5.05 GiB.
MemoryTracker: Peak memory usage (for query): 5.55 GiB.
12 rows in set. Elapsed: 9.491 sec. Processed 1.00 billion rows, 32.00 GB (105.36 million rows/s., 3.37 GB/s.)
```

Query with a bigger state:

```
SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed
    GROUP BY user_id
)
GROUP BY app, platform;

MemoryTracker: Peak memory usage (for query): 19.95 GiB.
12 rows in set. Elapsed: 34.339 sec. Processed 1.00 billion rows, 32.00 GB (29.12 million rows/s., 932.03 MB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.220 sec. Processed 1.00 billion rows, 32.00 GB (75.64 million rows/s., 2.42 GB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 13.361 sec. Processed 1.00 billion rows, 32.00 GB (74.85 million rows/s., 2.40 GB/s.)

SELECT
    app,
    platform,
    sum(last_click) AS sum,
    max(max_clicks) AS max,
    min(min_clicks) AS min,
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        argMax(clicks, created_at) AS last_click,
        max(clicks) AS max_clicks,
        min(clicks) AS min_clicks,
        max(created_at) AS max_time,
        min(created_at) AS min_time
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform
SETTINGS distributed_group_by_no_merge = 2;

MemoryTracker: Peak memory usage (for query): 10.02 GiB.
12 rows in set. Elapsed: 9.789 sec. Processed 1.00 billion rows, 32.00 GB (102.15 million rows/s., 3.27 GB/s.)
SELECT
    app,
    platform,
    sum(sum),
    max(max),
    min(min),
    max(max_time) AS max_time,
    min(min_time) AS min_time
FROM cluster('distr-groupby', view(
    SELECT
        app,
        platform,
        sum(last_click) AS sum,
        max(max_clicks) AS max,
        min(min_clicks) AS min,
        max(max_time) AS max_time,
        min(min_time) AS min_time
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            argMax(clicks, created_at) AS last_click,
            max(clicks) AS max_clicks,
            min(clicks) AS min_clicks,
            max(created_at) AS max_time,
            min(created_at) AS min_time
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform;

MemoryTracker: Peak memory usage (for query): 10.09 GiB.
12 rows in set. Elapsed: 9.525 sec. Processed 1.00 billion rows, 32.00 GB (104.98 million rows/s., 3.36 GB/s.)
```

```
SELECT app, platform, sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform

MemoryTracker: Peak memory usage (for query): 14.57 GiB.
12 rows in set. Elapsed: 37.730 sec. Processed 1.00 billion rows, 44.01 GB (26.50 million rows/s., 1.17 GB/s.)

SELECT app, platform, sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform
SETTINGS optimize_distributed_group_by_sharding_key = 1;

MemoryTracker: Peak memory usage (for query): 14.56 GiB.
12 rows in set. Elapsed: 37.792 sec. Processed 1.00 billion rows, 44.01 GB (26.46 million rows/s., 1.16 GB/s.)

SELECT app, platform, sum(sessions)
FROM
(
    SELECT
        argMax(app, created_at) AS app,
        argMax(platform, created_at) AS platform,
        user_id,
        uniq(session_id) AS sessions
    FROM sessions_distributed_2
    GROUP BY user_id
)
GROUP BY app, platform
SETTINGS distributed_group_by_no_merge = 2;

MemoryTracker: Peak memory usage (for query): 14.54 GiB.
12 rows in set. Elapsed: 17.762 sec. Processed 1.00 billion rows, 44.01 GB (56.30 million rows/s., 2.48 GB/s.)

SELECT app, platform, sum(sessions)
FROM cluster('distr-groupby', view(
    SELECT app, platform, sum(sessions) AS sessions
    FROM
    (
        SELECT
            argMax(app, created_at) AS app,
            argMax(platform, created_at) AS platform,
            user_id,
            uniq(session_id) AS sessions
        FROM sessions_2
        GROUP BY user_id
    )
    GROUP BY app, platform
))
GROUP BY app, platform

MemoryTracker: Peak memory usage (for query): 14.55 GiB.
12 rows in set. Elapsed: 17.574 sec. Processed 1.00 billion rows, 44.01 GB (56.90 million rows/s., 2.50 GB/s.)
```

### Reduce number of threads

Because each thread uses an independent hash table, lowering the number of threads also reduces the number of hash tables, cutting memory usage at the cost of slower query execution.

```
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 28.19 GiB.
Elapsed: 7.375 sec. Processed 1.00 billion rows, 27.00 GB (135.60 million rows/s., 3.66 GB/s.)

SET max_threads = 2;

SELECT user_id, sum(clicks)
FROM sessions
GROUP BY user_id, platform
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 13.26 GiB.
Elapsed: 62.014 sec. Processed 1.00 billion rows, 27.00 GB (16.13 million rows/s., 435.41 MB/s.)
```

### UNION ALL

```
SELECT user_id, sum(clicks)
FROM sessions
GROUP BY app, user_id
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 24.19 GiB.
Elapsed: 5.043 sec. Processed 1.00 billion rows, 27.00 GB (198.29 million rows/s., 5.35 GB/s.)

SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Orange' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Red' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Melon' GROUP BY user_id
UNION ALL
SELECT user_id, sum(clicks) FROM sessions WHERE app = 'Blue' GROUP BY user_id
FORMAT Null

MemoryTracker: Peak memory usage (for query): 7.99 GiB.
Elapsed: 2.852 sec. Processed 1.00 billion rows, 27.01 GB (350.74 million rows/s., 9.47 GB/s.)
```

### aggregation_in_order

```
SELECT user_id, sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY user_id
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 969.33 MiB.
Elapsed: 2.494 sec. Processed 250.09 million rows, 6.75 GB (100.27 million rows/s., 2.71 GB/s.)

SET optimize_aggregation_in_order = 1;

SELECT user_id, sum(clicks)
FROM sessions
WHERE app = 'Orange'
GROUP BY app, user_id
FORMAT `Null`

AggregatingInOrderTransform: Aggregating in order
MemoryTracker: Peak memory usage (for query): 169.24 MiB.
Elapsed: 4.925 sec. Processed 250.09 million rows, 6.75 GB (50.78 million rows/s., 1.37 GB/s.)
```

### Reduce dimensions from GROUP BY with functions like sumMap, *Resample

#### One dimension

```
SELECT
    user_id,
    toDate(created_at) AS day,
    sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
GROUP BY user_id, day
FORMAT `Null`

MemoryTracker: Peak memory usage (for query): 50.74 GiB.
Elapsed: 22.671 sec. Processed 594.39 million rows, 18.46 GB (26.22 million rows/s., 814.41 MB/s.)

SELECT
    user_id,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    SELECT
        user_id,
        sumResample(0, 31, 1)(clicks, toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange', 'Red', 'Blue'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    arrayEnumerate(clicks_arr) AS date_diff
FORMAT `Null`

Peak memory usage (for query): 8.24 GiB.
Elapsed: 5.191 sec. Processed 594.39 million rows, 18.46 GB (114.50 million rows/s., 3.56 GB/s.)
```

#### Multiple dimensions

```
SELECT
    user_id,
    platform,
    toDate(created_at) AS day,
    sum(clicks)
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange')) AND user_id = 'UID: 08525196'
GROUP BY user_id, platform, day
ORDER BY user_id, day, platform
FORMAT `Null`

Peak memory usage (for query): 29.50 GiB.
Elapsed: 8.181 sec. Processed 198.14 million rows, 6.34 GB (24.22 million rows/s., 775.14 MB/s.)

WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
    user_id,
    CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks_res.1 AS clicks
FROM
(
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks_arr_lvl_2,
    range(31) AS date_diff
FORMAT `Null`

Peak memory usage (for query): 9.92 GiB.
Elapsed: 4.170 sec. Processed 198.14 million rows, 6.34 GB (47.52 million rows/s., 1.52 GB/s.)

WITH arrayJoin(arrayZip(clicks_arr_lvl_2, range(3))) AS clicks_res
SELECT
    user_id,
    CAST(clicks_res.2 + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks_res.1 AS clicks
FROM
(
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks_arr_lvl_2,
    range(31) AS date_diff
WHERE clicks > 0
FORMAT `Null`

Peak memory usage (for query): 10.14 GiB.
Elapsed: 9.533 sec. Processed 198.14 million rows, 6.34 GB (20.78 million rows/s., 665.20 MB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        flatten(sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01'))) AS clicks_arr,
        platform_arr,
        days_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    platform_arr AS plat,
    days_arr AS date_diff
FORMAT `Null`

Peak memory usage (for query): 9.95 GiB.
Elapsed: 3.095 sec. Processed 198.14 million rows, 6.34 GB (64.02 million rows/s., 2.05 GB/s.)

SELECT
    user_id,
    CAST(plat + 1, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)') AS platform,
    (toDate('2021-10-01') + date_diff) - 1 AS day,
    clicks
FROM
(
    WITH
        (SELECT flatten(arrayMap(x -> range(3) AS platforms, range(31) AS days))) AS platform_arr,
        (SELECT flatten(arrayMap(x -> [x, x, x], range(31) AS days))) AS days_arr
    SELECT
        user_id,
        sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr,
        arrayFilter(x -> ((x.1) > 0), arrayZip(flatten(clicks_arr), platform_arr, days_arr)) AS result
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    result.1 AS clicks,
    result.2 AS plat,
    result.3 AS date_diff
FORMAT `Null`

Peak memory usage (for query): 9.93 GiB.
Elapsed: 4.717 sec. Processed 198.14 million rows, 6.34 GB (42.00 million rows/s., 1.34 GB/s.)
SELECT
    user_id,
    CAST(range % 3, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)') AS platform,
    toDate('2021-10-01') + intDiv(range, 3) AS day,
    clicks
FROM
(
    WITH (SELECT range(93)) AS range_arr
    SELECT
        user_id,
        sumResample(0, 93, 1)(clicks, ((toDate(created_at) - toDate('2021-10-01')) * 3) + toUInt8(CAST(platform, 'Enum8(\'Rat\' = 0, \'Mice\' = 1, \'Bat\' = 2)'))) AS clicks_arr,
        range_arr
    FROM sessions
    WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
    GROUP BY user_id
)
ARRAY JOIN
    clicks_arr AS clicks,
    range_arr AS range
FORMAT `Null`

Peak memory usage (for query): 8.24 GiB.
Elapsed: 4.838 sec. Processed 198.14 million rows, 6.36 GB (40.95 million rows/s., 1.31 GB/s.)

SELECT
    user_id,
    sumResampleResample(1, 4, 1, 0, 31, 1)(clicks, CAST(platform, 'Enum8(\'Rat\' = 1, \'Mice\' = 2, \'Bat\' = 3)'), toDate(created_at) - toDate('2021-10-01')) AS clicks_arr
FROM sessions
WHERE (created_at >= toDate('2021-10-01')) AND (created_at < toDate('2021-11-01')) AND (app IN ('Orange'))
GROUP BY user_id
FORMAT `Null`

Peak memory usage (for query): 5.19 GiB.
0 rows in set. Elapsed: 1.160 sec. Processed 198.14 million rows, 6.34 GB (170.87 million rows/s., 5.47 GB/s.)
```

ARRAY JOIN can be expensive: <https://kb.altinity.com/altinity-kb-functions/array-like-memory-usage/>

sumMap, *Resample: <https://kb.altinity.com/altinity-kb-functions/resample-vs-if-vs-map-vs-subquery/>

### Play with two-level

Disable it:

```
SET group_by_two_level_threshold = 0, group_by_two_level_threshold_bytes = 0;
```

Since 22.4, ClickHouse can predict when it makes sense to initialize two-level aggregation from the start, instead of rehashing on the fly, which can improve query time: <https://github.com/ClickHouse/ClickHouse/pull/33439>

### GROUP BY in external memory

Slow!

### Use hash function for GROUP BY keys

GROUP BY cityHash64('xxxx') can lead to incorrect results, as hash functions are not a 1-to-1 mapping: two distinct keys that collide are silently merged.
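For illustration, against the sessions table above the trick would look like this (a sketch, not from the original benchmark; acceptable only when an occasional collision that merges two user_id values is tolerable, e.g. for approximate aggregates):

```
-- Sketch: group by an 8-byte hash of the key instead of the String itself.
-- Smaller keys mean a smaller hash table, but any cityHash64 collision
-- merges two distinct users, so the result is approximate, not exact.
SELECT
    cityHash64(user_id) AS user_key,
    sum(clicks)
FROM sessions
GROUP BY user_key
FORMAT `Null`
```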
### Performance bugs

<https://github.com/ClickHouse/ClickHouse/issues/15005>

<https://github.com/ClickHouse/ClickHouse/issues/29131>

<https://github.com/ClickHouse/ClickHouse/issues/31120>

<https://github.com/ClickHouse/ClickHouse/issues/35096>

Fixed in 22.7.

*Last modified 2024.07.29*