ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.1 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://blog.hellmar-becker.de/2025/01/12/clickhouse-data-cookbook-aggregating-latest-values/ |
| Last Crawled | 2026-04-10 03:05:49 (2 days ago) |
| First Indexed | 2025-01-13 11:13:43 (1 year ago) |
| HTTP Status Code | 200 |
| Meta Title | ClickHouse Data Cookbook: Aggregating Latest Values |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | (plain-text extraction of the same article; the full content appears in the Markdown field below) |
| Markdown | 
[Hellmar Becker's Blog](https://blog.hellmar-becker.de/)
[About](https://blog.hellmar-becker.de/about/) [Archives](https://blog.hellmar-becker.de/archives/) [Blogroll](https://blog.hellmar-becker.de/blogroll/) [Publications](https://blog.hellmar-becker.de/publications/) [GitHub](https://github.com/hellmarbecker)
# ClickHouse Data Cookbook: Aggregating Latest Values
Jan 12, 2025 • [blog](https://blog.hellmar-becker.de/categories/blog/), [clickhouse](https://blog.hellmar-becker.de/categories/clickhouse/), [sql](https://blog.hellmar-becker.de/categories/sql/), [tutorial](https://blog.hellmar-becker.de/categories/tutorial/)

Let’s pick up one of my [earlier blogs](https://blog.hellmar-becker.de/2022/08/10/timeseries-data-in-apache-druid-getting-the-latest-value-with-rolled-up-data/). In that blog, I took a very simple data set to show how Druid can aggregate data upon ingestion. Today, I will show how much easier this works with ClickHouse.
You can use any version of ClickHouse for this exercise. For instance, you can run ClickHouse [locally](https://clickhouse.com/docs/en/operations/utilities/clickhouse-local) on your laptop. Or check out the [installation docs](https://clickhouse.com/docs/en/install) for other options.
In today’s tutorial, you will
- create a detail table for stock ticker data using schema inference
- create an aggregate table to store the latest price value per ticker symbol and time interval
- create a materialized view to automatically populate the aggregate whenever new rows are added to the detail table
- insert some data and query the aggregate table
- insert some more data and observe how the query result changes.
## Creating the sample data set
First, create a text file with these data:
```
timestamp,symbol,price
2022-08-10 10:01:00,AAAA,25.90
2022-08-10 10:11:00,AAAA,26.90
2022-08-10 11:55:00,AAAA,28.10
```
Save this file as `ticker1.csv`.
Likewise, create a file `ticker2.csv` with these data:
```
timestamp,symbol,price
2022-08-10 10:50:00,AAAA,23.90
2022-08-10 11:20:00,AAAA,22.10
```
## Setting up the schema
Create a table using the first file as a template. We use the file only to infer the table schema; we don’t want to insert any data just yet. ClickHouse allows you to use `CREATE AS SELECT` with the `EMPTY` clause to do just that. This will be a [MergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#mergetree) table that stores the detail rows.
```
CREATE TABLE ticker_data
ENGINE = MergeTree
ORDER BY (symbol, timestamp) EMPTY
AS SELECT *
FROM file('ticker1.csv')
SETTINGS schema_inference_make_columns_nullable = 0
```
Worth mentioning is the last line, which makes the inferred columns non-nullable. When inferring a schema from a CSV file, ClickHouse makes all columns nullable by default. But columns that participate in the sort key cannot be nullable (this can be overridden, but that is probably not a good idea). Also, since there are no *NULL* values in the data, the simplest way to handle this is to force all columns to be non-*NULL*.
Now define the aggregate table. This will be an [AggregatingMergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree#aggregatingmergetree) table, where some of the aggregation magic happens inside the table engine:
```
CREATE TABLE ticker_data_agg
(
`timestampHour` DateTime,
`symbol` String,
`price` AggregateFunction(argMax, Float64, Timestamp)
)
ENGINE = AggregatingMergeTree
ORDER BY (symbol, timestampHour)
```
We will aggregate the data by hour: the timestamp will be truncated, as we will see in a moment.
Data aggregation is defined in three steps:
1. In the aggregate table **definition**
- the *dimensions* go into the order key
- the *measures*, rather than plain values, are defined as `AggregateFunction` types, which store intermediate state information. The `AggregateFunction` constructor is given the aggregate function to be used and the data types it operates on. In our case, this is `argMax`, which returns the value of its first argument at the point where the second argument reaches its maximum. The second argument will be the timestamp.
2. When **populating** the aggregate table
- the *dimensions* go into the group by clause
- *measures* use [\-State](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-state) aggregators, in our case it will be `argMaxState`.
3. **Query** the aggregated measures with [\-Merge](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-merge) aggregator functions, here: `argMaxMerge`.
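The `-State`/`-Merge` split can be sketched in plain Python. This is a toy model only (ClickHouse stores these states in its own binary format); the function names mimic `argMaxState` and `argMaxMerge`:

```python
# Toy model of argMax aggregation states: a partial state is simply the
# (value, key) pair with the largest key seen so far.

def arg_max_state(rows):
    """Like argMaxState(price, timestamp): build a partial state from raw rows."""
    return max(rows, key=lambda r: r[1])  # rows are (price, timestamp) pairs

def arg_max_merge(states):
    """Like argMaxMerge(price): combine partial states into the final value."""
    price, _ = max(states, key=lambda s: s[1])
    return price

# Two partial states, e.g. from two separately inserted parts:
s1 = arg_max_state([(25.90, "2022-08-10 10:01:00"), (26.90, "2022-08-10 10:11:00")])
s2 = arg_max_state([(23.90, "2022-08-10 10:50:00")])
print(arg_max_merge([s1, s2]))  # -> 23.9 (the latest timestamp wins)
```

The key property is that states can be merged in any order and at any time, which is what lets ClickHouse defer the final aggregation to query time.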
Now let’s define a materialized view that will aggregate the ticker data:
```
CREATE MATERIALIZED VIEW ticker_data_mv TO ticker_data_agg
AS SELECT
toStartOfHour(timestamp) AS timestampHour,
symbol,
argMaxState(price, timestamp) AS price
FROM ticker_data
GROUP BY
timestampHour,
symbol
```
This will truncate the timestamp to the desired granularity (hour) and perform the aggregation.
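What the materialized view computes can be mimicked in Python. This is an illustration of the semantics only (variable names are hypothetical; ClickHouse does not store states this way):

```python
from datetime import datetime

# Contents of ticker1.csv (the detail rows inserted into ticker_data)
rows = [
    ("2022-08-10 10:01:00", "AAAA", 25.90),
    ("2022-08-10 10:11:00", "AAAA", 26.90),
    ("2022-08-10 11:55:00", "AAAA", 28.10),
]

# (timestampHour, symbol) -> (price, source timestamp): an argMax partial state
agg = {}
for ts, symbol, price in rows:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    hour = t.replace(minute=0, second=0)  # toStartOfHour(timestamp)
    key = (hour, symbol)                  # GROUP BY timestampHour, symbol
    # argMaxState(price, timestamp): keep the price with the greatest timestamp
    if key not in agg or t > agg[key][1]:
        agg[key] = (price, t)

for (hour, symbol), (price, _) in sorted(agg.items()):
    print(hour, symbol, price)
# 2022-08-10 10:00:00 AAAA 26.9
# 2022-08-10 11:00:00 AAAA 28.1
```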
## Populating the tables
Insert the first set of data:
```
INSERT INTO ticker_data SELECT * FROM file('ticker1.csv');
```
and query the aggregate at the configured time granularity:
```
:) SELECT timestampHour, symbol, argMaxMerge(price) AS latestPrice FROM ticker_data_agg GROUP BY timestampHour, symbol ORDER BY timestampHour, symbol;
SELECT
timestampHour,
symbol,
argMaxMerge(price) AS latestPrice
FROM ticker_data_agg
GROUP BY
timestampHour,
symbol
ORDER BY
timestampHour ASC,
symbol ASC
Query id: b832dc41-a3db-4463-87a4-fe829bc120d0
   ┌───────timestampHour─┬─symbol─┬─latestPrice─┐
1. │ 2022-08-10 10:00:00 │ AAAA   │        26.9 │
2. │ 2022-08-10 11:00:00 │ AAAA   │        28.1 │
   └─────────────────────┴────────┴─────────────┘
2 rows in set. Elapsed: 0.012 sec.
```
and at a coarser granularity:
```
:) SELECT toStartOfDay(timestampHour) AS timestampDay, symbol, argMaxMerge(price) AS latestPrice FROM ticker_data_agg GROUP BY timestampDay, symbol ORDER BY timestampDay, symbol;
SELECT
toStartOfDay(timestampHour) AS timestampDay,
symbol,
argMaxMerge(price) AS latestPrice
FROM ticker_data_agg
GROUP BY
timestampDay,
symbol
ORDER BY
timestampDay ASC,
symbol ASC
Query id: bb33cc74-73fe-4d35-8009-136ca73b6579
   ┌────────timestampDay─┬─symbol─┬─latestPrice─┐
1. │ 2022-08-10 00:00:00 │ AAAA   │        28.1 │
   └─────────────────────┴────────┴─────────────┘
1 row in set. Elapsed: 0.008 sec.
```
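Because the table stores mergeable states rather than final values, rolling up to a coarser granularity is just another merge over the hourly states. A Python sketch of that idea (toy data layout, assumed here purely for illustration):

```python
from datetime import datetime

# Hourly argMax states as held in ticker_data_agg after the first insert
# (toy layout: price together with the source timestamp that produced it)
hourly = {
    (datetime(2022, 8, 10, 10), "AAAA"): (26.90, datetime(2022, 8, 10, 10, 11)),
    (datetime(2022, 8, 10, 11), "AAAA"): (28.10, datetime(2022, 8, 10, 11, 55)),
}

# argMaxMerge at day granularity: merge states whose hour falls in the same day
daily = {}
for (hour, symbol), state in hourly.items():
    key = (hour.replace(hour=0), symbol)  # toStartOfDay(timestampHour)
    if key not in daily or state[1] > daily[key][1]:
        daily[key] = state

price, _ = daily[(datetime(2022, 8, 10), "AAAA")]
print(price)  # -> 28.1 (the 11:55 row has the latest timestamp of the day)
```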
## Adding more data
Insert the second batch:
```
:) INSERT INTO ticker_data SELECT * FROM file('ticker2.csv');
```
If you query the data now with group by and merge, you get the updated rows:
```
:) SELECT timestampHour, symbol, argMaxMerge(price) AS latestPrice FROM ticker_data_agg GROUP BY timestampHour, symbol ORDER BY timestampHour, symbol;
SELECT
timestampHour,
symbol,
argMaxMerge(price) AS latestPrice
FROM ticker_data_agg
GROUP BY
timestampHour,
symbol
ORDER BY
timestampHour ASC,
symbol ASC
Query id: 2638d26a-ecdb-484f-9de7-3e8ca26e3724
   ┌───────timestampHour─┬─symbol─┬─latestPrice─┐
1. │ 2022-08-10 10:00:00 │ AAAA   │        23.9 │
2. │ 2022-08-10 11:00:00 │ AAAA   │        28.1 │
   └─────────────────────┴────────┴─────────────┘
2 rows in set. Elapsed: 0.020 sec.
```
Note how the 11 am entry has not changed, because the added row for that hour has an earlier timestamp than the data already present in the table. Thus, late-arriving data is handled correctly and automatically.
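This late-data behavior follows directly from the argMax semantics. A small Python simulation (toy model, not ClickHouse's storage) of merging the second batch into the existing hourly states:

```python
from datetime import datetime

# Existing hourly states after the first batch: price with its source timestamp
agg = {
    (datetime(2022, 8, 10, 10), "AAAA"): (26.90, datetime(2022, 8, 10, 10, 11)),
    (datetime(2022, 8, 10, 11), "AAAA"): (28.10, datetime(2022, 8, 10, 11, 55)),
}

# Second batch (ticker2.csv): inserted later, but with earlier timestamps
batch2 = [
    (datetime(2022, 8, 10, 10, 50), "AAAA", 23.90),
    (datetime(2022, 8, 10, 11, 20), "AAAA", 22.10),
]

for ts, symbol, price in batch2:
    key = (ts.replace(minute=0, second=0), symbol)
    # argMax semantics: the row with the greater timestamp wins, regardless of
    # insertion order -- late data only replaces genuinely older values
    if key not in agg or ts > agg[key][1]:
        agg[key] = (price, ts)

for (hour, symbol), (price, _) in sorted(agg.items()):
    print(hour, symbol, price)
# 2022-08-10 10:00:00 AAAA 23.9   (10:50 > 10:11, so the new row wins)
# 2022-08-10 11:00:00 AAAA 28.1   (11:20 < 11:55, so the old row stays)
```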
Also, now there are actually four rows in the aggregate table (at least until a background merge happens). A `SELECT *` statement shows this. It also shows that the aggregated measures are actually stored in a binary format that is output as garbled characters.
```
:) SELECT * FROM ticker_data_agg;
SELECT *
FROM ticker_data_agg
Query id: 27981039-a429-4399-a98c-fd2620b4b1b8
   ┌───────timestampHour─┬─symbol─┬─price────────┐
1. │ 2022-08-10 10:00:00 │ AAAA   │ fffff?7@8q?b │
2. │ 2022-08-10 11:00:00 │ AAAA   │ ?????6@@x?b  │
   └─────────────────────┴────────┴──────────────┘
   ┌───────timestampHour─┬─symbol─┬─price───────┐
3. │ 2022-08-10 10:00:00 │ AAAA   │ fffff?:@h?b │
4. │ 2022-08-10 11:00:00 │ AAAA   │ ?????<@t??b │
   └─────────────────────┴────────┴─────────────┘
4 rows in set. Elapsed: 0.010 sec.
```
You can (but probably shouldn’t) force a background merge like so:
```
:) OPTIMIZE TABLE ticker_data_agg;
OPTIMIZE TABLE ticker_data_agg
Query id: 294b0d27-d826-4f72-899f-06b9ff80f159
Ok.
0 rows in set. Elapsed: 0.015 sec.
:) SELECT * FROM ticker_data_agg;
SELECT *
FROM ticker_data_agg
Query id: 2f5c65ed-b3ef-4fa1-88cd-76ebdd194072
   ┌───────timestampHour─┬─symbol─┬─price────────┐
1. │ 2022-08-10 10:00:00 │ AAAA   │ fffff?7@8q?b │
2. │ 2022-08-10 11:00:00 │ AAAA   │ ?????<@t??b  │
   └─────────────────────┴────────┴──────────────┘
2 rows in set. Elapsed: 0.006 sec.
```
Now, there are only two rows left.
## Conclusion
- ClickHouse’s *AggregatingMergeTree* table engine allows aggregating data on the fly. Aggregation states are merged at query time (using the `-Merge` combinators) and compacted asynchronously in the background.
- Each standard aggregation function has modifiers that allow it to store intermediate aggregation state inside the aggregate table.
- Using materialized views, you can populate a detail and aggregate table simultaneously.
***
“[This image is taken from Page 377 of Praktisches Kochbuch für die gewöhnliche und feinere Küche](https://www.flickr.com/photos/mhlimages/48051263811/)” by [Medical Heritage Library, Inc.](https://www.flickr.com/photos/mhlimages/) is licensed under [CC BY-NC-SA 2.0](https://creativecommons.org/licenses/by-nc-sa/2.0/).
© - Powered by [Jekyll](https://jekyllrb.com/) & [whiteglass](https://github.com/yous/whiteglass) - Subscribe via [RSS](https://blog.hellmar-becker.de/feed.xml) |
| Readable Markdown | (identical to the article body in the Markdown field above, minus site navigation and footer) |
| Shard | 198 (laksa) |
| Root Hash | 1995484448304381198 |
| Unparsed URL | de,hellmar-becker!blog,/2025/01/12/clickhouse-data-cookbook-aggregating-latest-values/ s443 |