Many datasets are collected over time to analyze and discover meaningful trends. Each data point usually has a time assigned when we collect logs or business events. When exploring our data during an analysis stage, we often slice or group by different periods to understand how our data changes over time. Any data that changes over time in any way is time-series data. ClickHouse has powerful tools to store and process time-series data efficiently and can be used for both simple solutions and data discovery, as well as for powering real-time analytical applications at the petabyte scale.
This blog post provides tips and tricks for working with time series data, based on everyday tasks that we see our users needing to perform. We cover querying and common data type problems, such as handling gauges, and explore how performance can be improved as we scale.

All examples in this post can be reproduced in our sql.clickhouse.com environment (see the blogs database). Alternatively, if you want to dive deeper into this dataset, ClickHouse Cloud is a great starting point: spin up a cluster using a free trial, load the data, let us deal with the infrastructure, and get querying!
Date and time types available in ClickHouse
ClickHouse has several date and time types, and which to use depends on your use case. The Date type should be sufficient for dates in most cases. It requires only 2 bytes to store a date but limits the range to [1970-01-01, 2149-06-06]. The DateTime type allows storing dates and times up to the year 2106. For cases where more precision is required, DateTime64 can be used, which allows storing time with up to nanosecond precision:
CREATE TABLE dates
(
`date` Date,
`datetime` DateTime,
`precise_datetime` DateTime64(3),
`very_precise_datetime` DateTime64(9)
)
ENGINE = MergeTree
ORDER BY tuple()
We can use the now() function to return the current time and now64() to get it at a specified precision via the first argument:
INSERT INTO dates SELECT NOW(), NOW(), NOW64(3), NOW64(9);
This populates our columns with the current time, truncated according to each column's type:
SELECT * FROM dates
Row 1:
──────
date: 2022-12-27
datetime: 2022-12-27 12:51:15
precise_datetime: 2022-12-27 12:51:15.329
very_precise_datetime: 2022-12-27 12:51:15.329098089
Practical cases often require storing the time zone as well. ClickHouse lets us set the time zone as the last argument of the DateTime or DateTime64 types:
CREATE TABLE dtz
(
`id` Int8,
`t` DateTime('Europe/Berlin')
)
ENGINE = MergeTree
ORDER BY tuple()
Having defined a timezone in our DDL, we can now insert times using different timezones:
INSERT INTO dtz SELECT 1, toDateTime('2022-12-12 12:13:14', 'America/New_York')
INSERT INTO dtz SELECT 2, toDateTime('2022-12-12 12:13:14')
SELECT * FROM dtz
┌─id─┬───────────────────t─┐
│  1 │ 2022-12-12 18:13:14 │
│  2 │ 2022-12-12 13:13:14 │
└────┴─────────────────────┘
Note how we inserted a time in the America/New_York time zone, and it was automatically converted to Europe/Berlin at query time. When no time zone is specified, the server's local time zone is used.
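If we need to read a stored value in another time zone, the conversion can also be done per query with the toTimeZone() function (a quick sketch against the dtz table above):

```sql
SELECT
    id,
    t,                            -- rendered in the column's Europe/Berlin time zone
    toTimeZone(t, 'UTC') AS t_utc -- same instant, rendered in UTC
FROM dtz
ORDER BY id ASC
```

Both columns represent the same point in time; only the rendering differs.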
We're going to explore ClickHouse time-series querying capabilities with the Wikistat (Wikipedia pageviews) dataset:
CREATE TABLE wikistat
(
`time` DateTime,
`project` String,
`subproject` String,
`path` String,
`hits` UInt64
)
ENGINE = MergeTree
ORDER BY (time)
Let's populate this table with 1 billion records:
INSERT INTO wikistat SELECT *
FROM s3('https://ClickHouse-public-datasets.s3.amazonaws.com/wikistat/partitioned/wikistat*.native.zst') LIMIT 1e9
0 rows in set. Elapsed: 421.868 sec. Processed 2.00 billion rows, 100.89 GB (4.74 million rows/s., 239.15 MB/s.)
The most common requirement is to aggregate data by period, e.g., to get the total number of hits for each day:
SELECT
sum(hits) AS h,
toDate(time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5
┌────────h─┬──────────d─┐
│ 31045470 │ 2015-05-01 │
│ 30309226 │ 2015-05-02 │
│ 33704223 │ 2015-05-03 │
│ 34700248 │ 2015-05-04 │
│ 34634353 │ 2015-05-05 │
└──────────┴────────────┘

5 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.78 billion rows/s., 45.40 GB/s.)
We've used the toDate() function here, which converts the specified time to a date type. Alternatively, we can bucket by hour and filter on a specific date:
SELECT
sum(hits) AS v,
toStartOfHour(time) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 5
┌───────v─┬───────────────────h─┐
│ 1199168 │ 2015-05-01 01:00:00 │
│ 1207276 │ 2015-05-01 02:00:00 │
│ 1189451 │ 2015-05-01 03:00:00 │
│ 1119078 │ 2015-05-01 04:00:00 │
│ 1037526 │ 2015-05-01 05:00:00 │
└─────────┴─────────────────────┘

5 rows in set. Elapsed: 0.013 sec. Processed 7.72 million rows, 92.54 MB (593.64 million rows/s., 7.12 GB/s.)
The toStartOfHour() function used here converts the given time to the start of the hour. ClickHouse has bucketing functions for generating periods that cover almost all imaginable cases, allowing you to group by year, month, day, hour, or even arbitrary intervals, e.g., 5 minutes, easily.
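For example, daily and monthly rollups follow exactly the same pattern as the hourly one (a sketch over the same wikistat table):

```sql
-- daily buckets
SELECT
    toStartOfDay(time) AS d,
    sum(hits) AS h
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 3;

-- monthly buckets
SELECT
    toStartOfMonth(time) AS m,
    sum(hits) AS h
FROM wikistat
GROUP BY m
ORDER BY m ASC;
```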
We can also use the toStartOfInterval() function to group by custom intervals. Let's say we want to group by 4-hour intervals:
SELECT
sum(hits) AS v,
toStartOfInterval(time, INTERVAL 4 HOUR) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 6
┌───────v─┬───────────────────h─┐
│ 3595895 │ 2015-05-01 00:00:00 │
│ 4161080 │ 2015-05-01 04:00:00 │
│ 4602523 │ 2015-05-01 08:00:00 │
│ 6072107 │ 2015-05-01 12:00:00 │
│ 6604783 │ 2015-05-01 16:00:00 │
│ 6009082 │ 2015-05-01 20:00:00 │
└─────────┴─────────────────────┘

6 rows in set. Elapsed: 0.020 sec. Processed 7.72 million rows, 92.54 MB (386.78 million rows/s., 4.64 GB/s.)
With the toStartOfInterval() function, we use the INTERVAL clause to set the required bucketing period.

In many cases we deal with sparse data, where some intervals are absent. This results in empty buckets. Take the following example, where we group data by 1-hour intervals. It outputs the following stats, with some hours missing because they have no values:
SELECT
toStartOfHour(time) AS h,
sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC
┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │ <- missing values
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

14 rows in set. Elapsed: 0.029 sec. Processed 6.98 million rows, 225.76 MB (237.19 million rows/s., 7.67 GB/s.)
ClickHouse provides the WITH FILL modifier to address this. It fills all the empty hours with zeros, so we can better understand the distribution over time:
SELECT
toStartOfHour(time) AS h,
sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)
┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │
│ 2015-06-12 13:00:00 │         0 │
│ 2015-06-12 14:00:00 │         0 │
│ 2015-06-12 15:00:00 │         0 │
│ 2015-06-12 16:00:00 │         0 │
│ 2015-06-12 17:00:00 │         0 │
│ 2015-06-12 18:00:00 │         0 │
│ 2015-06-12 19:00:00 │         0 │
│ 2015-06-12 20:00:00 │         0 │
│ 2015-06-12 21:00:00 │         0 │
│ 2015-06-12 22:00:00 │         0 │
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

24 rows in set. Elapsed: 0.039 sec. Processed 6.98 million rows, 225.76 MB (180.92 million rows/s., 5.85 GB/s.)
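WITH FILL can also be given explicit bounds, which helps when the very first or last buckets themselves are missing. A sketch (same query as above, with the fill range pinned to a full day):

```sql
SELECT
    toStartOfHour(time) AS h,
    sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC
    WITH FILL
        FROM toDateTime('2015-06-12 00:00:00')
        TO toDateTime('2015-06-13 00:00:00')
        STEP toIntervalHour(1)
```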
Sometimes we don't want to deal with the start of standard intervals (like the start of a day or an hour) but with shifted windows. Say we want to understand total hits not for calendar days but for 24-hour periods offset from 6 pm. We can use the dateDiff() function to calculate the difference between a base time and each record's time. In this case, the d column represents the difference in days (e.g., 1 day ago, 2 days ago, etc.):
SELECT
sum(hits),
dateDiff('day', toDateTime('2015-05-01 18:00:00'), time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5
┌─sum(hits)─┬─d─┐
│  31045470 │ 0 │
│  30309226 │ 1 │
│  33704223 │ 2 │
│  34700248 │ 3 │
│  34634353 │ 4 │
└───────────┴───┘

5 rows in set. Elapsed: 0.283 sec. Processed 1.00 billion rows, 12.00 GB (3.54 billion rows/s., 42.46 GB/s.)
ClickHouse provides the bar() function to build quick visuals that help with data analysis. The following quickly visualizes the most and least popular hours in terms of page views:
SELECT
toHour(time) AS h,
sum(hits) AS t,
bar(t, 0, max(t) OVER (), 50) AS bar
FROM wikistat
GROUP BY h
ORDER BY h ASC
┌──h─┬─────────t─┬─bar────────────────────────────────────────────────┐
│  0 │ 146208847 │ ███████████████████████████████████████            │
│  1 │ 143713140 │ ██████████████████████████████████████             │
│  2 │ 144977675 │ ██████████████████████████████████████             │
│  3 │ 145089174 │ ██████████████████████████████████████             │
│  4 │ 139602368 │ █████████████████████████████████████              │
│  5 │ 130795734 │ ███████████████████████████████████                │
│  6 │ 126456113 │ █████████████████████████████████                  │
│  7 │ 127328312 │ ██████████████████████████████████                 │
│  8 │ 131772449 │ ███████████████████████████████████                │
│  9 │ 137695533 │ ████████████████████████████████████               │
│ 10 │ 143381876 │ ██████████████████████████████████████             │
│ 11 │ 146690963 │ ███████████████████████████████████████            │
│ 12 │ 155662847 │ █████████████████████████████████████████          │
│ 13 │ 169130249 │ █████████████████████████████████████████████      │
│ 14 │ 182213956 │ ████████████████████████████████████████████████   │
│ 15 │ 188534642 │ ██████████████████████████████████████████████████ │
│ 16 │ 189214224 │ ██████████████████████████████████████████████████ │
│ 17 │ 186824967 │ █████████████████████████████████████████████████  │
│ 18 │ 185885433 │ █████████████████████████████████████████████████  │
│ 19 │ 186112653 │ █████████████████████████████████████████████████  │
│ 20 │ 187530882 │ ██████████████████████████████████████████████████ │
│ 21 │ 185485979 │ █████████████████████████████████████████████████  │
│ 22 │ 175522556 │ ██████████████████████████████████████████████     │
│ 23 │ 157537595 │ ██████████████████████████████████████████         │
└────┴───────────┴────────────────────────────────────────────────────┘

24 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.79 billion rows/s., 45.53 GB/s.)
Note how we've used a window max() to compute the maximum hits per hour, passing this to the bar() function for visualization.
Counters and Gauge metrics
There are two basic types of metrics we encounter when working with time series:
Counters are used to count the total number of tracked events sliced by attributes and grouped by a time frame. A popular example here is tracking website visitors.
Gauges are used to set a metric value that tends to change over time. A good example here is tracking CPU load.
Both metric types are easy to work with in ClickHouse and don't require any additional configuration. Counters can be queried using the count() or sum() functions, depending on the storage policy. To efficiently query gauges, the any() aggregate function can be used together with the INTERPOLATE modifier to fill any missing data points:
CREATE TABLE metrics
( `time` DateTime, `name` String, `value` UInt32 )
ENGINE = MergeTree ORDER BY tuple();
INSERT INTO metrics VALUES
('2022-12-28 06:32:16', 'cpu', 7), ('2022-12-28 14:31:22', 'cpu', 50), ('2022-12-28 14:30:30', 'cpu', 25), ('2022-12-28 14:25:36', 'cpu', 10), ('2022-12-28 11:32:08', 'cpu', 5), ('2022-12-28 10:32:12', 'cpu', 5);
SELECT
toStartOfHour(time) AS h,
any(value) AS v
FROM metrics
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)
INTERPOLATE ( v AS v )
┌───────────────────h─┬──v─┐
│ 2022-12-28 06:00:00 │  7 │
│ 2022-12-28 07:00:00 │  7 │ <- filled
│ 2022-12-28 08:00:00 │  7 │ <- filled
│ 2022-12-28 09:00:00 │  7 │ <- filled
│ 2022-12-28 10:00:00 │  5 │
│ 2022-12-28 11:00:00 │  5 │ <- filled
│ 2022-12-28 12:00:00 │  5 │ <- filled
│ 2022-12-28 13:00:00 │  5 │ <- filled
│ 2022-12-28 14:00:00 │ 50 │
└─────────────────────┴────┘
In this case, the marked values were automatically filled in by ClickHouse, carrying the last observed reading forward, which matches the gauge nature of the metric over a continuous time range.
A popular use case for time-series data is building histograms based on tracked events. Suppose we want to understand the distribution of pages by their total hits for a specific date. We can use the histogram() function to automatically generate an adaptive histogram with a given number of bins, and then use arrayJoin() and bar() to visualize it:
WITH histogram(10)(hits) AS h
SELECT
round(arrayJoin(h).1) AS l,
round(arrayJoin(h).2) AS u,
arrayJoin(h).3 AS w,
bar(w, 0, max(w) OVER (), 20) AS b
FROM
(
SELECT
path,
sum(hits) AS hits
FROM wikistat
WHERE date(time) = '2015-06-15'
GROUP BY path
HAVING hits > 10000.
)
┌───────l─┬───────u─┬──────w─┬─b────────────────────┐
│   10034 │   27769 │ 84.375 │ ████████████████████ │
│   27769 │   54281 │  19.75 │ ████▋                │
│   54281 │   79020 │  3.875 │ ▉                    │
│   79020 │   96858 │   2.75 │ ▋                    │
│   96858 │  117182 │   1.25 │ ▎                    │
│  117182 │  173244 │      1 │ ▎                    │
│  173244 │  232806 │  1.125 │ ▎                    │
│  232806 │  405693 │   1.75 │ ▍                    │
│  405693 │ 1126826 │  1.125 │ ▎                    │
│ 1126826 │ 1691188 │      1 │ ▎                    │
└─────────┴─────────┴────────┴──────────────────────┘

10 rows in set. Elapsed: 0.134 sec. Processed 6.64 million rows, 268.25 MB (49.48 million rows/s., 2.00 GB/s.)
We've filtered to only pages with more than 10k views. In the result set, l and u are the lower and upper boundaries of a bin, and w is the bin height (the number of items falling into that bin).
Sometimes we want to understand how metrics change over time by calculating the difference between consecutive values. Let's compute daily hits for a given page (the path column) and the change in this value from the previous day:
SELECT
toDate(time) AS d,
sum(hits) AS h,
lagInFrame(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
h - p AS trend
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15
┌──────────d─┬──────h─┬──────p─┬──trend─┐
│ 2015-05-01 │ 214612 │      0 │ 214612 │
│ 2015-05-02 │ 211546 │ 214612 │  -3066 │
│ 2015-05-03 │ 221412 │ 211546 │   9866 │
│ 2015-05-04 │ 219940 │ 221412 │  -1472 │
│ 2015-05-05 │ 211548 │ 219940 │  -8392 │
│ 2015-05-06 │ 212358 │ 211548 │    810 │
│ 2015-05-07 │ 208150 │ 212358 │  -4208 │
│ 2015-05-08 │ 208871 │ 208150 │    721 │
│ 2015-05-09 │ 210753 │ 208871 │   1882 │
│ 2015-05-10 │ 212918 │ 210753 │   2165 │
│ 2015-05-11 │ 211884 │ 212918 │  -1034 │
│ 2015-05-12 │ 212314 │ 211884 │    430 │
│ 2015-05-13 │ 211192 │ 212314 │  -1122 │
│ 2015-05-14 │ 206172 │ 211192 │  -5020 │
│ 2015-05-15 │ 195832 │ 206172 │ -10340 │
└────────────┴────────┴────────┴────────┘

15 rows in set. Elapsed: 0.550 sec. Processed 1.00 billion rows, 28.62 GB (1.82 billion rows/s., 52.00 GB/s.)
We've used the lagInFrame() window function to get the previous hits value, and then used it to calculate the difference as the trend column.
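The same window machinery also gives us smoothing. For instance, a 7-day moving average over the daily series can be computed with a bounded frame (a sketch, not part of the original examples):

```sql
SELECT
    toDate(time) AS d,
    sum(hits) AS h,
    round(avg(h) OVER (ORDER BY d ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)) AS h_7d_avg
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15
```

Each row averages the current day with up to six preceding days, damping day-to-day noise.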
Following the previous example, sometimes we want to do the opposite: get a cumulative sum of a certain metric over time. This is usually done for counters to visualize cumulative growth, and can easily be implemented using window functions:
SELECT
toDate(time) AS d,
sum(hits) AS h,
sum(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS c,
bar(c, 0, 3200000, 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15
┌──────────d─┬──────h─┬───────c─┬─b───────────────────────────┐
│ 2015-05-01 │ 214612 │  214612 │ ██                          │
│ 2015-05-02 │ 211546 │  426158 │ ███                         │
│ 2015-05-03 │ 221412 │  647570 │ █████                       │
│ 2015-05-04 │ 219940 │  867510 │ ███████                     │
│ 2015-05-05 │ 211548 │ 1079058 │ ████████                    │
│ 2015-05-06 │ 212358 │ 1291416 │ ██████████                  │
│ 2015-05-07 │ 208150 │ 1499566 │ ████████████                │
│ 2015-05-08 │ 208871 │ 1708437 │ █████████████               │
│ 2015-05-09 │ 210753 │ 1919190 │ ███████████████             │
│ 2015-05-10 │ 212918 │ 2132108 │ █████████████████           │
│ 2015-05-11 │ 211884 │ 2343992 │ ██████████████████          │
│ 2015-05-12 │ 212314 │ 2556306 │ ████████████████████        │
│ 2015-05-13 │ 211192 │ 2767498 │ ██████████████████████      │
│ 2015-05-14 │ 206172 │ 2973670 │ ███████████████████████     │
│ 2015-05-15 │ 195832 │ 3169502 │ █████████████████████████   │
└────────────┴────────┴─────────┴─────────────────────────────┘

15 rows in set. Elapsed: 0.557 sec. Processed 1.00 billion rows, 28.62 GB (1.80 billion rows/s., 51.40 GB/s.)
We've built a cumulative daily hits sum and visualized the growth for a given page over a 15-day period.
Calculating metric rates (the speed of change per time unit) is also popular when working with time series. Suppose we want to get the per-second hit rate for a certain page on a given date, grouped by hour:
SELECT
toStartOfHour(time) AS t,
sum(hits) AS h,
round(h / (60 * 60), 2) AS rate,
bar(rate * 10, 0, max(rate * 10) OVER (), 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY t
ORDER BY t ASC
LIMIT 23
┌───────────────────t─┬─────h─┬─rate─┬─b───────────────────────────┐
│ 2015-05-01 01:00:00 │  6749 │ 1.87 │ ██████████████              │
│ 2015-05-01 02:00:00 │  6054 │ 1.68 │ █████████████               │
│ 2015-05-01 03:00:00 │  5823 │ 1.62 │ █████████████               │
│ 2015-05-01 04:00:00 │  5908 │ 1.64 │ █████████████               │
│ 2015-05-01 05:00:00 │  6131 │  1.7 │ █████████████               │
│ 2015-05-01 06:00:00 │  7067 │ 1.96 │ ███████████████             │
│ 2015-05-01 07:00:00 │  8169 │ 2.27 │ ██████████████████          │
│ 2015-05-01 08:00:00 │  9526 │ 2.65 │ █████████████████████       │
│ 2015-05-01 09:00:00 │ 10474 │ 2.91 │ ███████████████████████     │
│ 2015-05-01 10:00:00 │ 10389 │ 2.89 │ ██████████████████████      │
│ 2015-05-01 11:00:00 │  9830 │ 2.73 │ █████████████████████       │
│ 2015-05-01 12:00:00 │ 10712 │ 2.98 │ ███████████████████████     │
│ 2015-05-01 13:00:00 │ 10301 │ 2.86 │ ██████████████████████      │
│ 2015-05-01 14:00:00 │ 10181 │ 2.83 │ ██████████████████████      │
│ 2015-05-01 15:00:00 │ 10324 │ 2.87 │ ██████████████████████      │
│ 2015-05-01 16:00:00 │ 10497 │ 2.92 │ ███████████████████████     │
│ 2015-05-01 17:00:00 │ 10676 │ 2.97 │ ███████████████████████     │
│ 2015-05-01 18:00:00 │ 11121 │ 3.09 │ ████████████████████████    │
│ 2015-05-01 19:00:00 │ 11277 │ 3.13 │ ████████████████████████    │
│ 2015-05-01 20:00:00 │ 11553 │ 3.21 │ █████████████████████████   │
│ 2015-05-01 21:00:00 │ 11637 │ 3.23 │ █████████████████████████   │
│ 2015-05-01 22:00:00 │ 11298 │ 3.14 │ ████████████████████████    │
│ 2015-05-01 23:00:00 │  8915 │ 2.48 │ ███████████████████         │
└─────────────────────┴───────┴──────┴─────────────────────────────┘

23 rows in set. Elapsed: 0.572 sec. Processed 1.00 billion rows, 28.62 GB (1.75 billion rows/s., 50.06 GB/s.)
The general approach to optimizing storage efficiency is to use optimal data types. Take the project and subproject columns. These columns are of type String but have a relatively small number of unique values:
SELECT
uniq(project),
uniq(subproject)
FROM wikistat
┌─uniq(project)─┬─uniq(subproject)─┐
│          1095 │               99 │
└───────────────┴──────────────────┘

1 row in set. Elapsed: 0.895 sec. Processed 1.00 billion rows, 20.43 GB (1.12 billion rows/s., 22.84 GB/s.)
This means we can use the LowCardinality() data type, which uses dictionary-based encoding. ClickHouse then stores an internal value ID instead of the original string value, which in turn saves a lot of space:
ALTER TABLE wikistat
MODIFY COLUMN `project` LowCardinality(String),
MODIFY COLUMN `subproject` LowCardinality(String)
We've also used the UInt64 type for the hits column, which takes 8 bytes but has a relatively small maximum value:
SELECT max(hits)
FROM wikistat
┌─max(hits)─┐
│    237913 │
└───────────┘
Given this value, we can use UInt32 instead, which takes only 4 bytes and can store values up to ~4.3 billion:
ALTER TABLE wikistat
MODIFY COLUMN `hits` UInt32
This halves the in-memory size of this column; note that the size on disk may remain largely unchanged due to compression. But be careful not to pick a data type that is too small for future values!
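The effect of such type changes can be inspected per column via the system.columns table (a sketch; adjust the database and table names to your setup):

```sql
SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes)   AS compressed
FROM system.columns
WHERE (database = currentDatabase()) AND (table = 'wikistat')
```

Comparing the uncompressed and compressed sizes before and after the ALTER shows where the savings actually land.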
When we deal with sequential data, which time-series data effectively is, we can further improve storage efficiency by using special codecs. The general idea is to store the changes between values instead of the absolute values themselves, which requires much less space for slowly changing data:
ALTER TABLE wikistat
MODIFY COLUMN `time` CODEC(Delta, ZSTD)
We've used the Delta codec for the time column, which fits time-series data best. The right ordering key can also save disk space. Since we usually want to filter by path, we should add it to the key. This requires recreating the table. Let's wrap it all up and compare storage efficiency with and without the optimized types:
Unoptimized table (11.09 GiB):

CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
ORDER BY (time)

Optimized table (1.68 GiB):

CREATE TABLE optimized_wikistat
(
    `time` DateTime CODEC(Delta(4), ZSTD(1)),
    `project` LowCardinality(String),
    `subproject` LowCardinality(String),
    `path` String,
    `hits` UInt32
)
ENGINE = MergeTree
ORDER BY (path, time)
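The table sizes quoted above can be reproduced with a query over system.parts (a sketch; only active parts should be counted):

```sql
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND (table IN ('wikistat', 'optimized_wikistat'))
GROUP BY table
```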
As we can see, we've reduced storage by almost 7x (from 11.09 GiB to 1.68 GiB) without any loss of data. For further details on optimizing storage using types and codecs, see our recent blog Optimizing ClickHouse with Schemas and Codecs.
Before attempting other optimizations, users should optimize their ordering key to ensure ClickHouse produces the fastest possible results. Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the project and subproject columns. In this case, it's a good idea to add them to the ordering key, as well as the time column, since we query on time as well:
CREATE TABLE optimized_wikistat
(β¦)
ENGINE = MergeTree
ORDER BY (project, subproject, time)
Let's now compare several queries to get an idea of how essential the ordering key expression is to performance. Note that we have also applied our previous data type and codec optimizations:
Query 1 - top projects by hits:

SELECT
    project,
    sum(hits) AS h
FROM wikistat
GROUP BY project
ORDER BY h DESC
LIMIT 10

ORDER BY (time):                      0.518 sec
ORDER BY (project, subproject, time): 0.258 sec

Query 2 - top subprojects for one project:

SELECT
    subproject,
    sum(hits) AS h
FROM wikistat
WHERE project = 'it'
GROUP BY subproject
ORDER BY h DESC
LIMIT 10

ORDER BY (time):                      0.67 sec
ORDER BY (project, subproject, time): 0.025 sec

Query 3 - monthly hits for one subproject:

SELECT
    toStartOfMonth(time) AS m,
    sum(hits) AS h
FROM wikistat
WHERE (project = 'it') AND (subproject = 'zero')
GROUP BY m
ORDER BY m DESC
LIMIT 10

ORDER BY (time):                      0.65 sec
ORDER BY (project, subproject, time): 0.014 sec

Query 4 - top paths for one subproject:

SELECT
    path,
    sum(hits) AS h
FROM wikistat
WHERE (project = 'it') AND (subproject = 'zero')
GROUP BY path
ORDER BY h DESC
LIMIT 10

ORDER BY (time):                      0.148 sec
ORDER BY (project, subproject, time): 0.010 sec
Note how we got a 2x to 40x performance increase by picking a more appropriate ordering key. For further details on choosing a primary key, including how to decide the order of the columns, read our guide here.
Another option is to use materialized views to aggregate and store the results of popular queries, which can then be queried instead of the original table. Suppose the following query is executed frequently in our case:
SELECT
path,
SUM(hits) AS v
FROM wikistat
WHERE toStartOfMonth(time) = '2015-05-01'
GROUP BY path
ORDER BY v DESC
LIMIT 10
┌─path──────────────────┬────────v─┐
│ -                     │ 89742164 │
│ Angelsberg            │ 19191582 │
│ Ana_Sayfa             │  6376578 │
│ Academy_Awards        │  4901470 │
│ Accueil_(homonymie)   │  3810047 │
│ 2015_in_spaceflight   │  2077195 │
│ Albert_Einstein       │  1621175 │
│ 19_Kids_and_Counting  │  1432484 │
│ 2015_Nepal_earthquake │  1406457 │
│ Alive                 │  1390624 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 1.016 sec. Processed 256.84 million rows, 10.17 GB (252.69 million rows/s., 10.01 GB/s.)
We can create the following materialized view:
CREATE MATERIALIZED VIEW blogs.wikistat_top
ENGINE = SummingMergeTree
ORDER BY (month, hits) POPULATE AS
SELECT
path,
toStartOfMonth(time) AS month,
sum(hits) AS hits
FROM blogs.wikistat
GROUP BY
path,
month
0 rows in set. Elapsed: 8.589 sec. Processed 1.00 billion rows, 40.52 GB (116.43 million rows/s., 4.72 GB/s.)
Now we can query the materialized view instead of the original table:
SELECT
path,
hits
FROM wikistat_top
WHERE month = '2015-05-01'
ORDER BY hits DESC
LIMIT 10
┌─path──────────────────┬─────hits─┐
│ -                     │ 89742164 │
│ Angelsberg            │ 19191582 │
│ Ana_Sayfa             │  6376578 │
│ Academy_Awards        │  4901470 │
│ Accueil_(homonymie)   │  3810047 │
│ 2015_in_spaceflight   │  2077195 │
│ Albert_Einstein       │  1621175 │
│ 19_Kids_and_Counting  │  1432484 │
│ 2015_Nepal_earthquake │  1406457 │
│ Alive                 │  1390624 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 0.005 sec. Processed 24.58 thousand rows, 935.16 KB (5.26 million rows/s., 200.31 MB/s.)
Our performance improvement here is dramatic. We will publish a blog post on materialized views soon, so watch this space!
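One caveat worth noting: SummingMergeTree collapses rows with the same key only during background merges, so rows for a given (month, hits) key may not be fully summed yet at read time. It is therefore safer to aggregate at query time as well (a sketch of the same query with explicit aggregation):

```sql
SELECT
    path,
    sum(hits) AS hits
FROM wikistat_top
WHERE month = '2015-05-01'
GROUP BY path
ORDER BY hits DESC
LIMIT 10
```

The query still scans only the small pre-aggregated table, so the performance benefit is preserved.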
ClickHouse is efficient in both storage and queries, and it scales easily to petabytes while maintaining the same level of performance and simplicity. In a future post, we will explore techniques for scaling to almost 400 billion rows using the full Wikistat dataset. We will show how you can scale storage and processing capacity using our Cloud service, which separates storage and compute and deals with this automatically, or by using a manual clustering solution.
In this post, we have shown how to efficiently store and query time-series data using the power of SQL and the performance of ClickHouse. Given this, you won't need to install additional extensions or tools to collect and process time series, as ClickHouse has everything in place.
# Working with Time Series Data in ClickHouse

[Denys Golotiuk](https://clickhouse.com/authors/denys-golotiuk)
Jan 10, 2023 Β· 24 minutes read

## Introduction
Many datasets are collected over time to analyze and discover meaningful trends. Each data point usually has a time assigned, as when we collect logs or business events. When exploring data during analysis, we often slice or group it by different periods to understand how it changes over time. Any data that changes over time in any way is **time-series data**. ClickHouse has powerful tools to store and process time-series data efficiently, and can be used for simple solutions and data discovery as well as for powering real-time analytical applications at petabyte scale.
This blog post provides tips and tricks for working with [time series data](https://clickhouse.com/engineering-resources/what-is-time-series-database) based on everyday tasks that we see our users needing to perform. We cover querying and common data type problems, such as handling gauges, and explore how performance can be improved as we scale.
All examples in this post can be reproduced in our [sql.clickhouse.com](https://sql.clickhouse.com/) environment (see the `blogs` database). Alternatively, if you want to dive deeper into this dataset, [ClickHouse Cloud](https://clickhouse.cloud/signUp) is a great starting point - spin up a cluster using a free trial, load the data, let us deal with the infrastructure, and get querying\!
## Date and time types available in ClickHouse
ClickHouse has several date and time types, and depending on your use case, different ones can be applied. Using the [Date](https://clickhouse.com/docs/en/sql-reference/data-types/date) type for dates should be sufficient in most cases. This type requires only 2 bytes to store a date but limits the range to `[1970-01-01, 2149-06-06]`. The [DateTime](https://clickhouse.com/docs/en/sql-reference/data-types/datetime) type stores dates and times up to the year 2106. For cases where more precision is required, [DateTime64](https://clickhouse.com/docs/en/sql-reference/data-types/datetime64) can be used, which stores time with up to nanosecond precision:
```
CREATE TABLE dates
(
    `date` Date,
    `datetime` DateTime,
    `precise_datetime` DateTime64(3),
    `very_precise_datetime` DateTime64(9)
)
ENGINE = MergeTree
ORDER BY tuple()
```
We can use the `now()` function to return the current time, and `now64()` to get it at a specified precision via the first argument.
```
INSERT INTO dates SELECT now(), now(), now64(3), now64(9);
```
This populates our columns with times according to each column's type:
```
SELECT * FROM dates

Row 1:
──────
date:                  2022-12-27
datetime:              2022-12-27 12:51:15
precise_datetime:      2022-12-27 12:51:15.329
very_precise_datetime: 2022-12-27 12:51:15.329098089
```
### Timezones
Many practical cases require storing timezone information as well. ClickHouse lets us set the timezone as the last argument of the [DateTime](https://clickhouse.com/docs/en/sql-reference/data-types/datetime) or [DateTime64](https://clickhouse.com/docs/en/sql-reference/data-types/datetime64) types:
```
CREATE TABLE dtz
(
    `id` Int8,
    `t` DateTime('Europe/Berlin')
)
ENGINE = MergeTree
ORDER BY tuple()
```
Having defined a timezone in our DDL, we can now insert times using different timezones:
```
INSERT INTO dtz SELECT 1, toDateTime('2022-12-12 12:13:14', 'America/New_York');
INSERT INTO dtz SELECT 2, toDateTime('2022-12-12 12:13:14');

SELECT * FROM dtz

┌─id─┬───────────────────t─┐
│  1 │ 2022-12-12 18:13:14 │
│  2 │ 2022-12-12 13:13:14 │
└────┴─────────────────────┘
```
Note how we inserted a time specified in the `America/New_York` timezone, and it is automatically rendered in `Europe/Berlin` when queried. When no timezone is specified, the server's local timezone is used.
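If we need to read a stored time in another timezone, we can also convert it at query time. A minimal sketch using the built-in `toTimeZone()` function against the `dtz` table above (the alias `t_ny` is illustrative):

```sql
SELECT
    t,                                         -- rendered in the column timezone, Europe/Berlin
    toTimeZone(t, 'America/New_York') AS t_ny  -- same instant, rendered in America/New_York
FROM dtz
```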
## Querying
We're going to explore ClickHouse's time-series querying capabilities using the [Wikistat](https://github.com/ClickHouse/ClickHouse/issues/15318) (Wikipedia pageviews) dataset:
```
CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
ORDER BY (time)
```
Let's populate this table with 1 billion records:
```
INSERT INTO wikistat
SELECT * FROM s3('https://ClickHouse-public-datasets.s3.amazonaws.com/wikistat/partitioned/wikistat*.native.zst')
LIMIT 1e9

0 rows in set. Elapsed: 421.868 sec. Processed 2.00 billion rows, 100.89 GB (4.74 million rows/s., 239.15 MB/s.)
```
### Aggregating based on time periods
The most common requirement is to aggregate data based on periods, e.g., to get the total number of hits for each day:
```
SELECT
    sum(hits) AS h,
    toDate(time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5

┌────────h─┬──────────d─┐
│ 31045470 │ 2015-05-01 │
│ 30309226 │ 2015-05-02 │
│ 33704223 │ 2015-05-03 │
│ 34700248 │ 2015-05-04 │
│ 34634353 │ 2015-05-05 │
└──────────┴────────────┘

5 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.78 billion rows/s., 45.40 GB/s.)
```
We've used the [toDate()](https://clickhouse.com/docs/en/sql-reference/functions/type-conversion-functions/#todate) function here, which converts the specified time to a date type. Alternatively, we can bucket by hour and filter on a specific date:
```
SELECT
    sum(hits) AS v,
    toStartOfHour(time) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 5

┌───────v─┬───────────────────h─┐
│ 1199168 │ 2015-05-01 01:00:00 │
│ 1207276 │ 2015-05-01 02:00:00 │
│ 1189451 │ 2015-05-01 03:00:00 │
│ 1119078 │ 2015-05-01 04:00:00 │
│ 1037526 │ 2015-05-01 05:00:00 │
└─────────┴─────────────────────┘

5 rows in set. Elapsed: 0.013 sec. Processed 7.72 million rows, 92.54 MB (593.64 million rows/s., 7.12 GB/s.)
```
The [`toStartOfHour()`](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofhour) function used here converts the given time to the start of the hour. ClickHouse has [time-bucketing functions](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofyear) for generating periods that cover almost all imaginable cases, allowing you to easily group by year, month, day, hour, or even arbitrary intervals, e.g., [5 minutes](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartoffiveminutes).
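As a quick illustrative sketch (the timestamp literal is arbitrary), a few of these functions side by side:

```sql
SELECT
    toStartOfYear(t),        -- 2015-01-01
    toStartOfMonth(t),       -- 2015-05-01
    toStartOfDay(t),         -- 2015-05-01 00:00:00
    toStartOfFiveMinutes(t)  -- 2015-05-01 12:30:00
FROM (SELECT toDateTime('2015-05-01 12:34:56') AS t)
```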
### Custom grouping intervals
We can also use the [toStartOfInterval()](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofintervaltime_or_data-interval-x-unit--time_zone) function to group by custom intervals. Let's say we want to group by 4-hour intervals:
```
SELECT
    sum(hits) AS v,
    toStartOfInterval(time, INTERVAL 4 HOUR) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 6

┌───────v─┬───────────────────h─┐
│ 3595895 │ 2015-05-01 00:00:00 │
│ 4161080 │ 2015-05-01 04:00:00 │
│ 4602523 │ 2015-05-01 08:00:00 │
│ 6072107 │ 2015-05-01 12:00:00 │
│ 6604783 │ 2015-05-01 16:00:00 │
│ 6009082 │ 2015-05-01 20:00:00 │
└─────────┴─────────────────────┘

6 rows in set. Elapsed: 0.020 sec. Processed 7.72 million rows, 92.54 MB (386.78 million rows/s., 4.64 GB/s.)
```
With the [`toStartOfInterval()`](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofintervaltime_or_data-interval-x-unit--time_zone) function, we use the [INTERVAL](https://clickhouse.com/docs/en/sql-reference/data-types/special-data-types/interval/) clause to set the required bucketing period.
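For example, an arbitrary 15-minute bucket (a sketch with an illustrative timestamp literal):

```sql
SELECT toStartOfInterval(toDateTime('2015-05-01 12:34:56'), INTERVAL 15 MINUTE)
-- 2015-05-01 12:30:00
```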
### Filling empty groups
In many cases we deal with sparse data, where some intervals are absent. This results in empty buckets. Let's take the following example, where we group data by 1-hour intervals. This outputs stats with some hours missing values:
```
SELECT
    toStartOfHour(time) AS h,
    sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC

┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │ <- missing values
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

14 rows in set. Elapsed: 0.029 sec. Processed 6.98 million rows, 225.76 MB (237.19 million rows/s., 7.67 GB/s.)
```
ClickHouse provides the [WITH FILL](https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier) modifier to address this. It fills all the empty hours with zeros, so we can better understand the distribution over time:
```
SELECT
    toStartOfHour(time) AS h,
    sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)

┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │
│ 2015-06-12 13:00:00 │         0 │
│ 2015-06-12 14:00:00 │         0 │
│ 2015-06-12 15:00:00 │         0 │
│ 2015-06-12 16:00:00 │         0 │
│ 2015-06-12 17:00:00 │         0 │
│ 2015-06-12 18:00:00 │         0 │
│ 2015-06-12 19:00:00 │         0 │
│ 2015-06-12 20:00:00 │         0 │
│ 2015-06-12 21:00:00 │         0 │
│ 2015-06-12 22:00:00 │         0 │
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

24 rows in set. Elapsed: 0.039 sec. Processed 6.98 million rows, 225.76 MB (180.92 million rows/s., 5.85 GB/s.)
```
### Rolling time windows
Sometimes we don't want to deal with the start of fixed intervals (like the start of a day or an hour), but with rolling windows. Let's say we want to understand total hits for a 24-hour window offset from 6 pm, rather than for calendar days. We use the [dateDiff()](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#date_diff) function to calculate the difference between a base time and each record's time. In this case, the `d` column represents the difference in days (e.g., 1 day ago, 2 days ago, etc.):
```
SELECT
    sum(hits),
    dateDiff('day', toDateTime('2015-05-01 18:00:00'), time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5

┌─sum(hits)─┬─d─┐
│  31045470 │ 0 │
│  30309226 │ 1 │
│  33704223 │ 2 │
│  34700248 │ 3 │
│  34634353 │ 4 │
└───────────┴───┘

5 rows in set. Elapsed: 0.283 sec. Processed 1.00 billion rows, 12.00 GB (3.54 billion rows/s., 42.46 GB/s.)
```
## Quick visual analysis
ClickHouse provides the [bar()](https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#bar) function to build quick visuals that help with data analysis. The following query quickly visualizes the most and least popular hours in terms of page views:
```
SELECT
    toHour(time) AS h,
    sum(hits) AS t,
    bar(t, 0, max(t) OVER (), 50) AS bar
FROM wikistat
GROUP BY h
ORDER BY h ASC

┌──h─┬─────────t─┬─bar────────────────────────────────────────────────┐
│  0 │ 146208847 │ ███████████████████████████████████████            │
│  1 │ 143713140 │ ██████████████████████████████████████             │
│  2 │ 144977675 │ ██████████████████████████████████████             │
│  3 │ 145089174 │ ██████████████████████████████████████             │
│  4 │ 139602368 │ █████████████████████████████████████              │
│  5 │ 130795734 │ ███████████████████████████████████                │
│  6 │ 126456113 │ █████████████████████████████████                  │
│  7 │ 127328312 │ ██████████████████████████████████                 │
│  8 │ 131772449 │ ███████████████████████████████████                │
│  9 │ 137695533 │ ████████████████████████████████████               │
│ 10 │ 143381876 │ ██████████████████████████████████████             │
│ 11 │ 146690963 │ ███████████████████████████████████████            │
│ 12 │ 155662847 │ █████████████████████████████████████████          │
│ 13 │ 169130249 │ █████████████████████████████████████████████      │
│ 14 │ 182213956 │ ████████████████████████████████████████████████   │
│ 15 │ 188534642 │ ██████████████████████████████████████████████████ │
│ 16 │ 189214224 │ ██████████████████████████████████████████████████ │
│ 17 │ 186824967 │ █████████████████████████████████████████████████  │
│ 18 │ 185885433 │ █████████████████████████████████████████████████  │
│ 19 │ 186112653 │ █████████████████████████████████████████████████  │
│ 20 │ 187530882 │ ██████████████████████████████████████████████████ │
│ 21 │ 185485979 │ █████████████████████████████████████████████████  │
│ 22 │ 175522556 │ ██████████████████████████████████████████████     │
│ 23 │ 157537595 │ ██████████████████████████████████████████         │
└────┴───────────┴────────────────────────────────────────────────────┘

24 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.79 billion rows/s., 45.53 GB/s.)
```
Note how we've used a window `max()` to compute the maximum hits per hour, passing this to the `bar()` function for visualization.
## Counters and Gauge metrics
There are two basic types of metrics we encounter when working with time series:
- Counters are used to count the total number of tracked events sliced by attributes and grouped by a time frame. A popular example here is tracking website visitors.
- Gauges are used to set a metric value that tends to change over time. A good example here is tracking CPU load.
Both metric types are easy to work with in ClickHouse and don't require any additional configuration. Counters can be queried using the `count()` or `sum()` functions, depending on how the events are stored. To query gauges efficiently, the [`any()`](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/any/) aggregate function can be used together with the [INTERPOLATE](https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier) modifier to fill in any missing data points:
```
CREATE TABLE metrics
(
    `time` DateTime,
    `name` String,
    `value` UInt32
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO metrics VALUES
    ('2022-12-28 06:32:16', 'cpu', 7),
    ('2022-12-28 14:31:22', 'cpu', 50),
    ('2022-12-28 14:30:30', 'cpu', 25),
    ('2022-12-28 14:25:36', 'cpu', 10),
    ('2022-12-28 11:32:08', 'cpu', 5),
    ('2022-12-28 10:32:12', 'cpu', 5);

SELECT
    toStartOfHour(time) AS h,
    any(value) AS v
FROM metrics
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)
INTERPOLATE ( v AS v )

┌───────────────────h─┬──v─┐
│ 2022-12-28 06:00:00 │  7 │
│ 2022-12-28 07:00:00 │  7 │ <- filled
│ 2022-12-28 08:00:00 │  7 │ <- filled
│ 2022-12-28 09:00:00 │  7 │ <- filled
│ 2022-12-28 10:00:00 │  5 │
│ 2022-12-28 11:00:00 │  5 │ <- filled
│ 2022-12-28 12:00:00 │  5 │ <- filled
│ 2022-12-28 13:00:00 │  5 │ <- filled
│ 2022-12-28 14:00:00 │ 50 │
└─────────────────────┴────┘
```
In this case, the marked values were automatically filled in by ClickHouse, following the gauge nature of the metric within a continuous time range.
### Histograms
A popular use case for time-series data is building histograms based on tracked events. Suppose we want to understand the distribution of pages based on their total hits for a specific date. We can use the [histogram()](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/parametric-functions/#histogram) function to automatically generate an adaptive histogram with a given number of bins, then use [arrayJoin()](https://clickhouse.com/docs/en/sql-reference/functions/array-join/) and [bar()](https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#bar) to visualize it:
```
WITH histogram(10)(hits) AS h
SELECT
    round(arrayJoin(h).1) AS l,
    round(arrayJoin(h).2) AS u,
    arrayJoin(h).3 AS w,
    bar(w, 0, max(w) OVER (), 20) AS b
FROM
(
    SELECT
        path,
        sum(hits) AS hits
    FROM wikistat
    WHERE date(time) = '2015-06-15'
    GROUP BY path
    HAVING hits > 10000.
)

┌───────l─┬───────u─┬──────w─┬─b────────────────────┐
│   10034 │   27769 │ 84.375 │ ████████████████████ │
│   27769 │   54281 │  19.75 │ █████                │
│   54281 │   79020 │  3.875 │ █                    │
│   79020 │   96858 │   2.75 │ █                    │
│   96858 │  117182 │   1.25 │ █                    │
│  117182 │  173244 │      1 │ █                    │
│  173244 │  232806 │  1.125 │ █                    │
│  232806 │  405693 │   1.75 │ █                    │
│  405693 │ 1126826 │  1.125 │ █                    │
│ 1126826 │ 1691188 │      1 │ █                    │
└─────────┴─────────┴────────┴──────────────────────┘

10 rows in set. Elapsed: 0.134 sec. Processed 6.64 million rows, 268.25 MB (49.48 million rows/s., 2.00 GB/s.)
```
We've filtered to only pages with more than 10k views. In the result set, `l` and `u` are the lower and upper boundaries of the bin, and `w` is the bin height (the number of items in the bin).
### Trends
Sometimes we want to understand how metrics change over time by calculating the difference between consecutive values. Let's compute daily hits for a given page (`path` column) and the change in this value from the previous day:
```
SELECT
    toDate(time) AS d,
    sum(hits) AS h,
    lagInFrame(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
    h - p AS trend
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15

┌──────────d─┬──────h─┬──────p─┬──trend─┐
│ 2015-05-01 │ 214612 │      0 │ 214612 │
│ 2015-05-02 │ 211546 │ 214612 │  -3066 │
│ 2015-05-03 │ 221412 │ 211546 │   9866 │
│ 2015-05-04 │ 219940 │ 221412 │  -1472 │
│ 2015-05-05 │ 211548 │ 219940 │  -8392 │
│ 2015-05-06 │ 212358 │ 211548 │    810 │
│ 2015-05-07 │ 208150 │ 212358 │  -4208 │
│ 2015-05-08 │ 208871 │ 208150 │    721 │
│ 2015-05-09 │ 210753 │ 208871 │   1882 │
│ 2015-05-10 │ 212918 │ 210753 │   2165 │
│ 2015-05-11 │ 211884 │ 212918 │  -1034 │
│ 2015-05-12 │ 212314 │ 211884 │    430 │
│ 2015-05-13 │ 211192 │ 212314 │  -1122 │
│ 2015-05-14 │ 206172 │ 211192 │  -5020 │
│ 2015-05-15 │ 195832 │ 206172 │ -10340 │
└────────────┴────────┴────────┴────────┘

15 rows in set. Elapsed: 0.550 sec. Processed 1.00 billion rows, 28.62 GB (1.82 billion rows/s., 52.00 GB/s.)
```
We've used the [lagInFrame()](https://clickhouse.com/docs/en/sql-reference/window-functions/) window function to get the previous `hits` value, then used it to calculate the difference as the `trend` column.
### Cumulative values
Following the previous example, sometimes we want to do the opposite: get a cumulative sum of a metric over time. This is usually used for counters to visualize cumulative growth, and can be easily implemented using window functions:
```
SELECT
    toDate(time) AS d,
    sum(hits) AS h,
    sum(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS c,
    bar(c, 0, 3200000, 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15

┌──────────d─┬──────h─┬───────c─┬─b─────────────────────────┐
│ 2015-05-01 │ 214612 │  214612 │ ██                        │
│ 2015-05-02 │ 211546 │  426158 │ ███                       │
│ 2015-05-03 │ 221412 │  647570 │ █████                     │
│ 2015-05-04 │ 219940 │  867510 │ ███████                   │
│ 2015-05-05 │ 211548 │ 1079058 │ ████████                  │
│ 2015-05-06 │ 212358 │ 1291416 │ ██████████                │
│ 2015-05-07 │ 208150 │ 1499566 │ ████████████              │
│ 2015-05-08 │ 208871 │ 1708437 │ █████████████             │
│ 2015-05-09 │ 210753 │ 1919190 │ ███████████████           │
│ 2015-05-10 │ 212918 │ 2132108 │ █████████████████         │
│ 2015-05-11 │ 211884 │ 2343992 │ ██████████████████        │
│ 2015-05-12 │ 212314 │ 2556306 │ ████████████████████      │
│ 2015-05-13 │ 211192 │ 2767498 │ ██████████████████████    │
│ 2015-05-14 │ 206172 │ 2973670 │ ███████████████████████   │
│ 2015-05-15 │ 195832 │ 3169502 │ █████████████████████████ │
└────────────┴────────┴─────────┴───────────────────────────┘

15 rows in set. Elapsed: 0.557 sec. Processed 1.00 billion rows, 28.62 GB (1.80 billion rows/s., 51.40 GB/s.)
```
We've built a cumulative daily hits sum and visualized growth for a given page over a 15-day period.
### Rates
Calculating metric rates (speed of change per time unit) is also popular when working with time series. Suppose we want the per-second hit rate for a certain page on a given date, grouped by hour:
```
SELECT
    toStartOfHour(time) AS t,
    sum(hits) AS h,
    round(h / (60 * 60), 2) AS rate,
    bar(rate * 10, 0, max(rate * 10) OVER (), 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY t
ORDER BY t ASC
LIMIT 23

┌───────────────────t─┬─────h─┬─rate─┬─b─────────────────────────┐
│ 2015-05-01 01:00:00 │  6749 │ 1.87 │ ██████████████            │
│ 2015-05-01 02:00:00 │  6054 │ 1.68 │ █████████████             │
│ 2015-05-01 03:00:00 │  5823 │ 1.62 │ █████████████             │
│ 2015-05-01 04:00:00 │  5908 │ 1.64 │ █████████████             │
│ 2015-05-01 05:00:00 │  6131 │  1.7 │ █████████████             │
│ 2015-05-01 06:00:00 │  7067 │ 1.96 │ ███████████████           │
│ 2015-05-01 07:00:00 │  8169 │ 2.27 │ ██████████████████        │
│ 2015-05-01 08:00:00 │  9526 │ 2.65 │ █████████████████████     │
│ 2015-05-01 09:00:00 │ 10474 │ 2.91 │ ███████████████████████   │
│ 2015-05-01 10:00:00 │ 10389 │ 2.89 │ ██████████████████████    │
│ 2015-05-01 11:00:00 │  9830 │ 2.73 │ █████████████████████     │
│ 2015-05-01 12:00:00 │ 10712 │ 2.98 │ ███████████████████████   │
│ 2015-05-01 13:00:00 │ 10301 │ 2.86 │ ██████████████████████    │
│ 2015-05-01 14:00:00 │ 10181 │ 2.83 │ ██████████████████████    │
│ 2015-05-01 15:00:00 │ 10324 │ 2.87 │ ██████████████████████    │
│ 2015-05-01 16:00:00 │ 10497 │ 2.92 │ ███████████████████████   │
│ 2015-05-01 17:00:00 │ 10676 │ 2.97 │ ███████████████████████   │
│ 2015-05-01 18:00:00 │ 11121 │ 3.09 │ ████████████████████████  │
│ 2015-05-01 19:00:00 │ 11277 │ 3.13 │ ████████████████████████  │
│ 2015-05-01 20:00:00 │ 11553 │ 3.21 │ █████████████████████████ │
│ 2015-05-01 21:00:00 │ 11637 │ 3.23 │ █████████████████████████ │
│ 2015-05-01 22:00:00 │ 11298 │ 3.14 │ ████████████████████████  │
│ 2015-05-01 23:00:00 │  8915 │ 2.48 │ ███████████████████       │
└─────────────────────┴───────┴──────┴───────────────────────────┘

23 rows in set. Elapsed: 0.572 sec. Processed 1.00 billion rows, 28.62 GB (1.75 billion rows/s., 50.06 GB/s.)
```
## Improving time series storage efficiency
### Type optimization
The general approach to optimizing storage efficiency is using [optimal data types](https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema). Let's take the `project` and `subproject` columns. These columns are of type String but have a relatively small number of unique values:
```
SELECT
    uniq(project),
    uniq(subproject)
FROM wikistat

┌─uniq(project)─┬─uniq(subproject)─┐
│          1095 │               99 │
└───────────────┴──────────────────┘

1 row in set. Elapsed: 0.895 sec. Processed 1.00 billion rows, 20.43 GB (1.12 billion rows/s., 22.84 GB/s.)
```
This means we can use the [LowCardinality()](https://clickhouse.com/docs/en/sql-reference/data-types/lowcardinality/) data type, which uses dictionary-based encoding. ClickHouse then stores an internal value ID instead of the original string value, which in turn saves a lot of space:
```
ALTER TABLE wikistat
MODIFY COLUMN `project` LowCardinality(String),
MODIFY COLUMN `subproject` LowCardinality(String)
```
We've also used the [UInt64](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint/) type for the `hits` column, which takes 8 bytes even though the column's maximum value is relatively small:
```
SELECT max(hits)
FROM wikistat

┌─max(hits)─┐
│    237913 │
└───────────┘
```
Given this value, we can use `UInt32` instead, which takes only 4 bytes and allows storing values up to ~4.3 billion:
```
ALTER TABLE wikistat MODIFY COLUMN `hits` UInt32
```
This halves the in-memory size of this column (4 bytes per value instead of 8). Note that the size on disk is likely to remain largely unchanged due to compression. But be careful not to pick data types that are too small for future values!
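To check the effect of such changes, we can inspect per-column sizes via the `system.columns` table. A sketch (assuming the size columns available in recent ClickHouse versions):

```sql
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'wikistat'
```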
### Codecs to optimize sequence storage
When we deal with sequential data, which time-series data effectively is, we can further improve storage efficiency by using [specialized codecs](https://clickhouse.com/docs/en/sql-reference/statements/create/table/#specialized-codecs). The general idea is to store the changes between values instead of the absolute values themselves, which requires much less space when dealing with slowly changing data:
```
ALTER TABLE wikistat MODIFY COLUMN `time` CODEC(Delta, ZSTD)
```
We've used the [Delta](https://clickhouse.com/docs/en/sql-reference/statements/create/table/#delta) codec for the `time` column, which fits time-series data well. The [right ordering key](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-cardinality/) can also save disk space. Since we usually want to filter by path, we should add `path` to the ordering key, which requires recreating the table. Let's wrap it all up and compare storage efficiency with and without the optimized types:
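Putting the type, codec, and ordering-key changes together, the recreated table might look like the following sketch (the table name and exact ordering key are illustrative; choose the key to match your query patterns):

```sql
CREATE TABLE wikistat_optimized
(
    `time` DateTime CODEC(Delta, ZSTD),
    `project` LowCardinality(String),
    `subproject` LowCardinality(String),
    `path` String,
    `hits` UInt32
)
ENGINE = MergeTree
ORDER BY (path, time)
```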
| **Unoptimized table** | **Optimized table** |
|---|---|
| **11\.09 GiB** | **1\.68 GiB** |
As we can see, we have reduced storage requirements by more than six times without any loss of data. For further details on optimizing storage using types and codecs, see our recent blog [Optimizing ClickHouse with Schemas and Codecs](https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema).
## Improving time-series query performance
### Optimize ORDER BY keys
Before attempting other optimizations, users should optimize their [ordering key](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-cardinality/) to ensure ClickHouse produces the fastest possible results. Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the `project` and `subproject` columns. In this case, it's a good idea to add them to the ordering key, as well as the `time` column, since we query on time as well:
```
CREATE TABLE optimized_wikistat
(
    …
)
ENGINE = MergeTree
ORDER BY (project, subproject, time)
```
Let's now compare multiple queries to get an idea of how essential the ordering key expression is to performance. Note that we have also applied our previous data type and codec optimizations:
| **Query** | **Ordering key: (time)** | **Ordering key: (project, subproject, time)** |
|---|---|---|
| | [0.518 sec](https://sql.clickhouse.com/?query_id=HPGEZK7USHGDUGTJTHJAYV) | [0.258 sec](https://sql.clickhouse.com/?query_id=GFCCTYVJ3YVGMTOFXDQZJT) |
| | [0.67 sec](https://sql.clickhouse.com/?query_id=2Q8G2SWXOOSHHVVBQB8ND2) | [0.025 sec](https://sql.clickhouse.com/?query_id=1G21SEVPJZM1XFRTPR6MVR) |
| | [0.65 sec](https://sql.clickhouse.com/?query_id=4TB6QY6HMELIW5PUWRNG4M) | [0.014 sec](https://sql.clickhouse.com/?query_id=9EBNNA5MNGKSDJMNLGMCHB) |
| | [0.148 sec](https://sql.clickhouse.com/?query_id=WW6JBMONCU5VKCKQXSYTFM) | [0.010 sec](https://sql.clickhouse.com/?query_id=3EYNNE9LVWLHSAH2MD4N3Y) |
Note how we got a 2 to 40x performance increase by picking a more appropriate ordering key. For further details on choosing a primary key, including how to decide the order of the columns, read our guide [here](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-intro).
### Materialized views
Another option is to use [materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#materialized-view) to aggregate and store the results of popular queries. These results can be queried instead of the original table. Suppose the following query is executed quite often in our case:
```
SELECT
    path,
    SUM(hits) AS v
FROM wikistat
WHERE toStartOfMonth(time) = '2015-05-01'
GROUP BY path
ORDER BY v DESC
LIMIT 10

┌─path──────────────────┬────────v─┐
│ -                     │ 89742164 │
│ Angelsberg            │ 19191582 │
│ Ana_Sayfa             │  6376578 │
│ Academy_Awards        │  4901470 │
│ Accueil_(homonymie)   │  3810047 │
│ 2015_in_spaceflight   │  2077195 │
│ Albert_Einstein       │  1621175 │
│ 19_Kids_and_Counting  │  1432484 │
│ 2015_Nepal_earthquake │  1406457 │
│ Alive                 │  1390624 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 1.016 sec. Processed 256.84 million rows, 10.17 GB (252.69 million rows/s., 10.01 GB/s.)
```
We can create the following materialized view:
```
CREATE MATERIALIZED VIEW blogs.wikistat_top
ENGINE = SummingMergeTree
ORDER BY (month, hits)
POPULATE AS
SELECT
    path,
    toStartOfMonth(time) AS month,
    sum(hits) AS hits
FROM blogs.wikistat
GROUP BY path, month

0 rows in set. Elapsed: 8.589 sec. Processed 1.00 billion rows, 40.52 GB (116.43 million rows/s., 4.72 GB/s.)
```
Now we can query the materialized view instead of the original table:
```
SELECT
    path,
    hits
FROM wikistat_top
WHERE month = '2015-05-01'
ORDER BY hits DESC
LIMIT 10

┌─path──────────────────┬─────hits─┐
│ -                     │ 89742164 │
│ Angelsberg            │ 19191582 │
│ Ana_Sayfa             │  6376578 │
│ Academy_Awards        │  4901470 │
│ Accueil_(homonymie)   │  3810047 │
│ 2015_in_spaceflight   │  2077195 │
│ Albert_Einstein       │  1621175 │
│ 19_Kids_and_Counting  │  1432484 │
│ 2015_Nepal_earthquake │  1406457 │
│ Alive                 │  1390624 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 0.005 sec. Processed 24.58 thousand rows, 935.16 KB (5.26 million rows/s., 200.31 MB/s.)
```
Our performance improvement here is dramatic. We will publish a blog post on materialized views soon, so watch this space\!
## Scaling time series
ClickHouse is efficient for both storage and queries, and scales easily to petabytes while maintaining the same level of performance and simplicity. In a future post, we will explore techniques for scaling to almost 400 billion rows using the [full Wikistat dataset](https://clickhouse.com/docs/en/getting-started/example-datasets/wikistat). We will show how to scale storage and processing capacity using our Cloud service, which separates storage and compute and handles this automatically, or by using a manual clustering solution.
## Summary
In this post, we have shown how you can efficiently store and query time series data using the power of SQL and the performance of ClickHouse. Given this, you won't need to install additional extensions or tools to collect and process time series, as ClickHouse has everything in place.
- [Introduction](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#introduction)
- [Date and time types available in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#date-and-time-types-available-in-clickhouse)
- [Querying](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#querying)
- [Quick visual analysis](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#quick-visual-analysis)
- [Counters and Gauge metrics](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#counters-and-gauge-metrics)
- [Improving time series storage efficiency](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#improving-time-series-storage-efficiency)
- [Improving time-series query performance](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#improving-time-series-query-performance)
- [Scaling time series](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#scaling-time-series)
- [Summary](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#summary)
Many datasets are collected over time to analyze and discover meaningful trends. Each data point usually has a time assigned when we collect logs or business events. When exploring our data during an analysis stage, we often slice or group by different periods to understand how our data changes over time. Any data that changes over time in any way is **time-series data**. ClickHouse has powerful tools to store and process time-series data efficiently and can be used for both simple solutions and data discovery, as well as for powering real-time analytical applications at the Petabyte scale.
This blog post provides tips and tricks for working with [time series data](https://clickhouse.com/engineering-resources/what-is-time-series-database) based on everyday tasks that we see our users needing to perform. We cover querying and common data type problems, such as handling gauges, and explore how performance can be improved as we scale.
All examples in this post can be reproduced in our [sql.clickhouse.com](https://sql.clickhouse.com/) environment (see the `blogs` database). Alternatively, if you want to dive deeper into this dataset, [ClickHouse Cloud](https://clickhouse.cloud/signUp) is a great starting point - spin up a cluster using a free trial, load the data, let us deal with the infrastructure, and get querying!
## Date and time types available in ClickHouse
ClickHouse has several date and time types. Depending on your use case, different types can be applied. Using the [Date](https://clickhouse.com/docs/en/sql-reference/data-types/date) type for dates should be sufficient in most cases. This type only requires 2 bytes to store a date but limits the range to `[1970-01-01, 2149-06-06]`. The [DateTime](https://clickhouse.com/docs/en/sql-reference/data-types/datetime) type allows storing dates and times up to the year 2106. For cases where more precision is required, the [DateTime64](https://clickhouse.com/docs/en/sql-reference/data-types/datetime64) type can be used. This allows storing time with up to nanosecond precision:
```
CREATE TABLE dates
(
    `date` Date,
    `datetime` DateTime,
    `precise_datetime` DateTime64(3),
    `very_precise_datetime` DateTime64(9)
)
ENGINE = MergeTree
ORDER BY tuple()
```
We can use the `now()` function to return the current time and `now64()` to get it in a specified precision via the first argument.
```
INSERT INTO dates SELECT NOW(), NOW(), NOW64(3), NOW64(9);
```
This will populate our columns with time values according to the column type:
```
SELECT * FROM dates

Row 1:
──────
date:                  2022-12-27
datetime:              2022-12-27 12:51:15
precise_datetime:      2022-12-27 12:51:15.329
very_precise_datetime: 2022-12-27 12:51:15.329098089
```
In many practical cases, we also need to store time zones. ClickHouse lets us set the time zone as the last argument of the [DateTime](https://clickhouse.com/docs/en/sql-reference/data-types/datetime) or [DateTime64](https://clickhouse.com/docs/en/sql-reference/data-types/datetime64) types:
```
CREATE TABLE dtz
(
    `id` Int8,
    `t` DateTime('Europe/Berlin')
)
ENGINE = MergeTree
ORDER BY tuple()
```
Having defined a timezone in our DDL, we can now insert times using different timezones:
```
INSERT INTO dtz SELECT 1, toDateTime('2022-12-12 12:13:14', 'America/New_York')
INSERT INTO dtz SELECT 2, toDateTime('2022-12-12 12:13:14')

SELECT * FROM dtz

┌─id─┬───────────────────t─┐
│  1 │ 2022-12-12 18:13:14 │
│  2 │ 2022-12-12 13:13:14 │
└────┴─────────────────────┘
```
Note how we inserted a time given in the `America/New_York` time zone, and it was automatically converted to `Europe/Berlin` when queried. When no time zone is specified, the server's local time zone is used.
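As a quick sanity check of this behavior (a sketch using the `dtz` table above; `toTimeZone()` is ClickHouse's conversion function), we can render the same stored instant in another time zone at query time:

```
SELECT
    t,                                       -- shown in the column's Europe/Berlin time zone
    toTimeZone(t, 'America/New_York') AS ny  -- the same instant shown in New York time
FROM dtz
```

Both columns represent identical points in time; only the rendering differs.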
## Querying
We're going to explore ClickHouse's time-series querying capabilities with the [Wikistat](https://github.com/ClickHouse/ClickHouse/issues/15318) (Wikipedia pageviews data) dataset:
```
CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
ORDER BY (time)
```
Let's populate this table with 1 billion records:
```
INSERT INTO wikistat
SELECT *
FROM s3('https://ClickHouse-public-datasets.s3.amazonaws.com/wikistat/partitioned/wikistat*.native.zst')
LIMIT 1e9

0 rows in set. Elapsed: 421.868 sec. Processed 2.00 billion rows, 100.89 GB (4.74 million rows/s., 239.15 MB/s.)
```
The most common requirement is to aggregate data by period, e.g. to get the total number of hits for each day:
```
SELECT
    sum(hits) AS h,
    toDate(time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5

┌────────h─┬──────────d─┐
│ 31045470 │ 2015-05-01 │
│ 30309226 │ 2015-05-02 │
│ 33704223 │ 2015-05-03 │
│ 34700248 │ 2015-05-04 │
│ 34634353 │ 2015-05-05 │
└──────────┴────────────┘

5 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.78 billion rows/s., 45.40 GB/s.)
```
We've used the [toDate()](https://clickhouse.com/docs/en/sql-reference/functions/type-conversion-functions/#todate) function here, which converts the specified time to a date type. Alternatively, we can bucket by hour and filter on a specific date:
```
SELECT
    sum(hits) AS v,
    toStartOfHour(time) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 5

┌───────v─┬───────────────────h─┐
│ 1199168 │ 2015-05-01 01:00:00 │
│ 1207276 │ 2015-05-01 02:00:00 │
│ 1189451 │ 2015-05-01 03:00:00 │
│ 1119078 │ 2015-05-01 04:00:00 │
│ 1037526 │ 2015-05-01 05:00:00 │
└─────────┴─────────────────────┘

5 rows in set. Elapsed: 0.013 sec. Processed 7.72 million rows, 92.54 MB (593.64 million rows/s., 7.12 GB/s.)
```
The [`toStartOfHour()`](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofhour) function used here converts the given time to the start of the hour. ClickHouse has [batching functions](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofyear) for generating periods that cover almost all imaginable cases, allowing you to group by year, month, day, hour, or even arbitrary intervals, e.g., [5 minutes](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartoffiveminutes), easily.
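For example (a quick sketch against the same table; output omitted), switching granularity is just a matter of swapping the bucketing function:

```
-- daily buckets
SELECT toStartOfDay(time) AS period, sum(hits)
FROM wikistat GROUP BY period ORDER BY period;

-- 5-minute buckets
SELECT toStartOfFiveMinutes(time) AS period, sum(hits)
FROM wikistat GROUP BY period ORDER BY period;
```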
We can also use the [toStartOfInterval()](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofintervaltime_or_data-interval-x-unit--time_zone) function to group by custom intervals. Let's say we want to group by 4-hour intervals:
```
SELECT
    sum(hits) AS v,
    toStartOfInterval(time, INTERVAL 4 HOUR) AS h
FROM wikistat
WHERE date(time) = '2015-05-01'
GROUP BY h
ORDER BY h ASC
LIMIT 6

┌───────v─┬───────────────────h─┐
│ 3595895 │ 2015-05-01 00:00:00 │
│ 4161080 │ 2015-05-01 04:00:00 │
│ 4602523 │ 2015-05-01 08:00:00 │
│ 6072107 │ 2015-05-01 12:00:00 │
│ 6604783 │ 2015-05-01 16:00:00 │
│ 6009082 │ 2015-05-01 20:00:00 │
└─────────┴─────────────────────┘

6 rows in set. Elapsed: 0.020 sec. Processed 7.72 million rows, 92.54 MB (386.78 million rows/s., 4.64 GB/s.)
```
With the [`toStartOfInterval()`](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#tostartofintervaltime_or_data-interval-x-unit--time_zone) function, we use the [INTERVAL](https://clickhouse.com/docs/en/sql-reference/data-types/special-data-types/interval/) clause to set the required batching period.
In many cases, we deal with sparse data that has absent intervals, which results in empty buckets. Let's take the following example, where we group data by 1-hour intervals. This outputs stats with some hours missing values:
```
SELECT
    toStartOfHour(time) AS h,
    sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC

┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │ <- missing values
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

14 rows in set. Elapsed: 0.029 sec. Processed 6.98 million rows, 225.76 MB (237.19 million rows/s., 7.67 GB/s.)
```
ClickHouse provides the [WITH FILL](https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier) modifier to address this. It fills all the empty hours with zeros, so we can better understand the distribution over time:
```
SELECT
    toStartOfHour(time) AS h,
    sum(hits)
FROM wikistat
WHERE (project = 'it') AND (subproject = 'm') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)

┌───────────────────h─┬─sum(hits)─┐
│ 2015-06-12 00:00:00 │     16246 │
│ 2015-06-12 01:00:00 │      7900 │
│ 2015-06-12 02:00:00 │      4517 │
│ 2015-06-12 03:00:00 │      2982 │
│ 2015-06-12 04:00:00 │      2748 │
│ 2015-06-12 05:00:00 │      4581 │
│ 2015-06-12 06:00:00 │      8908 │
│ 2015-06-12 07:00:00 │     13514 │
│ 2015-06-12 08:00:00 │     18327 │
│ 2015-06-12 09:00:00 │     22541 │
│ 2015-06-12 10:00:00 │     25366 │
│ 2015-06-12 11:00:00 │     25383 │
│ 2015-06-12 12:00:00 │     29074 │
│ 2015-06-12 13:00:00 │         0 │
│ 2015-06-12 14:00:00 │         0 │
│ 2015-06-12 15:00:00 │         0 │
│ 2015-06-12 16:00:00 │         0 │
│ 2015-06-12 17:00:00 │         0 │
│ 2015-06-12 18:00:00 │         0 │
│ 2015-06-12 19:00:00 │         0 │
│ 2015-06-12 20:00:00 │         0 │
│ 2015-06-12 21:00:00 │         0 │
│ 2015-06-12 22:00:00 │         0 │
│ 2015-06-12 23:00:00 │     27199 │
└─────────────────────┴───────────┘

24 rows in set. Elapsed: 0.039 sec. Processed 6.98 million rows, 225.76 MB (180.92 million rows/s., 5.85 GB/s.)
```
Sometimes, we don't want to deal with the start of fixed intervals (like the start of a day or an hour) but with offset windows. Let's say we want to understand the total hits for 24-hour windows offset from 6 pm, rather than calendar days. We use the [dateDiff()](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/#date_diff) function to calculate the difference between a base point time and each record's time. In this case, the `d` column represents the difference in days (e.g., 1 day ago, 2 days ago, etc.):
```
SELECT
    sum(hits),
    dateDiff('day', toDateTime('2015-05-01 18:00:00'), time) AS d
FROM wikistat
GROUP BY d
ORDER BY d ASC
LIMIT 5

┌─sum(hits)─┬─d─┐
│  31045470 │ 0 │
│  30309226 │ 1 │
│  33704223 │ 2 │
│  34700248 │ 3 │
│  34634353 │ 4 │
└───────────┴───┘

5 rows in set. Elapsed: 0.283 sec. Processed 1.00 billion rows, 12.00 GB (3.54 billion rows/s., 42.46 GB/s.)
```
## Quick visual analysis
ClickHouse provides the [bar()](https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#bar) function to build quick visuals that help with data analysis. The following quickly visualizes the most and least popular hours in terms of page views:
```
SELECT
    toHour(time) AS h,
    sum(hits) AS t,
    bar(t, 0, max(t) OVER (), 50) AS bar
FROM wikistat
GROUP BY h
ORDER BY h ASC

┌──h─┬─────────t─┬─bar────────────────────────────────────────────────┐
│  0 │ 146208847 │ ███████████████████████████████████████            │
│  1 │ 143713140 │ ██████████████████████████████████████             │
│  2 │ 144977675 │ ██████████████████████████████████████             │
│  3 │ 145089174 │ ██████████████████████████████████████             │
│  4 │ 139602368 │ █████████████████████████████████████              │
│  5 │ 130795734 │ ███████████████████████████████████                │
│  6 │ 126456113 │ █████████████████████████████████                  │
│  7 │ 127328312 │ ██████████████████████████████████                 │
│  8 │ 131772449 │ ███████████████████████████████████                │
│  9 │ 137695533 │ ████████████████████████████████████               │
│ 10 │ 143381876 │ ██████████████████████████████████████             │
│ 11 │ 146690963 │ ███████████████████████████████████████            │
│ 12 │ 155662847 │ █████████████████████████████████████████          │
│ 13 │ 169130249 │ █████████████████████████████████████████████      │
│ 14 │ 182213956 │ ████████████████████████████████████████████████   │
│ 15 │ 188534642 │ ██████████████████████████████████████████████████ │
│ 16 │ 189214224 │ ██████████████████████████████████████████████████ │
│ 17 │ 186824967 │ █████████████████████████████████████████████████  │
│ 18 │ 185885433 │ █████████████████████████████████████████████████  │
│ 19 │ 186112653 │ █████████████████████████████████████████████████  │
│ 20 │ 187530882 │ ██████████████████████████████████████████████████ │
│ 21 │ 185485979 │ █████████████████████████████████████████████████  │
│ 22 │ 175522556 │ ██████████████████████████████████████████████     │
│ 23 │ 157537595 │ ██████████████████████████████████████████         │
└────┴───────────┴────────────────────────────────────────────────────┘

24 rows in set. Elapsed: 0.264 sec. Processed 1.00 billion rows, 12.00 GB (3.79 billion rows/s., 45.53 GB/s.)
```
Note how we've used a window max() to compute the max hits per hour, passing this to the `bar()` function for visualization.
## Counters and Gauge metrics
There are two basic types of metrics we encounter when working with time series:
- Counters are used to count the total number of tracked events sliced by attributes and grouped by a time frame. A popular example here is tracking website visitors.
- Gauges are used to set a metric value that tends to change over time. A good example here is tracking CPU load.
Both metric types are easy to work with in ClickHouse and don't require any additional configuration. Counters can be queried using the `count()` or `sum()` functions, depending on the storage policy. To efficiently query gauges, the [`any()`](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/any/) aggregate function can be used together with the [INTERPOLATE](https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier) modifier to fill in any missing data points:
```
CREATE TABLE metrics
(
    `time` DateTime,
    `name` String,
    `value` UInt32
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO metrics VALUES
    ('2022-12-28 06:32:16', 'cpu', 7),
    ('2022-12-28 14:31:22', 'cpu', 50),
    ('2022-12-28 14:30:30', 'cpu', 25),
    ('2022-12-28 14:25:36', 'cpu', 10),
    ('2022-12-28 11:32:08', 'cpu', 5),
    ('2022-12-28 10:32:12', 'cpu', 5);

SELECT
    toStartOfHour(time) AS h,
    any(value) AS v
FROM metrics
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)
INTERPOLATE ( v AS v )

┌───────────────────h─┬──v─┐
│ 2022-12-28 06:00:00 │  7 │
│ 2022-12-28 07:00:00 │  7 │ <- filled
│ 2022-12-28 08:00:00 │  7 │ <- filled
│ 2022-12-28 09:00:00 │  7 │ <- filled
│ 2022-12-28 10:00:00 │  5 │
│ 2022-12-28 11:00:00 │  5 │ <- filled
│ 2022-12-28 12:00:00 │  5 │ <- filled
│ 2022-12-28 13:00:00 │  5 │ <- filled
│ 2022-12-28 14:00:00 │ 50 │
└─────────────────────┴────┘
```
In this case, the highlighted values were automatically filled in by ClickHouse to follow the gauge nature of the metric within a continuous time range.
A popular use case for time series data is to build histograms based on tracked events. Suppose we want to understand the distribution of pages by their total hits for a specific date. We can use the [histogram()](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/parametric-functions/#histogram) function to automatically generate an adaptive histogram based on the number of bins, and then use [arrayJoin()](https://clickhouse.com/docs/en/sql-reference/functions/array-join/) and [bar()](https://clickhouse.com/docs/en/sql-reference/functions/other-functions/#bar) to visualize it:
```
WITH histogram(10)(hits) AS h
SELECT
    round(arrayJoin(h).1) AS l,
    round(arrayJoin(h).2) AS u,
    arrayJoin(h).3 AS w,
    bar(w, 0, max(w) OVER (), 20) AS b
FROM
(
    SELECT path, sum(hits) AS hits
    FROM wikistat
    WHERE date(time) = '2015-06-15'
    GROUP BY path
    HAVING hits > 10000.
)

┌───────l─┬───────u─┬──────w─┬─b────────────────────┐
│   10034 │   27769 │ 84.375 │ ████████████████████ │
│   27769 │   54281 │  19.75 │ █████                │
│   54281 │   79020 │  3.875 │ █                    │
│   79020 │   96858 │   2.75 │ ▋                    │
│   96858 │  117182 │   1.25 │ ▎                    │
│  117182 │  173244 │      1 │ ▏                    │
│  173244 │  232806 │  1.125 │ ▎                    │
│  232806 │  405693 │   1.75 │ ▍                    │
│  405693 │ 1126826 │  1.125 │ ▎                    │
│ 1126826 │ 1691188 │      1 │ ▏                    │
└─────────┴─────────┴────────┴──────────────────────┘

10 rows in set. Elapsed: 0.134 sec. Processed 6.64 million rows, 268.25 MB (49.48 million rows/s., 2.00 GB/s.)
```
We've filtered to only pages with more than 10k views. In the result set, `l` and `u` are the lower and upper boundaries of each bin, and `w` is the bin height (the count of items in the bin).
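If evenly sized bins are preferable to the adaptive ones chosen by `histogram()`, a fixed bucket width can be emulated with integer arithmetic (a sketch; the 100000 bucket width here is an arbitrary choice for illustration):

```
SELECT
    intDiv(hits, 100000) * 100000 AS bucket,  -- round each value down to its bucket's lower bound
    count() AS pages
FROM
(
    SELECT path, sum(hits) AS hits
    FROM wikistat
    WHERE date(time) = '2015-06-15'
    GROUP BY path
    HAVING hits > 10000
)
GROUP BY bucket
ORDER BY bucket
```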
Sometimes we want to understand how metrics change over time by calculating the difference between consecutive values. Let's compute daily hits for a given page (the `path` column) and the change in this value from the previous day:
```
SELECT
    toDate(time) AS d,
    sum(hits) AS h,
    lagInFrame(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS p,
    h - p AS trend
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15

┌──────────d─┬──────h─┬──────p─┬──trend─┐
│ 2015-05-01 │ 214612 │      0 │ 214612 │
│ 2015-05-02 │ 211546 │ 214612 │  -3066 │
│ 2015-05-03 │ 221412 │ 211546 │   9866 │
│ 2015-05-04 │ 219940 │ 221412 │  -1472 │
│ 2015-05-05 │ 211548 │ 219940 │  -8392 │
│ 2015-05-06 │ 212358 │ 211548 │    810 │
│ 2015-05-07 │ 208150 │ 212358 │  -4208 │
│ 2015-05-08 │ 208871 │ 208150 │    721 │
│ 2015-05-09 │ 210753 │ 208871 │   1882 │
│ 2015-05-10 │ 212918 │ 210753 │   2165 │
│ 2015-05-11 │ 211884 │ 212918 │  -1034 │
│ 2015-05-12 │ 212314 │ 211884 │    430 │
│ 2015-05-13 │ 211192 │ 212314 │  -1122 │
│ 2015-05-14 │ 206172 │ 211192 │  -5020 │
│ 2015-05-15 │ 195832 │ 206172 │ -10340 │
└────────────┴────────┴────────┴────────┘

15 rows in set. Elapsed: 0.550 sec. Processed 1.00 billion rows, 28.62 GB (1.82 billion rows/s., 52.00 GB/s.)
```
We've used the [lagInFrame() window](https://clickhouse.com/docs/en/sql-reference/window-functions/) function to get the previous `hits` value, and then used it to calculate the difference as the `trend` column.
Following the previous example, sometimes we want to do the opposite - get a cumulative sum of certain metrics over time. This is usually used for counters to visualize cumulative growth and can be easily implemented using window functions:
```
SELECT
    toDate(time) AS d,
    sum(hits) AS h,
    sum(h) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS c,
    bar(c, 0, 3200000, 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY d
ORDER BY d ASC
LIMIT 15

┌──────────d─┬──────h─┬───────c─┬─b─────────────────────────┐
│ 2015-05-01 │ 214612 │  214612 │ ██                        │
│ 2015-05-02 │ 211546 │  426158 │ ███                       │
│ 2015-05-03 │ 221412 │  647570 │ █████                     │
│ 2015-05-04 │ 219940 │  867510 │ ███████                   │
│ 2015-05-05 │ 211548 │ 1079058 │ ████████                  │
│ 2015-05-06 │ 212358 │ 1291416 │ ██████████                │
│ 2015-05-07 │ 208150 │ 1499566 │ ████████████              │
│ 2015-05-08 │ 208871 │ 1708437 │ █████████████             │
│ 2015-05-09 │ 210753 │ 1919190 │ ███████████████           │
│ 2015-05-10 │ 212918 │ 2132108 │ █████████████████         │
│ 2015-05-11 │ 211884 │ 2343992 │ ██████████████████        │
│ 2015-05-12 │ 212314 │ 2556306 │ ████████████████████      │
│ 2015-05-13 │ 211192 │ 2767498 │ ██████████████████████    │
│ 2015-05-14 │ 206172 │ 2973670 │ ███████████████████████   │
│ 2015-05-15 │ 195832 │ 3169502 │ █████████████████████████ │
└────────────┴────────┴─────────┴───────────────────────────┘

15 rows in set. Elapsed: 0.557 sec. Processed 1.00 billion rows, 28.62 GB (1.80 billion rows/s., 51.40 GB/s.)
```
We've built a cumulative daily hits sum and visualized the growth for a given page over a 15-day period.
Calculating metric rates (change per time unit) is also popular when working with time series. Suppose we want to get the hit rate per second for a certain page, for a given date, grouped by hour:
```
SELECT
    toStartOfHour(time) AS t,
    sum(hits) AS h,
    round(h / (60 * 60), 2) AS rate,
    bar(rate * 10, 0, max(rate * 10) OVER (), 25) AS b
FROM wikistat
WHERE path = 'Ana_Sayfa'
GROUP BY t
ORDER BY t ASC
LIMIT 23

┌───────────────────t─┬─────h─┬─rate─┬─b─────────────────────────┐
│ 2015-05-01 01:00:00 │  6749 │ 1.87 │ ██████████████            │
│ 2015-05-01 02:00:00 │  6054 │ 1.68 │ █████████████             │
│ 2015-05-01 03:00:00 │  5823 │ 1.62 │ █████████████             │
│ 2015-05-01 04:00:00 │  5908 │ 1.64 │ █████████████             │
│ 2015-05-01 05:00:00 │  6131 │  1.7 │ █████████████             │
│ 2015-05-01 06:00:00 │  7067 │ 1.96 │ ███████████████           │
│ 2015-05-01 07:00:00 │  8169 │ 2.27 │ ██████████████████        │
│ 2015-05-01 08:00:00 │  9526 │ 2.65 │ █████████████████████     │
│ 2015-05-01 09:00:00 │ 10474 │ 2.91 │ ███████████████████████   │
│ 2015-05-01 10:00:00 │ 10389 │ 2.89 │ ██████████████████████    │
│ 2015-05-01 11:00:00 │  9830 │ 2.73 │ █████████████████████     │
│ 2015-05-01 12:00:00 │ 10712 │ 2.98 │ ███████████████████████   │
│ 2015-05-01 13:00:00 │ 10301 │ 2.86 │ ██████████████████████    │
│ 2015-05-01 14:00:00 │ 10181 │ 2.83 │ ██████████████████████    │
│ 2015-05-01 15:00:00 │ 10324 │ 2.87 │ ██████████████████████    │
│ 2015-05-01 16:00:00 │ 10497 │ 2.92 │ ███████████████████████   │
│ 2015-05-01 17:00:00 │ 10676 │ 2.97 │ ███████████████████████   │
│ 2015-05-01 18:00:00 │ 11121 │ 3.09 │ ████████████████████████  │
│ 2015-05-01 19:00:00 │ 11277 │ 3.13 │ ████████████████████████  │
│ 2015-05-01 20:00:00 │ 11553 │ 3.21 │ █████████████████████████ │
│ 2015-05-01 21:00:00 │ 11637 │ 3.23 │ █████████████████████████ │
│ 2015-05-01 22:00:00 │ 11298 │ 3.14 │ ████████████████████████  │
│ 2015-05-01 23:00:00 │  8915 │ 2.48 │ ███████████████████       │
└─────────────────────┴───────┴──────┴───────────────────────────┘

23 rows in set. Elapsed: 0.572 sec. Processed 1.00 billion rows, 28.62 GB (1.75 billion rows/s., 50.06 GB/s.)
```
## Improving time series storage efficiency
The general approach to optimizing storage efficiency is using [optimal data types](https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema). Let's take the `project` and `subproject` columns. These columns are of type String but have a relatively small number of unique values:
```
SELECT uniq(project), uniq(subproject)
FROM wikistat

┌─uniq(project)─┬─uniq(subproject)─┐
│          1095 │               99 │
└───────────────┴──────────────────┘

1 row in set. Elapsed: 0.895 sec. Processed 1.00 billion rows, 20.43 GB (1.12 billion rows/s., 22.84 GB/s.)
```
This means we can use the [LowCardinality()](https://clickhouse.com/docs/en/sql-reference/data-types/lowcardinality/) data type, which uses dictionary-based encoding. This causes ClickHouse to store the internal value ID instead of the original string value, which in turn saves a lot of space:
```
ALTER TABLE wikistat
    MODIFY COLUMN `project` LowCardinality(String),
    MODIFY COLUMN `subproject` LowCardinality(String)
```
We've also used the [UInt64](https://clickhouse.com/docs/en/sql-reference/data-types/int-uint/) type for the `hits` column, which takes 8 bytes but has a relatively small max value:
```
SELECT max(hits)
FROM wikistat

┌─max(hits)─┐
│    237913 │
└───────────┘
```
Given this value, we can use `UInt32` instead, which takes only 4 bytes and allows us to store values up to ~4 billion:
```
ALTER TABLE wikistat MODIFY COLUMN `hits` UInt32
```
This will reduce the size of this column in memory by at least 2 times. Note that the size on disk will remain unchanged due to compression. But be careful - don't pick data types that are too small!
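One way to verify the effect of such changes is to check the sizes ClickHouse reports in the `system.columns` table (a quick sketch; run it before and after the ALTER and compare):

```
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,   -- size on disk
    formatReadableSize(data_uncompressed_bytes) AS uncompressed  -- size in memory
FROM system.columns
WHERE table = 'wikistat'
ORDER BY data_compressed_bytes DESC
```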
When we deal with sequential data, which time-series data effectively is, we can further improve storage efficiency by using [special codecs](https://clickhouse.com/docs/en/sql-reference/statements/create/table/#specialized-codecs). The general idea is to store changes between values instead of absolute values themselves, which results in much less space needed when dealing with slowly changing data:
```
ALTER TABLE wikistat MODIFY COLUMN `time` CODEC(Delta, ZSTD)
```
We've used the [Delta](https://clickhouse.com/docs/en/sql-reference/statements/create/table/#delta) codec for the `time` column, which fits time series data best. The [right ordering key](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-cardinality/) can also save disk space. Since we usually want to filter by a path, we should add it to the key. This requires recreating the table. Let's wrap it all up and compare storage efficiency with and without optimized types:
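Putting the pieces together, a possible optimized schema (a sketch assembled from the changes above; the table name and the exact ordering key are illustrative and may differ from the published benchmark) could look like this:

```
CREATE TABLE wikistat_optimized
(
    `time` DateTime CODEC(Delta, ZSTD),  -- delta-encode slowly changing timestamps
    `project` LowCardinality(String),    -- dictionary-encode low-cardinality strings
    `subproject` LowCardinality(String),
    `path` String,
    `hits` UInt32                        -- max observed value fits comfortably in 4 bytes
)
ENGINE = MergeTree
ORDER BY (path, time)                    -- filtering by path prunes data efficiently
```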
| **Unoptimized table** | **Optimized table** |
|---|---|
| **11.09 GiB** | **1.68 GiB** |
As we can see, we have reduced storage requirements by more than six times without any actual loss of data. For further details on optimizing storage using types and codecs, see our recent blog [Optimizing ClickHouse with Schemas and Codecs](https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema).
## Improving time-series query performance
Before attempting other optimizations, users should optimize their [ordering key](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-cardinality/) to ensure ClickHouse produces the fastest possible results. Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the `project` and `subproject` columns. In this case, it's a good idea to add them to the ordering key - as well as the `time` column, since we query on time as well:
```
CREATE TABLE optimized_wikistat
(
    …
)
ENGINE = MergeTree
ORDER BY (project, subproject, time)
```
Let's now compare multiple queries to get an idea of how essential our ordering key expression is to performance. Note that we have also applied our previous data type and codec optimizations:
| **Query** | **(time)** | **(project, subproject, time)** |
|---|---|---|
| | [0.518 sec](https://sql.clickhouse.com/?query_id=HPGEZK7USHGDUGTJTHJAYV) | [0.258 sec](https://sql.clickhouse.com/?query_id=GFCCTYVJ3YVGMTOFXDQZJT) |
| | [0.67 sec](https://sql.clickhouse.com/?query_id=2Q8G2SWXOOSHHVVBQB8ND2) | [0.025 sec](https://sql.clickhouse.com/?query_id=1G21SEVPJZM1XFRTPR6MVR) |
| | [0.65 sec](https://sql.clickhouse.com/?query_id=4TB6QY6HMELIW5PUWRNG4M) | [0.014 sec](https://sql.clickhouse.com/?query_id=9EBNNA5MNGKSDJMNLGMCHB) |
| | [0.148 sec](https://sql.clickhouse.com/?query_id=WW6JBMONCU5VKCKQXSYTFM) | [0.010 sec](https://sql.clickhouse.com/?query_id=3EYNNE9LVWLHSAH2MD4N3Y) |
Note how we got a 2x to 40x performance increase by picking a more appropriate ordering key. For further details on choosing a primary key, including how to decide the order of the columns, read our excellent guide [here](https://clickhouse.com/docs/en/guides/improving-query-performance/sparse-primary-indexes/sparse-primary-indexes-intro).
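To see for yourself how well a query exploits the ordering key, ClickHouse can report which parts and granules the primary index actually selects. A sketch, using the example table defined above:

```
-- Inspect primary index usage for a filtered query
EXPLAIN indexes = 1
SELECT sum(hits)
FROM optimized_wikistat
WHERE project = 'en'
```

The `PrimaryKey` section of the output lists the number of granules selected versus the total; a small fraction means the ordering key is effectively pruning data for this query.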
Another option is to use [materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#materialized-view) to aggregate and store the results of popular queries. These results can be queried instead of the original table. Suppose the following query is executed quite often in our case:
```
SELECT
    path,
    SUM(hits) AS v
FROM wikistat
WHERE toStartOfMonth(time) = '2015-05-01'
GROUP BY path
ORDER BY v DESC
LIMIT 10

ββpathββββββββββββββββββ¬βββββββββvβ
β -                     β 89742164 β
β Angelsberg            β 19191582 β
β Ana_Sayfa             β  6376578 β
β Academy_Awards        β  4901470 β
β Accueil_(homonymie)   β  3810047 β
β 2015_in_spaceflight   β  2077195 β
β Albert_Einstein       β  1621175 β
β 19_Kids_and_Counting  β  1432484 β
β 2015_Nepal_earthquake β  1406457 β
β Alive                 β  1390624 β
βββββββββββββββββββββββββ΄βββββββββββ

10 rows in set. Elapsed: 1.016 sec. Processed 256.84 million rows, 10.17 GB (252.69 million rows/s., 10.01 GB/s.)
```
We can create the following materialized view:
```
CREATE MATERIALIZED VIEW blogs.wikistat_top
ENGINE = SummingMergeTree
ORDER BY (month, hits) POPULATE
AS SELECT
    path,
    toStartOfMonth(time) AS month,
    sum(hits) AS hits
FROM blogs.wikistat
GROUP BY path, month

0 rows in set. Elapsed: 8.589 sec. Processed 1.00 billion rows, 40.52 GB (116.43 million rows/s., 4.72 GB/s.)
```
Now we can query the materialized view instead of the original table:
```
SELECT path, hits
FROM wikistat_top
WHERE month = '2015-05-01'
ORDER BY hits DESC
LIMIT 10

ββpathββββββββββββββββββ¬ββββββhitsβ
β -                     β 89742164 β
β Angelsberg            β 19191582 β
β Ana_Sayfa             β  6376578 β
β Academy_Awards        β  4901470 β
β Accueil_(homonymie)   β  3810047 β
β 2015_in_spaceflight   β  2077195 β
β Albert_Einstein       β  1621175 β
β 19_Kids_and_Counting  β  1432484 β
β 2015_Nepal_earthquake β  1406457 β
β Alive                 β  1390624 β
βββββββββββββββββββββββββ΄βββββββββββ

10 rows in set. Elapsed: 0.005 sec. Processed 24.58 thousand rows, 935.16 KB (5.26 million rows/s., 200.31 MB/s.)
```
Our performance improvement here is dramatic. We will publish a blog post on materialized views soon, so watch this space!
ClickHouse is efficient in both storage and queries, and scales easily to petabytes while maintaining the same level of performance and simplicity. In a future post, we will explore techniques for scaling to almost 400 billion rows using the [full Wikistat dataset](https://clickhouse.com/docs/en/getting-started/example-datasets/wikistat). We will show how you can scale storage and processing capacity using our Cloud service, which separates storage and compute and handles this automatically, or by using a manual clustering solution.
In this post, we have shown how you can efficiently store and query time series data using the power of SQL and the performance of ClickHouse. Given this, you won't need to install additional extensions or tools to collect and process time series data, as ClickHouse has everything in place.