
· 7 min read

See this example using the clickhouse client and a ClickHouse Cloud service.

Create a query_cache_test table:

Using the clickhouse client

clickhouse-cloud :) CREATE TABLE query_cache_test (name String, age UInt8) ENGINE = MergeTree ORDER BY name

CREATE TABLE query_cache_test
(
`name` String,
`age` UInt8
)
ENGINE = MergeTree
ORDER BY name

Query id: 81c54f09-7de4-48ec-916f-c7c304a46931

Ok.

0 rows in set. Elapsed: 0.343 sec.

Fill the table with some data:

clickhouse-cloud :) INSERT INTO query_cache_test SELECT * FROM generateRandom('name String, age UInt8',1,1) LIMIT 100000;

INSERT INTO query_cache_test SELECT *
FROM generateRandom('name String, age UInt8', 1, 1)
LIMIT 100000

Query id: 90369105-bd67-494c-bdaf-d90dbfb6def9

Ok.

0 rows in set. Elapsed: 0.173 sec. Processed 327.05 thousand rows, 3.43 MB (1.89 million rows/s., 19.86 MB/s.)

Enable trace logs:

clickhouse-cloud :) SET send_logs_level = 'trace'

SET send_logs_level = 'trace'

Query id: d65490b0-7960-4a85-a343-787e70e5e293

Ok.

0 rows in set. Elapsed: 0.134 sec.

Run a query asking to make use of the query cache (appending SETTINGS use_query_cache=true to the query):

clickhouse-cloud :) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true;

SELECT name
FROM query_cache_test
WHERE age > 1000
FORMAT `Null`
SETTINGS use_query_cache = 1

Query id: 3754a7fd-b786-47c1-a258-dfbc75e35a04

[c-red-qc-36-server-0] 2023.05.29 12:06:10.542408 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> executeQuery: (from 151.53.3.113:50412, user: tony) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true; (stage: Complete)
[c-red-qc-36-server-0] 2023.05.29 12:06:10.542744 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:10.542900 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543020 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543164 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543226 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543337 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Key condition: unknown
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543395 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 12/12 marks by primary key, 12 marks to read from 1 ranges
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543412 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Spreading mark ranges among streams (default reading)
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543461 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> MergeTreeBaseSelectProcessor: PREWHERE condition was split into 1 steps: "greater(age, 1000)"
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543484 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> MergeTreeInOrderSelectProcessor: Reading 1 ranges in order from part all_0_0_0, approx. 100000 rows starting from 0
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543559 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: No entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547760 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: Stored result of query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547827 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> executeQuery: Read 100000 rows, 97.66 KiB in 0.005508 sec., 18155410.31227306 rows/sec., 17.31 MiB/sec.
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547913 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> MemoryTracker: Peak memory usage (for query): 451.89 KiB.
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547933 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> TCPHandler: Processed in 0.005911032 sec.
Ok.

0 rows in set. Elapsed: 0.006 sec. Processed 100.00 thousand rows, 100.00 KB (17.56 million rows/s., 17.56 MB/s.)

Run the same query again:

clickhouse-cloud :) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true;

SELECT name
FROM query_cache_test
WHERE age > 1000
FORMAT `Null`
SETTINGS use_query_cache = 1

Query id: a047527c-9d55-4e6b-9747-0ccad8787515

[c-red-qc-36-server-0] 2023.05.29 12:06:17.931007 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> executeQuery: (from 151.53.3.113:50412, user: tony) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true; (stage: Complete)
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931331 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931468 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931585 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931696 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931749 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931857 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Key condition: unknown
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931891 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 12/12 marks by primary key, 12 marks to read from 1 ranges
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931913 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Spreading mark ranges among streams (default reading)
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931952 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> MergeTreeBaseSelectProcessor: PREWHERE condition was split into 1 steps: "greater(age, 1000)"
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931975 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> MergeTreeInOrderSelectProcessor: Reading 1 ranges in order from part all_0_0_0, approx. 100000 rows starting from 0
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932043 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> QueryCache: Entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932551 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> MemoryTracker: Peak memory usage (for query): 5.19 KiB.
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932581 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> TCPHandler: Processed in 0.001961411 sec.
Ok.

0 rows in set. Elapsed: 0.002 sec.

Now observe the differences in the TRACE logs related to QueryCache between the two executions.

1st execution:

[c-red-qc-36-server-0] 2023.05.29 12:06:10.543559 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: No entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547760 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: Stored result of query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS

2nd execution:

[c-red-qc-36-server-0] 2023.05.29 12:06:17.932043 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> QueryCache: Entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS

In the 1st execution, no entry was found (No entry found for query SELECT...), so ClickHouse stored the result (Stored result of query SELECT...) for us.

In the 2nd execution, the query made use of the query cache, as it found the entry already stored (Entry found for query SELECT...).

Using just SQL

Without inspecting the clickhouse client trace logs, it is also possible to validate whether the query cache is being used just by issuing SQL commands and checking the relevant system tables:

clickhouse-cloud :) SELECT 1 SETTINGS use_query_cache=true;

SELECT 1
SETTINGS use_query_cache = 1

Query id: a5a078c7-61e5-4036-a6f0-4d602d5b72d2

┌─1─┐
│ 1 │
└───┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SELECT 1 SETTINGS use_query_cache=true;

SELECT 1
SETTINGS use_query_cache = 1

Query id: 322ae001-b1ab-463f-ac8d-dc5ba346f3f9

┌─1─┐
│ 1 │
└───┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SELECT * FROM clusterAllReplicas(default,system.query_cache);

SELECT *
FROM clusterAllReplicas(default, system.query_cache)

Query id: c9b57eac-ba64-430e-8d51-8f865a13cc25

┌─query──────────────┬─result_size─┬─stale─┬─shared─┬─compressed─┬──────────expires_at─┬─────────────key_hash─┐
│ SELECT 1 SETTINGS  │         136 │     0 │      1 │          1 │ 2023-08-02 15:08:23 │ 12188185624808016954 │
└────────────────────┴─────────────┴───────┴────────┴────────────┴─────────────────────┴──────────────────────┘

1 row in set. Elapsed: 0.005 sec.

clickhouse-cloud :) SELECT * FROM clusterAllReplicas(default,system.events) WHERE event LIKE 'QueryCache%'

SELECT *
FROM clusterAllReplicas(default, system.events)
WHERE event LIKE 'QueryCache%'

Query id: d536555e-b8ab-4cd4-9741-c04e95612bec

┌─event────────────┬─value─┬─description────────────────────────────────────────────────────────────────────────────────────────────┐
│ QueryCacheHits   │     1 │ Number of times a query result has been found in the query cache (and query computation was avoided).  │
│ QueryCacheMisses │     1 │ Number of times a query result has not been found in the query cache (and required query computation). │
└──────────────────┴───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.

In the last result we see one QueryCacheMisses event, from the first time the query SELECT 1 SETTINGS use_query_cache=true; ran, and one QueryCacheHits event, from its second execution.
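
You can also verify cache usage for individual queries after the fact. Assuming query logging is enabled and a recent ClickHouse version where system.query_log exposes ProfileEvents as a Map column, a sketch like this reads the QueryCache counters per query:

SELECT
    event_time,
    query,
    ProfileEvents['QueryCacheHits'] AS cache_hits,
    ProfileEvents['QueryCacheMisses'] AS cache_misses
FROM system.query_log
WHERE type = 'QueryFinish'              -- only completed queries
  AND query ILIKE '%use_query_cache%'   -- queries that opted into the cache
ORDER BY event_time DESC
LIMIT 5;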

Also keep in mind that the default maximum cache entry size is 1048576 bytes (= 1 MiB), and that by default results are stored in the query cache for 60 seconds only (you can, for example, add query_cache_ttl=300 to SETTINGS to keep a query cache result for 5 minutes instead).
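
For example, a minimal sketch reusing the query_cache_test table from above (the predicate is only for illustration) that keeps its cached result for 5 minutes:

SELECT name
FROM query_cache_test
WHERE age > 100
FORMAT Null
-- cache the result and keep it for 300 seconds instead of the default 60
SETTINGS use_query_cache = true, query_cache_ttl = 300;

If you need to invalidate cached results earlier, SYSTEM DROP QUERY CACHE clears the query cache.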

You can find more detailed info on the ClickHouse query cache here.

· One min read

Question

My query returns many rows, but I'm only interested in the query processing time. How do I omit the query output and check just the processing time?

Answer

Append FORMAT Null to your query to set the output format to Null. This prevents data from being transmitted to the client.

For example:

SELECT
    customer_id,
    count() AS total,
    any(review_headline)
FROM amazon_reviews
GROUP BY customer_id
ORDER BY total DESC
FORMAT Null

The response will return the number of rows processed and the elapsed time, but 0 rows will be returned:

0 rows in set. Elapsed: 25.288 sec. Processed 222.04 million rows, 13.50 GB (8.78 million rows/s., 533.77 MB/s.)
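
If you also want to review timings after the fact, and assuming query logging is enabled, a sketch like this reads the most recently finished query from system.query_log:

SELECT
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size
FROM system.query_log
WHERE type = 'QueryFinish'   -- one entry per completed query
ORDER BY event_time DESC
LIMIT 1;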

· One min read

Question

How can a ClickHouse Cloud service be started, stopped and resumed using API endpoints?

Answer

  1. To wake up/resume a Cloud service from an idle state, you can ping the instance:
curl -X GET https://abc123.us-west-2.aws.clickhouse.cloud:8443/ping
  2. To stop a Cloud service, use the /state endpoint along with the stop command. The syntax looks like:
curl -X PATCH https://api.clickhouse.cloud/v1/organizations/<org_uuid>/services/<service_uuid>/state -u <key_id>:<key_secret> -H "Content-Type: application/json" -d '{"command": "<stop|start>"}'

For example, the following command stops the 2e2124ca-c5ac-459d-a6f2-abc123549d2a service:

curl -X PATCH https://api.clickhouse.cloud/v1/organizations/123abcd0-e9b5-4f55-9e42-0fb04392445c/services/2e2124ca-c5ac-459d-a6f2-abc123549d2a/state -u abc123:ABC123 -H "Content-Type: application/json" -d '{"command": "stop"}'

The output looks like:

{"result":{"id":"2e2124ca-c5ac-459d-a6f2-abc123549d2a","name":"mars-s3","provider":"aws","regionId":"us-west-2","state":"stopping","endpoints":[{"protocol":"nativesecure","host":"abc123.us-west-2.aws.clickhouse.cloud","port":9440},{"protocol":"https","host":"abc123ntrb.us-west-2.aws.clickhouse.cloud","port":8443}],"tier":"production","idleScaling":true,"idleTimeoutMinutes":5,"minTotalMemoryGb":24,"maxTotalMemoryGb":48,"ipAccessList":[{"source":"[0.0.0.0/0](http://0.0.0.0/0)","description":"Anywhere"}],"createdAt":"2022-10-21T18:46:31Z"},"status":200}%
  3. To start the service again, use the start command:
curl -X PATCH https://api.clickhouse.cloud/v1/organizations/123abcd0-e9b5-4f55-9e42-0fb04392445c/services/2e2124ca-c5ac-459d-a6f2-abc123549d2a/state -u abc123:ABC123 -H "Content-Type: application/json" -d '{"command": "start"}'
note

Here are the various states that a service can be in:

"state":"stopping"
"state":"stopped"
"state":"starting"
"state":"running"
"state":"idle"
note

A Cloud service that is "idle" is considered started, so a start command will not resume/wake it up. Use the ping endpoint shown in Step 1 to wake up a service.

· 2 min read

Question

When running ClickHouse in Docker, ClickHouse complains about the lack of the CAP_IPC_LOCK and CAP_SYS_NICE capabilities in the system. How can I resolve it?

Here is what the missing CAP_IPC_LOCK and CAP_SYS_NICE capability log messages look like:

docker run -d --name clickhouse-server \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
2023.04.19 08:04:10.022720 [ 1 ] {} <Information> Application: It looks like the process has no CAP_IPC_LOCK capability, binary mlock will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_ipc_lock=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

2023.04.19 08:04:10.065860 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

Answer

  1. Add two --cap-add arguments to provide the container with the IPC_LOCK and SYS_NICE capabilities:
docker run -d --name clickhouse-server \
--cap-add=SYS_NICE \
--cap-add=IPC_LOCK \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
  2. Check that the capabilities are visible in the container using the following command:
apt-get update > /dev/null && apt-get install -y libcap2-bin > /dev/null && capsh --print

The response is similar to:

debconf: delaying package configuration, since apt-utils is not installed
WARNING: libcap needs an update (cap=40 should have a name).
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap+ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
  3. Manually set both capabilities for ClickHouse:
setcap "cap_ipc_lock=+ep cap_sys_nice=+ep" /usr/bin/clickhouse
  4. Check that the capabilities are applied:
getcap -v /usr/bin/clickhouse

You should see the following:

/usr/bin/clickhouse = cap_ipc_lock,cap_sys_nice+ep
  5. Restart the ClickHouse server, and the log messages should no longer be shown.

Check out this article on Linux capabilities for more details.

· 2 min read

The following useful query shows which of your executed queries used the most memory. A couple of comments about this query:

  • the results are computed from the past day (now() - toIntervalDay(1)), but you can easily modify the time interval
  • it assumes you have a cluster named default, which is the name of your cluster in ClickHouse Cloud. Change default to the name of your cluster
  • if you do not have a cluster, see the query listed at the end of this article
SELECT
    count() AS nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM clusterAllReplicas(default, system.query_log)
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    AND user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY memory DESC;

The response looks like:

┌─nb_query─┬─user────┬─query──────────────────────────────────────────────────────┬───memory─┬─normalized_query_hash─┐
│       11 │ default │ select version()                                            │ 46178924 │   7202516440347714159 │
│        2 │ default │ SELECT * FROM "system"."table_functions" LIMIT 31 OFFSET 0 │  8391544 │  12830067173062987695 │
└──────────┴─────────┴─────────────────────────────────────────────────────────────┴──────────┴───────────────────────┘
note

If you do not have a system.query_log table, then you likely do not have query logging enabled. See the query_log setting for details on how to enable it.

If you do not have a cluster, you can just query your one system.query_log table directly:

SELECT
    count() AS nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM system.query_log
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    AND user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY memory DESC;
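
If you prefer human-readable sizes, a variation of the same query (a sketch with the same filters) wraps the aggregate in formatReadableSize:

SELECT
    count() AS nb_query,
    user,
    query,
    formatReadableSize(sum(memory_usage)) AS memory,   -- e.g. "44.04 MiB"
    normalized_query_hash
FROM system.query_log
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    AND user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY sum(memory_usage) DESC;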

· One min read

We often get asked about good schema migration tools for ClickHouse, and about best practices for managing database schemas that might change over time. There is no standard schema migration tool for ClickHouse, but we have compiled the following list (in no particular order) of automatic schema migration tools with support for ClickHouse that we know of:

· One min read

Question

How do I view the number of active or queued mutations?

Answer

Monitoring the number of active or queued mutations is important if you are performing a lot of ALTER or UPDATE statements on your tables. These queries rewrite data parts and are not atomic: mutations are ordered by their creation order and applied to each part in that order. You can find more details on mutations in the docs.

Each mutation generates an entry in the system.mutations table. When performing a large number of mutations, you can monitor the count of running and queued mutations with this query:

SELECT
    hostname() AS host,
    count()
FROM clusterAllReplicas('default', 'system.mutations')
GROUP BY host;
note

This query assumes you are running a cluster named default, which is the name of your cluster in ClickHouse Cloud. Replace default with the name of your cluster.

If you do not have a cluster, use this command:

SELECT
    hostname() AS host,
    count()
FROM system.mutations
GROUP BY host;
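
To see which mutations are still in flight, and whether any of them are failing, you can also inspect the individual rows in system.mutations; a sketch:

SELECT
    database,
    table,
    mutation_id,
    command,
    parts_to_do,          -- number of parts still to be rewritten
    latest_fail_reason    -- empty unless the mutation is failing
FROM system.mutations
WHERE NOT is_done;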

We also recommend reading this recent blog on updates and deletes.

· 2 min read

ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. ClickHouse Keeper is compatible with ZooKeeper, but it might not be obvious why you should use ClickHouse Keeper instead of ZooKeeper. This article discusses some of the benefits of Keeper.

Answer

ClickHouse Cloud uses clickhouse-keeper at large scale for thousands of services in a multi-tenant environment. We designed and built Keeper so that we could remove our dependency on the Java-based ZooKeeper implementation. ClickHouse Keeper solves many well-known drawbacks of ZooKeeper and makes additional improvements, including:

  • Snapshots and logs consume much less disk space due to better compression
  • No limit on the default packet and node data size (it is 1 MB in ZooKeeper)
  • No zxid overflow issue (it forces a restart for every 2B transactions in ZooKeeper)
  • Faster recovery after network partitions due to the use of a better distributed consensus protocol
  • It uses less memory for the same volume of data
  • It is easier to set up, and it does not require specifying the JVM heap size or a custom garbage collection implementation
  • A few custom commands in the protocol enable faster operations in ReplicatedMergeTree tables
  • Larger coverage by Jepsen tests

In addition, ClickHouse Support has observed a massive decrease in cluster problems at sites that use clickhouse-keeper rather than ZooKeeper.

Check out the Keeper docs page for more details on how to configure and run ClickHouse Keeper.

· 2 min read

If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.
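
For example, here is a minimal sketch (the table and its columns are hypothetical) that tells ClickHouse to serialize a column sparsely once at least 90% of its values are defaults:

CREATE TABLE sparse_example
(
    id UInt64,
    value Float64    -- mostly zeros in this hypothetical table
)
ENGINE = MergeTree
ORDER BY id
SETTINGS ratio_of_defaults_for_sparse_serialization = 0.9;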

This handy query can take a while to run, but it analyzes every row and determines the ratio of values that are zero (or the default) in every column of the specified table:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM table_name
FORMAT Vertical

For example, we ran the query above on the environmental sensors dataset table named sensors, which has over 20B rows and 19 columns:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM sensors
FORMAT Vertical

Here is the response:


Row 1:
──────
round(avg(equals(sensor_id, defaultValueOfArgumentType(sensor_id))), 3): 0
round(avg(equals(sensor_type, defaultValueOfArgumentType(sensor_type))), 3): 0.159
round(avg(equals(location, defaultValueOfArgumentType(location))), 3): 0
round(avg(equals(lat, defaultValueOfArgumentType(lat))), 3): 0.001
round(avg(equals(lon, defaultValueOfArgumentType(lon))), 3): 0.001
round(avg(equals(timestamp, defaultValueOfArgumentType(timestamp))), 3): 0
round(avg(equals(P1, defaultValueOfArgumentType(P1))), 3): 0.474
round(avg(equals(P2, defaultValueOfArgumentType(P2))), 3): 0.475
round(avg(equals(P0, defaultValueOfArgumentType(P0))), 3): 0.995
round(avg(equals(durP1, defaultValueOfArgumentType(durP1))), 3): 0.999
round(avg(equals(ratioP1, defaultValueOfArgumentType(ratioP1))), 3): 0.999
round(avg(equals(durP2, defaultValueOfArgumentType(durP2))), 3): 1
round(avg(equals(ratioP2, defaultValueOfArgumentType(ratioP2))), 3): 1
round(avg(equals(pressure, defaultValueOfArgumentType(pressure))), 3): 0.83
round(avg(equals(altitude, defaultValueOfArgumentType(altitude))), 3): 1
round(avg(equals(pressure_sealevel, defaultValueOfArgumentType(pressure_sealevel))), 3): 1
round(avg(equals(temperature, defaultValueOfArgumentType(temperature))), 3): 0.532
round(avg(equals(humidity, defaultValueOfArgumentType(humidity))), 3): 0.544

1 row in set. Elapsed: 992.041 sec. Processed 20.69 billion rows, 1.39 TB (20.86 million rows/s., 1.40 GB/s.)

From the results above:

  • the sensor_id column is not sparse at all; in fact, every row has a non-zero value
  • the sensor_type column is only sparse about 15.9% of the time
  • the P0 column is very sparse: 99.5% of the values are zero
  • the pressure column is quite sparse at 83%
  • and the temperature column has 53.2% of its values missing or zero

Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!