John Tipper
Navigate back to the homepage

How to execute Liquibase changesets against ClickHouse

John Tipper
February 8th, 2021 · 4 min read

Liquibase vs ClickHouse

I wanted to run a column database inside AWS for the storage of click data for a personal project. The quantity of data is initially small, but I wanted to be able to scale it up if I needed it to. Whilst Redshift is a viable technology choice, it starts at $0.25/hour, so the cost soon adds up. Running something on EC2 is more to my liking (I also want to be able to turn it off completely when I’m not using it) - so, enter ClickHouse.

ClickHouse is a column-oriented database which was developed by Yandex. It will happily deal with petabytes of data and is blindingly quick. So quick, in fact, that ClickHouse will leave Redshift for dust: here is the benchmark. There are also a useful discussion in this blog by Ned McClain here. Whilst running a production instance at scale is always involved and you may wish to look at a managed service (Altinity offer one), if you just want to run ClickHouse on an EC2 instance for demo purposes, then it’s available as a Docker image from DockerHub.

Liquibase and the liquibase-clickhouse extension

Now, if we’re going to set up a database, then we want to make sure that our database schema matches what our code expects it to be. We specifically do not want to have to log in manually and run data schema conversion commands. That way lie tears, outages and late nights spent trying to fix things that haven’t gone to plan. No, what we want is a scripted mechanism for applying schema changes in a controlled fashion.

Liquibase is an opensource database schema management tool which does exactly what we want. It allows us to apply a sequence of changes to our database, where the changes may be specified in a number of formats, including SQL. Sounds great, right? Unfortunately, the ClickHouse homepage shows that ClickHouse is not a supported database.

There is a project on GitHub here which appears to be a Liquibase extension for ClickHouse, meaning we should be able to apply Liquibase database schema updates to ClickHouse. Caveat: if we can only work out how to use it, because there is zero documentation. This blog post will show you how to do just that.

Starting ClickHouse

Let’s start ClickHouse and check that it starts ok:

1mkdir -p clickhouse_data
2docker run --ulimit nofile=262144:262144 --volume=$(pwd)/clickhouse_data:/var/lib/clickhouse -p 8123:8123 yandex/clickhouse-server
3
4# now check status
5$ curl -s http://localhost:8123/ping
6Ok.

ClickHouse UI

We want to be able to look at the database and see if our changes are applied ok, so let’s now start up a web client for ClickHouse:

1docker run -d -p 8080:80 spoonest/clickhouse-tabix-web-client

Now we can browse to http://localhost:8080 and see the UI - we will need to use the username default with no password in order to log in. We’ll get something that looks like the below:

Tabix web UI

The liquibase-clickhouse extension

Getting this working took me an annoyingly long period of time. We will execute Liquibase against our running ClickHouse database via a Docker image. In order for us to do this successfully, do the following;

  1. Create a directory for Liquibase and the components we will use.
1# root directory for this tutorial
2mkdir liquibase
3
4# directory for our changesets
5mkdir -p liquibase/changelog/sql
  1. Now create the liquibase-clickhouse jar with all its dependencies. The source code is in GitHub here: https://github.com/MEDIARITHMICS/liquibase-clickhouse.
1# assumes we are in the liquibase top-level directory we created above.
2git clone git@github.com:MEDIARITHMICS/liquibase-clickhouse.git
3cd liquibase-clickhouse
4mvn package
  1. Create the Liquibase root changeset file, which tells Liquibase what changes to make to our database. We are going to put all the changes in a series of files within a subdirectory, one change per file, so this root changeset file is simply a pointer to the other changes. Create a file at liquibase/changelog/root-changelog.xml:
1<?xml version="1.0" encoding="UTF-8"?>
2<databaseChangeLog
3xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
4xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
6 http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
7
8<includeAll path="sql/"/>
9
10</databaseChangeLog>

Note how we refer to a sub-directory called sql: all of the changes we describe in this sub-directory will be applied to the database. The path refers to the classpath of the Liquibase binary, not a file path. Also, there is an option in the docs called relativeToChangelogFile for the <includeAll> node, but this appears not to work well - I couldn’t get it to work and I could see some issues reported by others referring to it, so just keep as is for the moment.

  1. Create a changeset that you wish to apply to the database: as a simple example, let’s just create a dummy database. All of the files in the changesets directory will be applied in alphabetical order, so let’s create liquibase/changelog/sql/001-create-dummy-db.sql
1--liquibase formatted sql
2--changeset johndoe:create-dummy-database
3CREATE DATABASE IF NOT EXISTS dummy;

The format is SQL, with some custom Liquibase headers at the top which tell Liquibase how we want to deal with the file - further details on Liquibase changeset syntax are here. Details on ClickHouse SQL syntax are here.

  1. Finally, we create a Liquibase properties file, describing to Liquibase how we will connect to our database. Create liquibase/liquibase.properties:
1changeLogFile: /liquibase/changelog/root-changelog.xml
2url: jdbc:clickhouse://YOUR_LOCAL_IP_HERE:8123
3username: default
4password:
5classpath: /liquibase/changelog
6driver: shaded.liquibase.ru.yandex.clickhouse.ClickHouseDriver
7databaseClass: liquibase.ext.clickhouse.database.ClickHouseDatabase
8strict: true

Note that you need to add your computer’s local IP so that your Liquibase container can connect to your ClickHouse database container: do not mistakenly use localhost here, as this file will be mounted inside a container. We add the directory containing the sql directory which contains our changesets to the classpath of Liquibase. We specify the standard JDBC driver for clickhouse, but because the plugin shades this into a different location, we need to specify the shaded package (shaded.liquibase.ru.yandex.clickhouse.ClickHouseDriver). Finally, we need to specify a class describing the ClickHouse database, which Liquibase will use when initialising the connection.

The classpath argument caused me no end of grief - there seems to be some funny behaviour around parsing classes when Liquibase starts up where it determines which databases it supports etc, and it seems to have trouble parsing classes within jars. Whilst I could specify the compiled jar in here, its classes are not picked up and Liquibase won’t use the classes in it to connect to ClickHouse using the correct SQL syntax, so we will actually need to add the jar into the normal library path for ClickHouse using a bind mount in the next step.

  1. Now we are in a position to start Liquibase:
1docker run -it --rm -v $(pwd)/liquibase/changelog:/liquibase/changelog -v $(pwd)/liquibase/liquibase-clickhouse/target/liquibase-clickhouse-0.5.3-SNAPSHOT-shaded.jar:/liquibase/lib/clickhouse.jar liquibase/liquibase:3.10 --defaultsFile=/liquibase/changelog/liquibase.properties --logLevel=debug update

Of paramount importance: ensure that the major version of the Liquibase container that you run matches the version of the Liquibase library used as a dependency on the liquibase-clickhouse jar you compiled. The latest version of Liquibase is 4.x, which is NOT COMPATIBLE with this extension you just compiled (ask me how long I wasted finding this out…). Also to note here: bind mount your compiled jar into the container, changing the version if it’s named differently when you build it.

If you follow these steps, you should see a long log from your running container, where Liquibase loads, parses your extension, connects to your running container and then applies every changeset to your database, followed by your container exiting cleanly. Repeated executions of this container will be idempotent. Use the UI to confirm that your changes were applied correctly, or to test new changesets (this is why we write the changesets in SQL, by the way!).

Further information

If you have any further queries, have comments or noticed something wrong then please give me a shout on Twitter.

More articles from John Tipper

AWS CDK cross-account deployments with CDK Pipelines and cdk-assume-role-credential-plugin

Cross-account deployments with AWS CDK, CDK Pipelines and cdk-assume-role-credential-plugin

December 4th, 2020 · 7 min read

Integrating AWS CDK into GitHub Actions

Integrating AWS CDK into GitHub Actions

September 13th, 2020 · 2 min read
© 2018–2021 John Tipper
Link to $https://twitter.com/john_tipperLink to $https://github.com/john-tipperLink to $https://www.linkedin.com/in/john-tipper-5076395