ClickHouse Turns 10: A Decade of Open-Source Analytics

Ten years ago, on June 15, 2016, ClickHouse was released as open source. Today it's the most popular open-source analytical database, with over 2,000 contributors. But its code is much older: the repository's history goes back to 2009, starting with a performance optimization that replaced slow libc functions like localtime and mktime.
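The article does not show that optimization. One common way to avoid per-call libc conversions, and the approach assumed here, is a lookup table: compute calendar fields once per day, then turn each timestamp into a date with one division and one array access. This is a minimal sketch, not ClickHouse's code. It handles UTC only (time zones and DST, the hard part in practice, are ignored) and non-negative timestamps. It fills the table with Howard Hinnant's days-to-civil algorithm. The names CivilDate, civilFromDays, and DateLookupTable are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Calendar fields precomputed for one day (UTC).
struct CivilDate {
    int year;
    unsigned month;  // 1..12
    unsigned day;    // 1..31
};

// Howard Hinnant's days-to-civil algorithm; day 0 is 1970-01-01.
CivilDate civilFromDays(int64_t z) {
    z += 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);                // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const int64_t y = static_cast<int64_t>(yoe) + era * 400;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                                     // month, March = 0
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;                             // [1, 31]
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;                                // [1, 12]
    return {static_cast<int>(y + (m <= 2)), m, d};
}

// Hypothetical lookup table: built once, after which each conversion is O(1).
class DateLookupTable {
public:
    explicit DateLookupTable(std::size_t numDays) : table_(numDays) {
        for (std::size_t i = 0; i < numDays; ++i)
            table_[i] = civilFromDays(static_cast<int64_t>(i));
    }

    // One division and one array access instead of a libc call.
    // Precondition: non-negative timestamp; at() throws past the table's end.
    const CivilDate& toDate(int64_t unixSeconds) const {
        assert(unixSeconds >= 0);
        return table_.at(static_cast<std::size_t>(unixSeconds / 86400));
    }

private:
    std::vector<CivilDate> table_;
};
```

For example, with a 30,000-day table (about 82 years from 1970), toDate(1466000000) returns 2016-06-15, the open-source release date. The table costs memory up front, but each lookup is a division and an array access.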

From OLAPServer to ClickHouse

ClickHouse's creator, Alexey Milovidov, was working on a web analytics system similar to Google Analytics. The system used MySQL for pre-aggregated reports and custom C++ data structures for real-time processing. As data volume grew, MySQL couldn't keep up. Milovidov experimented with column-oriented databases like Infobright, InfiniDB, Vertica, MonetDB, and LucidDB, but none could handle 100 billion records per day with 500 columns.

In December 2008, he built a prototype called "OLAPServer". Each column (integers only, with strings stored as hashes) was kept in a single binary file per day per website, with lightweight compression. Queries were specified in XML. It worked so well that analysts started using it instead of the internal MapReduce system.
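The sketch below shows that layout in memory. Each column is a plain integer array, a string column becomes an array of 64-bit hashes, and a query reads only the columns it needs. It is a minimal sketch under stated assumptions. The struct and function names are hypothetical. FNV-1a stands in for whatever hash OLAPServer actually used. Files, compression, and the dictionary needed to map hashes back to strings are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a, an assumed stand-in for the hash OLAPServer actually used.
uint64_t hashString(std::string_view s) {
    uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;             // FNV prime
    }
    return h;
}

// Hypothetical chunk for one day of one website: every column is an integer array.
struct DayChunk {
    std::vector<uint64_t> urlHash;   // string column, stored as hashes
    std::vector<uint32_t> duration;  // integer column, stored as-is
    std::vector<uint8_t> isBounce;   // integer column the query below never reads

    void append(std::string_view url, uint32_t seconds, bool bounce) {
        urlHash.push_back(hashString(url));
        duration.push_back(seconds);
        isBounce.push_back(bounce ? 1 : 0);
    }
};

// "Total duration per URL" scans only the urlHash and duration columns.
std::unordered_map<uint64_t, uint64_t> totalDurationByUrl(const DayChunk& chunk) {
    assert(chunk.urlHash.size() == chunk.duration.size());
    std::unordered_map<uint64_t, uint64_t> totals;
    for (std::size_t i = 0; i < chunk.urlHash.size(); ++i)
        totals[chunk.urlHash[i]] += chunk.duration[i];
    return totals;
}
```

Because every column is a fixed-width integer array, an aggregation loop runs over contiguous memory and skips columns it doesn't use.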

Next came "Metrage", a custom data structure for incremental aggregation with background merges, using CRDTs (Conflict-free Replicated Data Types). Every record was a C++ struct with methods like add, update, merge, serializeText/Binary, and deserializeText/Binary. This allowed real-time updates on aggregated reports.
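The property that makes such records mergeable in the background is that combining partial results gives the same answer regardless of order. Below is a minimal sketch using a hypothetical VisitStats record with only add and merge; the fields are illustrative, not Metrage's actual schema. Every field is a join-semilattice (min, max, set union). That makes merge commutative, associative, and idempotent, so the record is a state-based CRDT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>

// Hypothetical Metrage-style record. Each field merges with min, max, or set
// union, so parts can be merged in any order and any number of times.
struct VisitStats {
    uint32_t firstSeen = std::numeric_limits<uint32_t>::max();
    uint32_t maxDuration = 0;
    std::set<uint64_t> visitors;

    // Fold one raw event into the record.
    void add(uint32_t timestamp, uint32_t duration, uint64_t visitorId) {
        firstSeen = std::min(firstSeen, timestamp);
        maxDuration = std::max(maxDuration, duration);
        visitors.insert(visitorId);
    }

    // Combine with another partial record, e.g. during a background merge.
    void merge(const VisitStats& other) {
        firstSeen = std::min(firstSeen, other.firstSeen);
        maxDuration = std::max(maxDuration, other.maxDuration);
        visitors.insert(other.visitors.begin(), other.visitors.end());
    }
};
```

A plain counter summed in merge would still be commutative and associative. That suffices when each part is merged exactly once, but it would not be idempotent.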

The ClickHouse Architecture

ClickHouse combined the column-oriented approach of OLAPServer (fast aggregation) with the merge tree of Metrage (real-time updates and data locality), and added a real SQL query language and data types. It was built entirely from scratch, not on top of any existing database.
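Here is a sketch of the merge-tree half. It assumes each part is a vector of (key, value) rows already sorted by key, and that rows with equal keys are aggregated by summing, roughly what ClickHouse's SummingMergeTree engine does. The names Row, Part, and mergeParts are hypothetical. Because both inputs are sorted, one linear pass yields a sorted, aggregated output part, and rows with nearby keys end up stored together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A hypothetical "part": rows sorted by key.
using Row = std::pair<uint64_t, int64_t>;  // (sort key, value)
using Part = std::vector<Row>;

// Append a row, summing its value into the previous row if the keys match.
static void appendAggregated(Part& out, const Row& row) {
    if (!out.empty() && out.back().first == row.first)
        out.back().second += row.second;
    else
        out.push_back(row);
}

// Merge two sorted parts in one linear pass, as a background merge would.
Part mergeParts(const Part& a, const Part& b) {
    Part out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() || j < b.size()) {
        if (j == b.size() || (i < a.size() && a[i].first <= b[j].first))
            appendAggregated(out, a[i++]);
        else
            appendAggregated(out, b[j++]);
    }
    return out;
}
```

New data can be written as small sorted parts immediately, which keeps inserts cheap. Merging later restores a compact, sorted layout.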

Key early commits:

  • Column implementation in memory (May 29, 2009): Introduced IColumn and Field classes, still recognizable today. This predates Apache Arrow, RCFile, Trevni, ORC, and Parquet.
  • Aggregate functions: Introduced in an early commit; the aggregate function framework remains one of the most critical parts of ClickHouse.
  • Table engines: Initially called "primary key", then renamed. The first engine was similar to today's TinyLog.
  • Compression: Started with QuickLZ, which was soon replaced by LZ4 after Milovidov read the blog of Yann Collet, the author of LZ4.
  • Block streams: Pipeline components for column chunks, later replaced by Processors. The first query pipeline printed numbers in TSV.
  • First SQL parser: Initially tried boost::spirit, then built a recursive descent parser.
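The IColumn/Field split from the first item can be sketched in modern C++. In this simplified, hypothetical version, Field is a std::variant holding one dynamically typed value, and IColumn is a minimal virtual interface. ColumnVector<T> stores values contiguously, so hot loops can read the raw array instead of going through Field. The real classes are much larger, and the members shown here are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Simplified Field: a single dynamically typed value.
using Field = std::variant<uint64_t, int64_t, double, std::string>;

// Simplified IColumn: a chunk of values of one type behind a virtual interface.
class IColumn {
public:
    virtual ~IColumn() = default;
    virtual std::size_t size() const = 0;
    virtual Field operator[](std::size_t n) const = 0;  // generic, per-value access
    virtual void insert(const Field& x) = 0;
};

// Contiguous numeric column. In this sketch T must be one of Field's
// numeric alternatives (uint64_t, int64_t or double).
template <typename T>
class ColumnVector final : public IColumn {
public:
    std::size_t size() const override { return data_.size(); }
    Field operator[](std::size_t n) const override { return data_[n]; }
    // Throws std::bad_variant_access if the Field holds a different type.
    void insert(const Field& x) override { data_.push_back(std::get<T>(x)); }

    // Fast path: hot loops read the raw array and never touch Field.
    const std::vector<T>& getData() const { return data_; }

private:
    std::vector<T> data_;
};

// A hot loop that uses the fast path.
uint64_t sumColumn(const ColumnVector<uint64_t>& column) {
    uint64_t total = 0;
    for (uint64_t v : column.getData())
        total += v;
    return total;
}
```

Field is convenient for generic code such as parsing literals. Per-value variant access is slow, so aggregation loops work on the typed array.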

Development Philosophy: Level 3 Open Source

Milovidov defines four levels of open source. Level 0 is just publishing code (like Doom or MS-DOS). Level 1 has public commits but accepts no contributions. Level 2 accepts contributions but without transparent processes. Level 3 adds open contribution guidelines, a task tracker, code review, CI/CD, a release cycle, and user support.

ClickHouse aims for Level 3. The codebase is designed to be a learning resource:

  • How to build a database: Modular, orthogonal, well-documented code. Complex concepts explained from scratch.
  • Learn C++: One of the most popular C++ repositories, showcasing C++23 features alongside build systems, CI, and code review practices.
  • Experimentation playground: Pull requests for new memory allocators, compression libraries, hash tables, or sorting algorithms are tested with production-level scrutiny. The roadmap includes a section for "experimental, weird, and even ridiculous things."
  • Contributor recognition: Every contributor is credited in the changelog and in the system.contributors table. Even if code is rewritten, the initial author is credited.

Technical Details and Trade-offs

Early experiments included a variable-length encoded number column (removed for slowness, later reintroduced as custom compression codecs), a Variant column type (removed, then added back in 2025), and a fixed-size array type (removed, under consideration for reintroduction). Milovidov emphasizes: "Removing unnecessary code is more important than adding new code." Many commits are titled "remove trash."
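A minimal LEB128-style sketch shows how variable-length integer encoding works. Small values take fewer bytes, but decoding branches on every byte, which suggests why a per-value varint column could be slow to scan compared with fixed-width data. The function names are hypothetical. This is not the format used by ClickHouse's compression codecs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LEB128-style unsigned varint: 7 payload bits per byte, high bit = "more bytes follow".
void encodeVarUInt(uint64_t x, std::vector<uint8_t>& out) {
    while (x >= 0x80) {
        out.push_back(static_cast<uint8_t>(x | 0x80));  // low 7 bits + continuation flag
        x >>= 7;
    }
    out.push_back(static_cast<uint8_t>(x));  // last byte, continuation flag clear
}

// Decode one varint starting at pos and advance pos past it.
// at() throws std::out_of_range if the input is truncated.
uint64_t decodeVarUInt(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t x = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        const uint8_t byte = in.at(pos++);
        x |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80))
            break;
    }
    return x;
}
```

A value below 128 fits in one byte instead of eight, while the maximum 64-bit value needs ten.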

The first real table structure tested was the hits table, which is still used in the ClickBench benchmark today. C++ iostreams turned out to be slow, so the WriteBuffer and ReadBuffer classes were introduced; both are still in use.
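Here is a minimal sketch of the buffered-writer idea. A hypothetical SimpleWriteBuffer collects bytes in a fixed 4 KiB array and appends them to a std::string sink only when the array fills or the writer is destroyed. The common write is a bounds check plus memcpy, and integers are formatted by hand without locale machinery. This illustrates the general technique under those assumptions; it does not reproduce ClickHouse's WriteBuffer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical buffered writer: bytes accumulate in a fixed array and are
// handed to the sink only when the array is full or on flush/destruction.
class SimpleWriteBuffer {
public:
    explicit SimpleWriteBuffer(std::string& sink) : sink_(sink) {}
    SimpleWriteBuffer(const SimpleWriteBuffer&) = delete;
    SimpleWriteBuffer& operator=(const SimpleWriteBuffer&) = delete;
    ~SimpleWriteBuffer() { flush(); }

    void write(const char* data, std::size_t n) {
        while (n > 0) {
            if (pos_ == kCapacity)
                flush();
            const std::size_t chunk = std::min(n, kCapacity - pos_);
            std::memcpy(buf_ + pos_, data, chunk);
            pos_ += chunk;
            data += chunk;
            n -= chunk;
        }
    }

    void write(std::string_view s) { write(s.data(), s.size()); }

    // Decimal formatting without locales: digits are produced in reverse.
    void writeUInt(uint64_t x) {
        char tmp[20];  // UINT64_MAX has 20 digits
        std::size_t len = 0;
        do {
            tmp[len++] = static_cast<char>('0' + x % 10);
            x /= 10;
        } while (x != 0);
        while (len > 0)
            write(&tmp[--len], 1);
    }

    void flush() {
        sink_.append(buf_, pos_);
        pos_ = 0;
    }

private:
    static constexpr std::size_t kCapacity = 4096;
    std::string& sink_;
    char buf_[kCapacity];
    std::size_t pos_ = 0;
};
```

Writing a TSV row such as "hits", a tab, and a number touches only the local array until 4 KiB accumulate, so the cost per call stays small.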

The ClickHouse server was introduced on March 9, 2012, and clickhouse-client followed on March 25. With the Log, TinyLog, Merge, Distributed, and Memory table engines, ClickHouse could already replace MySQL for reporting.

What Developers Should Know

ClickHouse's journey shows how a focused, performance-driven approach can build a successful open-source project. The codebase is a goldmine for C++ developers wanting to see modern practices in action. For anyone considering building a database, ClickHouse's source code is a masterclass in modular design and documentation.

If you're evaluating analytical databases, ClickHouse's maturity (10 years of open-source development, 2,000+ contributors) and its emphasis on contributor experience make it a strong choice. To get started, run a ClickHouse server, connect with clickhouse-client, and look at the system.contributors table:

SELECT * FROM system.contributors LIMIT 5;

Or load the hits dataset used by ClickBench and run a few queries against it to see the performance for yourself.