Alexey Milovidov


Evolution of data structures in Yandex.Metrica

Yandex.Metrica is the world's second-largest web analytics system. Metrica ingests a stream of data representing events that occur on websites or in apps. Our task is to process this data and present it in a form that can be analyzed.


Processing the data in itself is not a problem. The real difficulty lies in deciding what form the processed results should be stored in so that they are easy to work with. Over the course of development, we had to completely change our approach to organizing data storage several times. We started with MyISAM tables, then moved to LSM-trees, and eventually built a column-oriented database, ClickHouse. In this article I'll explain what led us to settle on this last option.

Yandex.Metrica was launched in 2008 and has now been running for more than nine years. Every time we changed our approach to data storage in the past, it was because a particular solution proved inefficient: it lacked sufficient performance headroom, it was unreliable, it consumed too many computational resources, or it simply did not let us implement what we needed.

The old Yandex.Metrica for websites has more than 40 "fixed" …