Geoffrey Plouviez

Author Archives: Geoffrey Plouviez

Moving Quicksilver into production

Moving Quicksilver into production

One of the great arts of software engineering is making updates and improvements to working systems without taking them offline. For some systems this can be rather easy, spin up a new web server or load balancer, redirect traffic and you’re done. For other systems, such as the core distributed data store which keeps millions of websites online, it’s a bit more of a challenge.

Quicksilver is the data store responsible for storing and distributing the billions of KV pairs used to configure the millions of sites and Internet services which use Cloudflare. In a previous post, we discussed why it was built and what it was replacing. Building it, however, was only a small part of the challenge. We needed to deploy it to production into a network which was designed to be fault tolerant and in which downtime was unacceptable.

We needed a way to deploy our new service seamlessly, and to roll back that deploy should something go wrong. Ultimately many, many, things did go wrong, and every bit of failure tolerance put into the system proved to be worth its weight in gold because none of this was visible to customers.

The Bridge

Our goal Continue reading

Introducing Quicksilver: Configuration Distribution at Internet Scale

Introducing Quicksilver: Configuration Distribution at Internet Scale

Cloudflare’s network processes more than fourteen million HTTP requests per second at peak for Internet users around the world. We spend a lot of time thinking about the tools we use to make those requests faster and more secure, but a secret-sauce which makes all of this possible is how we distribute configuration globally. Every time a user makes a change to their DNS, adds a Worker, or makes any of hundreds of other changes to their configuration, we distribute that change to 200 cities in 90 countries where we operate hardware. And we do that within seconds. The system that does this needs to not only be fast, but also impeccably reliable: more than 26 million Internet properties are depending on it. It also has had to scale dramatically as Cloudflare has grown over the past decade.

Historically, we built this system on top of the Kyoto Tycoon (KT) datastore. In the early days, it served us incredibly well. We contributed support for encrypted replication and wrote a foreign data wrapper for PostgreSQL. However, what worked for the first 25 cities was starting to show its age as we passed 100. In the summer of 2015 we decided to Continue reading