Zero-downtime Elasticsearch migrations
Back in January, I wrote a short article on how we built resilient Elasticsearch-based search and analytics at ProcessOut. However, we left out a pretty important part: how we plan our index migrations without impacting our users.
Handling mapping migrations
We push updates to our index mappings regularly. Most of the time, these updates only add new keys, which has virtually no impact. Some data might be missing in the old documents, but luckily we’ve already covered how to keep documents up to date in our previous article.
Things get a bit more complicated when we want to update the mapping of keys that already exist. Although Elasticsearch’s blog already covers part of this problem, we found it lacking concrete examples.
Downtime is bad, even for services and features that are not so critical. We strive to make our internal upgrades invisible so that we never disrupt our users’ businesses, something that shows on our status page. Luckily, Elasticsearch has a neat feature that allows us to hot-swap between two indexes that we keep in sync, without the need to deploy any software update: aliases.
Concrete example
As promised, here’s a concrete (simple) example of how one can use Elasticsearch aliases. When building our indexes, we always include a version in the index name.
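As a minimal sketch of what a versioned index could look like: the name `transactions_v1`, the `status` field and both helper functions are assumptions made for illustration, and the body uses the typeless mapping format of Elasticsearch 7 and later. The code only builds the request; it does not send it.

```python
import json


def versioned_index(alias, version):
    """Return the versioned index name for an alias, e.g. transactions_v1."""
    return f"{alias}_v{version}"


def create_index_request(index, properties):
    """Build the method, path and JSON body of a create-index request."""
    body = {"mappings": {"properties": properties}}
    return "PUT", f"/{index}", json.dumps(body)


# Hypothetical mapping: a single "status" field indexed as full text.
method, path, body = create_index_request(
    versioned_index("transactions", 1),
    {"status": {"type": "text"}},
)
print(method, path, body)
```

Sending this request to the cluster with any HTTP client would create the first versioned index.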
We then create an alias that points to this index.
This alias allows us to query the transactions index, and Elasticsearch will automatically route the request to the versioned index defined above, as if transactions were a real index. This means that any application that retrieves data from Elasticsearch can use this endpoint, and will not need to be updated if we choose to point the alias at another index.
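A sketch of the alias creation, assuming the versioned index is named `transactions_v1` (a hypothetical name) and using the `_aliases` endpoint with a single `add` action. The helper `add_alias_request` is illustrative and only builds the request.

```python
import json


def add_alias_request(index, alias):
    """Build an _aliases request that makes `alias` point to `index`."""
    actions = [{"add": {"index": index, "alias": alias}}]
    return "POST", "/_aliases", json.dumps({"actions": actions})


# "transactions_v1" is an assumed name for the versioned index.
method, path, body = add_alias_request("transactions_v1", "transactions")
print(method, path, body)
```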
Now that we’ve laid out our base, let’s say we want to update our mapping in a way that is not compatible with the old format. We’ll first create another versioned index that uses the new mapping.
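To make the incompatible case concrete, here is a small sketch that compares two hypothetical mappings and prepares the body for a second index. The field names, the `transactions_v2` name and the `incompatible_fields` helper are assumptions; the underlying point is that Elasticsearch does not let us change the type of an existing field in place, which is why a new index is needed.

```python
import json


def incompatible_fields(old_properties, new_properties):
    """List fields present in both mappings whose type differs."""
    return sorted(
        name
        for name, spec in new_properties.items()
        if name in old_properties
        and old_properties[name].get("type") != spec.get("type")
    )


# Hypothetical mappings: "status" changes type, "currency" is simply new.
old_properties = {"status": {"type": "text"}}
new_properties = {
    "status": {"type": "keyword"},
    "currency": {"type": "keyword"},
}

changed = incompatible_fields(old_properties, new_properties)
new_index = "transactions_v2"  # assumed name for the second versioned index
create_body = json.dumps({"mappings": {"properties": new_properties}})
print(changed, new_index, create_body)
```

Only `status` is reported here: adding `currency` alone would not have required a new index.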
One problem emerges: we now have two indexes that need to be kept in sync. In our previous blog article, we’ve seen that using background workers to index can be a great way of performing non-blocking indexing, with all documents being eventually indexed.
Because we’re able to index newly updated documents, we can also tell our workers to start indexing from the beginning again. Internally, our workers re-index any document updated after the last indexing job. In order to re-index every document, we simply need to tell our workers that the indexing job will be the first one.
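The selection logic described above could look like the following sketch. The document shape, the `updated_at` field and the `documents_to_index` function are hypothetical, and the documents live in memory here rather than in a database; passing `None` as the last job time stands for "this is the first indexing job".

```python
from datetime import datetime, timezone


def documents_to_index(documents, last_job_at):
    """Select the documents updated since the last indexing job.

    last_job_at is None when no job has run yet, in which case
    every document is selected.
    """
    if last_job_at is None:
        return list(documents)
    return [doc for doc in documents if doc["updated_at"] > last_job_at]


# Hypothetical documents and timestamps.
documents = [
    {"id": "tr_1", "updated_at": datetime(2020, 1, 10, tzinfo=timezone.utc)},
    {"id": "tr_2", "updated_at": datetime(2020, 3, 5, tzinfo=timezone.utc)},
]
last_job_at = datetime(2020, 2, 1, tzinfo=timezone.utc)

incremental = documents_to_index(documents, last_job_at)
full = documents_to_index(documents, None)
print([doc["id"] for doc in incremental], [doc["id"] for doc in full])
```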
However, because we want to keep two Elasticsearch indexes in sync (on top of our Postgres database), we still need our workers to push the documents to the two indexes at the same time. Internally, our workers simply have a slice of indexes they’ll push documents to, and we update this slice when we need to migrate data.
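One way to push a document to several indexes at once is a single `_bulk` request with one action per target index. This is a sketch under assumptions, not our actual worker code: the index names, the document and the `bulk_body` helper are illustrative, and the payload follows the newline-delimited JSON format of the bulk API without mapping types.

```python
import json


def bulk_body(doc_id, document, indexes):
    """Build a _bulk payload that indexes one document into every index."""
    lines = []
    for index in indexes:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(document))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# During a migration the list holds both the old and the new index.
target_indexes = ["transactions_v1", "transactions_v2"]
payload = bulk_body("tr_1", {"status": "completed"}, target_indexes)
print(payload, end="")
```

Outside of a migration the list holds a single index, so the same code path serves both cases.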
Once the new index with the new mapping is up to date, we can safely update the index alias to point to the new index.
Because Elasticsearch applies the alias actions of a single request atomically, we don’t need to worry about search queries failing during the migration: the old index is removed from the alias and the new one is added in a single step. All new requests are then forwarded to the new index.
The last step in the migration is to stop updating the old index and drop it. We simply remove it from the workers’ slice of indexes, and delete the index in Elasticsearch.
And that’s it! The new index mapping is now fully in production and in sync, and users haven’t faced any search or analytics downtime.
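A sketch of the swap, assuming the two indexes are named `transactions_v1` and `transactions_v2` (hypothetical names). Both actions travel in one `_aliases` request, which is what makes the switch a single step; the helper `swap_alias_request` only builds the request.

```python
import json


def swap_alias_request(alias, old_index, new_index):
    """Build one _aliases request that moves `alias` to `new_index`."""
    actions = [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]
    return "POST", "/_aliases", json.dumps({"actions": actions})


method, path, body = swap_alias_request(
    "transactions", "transactions_v1", "transactions_v2"
)
print(method, path, body)
```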
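The cleanup can be sketched in the same spirit. The `retire_index` helper and the index names are assumptions; it returns the list of indexes the workers should keep writing to, along with the delete-index request to send once the workers have picked up the new list.

```python
def retire_index(target_indexes, old_index):
    """Return the remaining worker targets and the delete-index request."""
    remaining = [index for index in target_indexes if index != old_index]
    return remaining, ("DELETE", f"/{old_index}")


remaining, delete_request = retire_index(
    ["transactions_v1", "transactions_v2"], "transactions_v1"
)
print(remaining, delete_request)
```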