Hidden in SurrealDB’s announcement is a gem that can easily be overlooked — the introduction of a new key-value store, built by the Surreal team, called SurrealKV.
In this post I’d like to discuss two burning questions I had when reading the post —
- Why did they bother building an entire storage engine? (hint — there are a lot of reasons)
- How does it work?
Why a new engine?
SurrealDB uses a key-value (KV) store under the hood to store and query all of its data, including both metadata (e.g. table definitions) and records (the actual rows in the tables).
For a more in-depth look at how they do it, have a look at this post.
Previously, SurrealDB had several options for a KV backend, including RocksDB (for local, persistent storage), TiKV (for distributed clusters) and in-memory (for small and/or temporary instances). These seem to account for all major use cases, so why do we need a new one?
To answer this question we need to look at what a database offers us. On the surface level, a database allows us to simply store and retrieve data, but if we go deeper it can provide so much more. There are several important aspects of a database that are expected of a modern and mature tool:
- ACID: the ability to perform multiple actions on the database atomically (either all succeed or all fail), alongside the consistency, isolation and durability guarantees that make up the rest of the acronym.
- Versioning: made a mistake? Don’t worry, just see how the data looked in the past. This is a new feature that SurrealKV provides.
- SurrealDB-specific improvements: an adaptive radix trie, a data structure uniquely suited to how SurrealDB stores its values.
And finally, one more to align with SurrealDB’s vision — the entire storage engine can be embedded alongside SurrealDB itself for various application needs.
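To make the "either all succeed or all fail" idea concrete, here is a minimal sketch of an all-or-nothing write batch. It is a toy in-memory store, not SurrealKV's implementation; the names TinyStore and Transaction are made up for this illustration.

```python
class TinyStore:
    """Toy in-memory key-value store with all-or-nothing write batches."""

    def __init__(self):
        self.data = {}

    def transaction(self):
        return Transaction(self)


class Transaction:
    """Buffers writes and applies them only if the block finishes cleanly."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def set(self, key, value):
        # Writes are only buffered here; the store itself is untouched.
        self.pending[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: every buffered write becomes visible together.
            self.store.data.update(self.pending)
        # On error nothing was applied, so there is nothing to undo.
        self.pending = {}
        return False  # let any exception propagate to the caller


store = TinyStore()

# Both writes succeed, so both are committed.
with store.transaction() as tx:
    tx.set("user:john", "John")
    tx.set("user:jane", "Jane")

# The block fails halfway through, so none of its writes are applied.
try:
    with store.transaction() as tx:
        tx.set("user:bob", "Bob")
        raise RuntimeError("something went wrong mid-transaction")
except RuntimeError:
    pass

print(sorted(store.data))  # ['user:jane', 'user:john']
```

A real engine also has to handle concurrent transactions and durability on disk, which this sketch leaves out entirely.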
Versioning
The highlight of the new engine is its ability to time travel.
What I mean by that is that it enables querying different versions of the data by specifying the timestamp you want SurrealDB to show you. Querying past versions is extremely useful for many applications: preventing data loss, undoing changes, tracking updates and so on.
Surreal enables this by introducing a new VERSION keyword that can be added to queries:
-- Example from Surreal's blog post
SELECT * FROM user VERSION d'2024-08-12T11:03:00Z'
[[{ id: user:john, name: 'John v1' }]]
Behind the scenes this is accomplished using a versioned trie data structure that allows for querying the same key across different versions. The implementation of this is a whole new project created by the team called vart, which we’ll explain in the next section.
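The core idea of "same key, different versions" can be sketched without a trie at all. The toy store below keeps every version of a key in a sorted list and uses binary search to find the value that was current at a given timestamp. This is only an illustration of the lookup semantics, not how vart works internally; VersionedStore and get_at are hypothetical names.

```python
import bisect


class VersionedStore:
    """Toy store that keeps every version of a key, ordered by timestamp."""

    def __init__(self):
        self.timestamps = {}  # key -> sorted list of timestamps
        self.values = {}      # key -> list of values, parallel to timestamps

    def set(self, key, value, ts):
        ts_list = self.timestamps.setdefault(key, [])
        val_list = self.values.setdefault(key, [])
        # Keep both lists ordered by timestamp.
        i = bisect.bisect_right(ts_list, ts)
        ts_list.insert(i, ts)
        val_list.insert(i, value)

    def get_at(self, key, ts):
        """Return the latest value written at or before ts, or None."""
        ts_list = self.timestamps.get(key, [])
        i = bisect.bisect_right(ts_list, ts)
        if i == 0:
            return None  # the key did not exist yet at that time
        return self.values[key][i - 1]


# ISO 8601 timestamps in the same format and timezone sort correctly as strings.
store = VersionedStore()
store.set("user:john", "John v1", "2024-08-12T11:00:00Z")
store.set("user:john", "John v2", "2024-08-12T11:05:00Z")

print(store.get_at("user:john", "2024-08-12T11:03:00Z"))  # John v1
print(store.get_at("user:john", "2024-08-12T12:00:00Z"))  # John v2
print(store.get_at("user:john", "2024-08-12T10:00:00Z"))  # None
```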
SurrealDB specific improvements
I’ve written in the past about how SurrealDB stores its objects, but the gist of the matter is that it always builds its keys as a list of hierarchical values. For example, saving a table will generate a key with the structure:
* namespace_name
* database_name
!tb <- A constant value
table_name
And creating an object in that table will have a key of the structure:
* namespace_name
* database_name
* table_name
* object_key
SurrealKV utilizes a prefix-tree (or trie) as the underlying data structure. What this enables is that related objects in the database are “closer” in the stored data. For example, all the objects for the same table have the same prefix (up to the table name), and accessing multiple keys from the same table becomes quicker (you only need to filter for the prefix of the table name once and then you have all the keys).
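Here is a small sketch of why a shared prefix makes "give me everything in this table" cheap. For simplicity it uses a sorted list and binary search rather than a trie, and the text encoding in record_key is a simplified assumption, not SurrealDB's actual byte format. The point is the same though: once you locate the prefix, all matching keys sit next to each other.

```python
import bisect


def record_key(ns, db, tb, record_id):
    # Hypothetical text encoding that mirrors the hierarchy described above.
    return f"*{ns}*{db}*{tb}*{record_id}"


def scan_prefix(sorted_keys, prefix):
    """Return all keys starting with prefix from an already sorted list."""
    # Binary search jumps straight to the first candidate key.
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing further can match
        matches.append(key)
    return matches


keys = sorted([
    record_key("app", "main", "user", "john"),
    record_key("app", "main", "user", "jane"),
    record_key("app", "main", "order", "1"),
    record_key("app", "main", "user_log", "x"),
])

# Every record of the "user" table shares this prefix.
user_prefix = "*app*main*user*"
print(scan_prefix(keys, user_prefix))
# ['*app*main*user*jane', '*app*main*user*john']
```

Note that the trailing separator in the prefix keeps the user_log table out of the results.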
By building the new store in such a way, it is uniquely suited for SurrealDB’s use case and can unlock better performance for the entire project.
Embedding
Lastly, SurrealDB prides itself on being embeddable directly into other projects, on websites or on-device. Building a storage engine that aligns with that vision, can be deployed anywhere SurrealDB can and doesn’t require any external services is a huge milestone that enables SurrealDB to move forward with its vision of being the new database for the web.
How is it built?
The second most important question after why is how.
In future posts we’ll dive into each element in more detail, but for now we’ll look at a brief overview.
SurrealKV is used across three layers:
- SurrealDB translates all the commands it receives into key-value format, either for saving or retrieving.
- SurrealKV receives those keys and values and makes sure they are saved in an atomic manner and that no two concurrent actions can impact each other.
- VART is the underlying code that saves the keys as an efficient, versioned Adaptive Radix Trie.
All of these layers work together to provide an efficient and reliable database experience.
Radix what?
The underlying data structure, the adaptive radix tree (ART), is a variation on a prefix tree (or trie), a data structure that allows for efficient storage and lookup of keys. This is doubly so for Surreal, as we’ve seen.
ART improves on the classic trie in two ways:
- Combining a node with its child when it has only one child. This way we don’t generate a long chain of nodes for an extremely unique key.
- Using different node sizes depending on the number of child nodes, allowing for optimized storage.
All in all, ART provides a storage solution that is efficient in both storage space and lookup time. It’s a really cool data structure that I’ll probably do an entire post on in the near future.
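Both ideas can be sketched in a few dozen lines. The toy tree below is not vart's implementation and simplifies a lot: children live in a plain dict, so the size_class method only reports which of the four node sizes from the ART design (4, 16, 48 or 256 slots) would be enough, rather than actually switching layouts. Node and RadixTree are names chosen for this example.

```python
class Node:
    def __init__(self, prefix=b""):
        self.prefix = prefix   # compressed run of bytes leading into this node
        self.children = {}     # first byte of a child's prefix -> child node
        self.value = None
        self.has_value = False

    def size_class(self):
        """Smallest ART node size (4, 16, 48 or 256 slots) that fits the children."""
        for capacity in (4, 16, 48, 256):
            if len(self.children) <= capacity:
                return capacity


def common_prefix_len(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


class RadixTree:
    def __init__(self):
        self.root = Node()

    def insert(self, key, value):
        node, rest = self.root, key
        while rest:
            child = node.children.get(rest[0])
            if child is None:
                # Path compression: one node holds the whole remaining key.
                child = Node(rest)
                node.children[rest[0]] = child
                node, rest = child, b""
                break
            n = common_prefix_len(rest, child.prefix)
            if n < len(child.prefix):
                # Keys diverge inside the compressed run: split it in two.
                middle = Node(child.prefix[:n])
                child.prefix = child.prefix[n:]
                middle.children[child.prefix[0]] = child
                node.children[rest[0]] = middle
                child = middle
            node, rest = child, rest[n:]
        node.value, node.has_value = value, True

    def get(self, key):
        node, rest = self.root, key
        while rest:
            child = node.children.get(rest[0])
            if child is None or not rest.startswith(child.prefix):
                return None
            node, rest = child, rest[len(child.prefix):]
        return node.value if node.has_value else None


tree = RadixTree()
tree.insert(b"user:john", "John")
tree.insert(b"user:jane", "Jane")

shared = tree.root.children[ord("u")]
print(shared.prefix)                # b'user:j'
print(sorted(shared.children))      # [97, 111], the bytes for 'a' and 'o'
print(tree.get(b"user:jane"))       # Jane
print(shared.size_class())          # 4
```

Looking up user:john visits two nodes below the root instead of nine, which is the payoff of merging single-child chains.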
Summary
SurrealDB 2.0 introduces SurrealKV, a new storage engine for the database that both improves the product and enables future enhancements, while introducing two new features: versioned queries and an embedded store alongside SurrealDB.
In the following posts we’ll be looking more deeply into the technology powering this new engine. I hope you’ll join me on this journey.
Interested in learning more about how SurrealDB works? All my posts are available here.
Have specific questions or requests? Let me know in the comments.