SurrealDB Internal design — Part 3— Into the Rabbit Hole with SurrealQL

Ori Cohen
5 min readOct 18, 2022

--

Photo by Sigmund on Unsplash

A full list of posts in the series can be found here.

In the previous post we looked at an overview of SurrealDB and it’s architecture. In this post we will start looking under the hood of SurrealDB — with the code responsible for parsing the language Surreal speaks, which is commonly referred to as SurQL (Pronounced as circle).

SurQL

The language itself is similar to classical SQL in structure, but it has a few unique features to allow for SurrealDB’s full abilities.

The way Surreal parses the queries passed to it is by using the nom crate. Nom is a parsing library that allows for building parsers incrementally by chaining functions, while also using Rust’s strong type system to maintain the memory and type safety Rust guarantees as part of it’s programming model.

Basically, you can write functions to parse small pieces of text that return a Rust type, and compose them in larger and larger functions that eventually can parse your entire language, much like building up with Lego pieces into a larger and larger build.

Let’s look at an example from inside Surreal itself — the table definition.

As with any Rust code, the data type is the language we build our code around. Let’s start by looking at the table definition type.

Just by looking at the type we can understand a lot about the code that we need to write. We will need internal functions to parse sub-types (Ident, View and Permissions). As well as seeing the view definition is optional, and we will need to handle the cases where it does and doesn't exist separately.

For reference, the official docs give us the full syntax (abbreviated here to only show the table definition):

We can see that the type and syntax overlap exactly, but to me the type is easier to understand as it destructures the statement into it’s components in a clear and concise way.

First, let’s try and describe the statement as a tree of statements. The beauty of nom is that it allows us to write each leaf in the tree as a separate parser and then combine them all into a large parser.

Define Table Statement Parser Diagram

The translation into code is straightforwards:

The function itself is a bit long, but it shows us exactly how awesome Nom is.

Let’s start from the top:

fn table(i: &str) -> IResult<&str, DefineTableStatement> 

Each parser in Nom takes the string to parse as an input, and returns a special type of Result (as the parsing can fail). When the parsing succeeds, the parser returns a tuple — both the remaining text to process (and that text is passed on to additional functions) and the parsed Rust type that is created from this parser, in our case the table definition.

The parsing itself is done in only 6 lines of this function (lines 4–9), which also demonstrate how to use the result of a parser, for example (line 8):

let (i, name) = ident(i)?;
  • i — the remaining text
  • name — the object of type Ident that the parser returned

Reading the parsing lines we can easily understand the allowed syntax for this command.

Lines 12–43 show us the extreme usefulness of the fact that Nom returns rust types. In those lines we are using idiomatic Rust, and are guaranteed all the memory safety as well as correctness promises Rust gives.

In order to gain a better understanding of the power of composite parsers, let’s dive even deeper into the table options parser.

Again, the options themselves are guaranteed to be a certain set of commands via a use of a Rust type:

The really interesting part comes at the parsing stage. As we can have multiple options that come in different orders, we need to check for alternatives, each alternative a separate parser. Luckily, nom makes this almost trivial.

First we define a series of parsers, one for each option (notice that all return a variant of the enum we defined above):

And then we simply wrap them is a general parser for all options that can return any of the alternatives, using the build in alt parser nom provides, which does exactly what we want:

For those interested, a good exercise is to look even further down at the view parser and understands how it fits in to the general table definition parser.

But wait, we are not done. How do we get all the options for the table? Nom again comes to the rescue!

Look again at the definition parser:

let (i, opts) = many0(table_opts)(i)?;

We used another nom parser called many0, which allows us to iterate and return a list (of type Vec<DefineTableOption>) of all the matching patterns in a sequence until the match fails, allowing for the option that there are no matches (the complementary many1 method requires at least one match or the entire parser will fail).

All parsers in the project follow the same pattern, combining smaller parsers into increasingly large ones that can deal with the large and complex statements that make up SurQL.

Conclusion

In this post we looked at how SurrealDB uses Nom to parse SurQL, it’s SQL-like query language, using the example of the table definition to study the amazing power of combination parsers to create complex ones with relative ease.

Now that you understand one, understanding all other parsers in SurrealDB should not be a problem. In addition, here are a few tips to not get lost in the depth of the parsers:

  1. Start with the structs, they will tell you what should be returned as the entire statement and allow you to get a overhead view of the parser.
  2. Open the docs, if they exist, to see how the parser translates into the full query.
  3. don’t worry if you don’t understand a parser all the way down, sometimes knowing the type returned from some sub-parser (for example, the view parser above) is enough.

In the next post we’ll talk about the core of SurrealDB and one of the most brilliant and exciting parts of it — the data model in the Key-Value stores backing the database.

See you then!

Additional Resources

--

--

Ori Cohen

An applicable math and computer science enthusiastic.