Type safe SQL Queries using Rezoom.SQL
This is my F# Advent Calendar 2019 entry. Last year I wrote about Fable React and Preact, but this year I want to write about data access, a topic that practically any application requires. Considering F# is a .Net language, any of the large amount of ORMs available to C# are also available to F#. The big problem, of course, is that they are generally OOP oriented and not not very idiomatic F#. Now, of course you could argue that, because F# is multi-paradigm, it's not actually a problem switching to OOP, but where would the fun be in that?
On top of that, there do exist a number of ORMs written in F# that aim to make data access more "functional". They range anywhere from simple wrappers around existing ORMs to using type providers to provide direct access to a database. A later post in this year's advent calendar will talk more about the state of type providers in .NET Core 3. So I want to write about a really cool type provider ORM that I recently discovered called Rezoom.SQL To skip directly to it, click here.
ORMs in F#
Entity Framework (EF)
The gold standard when it comes to data access in C#. Chances are, if you've every worked with .NET extensively, then you've used Entity Framework as well. I'm not going to go into too much detail about EF here, but it can also be used from F# without any issues. The main trick is to make use of the [<CLIMutable>]
attribute to be able to use record types instead of having to declare mutable types. Here's a short introduction to using EF with F#.
Dapper
Another popular choice when it comes to Micro ORMs, dapper excels in being easy to use. Since it's a micro ORM, it doesn't provide much in abstractions, instead opting to let the developer write their own SQL statements instead, and only handling the data binding part. Similar to Entity Framework, the [<CLIMutable>]
attribute once again allows for using record types as the generic parameter that dapper automatically maps to. There's another little helper that also makes using Option types more pleasant by writing a small type converter like this.
SQLProvider
The F# SQLProvider is the first ORM that I came across that's specifically meant for use with F# and it was also my introduction to Type Providers. To put it simply, you give the SQLProvider a connection string and then you have access to all data and metadata about that database at compile time, meaning you get intellisense help for anything from finding individual records to writing complicated queries using the query {}
builder. On top of that, SQLProvider also supports a number of additional features that go beyond simple querying and insertion, and supports many database servers and their respective exclusive features.
My one and only gripe with SQLProvider is that is a database first approach to ORMs and I personally prefer a Code First approach. And considering that this requires an active database server to generate its types, that makes working offline impossible, unless you also have the database running locally (or are using SQLite).
What is Rezoom.SQL
I stumbled upon Rezoom.SQL when I was looking for alternative ORMs for F# just to see what else existed out of curiosity. I started reading more about it and it happened to fill that exact criteria I had for an F# ORM. It's a Type Provider and it also runs Code First. It accomplishes this by using migration scripts as the source of truth, meaning that the migration scripts you write to set up your database are also used by the type provider to generate the types. The only problem is that there's no version that runs on .NET Core published to NuGet. There is a branch on Github with a port to .NET Standard, which I have successfully gotten to run on this very website.
The basic idea behind Rezoom.SQL is that it introduces its own SQL dialect, which is mostly based on SQLite, which it is them able to transform into one of three different backends: SQLite, PostgreSQL, and MS SQL Server. Migration scripts and all SQL scripts used during runtime are written in this SQL dialect and also type checked. This approach is nice for those people that like to directly write SQL instead of having something like LINQ or the query
builder generate some SQL for them at runtime, as well as the people (like me) that like to have as much statically type checked as possible.
The project itself also contains extensive documention and showcases how it transforms its own SQL dialect (that is based on SQLite) to the various database backends that it supports as well.
Migrations
Migration scripts look exactly like you would expect them to. It's a "simple" SQL script that you save in a location. You then need to configure Rezoom.SQL to look at the location of your .sql files. The filenames should be prefixed with a number that defines the order in which the migrations get run. An interesting addition that I haven't seen elsewhere (although I'm sure the concept does exist elsewhere) are migration trees, which allow multiple migrations to be run independent of order, as sometimes happens when multiple people are working in separate features at the same time. They look exactly like you'd expect a migration to look like:
create table articles ( Id INT PRIMARY KEY autoincrement, Title string, Source string, Parsed string, Tooltips string, CreatedOn datetime ); create table tags ( Id INT PRIMARY KEY autoincrement, Name string unique ); create table article_tags ( ArticleId INT REFERENCES Articles(Id), TagId INT REFERENCES Tags(Id) );
These are the table definitions for this very site. It's a simple mapping from an Article, Tags, and a many to many relationship. This is saved as v1.articles.sql
next to the data access project file. Then, within the code, the migrations are references with type HashsetModel = SQLModel<".">
, with the string being a relative file path to where the migrations are kept. (Hashset here refers to the website name, not the data structure.)
These migration scripts are the source for the type provider that the types get generated from. When writing your queries, Rezoom.SQL checks against these migrations to see if you spelled all your column names correctly and passed in the correct types of data, both from SQL parameters and also simple comparisons.
Querying
This is where things get interesting. A very simple query would look something like this:
type GetArticleById = SQL<""" select a.*, many tags(t.*) from articles a join article_tags at on a.Id = at.ArticleId join tags t on t.id = at.TagId where a.Id = @id """> use context = new ConnectionContext() let article = GetArticleById.Command(id = id).ExecuteExactlyOne(context)
with @id
being a parameter that gets bound during runtime. All table names and columns will be suggested by intellisense and type checked at compile time. The parameter in the Command
generated method are also type checked.
This is also where you will notice the first non-standard SQL syntax—the many
keyword. This is a feature that comes from the Rezoom.SQL dialect, which is a "helper method" that easily allows us to bind to a list of types instead of having to map that manually. When you inspect the article object during runtime, it will have a list of Tag
objects for each article instead of an article
being duplicated but with a different tag each time. I personally consider this automatic mapping of lists to be an essential feature of any ORM, and a must have.
It is important to note that article
is of a generated type and not a record type that you define. This isn't a problem in and of itself, as you can just map to a record type or even pull the generated type through the codebase (please don't do this). However, where it is a problem is when you have two separate queries querying for the same thing and differing only in the where
clause. This is still an open issue and as described in the comments, this is still an ongoing issue, but has workarounds. One of them is to write a configurable query, which makes use of dynamic SQL, and the other is to define an inline function. My problem with using dynamic SQL for this problem is that the SQL is no longer fully type checked. So I opted to write an inline function. What isn't described in that issue is what the inline mapping function looks like with a nested list property. But the concept is the same and the function that I came up with looks like this:
let inline mapArticle row = { ParsedDocument.Id = (^a: (member get_Id: unit -> int)(row)) Title = (^a: (member get_Title: unit -> string)(row)) ArticleDate = (^a: (member get_CreatedOn: unit -> System.DateTime)(row)) Source = (^a: (member get_Source: unit -> string)(row)) Document = (^a: (member get_Parsed: unit -> string)(row)) Tooltips = (^a: (member get_Tooltips: unit -> string)(row)) Tags = (^a: (member get_tags: unit -> IReadOnlyList<_>)(row)) |> Seq.map (fun tagRow -> { Tag.Id = (^b: (member get_Id: unit -> int)(tagRow)) Name = (^b: (member get_Name: unit -> string)(tagRow)) } ) |> Seq.toList }
What's happening here is that we're defining statically resolved type parameters with specific member constraints. Consequently, to get each value, we define a constraint that specifies that the member get_Id
with the specified signature must exist. Then that member is called on the row
. What's nice is that the get_Id
and other get_Foo
methods are also type checked because the member names are generated by the type provider. Using this function is as simple as passing in the article
from earlier:
let mappedArticle = mapArticle article
.
Inserting
When inserting records, Rezoom's flavor of SQL also provides a helper. Rezoom.SQL lets us specify that we are inserting a row with the row
helper.
type InsertArticle = SQL<""" insert into articles row Title = @title, Source = @source, Parsed = @parsed, Tooltips = @tooltips, CreatedOn = @createdOn; select lastval() as Id; """>
How it works is self-explanatory and it helps immensely when you have large columns, as you have a simple one-to-one mapping of values to columns instead of trying to keep the order straight when first defining columns and defining values afterward. Also of note is that not every column needs to be specified, like how we are skipping the Id
column here because that's auto incremented.
What we are doing, however, is selecting the lastval() as Id
. This simply returns the newly created Id from the query so that we can use it in subsequent queries when we reference the newly created article. This is actually a PostgreSQL specific function that is reimplemented by Rezoom.SQL and is not SQL dialect agnostic. This is actually something that I wish the library would handle out of the box as, in my opinion, it's a rather commonly used function.
Unfortunately, I have not figured out how to write an insert statement with multiple rows. What I have been able to find is usage of Rezoom's batch
function, which will batch together several insert statements and send them to the database in a single trip. To utilize this, we must be within the context of a Plan
, and then it looks like this:
plan { for tagId in batch tagIds do do! Queries.InsertArticleTagsMapping.Command(articleId = article.Id, tagId = tagId).Plan() }
And if we turn on Npgsql
Logging, we can prove that it happened in a single trip:
TRACE [33] Start user action
DEBUG [33] Executing statement(s):
INSERT INTO "tags" ("name") VALUES ($1)
Parameters: $1: F# SELECT lastval() AS "id"
INSERT INTO "tags" ("name") VALUES ($1)
Parameters: $1: Rezoom SELECT lastval() AS "id"
TRACE [33] Cleaning up reader
TRACE [33] End user action
However, it would definitely be nice to if it could be implemented as a batch insert.
Usage from .NET Core
This is where things get even more interesting. As of this writing, Rezoom.SQL doesn't exist as a NuGet library for .NET Standard. This is unfortunate, because I have the luxury of being able to utilize .Net Core as my runtime. It's understandable because it did take quite a while longer for type providers to be supported by .Net Core. However, if we look at Github, there does exist a standard
branch which is the WIP of porting it over to .NET Standard. This branch is actually what I'm using in this very website, and as the code is MIT licensed, I have pulled it directly into my source code. If I completely misunderstood how licensing works and I'm not allowed to do this, please, by all means, let me know. Of course I will be switching to the nuget package the moment it becomes available.
However, there are a few quirks with this branch and usage from .Net Core in general that I want to mention. The first is that Rezoom.SQL still relies on an app.config
or web.config
file for its connection strings and suitable transforms for the various environments. This in and of itself works, with the caveat that I never managed to get the DbProviderFactories
section in the app.config
working, as during runtime it threw an exception. The fix for this was to reference Npgsql
directly within the Rezoom.SQL codebase and leave out the parsing of that section.
However, this still leaves the problem of transforms, as I'm in the context of an ASP.NET Core application. Here, the appsettings.json
file reigns supreme. Additionally, I wanted to be able to override the connection string using environment variables, as well as the new dotnet user secrets. So that made using the app.config
difficult again.
The solution to this was to implement my own Connection Provider, which I found in this issue. This allowed me to create a very simple class that takes a connection string into the constructor and implements the Open
method of the interface using Npgsql
directly. After that, when specifying my query executions, I simply needed to pass in my custom connection provider.
let serviceConfig = ServiceConfig() do serviceConfig.SetConfiguration<ConnectionProvider>(HashsetConnectionProvider(connectionString)) |> ignore let executionConfig = { Execution.ExecutionConfig.Default with ServiceConfig = serviceConfig :> IServiceConfig } // later Execution.execute executionConfig queryPlan
This worked well, except that I still needed to execute the initial migration against my transformed connection string. Although the migration also takes in an execution config, it is a slightly different one that doesn't accept a connection provider. Unfortunately, there's no way to specify a connection string at all without getting into some ugly hacks. I decided to try the suggestion of the reply and added a MigrateWithConnectionString
method directly in the library, as I had it in my code base to begin with. This is what I came up with in TypeGeneration.fs
:
do let pars = [ ProvidedParameter("config", typeof<MigrationConfig>) ProvidedParameter("connection", typeof<ConnectionStringSettings>) ] let meth = ProvidedMethod("MigrateWithConnection", pars, typeof<unit>, isStatic = true, invokeCode = function | [ config; connection ] -> let backend = <@ fun () -> (%backend.MigrationBackend) (%%connection) @> <@@ let migrations : string MigrationTree array = %%Expr.PropertyGet(migrationProperty) migrations.Run(%%config, %%(upcast backend)) @@> | _ -> bug "Invalid migrate argument list") provided.AddMember meth
Incidentally, this was also my very first foray into implementing, or extending, a type provider, so please don't ask me exactly what's going on in the code | ✖ 〜 ✖ |. I should probably submit this change and a way to read from something other than app.config
in a pull request, but as that other person stated that they're planning on submitting a PR already, I figure I'll wait until there's a .NET Standard nuget package available.
Using Rezoom
Since Rezoom.SQL is meant to be used with its companion library, Rezoom, I thought I might go into it a little bit. To keep it short, Rezoom is a library meant to cache remote data fetches within a Plan
. What this amounts to is a short lived cache where, for example, a user is loaded from a remote resource (e.g. Database, REST endpoint), and then cached for subsequent uses, including calls to the same endpoint. It does this most efficiently with Rezoom.SQL, where the library already knows which database calls are idempotent, and which are "destructive". If, within a plan
, a select query is called multiple times, it will cache the result as long as no new data has been inserted or changed in the remote resource, in which case Rezoom knows to invalidate the cache and fetch the data again.
Rezoom accomplishes this via use of the Plan
computation expression, which defines the bind
operation as, for lack of better term, a cache. Once such a plan is defined, it isn't executed until there is a call to Execute
, at which point the Plan
gets executed as a .Net Task
. This plan can then be transformed into usage of the async
builder or used within the TPL. Or, if we were in C#, we could use async await
. However, the developer of Rezoom also happens to be the developer of the TaskBuilder
library, which I'm sure a lot of you have seen before. It is used in essentially the same manner as the async
builder, only differing in that it acts upon a .NET Task
instead of F#'s async. This allows for usage with various C# APIs in a more functional and familiar way, without the overhead of converting everything to an async
object at runtime. In the context of this website, this proved very useful as Giraffe is built upon ASP.NET CORE, which makes extensive use of the Task
object, and Giraffe itself also utilizes the very same Task
builder that Rezoom also uses. So this was a perfect fit for the two.