28 Aug 2023

Derailing gracefully with Railway Oriented Programming

When learning F#, at some point you'll come across the concept of Railway Oriented Programming (ROP). It's primarily a technique used for error handling and helps alleviate some of the struggles of using only exceptions. The main advantage is that possible error cases are encoded in the type system, which requires you to explicitly handle the possible errors instead of encountering an unexpected exception at runtime. Scott Wlaschin has a great post introducing the concept and use cases of ROP. This is a great read if you're unfamiliar with this style of programming.

However, what many introductions to ROP fail to cover in depth is what we actually want our errors to be. The green path is rather obvious: it's simply the return type of the data that we want. But how should the red path be modeled? What makes the most sense while still keeping with the spirit of ROP? I'd like to present one possible approach to handling the error cases in a way that makes sense and is easily extended.

Know your domain

What's generally discouraged with ROP is having the error type simply be an exception: Result<MyType, Exception>. This is because it doesn't bring any additional value over simply throwing the exception, as it still requires you to handle the exception at the end of the pipeline. The main idea behind the error case is that we want to encode in the type system what went wrong and why it went wrong. Errors generally cover two situations: either we wish to tell the end user why something they tried to do didn't work, or the application tried to accomplish a task and something unexpected happened. In HTTP terms, these are generally split into the status codes 400 - 499, which indicate that the client messed up, and the status codes 500 - 599, which indicate that the application encountered an unexpected error.

Some examples of user errors could be entering an incorrect password or trying to access data that belongs to another user. Examples of application errors could be the usual suspects including 3rd party services being unavailable or the famed NullReferenceException (but not in F#, right?). In the one case, we can tell the user what they did wrong, allowing them to correct the mistake. In the other case, the user is out of luck and the best we can do as developers is log the error and ask the user to try again later.

These errors can be categorized into two distinct groups: domain and non-domain errors.

The domain error

These are the interesting errors. These errors are usually returned by business logic when some rule gets violated. These can also be simple validation errors. The most important aspect of domain errors is that it is usually the client that made a mistake. Depending on which kind of domain error was returned, the client application can tell the user what they did wrong and how to fix it.

As an example, the user is trying to log in to the application. We could model the possible error scenarios like this:

type LoginDomainError =
    | UsernameNotFound
    | IncorrectUsernamePassword
    | AccountSuspended

In this scenario, three different errors can occur when the user is trying to log in. When the client is prompted by either UsernameNotFound or IncorrectUsernamePassword, they most likely made a typo when logging in. For the AccountSuspended error case, we can present the user with a telephone number or email address to contact support for further assistance. But what all three error cases have in common is that the application behaved as expected and the request merely violated a business rule. Incidentally, because they are all about business logic, ideally they would be returned from pure functions, making them dead-simple to unit test.
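To make the testing claim concrete, here is a minimal sketch of one such rule as a pure function. The User record and the checkSuspension name are assumptions made for illustration; they are not part of the original examples.

```fsharp
// Hypothetical user shape, for illustration only.
type User =
    { Username: string
      Password: string
      IsSuspended: bool }

type LoginDomainError =
    | UsernameNotFound
    | IncorrectUsernamePassword
    | AccountSuspended

// Pure rule: no I/O, so the same input always gives the same result.
let checkSuspension (user: User) : Result<User, LoginDomainError> =
    if user.IsSuspended then Error AccountSuspended else Ok user

let active = { Username = "alice"; Password = "hunter2"; IsSuspended = false }
let suspended = { active with IsSuspended = true }

// Plain equality checks stand in for a test framework here.
let activePasses = (checkSuspension active = Ok active)
let suspendedFails = (checkSuspension suspended = Error AccountSuspended)
```

Because records, unions, and Result all compare structurally, the two checks need no mocks or setup; in a real project they would live in whatever test framework you already use.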

The non-domain error

These are the kind of errors that you never want to see, neither as the end user nor as the developer. They generally mean something bad has happened. As previously mentioned, non-domain errors are unexpected errors that occur outside your business logic. This usually means that some side-effectful operation has failed, such as a network call or a request to a 3rd party service that is down. Generally, within the .NET runtime, and especially in F#, anytime an exception is thrown, it probably means that something unexpected happened even though the end user did everything correctly. In these scenarios, all we can do as developers is raise an internal alert, let the user know that something out of their control has happened, and ask them to try again. In code, they're much simpler to model:

type NonDomainError =
    { Message: string
      Exception: Exception option }

Of course, it's up to the developer to model this exactly how they see fit. But the general idea is that we don't really know or care during what operation this error happened. Thus, we don't need to explicitly model that information in the type system. However, we still want to know why the operation failed, and that information we can store in this NonDomainError type. We can simply collect the error details, usually from the exception, and log them in the internal logging/alerting system. Of note is that the Exception property is optional. This is because not all non-domain errors are caused by exceptions. Sometimes third-party libraries have already handled problems in their own way. But the details of these errors are irrelevant to our application, which is why we can wrap them into a non-domain error and call it a day.
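As a rough sketch of why the Exception field is optional, consider a library that reports failure through a response object rather than by throwing. The SdkResponse type and both helper functions below are hypothetical and only serve to illustrate the two ways a NonDomainError might be built.

```fsharp
open System

type NonDomainError =
    { Message: string
      Exception: Exception option }

// Hypothetical response shape from a third-party SDK (an assumption).
type SdkResponse =
    { Succeeded: bool
      ErrorCode: string
      Payload: string }

// No exception was thrown here, so Exception stays None.
let ofSdkResponse (response: SdkResponse) : Result<string, NonDomainError> =
    if response.Succeeded then
        Ok response.Payload
    else
        Error
            { Message = $"SDK call failed with code {response.ErrorCode}"
              Exception = None }

// An exception-based failure keeps the original exception for logging.
let ofException (context: string) (ex: exn) : NonDomainError =
    { Message = $"Unexpected error while {context}"
      Exception = Some ex }
```

Either way, the caller ends up with the same NonDomainError shape and does not need to know which kind of failure produced it.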

Putting it all together

The last thing we need to do is to apply this concept in our application. What better way to do that than with code examples? The key to any good application architecture is to separate concerns. This also holds true for the errors. In the previous domain error example, we modeled possible error cases for a login flow. We can model a similar but different type for any other flow. As an example, we have the following business rules for a micro-blogging platform:

  • Text must be less than 500 characters
  • Only premium accounts can post videos
  • Images must be smaller than a certain size

The accompanying domain error type could look like this:

type SubmitPostDomainError =
    | TextTooLong
    | VideoNotAllowed
    | ImageTooLarge

The non-domain error looks exactly the same as the other case, so we can just re-use the type for it. Now we have two separate services, one for each data flow in our application:

let login username password =
    // Log in,
    // return success, or one of the errors
    failwith "not implemented"

let submitPost post =
    // save new post to DB
    // return success, or one of the errors
    failwith "not implemented"

The magic of this approach lies in the type signatures of these functions. Both return a Result<>, and each can fail with either its respective domain error or a non-domain error. The key here is to define a general service error that takes the specific domain error as a generic type parameter.

type ServiceError<'a> =
    | Domain of 'a
    | NonDomain of NonDomainError

This then allows us to define our service functions with the following result type (Ok and Error refer to the two cases of the Result<> type):

let login username password : Result<LoginSuccess, ServiceError<LoginDomainError>> =
    try
        let user = database.getUser(username)
        
        match user with
        | None -> Error (Domain LoginDomainError.UsernameNotFound)
        | Some u ->
            if u.Password <> password then
                Error (Domain LoginDomainError.IncorrectUsernamePassword)
            elif u.IsSuspended then
                Error (Domain LoginDomainError.AccountSuspended)
            else 
                Ok u
    with ex ->
        let error: NonDomainError =
            { Message = $"Unexpected error while getting user from database"
              Exception = Some ex }
        Error (NonDomain error)
        
let submitPost currentUser post : Result<SubmitSuccess, ServiceError<SubmitPostDomainError>> = 
    // Rule: text must be less than 500 characters
    if post.text.Length >= 500 then
        Error (Domain SubmitPostDomainError.TextTooLong)
    elif post.image.size > 1024 then
        Error (Domain SubmitPostDomainError.ImageTooLarge)
    elif post.video.IsSome && (not currentUser.IsPremium) then
        Error (Domain SubmitPostDomainError.VideoNotAllowed)
    else
        // All business rules passed, so attempt the side effect
        try
            let savedPost = database.save(post)
            Ok savedPost
        with ex ->
            let error: NonDomainError =
                { Message = "Unexpected error while saving post to database"
                  Exception = Some ex }
            Error (NonDomain error)

These are obviously oversimplified examples, but they should get the point across on how to write the error cases and where the distinction between domain and non-domain comes from. Keep in mind that non-domain doesn't automatically mean a .NET exception. It could also be something else, like a 3rd party library returning unexpected results.

The last step is to return the response to the client. When using Giraffe as the web framework, the HttpHandlers would handle the result type like this:

let loginHandler username password =
    fun next context ->
        task {
            let loginResponse = login username password

            let response =
                match loginResponse with
                | Ok login -> Successful.OK login
                | Error (NonDomain e) ->
                    Log.Error("Something bad happened", e)
                    ServerErrors.INTERNAL_ERROR e.Message
                | Error (Domain e) ->
                    match e with
                    | UsernameNotFound ->
                        RequestErrors.NOT_FOUND "username not found"
                    | IncorrectUsernamePassword ->
                        // UNAUTHORIZED also expects a scheme and realm; "Basic" and "app" are placeholders
                        RequestErrors.UNAUTHORIZED "Basic" "app" "username/password incorrect"
                    | AccountSuspended ->
                        RequestErrors.FORBIDDEN "user suspended"

            return! response next context
        }
        
let submitPostHandler post =
    fun next context ->
        task {
            let user = context.GetService<User>().user
            let submitPostResponse = submitPost user post

            let response =
                match submitPostResponse with
                | Ok post -> Successful.OK post
                | Error (NonDomain e) ->
                    Log.Error("Something bad happened", e)
                    ServerErrors.INTERNAL_ERROR e.Message
                | Error (Domain e) ->
                    match e with
                    | TextTooLong ->
                        RequestErrors.BAD_REQUEST "post too long"
                    | VideoNotAllowed ->
                        RequestErrors.FORBIDDEN "user not allowed to post video"
                    | ImageTooLarge ->
                        RequestErrors.BAD_REQUEST "image too large"

            return! response next context
        }

With this approach of separating the errors by domain, as well as each data flow getting their own set of errors, it's trivial to keep the code clean and the concerns separated. Additionally, it's simple to extend or modify existing flows with other errors, while only having to make minimal or no changes to other flows.
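One way to see how little the flows need to know about each other is to pull the shared non-domain handling into a single helper. The sketch below is an assumption of mine rather than part of the handlers above: it uses plain (status code, message) tuples instead of Giraffe handlers so that it stays self-contained, and the toStatus and submitPostStatus names are hypothetical.

```fsharp
open System

type NonDomainError = { Message: string; Exception: Exception option }

type ServiceError<'a> =
    | Domain of 'a
    | NonDomain of NonDomainError

type SubmitPostDomainError =
    | TextTooLong
    | VideoNotAllowed
    | ImageTooLarge

// Shared handling: every flow treats NonDomain the same way (a 500),
// while the flow-specific mapping is passed in as a function.
let toStatus (mapDomain: 'a -> int * string) (error: ServiceError<'a>) : int * string =
    match error with
    | Domain e -> mapDomain e
    | NonDomain _ -> 500, "something went wrong, please try again later"

// Flow-specific mapping for the submit-post errors.
let submitPostStatus error =
    match error with
    | TextTooLong -> 400, "post too long"
    | VideoNotAllowed -> 403, "user not allowed to post video"
    | ImageTooLarge -> 400, "image too large"

let status = toStatus submitPostStatus (Domain VideoNotAllowed)
```

Adding a new case to SubmitPostDomainError makes the compiler flag submitPostStatus with an incomplete-match warning (FS0025), while the login flow and the shared helper stay untouched.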

Taking it one step further

Now that we know that we can separate our errors into two distinct categories and see how it helps us shape our code for increased maintainability, can we take it further? If you have any experience working with Results, you'll probably have encountered some rough edges around getting all your types to match up and correctly wrapping and unwrapping your results. Some of you will have also come across a fantastic library called FsToolkit.ErrorHandling. This library packs a bunch of utility and helper functions specifically to work with Result, Option, Async, and Task types in their various combinations. Using this library makes dealing with Result much simpler and adds a level of easily readable conciseness to your code.

We can make use of this library to help our situation and make it abundantly clear what is happening with each function call. In the previous code examples, every time we returned one of our specific errors, we also had to construct a ServiceError<'a> as well as wrap it in the Result type. Let's see how we can simplify that:

let getUserFromDatabase username =
    task {
        try
            return database.getUser(username) |> Ok
        with ex ->
            let error: NonDomainError =
                { Message = "Unexpected error while getting user from database"
                  Exception = Some ex }
            return Error error
    }
    
let doesUserExist loadedUser =
    loadedUser // Could be null response from DB, that we have parsed into an option type
    |> Result.requireSome LoginDomainError.UsernameNotFound
    
let isPasswordCorrect user password =
    match user.Password = password with
    | true -> Ok user
    | false -> Error LoginDomainError.IncorrectUsernamePassword
    
let isUserSuspended user =
    match user.IsSuspended with
    | true -> Error LoginDomainError.AccountSuspended
    | false -> Ok user
    
          
let login username password : Task<Result<LoginSuccess, ServiceError<LoginDomainError>>> =
    taskResult {
        let! loadedUser = 
            getUserFromDatabase username
            |> TaskResult.mapError NonDomain
            
        let! user =
            doesUserExist loadedUser
            |> Result.mapError Domain
        
        let! authenticatedUser =
            isPasswordCorrect user password
            |> Result.mapError Domain
            
        return! 
            isUserSuspended authenticatedUser
            |> Result.mapError Domain
    }

Refactoring the login service to the above brings several advantages. We've extracted each rule check into separate functions, which gives a clean overview of what each business rule actually does, as well as allowing for incredibly easy unit testing. Within the login flow, it's also obvious which functions handle domain logic and which functions do other things. This at the same time translates to a clean split between pure and impure functions. The other benefit is that we're no longer constructing a full Result<'a, ServiceError<LoginDomainError>> type at each step, which makes things generally easier to read.
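The lifting step can also be looked at in isolation. Result.mapError ships with FSharp.Core, so this sketch needs no extra library; the requirePassword function is a hypothetical stand-in for one of the rule functions.

```fsharp
open System

type NonDomainError = { Message: string; Exception: Exception option }

type ServiceError<'a> =
    | Domain of 'a
    | NonDomain of NonDomainError

type LoginDomainError =
    | UsernameNotFound
    | IncorrectUsernamePassword
    | AccountSuspended

// A rule function only knows about its own domain error.
let requirePassword (expected: string) (actual: string) : Result<unit, LoginDomainError> =
    if expected = actual then Ok () else Error IncorrectUsernamePassword

// Result.mapError applies the Domain case as a function to the error,
// leaving Ok values untouched.
let lifted : Result<unit, ServiceError<LoginDomainError>> =
    requirePassword "hunter2" "wrong" |> Result.mapError Domain

let untouched : Result<unit, ServiceError<LoginDomainError>> =
    requirePassword "hunter2" "hunter2" |> Result.mapError Domain
```

Here lifted is Error (Domain IncorrectUsernamePassword) and untouched is Ok (), so the rule itself never has to mention ServiceError.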

With these abstractions, we've explored an alternative to setting up and returning the useful Result type, as well as how to manage the difference between domain errors and non-domain errors. This approach clearly separates and communicates the intent of each function, as well as what kind of role they have.

Non-domain errors are just exceptions with extra steps

To open this post, I introduced Scott Wlaschin and his post on ROP, but he also has a follow-up post against ROP. The gist of it is to only use ROP and the Result type for domain problems. One of the major points made is to not re-invent exceptions. However, this post encourages doing exactly that: the non-domain type even has a property for exceptions. Additionally, if we dropped the idea of the ServiceError and had each function only possibly return a domain error, the type signatures would be much simpler, e.g.:

let login username password : Task<Result<LoginSuccess, LoginDomainError>>

From here, when some operation would have returned a non-domain error, we can rely on .NET exceptions and throw one of those. At the end of the pipeline, we can make use of Giraffe's generic error handler and handle the different types of exceptions there.

This is a perfectly fine approach. Exceptions in .NET exist to be used. F# has explicit support for them, and that support doesn't exist only for interop reasons. Use exceptions. Looking at the login function above, if we refactored it to remove the ServiceError and throw exceptions instead of non-domain errors, it could look like this:

let login username password : Task<Result<LoginSuccess, LoginDomainError>> =
    taskResult {
        // Assumes getUserFromDatabase now returns a plain Task and throws on failure
        let! loadedUser = getUserFromDatabase username
            
        let! user = doesUserExist loadedUser
        
        let! authenticatedUser = isPasswordCorrect user password
            
        return! isUserSuspended authenticatedUser
    }

As you can see, it both simplifies the signature of the function and shortens the code a bit. We're no longer piping everything into Result.mapError or TaskResult.mapError. This then requires you to handle the potentially thrown exceptions either in your Giraffe handler or in the Giraffe error handling middleware if you'd like extra functionality such as logging of errors.
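To illustrate the shape of that trade-off without pulling in Giraffe, here is a deliberately simplified sketch. The login stub, its magic usernames, and the respond function are all assumptions for illustration; a real handler would return an HTTP response rather than a tuple.

```fsharp
type LoginDomainError =
    | UsernameNotFound
    | IncorrectUsernamePassword
    | AccountSuspended

// Hypothetical service: domain failures come back as Error values,
// infrastructure failures are thrown as exceptions.
let login (username: string) : Result<string, LoginDomainError> =
    if username = "" then failwith "database unavailable" // stands in for an I/O failure
    elif username = "bob" then Error AccountSuspended
    else Ok username

// Boundary of the pipeline: the only place that catches exceptions.
let respond (username: string) : int * string =
    try
        match login username with
        | Ok name -> 200, $"welcome {name}"
        | Error UsernameNotFound -> 404, "username not found"
        | Error IncorrectUsernamePassword -> 401, "username/password incorrect"
        | Error AccountSuspended -> 403, "user suspended"
    with ex ->
        // A real application would log ex here before responding.
        500, "something went wrong, please try again later"
```

Note that nothing in the signature of login hints that it can throw; that knowledge lives only in the try/with at the boundary.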

However, I would still argue for keeping non-domain errors for a few reasons. Firstly, in the interest of being as explicit as possible, non-domain errors accomplish exactly that. Encoding non-domain errors in the types tells the programmer that this function might run into unintended problems. Secondly, there won't always be an exception to throw. Many third-party libraries and client SDKs already have built-in error handling. Instead of throwing an exception, they return some kind of success or error object, and this differs from library to library. A choice then needs to be made about whether to wrap that error object and throw an exception, or to return a non-domain error with a short message to go along with it. But to me, a big advantage that comes from non-domain errors, and especially from piping into mapError, is that it is immediately clear what kind of function each one of those calls is. Looking at this example again:

let login username password : Task<Result<LoginSuccess, ServiceError<LoginDomainError>>> =
    taskResult {
        let! loadedUser = 
            getUserFromDatabase username
            |> TaskResult.mapError NonDomain
            
        let! user =
            doesUserExist loadedUser
            |> Result.mapError Domain
        
        let! authenticatedUser =
            isPasswordCorrect user password
            |> Result.mapError Domain
            
        return! 
            isUserSuspended authenticatedUser
            |> Result.mapError Domain
    }

We can tell immediately from how we're mapping the taskResult if something includes business logic or is interacting with external dependencies. To put it into functional programming terms, we know if a function is pure or impure simply based on the usage of domain vs. non-domain error. This added readability to me is worth the added complexity of including non-domain errors and the ServiceError type, even when exceptions would suffice.


Additional Reading