When learning F#, at some point you'll come across the concept of Railway Oriented Programming (ROP). It's primarily a technique for error handling, and it alleviates some of the struggles of relying only on exceptions. Its main advantage is that possible error cases are encoded in the type system, which forces you to handle them explicitly instead of encountering an unexpected exception at runtime. Scott Wlaschin has an excellent post introducing the concept and use cases of ROP; it's a great read if you're unfamiliar with this style of programming.
However, what many introductions to ROP fail to cover in depth is what we actually want our errors to be. The green path is rather obvious: it's simply the type of the data we want to return. But how should the red path be modeled? What makes the most sense while still keeping with the spirit of ROP? I'd like to present one possible approach to handling the error cases that makes sense and is easily extended.
What's generally discouraged with ROP is having the error type simply be an exception, as in Result<MyType, Exception>. This brings no additional value over simply throwing the exception, because you still have to handle the exception at the end of the pipeline. The main idea behind the error case is to encode in the type system what went wrong and why it went wrong. Errors generally cover two situations: either we want to tell the end user why something they tried to do didn't work, or the application tried to accomplish a task and something unexpected happened. In HTTP terms, these are generally split into the status codes 400-499, which indicate that the client made a mistake, and the status codes 500-599, which indicate that the application encountered an unexpected error.
Some examples of user errors are entering an incorrect password or trying to access data that belongs to another user. Examples of application errors are the usual suspects, including third-party services being unavailable or the famed NullReferenceException (but not in F#, right?). In the first case, we can tell the user what they did wrong, allowing them to correct the mistake. In the second case, the user is out of luck, and the best we can do as developers is log the error and ask the user to try again later.
These errors can be categorized into two distinct groups: domain and non-domain errors.
Domain errors are the interesting ones. They are usually returned by business logic when some rule gets violated, and they can also be simple validation errors. The most important aspect of domain errors is that it's usually the client that made a mistake. Depending on which kind of domain error was returned, the client application can tell the user what they did wrong and how to fix it.
As an example, suppose a user is trying to log in to the application. We could model the possible error scenarios like this:
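A minimal sketch of such a type, assuming a plain discriminated union. IncorrectUsernamePassword and AccountSuspended are the cases discussed below; UserNotFound is a hypothetical name for the third case, and the toUserMessage helper is likewise hypothetical, included only to show how a client could turn each case into actionable feedback.

```fsharp
// Domain errors for the login flow.
// UserNotFound is a hypothetical name for the third case.
type LoginDomainError =
    | IncorrectUsernamePassword
    | UserNotFound
    | AccountSuspended

// Hypothetical helper: the message a client could show for each case.
let toUserMessage (error: LoginDomainError) : string =
    match error with
    | IncorrectUsernamePassword -> "The username or password is incorrect."
    | UserNotFound -> "No account exists for this username."
    | AccountSuspended -> "Your account is suspended. Please contact support."
```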
In this scenario, three different errors can occur when the user is trying to log in. When the client receives IncorrectUsernamePassword, the user most likely made a typo while logging in. For the AccountSuspended error case, we can present the user with a telephone number or email address for contacting support. What all three error cases have in common is that the application behaved as expected; the request simply violated some business rule. Incidentally, because these errors are all about business logic, ideally they would be returned from pure functions, making them dead-simple to unit test.
Non-domain errors are the kind of errors you never want to see, neither as the end user nor as the developer. They generally mean something bad has happened. As previously mentioned, non-domain errors are unexpected errors that occur outside your business logic. This usually means that some side-effectful operation has failed, such as a network failure or a third-party service being down. Within the .NET runtime, and especially in F#, an exception being thrown generally means that something unexpected happened even though the end user did everything correctly. In these scenarios, all we can do as developers is raise an internal alert, let the user know that something out of their control has happened, and ask them to try again. In code, they're much simpler to model:
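A minimal sketch, assuming a record with a message and an optional exception. The field names and the two helper functions are assumptions, not a prescribed API.

```fsharp
// Non-domain error: we record why something failed, not which operation failed.
// Exception is optional because not every non-domain error comes from an exception.
type NonDomainError =
    { Message: string
      Exception: exn option }

// Hypothetical helper for failures that surface as .NET exceptions.
let fromException (ex: exn) : NonDomainError =
    { Message = ex.Message; Exception = Some ex }

// Hypothetical helper for failures reported as error values, e.g. by a client SDK.
let fromMessage (message: string) : NonDomainError =
    { Message = message; Exception = None }
```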
Of course, it's up to the developer to model this exactly as they see fit. The general idea is that we don't really know or care during which operation the error happened, so we don't need to model that information explicitly in the type system. However, we still want to know why the operation failed, and we can store that information in the NonDomainError type. We can simply collect the error details, usually from the exception, and send them to the internal logging/alerting system. Note that the Exception property is optional. This is because not all non-domain errors are caused by exceptions. Third-party libraries, for instance, may already handle problems in their own way and report them as error values rather than exceptions. Those errors are irrelevant to our application's domain, which is why we can wrap them in a non-domain error and call it a day.
The last thing we need to do is to apply this concept in our application. What better way to do that than with code examples? The key to any good application architecture is to separate concerns. This also holds true for the errors. In the previous domain error example, we modeled possible error cases for a login flow. We can model a similar but different type for any other flow. As an example, we have the following business rules for a micro-blogging platform:
- Text must be less than 500 characters
- Only premium accounts can post videos
- Images must be smaller than a certain size
The accompanying domain error type could look like this:
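A sketch of that type with one case per rule. The case names, their payloads and the check functions are hypothetical, and the image limit is passed in as a parameter because the rule doesn't name a specific size. Each check is a pure function that returns only the domain error.

```fsharp
// Domain errors for the post-creation flow, one case per business rule.
type PostDomainError =
    | TextTooLong of length: int
    | VideoRequiresPremium
    | ImageTooLarge of sizeInBytes: int64

// Rule: text must be less than 500 characters.
let checkTextLength (text: string) : Result<string, PostDomainError> =
    if text.Length < 500 then Ok text else Error(TextTooLong text.Length)

// Rule: only premium accounts can post videos.
let checkVideoAllowed (isPremium: bool) (hasVideo: bool) : Result<unit, PostDomainError> =
    if hasVideo && not isPremium then Error VideoRequiresPremium else Ok()

// Rule: images must be smaller than a limit (hypothetical parameter).
let checkImageSize (maxImageBytes: int64) (sizeInBytes: int64) : Result<unit, PostDomainError> =
    if sizeInBytes < maxImageBytes then Ok() else Error(ImageTooLarge sizeInBytes)
```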
The non-domain error looks exactly the same as in the login flow, so we can simply reuse that type. Now we have two separate services, one for each data flow in our application.
The type signatures of the service functions are where the magic of this approach lies. Both return a Result, but each can fail with either its own domain error or a non-domain error. The key is to define a general service error that is generic over the specific domain error.
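A minimal sketch of such a type, assuming the case names DomainError and NonDomainError (both hypothetical) and repeating a NonDomainError record so the block stands alone. The LoginResult and PostResult abbreviations are also illustrative; they show how each flow pairs the shared wrapper with its own domain error.

```fsharp
// Shared non-domain error; field names are assumptions.
type NonDomainError =
    { Message: string
      Exception: exn option }

// General service error, generic over the flow-specific domain error.
// The NonDomainError case wraps the record type of the same name.
type ServiceError<'a> =
    | DomainError of 'a
    | NonDomainError of NonDomainError

// Hypothetical domain errors for the two flows.
type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

type PostDomainError =
    | TextTooLong of length: int
    | VideoRequiresPremium

// Each flow pairs the shared wrapper with its own domain error.
type LoginResult<'t> = Result<'t, ServiceError<LoginDomainError>>
type PostResult<'t> = Result<'t, ServiceError<PostDomainError>>
```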
This then allows us to define our service functions with a result type whose error side is always one of the two cases of ServiceError: either the flow's own domain error or a non-domain error.
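A simplified sketch of the login service, assuming a ServiceError type with DomainError and NonDomainError cases (hypothetical names), redefined here so the block stands alone. The in-memory user list and the "crash" username that simulates a failing database are stand-ins for real I/O, and passwords are compared in plain text only for brevity. Note how every return site has to build the full Result<User, ServiceError<LoginDomainError>> by hand.

```fsharp
open System
open System.Threading.Tasks

type NonDomainError =
    { Message: string
      Exception: exn option }

// Hypothetical case names for the general service error.
type ServiceError<'a> =
    | DomainError of 'a
    | NonDomainError of NonDomainError

type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

// Hypothetical user record; plain-text passwords only to keep the sketch short.
type User =
    { Username: string
      Password: string
      IsSuspended: bool }

let users =
    [ { Username = "alice"; Password = "secret"; IsSuspended = false }
      { Username = "bob"; Password = "hunter2"; IsSuspended = true } ]

// Impure step standing in for a database call; "crash" simulates an outage.
let findUser (username: string) : Task<User option> =
    task {
        if username = "crash" then
            raise (InvalidOperationException "database unavailable")
        return users |> List.tryFind (fun u -> u.Username = username)
    }

// Every return site constructs both the ServiceError case and the Result case.
let login (username: string) (password: string) : Task<Result<User, ServiceError<LoginDomainError>>> =
    task {
        try
            let! maybeUser = findUser username
            match maybeUser with
            | Some user when user.Password = password ->
                if user.IsSuspended then
                    return Error(DomainError AccountSuspended)
                else
                    return Ok user
            | _ -> return Error(DomainError IncorrectUsernamePassword)
        with ex ->
            // Anything unexpected becomes a non-domain error carrying the exception.
            return Error(NonDomainError { Message = ex.Message; Exception = Some ex })
    }
```

A post service would follow the same pattern, returning Task<Result<_, ServiceError<PostDomainError>>>.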
These are obviously oversimplified examples, but they should get across how to write the error cases and where the distinction between domain and non-domain errors comes from. Keep in mind that non-domain doesn't automatically mean a .NET exception; it could also be something else, like a third-party library returning an unexpected result.
The last step is to return the response to the client. When using Giraffe as the web framework, the HttpHandlers would handle the result type by turning domain errors into 4xx responses and non-domain errors into 5xx responses.
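A framework-agnostic sketch of that mapping, assuming a ServiceError type with DomainError and NonDomainError cases (hypothetical names). The Response record, the status codes chosen for each login error and the log callback are assumptions. In an actual Giraffe HttpHandler, the status code and body would be written to the response, for example by composing setStatusCode with text or json.

```fsharp
type NonDomainError =
    { Message: string
      Exception: exn option }

// Hypothetical case names for the general service error.
type ServiceError<'a> =
    | DomainError of 'a
    | NonDomainError of NonDomainError

type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

// Stand-in for the HTTP response a handler would write.
type Response = { StatusCode: int; Body: string }

// Domain errors: the client can act on these, so each gets a 4xx and a specific message.
let loginErrorToResponse (error: LoginDomainError) : Response =
    match error with
    | IncorrectUsernamePassword -> { StatusCode = 401; Body = "Incorrect username or password." }
    | AccountSuspended -> { StatusCode = 403; Body = "Account suspended. Please contact support." }

// Works for any flow: success and domain errors are mapped by the caller,
// non-domain errors are logged and always answered with a generic 500.
let toResponse
    (log: NonDomainError -> unit)
    (onOk: 't -> Response)
    (onDomainError: 'e -> Response)
    (result: Result<'t, ServiceError<'e>>)
    : Response =
    match result with
    | Ok value -> onOk value
    | Error(DomainError error) -> onDomainError error
    | Error(NonDomainError error) ->
        log error
        { StatusCode = 500
          Body = "Something went wrong on our side. Please try again later." }
```

Only the non-domain branch touches the logging/alerting system, and the user never sees the internal error details.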
With this approach of separating errors into domain and non-domain errors, and giving each data flow its own set of domain errors, it's easy to keep the code clean and the concerns separated. It's also simple to extend or modify existing flows with new errors while making minimal or no changes to other flows.
Now that we've seen how separating our errors into two distinct categories helps shape our code for increased maintainability, can we take it further? If you have any experience working with Results, you'll probably have encountered some rough edges around getting all your types to match up and correctly wrapping and unwrapping your results. Some of you will also have come across a fantastic library called FsToolkit.ErrorHandling. It packs a bunch of utility and helper functions specifically for working with Result, Option, Async and Task types in their various combinations. Using it makes dealing with Result much simpler and adds a level of readable conciseness to your code.
We can use this library to make it abundantly clear what is happening with each function call. In the previous code examples, every time we returned one of our specific errors, we also had to construct a ServiceError<'a> and wrap it in the Result type. Let's see how we can simplify that.
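A sketch of the simplified login flow, under a few assumptions. To stay self-contained and runnable without the FsToolkit.ErrorHandling package, it uses synchronous Result values, FSharp.Core's Result.mapError, and a tiny hand-written result builder in place of the library's computation expressions; with the library and Task-returning functions, the same shape would use its taskResult computation expression and TaskResult.mapError. All function and case names are hypothetical.

```fsharp
open System

type NonDomainError =
    { Message: string
      Exception: exn option }

// Hypothetical case names for the general service error.
type ServiceError<'a> =
    | DomainError of 'a
    | NonDomainError of NonDomainError

type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

type User =
    { Username: string
      Password: string
      IsSuspended: bool }

// Minimal stand-in for FsToolkit.ErrorHandling's result/taskResult builders.
type ResultBuilder() =
    member _.Bind(result, binder) = Result.bind binder result
    member _.Return(value) = Ok value
    member _.ReturnFrom(result: Result<_, _>) = result

let result = ResultBuilder()

let users =
    [ { Username = "alice"; Password = "secret"; IsSuspended = false }
      { Username = "bob"; Password = "hunter2"; IsSuspended = true } ]

// Impure: only knows about NonDomainError. "crash" simulates an outage.
let findUser (username: string) : Result<User option, NonDomainError> =
    try
        if username = "crash" then
            raise (InvalidOperationException "database unavailable")
        Ok(users |> List.tryFind (fun u -> u.Username = username))
    with ex ->
        Error { Message = ex.Message; Exception = Some ex }

// Pure business rules: only know about LoginDomainError.
let requireUser (maybeUser: User option) : Result<User, LoginDomainError> =
    match maybeUser with
    | Some user -> Ok user
    | None -> Error IncorrectUsernamePassword

let checkPassword (password: string) (user: User) : Result<User, LoginDomainError> =
    if user.Password = password then Ok user else Error IncorrectUsernamePassword

let checkNotSuspended (user: User) : Result<User, LoginDomainError> =
    if user.IsSuspended then Error AccountSuspended else Ok user

// The mapping at each step shows which kind of function is being called.
let login (username: string) (password: string) : Result<User, ServiceError<LoginDomainError>> =
    result {
        let! maybeUser = findUser username |> Result.mapError NonDomainError
        let! user = requireUser maybeUser |> Result.mapError DomainError
        let! user = checkPassword password user |> Result.mapError DomainError
        return! checkNotSuspended user |> Result.mapError DomainError
    }
```

The pure checks know nothing about ServiceError; only the login function decides, through Result.mapError DomainError or Result.mapError NonDomainError, which category a failure belongs to.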
Refactoring the login service this way brings several advantages. We've extracted each rule check into a separate function, which gives a clean overview of what each business rule does and makes unit testing incredibly easy. Within the login flow, it's also obvious which functions handle domain logic and which do other things, which translates to a clean split between pure and impure functions. The other benefit is that we're no longer constructing a full Result<'a, ServiceError<LoginDomainError>> at each step, which makes the code generally easier to read.
With these abstractions, we've explored an alternative way of setting up and returning the useful Result type, as well as how to manage the difference between domain and non-domain errors. This approach clearly separates and communicates the intent of each function and the role it plays.
To open this post, I introduced Scott Wlaschin and his post on ROP, but he also has a follow-up post arguing against ROP. The gist of it is to use ROP and the Result type only for domain problems. One of the major points made is not to reinvent exceptions. However, this post encourages doing exactly that: the non-domain type even has a property for exceptions. Additionally, if we dropped the idea of ServiceError and had each function return only a domain error, the type signatures would be much simpler, e.g.:
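For example, with hypothetical types defined just far enough for the signatures to compile, the two variants of a login function could be typed like this:

```fsharp
open System.Threading.Tasks

// Hypothetical types, defined only so the two signatures compile.
type User = { Username: string }

type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

type NonDomainError =
    { Message: string
      Exception: exn option }

type ServiceError<'a> =
    | DomainError of 'a
    | NonDomainError of NonDomainError

// With ServiceError: expected and unexpected failures both live in the type.
type LoginWithServiceError = string -> string -> Task<Result<User, ServiceError<LoginDomainError>>>

// Domain errors only: unexpected failures would be thrown as exceptions instead.
type LoginDomainOnly = string -> string -> Task<Result<User, LoginDomainError>>
```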
From here, when some operation would have returned a non-domain error, we can rely on .NET exceptions and throw one of those. At the end of the pipeline, we can make use of Giraffe's generic error handler and handle the different types of exceptions there.
This is a perfectly fine approach. Exceptions exist in .NET to be used, and F# supports them explicitly, not only for interop reasons. Use exceptions. If we refactored the login function above to remove ServiceError and throw exceptions instead of returning non-domain errors, it could look like this:
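A sketch under similar assumptions: the in-memory user list and the "crash" username standing in for a failing database are hypothetical, and passwords are compared in plain text only for brevity. Only domain errors appear in the signature; the unexpected failure is raised as an exception and left for the caller to handle.

```fsharp
open System
open System.Threading.Tasks

type User =
    { Username: string
      Password: string
      IsSuspended: bool }

type LoginDomainError =
    | IncorrectUsernamePassword
    | AccountSuspended

let users =
    [ { Username = "alice"; Password = "secret"; IsSuspended = false }
      { Username = "bob"; Password = "hunter2"; IsSuspended = true } ]

// Hypothetical data access: unexpected failures are thrown, not returned.
let findUser (username: string) : Task<User option> =
    task {
        if username = "crash" then
            raise (InvalidOperationException "database unavailable")
        return users |> List.tryFind (fun u -> u.Username = username)
    }

// No try/with: an exception from findUser faults the returned task.
let login (username: string) (password: string) : Task<Result<User, LoginDomainError>> =
    task {
        let! maybeUser = findUser username
        match maybeUser with
        | Some user when user.Password = password ->
            if user.IsSuspended then return Error AccountSuspended else return Ok user
        | _ -> return Error IncorrectUsernamePassword
    }
```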
As you can see, this simplifies the function's signature and shortens the code a bit. We're no longer piping everything into a version of TaskResult.map. In turn, this requires you to handle the potentially thrown exceptions, either in your Giraffe handler or in Giraffe's error-handling middleware if you'd like extra functionality such as error logging.
However, I would still argue for keeping non-domain errors, for a few reasons. Firstly, in the interest of being as explicit as possible, non-domain errors accomplish exactly that: encoding them in the types tells the programmer that a function might run into unintended problems. Secondly, there won't always be an exception to throw. Many third-party libraries and client SDKs already have built-in error handling; instead of throwing an exception, they return some kind of success or error object, and its shape differs from library to library. A choice then needs to be made between wrapping that error object in an exception and throwing it, or returning a non-domain error with a short message. But to me, a big advantage of non-domain errors, and especially of piping into |> TaskResult.map and its variants, is that it's immediately clear what kind of function each of those calls is.
Looking back at the refactored login function, we can tell immediately from how each taskResult is mapped whether a call includes business logic or interacts with external dependencies. To put it in functional programming terms, we know whether a function is pure or impure simply from whether its failure is mapped to a domain or a non-domain error. To me, this added readability is worth the extra complexity of including non-domain errors and the ServiceError type, even when exceptions would suffice.