A guide to error-handling

Error-handling is a heavily debated topic. Generally speaking, the following mechanisms exist to handle errors when implementing a function:

1. Return something to indicate the error: an error code, a sentinel value (like NaN), or a result-or-error type (like C++’s std::expected or Rust’s Result).
2. Throw an exception and unwind the stack (i.e. walk it backwards) until a handler is found, then return control there; or, if no viable handler exists, terminate the application.
3. Abort or terminate the application after printing an error message and generating a core dump. (assert() is also just a flavor of this.)
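
To make the three mechanisms concrete, here is a minimal C++17 sketch of the same tiny operation written three ways. The function names (digit_or_nothing, digit_or_throw, digit_or_abort) are made up for illustration and are not from any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>

// Mechanism 1: return something. An empty optional signals "not a digit".
std::optional<int> digit_or_nothing(char c) {
    if (c < '0' || c > '9') return std::nullopt;
    return c - '0';
}

// Mechanism 2: throw. The stack unwinds until a caller catches
// std::invalid_argument; with no handler, the program terminates.
int digit_or_throw(char c) {
    if (c < '0' || c > '9')
        throw std::invalid_argument(std::string("not a digit: ") + c);
    return c - '0';
}

// Mechanism 3: terminate. A non-digit is treated as a programming error.
int digit_or_abort(char c) {
    assert(c >= '0' && c <= '9');          // aborts here unless NDEBUG is defined
    if (c < '0' || c > '9') std::abort();  // still terminates when asserts are compiled out
    return c - '0';
}
```

Only the first two give the caller a say: digit_or_nothing('x') hands back an empty optional, digit_or_throw('x') can be wrapped in a try block, while digit_or_abort('x') takes the whole process down.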

Notably, options 1 and 2 (return values and exceptions) are recoverable, while option 3 (terminate) is not. So let’s start with these two categories: when do you want errors to be recoverable? Some errors are almost always recoverable, such as attempting to open a file that does not exist (maybe we should create it, or try another file), and some almost never are, such as encountering inconsistent application state (which usually indicates a programming error or memory corruption).

Recoverable vs non-recoverable errors

In my experience, the answer mostly depends on the context of your code: are you writing application code or library code? Library code that will be used across many applications usually shouldn’t decide on the application’s behalf whether it should crash, as a library tends not to be aware of the context of its own usage: it cannot tell whether the application using it is a critical service that must never go down or just a one-off script. Therefore, libraries should typically use recoverable forms of error reporting, even for severe errors.

When writing application code, simply crashing on an error may be an option, but this again depends on the type of application and on whether you are aware of all the contexts it may run in. As a rule of thumb, however, it’s better for most applications to crash than to risk doing something unintended or corrupting data.

So in application code, I tend to use assertions a lot, even for things like function preconditions when I know all contexts and all places where the function may be called from, since in these cases a precondition failure must be a programming error and I want to crash early rather than risk doing something unintended. In library code, I tend to use assertions only as sanity checks: conditions that must always hold, barring a severe programming error inside the library or memory corruption.

There are, of course, caveats. For example, even if an error is unrecoverable, you may want to use a recoverable mechanism for reporting it, to give user code a chance to shut down gracefully and signal the error to downstream services, or to dump its internal state to help with debugging. This is a painful use case because it subverts the whole concept of an unrecoverable error; it is one of the main reasons why many (most?) programming languages rely on exceptions for reporting every type of error. (The other being simplicity: fewer competing mechanisms in the language.)

Finally, some errors may be unrecoverable in the vast majority of cases, but actually recoverable in a few: one example is a memory allocation failure. Most libraries and applications in most cases just want to crash when this happens because there is nothing sane they can do otherwise, but sometimes you know you’re loading a gigantic amount of data into memory and can gracefully handle running out of memory; or you are using something like an arena allocator or ring buffer where running out of space can happen as part of normal operations.
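
As a rough illustration of the arena case, here is a minimal sketch of a fixed-capacity bump allocator where running out of space is reported through the return value rather than by crashing. The Arena class and its member names are hypothetical, and the sketch ignores alignment to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity bump allocator. Exhaustion is an expected,
// recoverable outcome here, so it is reported to the caller as nullptr.
// Alignment is deliberately ignored in this sketch.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buffer_(capacity), used_(0) {}

    // Returns a pointer to `size` bytes, or nullptr if the arena is full.
    void* allocate(std::size_t size) {
        // Sanity check: can only fail on a bug inside Arena or memory corruption.
        assert(used_ <= buffer_.size());
        if (size > buffer_.size() - used_) return nullptr;
        void* p = buffer_.data() + used_;
        used_ += size;
        return p;
    }

    // Releases everything at once, making the full capacity available again.
    void reset() { used_ = 0; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};
```

A caller that gets nullptr back can flush its work, call reset() and retry, which is exactly the kind of recovery that makes no sense for a general-purpose allocation failure.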

Reporting recoverable errors

To recap, there are two main ways of reporting an error in a way that is recoverable, i.e. that leaves it up to the caller to decide what to do with it: returning an error value through the function’s return value, or throwing an exception and walking up the call stack until a suitable handler is found. Both have advantages and disadvantages.

Returning an error is simple, both conceptually and in terms of implementation, and usually very efficient (mostly depending on how expensive it is to construct the error object). However, propagating errors (i.e. transporting them from where the error occurred to a level where we can handle them) tends to be very tedious. Some languages like Rust provide language-level support to do this conveniently, but it is very painful in languages like C and Go: most functions have no reasonable way of handling the errors of the functions they call, so errors usually have to propagate a good distance up before reaching code that knows what to do with them.
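
The tedium is easiest to see in code. Below is a minimal C++17 sketch of C-style error codes, where the middle layer cannot do anything useful with an error and has to forward it by hand. The Error enum and all function names are invented for this example.

```cpp
#include <string>

// Hypothetical error codes for this sketch.
enum class Error { None, Empty, NotADigit };

// Innermost function: this is where the error is detected.
Error parse_digit(const std::string& text, int& out) {
    if (text.empty()) return Error::Empty;
    if (text.size() != 1 || text[0] < '0' || text[0] > '9') return Error::NotADigit;
    out = text[0] - '0';
    return Error::None;
}

// Middle layer: it has no sensible way to handle a failure, so every call
// needs its own "check and forward" line.
Error add_digits(const std::string& a, const std::string& b, int& sum) {
    int x = 0;
    int y = 0;
    if (Error e = parse_digit(a, x); e != Error::None) return e;
    if (Error e = parse_digit(b, y); e != Error::None) return e;
    sum = x + y;
    return Error::None;
}

// Top level: the first place that knows what to do (here: use a fallback).
int add_digits_or(const std::string& a, const std::string& b, int fallback) {
    int sum = 0;
    return add_digits(a, b, sum) == Error::None ? sum : fallback;
}
```

With only one layer in between the boilerplate is tolerable, but every additional layer between parse_digit and add_digits_or needs the same forwarding lines.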

Exceptions were designed with that last insight in mind, and explicitly allow the error site (where the error is detected) and the handler to be arbitrarily far away from each other, without this impacting any of the code in between. In return, though, throwing an exception tends to be very expensive in terms of runtime performance, much more so than returning an error value.

Further, the fact that in a call chain of A -> B -> C, C may throw exceptions that A knows how to catch while B has no knowledge of any of this can be viewed as an anti-feature: a developer looking at the code of B may not be aware that C may throw. In the error-returning approach, the possible presence of errors is always spelled out explicitly, to the point that it can easily become annoying (because e.g. you have to keep writing ? or .unwrap() in Rust, or if err != nil { return err } in Go); but the opposite can just as easily be counter-productive.
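
Here is a small sketch of such an A -> B -> C chain in C++. The function names are made up, prefixed with a_, b_ and c_ to match the letters above; note that nothing in the middle function hints that an exception may pass through it.

```cpp
#include <stdexcept>
#include <string>

// C: detects the error and throws.
int c_parse_port(const std::string& text) {
    // At most 5 digits, so the accumulated value always fits in an int.
    if (text.empty() || text.size() > 5)
        throw std::invalid_argument("bad port length: " + text);
    int port = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9')
            throw std::invalid_argument("bad port: " + text);
        port = port * 10 + (ch - '0');
    }
    return port;
}

// B: neither its signature nor its body reveals that c_parse_port() may throw.
int b_port_with_offset(const std::string& text, int offset) {
    return c_parse_port(text) + offset;
}

// A: the only place that knows about the exception and handles it.
int a_port_or_default(const std::string& text, int fallback) {
    try {
        return b_port_with_offset(text, 0);
    } catch (const std::invalid_argument&) {
        return fallback;
    }
}
```

This is both the convenience and the hazard: b_port_with_offset stays a one-liner, but someone reading only that function has no way to tell that it can exit early.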

Exceptions in C++

C++ makes things more complicated by offering both options, in typical C++ fashion. You may return error codes, or use std::expected (added in C++23), which is basically a specialized union that holds either the function result or an error, but comes with no special handling or syntactic sugar in the language.

You may also throw and catch exceptions in the same way as in many mainstream languages, but unlike them, all major C++ compilers let you disable exceptions altogether by passing a compiler flag like -fno-exceptions. The C++ standard library does use exceptions, but inconsistently: the C++17 std::filesystem library attempts to cater to both groups by offering two distinct sets of functions that do the same thing, except that one set throws exceptions and the other reports error codes (see e.g. std::filesystem::create_directory()).
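
For reference, this is roughly what the two flavors of std::filesystem::create_directory() look like side by side. The two wrapper functions are hypothetical and exist only to show both overloads reduced to the same yes/no answer.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Throwing overload: a failure surfaces as fs::filesystem_error.
// Returns true if the call completed without an error.
bool make_dir_throwing(const fs::path& dir) {
    try {
        fs::create_directory(dir);
        return true;
    } catch (const fs::filesystem_error&) {
        return false;
    }
}

// Non-throwing overload: a failure is written into the std::error_code
// argument instead, and the caller has to remember to check it.
bool make_dir_with_code(const fs::path& dir) {
    std::error_code ec;
    fs::create_directory(dir, ec);
    return !ec;
}
```

Both should fail in the same situations, for example when the parent directory does not exist; the only difference is the channel through which the failure reaches the caller.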

This should already tell you that the C++ community is split on whether exceptions are a good idea or not. Like with so many problematic things in C++, the concerns about exceptions revolve around performance:

Exceptions may be prohibitively expensive or not even supported in resource-constrained embedded environments, where C++ is often used.
Exceptions have a large run-time performance penalty when thrown.
Even when not thrown, the possible presence of exceptions may prohibit some compiler optimizations.
The possibility of exceptions significantly raises the size of the binary due to the unwind code that has to be generated.

C++ has so far been trying to cater to both groups: noexcept was added in C++11 to mark functions that can never throw, to try to alleviate some of the performance concerns about exceptions, and there is a proposal for Zero-overhead deterministic exceptions. On the other side, std::variant (C++17) and std::expected (C++23) were both added fairly recently, and both make it easier to return and store error values.
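
As a sketch of the error-value side, here is a result-or-error return type built on std::variant, which only needs C++17. It is a stand-in for the same idea that std::expected packages more conveniently; ParseError, ParseResult and both functions are names invented for this example.

```cpp
#include <string>
#include <variant>

// Hypothetical error type carrying a human-readable message.
struct ParseError {
    std::string message;
};

// Holds either an int (success) or a ParseError (failure).
using ParseResult = std::variant<int, ParseError>;

ParseResult parse_percent(const std::string& text) {
    if (text.empty() || text.size() > 3)
        return ParseError{"expected 1 to 3 digits"};
    int value = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9') return ParseError{"not a number: " + text};
        value = value * 10 + (ch - '0');
    }
    if (value > 100) return ParseError{"out of range: " + text};
    return value;
}

// The caller has to check which alternative is stored before using it.
std::string describe(const ParseResult& result) {
    if (const int* value = std::get_if<int>(&result))
        return "ok: " + std::to_string(*value);
    return "error: " + std::get<ParseError>(result).message;
}
```

Unlike a bare error code, the error travels together with its details, and the result can be stored or passed around before anyone decides what to do with it.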

The C++ noexcept specifier is actually a bit of a lie: it doesn’t promise that the given function cannot throw exceptions, but rather that exceptions will never escape from that function; if one tries to, std::terminate() is called instead. It still works as a guarantee that the caller can leverage, enabling optimizations such as not having to emit code to call destructors while an exception is unwinding the stack.

The canonical example is growing a std::vector: when it is full while we are trying to insert a new element, a larger array needs to be allocated and the elements moved over before the old array can be deallocated and the insertion may proceed. However, a move operation throwing halfway through would leave the vector in an inconsistent state, as some objects would already have been moved while others would not. Therefore, if the move constructor of the element type is not declared noexcept (and the type is copyable), std::vector will copy the objects instead of moving them when reallocating, even though copying can be hugely more expensive; if a copy operation throws, we can simply deallocate the new array and propagate the exception, leaving the vector in its original state.
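
The vector behavior can be observed directly by counting copies and moves during a forced reallocation. This is a minimal C++17 sketch; the Counters, ThrowingMove and NoexceptMove types and the grow_once helper are all made up for the experiment.

```cpp
#include <type_traits>
#include <vector>

struct Counters {
    int copies = 0;
    int moves = 0;
};

// Element type whose move constructor is NOT marked noexcept.
struct ThrowingMove {
    Counters* counters;
    explicit ThrowingMove(Counters* c) : counters(c) {}
    ThrowingMove(const ThrowingMove& other) : counters(other.counters) { ++counters->copies; }
    ThrowingMove(ThrowingMove&& other) : counters(other.counters) { ++counters->moves; }
};

// Identical, except that the move constructor is marked noexcept.
struct NoexceptMove {
    Counters* counters;
    explicit NoexceptMove(Counters* c) : counters(c) {}
    NoexceptMove(const NoexceptMove& other) : counters(other.counters) { ++counters->copies; }
    NoexceptMove(NoexceptMove&& other) noexcept : counters(other.counters) { ++counters->moves; }
};

static_assert(!std::is_nothrow_move_constructible_v<ThrowingMove>);
static_assert(std::is_nothrow_move_constructible_v<NoexceptMove>);

// Stores one element, forces exactly one reallocation, and reports how the
// existing element was transferred to the new array.
template <typename T>
Counters grow_once() {
    Counters counters;
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back(&counters);    // constructed in place: no copy, no move
    v.reserve(v.capacity() + 1);  // exceeds the capacity, so the vector reallocates
    return counters;
}
```

On the mainstream standard library implementations, grow_once<ThrowingMove>() should report one copy and no moves, while grow_once<NoexceptMove>() should report one move and no copies: a single missing keyword decides which constructor runs.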

Recoverable errors in other languages

It is also interesting to see how different languages deal with error-handling, given its heavily debated nature.

Most mainstream languages, such as Python, Java, C#, JavaScript, and PHP, primarily use exceptions.
C uses error codes given that it has neither proper objects nor exceptions, though it does have the facilities to implement exceptions with goto, setjmp() and longjmp().
Rust mainly relies on a Result type which is the equivalent of std::expected, but has dedicated language support. Rust does not have exceptions, but it does have unwinding panics and a way to catch them with std::panic::catch_unwind.
Python uses exceptions not only for errors but also for control flow, e.g. to indicate that there’s no more data to consume in an iterator by raising StopIteration. Python also has assert as a language keyword, but on failure it raises an AssertionError exception rather than terminating the program, unlike C’s assert().
Zig uses an approach similar in spirit to C’s error codes but implemented more like Rust’s Result or C++’s std::expected: functions return either their return value or an error, where an error is essentially a dedicated enum.
Go has a simple approach where functions typically return a (result, error) pair. Errors are not particularly special, and little-to-no language support exists for them.
The Microsoft .NET Framework that C# relies on used to throw an exception of type System.ExecutionEngineException when the runtime detected a corrupted internal state: using a recoverable error mechanism to report a fundamentally unrecoverable error. At least they stopped doing this, and the class is now marked as obsolete.

I personally particularly dislike Go’s approach, and struggle to see how that is a good idea: it doesn’t even have the excuse of being an ancient, very low-level language like C.