In the last post of the Implementing order books in C++ and Rust series I have proposed an unsafe Rust solution. Unfortunately, that solution turns out to invoke undefined behaviour (UB) due to violating the stacked borrows model of Rust. It does appear to work perfectly fine at the moment, but UB (as C and C++ programmers well know) is no joke: it may continue to work for years, until one day an unrelated code change causes it to break.

We are going to start where we left off in the last blogpost: for reference, here is the full source code, cleaned up a bit compared to last time but having the exact same structure and logic.

Discovering the issue with Miri

Rust has a tool called Miri that is invaluable for detecting undefined behaviour even if it’s not actively causing issues: it’s like a bunch of C++ sanitizers rolled into one.

Consider the following sequence of actions all on the same orderbook and side (this is exactly what main() in the reference code does):

  1. Insert order with ID 1, and store its OrderHandle,
  2. Insert order with ID 2,
  3. Cancel order ID 1 using its OrderHandle.

Running Miri against our latest version of the unsafe code yields this diagnostic:

error: Undefined Behavior: trying to retag from <5771> for Unique permission at alloc2192[0x0], but that tag does not exist in the borrow stack for this location
   --> ../blog/blogpost.rs:168:46
    |
168 |                     let orderbook = unsafe { &mut *orderbook_ptr };
    |                                              ^^^^^^^^^^^^^^^^^^^
    |                                              |
    |                                              trying to retag from <5771> for Unique permission at alloc2192[0x0], but that tag does not exist in the borrow stack for this location
    |                                              this error occurs as part of retag at alloc2192[0x0..0x30]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <5771> was created by a SharedReadWrite retag at offsets [0x0..0x30]
   --> ../blog/blogpost.rs:165:41
    |
165 |             let orderbook_ptr: *mut _ = self;
    |                                         ^^^^
help: <5771> was later invalidated at offsets [0x0..0x30] by a Unique retag
   --> ../blog/blogpost.rs:203:25
    |
203 |           let orderbook = self
    |  _________________________^
204 | |             .order_books
205 | |             .entry(symbol)
206 | |             .or_insert_with(|| Box::new(Orderbook::new()))
207 | |             .as_mut();
    | |_____________________^
    = note: BACKTRACE (of the first span):
    = note: inside closure at ../blog/blogpost.rs:168:46: 168:65
    = note: inside `<{closure@../blog/blogpost.rs:167:37: 167:44} as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/gabor/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5: 250:71
    = note: inside `<std::boxed::Box<dyn std::ops::FnOnce()> as std::ops::FnOnce<()>>::call_once` at /home/gabor/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2063:9: 2063:52
note: inside `Exchange::cancel`
   --> ../blog/blogpost.rs:221:13
    |
221 |             (order_handle.remove_fn)();
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `main`
   --> ../blog/blogpost.rs:271:5
    |
271 |     exchange.cancel(OrderId(1));
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Uh-oh, undefined behaviour! This is what we get for messing with unsafe without properly understanding what the rules are. Writing unsafe Rust is harder than writing C++, because the Rust compiler makes more and stronger assumptions about what our code can and cannot do in order to better optimize safe Rust; in unsafe Rust though, we then have to be aware of all of these and also be very careful not to violate them.

Mutable references are really unique

So what is causing UB here? Miri does a decent job of explaining what happens, but only once you understand the terminology. Recall that in Rust, mutable references like &mut Orderbook are exclusive (as opposed to immutable references which are shared): if a mutable reference to a location (what you would call an object in C++) exists, then that means that no other references to that same location may exist. For references, this is checked by the borrow checker, and guarantees that this is the case: the compilation will fail if you have two competing mutable references.

But what about pointers? Let’s see an example:

let mut val = 15;
let ptr: *mut _ = &mut val;
let mref = &mut val;
*mref = 17;
unsafe { *ptr = 16; };
println!("{}", *mref);

This compiles and runs just fine (printing 16), but Miri does not like it:

error: Undefined Behavior: attempting a write access using <2452> at alloc1251[0x0], but that tag does not exist in the borrow stack for this location
    |
    | unsafe { *ptr = 16; };
    |          ^^^^^^^^^
    |          |
    |          attempting a write access using <2452> at alloc1251[0x0], but that tag does not exist in the borrow stack for this location
    |          this error occurs as part of an access at alloc1251[0x0..0x4]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2452> was created by a SharedReadWrite retag at offsets [0x0..0x4]
    |
    | let ptr: *mut _ = &mut val;
    |                   ^^^^^^^^
help: <2452> was later invalidated at offsets [0x0..0x4] by a Unique retag
    |
    | let mref = &mut val;
    |            ^^^^^^^^

Let’s try to parse this, using the source lines and locations that Miri cites in its diagnostic. Creating the mutable pointer ptr is referred to as “SharedReadWrite retag”, and forming the mutable (exclusive!) reference is the “Unique retag”. According to the diagnostic, ptr is invalidated by the creation of a mutable (unique) reference: trying to later dereference the pointer violates the uniqueness assumption of the mutable reference, leading to undefined behaviour. Them’s the rules: constructing a mutable reference immediately invalidates any pointers to the same location, making dereferencing them undefined behaviour.

Interestingly, the above code not only compiles fine, but also seems to run fine (without Miri, that is), printing 16 even when built in release mode with optimizations enabled (opt-level = 3). I’d have expected that in println(), the compiler would not re-read *mref but rather use directly the last value that was written into it validly, which is 17. Performing this kind of optimization is the entire point of these aliasing rules, after all! (I have also tried hiding the values behind #[inline(never)] functions, but that does not appear to be make a difference.) Nonetheless, it is undefined behaviour, and so a future version of the compiler may begin optimizing this code more aggressively and changing this behaviour.

Summarizing the problem

Returning to our Exchange and Orderbook, we should now be able to parse the diagnostic:

  1. The first order insert takes a mutable reference to the orderbook and saves it in its OrderHandle as a mutable raw pointer (“created by a SharedReadWrite retag”).
  2. The second order insert takes a mutable reference to the orderbook again, and this invalidates the mutable raw pointer stored in the OrderHandle for the first order (“invalidated by a Unique retag”).
  3. Canceling the first order via its OrderHandle reads from the mutable raw pointer, which is now invalid, triggering undefined behaviour (“trying to retag for Unique permission, but that tag does not exist in the current borrow stack for this location”).

In my opinion, this particular violation of Rust’s aliasing rules is very unlikely to ever cause runtime issues, but nonetheless code triggering undefined behaviour should not be relied on.

It’s important to point out scenarios that do not cause undefined behaviour with our code. Consider the simple insert-cancel case. Without another operation in between, the pointer in OrderHandle does not get invalidated, so dereferencing it during cancellation is completely valid: no UB. This can, of course, be repeated, so insert 1 - cancel 1 - insert 2 - cancel 2 is still completely fine. Similarly, insert 1 - insert 2 - cancel 2 is also fine, since there is no operation between insert 2 and cancel 2 that could invalidate the pointer in OrderHandle-2. We would only run into trouble if we tried to cancel OrderHandle-1.

Note how even with the help of Miri, we can only detect the presence of UB if we have some particular code (e.g. test) that triggers it. If our only test case is one of the working ones, then we’d never have been able to figure out the problem! UB is nasty business, both in Rust and in C++.

The solution: do not mix references and pointers

Our problem stems from the aliasing rules of Rust, namely that mutable references are unique. So what if just avoided creating any mutable references to orderbooks, and worked with them entirely through pointers? This should work. The full implementation is available here.

Storing the order books

Recall that our orderbooks are stored in HashMap<Symbol, Box<Orderbook>>. Because it’s too easy to accidentally form a mutable reference to Orderbook via the Box (indeed all Box APIs just return references to the boxed value), we will instead opt to store orderbooks directly as a raw pointer instead. For convenience we can still have Box handle the memory allocation using Box::into_raw():

// Creating an orderbook in `Exchange::insert()`:
let orderbook: *mut Orderbook = *self
    .order_books
    .entry(symbol)
    .or_insert_with(|| Box::into_raw(Box::new(Orderbook::new())));

Box::into_raw() gives up ownership of the allocated value and returns it as a raw pointer. Notice how this is not a member function but rather an associated function on Box<T>: this is because of a peculiarity of Rust in that it auto-dereferences references and pointers, including smart pointers, without any explicit syntax. This feature of Rust is called “Deref coercion” and is tied to the Deref trait (as well as DerefMut, its mutable counterpart). For example:

struct Foo;
impl Foo {
    fn hello(&self) {}
}

let foo = Foo;
foo.hello();

let foobox = Box::new(Foo);
foobox.hello(); // equivalent to: (&*foobox).hello()

let fooboxref: &Box<Foo> = &foobox;
fooboxref.hello(); // equivalent to: (&**fooboxref).hello()

Contrast this to C++ where the arrow operator (->) is used to function calls on pointers and smart pointers, as opposed to the dot operator (.) that is used for values and references.

Because Box implements Deref and DerefMut, you can call member functions (methods) on Box<T> as-if it was just a T (without having to explicitly dereference), and Box has to make sure that there’s never a case where it would be ambiguous whether a function name refers to one on Box or on T itself. The only way to avoid any possibility of this is by Box not defining any member functions at all.

Implementing Drop

We also need to clean up after ourselves if an Exchange object is destroyed, otherwise we would leak all the memory used for the Orderbooks. To do this, we will have Exchange implement the Drop trait, which is roughly equivalent to supplying a user-defined destructor for a class in C++. In the implementation we will use Box::from_raw() to turn our raw pointer back into a Box, and then we drop it explicitly to deallocate the orderbook:

impl Drop for Exchange {
    fn drop(&mut self) {
        for (_, orderbook) in self.order_books.iter() {
            drop(unsafe { Box::from_raw(*orderbook) });
        }
    }
}

Working with only pointers

Recall that we have to avoid creating references (especially mutable ones) to the orderbooks, and work with them entirely through pointers. This poses a problem with Orderbook::insert(), which takes the dreaded &mut self. Rust does not unfortunately allow us to take self by pointer, because Rust is kind of a jerk when it comes to unsafe, so we have to turn our insert() member function into an associated function; this mostly just means that we have to call it using the more verbose Orderbook::insert(orderbook_ptr, ...) syntax rather than the more convenient orderbook.insert(...).

Rust also encourages us to mark the whole function as unsafe because it works with a pointer; though it wouldn’t be an error to keep using unsafe blocks only for the parts where we are actually dereferencing the pointer, it’s not good form.

This pattern will spread throughout our codebase. Recall that we were using the OrderbookSide trait to get the correct collection of orders:

pub trait OrderbookSide: Copy + Eq + Sized + 'static {
    type Opposite: OrderbookSide;

    fn is_price_better_than(left: f64, right: f64) -> bool;
    fn get_orders(orderbook: &mut Orderbook) -> &mut BTreeMap<OrderKey<Self>, Order>;
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Bid;

impl OrderbookSide for Bid {
    type Opposite = Ask;

    fn is_price_better_than(left: f64, right: f64) -> bool {
        left > right
    }

    fn get_orders(orderbook: &mut Orderbook) -> &mut BTreeMap<OrderKey<Self>, Order> {
        &mut orderbook.bids
    }
}

// similarly for Ask

OrderbookSide::get_orders() also has to change to take a pointer and be unsafe. Our first instinct might be to leave the return type a &mut BTreeMap<...>, but that yields an exciting compiler diagnostic:

error[E0106]: missing lifetime specifier
  --> ../blog/blogpost-ptr.rs:47:56
   |
47 |     unsafe fn get_orders(orderbook: *mut Orderbook) -> &mut BTreeMap<OrderKey<Self>, Order> {
   |                                                        ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime, but this is uncommon unless you're returning a borrowed value from a `const` or a `static`
   |
47 |     unsafe fn get_orders(orderbook: *mut Orderbook) -> &'static mut BTreeMap<OrderKey<Self>, Order> {
   |                                                         +++++++
help: instead, you are more likely to want to change the argument to be borrowed...
   |
47 |     unsafe fn get_orders(orderbook: &*mut Orderbook) -> &mut BTreeMap<OrderKey<Self>, Order> {
   |                                     +
help: ...or alternatively, you might want to return an owned value
   |
47 -     unsafe fn get_orders(orderbook: *mut Orderbook) -> &mut BTreeMap<OrderKey<Self>, Order> {
47 +     unsafe fn get_orders(orderbook: *mut Orderbook) -> BTreeMap<OrderKey<Self>, Order> {
   |

References are meant to be safe and are not allowed to be dangling, a rule enforced by the borrow checker via the concept of lifetimes. The compiler explains itself pretty well here: it has no natural lifetime to tie the reference to (by the rules of lifetime elision) given that pointers have no lifetimes. None of the listed alternatives work for us (suggesting that we pass the pointer by reference in particularly is weird), so we are left with no choice but to return the BTreeMap itself as a pointer:

unsafe fn get_orders(orderbook: *mut Orderbook) -> *mut BTreeMap<OrderKey<Self>, Order> {
    &mut (*orderbook).bids
}

It’s worth noting that this does form a &mut but to the bids field itself, which doesn’t get Miri mad at us. There are cases though when it’d not be legal to go through an intermediate reference like that, and for that exists the addr_of_mut!() macro. Using it would look like this:

unsafe fn get_orders(orderbook: *mut Orderbook) -> *mut BTreeMap<OrderKey<Self>, Order> {
    addr_of_mut!((*orderbook).bids)
}

Important to remember that in the lambda of OrderHandle we must save a pointer to the orderbook, and not to the side’s BTreeMap. This is because we cannot avoid forming mutable pointers to the BTreeMap as we are using it, which would get us back to where we started: to invalidated pointers and undefined behaviour.

Unsafe taint

With these changes, we should notice something important: correct use of Exchange now requires that the user code never forms a (mutable) reference to an Orderbook. Whether Exchange is part of a library or a larger application, we have to be very careful to never allow other code to accidentally trigger undefined behaviour for us. This is what people mean when they say that unsafe taints potentially an entire crate: it restricts what is fine to do even in safe Rust.

We have three main options to deal with this situation:

  1. Document it but let it be otherwise,
  2. Mark any public functions on Exchange that can return an Orderbook as unsafe,
  3. Do not publicly expose Orderbook at all.

The third is clearly the safest option, but it does mean that Exchange cannot have as rich an API as we might want. This may or not may not be viable in a real situation. If possible though, then this is likely the best option. This is a really important use of visibility controls in Rust: to encapsulate unsafe code.

Note that making Orderbook non-public in turn requires that the OrderbookSide trait also be made private, because it has a method (get_orders()) that takes an Orderbook parameter. This in turn means that OrderKey<Side> cannot be public either.

Revisiting Box<Orderbook>

Even though it’s not a good idea, we could keep our orderbooks in a Box if we really wanted. This causes complications as we will see, and the only real advantage is that we can get rid of the manual Drop implementation for Exchange.

If we really wanted to do this though, the key to avoiding mutable references to the underlying Orderbook is to only have a reference to the Box itself and then use the aforementioned addr_of_mut!() to turn it into a Orderbook pointer:

// Creating an orderbook in `Exchange::insert()`:
let orderbook: &mut Box<Orderbook> = self
    .order_books
    .entry(symbol)
    .or_insert_with(|| Box::new(Orderbook::new()));
let orderbook_ptr = addr_of_mut!(**orderbook);

However, Box<T> is implemented on top of Unique<T> whose semantics assume that other than the Box itself or any references formed from it, nobody is referencing the encapsulated value. That forbids our pointer trickery. We cannot rely on Box currently to provide address stability that is sound for use with pointers: moving a box invalidates all pointers to it due to Unique’s aliasing rules, even though the pointed-to memory address remains stable.

Nevertheless, at the moment storing Box<Orderbook> and then only ever mutating the Orderbook through a raw pointer seems to be okay by Miri, even with MIRIFLAGS="-Zmiri-tree-borrows -Zmiri-unique-is-unique". This can of course change at any moment. Box and Unique are anyway magic, so normal rules don’t really apply.

Further reading

Thank you for sticking with me on my Rust journey! Here are some additional references and links that you may find interesting: