Comparing implementations of std::string

Highly recommended reading for C++ developers: "An informal comparison of the three major implementations of std::string" (discussed on Reddit and Hacker News). It goes into how std::string is implemented by the standard libraries of the three major compilers, GCC (libstdc++), MSVC, and Clang (libc++), and what design choices and trade-offs each of them made.

I was surprised that under both GCC and MSVC std::string is 32 bytes (on 64-bit platforms); I already thought Clang’s 24 bytes was way too big! In virtually every case I have seen, std::strings are treated as immutable, so the extra 8 bytes for capacity are useless. You may not mind that much, but if you’re really trying to optimize for performance and worry about things like cache lines, then those 8 bytes hurt: they waste 1/8th of a 64-byte cache line! So much for the “you don’t pay for what you don’t use and what you do use you couldn’t implement any better yourself” principle…
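If you want to check these numbers on your own toolchain, something like the following rough sketch should do. The helper names (string_object_size, cache_line_share, strings_per_cache_line) are made up for this post, and the 64-byte cache line is an assumption that holds for most current x86-64 parts but not everywhere.

```cpp
#include <cstddef>
#include <string>

// Size of the std::string object itself, excluding any heap buffer it owns.
constexpr std::size_t string_object_size() {
    return sizeof(std::string);
}

// Share of a cache line taken up by a field of `field_bytes` bytes.
// The 64-byte default is an assumption; check your target CPU.
constexpr double cache_line_share(std::size_t field_bytes,
                                  std::size_t line_bytes = 64) {
    return static_cast<double>(field_bytes) / static_cast<double>(line_bytes);
}

// How many whole std::string objects fit in one cache line.
constexpr std::size_t strings_per_cache_line(std::size_t line_bytes = 64) {
    return line_bytes / sizeof(std::string);
}
```

Going by the article, string_object_size() should come out as 32 with libstdc++ and MSVC and 24 with libc++ on 64-bit platforms, and cache_line_share(8) is the 1/8th mentioned above.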

(I do have another, related rant about C++: for a language that places such a strong focus on efficiency, why is there no API for me to extract the pointer from a std::string / std::vector and take ownership of it? Or to build a std::vector / std::string by passing it a pointer along with ownership of it? You can do this in Rust, see e.g. Vec::from_raw_parts().)

Anyhow, Clang’s (well, libc++’s) std::string being 8 bytes smaller does come at a performance penalty: virtually every operation, even things that you’d expect to be trivial like empty() and size(), has to branch, because it must do different things depending on whether the string data is small (i.e. stored in-place with the small string optimization) or large (stored on the heap). MSVC and GCC do not, and so the assembly for their common functions looks exactly as simple as you’d expect and hope. I am very curious whether Clang’s choice is worthwhile overall. It’s probably highly situational: in some cases I’d expect it to be a massive win, in others a massive loss.
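To make the trade-off concrete, here is a toy sketch of the two approaches. To be clear, neither class is the real layout of any standard library: PackedString and FlatString are hypothetical names, both are read-only and non-copyable, and the byte layout is simply something I picked to get a 24-byte object with a tag byte versus a larger object with a dedicated size field.

```cpp
#include <cstddef>
#include <cstring>

// Toy 24-byte string: the last byte is a tag that is either the inline
// length (0..22) or kHeapTag, in which case the first bytes hold a heap
// pointer followed by the size. Hypothetical layout, for illustration only.
class PackedString {
public:
    explicit PackedString(const char* s) {
        const std::size_t n = std::strlen(s);
        if (n <= kInlineMax) {
            std::memcpy(raw_, s, n + 1);  // characters plus '\0', in place
            raw_[kTagIndex] = static_cast<unsigned char>(n);
        } else {
            char* p = new char[n + 1];
            std::memcpy(p, s, n + 1);
            std::memcpy(raw_, &p, sizeof p);             // heap pointer
            std::memcpy(raw_ + sizeof p, &n, sizeof n);  // then the size
            raw_[kTagIndex] = kHeapTag;
        }
    }
    ~PackedString() { if (is_heap()) delete[] heap_ptr(); }
    PackedString(const PackedString&) = delete;
    PackedString& operator=(const PackedString&) = delete;

    bool is_heap() const { return raw_[kTagIndex] == kHeapTag; }

    // Even size() has to branch on the representation.
    std::size_t size() const {
        if (is_heap()) {
            std::size_t n;
            std::memcpy(&n, raw_ + sizeof(char*), sizeof n);
            return n;
        }
        return raw_[kTagIndex];
    }
    bool empty() const { return size() == 0; }
    const char* c_str() const {
        return is_heap() ? heap_ptr() : reinterpret_cast<const char*>(raw_);
    }

private:
    static constexpr std::size_t kBytes = 24;
    static constexpr std::size_t kTagIndex = kBytes - 1;
    static constexpr std::size_t kInlineMax = kTagIndex - 1;  // room for '\0'
    static constexpr unsigned char kHeapTag = 0xFF;
    static_assert(sizeof(char*) + sizeof(std::size_t) <= kTagIndex,
                  "pointer and size must not overlap the tag byte");

    char* heap_ptr() const {
        char* p;
        std::memcpy(&p, raw_, sizeof p);
        return p;
    }

    unsigned char raw_[kBytes] = {};
};

// Toy string with a dedicated size field and a pointer that always points
// at the characters (inline buffer or heap), so size() is a plain load.
class FlatString {
public:
    explicit FlatString(const char* s) {
        size_ = std::strlen(s);
        ptr_ = size_ < sizeof buf_ ? buf_ : new char[size_ + 1];
        std::memcpy(ptr_, s, size_ + 1);
    }
    ~FlatString() { if (ptr_ != buf_) delete[] ptr_; }
    FlatString(const FlatString&) = delete;
    FlatString& operator=(const FlatString&) = delete;

    std::size_t size() const { return size_; }  // no branch
    bool empty() const { return size_ == 0; }
    const char* c_str() const { return ptr_; }

private:
    char* ptr_;
    std::size_t size_;
    char buf_[16];
};
```

PackedString saves space by overloading one byte, and pays for it with a branch in every accessor; FlatString spends more bytes so that size() and c_str() are single loads. The real implementations are a lot more clever than this, but the shape of the trade-off is the same.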

Really good article regardless, as is common from The Old New Thing! It’s an interesting view into the nontrivial decisions you must make when building libraries, especially for such core library types, where it’s completely impossible to predict all the circumstances under which your APIs will be used, and so you have little more than guesses about which trade-offs are worthwhile, e.g. performance-wise.