I’ve recently been talking with Johan Tibell about submitting his hashable package to become a part of the Haskell Platform.
Once we get that submission accepted, we can fold Johan’s excellent hash-based data structures from his unordered-containers package into the standard containers package. For many applications, the hash array mapped tries in unordered-containers offer a decent performance advantage over the current standard of ordered search trees.
Prior to making the submission, I found myself wanting to spring clean the hashable package. Although it is simple and well put together, it has some robustness problems that are important in practice.
Chief among these weaknesses is that it uses the FNV-1 hash, which is well known to be vulnerable to collisions. Since it currently uses FNV-1 with a fixed key, Internet-facing web applications that use hash trees are at risk.
The current state of the art in reasonably fast, secure hash algorithms is Aumasson and Bernstein’s SipHash. There already existed a Haskell implementation in the form of the siphash package, but I thought it was worth implementing a version of my own.
My criteria for developing a new Haskell SipHash implementation were that it had to be fast and easily reusable. I have a version in progress right now, and so far it is turning out well.
The main features of this implementation (specifically when hashing ByteStrings) are as follows:
All arithmetic is unboxed. I carefully inspected the generated Core at every step to ensure this would be the case. (It's usually easier to improve a code base that starts out with good performance properties than to retrofit good performance into existing code.)
The main loop performs 8-byte reads when possible, and if the final block is less than 8 bytes, unrolls that last bytewise fill to be as efficient as possible.
The common cases of SipHash's c and d parameters being fixed at 2 and 4 are unrolled.
The implementation is written in continuation passing style. This had the happy consequence of making unrolling particularly easy (read the source to see).
It’s worth looking at what “unrolling” means in this context. The SipHash algorithm is organized around a loop that performs a “round” a small number times, where a round is a step of an ARX (add-rotate-xor) cipher.
sipRound :: (Sip -> r) -> Sip -> r
Each 8-byte block of input is put through two rounds. After processing is complete, the state is put through four rounds.
Using a CPS representation, we can easily special-case these values so that each can be executed with just a single conditional branch.
runRounds :: Int -> (Sip -> r) -> Sip -> r
runRounds 2 k = sipRound (sipRound k)
runRounds 4 k = sipRound (sipRound (sipRound (sipRound k)))
GHC 7.6 generates excellent Core for the above code, inlining sipRound
and the continuation k
everywhere, and avoiding both jumps and uses of boxed values.
I measured the performance of my in-progress SipHash implementation against the FNV-1 hash currently used by the hashable package, and against two versions of Vincent Hanquez’s siphash package. Different input sizes are across the x axis; the y axis represents speedup relative to FNV-1 (less than 1 is slower, greater is faster).
My SipHash implementation starts off at a little over half the speed of FNV-1 on 5-byte inputs; breaks even at around 45 bytes; and reaches twice the speed at 512 bytes. For comparison, I included a C implementation; my Haskell code is just 10% slower. I’m pretty pleased by these numbers.
(After I published an initial benchmark, Vincent Hanquez sped up his siphash package by a large amount in just a couple of hours. The dramatic effect is shown above in the huge jump in performance between versions 1.0.1 and 1.0.2 of his package. This demonstrates the value of even a brief focus on performance!)
Incidentally, SipHash is another case where compiling with -fllvm
really helps; compared to GHC’s normal code generator, I saw speed jump by a very pleasing 35%. (Sure enough, the numbers I show above are with -fllvm
.)
With some extra work and a little time to focus, this new code should be ready to incorporate into the hashable package, hopefully within a few days.