More on Robin Hood Hashing

I posted about Robin Hood hashing in a previous post. In the comments to that post David Wragg pointed out a flaw in my implementation: if you repeatedly delete and reinsert elements into a hashtable, the average probe count keeps rising. This is a good observation, and it’s true. Repeatedly deleting items makes the table slower but it (until you insert enough to trigger a rebuild of the table, which will filter out any tombstones - note: the number tombstones do not affect the rate of rebuilds).

Now, first of all it’s worth noting that the probe count grows slowly. After a slight tweak to the insertion method, at least (if the existing element is deleted, then tweak the replacement condition so it happens even when the probe counts are merely equal, rather than the existing one being strictly smaller). In David’s test case it grew to about 30 after a million deletions (with my hash function it’s more like 20).

The other point to make is that we already accept periodic “batch” operations such as the table-rebuilding that happens whenever you hit the max load factor. For that reason, it’s relatively reasonable to just reinsert all items again (skipping tombstones) whenever some number of deletions have occurred (e.g. 5N or something - it doesn’t have to be very frequent since the average probe count grows so slowly with the number of deletions).

So, there’s a simple and fairly bullet proof solution to this problem, but what if we prefer an incremental approach, where we make each deletion slightly slower, but never have to run a big batch job to improve the average probe count?

I’m not sure if there are existing solutions with established asymptotic properties, but I tried a simple strategy that seems to work pretty well in my test. The idea is this: whenever we delete an element, we scan ahead for a small number of slots and rearrange them to limit (or remove) the impact of the delete on the average probe count. So long as this operation tends to improve the average probe count at a faster rate than the probe count would otherwise grow, you’ll end up with an average probe count that stays reasonable.

Here’s the code that runs at the very end of the “erase” method (after the element has already been destroyed):

 1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132
int cur_ix = found_ix;      
for (int i = 0; i<NUM_DELETE_FIXUPS; ++i)
    const uint32_t next_ix = (cur_ix + 1) & mask;
    if (next_ix == found_ix) break; // looped around

    // if the next elem is empty or has a zero probe dist, 
    // then any query will always fail on the next elem anyway, 
    // so might as well clear the current one too.
    if (elem_hash(next_ix) == 0 || 
        probe_distance(elem_hash(next_ix), next_ix) == 0)
        elem_hash(cur_ix) = 0;
        return true;

    // Now, the next elem has probe count of at least 1, 
    // so it could use our deleted slot to improve its 
    // query perf. 
    // First, pretend we're a dupe of the next elem.
    elem_hash(cur_ix) = elem_hash(next_ix);

    // Since we're now a dupe of the next elem, ordering is 
    // arbitrary, which means we're allowed to swap places.
    // Note> current slot has been destructed, hence 'new'.
    new(&buffer[cur_ix]) elem(std::move(buffer[next_ix]));

    cur_ix = next_ix;
elem_hash(cur_ix) |= 0x80000000; // mark as deleted
return true;

You can choose a small number for NUM_DELETE_FIXUPS, or remove the condition in the loop (it’ll go through one pass of the table at worst, although that’s highly unlikely since it’ll stop on any empty element or an element with a zero probe distance). In my tests I’ve found that 2 will bound the variance and average probe count, but 3 will keep them slightly lower at marginal extra cost so that’s what I’ve set it to.

Again, this is just me messing around - I haven’t made any real attempt to prove the properties of this approach. There may be more rigorously investigated approaches in the literature somewhere. Given that there exists a simple fix with very low amortized cost (periodic reinsertion of elements), I don’t really consider this issue a deal-breaker even if this incremental approach doesn’t work out.