I was reading titus's recent post about data structures for k-mer filtering.

ONe thing struct me. Titus, you looked at binary Bloom filters, but seemingly ignored Counting Bloom filters (where instead of indexing a bit with your hashes, you index a counter of a suitable size, you do run into problems if you need both increments and decrements, since once you it max on the counter, you can no longer decrement).

Of course, it's not necessarily trivial going from a Bloom filter (counting or not) to the key, whereas the key is typically available when using a normal hash table.

I suspect, but have not verified, that a counting Bloom combined with the parallelization opportunity you found (and that one is the big win, I think) would be an equally interesting approach.