the bioinformatics chat

#8 Perfect k-mer hashing in Sailfish

08.05.2017 - By Roman CheplyakaPlay

Download our free app to listen on your phone

Download on the App StoreGet it on Google Play

The original version of Sailfish,

an RNA-Seq quantification tool, used minimal perfect hash functions to

replace k-mers with unique integers.

(The current version appears to be using a Cuckoo hashmap instead.)

This is my attempt to explain how a minimal perfect hash function could be

built. The algorithm described here is not exactly the same as the one

Sailfish used, but it follows the same idea.

Sections:

Sailfish and perfect hashing (1:15)

Perfect hashing based on binary search or hash tables (4:34)

Random hash functions (7:34)

Perfect hash function based on an acyclic graph (12:16)

Links:

The Sailfish paper

The paper describing the perfect hashing algorithm

The birthday paradox

Integrative Biology & Medicine

If you enjoyed this episode, please consider supporting the podcast on Patreon.

More episodes from the bioinformatics chat