The original version of Sailfish,
an RNA-Seq quantification tool, used minimal perfect hash functions to
replace k-mers with unique integers.
(The current version appears to be using a Cuckoo hashmap instead.)
This is my attempt to explain how a minimal perfect hash function could be
built. The algorithm described here is not exactly the same as the one
Sailfish used, but it follows the same idea.
Sailfish and perfect hashing (1:15)
Perfect hashing based on binary search or hash tables (4:34)
Random hash functions (7:34)
Perfect hash function based on an acyclic graph (12:16)
The Sailfish paper
The paper describing the perfect hashing algorithm
The birthday paradox
Integrative Biology & Medicine
If you enjoyed this episode, please consider supporting the podcast on Patreon.