the bioinformatics chat

#8 Perfect k-mer hashing in Sailfish


Listen Later

The original version of Sailfish,

an RNA-Seq quantification tool, used minimal perfect hash functions to
replace k-mers with unique integers.
(The current version appears to be using a Cuckoo hashmap instead.)

This is my attempt to explain how a minimal perfect hash function could be

built. The algorithm described here is not exactly the same as the one
Sailfish used, but it follows the same idea.

Sections:

  • Sailfish and perfect hashing (1:15)
  • Perfect hashing based on binary search or hash tables (4:34)
  • Random hash functions (7:34)
  • Perfect hash function based on an acyclic graph (12:16)
  • Links:

    • The Sailfish paper
    • The paper describing the perfect hashing algorithm
    • The birthday paradox
    • Integrative Biology & Medicine
    • If you enjoyed this episode, please consider supporting the podcast on Patreon.

      ...more
      View all episodesView all episodes
      Download on the App Store

      the bioinformatics chatBy Roman Cheplyaka

      • 4.7
      • 4.7
      • 4.7
      • 4.7
      • 4.7

      4.7

      35 ratings