the bioinformatics chat

#69 Suffix arrays in optimal compressed space and δ-SA with Tomasz Kociumaka and Dominik Kempa


Listen Later

Today on the podcast we have Tomasz Kociumaka and Dominik Kempa,

the authors of the preprint
Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space.

The suffix array is one of the foundational data structures in bioinformatics,

serving as an index that allows fast substring searches in a large text.
However, in its raw form, the suffix array occupies the space proportional to (and
several times larger than) the original text.

In their paper, Tomasz and Dominik construct a new index, δ-SA, which on the

one hand can be used in the same way (answer the same queries) as the suffix
array and the inverse suffix array, and on the other hand, occupies the space
roughly proportional to the gzip’ed text (or, more precisely, to the measure δ
that they define — hence the name).

Moreover, they mathematically prove that this index is optimal, in the sense

that any index that supports these queries — or even much weaker queries, such
as simply accessing the i-th character of the text — cannot be significantly
smaller (as a function of δ) than δ-SA.

Links:

  • Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space (Dominik Kempa, Tomasz Kociumaka)
  • Thank you to Jake Yeung and other Patreon members for supporting this episode.

    ...more
    View all episodesView all episodes
    Download on the App Store

    the bioinformatics chatBy Roman Cheplyaka

    • 4.8
    • 4.8
    • 4.8
    • 4.8
    • 4.8

    4.8

    34 ratings


    More shows like the bioinformatics chat

    View all
    Radiolab by WNYC Studios

    Radiolab

    43,909 Listeners

    This American Life by This American Life

    This American Life

    90,830 Listeners

    The Bioinformatics and Beyond Podcast by Leo Elworth

    The Bioinformatics and Beyond Podcast

    10 Listeners

    If Books Could Kill by Michael Hobbes & Peter Shamshiri

    If Books Could Kill

    8,747 Listeners

    the bioinformatics lab by The Bioinformatics Lab

    the bioinformatics lab

    0 Listeners