PeerTube

The billion row challenge: do we have a bug?



A couple of people contacted me with feedback about the SIMD implementation we used to find newlines in the billion-row file. One suggested a possible bug (shock!) and one suggested a way that might be more efficient. We'll take a look at both, obviously starting by writing a test that checks for the bug.
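One way to write that kind of test is to cross-check the chunked search against a deliberately simple reference, with newlines and start offsets placed on and around 64-byte boundaries. The sketch below is not the code from the stream: `find_newline_chunked` is a hypothetical, non-SIMD stand-in for the function under test, and `find_newline_naive` and `check_boundaries` are names made up for illustration.

```rust
/// Hypothetical stand-in for the function under test: scans the data in
/// 64-byte steps and returns the index of the first newline at or after
/// `start`. A real SIMD version would compare a whole chunk at once.
fn find_newline_chunked(data: &[u8], start: usize) -> Option<usize> {
    let mut pos = start;
    while pos < data.len() {
        let end = (pos + 64).min(data.len());
        if let Some(i) = data[pos..end].iter().position(|&b| b == b'\n') {
            return Some(pos + i);
        }
        pos = end;
    }
    None
}

/// Reference implementation: a plain byte-by-byte search.
fn find_newline_naive(data: &[u8], start: usize) -> Option<usize> {
    data.get(start..)?
        .iter()
        .position(|&b| b == b'\n')
        .map(|i| start + i)
}

/// Puts a single newline at every position of buffers whose lengths sit
/// on and around chunk boundaries, and checks that both searches agree
/// for every start offset.
fn check_boundaries() -> bool {
    for len in [1usize, 63, 64, 65, 127, 128, 129] {
        for newline_at in 0..len {
            let mut data = vec![b'x'; len];
            data[newline_at] = b'\n';
            for start in 0..=len {
                if find_newline_chunked(&data, start) != find_newline_naive(&data, start) {
                    return false;
                }
            }
        }
    }
    true
}
```

Swapping the real SIMD function in for `find_newline_chunked` would turn this into a regression test for off-by-one mistakes at chunk edges, which is where this sort of bug tends to hide.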

After the stream I made another attempt at using the information about all the newlines in a 64-byte chunk, instead of just the first one. I did it without any Vecs at all, unifying the two functions we worked with into a single one with nested loops. Surprisingly (to me), this was still slower than the original solution. Again, this seems to show the power of simplicity! You can find this code at https://codeberg.org/andybalaam/brrmbrrm/src/branch/main/src/read_lines/memmap_simd.rs#L152
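For readers who have not seen the idea before, here is a rough sketch of what "using all the newlines in a chunk" can look like: build a 64-bit mask with one bit per byte, then walk its set bits. This is not the code at the link above. The names `newline_mask` and `for_each_line` are invented for this example, and a plain loop stands in for the SIMD compare so that it runs on stable Rust.

```rust
/// Builds a mask with bit `i` set when `chunk[i]` is a newline.
/// Assumption: a real SIMD version would get this mask from a vector
/// compare; this portable loop only stands in for it.
fn newline_mask(chunk: &[u8]) -> u64 {
    debug_assert!(chunk.len() <= 64);
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\n' {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Calls `on_line` once per line (without its trailing newline), using
/// nested loops: the outer one over 64-byte chunks, the inner one over
/// the set bits of each chunk's mask. No Vec is allocated here.
fn for_each_line<'a>(data: &'a [u8], mut on_line: impl FnMut(&'a [u8])) {
    let mut line_start = 0;
    for (chunk_index, chunk) in data.chunks(64).enumerate() {
        let base = chunk_index * 64;
        let mut mask = newline_mask(chunk);
        while mask != 0 {
            // The lowest set bit is the next newline in this chunk.
            let newline_pos = base + mask.trailing_zeros() as usize;
            on_line(&data[line_start..newline_pos]);
            line_start = newline_pos + 1;
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    // A final line with no trailing newline.
    if line_start < data.len() {
        on_line(&data[line_start..]);
    }
}
```

A caller would pass a closure that parses each line, for example `for_each_line(bytes, |line| { /* parse */ })`. The inner `while` loop is the extra machinery compared with only taking the first newline per chunk.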

Read my blog at https://artificialworlds.net/blog

Follow me on mastodon: @[email protected]
