12.08.2020 - By Select Works
A famous dataset of Reuters articles from the 1980s includes “Blah blah blah.” in place of some stories. Why?
We have a Patreon now! Sign up to support the show and get access to our bonus podcast, Overunderstood.
Show notes:
0:31 - The link Jess sent
8:31 - SGML
8:46 - This is what the blahs look like and this is what all the entries look like.
24:00 - FTP
24:34 - Linguistic Data Consortium
29:00 - RCV1 at NIST and David D. Lewis’s README
30:22 - Construe-TIS: A System for Content-Based Indexing of a Database of News Stories (Phil Hayes and Steven Weinstein)