Hacker Public Radio

HPR2091: Everyday Unix/Linux Tools for data processing



Here are some of the tools I use to process and clean data from all manner of customers:
detox
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It will also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
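To make the idea concrete, here is a minimal Python sketch of the kind of renaming detox performs. It is only an approximation for illustration, not detox's actual algorithm or rule set; the function name safe_name and the exact character rules are assumptions.

```python
import re
import unicodedata


def safe_name(name: str) -> str:
    """Return a shell-friendly version of a file name (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks: "é" -> "e".
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of whitespace with a single underscore.
    ascii_name = re.sub(r"\s+", "_", ascii_name.strip())
    # Replace anything that is not a letter, digit, dot, underscore or hyphen.
    ascii_name = re.sub(r"[^A-Za-z0-9._-]", "_", ascii_name)
    # Collapse repeated underscores left over from the previous steps.
    return re.sub(r"_+", "_", ascii_name)


print(safe_name("My Résumé (final).txt"))  # My_Resume_final_.txt
```

The sketch only computes the new name. Actually renaming files (and handling name collisions) is left out on purpose, since detox already does that.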
sed
See other episodes for great sed information. I like to remove DOS end-of-line and end-of-file characters:
sed -i 's/^M//g' *.txt
(here ^M stands for a literal carriage return, typed as Ctrl-V Ctrl-M)
or
sed -i 's/\r//g' *.txt
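The same cleanup can be sketched in Python. This is an illustrative sketch, not part of the original notes: the helper names dos_to_unix and clean_file are made up, and treating Ctrl-Z (byte 0x1A) as the DOS end-of-file character is an assumption.

```python
from pathlib import Path

# Assumption: the DOS "end of file" character is Ctrl-Z (byte 0x1A).
CTRL_Z = b"\x1a"


def dos_to_unix(data: bytes) -> bytes:
    """Remove trailing Ctrl-Z markers and every carriage return."""
    # Drop any end-of-file markers at the very end of the data.
    data = data.rstrip(CTRL_Z)
    # Like a global sed substitution, remove all carriage returns,
    # not only those that directly precede a newline.
    return data.replace(b"\r", b"")


def clean_file(path: Path) -> None:
    """Rewrite one file in place, similar in spirit to sed -i."""
    path.write_bytes(dos_to_unix(path.read_bytes()))
```

To process every text file in the current directory, loop over Path(".").glob("*.txt") and call clean_file on each path. Working on bytes avoids any guess about the files' text encoding.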
Command-line tools
ack
awk
detox
grep
pandoc
pdftotext -layout
sed
unix2dos and dos2unix
wget
curl
R libraries
RCurl
XML
rvest
tm
xlsx
Python libraries
beautifulsoup
csv
nltk (YouTube series)
rdflib
re
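As a small illustration of how two of these libraries (csv and re) fit together for data cleaning, the sketch below normalises whitespace in every cell of a CSV document. The function name clean_rows and the sample data are made up for the example.

```python
import csv
import io
import re


def clean_rows(text: str) -> list[list[str]]:
    """Parse CSV text and normalise whitespace in every cell."""
    reader = csv.reader(io.StringIO(text))
    # Collapse runs of whitespace to one space, then trim both ends.
    return [
        [re.sub(r"\s+", " ", cell).strip() for cell in row]
        for row in reader
    ]


sample = "name, city\n  Ada   Lovelace ,London\n"
for row in clean_rows(sample):
    print(row)
```

Letting csv do the parsing and using re only on individual cells keeps quoted fields and embedded commas intact, which a regex over the raw text would not.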
Vim tricks
buffer searches (:vim /pattern/ ##)
Ack plugin
bufdo (:bufdo %s/pattern/replace/ge | update)
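For readers who do not use Vim, the bufdo trick (substitute in every open buffer, then write only the buffers that changed) can be approximated with a short Python sketch. The function name replace_in_files is hypothetical, UTF-8 input is an assumption, and Python's regex syntax differs from Vim's, so patterns are not interchangeable.

```python
import re
from pathlib import Path


def replace_in_files(paths, pattern: str, replacement: str) -> list[Path]:
    """Apply a regex substitution to each file; return the files that changed."""
    regex = re.compile(pattern)
    changed = []
    for path in map(Path, paths):
        # Assumption: the files are UTF-8 encoded text.
        original = path.read_text(encoding="utf-8")
        updated = regex.sub(replacement, original)
        # Like ":update", write the file only if the substitution changed it.
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

A file with no match is simply left untouched and is not an error, which is roughly what the e flag achieves in the Vim command.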
Other tools
OpenRefine
reconcile-csv
tabula