Hacker Public Radio

HPR2184: Gnu Awk - Part 5



GNU AWK - Part 5
Regular Expressions in AWK
The syntax for using regular expressions to match lines in AWK is as follows:
word ~ /match/
Or for not matching, use the following:
word !~ /match/
Remember the following file from the previous episodes:
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
We can run the following command:
$1 ~ /p[elu]/ {print $0}
We will get the following output:
apple red 4
grape purple 10
apple green 8
plum purple 2
pineapple yellow 5
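As a minimal sketch of how the program above might be run from a shell, assuming the sample data has been saved to a file. The temporary file and the variable names data and result are illustrative and do not come from the episode:

```shell
# Recreate the sample file from the episode in a temporary file
# (an assumption; any file name works).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Print every line whose first field contains "p" followed by e, l, or u.
result=$(awk '$1 ~ /p[elu]/ {print $0}' "$data")
printf '%s\n' "$result"

rm -f "$data"
```

The program is wrapped in single quotes so that the shell does not try to expand $1 and $0 itself.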
Another example matches any line whose color contains two consecutive e characters:
$2 ~ /e{2}/ {print $0}
This produces the output:
apple green 8
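The negated operator !~ works the same way. The following sketch inverts the first example and skips the header row with NR > 1. The temporary file and the variable names are assumptions made for illustration:

```shell
# Sample data from the episode, written to a temporary file (an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Skip the header (NR > 1) and keep lines whose first field does NOT
# contain "p" followed by e, l, or u.
result=$(awk 'NR > 1 && $1 !~ /p[elu]/ {print $0}' "$data")
printf '%s\n' "$result"

rm -f "$data"
```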
Regular expression basics
Certain characters have special meaning when using regular expressions.
Anchors
^ - beginning of the line
$ - end of the line
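\` - beginning of a string (gawk; written \A in Perl-style regular expressions)
\' - end of a string (gawk; written \z in Perl-style regular expressions)
\y - on a word boundary (gawk; \b means backspace in AWK)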
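A short sketch of the ^ and $ anchors, assuming the same sample data in a temporary file (the variable names are illustrative). When the pattern is matched against a field such as $1, the anchors refer to the start and end of that field rather than of the whole line:

```shell
# Sample data from the episode, written to a temporary file (an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Names that start with "p" (header skipped with NR > 1).
starts=$(awk 'NR > 1 && $1 ~ /^p/ {print $1}' "$data")

# Names that end with "e".
ends=$(awk 'NR > 1 && $1 ~ /e$/ {print $1}' "$data")

printf 'starts with p:\n%s\n' "$starts"
printf 'ends with e:\n%s\n' "$ends"

rm -f "$data"
```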
Characters
[ad] - a or d
[a-d] - any character a through d
[^a-d] - any character except a through d
\w - any word character (letter, digit, or underscore)
\s - any white-space character
[0-9] - any digit (gawk does not support \d)
The capital versions, \W and \S, are negations.
Or you can use the POSIX character classes, which are written inside a bracket expression (for example, [[:digit:]]):
[:alnum:] - Alphanumeric characters
[:alpha:] - Alphabetic characters
[:blank:] - Space and TAB characters
[:cntrl:] - Control characters
[:digit:] - Numeric characters
[:graph:] - Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
[:lower:] - Lowercase alphabetic characters
[:print:] - Printable characters (characters that are not control characters)
[:punct:] - Punctuation characters (characters that are not letters, digits, control characters, or space characters)
[:space:] - Space characters (such as space, TAB, and formfeed, to name a few)
[:upper:] - Uppercase alphabetic characters
[:xdigit:] - Characters that are hexadecimal digits
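A minimal sketch of a POSIX class in use, assuming the same sample data in a temporary file (the variable names are illustrative). The class name goes inside an outer pair of brackets, so "one or more digits" is written [[:digit:]]+. Here it selects rows whose third field is entirely numeric, which excludes the header, and adds up the amounts:

```shell
# Sample data from the episode, written to a temporary file (an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Sum the third field only on rows where it consists solely of digits.
total=$(awk '$3 ~ /^[[:digit:]]+$/ {sum += $3} END {print sum}' "$data")
printf 'total amount: %s\n' "$total"

rm -f "$data"
```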
Quantifiers
. - match any character
+ - match preceding one or more times
* - match preceding zero or more times
? - match preceding zero or one time
{n} - match preceding exactly n times
{n,} - match preceding n or more times
{n,m} - match preceding between n and m times
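The following sketch exercises +, * and ? against the sample data, assuming it is stored in a temporary file (the variable names are illustrative). The brace forms are left out here because some older awk implementations may not support them:

```shell
# Sample data from the episode, written to a temporary file (an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# +  : "a", one or more "p", then "le"
plus=$(awk 'NR > 1 && $1 ~ /ap+le/ {print $1}' "$data")

# .* : starts with "p", any characters, ends with "e"
star=$(awk 'NR > 1 && $1 ~ /^p.*e$/ {print $1}' "$data")

# ?  : amount is "1" optionally followed by "0"
opt=$(awk 'NR > 1 && $3 ~ /^10?$/ {print $1}' "$data")

printf '%s\n' "$plus" "$star" "$opt"

rm -f "$data"
```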
Grouped Matches
(...) - Parentheses are used for grouping
| - means "or"; typically used inside a group, as in (red|brown)
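A sketch of grouping and alternation, assuming the sample data in a temporary file (the variable names are illustrative). The first program matches either of two whole colors. The second applies a quantifier to a group:

```shell
# Sample data from the episode, written to a temporary file (an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Color is exactly "red" or exactly "brown".
either=$(awk '$2 ~ /^(red|brown)$/ {print $1, $2}' "$data")

# "b", then the group "an" one or more times, then a final "a".
repeated=$(awk '$1 ~ /^b(an)+a$/ {print $1}' "$data")

printf '%s\n' "$either" "$repeated"

rm -f "$data"
```

Without the parentheses, /^red|brown$/ would mean "starts with red" or "ends with brown", because alternation binds more loosely than the anchors.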
Replacement
The sub function replaces a match with the replacement string. This only applies to the first match.
The gsub function replaces all matches.
The gensub function substitutes in a similar way to sub and gsub, but with extra functionality.
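A minimal sketch contrasting sub and gsub on a literal string (the variable names are illustrative). Both functions modify their third argument in place and return the number of replacements made. gensub is a gawk extension and is not shown here, so that the example also runs on other awk implementations:

```shell
result=$(awk 'BEGIN {
    first = "banana"
    all   = "banana"

    # sub replaces only the first match and returns 1 if it replaced something.
    n_first = sub(/an/, "AN", first)

    # gsub replaces every match and returns how many it replaced.
    n_all = gsub(/an/, "AN", all)

    print first, n_first
    print all, n_all
}')
printf '%s\n' "$result"
```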