Hacker Public Radio

HPR4407: A 're-response' Bash script



This show has been flagged as Explicit by the host.

Introduction

On 2025-06-19 Ken Fallon did a show, number 4404, responding to Kevie's
show 4398, which came out on 2025-06-11.

Kevie was using a Bash pipeline to find the latest episode in an RSS
feed, and download it. He used grep to parse the XML of the feed.

Ken's response was to suggest the use of xmlstarlet to parse the XML,
because a structured format as complex as XML cannot be parsed reliably
without a program that "understands" the intricacies of its structure.
The same applies to other complex formats such as HTML, YAML and JSON.
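A small illustration of why line-oriented matching is fragile. The two fragments below are hypothetical, with an invented URL. Both are legal ways to write the same enclosure element, because XML allows single-quoted attribute values, any attribute order, and line breaks inside a tag. The grep-based helper is also hypothetical, not Kevie's actual pipeline. It finds the URL in the first layout and nothing in the second.

```shell
#!/bin/bash
# Two hypothetical, equivalent XML fragments with different layouts.
# (The URL and attribute values are made up for this illustration.)
fragment1='<enclosure url="https://example.com/ep1.mp3" length="1234" type="audio/mpeg"/>'
fragment2="<enclosure
    type='audio/mpeg'
    url='https://example.com/ep1.mp3'
    length='1234'/>"

# Hypothetical naive extractor: look for url="..." on a line and strip the wrapper
grep_url() {
    grep -o 'url="[^"]*"' <<< "$1" | sed -e 's/^url="//' -e 's/"$//'
}

echo "fragment1: $(grep_url "$fragment1")"   # finds the URL
echo "fragment2: $(grep_url "$fragment2")"   # finds nothing: single quotes
```

A real XML parser such as xmlstarlet treats both fragments identically. A pattern tuned to one feed's layout only works as long as that layout never changes.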

In his show Ken presented a Bash script which dealt with this problem
and that of the ordering of episodes in the feed. He asked how others
would write such a script, and thus I was motivated to produce this
response to his response!

Alternative script

My script is a remodelling of Ken's, not a completely different
solution. It contains a few alternative ways of doing what Ken did, and
a reordering of the parts of his original. We will examine the changes
in this episode.

Script
#!/bin/bash
# Original (c) CC-0 Ken Fallon 2025
# Modified by Dave Morriss, 2025-06-14 (c) CC-0
podcast="https://tuxjam.otherside.network/feed/podcast/"
# [1]
while read -r item
do
    # [2]
    pubDate="${item%;*}"
    # [3]
    pubDate="$( \date --date="${pubDate}" --universal +%FT%T )"
    # [4]
    url="${item#*;}"
    # [5]
    echo "${pubDate};${url}"
done < <(curl --silent "${podcast}" | \
    xmlstarlet sel --text --template --match 'rss/channel/item' \
    --value-of 'concat(pubDate, ";", enclosure/@url)' --nl - ) | \
    sort --numeric-sort --reverse | \
    head -1 | \
    cut -f2 -d';' | wget --quiet --input-file=- # [6]

I have placed some comments in the script in the form of '# [1]' and
I'll refer to these as I describe the changes in the following numbered
list.

Note: I checked, and the script will run with the comments, though they
are only there to make it easier to refer to things.

  1. The format of the pipeline is different. It starts by defining a
    while loop, but the data which the read command receives comes from
    a process substitution of the form '<(statements)' (see the process
    substitution section of "hpr2045 :: Some other Bash tips"). I have
    arranged the pipeline in this way because it's bad practice to feed
    data to a while loop through a pipe, as discussed in the show
    hpr3985 :: Bash snippet - be careful when feeding data to loops.
    (I added -r to the read because shellcheck, which I run in the vim
    editor, nagged me! Without -r, read treats backslashes in its input
    as escape characters.)

  2. The lines coming from the process substitution are produced by
    running curl to collect the feed, then using xmlstarlet to pick out
    the pubDate field of each item and the url attribute of its
    enclosure field, returning them as two strings separated by a
    semicolon (';'). This is from Ken's original code. Each line is read
    into the variable item, and the first element (before the semicolon)
    is extracted with the Bash expression "${item%;*}", which removes
    the shortest suffix matching ';*', that is, the last semicolon and
    everything after it. Parameter manipulation expressions were
    introduced in HPR show 1648. See the full notes section "Remove
    matching suffix pattern" for this one.

  3. I modified Ken's date command to simplify the generation of the
    ISO 8601 date and time by using the pattern +%FT%T (%F is equivalent
    to %Y-%m-%d and %T to %H:%M:%S). This just saves typing!

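  4. The url value is extracted from the contents of item with the
    expression "${item#*;}", which removes the shortest prefix matching
    '*;', that is, everything up to and including the first semicolon.
    See the section of show 1648 entitled "Remove matching prefix
    pattern" for details.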

  5. The echo which generates the list of podcast URLs prefixed with an
    ISO time stamp uses ';' as the delimiter where Ken used a tab
    character. I assume Ken chose the tab for the benefit of either the
    following sort or his awk script. It's not needed for sort, since it
    sorts the line as-is and doesn't use fields. My version doesn't use
    awk.

  6. Rather than using awk, I use cut to remove the time stamp from the
    front of the line that head passes on, returning the second field
    delimited by the semicolon. The result is the URL for wget to
    download. In this case wget receives the URL on standard input
    (STDIN), and the --input-file=- option tells it to read the URLs to
    download from there.
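Item 1 refers to the problem of feeding a while loop from a pipe. Here is a minimal sketch of the usual symptom. It assumes Bash is running a script, where the lastpipe option is off by default. A counter incremented inside a loop fed by a pipe is lost, because that loop runs in a subshell. The same loop fed by process substitution runs in the current shell and keeps its changes. The function names are hypothetical, chosen for the illustration.

```shell
#!/bin/bash
# Each command in a pipeline runs in a subshell (unless 'shopt -s lastpipe'
# is in effect), so changes made inside a piped-into loop do not survive it.

count_via_pipe() {
    local count=0 line
    printf '%s\n' one two three | while read -r line; do
        count=$(( count + 1 ))   # increments a subshell's copy of count
    done
    echo "$count"                # the outer count is untouched: prints 0
}

count_via_process_substitution() {
    local count=0 line
    while read -r line; do
        count=$(( count + 1 ))   # increments the function's own count
    done < <(printf '%s\n' one two three)
    echo "$count"                # prints 3
}

echo "pipe: $(count_via_pipe)"
echo "process substitution: $(count_via_process_substitution)"
```

The script in this episode doesn't keep any state between iterations, so it would work either way. The process substitution form simply avoids the trap if the loop later grows a counter or a flag.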
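Items 2 to 4 can be tried out offline. This sketch takes one made-up line in the 'pubDate;url' form that the xmlstarlet command produces. The date and URL are invented for illustration. It runs that line through the same two parameter expansions and the same date call. It assumes GNU date, since --date and --universal are GNU coreutils options. Unlike the script, it keeps the converted date in a separate variable, iso, so both forms can be printed.

```shell
#!/bin/bash
# One hypothetical line in the "pubDate;url" form produced by xmlstarlet
item='Wed, 11 Jun 2025 10:30:00 +0100;https://example.com/episodes/ep42.mp3'

# [2] Remove the shortest suffix matching ';*' (the semicolon and the URL)
pubDate="${item%;*}"

# [3] Convert the RFC 822 style date to ISO 8601 in UTC (GNU date).
#     %F is shorthand for %Y-%m-%d and %T for %H:%M:%S.
iso="$(date --date="${pubDate}" --universal +%FT%T)"

# [4] Remove the shortest prefix matching '*;' (the date and the semicolon)
url="${item#*;}"

echo "pubDate: ${pubDate}"
echo "iso:     ${iso}"
echo "url:     ${url}"
```

The +0100 offset in the sample date is why the converted time is an hour earlier. Converting everything to UTC is what makes dates from different time zones comparable in the sort that follows.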
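Items 5 and 6 can likewise be rehearsed without touching the network. The sketch below uses three invented 'timestamp;url' lines in the form the loop echoes, and prints the URL that would be handed to wget. It uses plain sort --reverse, because fixed-width ISO 8601 timestamps sort chronologically as ordinary text. As I understand GNU sort, --numeric-sort reads only the leading year as a number, and lines that tie on it are then compared as whole lines. The script's version therefore gives the same order.

```shell
#!/bin/bash
# Hypothetical output of the while loop: "ISO timestamp;URL", in feed order
lines='2025-06-04T09:00:00;https://example.com/episodes/ep41.mp3
2025-06-11T09:30:00;https://example.com/episodes/ep42.mp3
2025-05-28T08:15:00;https://example.com/episodes/ep40.mp3'

# Newest first, keep only the first line, then keep the field after the ';'
latest="$(printf '%s\n' "$lines" | sort --reverse | head -1 | cut -f2 -d';')"

echo "$latest"   # the URL that would be passed to wget --input-file=-
```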

Conclusion

I'm not sure my solution is better in any significant way. I prefer to
use Bash functionality to do things where calling awk or sed could be
overkill, but that's just a personal preference.

I might have replaced the head and cut with a sed expression, such as
the following, as the last line:

    sed -e '1{s/^.\+;//;q}' | wget --quiet --input-file=-

Here, the sed expression operates on the first line from the sort, where
it removes everything from the start of the line up to and including the
semicolon. The expression then causes sed to quit, so that only the
edited first line is passed to wget.
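One way to check that the sed alternative behaves like head and cut is to run both on the same sorted input. The sample lines are invented for illustration. There are two caveats. First, \+ in a basic regular expression is a GNU sed extension. Second, because .\+ is greedy, the substitution removes everything up to the last semicolon, whereas cut -f2 returns the text between the first and second semicolons. With exactly one semicolon per line, as in this feed's output, the two agree.

```shell
#!/bin/bash
# Hypothetical sorted lines (newest first), as produced by 'sort --reverse'
sorted='2025-06-11T09:30:00;https://example.com/episodes/ep42.mp3
2025-06-04T09:00:00;https://example.com/episodes/ep41.mp3'

# The original approach: first line only, then the second ';'-delimited field
via_cut="$(printf '%s\n' "$sorted" | head -1 | cut -f2 -d';')"

# The sed alternative: on line 1, delete up to the (last) ';', print, then quit
via_sed="$(printf '%s\n' "$sorted" | sed -e '1{s/^.\+;//;q}')"

echo "cut: ${via_cut}"
echo "sed: ${via_sed}"
```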

Links
• hpr1648 :: Bash parameter manipulation
  • Section entitled "Remove matching suffix pattern"
  • Section entitled "Remove matching prefix pattern"
  • Diagram showing the Bash parameter manipulation methods
• hpr2045 :: Some other Bash tips
  • Section on process substitution
• hpr3985 :: Bash snippet - be careful when feeding data to loops
• hpr4398 :: Command line fun: downloading a podcast, by Kevie
• hpr4404 :: Kevie nerd snipes Ken by grepping xml, by Ken Fallon

