Hacker Public Radio

HPR4417: Newest matching file


Overview

Several years ago I wrote a Bash script for a task I need to perform
almost every day: finding the newest file in a series of files.

At the time I was running a camera on a Raspberry Pi which was attached
to a window and viewed my back garden. I was taking a picture every 15
minutes, giving each one a name containing the date and time, and storing
them in a directory. It was useful to be able to display the latest
picture.

Since then, I have found searching for the newest file useful in many
contexts:

  • Find the image generated by my random recipe chooser, put it in the
    clipboard and send it to the Telegram channel for my family.

  • Generate a weather report from wttr.in and send it to Matrix.

  • Find the screenshot I just made and put it in the clipboard.

Of course, I could just use the same name when writing these various
files, rather than accumulating several, but I often want to look back
through such collections. If I am concerned about such files accumulating
in an unwanted way, I write cron scripts which run every day and delete
the oldest ones.
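
As an illustration, such a cleanup job can be a single find command. This
is a sketch rather than one of my actual cron scripts, and the directory,
pattern and age threshold are only examples:

#!/usr/bin/env bash
# Example cleanup script for cron: delete webcam images older than 30 days.
# The directory, pattern and age here are illustrative values.
find "$HOME/Pictures/webcam" -maxdepth 1 -type f -name '*.jpg' -mtime +30 -delete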

Original script

The first iteration of the script was actually written as a Bash function
which was loaded at login time. The function is called
newest_matching_file and it takes two arguments:

  • A file glob expression to match the file I am looking for.

  • An optional directory in which to look for the file. If this is
    omitted, then the current directory will be used.

The first version of this function was a bit awkward, since it used a for
loop to scan the directory, using the glob pattern to find the files.
Because a failed Bash glob pattern search returns the search pattern
itself, it was necessary to use the nullglob option (see references) to
prevent this, turning it on before the search and off afterwards.
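
A sketch of that first approach (not the exact original code) looks
something like this:

# Sketch of the for-loop version (reconstructed, not the original function)
function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}
    local file newest_file

    shopt -s nullglob                     # a failed glob expands to nothing
    for file in "$dir"/$glob_pattern; do  # the pattern is deliberately unquoted
        [[ -f $file ]] || continue
        # Keep this file if it is newer than the current candidate
        if [[ -z $newest_file || $file -nt $newest_file ]]; then
            newest_file=$file
        fi
    done
    shopt -u nullglob                     # turn nullglob off again

    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"
}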

This technique was later replaced with a pipeline using the find command.

Improved Bash script

The version using find is what I will explain here.

function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}

    # Argument number check
    if [[ $# -eq 0 || $# -gt 2 ]]; then
        echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
        return 1
    fi

    # Check the target directory
    if [[ ! -d $dir ]]; then
        echo "Unable to find directory $dir" >&2
        return 1
    fi

    local newest_file

    # shellcheck disable=SC2016
    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}

The function is in the file newest_matching_file_1.sh, and it's loaded
("sourced", or declared) like this:

. newest_matching_file_1.sh

The '.' is a short-hand version of the command source.

I actually have two versions of this function; the second one uses a
regular expression, which the find command is also able to search with,
but I prefer this one.
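
For reference, the heart of the regular-expression version is a find call
along these lines. This is a sketch rather than the exact script, and the
regex_pattern variable name is just illustrative; note that find's -regex
option tests the whole path, hence the leading '.*/' in the pattern:

# Sketch of the regex-based search; the rest of the pipeline is unchanged
find "$dir" -maxdepth 1 -regextype posix-extended \
    -regex ".*/${regex_pattern}" -type f -printf "%T@ %p\n"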

Explanation

  • The first two lines beginning with local define variables local to
    the function, holding the arguments. The first, glob_pattern, is
    expected to contain something like screenshot_2025-04-*.png. The
    second will hold the directory to be scanned, or if omitted, will be
    set to the current directory.

  • Next, an if statement checks that there are the right number of
    arguments, aborting if not. Note that the echo command writes to
    STDERR (using '>&2'), the error channel.

  • Another if statement checks that the target directory actually
    exists, and aborts if not.

  • Another local variable, newest_file, is defined. It's good practice
    not to create global variables in functions, since they will "leak"
    into the calling environment.

  • The variable newest_file is set to the result of a command
    substitution containing a pipeline:

    • The find command searches the target directory:

      • Using -maxdepth 1 limits the search to the chosen directory and
        does not descend into sub-directories.

      • The search pattern is defined by -name "$glob_pattern".

      • Using -type f limits the search to files.

      • The -printf "%T@ %p\n" argument prints the file's last
        modification time as the number of seconds since the Unix epoch
        ('%T@'). This number is larger the newer the file is. It is
        followed, after a space, by the full path to the file ('%p'),
        and a newline.

    • The matching file names are sorted. Because each is preceded by a
      numeric time value, they are sorted in ascending order of
      modification time, with the newest file last.

    • Finally sed is used to return the last file in the sorted list
      with the program '${s/.\+ //;p}':

      • The -n option ensures that only lines which are explicitly
        printed will be shown.

      • The sed program looks for the last line (using '$'). When found,
        the leading numeric time is removed with 's/.\+ //' and the
        result is printed (with 'p').

    • The end result will either be the path to the newest file or
      nothing (because there was no match).

  • The expression '[[ -n $newest_file ]]' will be true if the
    $newest_file variable is not empty, and if that is the case, the
    contents of the variable will be printed on STDOUT; otherwise
    nothing will be printed.

  • Note that the function returns 1 (false) if there is a failure, and
    0 (true) if all is well. A null return is regarded as success.
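
Putting it all together, a typical call looks like this (the output file
name here is purely illustrative):

$ newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/
/home/user/Pictures/Screenshots/Screenshot_2025-04-27_10-15-02.png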

Script update

While editing the audio for this show I realised that there is a flaw in
the Bash function newest_matching_file. It is in the sed script used to
process the output from find.

The sed command used in the function deletes all characters up to a
space, assuming that the only space in the last line is the one following
the timestamp. However, if the file name itself contains spaces, this
will not work, because regular expressions in sed are greedy: what is
deleted in this case is everything up to and including the last space.

I created a directory called tests and added the following files:

'File 1 with spaces.txt'
'File 2 with spaces.txt'
'File 3 with spaces.txt'

I then ran the find command as follows:

$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
spaces.txt

I adjusted the sed call to sed -ne '${s/[^ ]\+ //;p}'. This uses the
regular expression:

s/[^ ]\+ //

This now specifies that what is to be removed is every non-space
character up to and including the first space. The result is:

$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
tests/File 3 with spaces.txt
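
The corresponding command substitution in the function then becomes (only
the sed expression changes):

newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
    -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}')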

This change has been propagated to the copy on GitLab.

Usage

This function is designed to be used in commands or other scripts.

For example, I have an alias defined as follows:

alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)"

This uses xclip to load the latest screenshot into the clipboard, so that
I can paste it into a social media client, for example.
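
Similarly, going back to the webcam pictures from the overview, the
newest image can be displayed directly. This is a hypothetical example:
feh is just one possible image viewer, and the pattern and directory are
illustrative.

feh "$(newest_matching_file 'garden_*.jpg' ~/Pictures/webcam)"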

Perl alternative

During the history of this family of scripts I wrote a Perl version. This
was originally because the Bash function gave problems when run under the
Bourne shell, and I was using pdmenu a lot, which internally runs scripts
under that shell.

#!/usr/bin/env perl

use v5.40;
use open ':std', ':encoding(UTF-8)'; # Make all IO UTF-8

use Cwd;
use File::Find::Rule;

#
# Script name
#
( my $PROG = $0 ) =~ s|.*/||mx;

#
# Use a regular expression rather than a glob pattern
#
my $regex = shift;

#
# Get the directory to search, defaulting to the current one
#
my $dir = shift // getcwd();

#
# Have to have the regular expression
#
die "Usage: $PROG regex [DIR]\n" unless $regex;

#
# Collect all the files in the target directory without recursing. Include the
# path and let the caller remove it if they want.
#
my @files = File::Find::Rule->file()
                            ->name(qr/$regex/)
                            ->maxdepth(1)
                            ->in($dir);
die "Unsuccessful search\n" unless @files;

#
# Sort the files by ascending modification time, youngest first
#
@files = sort { -M($a) <=> -M($b) } @files;

#
# Report the one which sorted first
#
say $files[0];

exit;

Explanation

  • This is a fairly straightforward Perl script, run out of an
    executable file with a shebang line at the start indicating what is
    to be used to run it - perl.

  • The preamble defines the Perl version to use, and indicates that
    UTF-8 (character sets like Unicode) will be acceptable for reading
    and writing.

  • Two modules are required:

    • Cwd: provides functions for determining the pathname of the
      current working directory.

    • File::Find::Rule: provides tools for searching the file system
      (similar to the find command, but with more features).

  • Next the variable $PROG is set to the name under which the script
    has been invoked. This is useful when giving a brief summary of
    usage.

  • The first argument is then collected (with shift) and placed into
    the variable $regex.

  • The second argument is optional but, if omitted, is set to the
    current working directory. We see the use of shift again, but if
    this returns nothing (is undefined), the '//' operator invokes the
    getcwd() function to get the current working directory.

  • If the $regex variable is not defined, then die is called to
    terminate the script with an error message.

  • The search itself is invoked using File::Find::Rule and the results
    are added to the array @files. The multi-line call shows several
    methods being called in a "chain" to define the rules and invoke
    the search:

    • file(): sets up a file search.

    • name(qr/$regex/): a rule which applies a regular expression match
      to each file name, rejecting any that do not match.

    • maxdepth(1): a rule which prevents the search from descending
      below the top level into sub-directories.

    • in($dir): defines the directory to search (and also begins the
      search).

  • If the search returns no files (the array is empty), the script ends
    with an error message.

  • Otherwise the @files array is sorted. This is done by comparing the
    modification times of the files (the -M operator returns a file's
    age in days), with the array being reordered so that the "youngest"
    (newest) file sorts first. The <=> operator compares its two numeric
    operands and returns -1, 0 or 1 depending on whether the left
    operand is less than, equal to or greater than the right one; it is
    most useful in the Perl sort function (see the short example after
    this list).

  • Finally, the newest file is reported.
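
Here is a minimal illustration of <=> on its own (this is not part of the
script, just a demonstration):

#!/usr/bin/env perl
use v5.40;

# The numeric comparison operator returns -1, 0 or 1:
say 2 <=> 5;    # -1 (left operand is smaller)
say 5 <=> 5;    #  0 (operands are equal)
say 7 <=> 5;    #  1 (left operand is larger)

# So sorting with { -M($a) <=> -M($b) } orders the files by ascending
# age in days, which puts the newest file first.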

Usage

This script can be used in almost the same way as the Bash variant. The
difference is that the pattern used to match files is a Perl regular
expression. I keep this script in my ~/bin directory, so it can be
invoked just by typing its name. I also maintain a symlink called nmf to
save typing!

The above example, using the Perl version, would be:

alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)"

In regular expressions '.*' means "any character, zero or more times".
The '.' in '.png' is escaped because we need an actual dot character.
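
Side by side, the two invocations from the examples above are:

newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/    # glob pattern
nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/                   # regular expression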

Conclusion

The approach in both cases is fairly simple. Files matching a pattern are
accumulated, in the Bash case along with their modification times. The
files are sorted by modification time and the newest one is the answer.
The Bash version has to remove the modification time before printing.

This algorithm could be written in many ways. I will probably try
rewriting it in other languages in the future, to see which one I think
is best.

References

  • Glob expansion:

    • Wikipedia article on glob patterns

    • HPR shows covering glob expansion:

      • Finishing off the subject of expansion in Bash (part 1)

      • Finishing off the subject of expansion in Bash (part 2)

  • GitLab repository holding these files:

    • hprmisc - Miscellaneous scripts, notes, etc. pertaining to HPR
      episodes which I have contributed
