Hacker Public Radio

HPR4417: Newest matching file


Overview

Several years ago I wrote a Bash script for a task I need to perform
almost every day: finding the newest file in a series of files.

At the time I was running a camera on a Raspberry Pi which was attached
to a window and viewed my back garden. I was taking a picture every 15
minutes, giving each one a name containing the date and time, and storing
them in a directory. It was useful to be able to display the latest
picture.

Since then, I have found searching for the newest file useful in many
contexts:

  • Find the image generated by my random recipe chooser, put it in the
    clipboard and send it to the Telegram channel for my family.

  • Generate a weather report from wttr.in and send it to Matrix.

  • Find the screenshot I just made and put it in the clipboard.

Of course, I could just use the same name when writing these various
files, rather than accumulating several, but I often want to look back
through such collections. If I am concerned about such files accumulating
in an unwanted way, I write cron scripts which run every day and delete
the oldest ones.
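
As an illustration, such a cleanup job can be a single find command. This
is a sketch rather than one of my actual cron scripts, and the directory,
pattern and age threshold are only examples:

#!/usr/bin/env bash
# Example cleanup script for cron: delete webcam images older than 30 days.
# The directory, pattern and age here are illustrative values.
find "$HOME/Pictures/webcam" -maxdepth 1 -type f -name '*.jpg' -mtime +30 -delete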

Original script

The first iteration of the script was actually written as a Bash function
which was loaded at login time. The function is called
newest_matching_file and it takes two arguments:

  • A file glob expression to match the file I am looking for.

  • An optional directory in which to look for the file. If this is
    omitted, then the current directory will be used.

The first version of this function was a bit awkward, since it used a for
loop to scan the directory, using the glob pattern to find the files.
Because a failed Bash glob pattern search returns the search pattern
itself, it was necessary to use the nullglob option (see references) to
prevent this, turning it on before the search and off afterwards.
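
A sketch of that first approach (not the exact original code) looks
something like this:

# Sketch of the for-loop version (reconstructed, not the original function)
function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}
    local file newest_file

    shopt -s nullglob                     # a failed glob expands to nothing
    for file in "$dir"/$glob_pattern; do  # the pattern is deliberately unquoted
        [[ -f $file ]] || continue
        # Keep this file if it is newer than the current candidate
        if [[ -z $newest_file || $file -nt $newest_file ]]; then
            newest_file=$file
        fi
    done
    shopt -u nullglob                     # turn nullglob off again

    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"
}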

This technique was later replaced with a pipeline using the find command.

Improved Bash script

The version using find is what I will explain here.

function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}

    # Argument number check
    if [[ $# -eq 0 || $# -gt 2 ]]; then
        echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
        return 1
    fi

    # Check the target directory
    if [[ ! -d $dir ]]; then
        echo "Unable to find directory $dir" >&2
        return 1
    fi

    local newest_file

    # shellcheck disable=SC2016
    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}

The function is in the file newest_matching_file_1.sh, and it's loaded
("sourced", or declared) like this:

. newest_matching_file_1.sh

The '.' is a short-hand version of the command source.

I actually have two versions of this function; the second one uses a
regular expression, which the find command is also able to search with,
but I prefer this one.
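
For reference, the heart of the regular-expression version is a find call
along these lines. This is a sketch rather than the exact script, and the
regex_pattern variable name is just illustrative; note that find's -regex
option tests the whole path, hence the leading '.*/' in the pattern:

# Sketch of the regex-based search; the rest of the pipeline is unchanged
find "$dir" -maxdepth 1 -regextype posix-extended \
    -regex ".*/${regex_pattern}" -type f -printf "%T@ %p\n"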

Explanation

  • The first two lines beginning with local define variables local to
    the function, holding the arguments. The first, glob_pattern, is
    expected to contain something like screenshot_2025-04-*.png. The
    second will hold the directory to be scanned, or if omitted, will be
    set to the current directory.

  • Next, an if statement checks that there are the right number of
    arguments, aborting if not. Note that the echo command writes to
    STDERR (using '>&2'), the error channel.

  • Another if statement checks that the target directory actually
    exists, and aborts if not.

  • Another local variable, newest_file, is defined. It's good practice
    not to create global variables in functions, since they will "leak"
    into the calling environment.

  • The variable newest_file is set to the result of a command
    substitution containing a pipeline:

    • The find command searches the target directory:

      • Using -maxdepth 1 limits the search to the chosen directory and
        does not descend into sub-directories.

      • The search pattern is defined by -name "$glob_pattern".

      • Using -type f limits the search to files.

      • The -printf "%T@ %p\n" argument prints the file's last
        modification time as the number of seconds since the Unix epoch
        ('%T@'). This number is larger the newer the file is. It is
        followed, after a space, by the full path to the file ('%p'),
        and a newline.

    • The matching file names are sorted. Because each is preceded by a
      numeric time value, they are sorted in ascending order of
      modification time, with the newest file last.

    • Finally sed is used to return the last file in the sorted list
      with the program '${s/.\+ //;p}':

      • The -n option ensures that only lines which are explicitly
        printed will be shown.

      • The sed program looks for the last line (using '$'). When found,
        the leading numeric time is removed with 's/.\+ //' and the
        result is printed (with 'p').

    • The end result will either be the path to the newest file or
      nothing (because there was no match).

  • The expression '[[ -n $newest_file ]]' will be true if the
    $newest_file variable is not empty, and if that is the case, the
    contents of the variable will be printed on STDOUT; otherwise
    nothing will be printed.

  • Note that the function returns 1 (false) if there is a failure, and
    0 (true) if all is well. A null return is regarded as success.
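
Putting it all together, a typical call looks like this (the output file
name here is purely illustrative):

$ newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/
/home/user/Pictures/Screenshots/Screenshot_2025-04-27_10-15-02.png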

Script update

While editing the audio for this show I realised that there is a flaw in
the Bash function newest_matching_file. It is in the sed script used to
process the output from find.

The sed command used in the function deletes all characters up to a
space, assuming that the only space in the last line is the one following
the timestamp. However, if the file name itself contains spaces, this
will not work, because regular expressions in sed are greedy: what is
deleted in this case is everything up to and including the last space.

I created a directory called tests and added the following files:

'File 1 with spaces.txt'
'File 2 with spaces.txt'
'File 3 with spaces.txt'

I then ran the find command as follows:

$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
spaces.txt

I adjusted the sed call to sed -ne '${s/[^ ]\+ //;p}'. This uses the
regular expression:

s/[^ ]\+ //

This now specifies that what is to be removed is every non-space
character up to and including the first space. The result is:

$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
tests/File 3 with spaces.txt
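
The corresponding command substitution in the function then becomes (only
the sed expression changes):

newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
    -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}')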

This change has been propagated to the copy on GitLab.

Usage

This function is designed to be used in commands or other scripts.

For example, I have an alias defined as follows:

alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)"

This uses xclip to load the latest screenshot into the clipboard, so that
I can paste it into a social media client, for example.
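
Similarly, going back to the webcam pictures from the overview, the
newest image can be displayed directly. This is a hypothetical example:
feh is just one possible image viewer, and the pattern and directory are
illustrative.

feh "$(newest_matching_file 'garden_*.jpg' ~/Pictures/webcam)"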

Perl alternative

During the history of this family of scripts I wrote a Perl version. This
was originally because the Bash function gave problems when run under the
Bourne shell, and I was using pdmenu a lot, which internally runs scripts
under that shell.

#!/usr/bin/env perl

use v5.40;
use open ':std', ':encoding(UTF-8)'; # Make all IO UTF-8

use Cwd;
use File::Find::Rule;

#
# Script name
#
( my $PROG = $0 ) =~ s|.*/||mx;

#
# Use a regular expression rather than a glob pattern
#
my $regex = shift;

#
# Get the directory to search, defaulting to the current one
#
my $dir = shift // getcwd();

#
# Have to have the regular expression
#
die "Usage: $PROG regex [DIR]\n" unless $regex;

#
# Collect all the files in the target directory without recursing. Include the
# path and let the caller remove it if they want.
#
my @files = File::Find::Rule->file()
                            ->name(qr/$regex/)
                            ->maxdepth(1)
                            ->in($dir);
die "Unsuccessful search\n" unless @files;

#
# Sort the files by ascending modification time, youngest first
#
@files = sort { -M($a) <=> -M($b) } @files;

#
# Report the one which sorted first
#
say $files[0];

exit;

Explanation

  • This is a fairly straightforward Perl script, run out of an
    executable file with a shebang line at the start indicating what is
    to be used to run it - perl.

  • The preamble defines the Perl version to use, and indicates that
    UTF-8 (character sets like Unicode) will be acceptable for reading
    and writing.

  • Two modules are required:

    • Cwd: provides functions for determining the pathname of the
      current working directory.

    • File::Find::Rule: provides tools for searching the file system
      (similar to the find command, but with more features).

  • Next the variable $PROG is set to the name under which the script
    has been invoked. This is useful when giving a brief summary of
    usage.

  • The first argument is then collected (with shift) and placed into
    the variable $regex.

  • The second argument is optional but, if omitted, is set to the
    current working directory. We see the use of shift again, but if
    this returns nothing (is undefined), the '//' operator invokes the
    getcwd() function to get the current working directory.

  • If the $regex variable is not defined, then die is called to
    terminate the script with an error message.

  • The search itself is invoked using File::Find::Rule and the results
    are added to the array @files. The multi-line call shows several
    methods being called in a "chain" to define the rules and invoke
    the search:

    • file(): sets up a file search.

    • name(qr/$regex/): a rule which applies a regular expression match
      to each file name, rejecting any that do not match.

    • maxdepth(1): a rule which prevents the search from descending
      below the top level into sub-directories.

    • in($dir): defines the directory to search (and also begins the
      search).

  • If the search returns no files (the array is empty), the script ends
    with an error message.

  • Otherwise the @files array is sorted. This is done by comparing the
    modification times of the files (the -M operator returns a file's
    age in days), with the array being reordered so that the "youngest"
    (newest) file sorts first. The <=> operator compares its two numeric
    operands and returns -1, 0 or 1 depending on whether the left
    operand is less than, equal to or greater than the right one; it is
    most useful in the Perl sort function (see the short example after
    this list).

  • Finally, the newest file is reported.
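
Here is a minimal illustration of <=> on its own (this is not part of the
script, just a demonstration):

#!/usr/bin/env perl
use v5.40;

# The numeric comparison operator returns -1, 0 or 1:
say 2 <=> 5;    # -1 (left operand is smaller)
say 5 <=> 5;    #  0 (operands are equal)
say 7 <=> 5;    #  1 (left operand is larger)

# So sorting with { -M($a) <=> -M($b) } orders the files by ascending
# age in days, which puts the newest file first.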

Usage

This script can be used in almost the same way as the Bash variant. The
difference is that the pattern used to match files is a Perl regular
expression. I keep this script in my ~/bin directory, so it can be
invoked just by typing its name. I also maintain a symlink called nmf to
save typing!

The above example, using the Perl version, would be:

alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)"

In regular expressions '.*' means "any character, zero or more times".
The '.' in '.png' is escaped because we need an actual dot character.
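
Side by side, the two invocations from the examples above are:

newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/    # glob pattern
nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/                   # regular expression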

Conclusion

The approach in both cases is fairly simple. Files matching a pattern are
accumulated, in the Bash case along with their modification times. The
files are sorted by modification time and the newest one is the answer.
The Bash version has to remove the modification time before printing.

This algorithm could be written in many ways. I will probably try
rewriting it in other languages in the future, to see which one I think
is best.

References

  • Glob expansion:

    • Wikipedia article on glob patterns

    • HPR shows covering glob expansion:

      • Finishing off the subject of expansion in Bash (part 1)

      • Finishing off the subject of expansion in Bash (part 2)

  • GitLab repository holding these files:

    • hprmisc - Miscellaneous scripts, notes, etc. pertaining to HPR
      episodes which I have contributed
