How do you think about PDF documents? Many people talk about PDF as "old" or
"boring". These sentiments aren't necessarily wrong, so why would anyone be
interested in a blog or podcast about PDFs? Why talk about PDFs at all, their inner
workings, history, and how PDFs affect modern society in so many ways? And why
now, when so many other technologies seem more exciting, relevant, or important?
Before we get started in exploring the dark forest that is PDF, this first post
is meant to unpack some answers to those
whys.
Why talk about PDF?
Architecture diagram not to scale. Image credit: xkcd #2347
The easy answer to this question is that PDFs are ✨everywhere✨. Every family,
every nation, every culture, every business and organization depends upon PDFs
in some way. The scale of PDF usage is so vast that it's hard to overstate.
If you are curious about the world around you, anything as widely used as PDFs
is worth understanding to some degree.
Some basic facts will help more than hyperbole though, so:
PDFs are the most-used document format in the world, by a wide margin. Adobe's
2020 10-K filing with the U.S. Securities and Exchange Commission estimated
that trillions of PDF documents are generated every year (yes, "trillions"
with a 't').
PDF is among the most mainstream, front-of-mind technologies. The word "PDF"
is one of the most-searched-for terms on the web, even surpassing some of the
most popular consumer technology brands, like "iphone" and "android". This is
not what a niche or dying technology looks like:
Worldwide relative (Google) search interest in "PDF", "iPhone", and
"Android", January 2005 - January 2025. Retrieved from Google
Trends
So, PDFs are widely used (perhaps one of the most widely used consumer-facing
technologies ever, for that matter), and both that usage and PDF's mindshare
among the public seem as high as they could possibly be.
PDF is semi-permanent infrastructure
While it's useful to understand the scale of PDF's mindshare and usage like any
other popular technology, there are ways in which PDF is more like permanent
infrastructure, with a lifespan measurable in human generations.
At least three factors make this a reasonable characterization of PDF:
No PDF alternatives exist
The last serious effort at an alternative to PDF was Microsoft's XPS in the
mid-aughts. It is now called OpenXPS, after its standardization (in effect, its
retirement) at the ECMA standards body, which published it as a standard just
once, in 2009. It never gained any real traction.
Beyond XPS, PDF alternatives have only ever been successfully fielded to serve
relatively niche use cases, such as ebook formats like ePub and .mobi.
PDF technologies are stable
Compared to the churn we've all become accustomed to in software libraries,
frameworks, programming languages, consumer computing hardware, and even
instruction sets and operating systems, PDF as a living platform for its
constituent technologies has been remarkably stable: a PDF generated in 1993
when Adobe first introduced the format is still a completely functional document
today, even using the most modern tools. Few things in computing can make
that claim.
PDF is a living standard in industry and government
While originally created as a published but proprietary Adobe technology, PDF
has been an ISO standard since 2008,
revised in 2020, and has additionally
seen domain-specific subsets of the PDF standard
(PDF/A,
PDF/E,
PDF/UA,
PDF/VT,
and PDF/X, all of which we'll get
acquainted with in later posts) published more than a dozen times.
Meanwhile, PDF has been enshrined via statute and regulation by many governments
around the world as the preferred (and sometimes only) document format for
official publications, legal agreements, archives, and so on.
Given all this, I fully expect not just my children to use and rely on PDFs
throughout their lives, but (if they come to be) my grandchildren as well.
Making a prediction like this about almost any other technology in computing
would be foolish, but this one feels quite safe.
Why talk about PDF now?
People often talk about PDF as being "mature", and not in a good way. Many
would (and do) call it "boring". I have occasionally been guilty of the same;
even though most of my professional life has been built around PDFs, that
familiarity (and some occasional shiny-object FOMO) has sometimes bred contempt.
Certainly compared to faster-moving lanes of the tech world, PDF might seem like
However, like any long-lived infrastructure that's mostly taken for granted,
once you lift the cover on PDF, there's a wealth of fascinating history and
technologies inside that are worth exploring and learning about:
As a publishing medium, parts of PDF have their roots in typography and
principles of printing that go back to the invention of the printing press.
As a programmable document format, PDF incorporates lessons learned in
programming language theory and information architecture that are not only
still relevant, but oftentimes sorely lacking in modern software practices.
As a container format, PDF incorporates dozens of other technologies to
produce complete documents: image formats and rasters, color spaces, font
rendering, and encryption and digital signature standards and techniques all
play their part.
As a global standard, PDF has grown up alongside and incorporated the best
practices of internationalization, including support for
every kind of human script, and right-to-left and bidirectional text.
And, as common infrastructure that touches every single industry
in the world, often in very sensitive places, PDF has a significant impact on
how businesses and organizations operate, how they're structured, and how they
interact with each other.
There is a beautiful depth to what PDF is, how it works, and the ways it
affects us all. It deserves to be more widely understood. And yet, outside of
a handful of insider-y industry blogs, there is little to no media dedicated to
exploring it. I hope to change that, here…and of course, now is always the
best time to start anything.
Why me?
Chas Emerick
For almost 25 years, I've been building tools to work with PDF documents —
creating, "editing", converting, searching, obfuscating, extracting data from,
and redacting them. Most of that work has been at
Snowtide, where I've been the lead engineer for a
loooong time on PDFxStream, a PDF data extraction
library for Java and .NET. In the process, I've had the opportunity to work with
a wide variety of customers and use cases, seeing PDF used and abused in ways
I'm sure the original designers never intended.
As a side hobby, I've also spent a considerable amount of time learning about
the history of PDF: the context in which it was initially created,
the functional predecessors that informed its design, and so on.
All that experience means that I've seen the best and worst of what PDFs have to
offer, in dozens of industries and roles. I know how the PDF format itself is
designed, and how to engineer tools to work with it well.
I hope the content here at The PDF Minute will reflect that experience and, in
time, become an accessible educational resource for anyone who wants to better
understand how PDF documents work, why they are the way they are, and how they
influence and enable large swaths of modern society.
I'll be publishing here weekly, each time exploring one aspect of PDF's history,
technologies, or impact on society. Every post will also be available in
podcast form, as this one is. I hope you decide to join me on this little side quest,
and subscribe — either to the blog via its RSS feed
or mailing list, or to the podcast via your favorite podcast app — and follow
The PDF Minute on Twitter, Mastodon, and Bluesky to know when new content drops, and
to ask your own questions about PDF and related topics, maybe to be discussed in