The PDF Minute

A podcast about PDFs? In this economy?



How do you think about PDF documents? Many people talk about PDF as "old" or
"boring". These sentiments aren't necessarily wrong, so why would anyone be
interested in a blog or podcast about PDFs? Why talk about PDFs at all: their inner
workings, their history, and how they affect modern society in so many ways? And why
now, when so many other technologies seem more exciting, relevant, or important?

Before we start exploring the dark forest that is PDF, this first post
is meant to unpack some answers to those whys.

Why talk about PDF?

Architecture diagram not to scale. Image credit: xkcd #2347

The easy answer to this question is that PDFs are ✨everywhere✨. Every family,
every nation, every culture, every business and organization depends upon PDFs
in some way. The scale of PDF usage is so vast that it's hard to overstate.
If you are curious about the world around you, anything as widely used as PDFs
is worth understanding to some degree.

Some basic facts will help more than hyperbole though, so:

  • PDFs are the most-used document format in the world, by a wide margin.
  • Adobe's 2020 10-K filing with the U.S. Securities and Exchange Commission
    estimated that trillions of PDF documents are generated every year (yes,
    "trillions" with a 't').
  • PDF is among the most mainstream, front-of-mind technologies. The word "PDF"
    is one of the most-searched-for terms on the web, even surpassing some of the
    most popular consumer technology brands, like "iphone" and "android". This is
    not what a niche or dying technology looks like:

    Worldwide relative (Google) search interest in "PDF", "iPhone", and
    "Android", January 2005 - January 2025. Retrieved from Google Trends,
    January 2025.

So, PDFs are widely used (perhaps one of the most widely used consumer-facing
technologies ever, for that matter), and both that usage and PDF's mindshare
among the public seem as high as they could possibly be.

PDF is semi-permanent infrastructure

While it's useful to understand the scale of PDF's mindshare and usage, as we
would for any other popular technology, there are ways in which PDF is more like
permanent infrastructure, with a lifespan measurable in human generations.

At least three factors make this a reasonable characterization of PDF:

No PDF alternatives exist

The last serious effort at an alternative to PDF was Microsoft's XPS in the
mid-aughts (now called OpenXPS, after its "standardization", or in effect its
retirement, to the ECMA standards body, to be published as a standard just once,
in 2009), and it never gained any real traction.

Beyond XPS, PDF alternatives have only ever been successfully fielded to serve
relatively niche use cases, such as ebook formats like ePub and .mobi.

PDF technologies are stable

Compared to the churn we've all become accustomed to in software libraries,
frameworks, programming languages, consumer computing hardware, and even
instruction sets and operating systems, PDF as a living platform for its
constituent technologies has been remarkably stable: a PDF generated in 1993,
when Adobe first released the format, is still a completely functional document
today, even using the most modern tools. Few things in computing can make
similar claims.

PDF is a living standard in industry and government

While originally created as a published but proprietary Adobe technology, PDF
has been an ISO standard since 2008, revised in 2020, and has additionally
seen domain-specific subsets of the PDF standard (PDF/A, PDF/E, PDF/UA, PDF/VT,
and PDF/X, all of which we'll get acquainted with in later posts) published
more than a dozen times.

Meanwhile, PDF has been enshrined via statute and regulation by many governments
around the world as the preferred (and sometimes only) document format for
official publications, legal agreements, archives, and so on.

Given all this, I fully expect not just my children to use and rely on PDFs
throughout their lives, but (if they come to be) my grandchildren as well.
Making a prediction like this about almost any other technology in computing
would be foolish, but this one feels quite safe.

Why talk about PDF now?

People often talk about PDF as being "mature", and not in a good way. Many
would (and do) call it "boring". I have occasionally been guilty of the same;
even though most of my professional life has been built around PDFs, that
familiarity (and some occasional shiny-object FOMO) has sometimes bred contempt.
Certainly compared to faster-moving lanes of the tech world, PDF might seem like
a backwater.

However, like any long-lived infrastructure that's mostly taken for granted,
once you lift the cover on PDF, there's a wealth of fascinating history and
technologies inside that are worth exploring and learning about:

  • As a publishing medium, parts of PDF have their roots in typography and
    principles of printing that go back to the invention of the printing press.
  • As a programmable document format, PDF incorporates lessons learned in
    programming language theory and information architecture that are not only
    still relevant, but oftentimes sorely lacking in modern software practices.
  • As a container format, PDF incorporates dozens of other technologies to
    produce complete documents: image formats and rasters, color spaces, font
    rendering, and encryption and digital signature standards and techniques all
    play their part.
  • As a global standard, PDF has grown up alongside and incorporated the best
    practices of internationalization, including support for
    every kind of human script, and right-to-left and bidirectional text.
  • And, as common infrastructure that touches every single industry
    in the world, often in very sensitive places, PDF has a significant impact on
    how businesses and organizations operate, how they're structured, and how they
    interact with each other.

There is a beautiful depth to what PDF is, how it works, and the ways it
affects us all. It deserves to be more widely understood. And yet, outside of
a handful of insider-y industry blogs, there is little to no media dedicated to
exploring it. I hope to change that, here…and of course, now is always the
best time to start anything.

Why me?

Chas Emerick

For almost 25 years, I've been building tools to work with PDF documents:
creating, "editing", converting, searching, obfuscating, extracting data from,
and redacting them. Most of that work has been at
Snowtide, where I've been the lead engineer for a
loooong time on PDFxStream, a PDF data extraction
library for Java and .NET. In the process, I've had the opportunity to work with
a wide variety of customers and use cases, seeing PDF used and abused in ways
I'm sure the original designers never intended.

As a side hobby, I've also spent a considerable amount of time learning about
the history of PDF: the context in which it was initially created,
the functional predecessors that informed its design, and so on.

All that experience means that I've seen the best and worst of what PDFs have to
offer, in dozens of industries and roles. I know how the PDF format itself is
designed, and how to engineer tools to work with it well.

I'm hoping to make the content here at The PDF Minute reflect that experience, so
that in time it becomes an accessible educational resource for anyone
who wants to better understand how PDF documents work, why they are the way
they are, and how they influence and enable large swaths of modern society.

I'll be publishing here weekly, each time exploring one aspect of PDF's history,
technologies, or impact on society. Every post will also be available in
podcast form, as this one is. I hope you decide to join me on this little side quest
and subscribe, either to the blog via its RSS feed
or mailing list, or to the podcast via your favorite podcast app. Follow
The PDF Minute on Twitter, Mastodon, and Bluesky to know when new content drops, and
to ask your own questions about PDF and related topics, maybe to be discussed in
future posts.


The PDF Minute, by Chas Emerick