When PDF was introduced in 1993, one of the most persistent problems in
mainstream computing was that reliably publishing documents (either literally
via printing, or simply electronically distributing them for others to view) was
surprisingly difficult. There were a lot of hurdles:
Simply moving a document (whether an office document, a Postscript file, or
something else) from one computer to another could result in an unreadable or
unpleasant display.
Printers (from consumer models up to high-end typesetters) each had their own proprietary formats and requirements.
Many document formats were tied to a single vendor, or a single operating system.
One of PDF's initial design criteria and fundamental promises was to address
this family of problems, so that one could distribute and use documents with any
display, any operating system, and any print device, with confidence that the
result would remain faithful to the author's intent. This was such a pressing,
unmet concern that it gave the file type its name: the Portable Document
Format. Let's talk about how that portability is accomplished.
Documents are heterogeneous…
Most document formats focus on text: oftentimes its logical structure, sometimes
some aspects of its appearance, and occasionally some metadata. However, for
a document to be faithfully rendered away from its author's computer, a host of
other data is needed: fonts, images (if any), vector graphics, essential
auxiliary data, and so on. Documents are definitionally heterogeneous, and
missing any part of a document's data or dependencies can render it useless.
The way that web content handles this is by referring to these external resources,
with the expectation that browsers will fetch and integrate them appropriately.
This is how most non-PDF document formats are also structured: for example,
Postscript, PDF's predecessor, refers to fonts and images in much the same way
as HTML does (though using names and sometimes hard-coded relative file paths instead
of URLs), and those resources have to be carried around alongside the
document(s) that refer to them. But if a Postscript or HTML file refers to some
resources that aren't available or have moved unexpectedly, the document's
rendering will be fundamentally broken.
…so every PDF carries what it needs
PDF's solution to this problem is to avoid referring to external resources
entirely [1]. Instead, PDF documents are self-contained: all of the data
needed to render the document is included, from fonts to images to metadata to
interactive elements and auxiliary data. Satisfying this most basic premise —
knowing that a document's resources would always travel with it — clears the
lowest bar of portability.
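To make that concrete, here's a minimal sketch of what self-containment looks like in practice, using the pypdf library and a placeholder filename (neither is from the original post or the PDF specification): every font and image a page uses is named in the page's /Resources dictionary and, in a typical document, stored as an embedded stream inside the same file. The sketch assumes the page carries its own /Resources entry (it can also be inherited from the page tree).

```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")             # placeholder filename
page = reader.pages[0]
resources = page["/Resources"].get_object()   # the page's resource dictionary

# Fonts: each entry typically embeds its own font program rather than
# pointing at a font installed on some particular computer.
fonts = resources.get("/Font")
if fonts is not None:
    for name, font in fonts.get_object().items():
        font = font.get_object()              # resolve indirect references
        print("font", name, "->", font.get("/BaseFont"))

# Images: embedded as self-contained XObject streams carrying their own data.
xobjects = resources.get("/XObject")
if xobjects is not None:
    for name, xobj in xobjects.get_object().items():
        xobj = xobj.get_object()
        if xobj.get("/Subtype") == "/Image":
            print("image", name, "->", xobj.get("/Width"), "x", xobj.get("/Height"))
```

Everything the loop prints is found inside the one file; nothing depends on a font directory, an image folder, or a network connection at the destination.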
Next time, I'll talk about the (very cool) fundamental structures within every
PDF document, and how they are designed to support including all of these
disparate data types and resources in a single container file.
Rendering documents to different devices is hard…
At the time of PDF's introduction, document rendering was done in a bespoke way
by each individual application, and often was tied to the particular operating
system and output device being targeted. That is, a word processing program
would need a completely different approach when rendering to a display on
Windows vs. a display on a Mac vs. sending a document to a printer.
…so PDF uses an abstract rendering model for all of them
Adobe changed that by introducing [2] (as part of Postscript) what would later come
to be known as the Adobe Imaging Model, a high-level procedural rendering
approach that provided abstractions over the details of the operating system and
output device. The model includes command primitives for drawing text, lines,
shapes, and images, and for setting fonts, colors, clipping paths, and so on. PDF adopted
most of the Postscript graphics model's semantics, and then extended it over the
decades to support new features, media types, and usage patterns.
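For a taste of what that procedural model feels like to program against, here's a small sketch using the reportlab library's canvas — a third-party Python API, not Adobe's own, but one that exposes the same imaging-model style and emits PDF drawing operators under the hood. The output filename and coordinates are placeholders. The program only says what to paint: set a font, show a string, set a color, stroke a path. It never touches pixels, print heads, or OS-specific drawing calls.

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("imaging-model-demo.pdf", pagesize=letter)  # placeholder name

# Text: select a font, then place a string at page coordinates (in points).
c.setFont("Helvetica", 14)
c.drawString(72, 720, "Hello from the imaging model")

# Vector graphics: set graphics state (color, line width), then stroke paths.
c.setStrokeColorRGB(0.2, 0.4, 0.8)
c.setLineWidth(2)
c.line(72, 710, 300, 710)
c.rect(72, 600, 150, 80, stroke=1, fill=0)

c.showPage()  # finish the page
c.save()      # write the file; a viewer or printer turns the commands into output
```

Whether those commands end up as pixels on a screen or toner on paper is someone else's problem — which is exactly the separation of concerns described next.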
It was a good abstraction, in large part because it neatly separated concerns
between groups with different incentives and requirements: applications could
target a relatively high-level rendering model, a far simpler task than needing
to know the details of each class of display or printer they might render to;
and groups responsible for implementing displays (usually operating system
vendors) and manufacturing printers could focus on distilling those high-level
graphics commands into concrete actions to color pixels, move print heads, and so on.
This imaging model was such a successful abstraction that it effectively
redefined how 2D graphics are programmed and rendered. If you've done any
graphics programming in the last 30 years, you've benefitted from the results of
that progress, as you've surely used a library or API that provides a similar
abstraction; the Adobe Imaging Model was the direct precursor to the most
widely-used modern 2D graphics APIs like Java's Graphics2D, .NET's
System.Drawing, Skia's Canvas, and the web-standard canvas API. We'll talk
a lot about this graphics model in future posts.
Proprietary document formats actively prevent portability…
Before PDF, most document formats were proprietary, and vendors regularly used
their document formats as competitive leverage, usually to the detriment of
users' interests.
Microsoft Word was a particularly notorious offender, as
there was not a single "Word document format", but rather a matrix of format
variations depending on the version of Word and the operating system being used,
each with its own quirks and limitations when it came to importing other variants.
While this was a great benefit to Microsoft's Word and Windows businesses, it was
a nightmare for users who needed to share documents with others using different
programs or operating systems.
When Adobe first introduced PDF in 1993, it could have kept the format strictly
proprietary, so that only Adobe and its designated partners could implement PDF
generators, viewers, and so on. After all, other peer companies and file formats
(e.g. Microsoft with Word, Apple with QuickTime) had taken that approach, to
great commercial success.
…so PDF was "open" from the start
Instead of introducing yet another proprietary file format, Adobe did two things
with PDF that were quite unusual:
They published a detailed specification of the format in 1993, including the algorithms and data structures used to encode and decode PDF documents.
Further, they explicitly encouraged software vendors, printer manufacturers,
and others to adopt and implement PDF. This was a big deal: it meant that
anyone could write software to read or write PDF documents, without needing
to reverse engineer the format. This made it possible for a wide variety of
software to support PDF, from word processors and web browsers to
printers and image editors.
Later, in 2008, Adobe submitted the PDF specification to the International Organization for Standardization (ISO), where it was accepted as an open standard, and
it has since been further refined and expanded in concert among a diversity of
interested vendors. As part of this, Adobe also issued a public patent
license [3], in which they explicitly swore off any claim to enforce
patents that covered technologies within the PDF standard [4].
If Adobe had treated PDF as a strictly proprietary format, existing only to
enrich themselves and provide them with a unique competitive advantage, I don't
think PDF would be as widely-used as it is today. More importantly, though,
without a coordinated expectation of "openness" (however vaguely defined or
informal in the early days), and then the tangible commitment to remove all
remaining proprietary interests from the PDF landscape [5], it's
likely that other vendors and groups would have attempted to create
mutually-incompatible PDF variants over time.
Such fragmentation would have significantly degraded the real-world portability
of PDF documents: just imagine if Microsoft or Apple or Google had successfully
pushed their own incompatible PDF variants (or some other wholly-different
document format [6]), to the extent that "real" PDF documents were no longer
guaranteed to render correctly on Windows, or Mac, or iPhone or Android devices.
The promise of PDF's portability would have been broken.
PDF effectively solved the problem of document portability by addressing
these three fundamental issues: structurally guaranteeing that document
resources would always move with the document; disentangling document rendering
from any particular display, device, or operating system via an abstract
rendering model; and being first an open and then a standardized
specification that anyone could implement. This accomplishment did not come
without its own set of tradeoffs, which we'll come back to in later posts.
Footnotes
[1] PDF documents are allowed to refer to certain types of resources
using file paths, but this rare practice is a concession to certain specialized
workflows where it would be extremely costly to repeatedly embed
frequently-updated resources on every edit. ↩
[2] The actual graphics model was first introduced in a 1982 paper,
published before Adobe was even founded.
'A device independent graphics imaging model for use with raster
devices' is a short paper,
easy to read, and well worth taking in to better understand the design
decisions that underpin the graphics model, and thus, PDF itself. ↩
[3] https://www.adobe.com/pdf/pdfs/ISO32000-1PublicPatentLicense.pdf ↩
[4] Prior to this, Adobe had made informal assurances that it had no
interest in enforcing PDF-related patents against third-party vendors and
open source projects that implemented PDF support. Those assurances were not
legally binding, so the formal patent grant took the legal risk associated
with implementing PDF software off the table for good. ↩
[5] This is not to say that Adobe has not benefitted from making PDF an
open standard. They have, and continue to do so, in many ways. However, the
point is that the benefits of making PDF an open standard have been widely
distributed, and have accrued to many parties, not just Adobe. ↩
[6] Microsoft did try to push their own document format, XPS, as a
competitor to PDF. It never gained significant traction, and Microsoft has since de-emphasized it. ↩