PDF vs. TIFF: Which is the right format for your document scanning project?
One of the first decisions to make when you scan documents into a document management or search and retrieval system is which file format to use. A simple decision, right? Not so fast. There are many considerations when choosing a file type for scanned documents, and the decision should be well thought out, because the wrong choice can have expensive consequences. Converting to a different file type later can be costly in time and effort, and data can be lost in certain conversions.
Scanning Considerations
First, let’s look at the questions that need to be answered before selecting your scanning file type:
- What are the documents to be scanned?
Text-based (office documents, magazines, books, etc.)
Graphic-based (drawings, maps, charts, etc.)
- What are their characteristics?
Black and white, bitonal, grayscale, color
Stained, torn, aged
Handwritten notes, mixed components
- How will I use the scanned files?
Normal office use
Web search, retrieval and viewing
Archival search and retrieval or preservation
- How will my users search for documents?
Designated fields such as Invoice No., Customer Name, Date, Patient ID…
Free-form searching on all text
- Other considerations?
Legal: Admissibility and retention requirements?
Retention: How long must files be kept for users and for legal purposes?
Security: Do documents need passwords, restricted usage, or change tracking?
Retrieval Limitations: Can my users wait milliseconds, seconds, or minutes?
Storage Limitations: How many documents do I have? Is my storage budget limited?
Conversion: Will I need to convert or present the files in another format, or in multiple formats, later?
Now let’s take a look at PDF and TIFF, the dominant scanning file formats in use today, to understand their capabilities.
What is TIFF?
A quick look at TIFF or Tagged Image File Format:
- Created by Aldus and Microsoft in the 1980s and now owned by Adobe
- Developed as a format for scanned images
- Most recent version, 6.0, published in 1992
- Universal: Broadly adopted, widely supported by many applications and free viewers, platform independent
- Many subtypes representing different compression and color representation schemes.
The most notable variants for scanning documents are TIFF-UNC for uncompressed files, TIFF-G4, which uses Group 4 (G4) compression, and TIFF-LZW, which uses the LZW compression algorithm. All of these are “lossless,” meaning the original data can be perfectly reconstructed from the compressed data, as opposed to “lossy” formats, which discard data to minimize file size. TIFF-G4 is widely deployed in digital libraries and businesses as a master format for bitonal images, which contain only two colors, as in a black and white text document. TIFF-LZW is often used for bitonal or color images; it is most effective for solid colors (graphics) and less effective for 24-bit photos.
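To make the lossless idea concrete, here is a minimal Python sketch. It uses zlib (DEFLATE) from the standard library as a stand-in, not G4 or LZW themselves, and the two “pages” are synthetic byte strings invented for illustration. It checks that decompression returns the original bytes exactly and compares how well solid-color data and photo-like noise compress.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after lossless DEFLATE compression."""
    return len(zlib.compress(data, level=9))


# Synthetic 8-bit "scans" of 100 x 100 pixels (hypothetical data, not real pages).
flat_page = bytes([255]) * 10_000              # one solid color, like a blank area
rng = random.Random(0)                         # fixed seed so runs are repeatable
noisy_page = bytes(rng.randrange(256) for _ in range(10_000))  # photo-like noise

# Lossless means decompression returns exactly the original bytes.
round_trip = zlib.decompress(zlib.compress(flat_page))
is_lossless = round_trip == flat_page

flat_size = compressed_size(flat_page)
noisy_size = compressed_size(noisy_page)

print("lossless round trip:", is_lossless)
print("solid color:", len(flat_page), "->", flat_size, "bytes")
print("noisy data: ", len(noisy_page), "->", noisy_size, "bytes")
```

The solid-color data should shrink to a small fraction of its original size while the noisy data barely shrinks, which mirrors the difference between solid graphics and 24-bit photos described above.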
What is PDF?
A quick look at PDF, or Portable Document Format:
- Created by Adobe in 1993; the core specification is now an ISO standard (ISO 32000)
- Page-oriented and may contain text, images, graphics, and other multimedia content, such as video and audio
- Universal: Broadly adopted, widely supported by many applications and free viewers, platform independent
- Many subtypes representing different features
- Optionally: hyperlinks, searchable, assistive technology, security features, bookmarks
PDF files can be full-text searchable. Selecting “make searchable”, “apply OCR”, “text-under-image” or “searchable PDF” from your scanning device options runs optical character recognition (OCR) and creates a PDF file with two layers: an image layer and a text layer used for full-text searching.
Of note, PDF/A is the ISO standard for digital preservation or archiving of electronic documents. It differs from normal PDF by omitting features not suited to long-term archiving, such as font linking (fonts must be embedded in the file instead). Use of PDF/A is growing in international government and industry segments, including legal systems, libraries, newspapers, and regulated industries.
A Quick Side Note on JPEG
JPEG is widely used for photographs. It allows only a single page per file and uses a “lossy” compression scheme. It is not considered a “document” scanning format and should be used only for photos where “lossy” compression is acceptable, such as photos scanned for web viewing only.
Decision Points: Which Format Should I Use, TIFF or PDF?
Indexing/Searchability?
This is a key differentiator. TIFF was designed as a “wrapper” for images. It supports only simple tags. To be fully searchable, it needs an OCR process to create a separate text file that can then be searched and indexed. Some document indexing software packages include this as an option.
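As a rough illustration of what “simple tags” means, the Python sketch below builds a tiny, hypothetical TIFF header with three tags and reads them back. The tag numbers (256, 257, 259) and compression codes (1, 4, 5) are taken from the TIFF 6.0 specification; the sample values and helper names are assumptions made for this example, and the sketch handles only little-endian files with single-value SHORT or LONG tags.

```python
import struct

# Tag numbers and compression codes as defined in the TIFF 6.0 specification.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}
COMPRESSION_NAMES = {1: "uncompressed", 4: "CCITT Group 4", 5: "LZW"}


def build_sample_tiff_header() -> bytes:
    """Build a hypothetical little-endian TIFF header and tag directory (no pixels)."""
    header = struct.pack("<2sHI", b"II", 42, 8)   # byte order, magic 42, IFD offset
    entries = [
        struct.pack("<HHII", 256, 4, 1, 2550),    # ImageWidth, LONG
        struct.pack("<HHII", 257, 4, 1, 3300),    # ImageLength, LONG
        struct.pack("<HHIHH", 259, 3, 1, 4, 0),   # Compression, SHORT plus padding
    ]
    ifd = struct.pack("<H", len(entries)) + b"".join(entries) + struct.pack("<I", 0)
    return header + ifd


def read_tags(data: bytes) -> dict:
    """Read the first tag directory of a little-endian TIFF into {name: value}."""
    byte_order, magic, ifd_offset = struct.unpack_from("<2sHI", data, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("not a little-endian TIFF")
    (count,) = struct.unpack_from("<H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry_offset = ifd_offset + 2 + 12 * i    # each directory entry is 12 bytes
        tag, field_type, n_values = struct.unpack_from("<HHI", data, entry_offset)
        if n_values != 1 or field_type not in (3, 4):
            continue  # this sketch reads only single SHORT (3) or LONG (4) values
        fmt = "<H" if field_type == 3 else "<I"
        (value,) = struct.unpack_from(fmt, data, entry_offset + 8)
        tags[TAG_NAMES.get(tag, str(tag))] = value
    return tags


tags = read_tags(build_sample_tiff_header())
print(tags)
print("compression:", COMPRESSION_NAMES.get(tags["Compression"], "other"))
```

The tags describe the image (its dimensions and compression), not the words on the page, which is why a TIFF needs a separate OCR text file before it can be searched.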
The PDF format accommodates basic tags and can support more sophisticated XML-based metadata through Adobe's Extensible Metadata Platform (XMP), which lets you embed metadata about a file in the file itself. Full-text searching is native to the file format, so unless the file is saved as “image-only,” it is fully searchable.
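For a sense of what XMP metadata looks like, here is a small Python sketch that parses a hand-written XMP packet using only the standard library. The packet is a simplified, hypothetical example (the title and creator values are made up), and `read_xmp_fields` is an illustrative helper, not part of any PDF toolkit. Extracting the packet from a real PDF is a separate step that is not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP for RDF structure and Dublin Core fields.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical XMP packet; the field values are invented for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Invoice 1001</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Accounts Payable</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp_fields(xmp_xml: str) -> dict:
    """Collect the Dublin Core title and creator values from an XMP packet."""
    root = ET.fromstring(xmp_xml)
    fields = {}
    for name in ("title", "creator"):
        items = root.findall(f".//dc:{name}//rdf:li", NS)
        fields[name] = [item.text for item in items]
    return fields


fields = read_xmp_fields(SAMPLE_XMP)
print(fields)
```

Fields like these can feed the “designated fields” style of search (Invoice No., Customer Name, and so on), alongside full-text search over the OCR text layer.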
Adoption/Portability?
Both TIFF and PDF are universal in that they are common output formats of many applications. They also can be accessed and viewed using many different applications. TIFF files are easily integrated into other applications such as Word and PowerPoint as they are “image” based. Both formats are viewable across most if not all operating systems.
Longevity/Archiving?
Because of their widespread adoption and the plethora of viewers, both PDF and TIFF are expected to remain viable file formats for some time. Because the PDF/A format was designed for long-term use and has been adopted by many libraries and government groups, PDF/A is the clear winner for archiving situations.
Security?
There are no built-in security features in TIFF; users can only be allowed or disallowed access. PDF has sophisticated security options. These include password protection, permissions and restricted use (view, search, print, cut/copy/paste restrictions), watermarking, and encryption.
File Size/Upload and Download Speed?
Before we look at file size, which affects storage requirements and upload/download speeds, let’s examine the four things that affect file size.
- Scanning Resolution
A 300 dpi scan is much smaller than a 600 dpi scan.
- Color Space
Color and grayscale scans are much larger than black and white scans.
- Physical Dimensions
An 8 ½ by 11 page is much smaller than an 11 x 14, all other things being equal.
- Compression
Raw scans can be compressed to a much smaller size, and compression technologies compress different types of scanned documents differently.
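The first three factors can be turned into a back-of-the-envelope estimate of raw (uncompressed) size. The Python sketch below is only an approximation under stated assumptions: it ignores row padding and file overhead, treats black and white as 1 bit per pixel and color as 24 bits per pixel, and `raw_scan_bytes` is a helper name chosen for this example.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Same page, varying one factor at a time.
letter_bitonal_300 = raw_scan_bytes(8.5, 11, 300, 1)    # baseline
letter_bitonal_600 = raw_scan_bytes(8.5, 11, 600, 1)    # higher resolution
letter_color_300 = raw_scan_bytes(8.5, 11, 300, 24)     # color instead of bitonal
large_bitonal_300 = raw_scan_bytes(11, 14, 300, 1)      # larger page

for label, size in [
    ("8.5 x 11, 300 dpi, bitonal", letter_bitonal_300),
    ("8.5 x 11, 600 dpi, bitonal", letter_bitonal_600),
    ("8.5 x 11, 300 dpi, color", letter_color_300),
    ("11 x 14, 300 dpi, bitonal", large_bitonal_300),
]:
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Doubling the resolution quadruples the pixel count, and moving from 1-bit black and white to 24-bit color multiplies the raw size by 24, which is why the scanner settings matter before compression is even applied.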
Both TIFF and PDF offer compression technology. Scan your typical documents with a variety of file compression formats to determine the acceptable file size and upload/download speed for your environment.
Color Space: Color, Grayscale, or Black and White?
As mentioned previously, TIFF-G4 files are often used for black and white (bitonal) scans.
TIFF-LZW is often used for bitonal or color images and is most effective for solid color graphics and less effective for 24-bit photos. PDF files also offer different compression technologies which present options for color space.
Because both TIFF and PDF support color, grayscale, and black and white, here again, scan your typical documents with a variety of formats to determine the acceptable output. Caution: scanning a black and white text document with a color setting needlessly creates a large file.
Legal Admissibility?
This varies by country. Generally, both file types can be admissible as long as the appropriate processes are followed under the rules of evidence for the specific jurisdiction.
Conversion?
Both TIFF and PDF files can be converted with readily available tools. This may be important if your scanned files are to be used as “master files.” For example, you may need to scan for both archival and web viewing. Because of file size, you may need to copy and convert a large archival file for easy web viewing. Hence the “master file” may need to be converted to another file type later.
Should I Use PDF or TIFF for my Document Scanning?
After reviewing your requirements and gaining an understanding of the capabilities of each file type, the answer to which file type you should use is often both. You may decide to keep your “office” documents in PDF format and other documents, such as floor plans, installation manuals, and others with heavy graphics, in TIFF. It’s up to you!