The compression algorithm removes useless objects (duplicated or unused), strips off regenerable objects (ASCII filters, thumbnails), and replaces LZW compression with the superior Flate (even on inline images), among other optimizations. By default it retains all non-regenerable information. Further space savings are possible by setting command line options to remove various non-regenerable objects as well or to apply lossy image compression such as JPEG.
Unless the -compatible option is given, PDFs are updated to the current version of PDF, which at this time is PDF 1.5, corresponding to Acrobat 6. PDF 1.5 allows additional compression through cross-reference streams and object streams. With PDF 1.5, one can usually obtain about the same compression as gzip on the whole PDF file, with the advantage that the PDF is directly readable by Acrobat. Note that if the PDF is already PDF 1.5, the -compatible option does not convert it back to an earlier version of PDF.
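To check the gzip comparison on your own files, the following is a minimal sketch that measures how large a file would be after gzip compression, using only the standard `java.util.zip` classes. The class and method names (`GzipBaseline`, `gzipSize`) are made up for illustration and are not part of the Compress tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the Compress tool.
class GzipBaseline {
    /** Returns the number of bytes the input occupies after gzip compression. */
    static long gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream flushes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        // Report original and gzipped size for each file named on the command line.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + data.length + " bytes, gzip "
                    + gzipSize(data) + " bytes");
        }
    }
}
```

Comparing this figure with the size of the file written by Compress gives a rough idea of whether a given PDF follows the "about the same as gzip" rule of thumb.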
Compact PDF is a new format that can give an additional compression of 30 to 60% on many classes of PDF beyond what is possible in PDF 1.5. For instance, the PDF Reference 1.5 shrinks from 12.2MB as distributed by Adobe down to 4.4MB in Compact format. No information is lost in obtaining the additional compression, in contrast to some methods that throw away structural or other information or use lossy image compression. This format is not presently part of Adobe's PDF specification and cannot be directly read by Acrobat. However, it is fully supported by Multivalent: the PDF viewer in the browser and all tools including full-text search are "Compact-aware", meaning that they transparently view and manipulate Compact PDFs just as easily as standard ones. You can archive PDFs in this format and if you need to read them in non-Compact-aware PDF viewers, you can always convert them back to standard format by rerunning this tool omitting the -compact option. Technical details for developers can be found in the Compact PDF Specification.
|Category||Representative PDF||Original Size||gzip||-compat||PDF 1.5||Compact||-max|
|pure text||Aida.pdf||85,437||19,187||33,184||29,661||15,477 (81% savings)||15,461|
|Compact format compresses all the pages together, rather than as individual pages.|
|TeX document||pdftex-s.pdf||329,601||197,483||240,793||188,142||93,529 (71%)||93,530|
|TeX documents that use Computer Modern fonts typically embed them as encrypted Type 1 fonts. Compact format decrypts them to make them available to compression for the first time.|
|FrameMaker||Java Language Specification 2.0||4,419,906||1,622,296||3,938,673||1,534,857||829,671 (81%)||829,667|
|FrameMaker generates many named destinations (anchors), which compress well in PDF 1.5 and Compact. FrameMaker will also sometimes write out the page template in each page rather than sharing it, and this redundancy compresses out in Compact.|
|reference manual||PDF Reference 1.5 (draft)||12,765,416||7,399,695||10,973,652||7,247,136||4,577,057 (64%)||4,438,420|
|reference manual||PDF Reference 1.5 (final; in PDF 1.4)||14,171,448||8,386,007||13,184,358||8,231,991||5,356,277 (62%)||5,243,832|
|reference manual||PDF Reference 1.5 (final; in PDF 1.5)||9,190,216||8,377,179||8,205,628||8,205,628||5,350,532 (41%)||5,236,604|
|Reference manuals are typically dominated by text, which compresses better in Compact because it compresses all pages together.|
|converted from HTML||htmldoc||379,568||245,630||349,335||256,767||169,704 (55%)||169,708|
|converted from HTML||W3C HTML 4.0 specification||3,006,205||958,150||1,727,038||936,942||468,013 (84%)||467,999|
|Hyperlinks compress well with PDF 1.5 object streams and Compact.|
|book||Manning JDK 1.4||10,168,352||8,871,949||9,777,838||8,633,355||2,999,054 (70%)||2,857,728|
|book||UNIX Haters Handbook||3,639,172||2,803,546||3,125,244||2,516,284||2,068,612 (43%)||2,068,607|
|book||Real World Go Live||18,530,903||15,692,402||18,015,832||17,424,962||12,800,412 (30%)||12,800,405|
|Large books often compress well in percentage and absolute terms, saving much bandwidth for online distribution.|
|image dominated||p40-marshall.pdf||1,762,945||1,488,077||1,594,236||1,584,487||1,272,028 (27%)||1,272,028|
|By default images are not recompressed. With the -jpeg option, JPEG compression is applied to raw image samples, which explains the drastic compression for beos_osx in the -max column. If you have lots of scanned images, Adobe Acrobat 6 can compress them considerably with JBIG2 or JPEG2000; for instance, UnixTextProcessing compresses down to about 9MB with JBIG2.|
|high quality PDF generator||JDJ 1-01.pdf||3,236,507||3,072,982||3,213,335||3,155,262||2,924,454 (9%)||2,729,700|
|Sometimes only a little compression is available, but isaacs uses the latest libraries (Adobe InDesign 2.0.2, Adobe PDF Library 5.0) and still compresses well.|
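The savings percentages in the Compact column can be reproduced from the byte counts in the table. The sketch below assumes the percentage is the size reduction relative to the original, truncated to a whole number; the truncation is inferred from the figures rather than stated by the documentation, and the class and method names are illustrative only.

```java
// Hypothetical helper for checking the table's figures; not part of the Compress tool.
class CompactSavings {
    /** Percentage of the original size that was saved, truncated to a whole number. */
    static long savingsPercent(long originalBytes, long compressedBytes) {
        return (originalBytes - compressedBytes) * 100 / originalBytes;
    }

    public static void main(String[] args) {
        // Aida.pdf: 85,437 bytes originally, 15,477 bytes in Compact format.
        System.out.println("Aida.pdf: " + savingsPercent(85_437, 15_477) + "% savings");
        // JDJ 1-01.pdf: 3,236,507 bytes originally, 2,924,454 bytes in Compact format.
        System.out.println("JDJ 1-01.pdf: " + savingsPercent(3_236_507, 2_924_454) + "% savings");
    }
}
```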
java tool.pdf.Compress [options] PDF-file(s)
The new PDF file has the same name as the original with the addition of -o before the .pdf suffix.
Some large PDFs need more than the 64MB that Java limits itself to by default. Additional memory can also sometimes speed up compression. Most PDFs compress in a few seconds, but a small percentage need up to a few minutes for the most advanced Compact mode. If Compress stops with an OutOfMemoryError or takes more than two minutes on a given PDF (most take less than 10 seconds), try boosting memory, as in:
java -Xmx128m tool.pdf.Compress ...
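To confirm how much heap a given Java installation actually allows, with or without -Xmx, a small standalone program can query the standard `Runtime.maxMemory()` call. This is only a diagnostic sketch; the class name `HeapLimit` is invented here and has nothing to do with the Compress tool itself.

```java
// Hypothetical diagnostic; run it with the same JVM options you pass to Compress.
class HeapLimit {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx128m HeapLimit` should report a value close to 128, although the exact figure can differ slightly between JVMs.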
Note: PDFs lose their "linearization" or "Fast Web View" organization. Use another tool to recompute it if desired.
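Scripts that post-process the compressed files need to predict the output name. The rule stated above, that -o is added before the .pdf suffix, can be sketched as follows. This is an illustration of the naming rule only, not the tool's own code; the class and method names are hypothetical, and the handling of names that do not end in .pdf is an assumption because the documentation does not cover that case.

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented naming rule.
class OutputName {
    /** Inserts "-o" before a trailing ".pdf" suffix, e.g. "Aida.pdf" becomes "Aida-o.pdf". */
    static String outputNameFor(String inputName) {
        String suffix = ".pdf";
        if (inputName.toLowerCase(Locale.ROOT).endsWith(suffix)) {
            int cut = inputName.length() - suffix.length();
            return inputName.substring(0, cut) + "-o" + inputName.substring(cut);
        }
        // Assumption: the documentation does not say what happens without a .pdf suffix.
        return inputName + "-o";
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + outputNameFor(name));
        }
    }
}
```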