Server clinic: PDF for the server
Automate generation of professional-quality output
Cameron Laird (claird@phaseit.net)

PDF is the recognized standard for several categories of top-quality displayable output. While most programmers regard it as a "desktop" technology, a format that a content specialist chooses through a SaveAs operation, you can make your document management processes more powerful through server-side automation of PDF creation. This month, Cameron introduces the ReportLab library for PDF management and programming.

You know PDF. When someone in marketing wants a brochure that looks "just so," or legal needs a document that shouldn't be changed, they publish it as Portable Document Format (PDF). PDF is a standard defined by Adobe Systems for platform-independent, device-independent rendering and display of documents. PDF builds on the fantastic success of Adobe's PostScript (PS), first released in 1984 to improve the printing sophistication possible with common hardware.

In principle, PDF has a fixed appearance, invariant across different Web browsers and different devices including printers; the content of PDF documents is "locked down." While neither of these propositions is strictly true, they're close enough for most purposes. Moreover, PDF generally prints well; only a plain text document is more likely to be compatible with any particular printer.

What does that have to do with you? As a systems or server-side programmer, perhaps you think of PDF as just another opaque content type. Your desktop users or document specialists occasionally update instances on your servers, and you serve up the files just as you would any others. That, you say, should be the limit of your involvement.
Programmatic PDF generation

Desktop software vendors have a partial appreciation of this. Several word-processing or desktop-publishing packages have scripting capabilities that reach at least part of the way to PDF. Some shops create PostScript images and transform them into PDF with Ghostscript or similar packages.

My favorite way to automate PDF generation, though, is with one of three actively maintained open source libraries: ReportLab, PJ, and PDFlib. They're all roughly comparable, and I've had medium to good success on projects that relied on each. Pointers to all three, along with several other tools, appear in Resources, below.

Among these, ReportLab is the one I currently use most: it handles the multi-megabyte PDFs with which I work, its exposure of Python as a scripting language suits me, its library includes all the functionality I need for daily work, and the ReportLab company behind the library appears to enjoy sustainable business. Moreover, its convenient integration into the Python interactive shell makes for a delightfully productive development environment. The rest of this month's "Server clinic" illustrates how you can start to program PDF.
PDF's "Hello, world"

With Python installed, you need to visit the ReportLab Download page before you begin your PDF programming career. Even over slow connections, downloads and installations of both Python and the ReportLab Toolkit take well under an hour (see Resources for links to both downloads). The source code for your first application can be as simple as this:

Source code for a "Hello, world" page
This code simply puts a headline on an otherwise blank piece of paper. While mundane, it hints at new horizons: font style and size, content, and formatting are all programmable. When your organization decides to publish in Times New Roman rather than Helvetica, you can, in principle, change one configuration assignment and regenerate everything, rather than having to open each of thousands of documents, alter them, and write them back out. The same is true for other effects: if you want to expand the typeface on information targeted to older readers, for instance, your application can automate that.

Don't think you have to develop your own word processor to accomplish anything meaningful, though. While the ReportLab library is broad and deep enough to allow that, it also supports a couple of specific shortcuts that enormously simplify my PDF programming. First is the
This gives me a very fast, easily maintained, fully programmatic way to pour content into PDF. ReportLab's processing efficiency is so good that I can comfortably generate all kinds of PDF documents for Web display on the fly. This gives me the opportunity to keep critical financial or engineering reports fully current with the latest data while preserving an appropriate visual appearance. Print documents enjoy the same choices for customization, of course.
Putting together PDF pieces

For more sophisticated effects, ReportLab, like other PDF tool vendors, licenses a for-fee product. In ReportLab's case, its PageCatcher product annotates existing PDF documents, reorders their pages, reformats them for different printing methods, adds backgrounds (including watermarks), and fills in PDF forms. ReportLab documents several interesting uses for PageCatcher. One example is programmatic preparation of completed Internal Revenue Service (IRS) forms.

A final ReportLab capability I've found important is its management of Tables of Contents. Online document readers appreciate these navigational aids, which Adobe calls "bookmarks" or "outlines." Most PDF viewers show these as menus in a left-hand window. The ReportLab Reference itself constitutes a nice example of a bookmarked document. Such ReportLab functions as
Conclusion

Future installments of "Server clinic" are likely to touch on other underappreciated fields for server-side automation, including generation of Excel and Word documents.

Disclaimer: I'm on cordial personal terms with the employees of several companies that specialize in PDF-related products. However, I've never had a financial interest in any of the companies, nor any contractual relationship other than as an ordinary customer.

PDF is a biiiiig subject. You don't have to know all of it, though, to begin a successful automation project. The resources below are more than sufficient to get you started.