PDF Clown 0.1.3 — Document Inspector

NOTE: As the ContentScanner class is being refactored, development of version 0.1.3 is temporarily frozen.

1. Document Inspector

Since its earliest versions, PDF Clown has shipped with a simple Swing-based proof of concept for viewing the structure of PDF files. That little fledgling is now growing into a comprehensive tool for visually editing the structure of PDF files: PDF Clown Document Inspector. It was originally planned as a dedicated project within the PDF Clown distribution for version 0.1.2, but it wasn't ready by the release deadline.

This tool conforms to the PDF model as defined by PDF Clown (see the diagram above), which adheres to the official PDF Reference 1.7/ISO 32000-1. This means that a PDF file is represented through several concurrent views working at different abstraction levels: Document view (document layer), File view (file/object layer, hierarchical) and XRef view (file/object layer, flat).

1.1. Document view

Document view (see the left pane in the above screenshot) shows the high-level structure of a PDF file. When you select a node, its data is shown in the right pane through several views; for example, selecting a page node shows its content stream structure (Contents view, see below) and its rendering (Render view [¹], see above). Note that the page model represented by both Contents view and Render view corresponds to the content (sub)layer described in the diagram above.
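
In a content stream, each operation is written in postfix form: its operands come first, followed by the operator (e.g. `BT`, `Tf`, `Td`, `Tj`, `ET`). The minimal Java sketch below shows how such a flat stream can be split into the operation sequence that a view like Contents view presents. It is a hypothetical illustration, not PDF Clown's ContentScanner or any part of its API; the class names `ContentSketch` and `Operation` are made up, and it assumes a simplified syntax (numbers, names, literal strings without nesting or escapes, keyword operators), leaving out arrays, hex strings, dictionaries and inline images.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one content-stream operation (operator plus its operands).
final class Operation {
    final String operator;
    final List<String> operands;

    Operation(String operator, List<String> operands) {
        this.operator = operator;
        this.operands = operands;
    }

    @Override
    public String toString() {
        return operands + " " + operator;
    }
}

// Hypothetical sketch, NOT PDF Clown's ContentScanner: splits a simplified
// content stream into operations. Supports numbers, names (/F1), literal
// strings without nesting or escapes, and keyword operators only.
final class ContentSketch {
    static List<Operation> parse(String content) {
        List<Operation> operations = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // Literal string: read up to the closing parenthesis.
                int end = content.indexOf(')', i);
                if (end < 0) throw new IllegalArgumentException("Unterminated string");
                operands.add(content.substring(i, end + 1));
                i = end + 1;
            } else {
                // Any other token runs until whitespace or a string delimiter.
                int start = i;
                while (i < n && !Character.isWhitespace(content.charAt(i)) && content.charAt(i) != '(') {
                    i++;
                }
                String token = content.substring(start, i);
                if (token.startsWith("/") || isNumber(token)) {
                    operands.add(token); // operand: name or number
                } else {
                    // Operator: it closes the current operation.
                    operations.add(new Operation(token, operands));
                    operands = new ArrayList<>();
                }
            }
        }
        return operations;
    }

    private static boolean isNumber(String token) {
        return token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }
}
```

For example, `ContentSketch.parse("BT /F1 12 Tf 72 712 Td (Hello) Tj ET")` yields five operations: `BT`, `Tf` with operands `/F1 12`, `Td` with `72 712`, `Tj` with `(Hello)`, and `ET`.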

Here is just one of the possible features: hovering the mouse pointer over a show-text operation node pops up a tooltip revealing the actual text encoded inside it (in this example, a Russian-language document is being inspected).
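
Revealing that text means mapping the character codes in a show-text operand (such as the string of a `Tj` operator) to Unicode, which in a real PDF depends on the current font's encoding or its ToUnicode CMap. The Java sketch below assumes a single-byte font and a ready-made code-to-Unicode table, a hypothetical stand-in for a parsed ToUnicode CMap; `ShowTextDecoder` is a made-up name, not part of PDF Clown. It decodes a hex-string operand such as `<010203>` into readable text, Cyrillic included.

```java
import java.util.Map;

// Hypothetical sketch: decodes a show-text hex-string operand such as <0102> by
// looking each one-byte character code up in a code-to-Unicode table.
// Assumption: single-byte codes; real fonts may use multi-byte codes and
// require their ToUnicode CMap to be parsed first.
final class ShowTextDecoder {
    private final Map<Integer, String> toUnicode;

    ShowTextDecoder(Map<Integer, String> toUnicode) {
        this.toUnicode = toUnicode;
    }

    String decodeHex(String hexOperand) {
        // Strip the angle brackets and any whitespace inside the hex string.
        String hex = hexOperand.replaceAll("[<>\\s]", "");
        if (hex.length() % 2 != 0) {
            hex += "0"; // per the PDF spec, a missing final digit is taken as 0
        }
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            int code = Integer.parseInt(hex.substring(i, i + 2), 16);
            // Unmapped codes become U+FFFD (replacement character).
            text.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return text.toString();
    }
}
```

With a table mapping codes 0x01 to 0x06 onto "П", "р", "и", "в", "е", "т", decoding `<010203 040506>` gives "Привет".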

There's so much potential for custom features that I'm considering making it pluggable, so that users can extend it with additional modules of their choice.

1.2. File view

File view shows the low-level representation of the same entities found in Document view, expressed as primitive objects such as dictionaries (PdfDictionary), arrays (PdfArray), streams (PdfStream) and so on.
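
To give an idea of the kind of hierarchy File view walks through, here is a minimal Java sketch of a tree of PDF primitive objects plus a printer that renders it as an indented outline. The classes `Obj`, `Int`, `Name`, `Arr`, `Dict` and `TreePrinter` are hypothetical and far simpler than PDF Clown's PdfDictionary, PdfArray and friends; reals, strings, streams and indirect references are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical primitive-object model (NOT PDF Clown's classes).
interface Obj {}

final class Int implements Obj {
    final int value;
    Int(int value) { this.value = value; }
}

final class Name implements Obj {
    final String value;
    Name(String value) { this.value = value; }
}

final class Arr implements Obj {
    final List<Obj> items = new ArrayList<>();
    Arr add(Obj item) { items.add(item); return this; }
}

final class Dict implements Obj {
    final Map<String, Obj> entries = new LinkedHashMap<>(); // keeps insertion order
    Dict put(String key, Obj value) { entries.put(key, value); return this; }
}

// Renders an object tree as indented lines, the way a hierarchical view might.
final class TreePrinter {
    static List<String> lines(String label, Obj obj) {
        List<String> out = new ArrayList<>();
        walk(label, obj, 0, out);
        return out;
    }

    private static void walk(String label, Obj obj, int depth, List<String> out) {
        String indent = "  ".repeat(depth); // two spaces per nesting level
        if (obj instanceof Dict) {
            out.add(indent + label + ": Dictionary");
            for (Map.Entry<String, Obj> e : ((Dict) obj).entries.entrySet()) {
                walk("/" + e.getKey(), e.getValue(), depth + 1, out);
            }
        } else if (obj instanceof Arr) {
            out.add(indent + label + ": Array");
            List<Obj> items = ((Arr) obj).items;
            for (int i = 0; i < items.size(); i++) {
                walk("[" + i + "]", items.get(i), depth + 1, out);
            }
        } else if (obj instanceof Name) {
            out.add(indent + label + ": /" + ((Name) obj).value);
        } else if (obj instanceof Int) {
            out.add(indent + label + ": " + ((Int) obj).value);
        } else {
            throw new IllegalArgumentException("Unsupported object: " + obj);
        }
    }
}
```

A page dictionary with `/Type /Page` and `/MediaBox [0 0 612 792]` prints as a "Page: Dictionary" line followed by its two entries, with the four MediaBox numbers nested one level deeper.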

1.3. XRef view

XRef view lists the entries of the cross-reference index (either a table or a stream, though that's a technical detail you can happily ignore, as the library handles it transparently).
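
For the classic (table) form of the cross-reference index, ISO 32000-1 defines a simple layout: after the `xref` keyword, each subsection starts with a line giving the first object number and the entry count, followed by one fixed-width entry per object made of a 10-digit field, a 5-digit generation number and a keyword, `n` (in use, the first field being the byte offset) or `f` (free, the first field being the next free object number). The Java sketch below parses well-formed input of this kind into the flat list of entries an XRef-style view could display. It is a hypothetical illustration, not PDF Clown's parser: `XrefEntry` and `XrefTableSketch` are made-up names, it splits on line breaks instead of relying on the fixed 20-byte entry width, and cross-reference streams are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one row of a classic cross-reference table.
final class XrefEntry {
    final int objectNumber;
    final long offset;      // byte offset if in use; next free object number if free
    final int generation;
    final boolean inUse;    // 'n' = in use, 'f' = free

    XrefEntry(int objectNumber, long offset, int generation, boolean inUse) {
        this.objectNumber = objectNumber;
        this.offset = offset;
        this.generation = generation;
        this.inUse = inUse;
    }
}

// Hypothetical sketch (NOT PDF Clown's parser) of reading a classic xref table.
final class XrefTableSketch {
    static List<XrefEntry> parse(String table) {
        String[] lines = table.trim().split("\\r\\n|\\r|\\n");
        if (!lines[0].trim().equals("xref")) {
            throw new IllegalArgumentException("Missing 'xref' keyword");
        }
        List<XrefEntry> entries = new ArrayList<>();
        int i = 1;
        while (i < lines.length && !lines[i].trim().equals("trailer")) {
            // Subsection header: first object number and entry count.
            String[] header = lines[i++].trim().split("\\s+");
            int first = Integer.parseInt(header[0]);
            int count = Integer.parseInt(header[1]);
            for (int k = 0; k < count; k++) {
                String[] fields = lines[i++].trim().split("\\s+");
                entries.add(new XrefEntry(
                        first + k,
                        Long.parseLong(fields[0]),
                        Integer.parseInt(fields[1]),
                        fields[2].equals("n")));
            }
        }
        return entries;
    }
}
```

When a table has a second subsection such as `7 1`, the object numbers in the resulting list jump (e.g. from 2 to 7); a flat view simply lists the entries in that order.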

Notably, all the views (Document, File, XRef) are always kept synchronized: when you select a node in one view, the corresponding entities in each of the others are automatically selected, so you can switch seamlessly from one view to another.
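
One common way to keep several views in sync is an observer-style selection hub keyed by an identity that all the views share, such as the object number and generation number of the underlying indirect object. The Java sketch below illustrates that pattern; `SelectionHub`, `ObjectRef` and `View` are hypothetical names, and this is not necessarily how Document Inspector implements its synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical identity of an indirect PDF object: object and generation numbers.
final class ObjectRef {
    final int number;
    final int generation;

    ObjectRef(int number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ObjectRef)) return false;
        ObjectRef other = (ObjectRef) o;
        return number == other.number && generation == other.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

// Hypothetical view contract: each view selects its own node for a given ref.
interface View {
    void showSelection(ObjectRef ref);
}

// Broadcasts a selection made in one view to all the other registered views.
final class SelectionHub {
    private final List<View> views = new ArrayList<>();
    private ObjectRef current;

    void register(View view) {
        views.add(view);
    }

    void select(View source, ObjectRef ref) {
        if (ref.equals(current)) return; // already selected: avoids echo loops
        current = ref;
        for (View view : views) {
            if (view != source) {
                view.showSelection(ref); // the originating view is skipped
            }
        }
    }
}
```

Remembering the current selection matters: if a view reacts to `showSelection` by calling `select` again, the hub ignores the repeated selection instead of bouncing it between views forever.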

[¹] Rendering is still partial as it’s under development (pre-alpha stage).

6 thoughts on “PDF Clown 0.1.3 — Document Inspector

      1. Hi Darren, unfortunately I don't commit any code that hasn't yet reached an acceptable level of stability. I'm sorry, but this functionality will be available only after the ContentScanner refactoring is complete. Thank you for your interest!

  1. Hi!

    Will there be any improvements to the Renderer in the next release? It's in pre-alpha now and, for instance, doesn't render text. I thought that, since you're using the Render view in the Document Inspector, this area would get some focus now. It would be highly appreciated!

    1. I have some good news: I'm currently working on major enhancements to page content handling (including page rendering). I'll soon publish a new post here describing the ongoing iteration (0.2.0, as activity on 0.1.3 will be revived in a subsequent development cycle); in the meantime, please follow the project's Twitter account.
