PDF Clown 0.2.0 — Enhanced content handling

UPDATED ON January 20, 2022

IMPORTANT: Version 0.2.0, which had been paused since 2016 (and was never published), has been superseded by a new, heavy development cycle (version 2.0) since early 2021. The new version (whose release date is still undefined) will represent a full order-of-magnitude advancement over the existing code base, completely renovating its functionalities yet keeping its core design principles; it will be modular to accommodate its sprawling set of features and will be Java-only (no more .NET).

PDF Clown 0.2.0 development iteration revolves around these topics:

  1. Content stream manipulation: ContentScanner class has been refactored to expand its capabilities;
  2. Content composition engine: DocumentCompositor class has been introduced to support high-level typographic functionalities;
  3. Form flattening: a new tool (FormFlattener) has been added to replace AcroForm fields with their static appearance for content consolidation;
  4. Annotations: PDF annotation support has been expanded and refined;
  5. Automated object stream compression: full PDF compression is now automatically applied.

NOTE: All the topics above will be part (in more advanced form) of the new version 2.0 (which supersedes the unpublished version 0.2.0), with the notable exception of the content composition engine, which has been completely redesigned into a fully integrated HTML-to-PDF layout engine.

1. Content stream manipulation

Ever since its inception, I have been delighted by the concept underlying the ContentScanner class, which proved to be a versatile processor for handling content stream object trees along with their graphics state: you could use it directly to read existing content streams, modify them and create new ones in a convenient object-oriented fashion, or you could plug it into specialized tools (e.g. PrimitiveCompositor, TextExtractor, Renderer, etc.) for more advanced applications.

But up to the 0.1.x version series it suffered from a significant drawback: it lacked separation of concerns from its object model, that is, the algorithmic responsibility for carrying out the tasks was delegated to the respective content stream operations. This may work well when there’s just a single task (“read/write the content stream”), but when further tasks are required (e.g. rendering the content stream into a graphics context) it rapidly becomes unmanageable.

Therefore I proceeded with a massive refactoring which was informed by two main concurrent requirements: algorithmic separation between process and structure (accomplished through the classic Visitor pattern) and preservation of the distinctive cursor-based behavior of ContentScanner (solved through dedicated impedance-matching logic).
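To make the idea concrete, here is a minimal, self-contained sketch of a Visitor-based separation between structure and process. All the names in it (ContentObject, Operation, CompositeObject, Visitor, OperationCounter) are hypothetical stand-ins chosen for illustration; they are not PDF Clown's actual types, and the cursor-based behavior of ContentScanner is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical content object model: structure only, no task-specific logic. */
interface ContentObject {
  <R> R accept(Visitor<R> visitor);
}

/** Hypothetical leaf node: a single content stream operation (e.g. "Tj"). */
final class Operation implements ContentObject {
  final String operator;

  Operation(String operator) {
    this.operator = operator;
  }

  @Override
  public <R> R accept(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}

/** Hypothetical composite node: a container of nested content objects. */
final class CompositeObject implements ContentObject {
  final List<ContentObject> children = new ArrayList<>();

  CompositeObject add(ContentObject child) {
    children.add(child);
    return this;
  }

  @Override
  public <R> R accept(Visitor<R> visitor) {
    return visitor.visit(this);
  }
}

/** Each task (scanning, modelling, rendering and so on) becomes a separate visitor. */
interface Visitor<R> {
  R visit(Operation operation);

  R visit(CompositeObject composite);
}

/** Example task: counts the operations in a tree without touching the model classes. */
final class OperationCounter implements Visitor<Integer> {
  @Override
  public Integer visit(Operation operation) {
    return 1;
  }

  @Override
  public Integer visit(CompositeObject composite) {
    int count = 0;
    for (ContentObject child : composite.children) {
      count += child.accept(this);
    }
    return count;
  }
}

final class VisitorDemo {
  public static void main(String[] args) {
    ContentObject tree = new CompositeObject()
      .add(new Operation("BT"))
      .add(new CompositeObject().add(new Operation("Tf")).add(new Operation("Tj")))
      .add(new Operation("ET"));
    System.out.println(tree.accept(new OperationCounter())); // Prints 4.
  }
}
```

Adding a further task (say, rendering) in this arrangement means writing another Visitor implementation, while the object model stays untouched.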

All the non-core functionalities which were bloating the original ContentScanner (like rendering and content wrappers) have been extracted into specialized processors (respectively: ContentRenderer and ContentModeller), resulting in the following classes:

  • ContentVisitor: abstract content stream processor supporting all the common graphics state transformations;
  • ContentScanner: multi-purpose cursor-based processor;
  • ContentModeller: modelling processor (generates declarative forms (GraphicsElement hierarchy) of the corresponding content objects);
  • ContentRenderer: rendering processor (generates raster representations of the content stream).
ContentScanner refactored

1.1. ContentScanner

ContentScanner’s new implementation focuses exclusively on its core purpose, that is to enable users to manipulate content streams through their low-level, procedural, stacked model (operations and composite objects along with their graphics state).

1.2. ContentModeller

ContentModeller works as a parser which maps the low-level content stream model to its high-level, declarative, flat representation through a dedicated model rooted in GraphicsElement abstract class (which corresponds to GraphicsObjectWrapper hierarchy of ContentScanner’s old implementation). This simplified-yet-equivalent representation can be modified and saved back into the content stream.

1.3. ContentRenderer

ContentRenderer works on content rasterization (that is page imaging and printing). Its reimplementation spurred enhancements in text rendering, image object rasterization and color space management.

Let’s see a practical example of the flexibility delivered by the new renderer: suppose that you have a multi-layered (OCG) document and you would like to selectively render only the contents belonging to a specific layer. To accomplish this, you can subclass ContentRenderer and tweak the drawing switch according to your own logic:

import java.awt.Graphics2D;
import java.awt.geom.Dimension2D;
import java.util.Collection;

import org.pdfclown.documents.contents.IContentContext;
import org.pdfclown.documents.contents.layers.LayerEntity;
import org.pdfclown.documents.contents.objects.ContentMarker;
import org.pdfclown.documents.contents.objects.MarkedContent;
import org.pdfclown.documents.contents.render.ContentRenderer;
import org.pdfclown.objects.PdfName;
import org.pdfclown.tools.Renderer;

class LayerRenderer
  extends ContentRenderer
{
  private Collection<LayerEntity> layers;

  public LayerRenderer(
    IContentContext context,
    Collection<LayerEntity> layers
    )
  {
    super(context.getContents());
    this.layers = layers;
  }

  @Override
  public void render(
    Graphics2D canvas,
    Dimension2D canvasSize
    )
  {
    drawable = false; // This prevents contents from being drawn on the canvas.
    super.render(canvas, canvasSize);
  }

  @Override
  public Object visit(
    MarkedContent object,
    Object data
    )
  {
    boolean isLocalDrawing = false;
    if(!drawable)
    {
      ContentMarker marker = object.getHeader();
      if(marker.isLayer()
        && ((LayerEntity)marker.getProperties(getContext())).containsAny(layers))
      {
        isLocalDrawing = true;
        drawable = true; // This enables contents to be drawn on the canvas because they belong to the relevant layer.
      }
    }
    Object result = super.visit(object, data);
    if(isLocalDrawing)
    {
      drawable = false; // This restores default (non-drawing) state exiting the layer contents.
    }
    return result;
  }
}

. . .

Renderer renderer = new Renderer();
renderer.render(new LayerRenderer(page, selectedLayers), imageSize, outputFile);

Let’s apply this custom renderer to a sample PDF comprising three layers, each associated with a distinct shape.

The sample represents three shapes, each belonging to a different layer.

And this is the result when filtering the ‘square’ layer:

Selectively drawing only the required layer.

Following is a more complex case, a historical map from the USGS collection: Saint Landry, Louisiana (BTW, PDF Clown is proud to have been adopted by USGS in 2010 for their relayering process [original link]).

USGS map as rendered by Adobe Reader
USGS map as rendered by PDF Clown

The original, multi-layered map (see above) features geographic artifacts (projection and grids) which we decided to exclude from our selective rendering, choosing ‘Roads’, ‘Hydrographic Features’, ‘Contour Features’ and ‘Woodland’ layers only (see below).

Projection and grids excluded from the rendering generated by PDF Clown

How about page boundaries? When production-related contents such as bleeds or printer marks populate the area outside the boundaries of the finished page they appear like this:

Front page of the UNESCO Science Report 2010 rendered by PDF Clown (bleeds included (media box))

Excluding these additional elements is a trivial matter, as ContentRenderer takes care to clip the rasterized area along the page crop box. Here is the final result:

Front page of the UNESCO Science Report 2010 rendered by PDF Clown (clipped (crop box))
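The effect of clipping along the crop box can be reproduced with plain Java 2D. The following sketch is not PDF Clown code: it assumes hypothetical media and crop box sizes and relies only on the standard Graphics2D clip, to show that content painted outside the crop area never reaches the raster.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

final class CropBoxClipDemo {
  /**
   * Paints a full-bleed background over the whole (hypothetical) media box,
   * clipping the output to the given crop box.
   */
  static BufferedImage renderClipped(int mediaWidth, int mediaHeight, Rectangle2D cropBox) {
    BufferedImage image = new BufferedImage(mediaWidth, mediaHeight, BufferedImage.TYPE_INT_ARGB);
    Graphics2D canvas = image.createGraphics();
    try {
      canvas.setClip(cropBox); // Nothing outside the crop box will be painted.
      canvas.setColor(Color.RED);
      canvas.fillRect(0, 0, mediaWidth, mediaHeight); // Bleed covering the whole media box.
    } finally {
      canvas.dispose();
    }
    return image;
  }

  public static void main(String[] args) {
    BufferedImage image = renderClipped(100, 100, new Rectangle2D.Double(20, 20, 60, 60));
    System.out.println("Outside the crop box painted: " + (image.getRGB(10, 10) != 0));
    System.out.println("Inside the crop box painted: " + (image.getRGB(50, 50) == Color.RED.getRGB()));
  }
}
```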

The following sample (from an old brochure of the Natural Tunnel State Park, Virginia) demonstrates how the renderer has evolved since its pre-alpha stage: text-showing operations have been temporarily implemented through substitute fonts emulating the styles (italic, bold, regular…) of the actual ones; this trick works nicely for thumbnail generation. The next step will address full-size rendering quality, adding support for glyph outlines.

Text and vector graphics as rendered by PDF Clown

Another comparison between Poppler’s and PDF Clown’s renderings (as noted above, the latter currently doesn’t perform embedded-font outline rasterization), page 8 of the Virginia State Parks Guide:

Generated by Poppler
Generated by PDF Clown

The substitute fonts seem to work quite well also for non-Latin Unicode characters (as mapped on Ubuntu GNU/Linux):

First page of the Universal Declaration of Human Rights, Arabic translation, as rendered by PDF Clown
First page of the Universal Declaration of Human Rights, Chinese translation, as rendered by PDF Clown

2. Content composition engine

PDF Clown 0.2.0 introduces the much-requested keystone of its content composition stack: DocumentCompositor class. This engine features a layout model inspired by a distilled, meaningful subset of the HTML+CSS ruleset.

Its high-level typographic model (columns, sections, paragraphs, tables and so on) is laid out leveraging the existing lower-level functionalities provided by BlockCompositor (paragraph typesetting) and GraphicsCompositor (primitive graphics instructions — previously named PrimitiveCompositor), the latter of which in turn sits upon the above-mentioned ContentScanner for feeding into the content stream (IContentContext).

PDF Clown’s content composition stack

This subject is massively broad, so here I’m going to give you just a few highlights of its features (development is currently underway; I’ll add more details as it advances):

2.1. Multi-column layout

PDF Clown’s layout engine supports the multi-column layout model described by the CSS3 specification, which extends the block layout mode to allow the easy definition of multiple columns of text (and any other kind of content, like tables, images, lists and so on). Columns can be defined by count (number of columns desired), width (minimum column width desired) or both: in any case, the official CSS3 pseudo-algorithm is applied.
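As an illustration of how count and width interact, here is a plain-Java sketch of the pseudo-algorithm as published in the CSS Multi-column Layout Module. The class and method names (ColumnGeometry, resolve) are hypothetical and a null argument stands for the CSS value 'auto'; this is not PDF Clown's implementation.

```java
/** Hypothetical holder of the used column count and column width. */
final class ColumnGeometry {
  final int count;
  final double width;

  ColumnGeometry(int count, double width) {
    this.count = count;
    this.width = width;
  }

  /**
   * Resolves the used column count and width.
   *
   * @param availableWidth width of the multi-column block
   * @param columnWidth requested minimum column width, or null for 'auto'
   * @param columnCount requested column count, or null for 'auto'
   * @param gap gap between adjacent columns
   * @return the used geometry, or null if both columnWidth and columnCount are 'auto'
   *         (that is, the block is not a multi-column one)
   */
  static ColumnGeometry resolve(
      double availableWidth, Double columnWidth, Integer columnCount, double gap) {
    if (columnWidth == null && columnCount == null)
      return null;

    int usedCount;
    double usedWidth;
    if (columnWidth == null) {
      // Count only: the columns share the available width.
      usedCount = columnCount;
      usedWidth = Math.max(0, (availableWidth - (usedCount - 1) * gap) / usedCount);
    } else {
      // Width given: as many columns as fit, optionally capped by the requested count.
      int fitting = Math.max(1, (int) Math.floor((availableWidth + gap) / (columnWidth + gap)));
      usedCount = columnCount == null ? fitting : Math.min(columnCount, fitting);
      usedWidth = (availableWidth + gap) / usedCount - gap;
    }
    return new ColumnGeometry(usedCount, usedWidth);
  }

  public static void main(String[] args) {
    ColumnGeometry byCount = resolve(500, null, 2, 14);
    System.out.println(byCount.count + " columns, " + byCount.width + " wide");
    ColumnGeometry byWidth = resolve(500, 100.0, null, 14);
    System.out.println(byWidth.count + " columns, " + byWidth.width + " wide");
  }
}
```

Note that a requested width acts as a minimum: in the second call the columns end up wider than 100 because the leftover space is distributed among them.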

PDF Clown, according to the CSS3 specification, automatically balances the column heights, that is, it sets the maximum column height so that the heights of the content in each column are approximately equal. This is possible because of a simulation algorithm which ensures an accurate arrangement. Should the content exceed the available height on the paged medium, it would automatically flow into the next page.
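PDF Clown's actual simulation algorithm is not shown in this post. As a stand-in, the following sketch balances columns by simulating the content flow at candidate heights and binary-searching the smallest height that fits; it assumes content made of unbreakable lines with integer heights, and all the names (ColumnBalancer, fits, balancedHeight) are hypothetical.

```java
final class ColumnBalancer {
  /**
   * Simulates the flow: returns true if the lines, placed in order, fit into the
   * given number of columns of the given height.
   */
  static boolean fits(int[] lineHeights, int columnCount, int columnHeight) {
    int usedColumns = 1;
    int currentHeight = 0;
    for (int lineHeight : lineHeights) {
      if (lineHeight > columnHeight)
        return false; // A single line is taller than the column.
      if (currentHeight + lineHeight > columnHeight) {
        // The current column is full: the line moves to the next one.
        usedColumns++;
        currentHeight = 0;
      }
      currentHeight += lineHeight;
    }
    return usedColumns <= columnCount;
  }

  /** Finds the smallest column height which accommodates all the lines. */
  static int balancedHeight(int[] lineHeights, int columnCount) {
    int low = 0; // Lower bound: the tallest line.
    int high = 0; // Upper bound: everything in a single column.
    for (int lineHeight : lineHeights) {
      low = Math.max(low, lineHeight);
      high += lineHeight;
    }
    while (low < high) {
      int mid = low + (high - low) / 2;
      if (fits(lineHeights, columnCount, mid)) {
        high = mid;
      } else {
        low = mid + 1;
      }
    }
    return low;
  }

  public static void main(String[] args) {
    // Six 10-unit lines over two columns balance at 30 units per column.
    System.out.println(balancedHeight(new int[] {10, 10, 10, 10, 10, 10}, 2));
  }
}
```

If the balanced height exceeded the space left on the page, the engine would cap it at the available height and carry the remainder over to the next page, as described above.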

If you are interested in further info about CSS multi-column layouts, I recommend Mozilla’s great introduction to the CSS Multi-column Layout Module.

Here is a practical example of its use:

Multi-column layout sample generated by PDF Clown

And this is the corresponding code:

import org.pdfclown.documents.Document;
import org.pdfclown.documents.contents.composition.*;
import org.pdfclown.documents.contents.fonts.StandardType1Font;
import org.pdfclown.util.math.geom.Dimension;

. . .

DocumentCompositor compositor = new DocumentCompositor(document);

/*
  NOTE: Compositor's style is at the root of the style model, that is, its definitions
  are inherited by the descending elements, analogously to the style of BODY element
  in HTML DOM.
*/
compositor.getStyle()
  .withTextAlign(XAlignmentEnum.Justify)
  .withFontSize(new Length(12));

/*
  NOTE: Element type styles are analogous to CSS styles defined through element type
  selectors.
*/
compositor.getStyle(Paragraph.class)
  .withTextIndent(new Length(10));
compositor.getStyle(Heading.class)
  .withMargin(new QuadLength(0, 0, 10, 0));

/*
  NOTE: Styles can be defined analogously to CSS class definitions and can be derived
  analogously to Less mixins (http://lesscss.org/).
*/
Style strongStyle = new Style("strong")
  .withFont(new Font(new StandardType1Font(document, StandardType1Font.FamilyEnum.Times, true, false), null));
Style emStyle = new Style("em")
  .withFont(new Font(new StandardType1Font(document, StandardType1Font.FamilyEnum.Times, false, true), null));
Style noteStyle = new Style("note")
  .withBorder(new Border(
    null,
    new QuadBorderStyle(BorderStyleEnum.Solid, BorderStyleEnum.None, BorderStyleEnum.None, BorderStyleEnum.None),
    new QuadLength(.1, 0, 0, 0),
    null))
  .withFont(new Font(null, 6d))
  .withMargin(new QuadLength(30, 0, 0, 0))
  .withPadding(new QuadLength(5, 0, 0, 0))
  .withTextAlign(XAlignmentEnum.Left)
  .withTextIndent(new Length(0));
Style superStyle = new Style("super")
  .withFont(new Font(null, 6.5d))
  .withVerticalAlign(LineAlignmentEnum.Super);

Section section = new Section("Hello World, this is PDF Clown!");

/*
  NOTE: Group is a typographic element analogous to DIV element in HTML DOM.
*/
Group group = new Group(
  new Image(
    new Style()
      .withFloat(FloatEnum.Left)
      .withMargin(new QuadLength(new Length(5)))
      .withSize(new Dimension(100,0)),
    document,
    "Clown.jpg"
    ),
  new Paragraph(
    new Text("PDF Clown's layout engine supports the "),
    new Text(strongStyle, "multi-column layout model"),
    new Text(" described by the CSS3 specification"),
    new Text(superStyle, "[1]"),
    new Text(" which extends the block layout mode to allow the easy definition of multiple columns "
      + "of text (and any other kind of content, like tables, images, lists and so on).")
    ),
  new Paragraph(
    new Text("PDF Clown, according to the CSS3 specification, "),
    new Text(emStyle, "automatically balances the column heights"),
    new Text(", i.e., it sets the maximum column height so that the heights of the content in each column "
      + "are approximately equal. This is possible because of a powerful simulation algorithm which ensures "
      + "an accurate arrangement. Should the content exceed the available height on the paged medium, it "
      + "would automatically flow into the next page.")
    ),
  new Paragraph(
    new Text("Columns can be defined by count (number of columns desired), width (minimum column width desired)"
      + " or both: in any case, the official CSS3 pseudo-algorithm is applied"),
    new Text(superStyle, "[2]"),
    new Text(". If you are interested in further info about CSS multi-column layouts, I recommend you to see "
      + "Mozilla's documentation for a great introduction to CSS Multi-column Layout Module"),
    new Text(superStyle, "[3]"),
    new Text(".")
    ),
  new Paragraph(noteStyle,
    new Text("1. http://www.w3.org/TR/css3-multicol/\n"
      + "2. http://www.w3.org/TR/css3-multicol/#pseudo-algorithm\n"
      + "3. https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Using_multi-column_layouts")
    )
  );
/*
  NOTE: This is the declarative CSS3-equivalent style which prescribes the layout engine to treat
  this group as a multi-column block (in this case: 2 columns with a 14-point gap between).
*/
group.getStyle().withColumn(new Column(2, new Length(14)));

section.add(group);

compositor.show(section);
compositor.close();

Honoring the KISS principle, all the magic here is done by a minimal declaration (the withColumn call near the end of the listing above) which, analogously to the CSS fragment {column-count:2; column-gap:14pt;}, instructs the layout engine to render the content group as a multi-column block:

group.getStyle().withColumn(new Column(2, new Length(14)));

This solution provides adaptive column intrusion detection, that is, the layout engine keeps track of absolutely-positioned elements and its block compositor automatically flows content around those already-occupied areas.
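One simple way to picture intrusion detection is to compute, for each line box, the horizontal intervals left free by the occupied areas. The sketch below does just that with standard Java 2D rectangles; LineSlots and freeIntervals are hypothetical names, and the approach is an assumption for illustration rather than the technique PDF Clown actually uses.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LineSlots {
  /**
   * Computes the horizontal intervals ({start, end} pairs) of a line box which are
   * not intruded by any of the given occupied areas.
   */
  static List<double[]> freeIntervals(
      Rectangle2D lineBox, List<? extends Rectangle2D> occupiedAreas) {
    // Collect the areas which overlap the line box, sorted left to right.
    List<Rectangle2D> obstacles = new ArrayList<>();
    for (Rectangle2D area : occupiedAreas) {
      if (area.intersects(lineBox))
        obstacles.add(area);
    }
    obstacles.sort(Comparator.comparingDouble(Rectangle2D::getMinX));

    // Sweep the line box from left to right, skipping the obstacles.
    List<double[]> intervals = new ArrayList<>();
    double cursor = lineBox.getMinX();
    for (Rectangle2D obstacle : obstacles) {
      if (obstacle.getMinX() > cursor)
        intervals.add(new double[] {cursor, obstacle.getMinX()});
      cursor = Math.max(cursor, obstacle.getMaxX());
    }
    if (cursor < lineBox.getMaxX())
      intervals.add(new double[] {cursor, lineBox.getMaxX()});
    return intervals;
  }

  public static void main(String[] args) {
    List<Rectangle2D> occupied = new ArrayList<>();
    occupied.add(new Rectangle2D.Double(50, 90, 40, 40)); // Overlaps the line below.
    occupied.add(new Rectangle2D.Double(0, 300, 50, 50)); // Lies elsewhere on the page.
    for (double[] interval : freeIntervals(new Rectangle2D.Double(0, 100, 200, 12), occupied)) {
      System.out.println(interval[0] + " to " + interval[1]);
    }
  }
}
```

A block compositor could then typeset each line into the returned intervals instead of the full column width.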

As I said, multi-column layout is just a little treat in a full-fledged layout engine… PDF Clown is maturing: in the next weeks new technical details, code snippets and announcements will appear here. Stay tuned with its Twitter stream!

2.2. Tables

I know many of you have long craved native table composition support in PDF Clown… So, here we go: fully styleable, rowspan/colspan-enabled, arbitrarily nestable… really sweet!

Let’s see an example to reason on the real thing:

Table composition sample generated by PDF Clown

And this is the corresponding code:

import org.pdfclown.documents.Document;
import org.pdfclown.documents.contents.colors.DeviceRGBColor;
import org.pdfclown.documents.contents.composition.*;
import org.pdfclown.documents.contents.fonts.StandardType1Font;

. . .

DocumentCompositor compositor = new DocumentCompositor(document);

/*
  We decide that table cells sport a solid border by default (analogous
  to CSS styles defined through an element type selector).
*/
compositor.getStyle(Cell.class)
  .withBorder(new Border(
    new QuadColor(new DeviceRGBColor(0, 0, 0)),
    new QuadBorderStyle(BorderStyleEnum.Solid),
    new QuadLength(new Length(1)),
    new QuadCornerRadius()))
  .withPadding(new QuadLength(new Length(5)));
/*
  Custom style for table headers (analogous to a CSS class definition).
*/
Style headerRowStyle = new Style("headerRowStyle")
  .withBackgroundColor(new DeviceRGBColor(.8f, .8f, .8f))
  .withFont(new Font(new StandardType1Font(document, StandardType1Font.FamilyEnum.Helvetica, true, false), 14d))
  .withTextAlign(XAlignmentEnum.Center);

/*
  The table will be included in a section.
*/
Section section = new Section("Hello World, this is PDF Clown!");

/*
  This is the actual table composition.
*/
Table table = new Table(
  /* Custom table style: we decide it has a wider, rounded border. */
  new Style().withBorder(
    new Border(
      new QuadColor(new DeviceRGBColor(0, 0, 0)),
      new QuadBorderStyle(BorderStyleEnum.Solid),
      new QuadLength(new Length(2)),
      new QuadCornerRadius(new Size(10))
      )
    ),
  /* Main header row */
  new Row(
    /* Base style (previously defined) */
    headerRowStyle,
    new Cell("Main Header 1 (ColSpan=2)").withColSpan(2),
    new Cell("Main Header 2")
    ),
  /* Main Row 1 */
  new Row(
    new Cell("Main Row 1 Cell 1"),
    new Cell("Main Row 1 Cell 2"),
    new Cell("Main Row 1 Cell 3")
    ),
  /* Main Row 2 (with subtable and custom background) */
  new Row(
    /*
      Custom row style: we decide it has a reddish background (NOTE: it will propagate to
      inner elements as this property is inheritable by default in table substructures).
    */
    new Style().withBackgroundColor(new DeviceRGBColor(1f, .3f, .3f)),
    new Cell("Main Row 2 Cell 1"),
    new Cell(
      /* Custom cell style: we decide to exclude padding in order to collapse subtable borders. */
      new Style().withPadding(new QuadLength()),
      /* Subtable */
      new Table(
        /* Custom subtable style */
        new Style().withFontSize(new Length(10)),
        /* Subtable header row */
        new Row(
          /* Custom style: we decide for a smaller font size than header base style */
          new Style(headerRowStyle).withFontSize(new Length(12)),
          "Subtable\rHeader 1",
          "Subtable\rHeader 2"
          ),
        /* Subtable row 1 */
        new Row("Subtable\rRow 1 Cell 1", "Subtable\rRow 1 Cell 2"),
        /* Subtable row 2 (with colspan) */
        new Row(
          new Cell("Subtable\rRow 2 Cell 1 (ColSpan=2)").withColSpan(2)
          )
        )
      ),
    new Cell("Main Row 2 Cell 3")
    ),
  /* Main Row 3 */
  new Row(
    new Cell("Main Row 3 Cell 1"),
    new Cell("Main Row 3 Cell 2"),
    new Cell("Main Row 3 Cell 3")
    ),
  /* Main Row 4 (with rowspan and custom background) */
  new Row(
    /*
      Custom row style: we decide it has a greenish background (NOTE: it will propagate to
      inner elements as this property is inheritable by default in table substructures).
    */
    new Style().withBackgroundColor(new DeviceRGBColor(0, 1, 0)),
    /* Cell with 3-row span */
    new Cell(
      new Paragraph(
        new Text("Main Row 4 Cell 1 (RowSpan=3)"),
        new Text(
          new Style().withFontSize(new Length(8)),
          " - This is additional text to test the vertical adjustment of cells spanning over"
            + " multiple rows (which is one of the trickiest parts of table layout management)"
          )
        )
      ).withRowSpan(3),
    new Cell("Main Row 4 Cell 2"),
    new Cell("Main Row 4 Cell 3")
    ),
  /* Main Row 5 */
  new Row(
    new Cell("Main Row 5 Cell 2"),
    new Cell("Main Row 5 Cell 3")
    ),
  /* Main Row 6 */
  new Row(
    new Cell("Main Row 6 Cell 2"),
    new Cell("Main Row 6 Cell 3")
    )
  );
section.add(table);

compositor.show(section);
compositor.close();

Some considerations about the code above:

  • Element construction: every composition element features a uniform set of constructors designed for compact definition. As the listings show, their parameter pattern is an optional style followed by the children, where style is the element’s style (either custom or class) and children are the elements contained by the element.
  • Style: the resolved style of each element is a combination of multiple styles:
    1. a custom style (like HTML+CSS element.style)
    2. a base style (like HTML+CSS classes)
    3. a default style (like HTML+CSS styles defined through element type selector)
    4. an inherited style (like in HTML+CSS style inheritance)
  • Border style: PDF Clown supports CSS3-like border definitions (edge granularity, that is, you can define each edge distinctly by color, width, style (solid, dashed, dotted…)), including rounded corners (bi-dimensional corner granularity, that is, you can define each corner’s vertical and horizontal radius). Here is an example:
Border style sample generated by PDF Clown

And this is the corresponding code:

Paragraph paragraph = new Paragraph(
  new Style()
    .withBackgroundColor(new DeviceRGBColor(105f/255, 255f/255, 124f/255))
    .withBorder(
      new Border(
        /* Border color (black top and bottom edges, red left and right edges) */
        new QuadColor(new DeviceRGBColor(0, 0, 0), new DeviceRGBColor(1, 0, 0)),
        /* Border style */
        new QuadBorderStyle(BorderStyleEnum.Dashed),
        /* Border width */
        new QuadLength(new Length(5)),
        /* Squared top-left and bottom-right corners, rounded top-right and bottom-left corners */
        new QuadCornerRadius(new Size(0), new Size(20))
        )
      )
    .withFont(new Font(new StandardType1Font(document, StandardType1Font.FamilyEnum.Courier, true, true), 24d))
    .withPadding(new QuadLength(new Length(10))),
  new Text("A fancifully bordered paragraph\rMade by PDF Clown "),
  new Image(
    new Style()
      .withSize(new Dimension(0, 50))
      .withVerticalAlign(LineAlignmentEnum.Bottom),
    document,
    "pdfclown.jpg"
    ),
  new Text("!")
  );
compositor.show(paragraph);
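The four-layer style resolution listed in the considerations above (custom, base, default, inherited) can be pictured as a simple lookup chain. The following sketch is a hypothetical model, not PDF Clown's Style class: it assumes that precedence follows the listed order and, for brevity, that every property is inheritable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical styled element: each style layer is a plain property map. */
final class StyledElement {
  final Map<String, Object> customStyle = new HashMap<>(); // Like HTML+CSS element.style.
  final Map<String, Object> baseStyle = new HashMap<>(); // Like an HTML+CSS class.
  final Map<String, Object> defaultStyle; // Like an element type selector (shared per type).
  final StyledElement parent; // Source of inherited values (null for the root).

  StyledElement(Map<String, Object> defaultStyle, StyledElement parent) {
    this.defaultStyle = defaultStyle;
    this.parent = parent;
  }

  /** Resolves a property: custom, then base, then default, then inherited from the ancestors. */
  Object resolve(String property) {
    if (customStyle.containsKey(property))
      return customStyle.get(property);
    if (baseStyle.containsKey(property))
      return baseStyle.get(property);
    if (defaultStyle.containsKey(property))
      return defaultStyle.get(property);
    return parent != null ? parent.resolve(property) : null;
  }
}

final class StyleResolutionDemo {
  public static void main(String[] args) {
    Map<String, Object> bodyDefaults = new HashMap<>();
    bodyDefaults.put("fontSize", 12);
    Map<String, Object> paragraphDefaults = new HashMap<>();
    paragraphDefaults.put("textIndent", 10);

    StyledElement body = new StyledElement(bodyDefaults, null);
    StyledElement paragraph = new StyledElement(paragraphDefaults, body);
    paragraph.baseStyle.put("textAlign", "left");
    paragraph.customStyle.put("textAlign", "center");

    System.out.println(paragraph.resolve("textAlign")); // The custom style wins over the base one.
    System.out.println(paragraph.resolve("textIndent")); // Taken from the type default.
    System.out.println(paragraph.resolve("fontSize")); // Inherited from the parent.
  }
}
```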

2.3. Lists

PDF Clown 0.2.0 supports lists with most of the CSS2 features (standard/custom, ordered/unordered markers).

List composition sample generated by PDF Clown

And this is the corresponding code:

import org.pdfclown.documents.Document;
import org.pdfclown.documents.contents.colors.DeviceRGBColor;
import org.pdfclown.documents.contents.composition.*;

. . .

DocumentCompositor compositor = new DocumentCompositor(document);

/*
  We decide that table cells sport a solid border by default (analogous
  to CSS styles defined through an element type selector).
*/
compositor.getStyle(Cell.class)
  .withBorder(new Border(
    new QuadColor(new DeviceRGBColor(0, 0, 0)),
    new QuadBorderStyle(BorderStyleEnum.Solid),
    new QuadLength(new Length(1)),
    new QuadCornerRadius()))
  .withPadding(new QuadLength(new Length(5)));

/*
  The list will be included in a section.
*/
Section section = new Section("Hello World, this is PDF Clown!");

/*
  This is the actual list composition.
*/
List list = new List(
  new ListItem("Item 1"),
  new ListItem("Item 2"),
  new ListItem(
    /* We decide that this list item has an arbitrary 5pt margin. */
    new Style().withMargin(new QuadLength(new Length(5))),
    "Item 3 (margin: 5pt)"
    ),
  new ListItem("Item 4"),
  new ListItem("Item 5"),
  new ListItem(
    /* We decide that this list item has a custom background color, border and padding. */
    new Style()
      .withBackground(new Background(new DeviceRGBColor(252f/255, 232f/255, 131f/255)))
      .withBorder(new Border(
        new QuadColor(new DeviceRGBColor(218f/255, 165f/255, 32f/255)),
        new QuadBorderStyle(BorderStyleEnum.Dotted),
        new QuadLength(new Length(2)),
        new QuadCornerRadius(new Size(5))
        ))
      .withPadding(new QuadLength(new Length(10))),
    new Paragraph("Item 6 (background, border, padding test + nested table)"
      + "\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
      + "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud "
      + "exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure "
      + "dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. "
      + "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt "
      + "mollit anim id est laborum."),
    /* Nested table. */
    new Table(
      new Row(
        new Cell("Cell1,1"),
        new Cell("Cell1,2"),
        new Cell("Cell1,3"),
        new Cell("Cell1,4")
        ),
      new Row(
        new Cell("Cell2,1"),
        new Cell("Cell2,2").withColSpan(2),
        new Cell("Cell2,4")
        )
      )
    ),
  new ListItem(
    new Paragraph("Item 7 (sublist test)"),
    /* Nested list. */
    new List(
      /* We decide this nested list sports circle markers. */
      new Style().withListStyle(new ListStyle(ListStyleTypeEnum.Circle)),
      new ListItem("Sublist Item 1"),
      new ListItem(
        new Paragraph("Sublist Item 2 (Sub-sublist with multiple custom markers mimicking an ordered list)"),
        /* Level-2 nested list (custom ordered markers). */
        new List(
          /*
            We decide this nested list sports a set of custom numerical symbols mapped
            as octal codes to ZapfDingbats character set (see PDF Reference 1.7, § D.5, http://www.adobe.com/devnet/pdf/pdf_reference.html).
          */
          new Style().withListStyle(new ListStyle(new char[]{0312, 0313, 0314, 0315, 0316, 0317, 0320, 0321, 0322})),
          new ListItem("Sub-sublist Item 1"),
          new ListItem("Sub-sublist Item 2"),
          new ListItem("Sub-sublist Item 3"),
          new ListItem("Sub-sublist Item 4"),
          new ListItem("Sub-sublist Item 5")
          )
        ),
      new ListItem("Sublist Item 3"),
      new ListItem(
        new Paragraph("Sublist Item 4 (Sub-sublist with decimal markers)"),
        /* Level-2 nested list (decimal markers). */
        new List(
          new Style().withListStyle(new ListStyle(ListStyleTypeEnum.Decimal)),
          new ListItem("Sub-sublist Item 1"),
          new ListItem(
            new Paragraph("Sub-sublist Item 2 (Sub-sub-sublist with lower-latin markers)"),
            /* Level-3 nested list (lower-latin markers). */
            new List(
              new Style().withListStyle(new ListStyle(ListStyleTypeEnum.LowerLatin)),
              new ListItem("Sub-sub-sublist Item 1"),
              new ListItem("Sub-sub-sublist Item 2"),
              new ListItem("Sub-sub-sublist Item 3"),
              new ListItem("Sub-sub-sublist Item 4"),
              new ListItem("Sub-sub-sublist Item 5")
              )
            ),
          new ListItem("Sub-sublist Item 3"),
          new ListItem("Sub-sublist Item 4"),
          new ListItem("Sub-sublist Item 5")
          )
        ),
      new ListItem("Sublist Item 5")
      ),
    new Paragraph("End of Item 7")
    ),
  new ListItem("Item 8")
  );
section.add(list);

compositor.show(section);
compositor.close();
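For readers curious about how ordered markers such as the decimal and lower-latin ones used above can be derived from an item's position, here is a small standalone sketch. The ListMarkers class, its Type enum, the trailing dot and the cycling of custom symbols are all assumptions made for illustration; they do not describe PDF Clown's ListStyle internals.

```java
final class ListMarkers {
  enum Type { DECIMAL, LOWER_LATIN }

  /** Returns the marker for the item at the given 1-based index. */
  static String marker(Type type, int index) {
    if (index < 1)
      throw new IllegalArgumentException("index must be 1-based");
    switch (type) {
      case DECIMAL:
        return index + ".";
      case LOWER_LATIN: {
        // Bijective base 26: a, b, ..., z, aa, ab, ...
        StringBuilder letters = new StringBuilder();
        for (int n = index; n > 0; n = (n - 1) / 26) {
          letters.insert(0, (char) ('a' + (n - 1) % 26));
        }
        return letters + ".";
      }
      default:
        throw new IllegalArgumentException("unsupported type: " + type);
    }
  }

  /** Returns a custom marker, cycling through the symbols (an assumption of this sketch). */
  static String customMarker(char[] symbols, int index) {
    return String.valueOf(symbols[(index - 1) % symbols.length]);
  }

  public static void main(String[] args) {
    System.out.println(marker(Type.DECIMAL, 3));
    System.out.println(marker(Type.LOWER_LATIN, 2));
    System.out.println(marker(Type.LOWER_LATIN, 27));
  }
}
```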

2.4. Page Breaks

PDF Clown 0.2.0 supports CSS-like page breaks.

Page breaks sample generated by PDF Clown

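And this is the corresponding code (the withPageBreakAfter and withPageBreakBefore calls apply the page breaks):

import org.pdfclown.documents.Document;
import org.pdfclown.documents.contents.colors.DeviceRGBColor;
import org.pdfclown.documents.contents.composition.*;
import org.pdfclown.documents.contents.fonts.StandardType1Font;
import org.pdfclown.documents.contents.fonts.StandardType1Font.FamilyEnum;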

. . .

DocumentCompositor compositor = new DocumentCompositor(document);

/*
  We decide that table cells sport a solid border by default (analogous
  to CSS styles defined through an element type selector).
*/
compositor.getStyle(Cell.class)
  .withBorder(new Border(
    new QuadColor(new DeviceRGBColor(0, 0, 0)),
    new QuadBorderStyle(BorderStyleEnum.Solid),
    new QuadLength(new Length(1)),
    new QuadCornerRadius()))
  .withPadding(new QuadLength(new Length(5)));
/*
  We decide to highlight inline code references through a dedicated style.
*/
Style codeStyle = new Style("code")
  .withBackgroundColor(new DeviceRGBColor(1, 1, 0))
  .withFontType(new StandardType1Font(document, FamilyEnum.Courier, false, false));

/*
  The contents will be included in a section.
*/
Section section = new Section("Hello World, this is PDF Clown!");
section.add(
  new Paragraph(
    new Text("This paragraph is the last content on this page as its next sibling is marked with CSS-like "),
    new Text(codeStyle, "page-break-before: always"),
    new Text(". Clean and simple!")
    )
  );
section.add(
  new Group(
    new Paragraph(
      /*
        Here it is the custom style applied to the isolated paragraph.
      */
      new Style()
        .withPageBreakAfter(PageBreakEnum.Always)
        .withPageBreakBefore(PageBreakEnum.Always),
      new Text("This paragraph is isolated on this page as we marked it with both CSS-like "),
      new Text(codeStyle, "page-break-before: always"),
      new Text(" and "),
      new Text(codeStyle, "page-break-after: always")
      ),
    new Table(
      new Row(
        new Cell("Cell1,1"),
        new Cell("Cell1,2"),
        new Cell("Cell1,3"),
        new Cell("Cell1,4")
        ),
      new Row(
        new Cell("Cell2,1"),
        new Cell("Cell2,2").withColSpan(2),
        new Cell("Cell2,4")
        )
      )
    )
  );
compositor.show(section);
compositor.close();

2.5. Advanced features (composition event listener and typographic goodies)

Advanced typesetting sample generated by PDF Clown

This is a demonstration of some of the fine typesetting capabilities of the new layout engine of PDF Clown (the code which generated the sample shown above is listed below):

  • composition event listener: DocumentCompositor notifies its relevant events to a dedicated listener (DocumentCompositor.DocumentListener), so you can apply custom logic when the engine requires a new page (onContextInit), begins composing the current page (onContextBegin), finishes composing the current page (onContextEnd) and so on. In this demonstration (see code below, line 28) a margin note is added in reaction to the end of the page layout.
  • drop caps: stylish initial letters work like a charm: you just need to float your letter to the left and choose its font and size at will (see code below, lines 96-99).
  • vertical fill property: have you ever found yourself trying to convince your traditional horizontally-flowing layout engine (like HTML-based ones) to automatically place, for example, a paragraph aligned to the bottom of a page (like footnotes), or to center a title in the middle of a page? That’s often a somewhat tricky and brave deed, which typically results in some inglorious coding gymnastics, stretching here and there, or resorting to the awkward and infamous tables… PDF Clown features a specific style property (VerticalFill) which addresses this kind of situation in the cleanest and simplest way, vertically stretching the element box to cover the whole usable page area. In this demonstration (see code below, line 113) the paragraphs following the title are aligned to the bottom of the page.
import java.awt.Dimension;
import java.awt.geom.*;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.contents.composition.*;

. . .

DocumentCompositor compositor = new DocumentCompositor(
  /*
    NOTE: The compositor works along with a listener whose event
    callbacks can be customized.
    If you don't need any customization, you pass your document
    variable directly to the DocumentCompositor constructor (behind
    the scenes it instantiates the default listener implementation).
  */
  new DocumentCompositor.DocumentListener(document)
  {
    /*
      'onContextEnd' notifies that the layout on the current page
      has ended.
    */
    @Override
    public void onContextEnd(
      Event event
      )
    {
      showMarginNote(event.getSource());
      super.onContextEnd(event);
    }

    private void showMarginNote(
      DocumentCompositor compositor
      )
    {
      /*
        NOTE: In this example, we decided that when the page ends,
        a vertically-oriented note is placed on the right margin.
        NOTE: This lower-level construct (which works directly with
        BlockCompositor) will be replaced by high-level elements
        (paragraphs) as soon as absolute positioning will be available.
      */
      BlockCompositor block = compositor.getBase();
      GraphicsCompositor graphics = block.getBase();
      Dimension2D pageSize = compositor.getContext().getSize();
      Style pageStyle = compositor.getStyle();

      graphics.beginLocalState();
      graphics.rotate(
        90,
        new Point2D.Double(
          pageSize.getWidth()
            - pageStyle.getMargin().getRight().getValue(),
          pageSize.getHeight()
            - pageStyle.getMargin().getBottom().getValue() / 2
          )
        );
      block.begin(
        new Rectangle2D.Double(0, 0,
          pageSize.getHeight() / 2,
          pageStyle.getMargin().getRight().getValue()),
        XAlignmentEnum.Left,
        YAlignmentEnum.Middle
        );
      graphics.setFont(compositor.getStyle(null).getFontType(), 8);
      block.showText("Generated by PDF Clown on " + new java.util.Date());
      block.showBreak();
      block.showText("For more info, visit http://www.pdfclown.org");
      block.end();
      graphics.end();
    }
  }
  );

// Style definition.
compositor.getStyle()
  .withLineSpace(new Length(0))
  .withMargin(new QuadLength(new Length(50)));
compositor.getStyle(null)
  .withFont(new Font(
    org.pdfclown.documents.contents.fonts.Font.get(
      document,
      "TravelingTypewriter.otf"),
    14))
  .withTextAlign(XAlignmentEnum.Justify);
compositor.getStyle(Paragraph.class)
  .withMargin(new QuadLength(8, 0, 0, 0))
  .withTextIndent(new Length(24));
org.pdfclown.documents.contents.fonts.Font decorativeFont =
  org.pdfclown.documents.contents.fonts.Font.get(
    document,
    "Ruritania-Outline.ttf");
compositor.getStyle(Heading.class)
  .withFont(new Font(decorativeFont, 56))
  .withLineSpace(new Length(.25, UnitModeEnum.Relative));
Style firstLetterStyle = new Style("firstLetter")
  .withFloat(FloatEnum.Left)
  .withFont(new Font(decorativeFont, new Length(2, UnitModeEnum.Relative)))
  .withMargin(new QuadLength(0, 5, 0, 0));

// Content insertion.
Section section = new Section(
  new Heading(
    new Text("Chapter 1"),
    new Text(
      new Style().withFontSize(new Length(32)),
      "\nDown the Rabbit-Hole"
      )
    ),
  new Group(
    new Style()
      .withVerticalAlign(LineAlignmentEnum.Bottom)
      .withVerticalFill(VerticalFillEnum.FirstPage),
    new Paragraph(
      new Style().withTextIndent(new Length(0)),
      new Text(firstLetterStyle, "A"),
      new Text("lice was beginning to get very tired of sitting "
        + "by her sister on the bank, and of having nothing to do: "
        + "once or twice she had peeped into the book her sister "
        + "was reading, but it had no pictures or conversations in "
        + "it, 'and what is the use of a book,' thought Alice "
        + "'without pictures or conversation?'")
      ),
    new Image(
      new Style()
        .withFloat(FloatEnum.Right)
        .withMargin(new QuadLength(new Length(5)))
        .withSize(new Dimension(0,250)),
      document,
      "alice_white_rabbit.jpg"
      ),
    new Paragraph("So she was considering in her own mind (as well "
      + "as she could, for the hot day made her feel very sleepy and "
      + "stupid), whether the pleasure of making a daisy-chain would "
      + "be worth the trouble of getting up and picking the daisies, "
      + "when suddenly a White Rabbit with pink eyes ran close by her."),
    new Paragraph("There was nothing so VERY remarkable in that; nor "
      + "did Alice think it so VERY much out of the way to hear the "
      + "Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' "
      + "(when she thought it over afterwards, it occurred to her that "
      + "she ought to have wondered at this, but at the time it all "
      + "seemed quite natural); but when the Rabbit actually TOOK A "
      + "WATCH OUT OF ITS WAISTCOAT-POCKET, and looked at it, and then "
      + "hurried on, Alice started to her feet, for it flashed across "
      + "her mind that she had never before seen a rabbit with either a "
      + "waistcoat-pocket, or a watch to take out of it, and burning with "
      + "curiosity, she ran across the field after it, and fortunately "
      + "was just in time to see it pop down a large rabbit-hole under the "
      + "hedge.")
    )
  );
compositor.show(section);
compositor.close();

Layout areas revealed

The layout process works by balancing competing constraints: the picture above reveals how this composition takes place (for each content element, the gray dashed shape represents the potential frame, while the green shape represents the area actually occupied).

3. Form flattening

A request from a user on Stack Overflow prompted the implementation of an Acroform flattener, which converts field annotations into static representations for content consolidation. Here is an example of its use:

import org.pdfclown.documents.Document;
import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.tools.FormFlattener;

. . .

File file = null;
try
{
  // 1. Opening the PDF file...
  file = new File(myFilePath);
  Document document = file.getDocument();

  // 2. Flatten the form!
  FormFlattener formFlattener = new FormFlattener();
  formFlattener.flatten(document);

  // 3. Serialize the PDF file!
  file.save(SerializationModeEnum.Standard);
}
finally
{
  // 4. Closing the PDF file...
  if(file != null)
  {file.close();}
}

4. Annotations

Annotations have been enhanced in several ways, introducing Markup support, fluent interface, standard and custom rubber stamp appearance embedding…

Embedded rubber stamps generated by PDF Clown

Here is the code which generated the stamps shown above:

import java.awt.Color;
import java.awt.Dimension;
import java.awt.Point;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.colorSpaces.DeviceRGBColor;
import org.pdfclown.documents.contents.fonts.Font;
import org.pdfclown.documents.interaction.annotations.Stamp;
import org.pdfclown.documents.interaction.annotations.styles.StampAppearanceBuilder;
import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;

. . .

File file = new File();
Document document = file.getDocument();

Page page = new Page(document);
document.getPages().add(page);

// Standard rubber stamps.
// Define the standard stamps template path!
/*
  NOTE: The PDF specification defines several stamps (aka "standard
  stamps") whose rendering depends on the support of viewer applications.
  As such support isn't guaranteed, PDF Clown offers smooth, ready-to-use
  embedding of these stamps through the StampPath property of the document
  configuration: you can decide to point to the stamps directory of your
  Acrobat installation (e.g., on my GNU/Linux system it's located in
  "/opt/Adobe/Reader9/Reader/intellinux/plug_ins/Annotations/Stamps/ENU")
  or to the collection included in the distribution (std-stamps.pdf).
*/
document.getConfiguration().setStampPath(new java.io.File("/opt/Adobe/Reader9/Reader/intellinux/plug_ins/Annotations/Stamps/ENU"));

// Add a standard stamp, rotating it 15 degrees counterclockwise!
new Stamp(
  page,
  new Point(485, 515),
  null, // Default size is natural size.
  "This is 'Confidential', a standard stamp",
  Stamp.StandardTypeEnum.Confidential
  ).withRotation(15)
   .withAuthor("Stefano")
   .withSubject("Standard stamp");

// Add a standard stamp, without rotation!
new Stamp(
  page,
  new Point(485, 580),
  null, // Default size is natural size.
  "This is 'SBApproved', a standard stamp",
  Stamp.StandardTypeEnum.BusinessApproved
  ).withAuthor("Stefano")
   .withSubject("Standard stamp");

// Add a standard stamp, rotating it 10 degrees clockwise!
new Stamp(
  page,
  new Point(485, 635),
  new Dimension(0, 40), // This scales the width proportionally to the 40-unit height (you can obviously do also the opposite, defining only the width).
  "This is 'SHSignHere', a standard stamp",
  Stamp.StandardTypeEnum.SignHere
  ).withRotation(-10)
   .withAuthor("Stefano")
   .withSubject("Standard stamp");

// Custom rubber stamps.
Font stampFont = Font.get(document, "TravelingTypewriter.otf");
new Stamp(
  page,
  new Point(75, 570),
  "This is a round custom stamp",
  new StampAppearanceBuilder(
    document,
    StampAppearanceBuilder.TypeEnum.Round,
    "Done",
    50,
    stampFont
    ).build()
  ).withRotation(-10)
   .withAuthor("Stefano")
   .withSubject("Custom stamp");

new Stamp(
  page,
  new Point(210, 570),
  "This is a squared (and round-cornered) custom stamp",
  new StampAppearanceBuilder(
    document,
    StampAppearanceBuilder.TypeEnum.Squared,
    "Classified",
    150,
    stampFont
    ).withColor(DeviceRGBColor.get(Color.ORANGE))
     .build()
  ).withRotation(15)
   .withAuthor("Stefano")
   .withSubject("Custom stamp");

Font stampFont2 = Font.get(document, "MgOpenCanonicaRegular.ttf");
new Stamp(
  page,
  new Point(350, 570),
  "This is a striped custom stamp",
  new StampAppearanceBuilder(
    document,
    StampAppearanceBuilder.TypeEnum.Striped,
    "Out of stock",
    100,
    stampFont2
    ).withColor(DeviceRGBColor.get(Color.GRAY))
     .build()
  ).withRotation(90)
   .withAuthor("Stefano")
   .withSubject("Custom stamp");

file.save("MyFile.pdf", SerializationModeEnum.Standard);
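
The new Dimension(0, 40) argument used for the 'SHSignHere' stamp above leaves one side at zero so that it is derived from the other. As a rough illustration of the arithmetic involved, here is a minimal, self-contained sketch. ProportionalSize and resolve are hypothetical names of mine (not part of the PDF Clown API), the natural size used in main (150 × 60) is made up for the example, and treating "both sides zero" as the natural size is my own assumption (the listing above passes null for that case):

```java
final class ProportionalSize {
  /**
   * Resolves a requested size against a natural size, treating a zero
   * component as "derive it from the other one, keeping the aspect ratio".
   * Returns {width, height}.
   */
  static double[] resolve(double naturalWidth, double naturalHeight,
      double requestedWidth, double requestedHeight) {
    if (naturalWidth <= 0 || naturalHeight <= 0)
      throw new IllegalArgumentException("Natural size must be positive");

    // ASSUMPTION: nothing requested means natural size.
    if (requestedWidth == 0 && requestedHeight == 0)
      return new double[] {naturalWidth, naturalHeight};
    // Only the height is given: scale the width by the same factor.
    if (requestedWidth == 0)
      return new double[] {
        naturalWidth * requestedHeight / naturalHeight, requestedHeight};
    // Only the width is given: scale the height by the same factor.
    if (requestedHeight == 0)
      return new double[] {
        requestedWidth, naturalHeight * requestedWidth / naturalWidth};
    // Both sides given: use them as they are (aspect ratio may change).
    return new double[] {requestedWidth, requestedHeight};
  }

  public static void main(String[] args) {
    // Hypothetical natural size of 150 x 60 units, 40-unit height requested.
    double[] size = resolve(150, 60, 0, 40);
    System.out.println(size[0] + " x " + size[1]); // prints "100.0 x 40.0"
  }
}
```

With a natural size of 150 × 60 units, requesting a 40-unit height yields a 100-unit width; defining only the width works the other way round.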

5. Automated object stream compression

Object streams [PDF:1.7:3.4.6] and cross-reference streams [PDF:1.7:3.4.7] have been switched from manual to automatic compression: until version 0.1.2.0, full PDF compression relied on the client’s choice of which data objects to aggregate into object streams; now the whole process is transparent to the client and applies to all the legally compressible data objects.

import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.files.XRefModeEnum;

. . .

File file = new File();

. . .

// Set full compression for the serialization of this file!
file.getConfiguration().setXRefMode(XRefModeEnum.Compressed);
file.save("MyFile.pdf", SerializationModeEnum.Standard);

PDF Clown 0.1.3 — Document Inspector

NOTE — As the ContentScanner class is being refactored, the development of version 0.1.3 is temporarily frozen.

1. Document Inspector

Since its earliest versions, PDF Clown has shipped with a simple Swing-based proof of concept for viewing PDF file structures. Now that little fledgling is going to become a comprehensive tool for the visual editing of the structure of PDF files: PDF Clown Document Inspector. It was initially planned to be part of version 0.1.2 as a dedicated project within the PDF Clown distribution, but it wasn’t ready yet as the release deadline approached.

This tool conforms to the PDF model as defined by PDF Clown (see the diagram above), which adheres to the official PDF Reference 1.7/ISO 32000-1. This implies that a PDF file is represented through several concurrent views which work at different abstraction levels: Document view (document layer), File view (file/object layer, hierarchical) and XRef view (file/object layer, flat).


PDF Clown 0.1.2 released

This release enhances several base structures, providing fully automated object change tracking and object cloning (making it possible, for example, to copy page annotations and Acroform fields). It adds support for video embedding, article threads, page labels and several other functionalities.

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.2%20Beta/

What about screencasts on PDF Clown use?

I’m considering making screencasts on the use of the library.

Topics are still under definition: what would you like to see in action?

Unleash your curiosity and let me know!

PS: I use open-source IDEs only, so don’t expect me to tweak around with proprietary tools like MS Visual Studio… 😉

PDF Clown 0.1.2 — Multimedia and lots of good stuff

LATEST NEWS — On February 10, 2013 PDF Clown 0.1.2 was released!

This release cycle revolves around these topics:

  1. Multimedia
  2. Text line alignment
  3. File references (file specifications, file identifiers, PDF stream object externalization)
  4. Advanced cloning
  5. Article threads

1. Multimedia


For a long time I gave low priority to multimedia features (chapter 9 of PDF Reference 1.7), but recently I received some requests about them on the project’s forum… so yes, video embedding through Screen annotations is now ready!


PDF Clown 0.1.1 released

NOTE — PDF Clown 0.1.1 has been superseded by PDF Clown 0.1.2

This release adds support for optional/layered contents, text highlighting, metadata streams (XMP) and Type1/CFF font files, along with enhancements to the primitive object model and to AcroForm field filling. Lots of minor improvements have been applied too.

Last but not least: ICSharpCode.SharpZipLib.dll dependency has been removed from .NET implementation.

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.1%20Beta/

enjoy!

PDF Clown 0.1.1 — Text highlighting and lots of good stuff

LATEST NEWS — On November 14, 2011 PDF Clown 0.1.1 was released!

The next release is going to introduce exciting new features (text highlighting, optional/layered contents, Type1/CFF font support, etc.) along with improvements and consolidations of existing ones (enhanced text extraction, enhanced content rendering, enhanced AcroForm creation and filling, etc.). This post will be kept updated according to development progress, so please stay tuned! 😉

These are some of the things I have been working on so far:

  1. Primitive object model enhancements
  2. Text highlighting
  3. Metadata streams (XMP)
  4. Optional/layered contents
  5. AcroForm fields filling


PDF Clown 0.1.0 released

LATEST NEWS — PDF Clown 0.1.0 has been superseded by PDF Clown 0.1.1

This release introduces support for cross-reference-stream-based PDF files (as defined since the PDF 1.5 spec) along with page rendering and printing: a specialized tool provides a convenient way to convert PDF pages into images (aka rasterization). Lots of minor improvements have been applied too.

Last but not least: the project’s base namespace has changed to org.pdfclown

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.0%20Alpha/

enjoy!

PDF Clown 0.1.0 — XRef streams, content rasterization and lots of good stuff

NOTE — On March 4, 2011 PDF Clown 0.1.0 was released!

Hi there!

New features currently under development that will be available in the next (0.1.0) release:

  1. Cross-reference streams and object streams
  2. Version compatibility check
  3. Content rasterization
  4. Functions
  5. Page data size (a.k.a. How to split a PDF document based on maximum file size)
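
As a taste of the last topic, splitting by maximum size can be thought of as grouping consecutive pages until the next one would exceed the limit. The following is only a sketch of that idea under simplifying assumptions: SizeSplitter is a hypothetical class (not the PDF Clown API), page sizes are given as plain numbers, and shared resources (which make real page data sizes non-additive) are ignored:

```java
import java.util.ArrayList;
import java.util.List;

final class SizeSplitter {
  /**
   * Greedily groups consecutive page indexes so that the total size of
   * each group stays within maxSize. A single page larger than maxSize
   * still gets a group of its own.
   */
  static List<List<Integer>> split(long[] pageSizes, long maxSize) {
    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentSize = 0;
    for (int index = 0; index < pageSizes.length; index++) {
      // Close the current group if this page would overflow it.
      if (!current.isEmpty() && currentSize + pageSizes[index] > maxSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(index);
      currentSize += pageSizes[index];
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Made-up page sizes (in bytes) and a 100,000-byte limit.
    long[] sizes = {40_000, 70_000, 20_000, 90_000, 10_000};
    System.out.println(split(sizes, 100_000)); // prints "[[0], [1, 2], [3, 4]]"
  }
}
```

Each inner list holds the (zero-based) indexes of the pages that would end up in the same target document.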

It’s time to reveal that I decided to consolidate the project’s identity (and simplify your typing life) by changing its namespace prefix (it.stefanochizzolini.clown) in favor of the more succinct org.pdfclown: I know you were eager to strip that cluttering Italian identifier! 😉

Last week I was informed that USGS adopted PDF Clown for relayering their topographic maps and attaching metadata to them. Although on a technical note it’s stated that its use will be only transitory, as they are converging toward a solution natively integrated with their main application suite (TerraGo), nonetheless its service in such a production environment seems to be an eloquent demonstration of its reliability. 8)


PDF Clown 0.0.8: Q&A

LATEST NEWS — PDF Clown 0.0.8 functionalities are part of the latest release (PDF Clown 0.1.0). As the 0.0 version series is being decommissioned, you’re warmly invited to adopt the current 0.1 version series. Thank you!

This post collects all the relevant information about issues and questions regarding PDF Clown 0.0.8.

If you have questions on topics not covered here, please post them to the Help forum.

1. ‘GoToExternalDestination’ class missing

See Topic 3836075 in the Help forum.

2. ‘xref’ keyword not found

See Topic 3434621 in the Help forum.

3. Unknown type: Comment

See Topic 3863926 in the Help forum.

4. Text line height

See Topic 3928380 in the Help forum.