DocBook to ConTeXt XSL stylesheets

By Jan Tošovský on Apr 10, 2016

Let us introduce a set of stylesheets for converting a DocBook XML source into the format of the ConTeXt typesetting system, which can then be processed into PDF output.

It is an alternative to the more common XML -> XSL-FO -> PDF route.

The main goal is to offer a way to publish books (novels, prose, fiction) from a DocBook XML source with the best available typographic quality.

Sample output

To help you adopt this technology, a sample DocBook book has been released together with a real customization layer. For the next steps, see the Usage section below.

For your convenience, the result of the XSLT transformation can be downloaded directly in two forms:

  • Zip archive (0.3 MB) containing the ConTeXt source together with all the images. It needs to be unpacked and then converted into the PDF output as described in the Usage section (step #4).

  • Final PDF (3.9 MB).

This output needs further tweaks, namely fixing lines that exceed the text area by setting a less strict alignment tolerance. However, the majority of these cases are caused by the inability to properly hyphenate the dummy words. In real content such occurrences should be rare.

The red grid is provided as a separate PDF layer and can be switched off in, e.g., Acrobat Reader. A drawback of this visualization is that blue and yellow boxes are shown in the back-of-the-book index. This will be improved in an upcoming ConTeXt version. In any case, it is a minor issue, as this visualization is used for debugging purposes only.

Motivation

  • While the most natural conversion from XML to PDF is via XSL-FO intermediate markup, no XSL-FO engine offers advanced typographic features. This method is hence disqualified for book production with high typographic standards.

  • For professional workflows, the XML import capabilities of Adobe InDesign can be employed, but even InDesign has its limits, namely in footnote processing.

  • Fortunately, there are TeX-based systems that are flexible enough and also open source. The most advanced seems to be the ConTeXt typesetting system. While the dedicated dbcontext project provides a decent set of DocBook to ConTeXt XSL stylesheets, it doesn't reflect the last 10+ years of ConTeXt development.

This project started as a dbcontext fork for one specific book. Although these updates were supplied to the original dbcontext author, it was decided to create an entirely new set of stylesheets.

Main reasons

  • simplifying the tool setup for the end user (no need to patch dbcontext with these updates)

  • utilizing the current DocBook XSL stylesheets distribution (localizations, string manipulations)

  • including verified stylesheets only

Main goals

  • to generate PDF output with advanced typographic features

  • to utilize the current DocBook stylesheets infrastructure

  • to potentially become an integral part of the DocBook stylesheets distribution

Known limitations

  1. Only a very narrow subset of elements is supported, namely chapter, section, para, footnotes, images, tables, index and a few others. While not numerous, they are still sufficient for the majority of non-technical books.

  2. Advanced index features are not supported in DocBook v5.x, as the syntax has changed substantially since v4.x.

Usage

  1. Integrate the DocBook to ConTeXt XSL stylesheets into the DocBook XSL stylesheets.

  2. Optionally make your own stylesheet customizations.

  3. Run the transformation via an XSLT processor.

  4. Convert the ConTeXt source into a PDF output.
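
Steps 3 and 4 can be scripted. What follows is only a minimal sketch, not part of the project itself. It assumes that xsltproc is the XSLT processor and that the context command is on the PATH. The stylesheet path and the helper names (xslt_command, context_command, build) are hypothetical and need to be adapted to your own setup.

```python
from pathlib import Path
import shutil
import subprocess

# Hypothetical location of the ConTeXt driver stylesheet after step 1;
# adjust it to wherever the stylesheets live in your DocBook XSL tree.
STYLESHEET = Path("docbook-xsl/context/docbook.xsl")


def xslt_command(source, stylesheet, output):
    """Build the xsltproc call for step 3 (DocBook XML -> ConTeXt source)."""
    return ["xsltproc", "--output", str(output), str(stylesheet), str(source)]


def context_command(tex_file):
    """Build the ConTeXt call for step 4 (ConTeXt source -> PDF)."""
    return ["context", str(tex_file)]


def build(source, stylesheet=STYLESHEET):
    """Run steps 3 and 4 for one DocBook file and return the expected PDF path."""
    source = Path(source)
    tex_file = source.with_suffix(".tex")

    # Fail early with a readable message if a required tool is missing.
    for tool in ("xsltproc", "context"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")

    # Step 3: transform the DocBook XML into ConTeXt source.
    subprocess.run(xslt_command(source, stylesheet, tex_file), check=True)

    # Step 4: typeset next to the source so that the PDF and the
    # auxiliary files end up in the same directory as the .tex file.
    subprocess.run(context_command(tex_file.name), cwd=tex_file.parent, check=True)

    return tex_file.with_suffix(".pdf")
```

With both tools installed, calling build("book.xml") should leave book.tex and book.pdf next to the source file. Any other XSLT processor can be substituted by changing xslt_command.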

Future

The long-term plan is to continuously extend the element coverage to support more complex documents.

Feel free to speed up this process with your pull requests :-)


Acknowledgement

  • Norman Walsh, Bob Stayton and Jirka Kosek (DocBook developers & evangelists)

  • Hans Hagen (ConTeXt developer)

  • Wolfgang Schuster (ConTeXt community supporter)

  • Ben Guillon (dbcontext developer)