Lately I’ve found myself doing more discussion of DITA, so it is time for another in the 5-minute-series. If you are new to XML it might be helpful to start with the previous two posts on XML and Schemas before continuing.
In the previous posts I discussed how XML isn’t a specific language, but is instead a set of rules governing the syntax of languages that may be invented. The invention of XML came out of a need to be able to describe content. Word processors and desktop publishers mostly focused on the formatting of content. When you create new content in these tools you do so as a part of the layout and formatting process. With XML, you instead try to describe what the content you are entering is, for example a paragraph, a chapter, a book, an article, a caption or whatever.
XML provides a common syntax for creating languages to describe your content, but does not specify the actual grammar. As described in detail in the previous post in this series, XML Schemas or DTDs are used to specify the exact labels and grammar of a particular type of XML.
While you can invent your own labels and grammar based upon XML, doing so means that unless others adopt your format, you will have to customize editing tools to understand your particular vocabulary.
Instead of always creating a vocabulary from scratch, many users of XML instead adopt a shared standard. Standards exist to represent most any data you can think of, whether it be recipes, musical scores, articles, chapters, books or anything else. These standards can be shared, and tools can be created to create, edit, manage and format based on the standard. If a community exists around my particular flavor of XML, we can share tools and techniques that can mean reduced effort required to deploy content solutions.
DITA, an acronym for Darwin Information Typing Architecture, is an XML language that is extensible and can be adapted to a range of uses. DITA is based on the concept of topics. A topic is a unit of information that typically can be read in isolation or inserted into a larger document. In order to stream together topics, DITA uses the concept of a map file. A map file is simply an XML file that acts as a sort of table of contents stringing together a series of topic files.
The term “topic” is generic. DITA allows, however, the generic topic to be adapted to represent more specific structures. The basic DITA specification includes Concept, Task and Reference. These content units are more specific versions of the generic topic. They can be handled with special rules if you want. But if you don’t haven special rules, they can be also treated more generically as topics.
Benefits of a common vocabulary
Having a common vocabulary means that users of the vocabulary can share information with each other and share tools and code used to handle the content. For example, if you use a DITA-based format, there are a number of editing tools that can be used to edit your content. Tools used to process the content can also be shared. For example, DITA includes the code and stylesheets needed to create PDF, HTML and other output formats, and the community is constantly evolving. New formats may appear and other DITA-based solutions can take advantage of the tools to support the new format without needing to modify their existing processes.
For DITA, the community provides the DITA Open Toolkit. This toolkit includes a variety of transformations that can take DITA content and render it in HTML, PDF, and other formats. It also provides an extensible architecture. If you have a customized version of DITA, you can create a plug-in that can enable DITA solutions to handle the specific requirements of your customizations. Toolkit plugins can be used to configure editing tools, extend the rules of DITA, or modify the included stylesheets used to render content so that they can account for a most specific vocabulary adapted from the base DITA stylesheets. Any DITA tool can process content even if it is based on proprietary extensions because all of those proprietary extensions are mapped to more generic DITA structures. So if I use a DITA-based vocabulary that defines a “chapter,” systems that do no understand “chapter” can always treat the encoded content as a more generic “topic.”
So while XML is a set of rules for creating a particular language to encode your content, DITA is a particular language that was designed to be able to be extended to more specific uses that still share a common grammar. DITA provides a base set of stylesheets for rendering your content in a variety of specific formats. Many XML tools exist to process any DITA-based document, and most provide extension points so that you can adapt the tool to a more specific DITA-based language without having to start from scratch.
Tools to edit DITA documents can edit any vocabulary derived from DITA without modification, and can be extended to support more specific vocabulary structures if desired. At RSI Content Solutions we have content management systems with support for DITA that provides a range of features that make it much quicker to deploy a DITA solution without starting from scratch. Our solutions allow editing, transformation, as well as the ability to reuse content in different contexts if needed. So while XML is a set of rules governing the structure of an infinite variety of languages, DITA is a topic-based XML language used for representing content. Although you can use DITA without any modifications, many organizations wish to encode content in less generic manner. DITA has the advantage of allowing more specific content structures to be derived from the existing generic structures if needed. This means that if you need to create an XML vocabulary you aren’t starting from scratch and you are providing a fallback mechanism for systems not aware of the specifics of your particular vocabulary.