Choosing an XML Model

Posted by Mary McRae on Mar 24, 2017 10:17:16 AM

[Infographic: JATS FAQs]

After seeing my infographic above, one of my colleagues asked me why I am so passionate about JATS (Journal Article Tag Suite, NISO Z39.96) and its sister tag suite, BITS (Book Interchange Tag Suite). There are two big reasons: first, their suitability for the task, and second, the dedicated team of experts that designed them and continues to develop them. That doesn’t mean I’m not an advocate for other tag suites. I am, so long as they serve my passion for XML and for using the right vocabulary for the job at hand.

The JATS standard was developed for the STM (Scientific, Technical, and Medical) journal publishing community and contains the markup publishers need to develop, manage, produce, and deliver research articles. There are three flavors, listed from most to least restrictive:

  • Article Authoring (Orange in the infographic)
  • Journal Publishing (Blue in the infographic)
  • Journal Archiving and Interchange (Green in the infographic)

The vast majority of JATS users work with JATS Blue.  BITS, on the other hand, extends the JATS model for book-like structures and includes tags for front matter, back matter, parts, and chapters.  A third tag suite based on JATS is in use at ISO and is being standardized for use by international standards bodies and SDOs (Standards Development Organizations), which submit their standards to the international bodies for approval.

Which one should I use?

If you’re a journal publisher, any or all of these forms of JATS might be the right tag set for your journals.  It really depends on your particular situation.

Take authoring, for example. For most journal publishers, content is written by researchers who don’t author in XML, so there’s little need for the JATS Article Authoring DTD (Orange). These publishers might transform the original content (often supplied in Microsoft Word) to the JATS Journal Publishing DTD (Blue) after the content creation process. Other publishers have found they need to restrict the way their authors create content and require them to use the Article Authoring DTD. Publishers converting an older, print-only back catalog may choose the JATS Journal Archiving and Interchange DTD (Green), since its loose structure accommodates the varying styles used over the years without requiring rework.

It’s not uncommon for a publisher to use multiple flavors of JATS and BITS and then to customize them.  What makes the most sense for your team depends on your content, your workflows, and the platforms that host your content; you may have to consider alternatives to JATS.

How do I choose the one DTD that works for everyone in my organization?

Sometimes you don’t choose only one!

It may make business sense to force everyone to use the same model, but it’s certainly not required. Many of our clients use multiple DTDs effectively for different communities of users within their organizations; the key is how you build your environment.

Start by finding yourself a good consultant


Contact me at mmcrae@rsicms.com and I can help you find a consultant.
The best way to find out what you need is to hire a business analyst/content architect. After learning about your business objectives and thoroughly analyzing your content, workflows, and processes, they can recommend the best vocabularies and customizations to deploy at each stage of the process. They can also help you develop transforms that make your tag sets interoperable and make it easy to go from your authoring DTD to whatever other publishing and delivery models your organization requires.

Invest in technologies to facilitate your choices

If you publish scientific articles, journals, or related content, there are a number of great tools on the market that support JATS and JATS-based DTDs out-of-the-box.  Others help you to build the transforms needed to maximize your efficiency and create seamless delivery between authoring, publishing, and archiving.  And of course, I can’t finish without pointing out the usefulness of a native-XML publishing automation system like RSuite to bring everything together.



Topics: DITA, XML, XML Schema, DTD, JATS, S1000D, BITS

Learn XML Schemas and DTDs in 5 minutes

Posted by Christopher Hill on Mar 22, 2011 12:56:00 PM

In my previous blog post I introduced XML in 5 minutes. As a follow-up, here's another 5-minute lesson on what an XML Schema or DTD is and what it might mean to end users of XML-based systems.
In the previous post we created an XML document to describe a book. Recall that it used tags around the actual content to describe the content.
<book>
     <title>Alice's Adventures In Wonderland</title>
     <author>Lewis Carroll</author>
     <summary>This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.</summary>
</book>

We also learned how representing content in this way allows us to dramatically reduce the effort required to support multichannel publishing. It also helps a great deal with automation and moving content between systems or organizations as it eliminates some of the issues of file formatting.
What would happen to our stylesheet if someone decided to use different tags to label their content?
<book>
     <title>Alice's Adventures in Wonderland</title>
     <writer>Lewis Carroll</writer>
     <summary>This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.</summary>
</book>

What we called author in one document we called writer in another. This inconsistency might be small now, but if we didn't restrict what people named things in our XML we would have to support a potentially endless list of tags. In the previous article we wrote rules for how to make our books look good on a page. If we can't predict what tags (labels) people are going to use - such as author - then it becomes nearly impossible to reliably write rules.
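To make the problem concrete, here is a minimal Python sketch. It is not the stylesheet from the previous post; the `RULES` table and the `render` function are hypothetical stand-ins for "rules written per tag name". It only shows what happens when a document arrives with a label nobody wrote a rule for.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatting rules, keyed by tag name (an assumption for
# illustration, not the actual stylesheet from the earlier post).
RULES = {
    "title": lambda text: text.upper(),
    "author": lambda text: "by " + text,
    "summary": lambda text: text,
}

def render(xml_text):
    """Format each child of <book>; also collect tags that have no rule."""
    book = ET.fromstring(xml_text)
    lines, unknown = [], []
    for child in book:
        rule = RULES.get(child.tag)
        if rule is None:
            unknown.append(child.tag)  # nobody wrote a rule for this label
        else:
            lines.append(rule(child.text or ""))
    return lines, unknown

first = ("<book><title>Alice's Adventures In Wonderland</title>"
         "<author>Lewis Carroll</author></book>")
second = ("<book><title>Alice's Adventures in Wonderland</title>"
          "<writer>Lewis Carroll</writer></book>")

print(render(first))
print(render(second))
```

The first document is formatted completely. In the second, the writer's name never reaches the output, because the only rule that could have handled it is filed under author.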
So even though XML helps us get a consistent base format for content, we need more help to get predictability and consistency.
Enter the concept of a DTD or Schema. DTDs and Schemas are ways that systems can impose rules on the XML itself. You can describe what tags can be used, where they can be used, and put restrictions on the content of those tags. Two common standards for describing these restrictions are Document Type Definitions (DTDs) and XML Schemas. We won't get into the syntax or pros and cons of the two approaches. For our 5-minute lesson we can just assume they are both ways to enforce consistent labeling of the tags in our XML documents.
Here in English is how we might communicate the requirements for our flavor of XML:
  1. Put everything inside a book tag. You can only have one of these.
  2. The first thing you put in a book is a title tag containing the title text. You cannot leave this out.
  3. The second thing you put in a book is an author tag containing the author name. You must have at least one author. If there are more, you can repeatedly add more tagged authors.
  4. After all the tagged authors, you can add a summary tag. This is optional - leave it out if you want. But you can have at most 1 summary.
This is essentially what a DTD or XML Schema does, although they do this in a language friendlier to computers.
DTDs/XML Schemas allow you to specify the rules for the structure of your XML documents.
You can think of XML Schemas or DTDs as a means to create a template that all valid documents must follow.

These rules can now be applied to the two examples above. The first example follows the rules, so we would say that the first XML document is valid. That means it conforms to the rules. The second document, when tested with the above rules, would be invalid. The presence of tagged content labeled "writer" is not allowed by the rules. 
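For readers who like to see it run, here is a rough Python sketch of that check. It is not a real DTD or XML Schema validator: it only looks at the order of the tags directly inside book, and the names `CONTENT_MODEL` and `is_valid` are made up for this example. The comment shows how a DTD might state the same four rules.

```python
import re
import xml.etree.ElementTree as ET

# The four rules as a pattern over the child tag names of <book>.
# A DTD would state the same content model roughly as:
#   <!ELEMENT book (title, author+, summary?)>
CONTENT_MODEL = re.compile(r"title,(author,)+(summary,)?")

def is_valid(xml_text):
    """Return True if the tag structure follows the four rules."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":  # rule 1: everything lives inside one <book>
        return False
    # Flatten the children into a string such as "title,author,summary,".
    sequence = "".join(child.tag + "," for child in root)
    # Rules 2-4: one title, then one or more authors, then an optional summary.
    return CONTENT_MODEL.fullmatch(sequence) is not None

first = ("<book><title>Alice's Adventures In Wonderland</title>"
         "<author>Lewis Carroll</author><summary>...</summary></book>")
second = ("<book><title>Alice's Adventures in Wonderland</title>"
          "<writer>Lewis Carroll</writer><summary>...</summary></book>")

print(is_valid(first))
print(is_valid(second))
```

The summaries are shortened to "..." here only to keep the sample small. The first document passes and the second fails, because writer does not fit anywhere in the allowed sequence.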
In the XML world, XML Schemas or DTDs are used in a lot of scenarios, including:
  • XML editors know what is allowed by the rules and prevent writers from making mistakes
  • XML programs test incoming content and indicate when the rules are being broken, preventing formatting errors
  • XML stylesheets can be much more easily written as they only process valid content and don't have to worry about rulebreakers
  • If I want to merge my book content with yours, we can look at the rules and decide what adjustments will need to be made to bring our rules together
  • Industries can agree on the rules for types of content. For example, we might create a set of rules to represent newspaper articles and adopt it as an industry standard, enabling anyone to exchange newspaper articles without having to modify the content.
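As a small illustration of the "adjustments" mentioned in the list above, here is a hedged Python sketch that renames one vocabulary's tags to match another's. The `TAG_MAP` table and the `harmonize` function are hypothetical; a production system would more likely use a purpose-built transform, but the idea is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from another team's tag names to ours.
TAG_MAP = {"writer": "author"}

def harmonize(xml_text):
    """Rename any mapped tags and return the adjusted XML as a string."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Tags without an entry in TAG_MAP are left unchanged.
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

theirs = "<book><title>Alice</title><writer>Lewis Carroll</writer></book>"
print(harmonize(theirs))
```

Because both sides have written their rules down, the mapping table can be agreed once and then applied to every incoming document.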
So when you hear someone rambling on about an XML Schema or DTD, they are really just talking about the rules governing how the particular XML document is to be structured.
That's XML Schemas and DTDs in 5 minutes. In the coming weeks watch the blog for more quick lessons on XML-related technologies.

Topics: publishing, XML, XML Schema, DTD, 5-minute-series
