Choosing an XML Model

Posted by Mary McRae on Mar 24, 2017 10:17:16 AM

[Infographic: JATS FAQs]

After seeing my infographic above, one of my colleagues asked me why I am so passionate about JATS (Journal Article Tag Suite, NISO Z39.96) and its sister tag suite, BITS. There are two big reasons: first, their suitability for the task, and second, the dedicated team of experts that designed them and continues to develop them. That doesn’t mean I’m not an advocate for other tag suites. I am, as long as they serve my passion for XML and for using the right vocabulary for the job at hand.

The JATS standard has been developed for the STM (Scientific, Technical, and Medical) journal publishing community and contains the markup necessary for publishers to develop, manage, produce, and deliver research articles.  There are 3 flavors, listed from most to least restrictive:

  • Article Authoring (Orange in the infographic)
  • Journal Publishing (Blue in the infographic)
  • Journal Archiving and Interchange (Green in the infographic)
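When several flavors coexist in one repository, it helps to know which one a given file claims to be. Here is a minimal sketch, assuming each file carries a standard DOCTYPE declaration: it reads the public identifier and maps it to a flavor. The marker substrings are assumptions modeled on the JATS 1.1 public identifiers, and the function name `jats_flavor` is hypothetical; check the markers against the DTD versions you actually use.

```python
import re
from typing import Optional

# Assumed distinguishing substrings of the JATS public identifiers, e.g.
# "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN".
FLAVOR_MARKERS = {
    "Article Authoring DTD": "Article Authoring (Orange)",
    "Journal Publishing DTD": "Journal Publishing (Blue)",
    "Journal Archiving and Interchange DTD": "Journal Archiving and Interchange (Green)",
}

# Captures the quoted public identifier from a DOCTYPE declaration.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"')


def jats_flavor(xml_text: str) -> Optional[str]:
    """Return the JATS flavor named in the DOCTYPE, or None if it is not recognized."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    public_id = match.group(1)
    for marker, flavor in FLAVOR_MARKERS.items():
        if marker in public_id:
            return flavor
    return None


SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing '
    'DTD v1.1 20151215//EN" "JATS-journalpublishing1.dtd">\n'
    "<article/>"
)
print(jats_flavor(SAMPLE))  # Journal Publishing (Blue)
```

A file with no DOCTYPE, or one declaring something else such as BITS, comes back as None, which is a reasonable cue to route it for manual review.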

The vast majority of JATS users work with JATS Blue.  BITS, on the other hand, extends the JATS model for book-like structures and includes tags for front matter, back matter, parts, and chapters.  A third tag suite based on JATS is in use at ISO and is being standardized for use by international standards bodies and SDOs (Standards Development Organizations), which submit their standards to the international bodies for approval.

Which one should I use?

If you’re a journal publisher, any or all of these forms of JATS might be the right tag set for your journals.  It really depends on your particular situation.

Take authoring, for example. For most journal publishers, content is written by researchers who don’t author in XML, so there’s little need for the JATS Authoring DTD. These publishers might transform the original content (often supplied in Microsoft Word) to the JATS Journal Publishing DTD (Blue) after the content creation process. Other publishers have found they need to restrict the way their authors create content and require them to use the JATS Authoring DTD (Orange). Publishers converting an older, print-only back catalog may choose the JATS Archiving and Interchange DTD (Green), since its loose structure supports the varying styles used over the years and does not require rework.

It’s not uncommon for a publisher to use multiple flavors of JATS and BITS and then customize them. What makes the most sense for your team depends on your content, your workflows, and the platforms that host your content, and in some cases you may need to consider alternatives to JATS.

How do I choose the one DTD that works for everyone in my organization?

Sometimes you don’t choose only one!

It may make business sense to force everyone to use the same model, but it’s certainly not required. Many of our clients use multiple DTDs effectively for different communities of users within their organizations; the key is how you build your environment.

Start by finding yourself a good consultant


Contact me at mmcrae@rsicms.com and I can help you find a consultant.
The best way to find out what you need is to hire a business analyst/content architect. After he learns about your business objectives and does a thorough analysis of your content, workflows, and processes, he can recommend the best vocabularies and customizations to deploy at each stage of the process. He can also help you develop transforms that make your tag sets interoperable and make it easy to go from your authoring DTD to whatever other publishing and delivery models your organization requires.
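As an illustration of the kind of transform involved, here is a minimal sketch of one of the simplest cases: moving a Journal Publishing (Blue) article into an archive that uses Archiving and Interchange (Green). It assumes the article is already valid Blue and that, because Green is the least restrictive flavor, the content validates against Green unchanged, so only the document type declaration needs to change. The identifiers in `GREEN_DOCTYPE` and the function name `blue_to_green` are assumptions for illustration; always re-validate the output.

```python
import re

# Assumed JATS 1.1 Archiving and Interchange declaration; match it to the
# DTD version actually installed in your environment.
GREEN_DOCTYPE = (
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving '
    'and Interchange DTD v1.1 20151215//EN" "JATS-archivearticle1.dtd">'
)

# Matches a DOCTYPE with a public and a system identifier (no internal subset).
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+article\s+PUBLIC\s+"([^"]*)"\s+"[^"]*"\s*>')


def blue_to_green(xml_text: str) -> str:
    """Swap a Journal Publishing (Blue) DOCTYPE for an Archiving (Green) one.

    Only the declaration changes; validate the result against Green before use.
    """
    match = DOCTYPE_RE.search(xml_text)
    if match is None or "Journal Publishing DTD" not in match.group(1):
        raise ValueError("input does not declare the JATS Journal Publishing DTD")
    return xml_text[: match.start()] + GREEN_DOCTYPE + xml_text[match.end():]


BLUE = (
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing '
    'DTD v1.1 20151215//EN" "JATS-journalpublishing1.dtd">\n<article/>'
)
print(blue_to_green(BLUE).splitlines()[-1])  # <article/>
```

Transforms in the other direction (Green to Blue, or anything to Orange) are real content conversions rather than declaration swaps, which is where a content architect earns their keep.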

Invest in technologies to facilitate your choices

If you publish scientific articles, journals, or related content, there are a number of great tools on the market that support JATS and JATS-based DTDs out-of-the-box.  Others help you to build the transforms needed to maximize your efficiency and create seamless delivery between authoring, publishing, and archiving.  And of course, I can’t finish without pointing out the usefulness of a native-XML publishing automation system like RSuite to bring everything together.


Topics: DITA, XML, XML Schema, DTD, JATS, S1000D, BITS

HarperCollins and RSI Content Solutions Contribute EPUB 3 Transformation to DITA for Publishers

Posted by Sarah Silveri on Jan 14, 2015 12:55:00 PM

HarperCollins Publishers today announced it has finalized the implementation of EPUB 3 for its e-book production; all new e-books are now being distributed in EPUB 3 with backward compatibility to EPUB 2. The company has implemented RSuite CMS workflows and EPUB transformation to allow for continuous support of the format. HarperCollins and RSI Content Solutions will contribute the EPUB 3 transformation to the DITA for Publishers open-source community so that others may benefit.

“Implementing EPUB 3 enables us to support industry standards while increasing accessibility for the disabled community,” said Leslie Hulse, vice president of business development. “We look forward to the enhanced benefits this new format will provide readers.”

“Working with RSuite allowed us to upgrade to an XML-based digital workflow that facilitates more automated e-book production,” said Leslie Padgett, senior director, global digital operations. “We are excited to contribute the work we’ve done on the transformation to the DITA for Publishers project so the community can benefit from our team’s good work.”  

“The joint team can be proud of the final work product that resulted from our close collaboration,” stated Paul Eisenberg, director, professional services at RSI Content Solutions. “HarperCollins has demonstrated leadership by contributing the EPUB 3 transform back to the DITA for Publishers community.”  

About RSI Content Solutions

RSI Content Solutions accelerates publishers’ revenue and profit growth through better content management. Since 2000, RSI Content Solutions has provided publishers, media companies, and technical publishers with award-winning software solutions that transform their businesses, giving clients the ability to deliver content in any format, to any channel, at any time. For more information, please visit www.rsuitecms.com.

About HarperCollins Publishers

HarperCollins Publishers is the second largest consumer book publisher in the world, with operations in 18 countries. With nearly two hundred years of history and more than 65 unique imprints around the world, HarperCollins publishes approximately 10,000 new books every year, in over 30 languages, and has a print and digital catalog of more than 200,000 titles. Writing across dozens of genres, HarperCollins authors include winners of the Nobel Prize, the Pulitzer Prize, the National Book Award, the Newbery and Caldecott Medals and the Man Booker Prize. HarperCollins, headquartered in New York, is a subsidiary of News Corp and can be visited online at corporate.HC.com.

Topics: RSuite, DITA, HarperCollins, ePub3

DocZone | Managing Challenges on the Road to DITA

Posted by Christopher Hill on May 8, 2014 1:06:00 PM

Last week we attended the DITA North America conference in my home city of Seattle. It is always interesting to have the opportunity to interact with the DITA community. During a birds-of-a-feather lunch gathering I had some interesting conversations about the promise and perils of content re-use. Many of those in the conversation were new to DITA. As we discussed the challenges around content re-use in DITA, I was reminded of the many common pitfalls that subvert the efficiencies DITA promises.

The technical challenges of re-use

One of DITA's more attractive features is its technical support for reusing content within or across publications. Traditional authoring and page layout tools tended to have limited support for re-use, and the mechanisms they did offer were often proprietary. This is why, more often than not, those re-use features went unused in favor of copy and paste.

Re-use is baked into DITA

The DITA specification has effective, open concepts of re-use baked into its design. This means that using DITA for content authoring removes the technical hurdles that limit the re-use of content, and its topic-based approach encourages the creation of reusable content. Many organizations that move to DITA are therefore surprised when, even after the technical hurdles are solved, content is still not widely re-used.
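To make "baked in" concrete: one of DITA's reuse mechanisms is the `conref` attribute, where an element points at an element elsewhere (for example, `warnings.dita#warnings/hot` means the element with id `hot` in topic `warnings` inside `warnings.dita`) and a processor pulls that content in at publish time. The sketch below is a deliberately simplified resolver, assuming a hypothetical in-memory `LIBRARY` dictionary in place of files on disk. It ignores attribute merging, nested conrefs, `conkeyref`, and ranges, all of which real processors such as the DITA Open Toolkit handle.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory repository standing in for files on disk or in a CMS.
LIBRARY = {
    "warnings.dita": ET.fromstring(
        '<topic id="warnings"><title>Shared warnings</title><body>'
        '<note id="hot" type="caution">Surfaces may be hot.</note>'
        "</body></topic>"
    ),
}


def find_target(conref: str) -> ET.Element:
    """Locate the element addressed by 'file.dita#topicid' or 'file.dita#topicid/elementid'."""
    file_name, _, fragment = conref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    topic = LIBRARY[file_name]
    if topic.get("id") != topic_id:
        raise KeyError(f"topic {topic_id!r} not found in {file_name}")
    if not element_id:
        return topic
    for elem in topic.iter():
        if elem.get("id") == element_id:
            return elem
    raise KeyError(f"element {element_id!r} not found in {file_name}")


def resolve_conrefs(root: ET.Element) -> None:
    """Replace every element carrying @conref with a copy of the element it points to."""
    # Collect first, then modify, so the tree never changes while we iterate it.
    pending = [(parent, child) for parent in root.iter() for child in parent
               if child.get("conref")]
    for parent, child in pending:
        replacement = copy.deepcopy(find_target(child.get("conref")))
        replacement.tail = child.tail  # keep any text that followed the reference
        index = list(parent).index(child)
        parent.remove(child)
        parent.insert(index, replacement)


page = ET.fromstring(
    '<topic id="setup"><title>Setup</title><body>'
    '<note conref="warnings.dita#warnings/hot"/></body></topic>'
)
resolve_conrefs(page)
print(page.find("body/note").text)  # Surfaces may be hot.
```

Because the warning lives in one place, correcting its wording once updates every deliverable that references it the next time they are published.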

Non-technical challenges

Many challenges to content re-use exist that are not technical. Unless these are addressed as part of a DITA project, wide re-use of content may remain an elusive vision. Here are some of the critical challenges that should at least be considered when trying to create an environment of content re-use.

Delivery assumptions

Content delivery today is shifting from traditional print to a range of digital formats. Yet content is often authored from the perspective of a single deliverable. Even though the tool may allow any section of a book to be re-used, you may find that the way the content is actually written prevents such use. Imagine a section of a technical manual that refers to other content in the manual. What happens if that section is re-used in a manual that does not contain the referenced content? And are such references required to be clearly tagged so that tools can help manage them when needed? Often this area of training is not well addressed, and authors continue to create content dependencies that are difficult to manage.
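Tagged references are what make this checkable. As a rough sketch, assuming cross-references use DITA's `xref` element with a relative `href` (and `scope="external"` for outside links), the function below lists every reference in a deliverable whose target topic is not part of that deliverable's map. The function name `dangling_xrefs` and the in-memory topics dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def dangling_xrefs(map_xml: str, topics: Dict[str, str]) -> List[Tuple[str, str]]:
    """List (topic file, href) pairs whose target topic is not in the map.

    Assumes hrefs are relative file names in one folder, and that every topic
    the map references is present in `topics`.
    """
    ditamap = ET.fromstring(map_xml)
    in_map = {ref.get("href") for ref in ditamap.iter("topicref") if ref.get("href")}
    problems = []
    for topic_file in sorted(in_map):
        topic = ET.fromstring(topics[topic_file])
        for xref in topic.iter("xref"):
            if xref.get("scope") == "external":
                continue  # web links are outside the deliverable by design
            href = xref.get("href", "")
            target_file = href.partition("#")[0]
            if target_file and target_file not in in_map:
                problems.append((topic_file, href))
    return problems


MAP = '<map><topicref href="a.dita"/><topicref href="b.dita"/></map>'
TOPICS = {
    "a.dita": '<topic id="a"><title>A</title><body><p>See <xref href="c.dita#c"/>.</p></body></topic>',
    "b.dita": '<topic id="b"><title>B</title><body><p>See <xref href="a.dita#a"/>.</p></body></topic>',
}
print(dangling_xrefs(MAP, TOPICS))  # [('a.dita', 'c.dita#c')]
```

Running a check like this whenever a topic is added to a new map turns a silent broken link into a review item.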

Authors who write content units only in the context of a deliverable will be tempted to write content that is dependent on the particular deliverable. You can't assume that they'll "figure it out" or that a tool can magically change such behaviors. You need to have a plan to train users not only on the technical use of the new tools but also the conceptual framework that DITA brings.

Oversimplification

Another temptation is to simplify your DITA implementation to head off potential dissatisfaction from your users. An XML editor that hides details and "acts just like Word" sounds easier for users to learn, but if such a tool becomes an excuse to hide the details of DITA from content creators, you may actually impede the realization of many of DITA's benefits. One of the important foundations of DITA is the separation of content from presentation. This is at the heart of its power to support single-source publishing.

In such a scenario, the concept of a WYSIWYG editor makes little sense. XML editors that provide a word-processor-like view are better thought of as tools that present a style of XML useful to authors while they create content. So when you design stylesheets for your authoring tool, you probably do not want them to match your print output. Doing so risks obscuring many important structures that are not visible in a specific print output but may be important foundations for re-use and linking. Instead of trying to match the authoring tool to a specific output format, consider creating styles that make apparent the information authors need to manage in order to create re-usable content. Metadata, conditional tagging, and other supporting structures should be visible and easy to access. If they are hidden behind a context menu or in an external panel, users may find them cumbersome to use. Worse, unexpected behaviors that are not readily apparent to someone re-using the content may occur. Such surprises often lead users to avoid re-using content they can't readily understand.
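For example, conditional tagging in DITA uses attributes such as `audience`, `platform`, `product`, `otherprops`, and `props`, and a DITAVAL file tells the processor which values to exclude for a given output. The sketch below shows the basic exclusion rule, following DITA's conditional-processing logic: an element is dropped when every value of one of its conditional attributes is excluded. It is simplified: it ignores flagging, default actions, and the root element, and the helper names are hypothetical. It also shows why hidden conditions surprise people: the Windows-only phrase simply vanishes from the output.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

# DITA attributes commonly used for conditional processing.
CONDITION_ATTS = ("audience", "platform", "product", "otherprops", "props")


def load_exclusions(ditaval_xml: str) -> Set[Tuple[str, str]]:
    """Read (attribute, value) pairs marked action="exclude" from a DITAVAL file."""
    return {
        (prop.get("att"), prop.get("val"))
        for prop in ET.fromstring(ditaval_xml).iter("prop")
        if prop.get("action") == "exclude"
    }


def is_excluded(elem: ET.Element, rules: Set[Tuple[str, str]]) -> bool:
    """An element is excluded if all values of any one conditional attribute are excluded."""
    for att in CONDITION_ATTS:
        values = elem.get(att, "").split()
        if values and all((att, value) in rules for value in values):
            return True
    return False


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child but preserve any text that followed it."""
    if child.tail:
        siblings = list(parent)
        position = siblings.index(child)
        if position > 0:
            previous = siblings[position - 1]
            previous.tail = (previous.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def filter_tree(root: ET.Element, rules: Set[Tuple[str, str]]) -> None:
    """Drop excluded descendants of root in place (the root itself is not tested)."""
    for parent in list(root.iter()):
        for child in list(parent):
            if is_excluded(child, rules):
                remove_keeping_tail(parent, child)


rules = load_exclusions('<val><prop att="platform" val="windows" action="exclude"/></val>')
doc = ET.fromstring(
    '<topic id="t"><title>Settings</title><body><p>Open '
    '<ph platform="windows">Control Panel</ph><ph platform="mac">System Settings</ph>'
    " now.</p></body></topic>"
)
filter_tree(doc, rules)
print("".join(doc.find("body/p").itertext()))  # Open System Settings now.
```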

Access to properly configured tools and training is critical if you expect your users to take advantage of DITA's support for re-use. Not everyone needs to have a full set of content analysis skills in order to use a DITA system, but they do need to know the details of how your particular business environment is configured.

Lack of content management

Many DITA projects start out using the DITA Open Toolkit and a file system or shared folder. But in such a scenario, problems can occur, specifically around re-use. Shared folders rarely provide controlled access to content. Content in directories often has no mechanism for tracking versions, controlling updates, or ensuring that two users do not modify the same content at once (or overwrite each other's work). When a small DITA project is rolled out to a larger set of users, these problems worsen as content re-use increases. A team that runs into these problems quickly begins creating copies of content anyway to avoid them, which eliminates any potential benefit of content re-use.
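A content management system addresses the overwrite problem with check-in rules. The class below is a minimal, hypothetical sketch of one common approach, optimistic concurrency: each check-out records the version it started from, and a check-in based on an outdated version is rejected instead of silently overwriting a colleague's changes. Every version is kept, so history is never lost. Real systems add locking, permissions, and persistent storage on top of this.

```python
from typing import Dict, List, Tuple


class StaleCheckInError(Exception):
    """Raised when a check-in is based on an outdated copy of a topic."""


class TopicStore:
    """Hypothetical in-memory store that keeps versions and rejects stale check-ins."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def check_out(self, path: str) -> Tuple[str, int]:
        """Return the latest content and its version number (1-based)."""
        history = self._versions[path]
        return history[-1], len(history)

    def check_in(self, path: str, content: str, based_on: int) -> int:
        """Store a new version if it builds on the latest one; return its number.

        Use based_on=0 to create a topic that does not exist yet.
        """
        history = self._versions.setdefault(path, [])
        if based_on != len(history):
            raise StaleCheckInError(
                f"{path} is at version {len(history)}, but this edit started from {based_on}"
            )
        history.append(content)
        return len(history)


store = TopicStore()
store.check_in("install.dita", "<topic id='install'/>", based_on=0)
_, alice_version = store.check_out("install.dita")
_, bob_version = store.check_out("install.dita")
store.check_in("install.dita", "<topic id='install'>Alice</topic>", alice_version)
try:
    store.check_in("install.dita", "<topic id='install'>Bob</topic>", bob_version)
except StaleCheckInError as err:
    # Bob's edit is refused rather than silently overwriting Alice's change.
    print(err)
```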

Content management is also needed to organize content components. Content management systems often support metadata and search features that make it much easier to locate potential content for re-use. Relying on a shared folder typically means that as the number of content components increases, users' ability to find re-usable components decreases. At some point, users may decide it's easier to write new content than to deal with these challenges.
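As a small illustration of why metadata matters, the sketch below indexes DITA topics by the `keyword` elements in their `prolog` and then finds topics tagged with all of the requested terms. The function names and the in-memory topics dictionary are hypothetical; a real CMS would maintain such an index for you and add full-text search.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set


def build_keyword_index(topics: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased prolog keyword to the topic files that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, xml_text in topics.items():
        root = ET.fromstring(xml_text)
        for prolog in root.iter("prolog"):  # only metadata, not inline body keywords
            for keyword in prolog.iter("keyword"):
                if keyword.text and keyword.text.strip():
                    index[keyword.text.strip().lower()].add(path)
    return index


def find_reusable(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return topic files tagged with every requested term, sorted by name."""
    if not terms:
        return []
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))


TOPICS = {
    "battery-warning.dita": (
        '<topic id="bw"><title>Battery warning</title><prolog><metadata><keywords>'
        "<keyword>Battery</keyword><keyword>Safety</keyword>"
        "</keywords></metadata></prolog></topic>"
    ),
    "battery-specs.dita": (
        '<topic id="bs"><title>Battery specifications</title><prolog><metadata>'
        "<keywords><keyword>battery</keyword></keywords></metadata></prolog></topic>"
    ),
}
index = build_keyword_index(TOPICS)
print(find_reusable(index, "battery", "safety"))  # ['battery-warning.dita']
```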

Unless you have a very small, tightly coordinated team, content management is an important tool for maximizing your investment in DITA.

Earning your wings

These examples represent just a few of the many potential challenges that can surprise newcomers to DITA. Some may be addressed through training. Some may improve with experience. But it is very hard for organizations new to DITA to make content decisions before they have sufficient experience. And there is often a chicken-and-egg problem: the promise of DITA remains unrealized without an investment in the tools to use it properly. Such investment is often risky, as it may require spending on integration, licenses, and products unfamiliar to your organization.

In the last several years, however, tools like our own DocZone product have become available that provide preconfigured, integrated DITA tools on a hosted basis. For a monthly fee, you can license the use of DocZone's best-of-breed tools in an already integrated and configured environment. If your organization is new to DITA and/or content management, such a solution can enable users to build experience on a professional platform without the cost typically associated with setting up such an environment yourself. Because you are in a proven, production-ready environment, you are less susceptible to many of the problems typically encountered when setting up a new DITA environment. A hosted solution can also scale to address the requirements of small and large teams. The use of DITA also means that your content can be used with other DITA-based tools if needed.

Trying to achieve the full promise of DITA is a challenge whether you try learning in an ineffective shared-folder environment or try integrating an end-to-end DITA toolset for the first time. A hosted option like DocZone is a great alternative to taking on these challenges alone.

Topics: content management, DITA, DocZone

RSI's 5-minute-series: Learn DITA in 5 minutes

Posted by Christopher Hill on Oct 24, 2013 8:59:00 AM

Lately I’ve found myself discussing DITA more often, so it is time for another installment in the 5-minute-series. If you are new to XML, it might help to start with the previous two posts on XML and Schemas before continuing.

In the previous posts I discussed how XML isn’t a specific language, but is instead a set of rules governing the syntax of languages that may be invented. The invention of XML came out of a need to be able to describe content. Word processors and desktop publishers mostly focused on the formatting of content. When you create new content in these tools you do so as a part of the layout and formatting process. With XML, you instead try to describe what the content you are entering is, for example a paragraph, a chapter, a book, an article, a caption or whatever. 

XML provides a common syntax for creating languages to describe your content, but does not specify the actual grammar. As described in detail in the previous post in this series, XML Schemas or DTDs are used to specify the exact labels and grammar of a particular type of XML.

While you can invent your own labels and grammar based upon XML, doing so means that unless others adopt your format, you will have to customize editing tools to understand your particular vocabulary.

Instead of always creating a vocabulary from scratch, many XML users adopt a shared standard. Standards exist to represent almost any data you can think of, whether recipes, musical scores, articles, chapters, books, or anything else. These standards can be shared, and tools can be built to create, edit, manage, and format content based on them. If a community exists around my particular flavor of XML, we can share tools and techniques that reduce the effort required to deploy content solutions.

DITA, an acronym for Darwin Information Typing Architecture, is an XML language that is extensible and can be adapted to a range of uses. DITA is based on the concept of topics. A topic is a unit of information that typically can be read in isolation or inserted into a larger document. To string topics together, DITA uses the concept of a map file. A map file is simply an XML file that acts as a sort of table of contents, stringing together a series of topic files.
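To see how a map strings topics together, here is a minimal sketch that walks a map's `topicref` elements, including nested ones, and returns them in reading order with their depth, which is essentially the table of contents. It assumes a plain map with an `href` on each `topicref`; the function name and sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def table_of_contents(map_xml: str) -> List[Tuple[int, str]]:
    """Return (depth, href) for each topicref in reading order."""
    entries: List[Tuple[int, str]] = []

    def walk(node: ET.Element, depth: int) -> None:
        for ref in node.findall("topicref"):  # direct children only
            entries.append((depth, ref.get("href", "")))
            walk(ref, depth + 1)  # nested topicrefs become subsections

    walk(ET.fromstring(map_xml.strip()), 0)
    return entries


USER_GUIDE = """
<map>
  <topicref href="intro.dita"/>
  <topicref href="install.dita">
    <topicref href="install-windows.dita"/>
    <topicref href="install-mac.dita"/>
  </topicref>
</map>
"""

for depth, href in table_of_contents(USER_GUIDE):
    print("  " * depth + href)
```

The topics themselves know nothing about their position; a second map could arrange the same files into a quick-start guide.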

The term “topic” is generic. DITA allows, however, the generic topic to be adapted to represent more specific structures. The base DITA specification includes Concept, Task, and Reference. These content units are more specific versions of the generic topic. They can be handled with special rules if you want, but if you don’t have special rules, they can also be treated more generically as topics.

Benefits of a common vocabulary 

Having a common vocabulary means that users of the vocabulary can share information with each other and share tools and code used to handle the content. For example, if you use a DITA-based format, there are a number of editing tools that can be used to edit your content. Tools used to process the content can also be shared. For example, DITA includes the code and stylesheets needed to create PDF, HTML and other output formats, and the community is constantly evolving. New formats may appear and other DITA-based solutions can take advantage of the tools to support the new format without needing to modify their existing processes.

For DITA, the community provides the DITA Open Toolkit. This toolkit includes a variety of transformations that can take DITA content and render it in HTML, PDF, and other formats. It also provides an extensible architecture: if you have a customized version of DITA, you can create a plug-in that enables DITA solutions to handle the specific requirements of your customizations. Toolkit plug-ins can be used to configure editing tools, extend the rules of DITA, or modify the included stylesheets used to render content so that they account for a more specific vocabulary adapted from the base DITA stylesheets. Any DITA tool can process content even if it is based on proprietary extensions, because all of those extensions are mapped to more generic DITA structures. So if I use a DITA-based vocabulary that defines a “chapter,” systems that do not understand “chapter” can always treat the encoded content as a more generic “topic.”
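The fallback works because every DITA element carries a `class` attribute listing its ancestry from the most generic type to the most specific, for example `"- topic/topic concept/concept "`. A processor can walk that list from the specific end and use the first type it knows. The sketch below assumes a hypothetical `chapter` specialization of topic and a hypothetical processor that only knows `topic` and `concept`. In practice the class values usually come from defaulted attributes in the DTD or schema; they are written inline here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Handlers this hypothetical processor knows, keyed by "module/element" class tokens.
HANDLERS: Dict[str, Callable[[ET.Element], str]] = {
    "topic/topic": lambda e: f"generic topic: {e.findtext('title')}",
    "concept/concept": lambda e: f"concept: {e.findtext('title')}",
}


def process(elem: ET.Element) -> str:
    """Dispatch on the most specific class token the processor understands."""
    tokens = elem.get("class", "").split()[1:]  # drop the leading "-" or "+" marker
    for token in reversed(tokens):  # ancestry runs generic -> specific
        handler = HANDLERS.get(token)
        if handler is not None:
            return handler(elem)
    raise ValueError(f"no handler for <{elem.tag}>")


concept = ET.fromstring(
    '<concept id="c1" class="- topic/topic concept/concept "><title>About</title></concept>'
)
# "chapter" is a hypothetical specialization this processor has never heard of.
chapter = ET.fromstring(
    '<chapter id="ch1" class="- topic/topic chapter/chapter "><title>Getting started</title></chapter>'
)
print(process(concept))  # concept: About
print(process(chapter))  # generic topic: Getting started
```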

So while XML is a set of rules for creating a particular language to encode your content, DITA is a particular language that was designed to be able to be extended to more specific uses that still share a common grammar. DITA provides a base set of stylesheets for rendering your content in a variety of specific formats. Many XML tools exist to process any DITA-based document, and most provide extension points so that you can adapt the tool to a more specific DITA-based language without having to start from scratch. 

Tools to edit DITA documents can edit any vocabulary derived from DITA without modification, and can be extended to support more specific vocabulary structures if desired. At RSI Content Solutions we have content management systems with DITA support that provide a range of features that make it much quicker to deploy a DITA solution without starting from scratch. Our solutions support editing, transformation, and the re-use of content in different contexts if needed. So while XML is a set of rules governing the structure of an infinite variety of languages, DITA is a topic-based XML language used for representing content. Although you can use DITA without any modifications, many organizations wish to encode content in a less generic manner. DITA has the advantage of allowing more specific content structures to be derived from the existing generic structures if needed. This means that if you need to create an XML vocabulary, you aren’t starting from scratch, and you are providing a fallback mechanism for systems not aware of the specifics of your particular vocabulary.

Topics: DITA, XML, 5-minute-series

DITA for Practitioners Volume 1: Architecture and Technology

Posted by Marianne Calihanna on Apr 24, 2012 3:18:00 PM

Eliot Kimber, senior solutions architect at RSI Content Solutions, spends most of his days helping publishers implement RSuite CMS, proselytizing DITA for Publishers, testing XML-to-EPUB3 conversions, and tending to his brood of hens and rooster. In his spare time he wrote a book: DITA for Practitioners, Volume 1: Architecture and Technology, published by XML Press. On behalf of all his co-workers, congratulations, Eliot!

Eliot has spent the past 25 years entrenched in generalized markup languages. He was one of the founding members of the XML Working Group, a co-editor of the HyTime standard (with Charles Goldfarb and Steve Newcomb), a long-time member of the XSL-FO working group, and most recently, a founding member of the DITA Technical Committee.

The goal of this book is to provide a detailed look at DITA aimed at engineers, tools builders, and content strategists (anyone who designs, implements, or supports DITA-based systems), as well as experienced DITA authors who want a deeper understanding of the technology they are using.


Topics: DITA, CMS for publishers, RSuite CMS, DITA for Publishers, Eliot Kimber

This Changes Everything: Content Management + DITA for Publishers

Posted by Marianne Calihanna on Mar 9, 2012 12:21:00 PM


Publishers understand the value of XML, but moving to an XML-early workflow can be difficult, expensive, and time consuming. The open source project DITA for Publishers combined with RSuite CMS changes everything.

DITA is a sophisticated XML-based application architecture for authoring, producing, and delivering information. While enthusiastically adopted in the TechDoc world, DITA is less understood among traditional publishing organizations. Until now. DITA specialist Eliot Kimber and technology evangelist Christopher Hill are hosting a webinar in March that details how RSuite CMS and the open source project “DITA for Publishers” form the toolset that launches publishers into the XML world, and why this is critical. DITA’s unique extensibility architecture makes it a better business value than any comparable XML alternative. Eliot and Chris' enthusiasm, combined with a straightforward approach to CMS and DITA, will have you starting to take DITA seriously.

Webinar details

Wednesday, March 21, 2012 2:00 PM - 3:00 PM EDT

How Successful Publishers Deliver Content: RSuite CMS & DITA for Publishers

Panelists: Eliot Kimber & Christopher Hill



Topics: content management, DITA, RSuite CMS, DITA for Publishers

Eliot Kimber and Christopher Hill to Speak at Intelligent Content Conference 2012

Posted by Marianne Calihanna on Feb 20, 2012 8:53:00 AM

Intelligent Content 2012

Really Strategies' employees Eliot Kimber and Christopher Hill will speak at this week's Intelligent Content Conference in Palm Springs, California. Intelligent Content is a 3-day learning experience designed to help attendees understand what is required to create intelligent content: content designed to be structurally rich and semantically categorized, automatically discoverable, reusable, reconfigurable, and adaptable to any future functionality.

Eliot Kimber, senior solutions architect at Really Strategies will present, “DITA for Publishers: Intelligent Content Starts Here.” In this session, Eliot will introduce his project DITA for publishers, and detail how DITA (Darwin Information Typing Architecture) can be the toolset that launches publishers into the XML world in a way that is affordable and easy.

Christopher Hill, vice president of product management, will take part in a software demonstration that illustrates how DocZone Book Publisher is used to automate print and ebook production, highlighting DocZone’s multilingual features as well as its ease of use. DocZone Book Publisher is also a gold sponsor of the event.

Topics: content management, DITA, DocZone Book Publisher

DocZone celebrates 6 years as the leading cloud-based DITA content management solution for technical publishers

Posted by Marianne Calihanna on Dec 13, 2011 12:06:00 PM

DocZone is the industry’s first award-winning software as a service (SaaS) XML content management system designed for technical publishers. This month marks the 6th year of serving the DITA-based tech community. We're proud to have an annual 98% renewal rate among customers such as Texas Instruments, General Electric, Johnson & Johnson, Agfa Healthcare, Citrix, Epson America, Kyocera, TechProse, Unica Corporation, and many others.
DocZone provides a SaaS platform for authoring, editorial review, localization, and single-source publishing. TechProse, a full-service technical writing, training, and instructional design consulting company, reports:
“Using the automated publishing to PDF and HTML5 Help available in DocZone, we decreased the cost to publish content to multiple output channels by 15%. Further, by taking advantage of DocZone’s DITA reuse features we decreased the cost to author and maintain content by an additional 35%. The overall savings per year to update the documentation is now 50% and growing as we mature our processes!”

Want to see DocZone for yourself?


Topics: DITA, DocZone

Content Management Survey for DITA North America Conference

Posted by Marianne Calihanna on Jan 26, 2011 4:13:00 PM

Why can’t or don’t business units commit to a Content Management System (CMS)?

Our friends at the independent consulting firm, TechProse, are conducting a survey for a presentation at the spring DITA North America Conference. All professionals who are involved with content management are invited to participate. As a survey participant, you can receive the results via email. The results could be helpful if you're planning a content management initiative and need to put the R in your ROI!

Click here to take the survey.

Topics: content management, DITA, XML
