Principles matter - A successful CMS implementation for publishers

Posted by Lisa Bos on Mar 25, 2011 2:42:00 PM

The success of a CMS implementation is ultimately determined by thousands of individual choices made every day by team members - developers, PMs, analysts, and so on. Many of these choices are based on design principles with which business managers might strongly disagree but of which they are often unaware. Occasionally one of these decisions has inordinate impact: a complex design that results in complexity of implementation, testing, and maintenance radically out of proportion to the value of the feature in question.

Managers who aren't deeply engaged with their teams tend to over-simplify such discussions by saying things like, "So-and-so engineer always makes things more complicated than they need to be." It's true that some engineers are hard to manage (and that some managers start almost every new project with unrealistic expectations, and that some users won't give up complicated requirements). But I believe differences in design philosophy are also in play, and a PM or other manager would do well to understand and talk about how a team's varying philosophies affect its members' choices, and where change might be in order.

An interesting article on this topic by Eliot Kimber is here.

Such discussions might seem esoteric, but they absolutely are not.

Topics: publishing, CMS, publishing industry, XML CMS, project management

Learn XML Schemas and DTDs in 5 minutes

Posted by Christopher Hill on Mar 22, 2011 12:56:00 PM

In my previous blog post I introduced XML in 5 minutes. As a follow-up, here's another 5-minute lesson on what an XML Schema or DTD is and what it might mean to end users of XML-based systems.
In the previous post we created an XML document to describe a book. Recall that it used tags around the actual content to describe the content.
<book>
     <title>Alice's Adventures In Wonderland</title>
     <author>Lewis Carroll</author>
     <summary>This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.</summary>
</book>

We also learned how representing content in this way allows us to dramatically reduce the effort required to support multichannel publishing. It also helps a great deal with automation and moving content between systems or organizations as it eliminates some of the issues of file formatting.
What would happen to our stylesheet if someone decided to use different tags to label their content?
<book>
     <title>Alice's Adventures in Wonderland</title>
     <writer>Lewis Carroll</writer>
     <summary>This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.</summary>
</book>

What we called author in one document we called writer in another. This inconsistency might be small now, but if we didn't restrict what people named things in our XML we would have to support a potentially endless list of tags. In the previous article we wrote rules for how to make our books look good on a page. If we can't predict what tags (labels) people are going to use - such as author - then it becomes nearly impossible to reliably write rules.
So even though XML helps us get a consistent base format for content, we need more help to get predictability and consistency.
Enter the concept of a DTD or Schema. DTDs and Schemas are ways that systems can impose rules on the XML itself. You can describe which tags can be used, where they can be used, and put restrictions on the content of those tags. There are two different standards for describing these restrictions: the Document Type Definition (DTD) and XML Schema. We won't get into the syntax or the pros and cons of the two approaches. For our 5-minute lesson we can just assume both are ways to enforce consistent labeling of the tags in our XML documents.
Here in English is how we might communicate the requirements for our flavor of XML:
  1. Put everything inside a book tag. You can only have one of these.
  2. The first thing you put in a book is a title tag containing the title text. You cannot leave this out.
  3. The second thing you put in a book is an author tag containing the author name. You must have at least one author. If there are more, add an author tag for each one.
  4. After all the tagged authors, you can add a summary tag. This is optional - leave it out if you want. But you can have at most one summary.
This is essentially what a DTD or XML Schema does, although they do this in a language friendlier to computers.
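For the curious, here is a minimal sketch of what those four rules might look like in DTD syntax (purely illustrative - real-world DTDs are usually much richer):

<!ELEMENT book    (title, author+, summary?)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT author  (#PCDATA)>
<!ELEMENT summary (#PCDATA)>

Read the first line as: a book contains exactly one title, then one or more authors, then at most one optional summary. The remaining lines say each of those elements simply contains text.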
DTDs/XML Schemas allow you to specify the rules for the structure of your XML documents
You can think of XML Schemas or DTDs as a means to create a template that all valid documents must follow

These rules can now be applied to the two examples above. The first example follows the rules, so we would say that the first XML document is valid. That means it conforms to the rules. The second document, when tested with the above rules, would be invalid. The presence of tagged content labeled "writer" is not allowed by the rules. 
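In practice, a document usually declares which set of rules it claims to follow so that software can check it automatically. Assuming our rules were saved in a file called book.dtd (a hypothetical name for this example), the document would point to them like this:

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
     <title>Alice's Adventures In Wonderland</title>
     <author>Lewis Carroll</author>
     <summary>...</summary>
</book>

A validating parser (the freely available xmllint command-line tool is one example) can then report whether the document is valid or list exactly which rules it breaks.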
In the XML world, XML Schemas or DTDs are used in a lot of scenarios, including:
  • XML editors know what is allowed by the rules and prevent writers from making mistakes
  • XML programs test incoming content and indicate when the rules are being broken, preventing formatting errors
  • XML stylesheets can be much more easily written as they only process valid content and don't have to worry about rulebreakers
  • If I want to merge my book content with yours, we can look at the rules and decide what adjustments will need to be made to bring our rules together
  • Industries can agree on the rules for particular types of content. We might create a set of rules to represent newspaper articles and adopt it as an industry standard, enabling anyone to exchange newspaper articles without having to modify the content (this is essentially what real-world standards such as NewsML do for news).
So when you hear someone rambling on about an XML Schema or DTD, they are really just talking about the rules governing how the particular XML document is to be structured.
That's XML Schemas and DTDs in 5 minutes. In the coming weeks watch the blog for more quick lessons on XML-related technologies.

Topics: publishing, XML, XML Schema, DTD, 5-minute-series

Learn XML in 5 Minutes!

Posted by Christopher Hill on Mar 15, 2011 8:24:00 PM

As product manager for an XML-based CMS, I often find myself answering the question "What is XML?" It's not necessarily a simple thing to answer in passing - entire books and courses are dedicated to the subject - but in about 5 minutes most content creators can understand, at a basic level, what XML is all about. Here is my 5-minute XML class as a first pass at answering this question. At the end, most people should have a rough idea of what XML is and can start thinking about its potential.
   
Take a look at the following text:
      
Alice's Adventures In Wonderland
by Lewis Carroll
This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.
      
Reading it, you can probably infer from its position and formatting that the first line is a book title. The second line appears to be the author of the book (once you drop the word "by"). Then there is additional text. If you are familiar with the book, you can tell that the text is not part of the book itself, but appears to be a short summary. We know all of this because of certain conventions we have been taught as well as contextual clues provided by the words themselves.
      
Let's look at the same text again:
     
Lewis Carroll
Alice's Adventures In Wonderland
This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.

Even though the text looks quite different, we could probably still interpret the author's intent. Now, imagine you have to train a robot to understand these two cases. What rules might you create to "teach" the robot how to parse these two examples and work out which parts of the text are titles, author names, and summaries? Even for these two limited cases, the list of rules is going to be quite involved, and will include rules based on formatting, language, and convention. And the robot would have to know that the three pieces of content are somewhere in the text to begin with; otherwise it has little chance of determining the nature of what it is looking at (Watson, the Jeopardy-playing supercomputer, excluded).

Even with all those rules in place, errors are likely to creep in. Do your rules allow the robot to understand that "by" is really a cue indicating the following text is an author name? What about the meaning of an underline? Bold? Italics? Each example uses the same formatting cues, but in different ways.

When you write a document in a word processor or a desktop publishing program you mostly focus on how that text looks. Imagine after all that careful work making the text look right for your printer, you decide your content should appear on a Web site. Suddenly all that formatting needs to be re-worked to take advantage of the conventions of the Web. Now if you want to deliver to an electronic reader or a mobile phone your formatting may again need to be revisited. 

This is an expensive proposition when you start imagining all of the possible delivery channels that may require formatting changes. For many channels, all that formatting work needs to be scrapped and done again. 

The answer to this is XML, the Extensible Markup Language. A short way to summarize the promise of XML is that it allows authors to indicate what the content is rather than how the content looks. XML does this using tags, essentially text-based labels that indicate what a particular piece of data is. Here is the same example as XML. Even if you know nothing about XML, you can make sense of this content.

<book>
<title>Alice's Adventures In Wonderland</title>
<author>Lewis Carroll</author>
<summary>This book tells the story of an English girl, Alice, who drops down a rabbit hole and meets a colorful cast of characters in a fantastical world called Wonderland.</summary>
</book>

Yes, angle brackets seem a little cold, but notice how they are labels that make the meaning of the various pieces of text completely unambiguous. Notice also how each opening tag (e.g. <book>) has a matching closing tag (e.g. </book>). These pairs create a container: everything appearing between matching tags is considered contained by them.

Now, imagine you want to deliver the content to a printed page, web site or mobile device. It is much easier now to write a set of rules that indicate how each should be formatted.

Rule for <book>:
start a new page
Rule for <title>:
on the first line, output the text underlined, in 16 point Helvetica
Rule for <author>:
start a new line, output the text in italic, 12 point Helvetica
Rule for <summary>:
double-space, start a new line, output the text in 12 point Helvetica

These rules could be called a stylesheet. In reality, stylesheets are written in more machine-friendly languages that are beyond the scope of a 5-minute XML course, but this example should give you an idea of a stylesheet's role in turning XML content into presentable output.
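To make that a bit more concrete, here is a minimal sketch of how two of those rules might look in XSLT, one common stylesheet language for XML. It is purely illustrative - the HTML output and style values are assumptions for this example, not a production stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rule for <title>: underlined, 16 point Helvetica -->
  <xsl:template match="title">
    <h1 style="font-family: Helvetica; font-size: 16pt; text-decoration: underline">
      <xsl:value-of select="."/>
    </h1>
  </xsl:template>

  <!-- Rule for <author>: its own line, italic, 12 point Helvetica -->
  <xsl:template match="author">
    <p style="font-family: Helvetica; font-size: 12pt; font-style: italic">
      <xsl:value-of select="."/>
    </p>
  </xsl:template>

</xsl:stylesheet>

Applied to the book document above, these templates produce HTML that a browser renders with the intended formatting; an equivalent stylesheet could just as easily target print or an e-reader.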

If I had hundreds, thousands or millions of book content items I could use this single stylesheet to output them all and be guaranteed consistency. 

If I decide I want to switch to a new look for my content, maybe using Garamond instead of Helvetica, then I just need to modify the stylesheet. Notice how the original book content exists completely independently of the formatting. And notice that changing the look of something doesn't require any editing of the original content.

Suppose I want to deliver my content to multiple channels. Each channel has its own conventions, limitations, and capabilities. People expect different formatting on a mobile phone than on their desktop computer. A web site typically looks different than the printed page. We used to underline book titles in print - but underline means something else on the web so often a different format is used. Helvetica may not even be an available font for many channels. 

Fortunately, XML makes supporting each unique channel straightforward. Instead of endlessly modifying my content, I simply develop new rules (a new stylesheet, or a variation of an existing stylesheet) for the new channel. The authoring process is unaffected. Existing content does not need to be re-edited to support the new channel. Instead, the new rules are applied to the content, generating channel-appropriate output. And as the number of channels expands, XML serves as a content anchor point, letting you adapt content rapidly and in an automated way to the unique requirements of each channel.
XML as an anchor for multichannel publishing

When all of your content is clearly labeled by what it is, supporting new output formats for new opportunities is far less daunting. Instead of a massive conversion of all your formatted content from one format to another, you just develop a new set of rules and you are ready to go.

That's XML in 5 minutes. At least as it impacts content creation and delivery. In the coming weeks I'll be posting additional quick lessons on XML-related technologies.

Topics: ebooks, publishing, XML, 5-minute-series

SharePoint: When did free begin to cost so much?

Posted by Barry Bealer on Mar 11, 2011 9:58:00 AM

I ran across an interesting blog post by Stephen Arnold, a longtime industry analyst and consultant, that spells out the hidden costs of a SharePoint implementation. The post is here. While free always looks better than paying for software, I think Stephen hits the mark with his assessment. A free software framework such as SharePoint is great if you plan to keep requirements very simple. As we all know from our own experiences with implementing content management, that is easier said than done. If you have more complex requirements, maybe a packaged solution such as RSuite makes sense. Your company's approach to projects, its build-versus-buy culture, and several other factors need to be considered before selecting a technology and embarking on a content management project. Whatever approach you take, just be cognizant of the hidden costs Arnold points out. Free software does not always mean it will be cheap to implement and maintain in the long run.

Topics: content management, Sharepoint

Are publishers working too hard to create PDF and eBooks?

Posted by Marianne Calihanna on Mar 9, 2011 4:42:00 PM

We recently hosted a great webinar titled "The Digital Integration: Your Content Anywhere, In Any Format, Anytime." We learned that a large number of publishers are still struggling with how XML fits into their organization.

While we view XML as the primordial soup of the publishing world, the reality is that a large number of publishing organizations do not have an XML-early (or XML-at-all) workflow.

We can help.

Schedule a meeting with us at the nation’s largest event for book and magazine publishing executives: Publishing Business Conference and Expo and let us show you how some of the leading publishers are using RSuite and RSuite Cloud to stay ahead of the digital revolution.

Hint: XML is involved but you don't have to tell any of your authors or editors!

Topics: content management, publishing, XML, conference, Publishing Business Conference and Expo

Centralizing metadata, content and assets: Paradise Lost and Regained

Posted by Christopher Hill on Feb 23, 2011 5:46:00 PM

I've been working in content management for more than ten years, and thinking back over that time I realized that the dream I naïvely espoused in the late 90s - a true, standards-based, central repository for all of an organization's assets - still hasn't become a reality except in the narrowest of applications. When I used to write and teach XML classes I was sure that open markup standards were going to revolutionize the way we created and managed assets. Around 2003 I started to become a bit disillusioned with my vision of content utopia. By 2008 I had all but thrown in the towel. Despite herculean efforts, content kept worming its way into proprietary, tactical-level production systems and often was never seen nor heard from again, a victim of the "fire and forget" publishing approaches common before the rise of the Internet.

 

Fortunately, just as I had resigned myself to living in a world of content silos, new strategic ways of managing content started to emerge that rekindled my ideals. The idea is more modest than the grandiose vision of pure standards I once embraced, but it offers a new, more practical approach that can survive in the real world.

 

Rather than insist that every asset be centralized in a consistent, preferably open, format, practicality may dictate that we instead work to build a centralized asset repository that shares common representations for all assets. The actual bits and bytes making up an asset (Word documents, InDesign files, photos, videos, etc.) can still be developed and stored in traditional systems where applicable, but a new system takes on the responsibility of cataloging relevant features and details about the asset in a centralized repository. So instead of insisting that every asset be physically managed in a central repository, we make the much more modest - and realistic - demand that all assets make relevant, common data and metadata available in a consistent format through a centralized system. This distinction means that rather than trying to replace the tactical systems we use to create, manage, or distribute content, we develop a parallel, complementary content management strategy that reflects the data in those systems and presents a common, consistent view of each asset regardless of type.

 

So an image file may exist as a TIFF or PSD in a production system or on some hard drive somewhere, but the centralized repository maintains a record for this image with all of its relevant metadata and a standard image format readily accessible to any system (e.g. PNG or JPG thumbnails and applicable preview renditions). For a lot of applications, these lighter-weight centralized representations of content are enough to create new products without returning to the source systems. For example, if I want to rapidly re-use images or stories on a new microsite, I don't have to track down all of the content in its silos; instead I rely on these common representations to collect the assets together and send them into my Web CMS for the new microsite. Formats, conversions, and so forth can either be provided to the central system through traditional manual conversion or, preferably, through automated mechanisms built into existing content workflows.
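As a purely hypothetical illustration (the element names, identifiers, and paths here are invented for this example, not taken from any particular product), such a catalog record might look something like this:

<asset id="img-0042" type="image">
  <metadata>
    <title>Pyramids at Giza</title>
    <keywords>Egypt, travel, landmarks</keywords>
  </metadata>
  <source system="photo-archive" uri="dam://originals/giza_master.psd"/>
  <renditions>
    <rendition use="thumbnail" format="png" uri="/renditions/img-0042_thumb.png"/>
    <rendition use="preview" format="jpg" uri="/renditions/img-0042_preview.jpg"/>
  </renditions>
</asset>

The original PSD stays wherever it lives today; the record simply points to it and carries the lightweight renditions and metadata that other systems can consume.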

 

This sort of approach was attempted with search technologies at one time, but it lacked the depth of content management required not just to find an asset but to use and transform it. Search gave us the ability to view the content but no tools to do anything with it once we saw it. Search remains important, but a real central repository needs to have usable representations of content that can be managed, transformed, and distributed as assets on their own. This requires a full content management system.

 

So my new vision of a centralized asset repository is not the be-all, end-all "do everything" system that becomes impossible to design and build; it's a "do some things" central system that maintains consistent, common formats that can be readily transformed and transmitted, and that becomes an organization's strategic content reserve. It can answer questions like "What assets do we have about Egypt?" quickly, and it serves as a baseline for those assets so that, once found, they can be used in our various tactical systems.

 

To build such a thing, consistent representations are needed. When looking for data standards, we of course start with XML. When only a binary will do, the useful model is to maintain accurate pointers to the original assets and to create appropriate renditions of the binaries for things like the central repository's user interface. Even if some re-work is required later, the assets are already under active management.

 

The RSuite Content Management System happens to be a great foundation for building shared, managed, centralized repositories of content. The system is flexible, built on a native XML database with a metadata model that can not only leverage existing metadata but also be extended in arbitrary ways to adapt to evolving requirements. It is built on open standards and is a good corporate citizen, ready to interoperate with existing systems. The native XML database and pointer-management features ensure that consistent representations are available. This approach creates a solid foundation for a strategic, centralized asset repository.

 

Part of my role as Product Manager for Really Strategies will be to focus on the ways our existing clients have adopted XML-based content management. I'll be reporting our clients' success stories in building these content repositories here on the blog.

 

Does your organization have a vision for managing content strategically? It would be great to hear how others are working to address this challenge.

Topics: content management for publishers, publishing, CMS, best practices, XML, metadata

Metadata lessons from Google Books

Posted by Lisa Bos on Feb 15, 2011 9:24:00 AM

This Salon interview with Geoffrey Nunberg about Google Books' unfortunate use of metadata is fascinating as an illustration of why a publisher implementing a CMS should focus as much (maybe more) on metadata as on anything else. Bad metadata leads to all sorts of problems, and unfortunately it's a self-reinforcing problem - bad leads to worse as users repeat mistakes, act on inaccurate search results, and ultimately come to distrust the system. By "focus on metadata" I mean publishers implementing CMS should take care in:

  • modeling metadata
  • the creation of controlled lists and taxonomies
  • the design of automated and manual tools for assigning metadata
  • the development of automated validation tools to ensure quality
  • the development of search that leverages metadata
  • user interface design to make metadata easily visible in various contexts (browse, edit, search results, ...) to encourage consistent usage and metadata correction/entry whenever it's convenient to the user

Here's Nunberg's original article in the Chronicle of Higher Education from August 2009 and a related blog post. This topic is obviously fascinating at face value as well, as it relates to the usefulness of Google Books for different uses by different users with different expectations. The comments on Nunberg's article and blog posts illustrate effectively that smart, well-intentioned people strongly disagree on the value of metadata, or of particular types of metadata, as compared to the benefits of "simply" making content available through fulltext search.

This basic disagreement often shows up during design projects for RSuite CMS implementations. Leaders within a publisher need to reach agreement about which metadata will truly be of value internally and to readers, and about which types of usage are most important to support. They also need to determine the cost/benefit ratio (metadata is often relatively expensive to do right). If they can't reach such agreements, then it's also unlikely they will consistently and usefully build and leverage tools for metadata in the first place - thus leading to a self-fulfilling prophecy on the part of the fulltext-instead-of-metadata advocates.

Of course, there's also a role here for the technology vendor like Really Strategies - we need to make it as easy as possible for publishers to take the steps on the bulleted list at the top of this post, so that the human effort required to make metadata really valuable is also really efficient.

Topics: content management, publishing, metadata, Google Books

Top 5 Reasons CMS Projects Fail at Publishers

Posted by Barry Bealer on Feb 7, 2011 12:01:00 PM

All of us have been involved at one point or another in our careers with that "death project" that never seems to reach any real conclusion, where no one seems to know how or why it ended up in limbo. Vendors who serve the publishing industry have many explanations for why a project is in jeopardy or has failed altogether, including lack of proper resources, no project management discipline, etc. From a vendor's perspective, there are often telltale signs that were evident from the very beginning of the project, but everyone overlooked them in the excitement of project kickoff. Following are my top 5 reasons (from a vendor perspective) why CMS projects fail at publishers:

  1. Solution/Technology was not the right fit – Almost no one will admit to selecting the wrong solution or technology.  We all know that buying technology is sometimes a mystery.  Some vendors are really good at selling a vision, only to have the publisher realize in the middle of the project that reaching that vision is going to cost three times more than they budgeted.  In other cases publishers already have a preferred technology or product and force that product on all groups.  On more than one occasion we were told point blank by a large publisher, "We love your RSuite technology, but corporate is forcing us to use Documentum because we bought a site license."  Economically that makes sense.  But functionally this may be trying to shoehorn a technology that was built to manage documents into a publishing environment that requires content management.  Different requirements, different solution required.  It is that simple.  It is no surprise when we hear back from the publisher nine months later that the project failed or the system is not being used because it does not meet expectations.
  2. Buying a vision that is unattainable – Publishers get excited by vendor demos.  And they should!  What they are seeing in a demo is something they generally don't have in place.  Some vendors are outstanding at selling a vision by demonstrating slick end-user applications.  The problem is that a publisher needs to ask the question, "How much is it going to cost me to reach that vision?"  Seeing a technology vendor show really cool functionality does not mean there is a good business or production model behind it.  It is good to see demonstrations that push the envelope, but understand what it will take to implement such a vision (time, money, and business model changes).  I have seen several publishers purchase technology to build really cool end-user applications, only to have the technology sit around because the vision was unattainable to begin with - it simply cost far more than they could ever budget.  Investing in a vision is fine; just be able to break that vision down into logical, cost-effective projects.  Be realistic about what you would like to accomplish and what you can actually accomplish.
  3. Poor project budgeting – Along with vague requirements goes poor budgeting.  If you went to a home builder and said, "I want a two-story house built; give me a quote," how much confidence would you have in the quote you received back, given requirements that vague?  Well, from a vendor's perspective, we get this level of vague requirements for a CMS on a regular basis and are expected to provide a budget to implement the software.  It is generally couched with "we are only looking for a ballpark."  OK, great, but if you are looking for a ballpark price that you would have a low level of confidence in, why are you putting that ballpark price into your next fiscal year's budget?  Immediately you are putting the project at risk.  Again, vague requirements lead to ballpark estimates that can be misconstrued in budgets.  There can then be pressure on the vendor to implement a solution based on an unrealistic budget.  Because the system does not operate according to some vague vision, there is a real risk of project failure and unhappy customers.  See the chain of events?
  4. Inherent conflict between IT and editorial  – My colleague, Lisa Bos, wrote several years back in one of our website newsletters that software development and editorial processes are in direct conflict with one another.  Think about it.  Software developers are used to an environment where they work up to the very last minute making changes on the fly and moving the system to production with an acceptable level of bugs.  The software is never 100%, but it is operational.  On the other hand, the editorial team has a defined process to complete edits on a deadline with the goal of 100% accuracy.  This inherent conflict between these operational approaches comes out during a CMS project implementation.  Understanding the cultural differences between the two organizations is important.
  5. No definition of CMS project success – Why do publishers implement CMSs?  There are many reasons of course, but how often are the goals of the CMS project discussed: during the budget cycle, the RFP stage, kickoff only, never?  If you hold a CMS project kickoff meeting, ask the group for the definition of success once the system is operational, and no one in the room knows the answer, you have a problem.  How can a CMS project be successful if the project team does not know the measurement of success?  Installing a CMS is not a success criterion.  Managing XML better is not a measurable goal.  Re-using X% of content in new derivative products or reducing time-to-market by X days are real, measurable success criteria.  One exercise I like to do at the project kickoff meeting is to have the team draft a press release announcing the completion of the project.  After the team gets over the silliness of the initial request, most teams have fun with the exercise and genuinely contribute to writing the press release.  This simple exercise lets the team articulate, among peers, what success means for the project.  If you cannot articulate the definition of success for the CMS project from the outset, you may never meet management's expectations.  Know the success criteria, communicate the success criteria, and celebrate the success with your team.
There are many reasons that CMS projects fail, but over the past decade these five are top of mind.  You will not be able to avoid all of them, but recognizing an issue early on and addressing it will benefit you in the long run and make everyone happier because of the ultimate success your team will achieve.

Topics: content management for publishers, content management, CMS, project management, best practices, CMS project, Content Mangement Project Team, CMS Teams

Really Strategies Announces RSuite Cloud

Posted by Barry Bealer on Feb 1, 2011 1:10:00 PM

"Push-Button Publishing System” for Print, Web, and eBook Production in 70 Languages

We are pleased to announce the availability of RSuite Cloud - a complete end-to-end hosted editorial and production system for book publishers.  If you are looking to shorten your book production time to market, want to publish to multiple channels (print, HTML, eBook) at once, and publish in 70 different languages, I suggest you take a look for yourself.  Online demo here.

Here is what one of our clients said about RSuite Cloud:

"We saw the time to produce PDF proofs drop from a week to just a few minutes. This improvement in productivity allowed us to dramatically shorten our production cycle and even recognize revenue in 2010 for a book that was originally scheduled for 2011," stated Stephen Driver, vice president of production services, Rowman & Littlefield Publishing Group. "We are excited about our ability to scale with this solution and the new scheduling flexibility that we could never have dreamed of in our old environment."

Topics: content management for publishers, content management, ebooks, CMS, CMS project, XML

Content Management Survey for DITA North America Conference

Posted by Marianne Calihanna on Jan 26, 2011 4:13:00 PM

Why can't - or don't - business units commit to a Content Management System (CMS)?

Our friends at the independent consulting firm, TechProse, are conducting a survey for a presentation at the spring DITA North America Conference. All professionals who are involved with content management are invited to participate. As a survey participant, you can receive the results via email. The results could be helpful if you're planning a content management initiative and need to put the R in your ROI!

Click here to take the survey.

Topics: content management, DITA, XML
