Metadata can be as valuable as content - just ask the NSA

Posted by Christopher Hill on Jun 21, 2013 11:00:00 AM

For the bulk of my 10+ years in various jobs involving content management, I found myself routinely discussing metadata at work. I can't recall the subject ever arising outside of that sphere. This month that changed literally overnight. The breaking news on the National Security Agency's PRISM program - apparently focused on collecting metadata regarding telephone calls - has brought metadata to the attention of the general public. Suddenly I'm reading tutorials on the term metadata in newspapers and magazines, hearing it defined on network television, and finding opportunities to discuss metadata with my non-technical family, friends and acquaintances. I've often said that metadata can be just as important as the content it describes, and the PRISM program serves as an excellent example of this.

I sometimes find it dismaying that even today content management projects still neglect to give proper attention to metadata. Many of the RFPs I read have highly detailed inquiries regarding the ability to work with content, including re-use, componentization, and other sophisticated capabilities. Most of the time, however, they make only a few cursory inquiries regarding metadata, neglecting key capabilities whose absence can undermine the ability to take advantage of those advanced content management capabilities.

Here are some of those neglected metadata capabilities that may have a major impact on your ability to solve your content management issues.

Extensibility

More times than I care to remember I have worked with systems that offer very little flexibility to modify metadata requirements after they are deployed. The limits often stem from technical implementation issues in the underlying architecture. You know you are in such a situation if adding a new metadata field requires system downtime, the intervention of a team of database administrators, or the cooperation of a team of programmers.

In practice such systems tend to limit metadata to a few narrow cases and fields. This is sometimes tolerable with systems dedicated to specific tactical production duties or limited content repositories. But when trying to deploy solutions horizontally through an organization, or to adapt a production system to a broader set of requirements or tasks, it is typically easier to just deploy another content management solution than to adapt the existing one.

When I worked for a semantic text mining company, we provided a tool that could extract a list of all of the important metadata values from a piece of text. This included things like lists of people, places, companies, landmarks, events, etc. present in the text. There was no artificial limit to how many of these things might be present. Unfortunately, at many of the companies we worked with, we found their systems could only deal with a predefined number of items, and the user interfaces were not well-equipped to present or manage these lists of items.

Look for a content management system that has the flexibility to be modified over time. Inquire about how metadata is stored and the provisions for modifying and expanding it. Can metadata fields hold any number of values? Try to uncover these limitations in both storage and user interface.
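
To make the questions above concrete, here is a minimal sketch of what "extensible" can look like at the storage level. It is only an illustration, not how any particular product works: the MetadataStore class and the field names are hypothetical, and a real system would persist the data rather than hold it in memory. The point is that a new field needs no schema change and a field can hold any number of values.

```python
from collections import defaultdict


class MetadataStore:
    """Hypothetical in-memory store: any field name, any number of values."""

    def __init__(self):
        # field name -> list of values; no fixed schema, no per-field cap
        self._fields = defaultdict(list)

    def add(self, field, *values):
        # A field is created on first use; no migration or downtime involved
        for value in values:
            if value not in self._fields[field]:
                self._fields[field].append(value)

    def get(self, field):
        # Unknown fields yield an empty list instead of raising an error
        return list(self._fields.get(field, []))

    def field_names(self):
        return sorted(self._fields)


store = MetadataStore()
store.add("people", "Ada Lovelace", "Charles Babbage")
store.add("places", "London")
store.add("people", "Ada Lovelace", "Mary Somerville")  # duplicate is ignored
```

If a text mining tool later starts returning landmarks or events, they are simply added as new fields with as many values as the text contains.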

Selective presentation

Another problem that plagues many systems is their assumption that all metadata fields are equally important to all users. In reality, most users are only concerned with - or even prepared to understand - a subset of the metadata on a piece of content. Compare the key interests of your editorial staff to those of your marketing team or legal department. Are you able to selectively present each potential viewer of content with appropriate metadata that serves their needs, without making them wade through a lot of information that isn't important to them? I have sat through many analysis meetings where representatives of every stakeholder gather around a table and spend hours, days or weeks trying to agree upon a comprehensive set of metadata.

Instead, look for your content management system to provide the capability to selectively present relevant metadata to different audiences. This should include the forms to view and edit metadata as well as a means to provide different search tools that make it easy to filter based on each category of users' requirements. Then you can maintain a large body of metadata appropriate to a wide range of requirements without overwhelming an individual user.
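
As a rough sketch of the idea, selective presentation can be thought of as a set of named views over one shared record. Everything here is assumed for illustration: the record, the audience names, and the field lists in VIEWS are hypothetical, and a real system would drive its edit forms and search filters from a similar definition.

```python
# Hypothetical full metadata record for one piece of content
record = {
    "title": "Spring Catalog",
    "author": "J. Smith",
    "word_count": 1200,
    "campaign": "Spring 2013",
    "rights_holder": "Example Photo Agency",
    "license_expires": "2014-06-30",
}

# Hypothetical per-audience views: each lists only the fields that audience needs
VIEWS = {
    "editorial": ["title", "author", "word_count"],
    "marketing": ["title", "campaign"],
    "legal": ["title", "rights_holder", "license_expires"],
}


def present(record, audience):
    """Return only the fields in the audience's view, in view order."""
    return {field: record[field] for field in VIEWS[audience] if field in record}


legal_view = present(record, "legal")
editorial_view = present(record, "editorial")
```

The full record keeps growing as new stakeholders add fields, while each audience continues to see only its own short list.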

Context sensitivity

Today it is common for organizations to ask for a content management system that provides the tools to re-use content. Yet even when the requirement is specified and delivered, in practice the capability often goes unused. One of the hidden reasons for this is the inability of most systems to allow metadata to vary based on context.

In many situations a user may want to re-use a piece of content in a few places - but require different metadata values in each place. As a simple example, think of a photo caption. A caption appropriate in a novel may not work when the photo is used in a news article. Digital rights may be secured for the same photo multiple times, each needing its own context-specific representation. In countless situations like these, the inability to have some of the metadata vary by context means that users are forced to copy and paste content and cannot take advantage of the re-use tools provided by a system.

Even if the initial deployment of a system will not require contextual metadata, you will probably find yourself wanting the capability at some point in the future.
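
One way to picture contextual metadata is as a thin layer of overrides on top of the values stored with the content itself. The sketch below uses the photo caption example; the photo record, the context names, and the metadata_in helper are all hypothetical, and this is just one possible approach rather than a description of any specific system.

```python
from collections import ChainMap

# Hypothetical base metadata, stored once with the photo itself
photo = {"id": "photo-17", "caption": "The harbor at dawn.", "rights": "unspecified"}

# Hypothetical per-context overrides; only the fields that differ are stored
overrides = {
    "novel": {"caption": "She watched the harbor wake.", "rights": "book, print only"},
    "news": {"caption": "Fishing boats leave the harbor on Monday."},
}


def metadata_in(context):
    """Context values win; anything not overridden falls back to the base."""
    return dict(ChainMap(overrides.get(context, {}), photo))


news_metadata = metadata_in("news")
web_metadata = metadata_in("web")  # no overrides defined for this context
```

The photo exists once and is genuinely re-used, while the caption and rights shown depend on where it appears.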

Versioning... or not

Most content management inquiries will ask about versioning content. But what about metadata? In many systems, metadata is a required part of the version history of content - meaning that every change to a metadata field creates a new version of the content. This may be desirable for many of the fields, but in some cases the versions created by metadata changes generate so much noise in the content's version history that the versioning features become difficult or impossible to use.

At the other end of the spectrum are systems that do not version any metadata values. For important metadata values, this can be risky as there is no way to determine when or even if a metadata value was modified. 

In reality, most organizations will have some metadata they want to keep in the version history and some they do not. Unfortunately, it is generally after they have started putting a tool in production that most people find out the capabilities of their system.
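
The middle ground can be sketched as a per-field choice. In this hypothetical example (the Document class and the field names are made up for illustration), only fields marked as versioned add an entry to the history, so routine metadata churn does not bury the changes that matter.

```python
class Document:
    """Hypothetical document whose metadata fields are versioned selectively."""

    def __init__(self, body, versioned_fields):
        self.body = body
        self.metadata = {}
        self.versioned_fields = set(versioned_fields)
        self.history = []  # snapshots of (body, versioned metadata)
        self._snapshot()   # the initial state is version 1

    def _snapshot(self):
        tracked = {k: v for k, v in self.metadata.items() if k in self.versioned_fields}
        self.history.append((self.body, tracked))

    def set_metadata(self, field, value):
        self.metadata[field] = value
        # Only changes to versioned fields add an entry to the history
        if field in self.versioned_fields:
            self._snapshot()


doc = Document("Draft text", versioned_fields={"rights"})
doc.set_metadata("last_viewed_by", "alice")  # unversioned: no new version
doc.set_metadata("rights", "print only")     # versioned: new version
doc.set_metadata("last_viewed_by", "bob")    # unversioned: no new version
```

Here the history holds two entries, the initial state and the rights change, even though the metadata was modified three times.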

These four metadata requirements have a significant impact on content management and can have major implications for how a system is implemented and adapted over time. While not every one of them is a necessity for every solution, their absence may make it difficult or impossible to adapt or expand your system in the future.

You may be thinking of some other neglected metadata requirements you've run across. If so, I'd like to hear about them.

Topics: content management for publishers, content management, metadata
