
Metadata can be as valuable as content - just ask the NSA

Posted by Christopher Hill on Jun 21, 2013, 11:00:00 AM

For the bulk of my 10+ years in various jobs involving content management I found myself routinely discussing metadata at work. I don't recall the subject ever arising outside of that sphere. This month that changed literally overnight. The breaking news on the National Security Agency's PRISM program - apparently focused on collecting metadata regarding telephone calls - has brought metadata to the attention of the general public. Suddenly I'm reading tutorials on the term metadata in newspapers and magazines, hearing it defined on network television, and finding opportunities to discuss metadata with my non-technical family, friends and acquaintances. I've often said that metadata can be just as important as the content it describes, and the PRISM program serves as an excellent example of this.

I sometimes find it dismaying that even today content management projects still neglect to give proper attention to metadata. Many of the RFPs I read have highly detailed inquiries regarding the ability to work with content, including re-use, componentization, and other sophisticated capabilities. Most of the time, however, they make only a few cursory inquiries regarding metadata, neglecting key capabilities whose absence can undermine the ability to take advantage of those advanced content management features.

Here are some of those neglected metadata capabilities that may have a major impact on your ability to solve your content management issues.

Extensibility

More times than I care to remember I have worked with systems that offer very little flexibility to modify metadata requirements after they are deployed. These limits often stem from technical implementation issues in the underlying architecture. You know you are working in such a situation if adding a new metadata field requires system downtime, the intervention of a team of database administrators, or the cooperation of a team of programmers.

In practice such systems tend to limit metadata to a few narrow cases and fields. This is sometimes tolerable with systems dedicated to specific tactical production duties or limited content repositories. But when trying to deploy solutions horizontally through an organization, or to adapt a production system to a broader set of requirements or tasks, it is typically easier to just deploy another content management solution than to adapt the existing one.

When I worked for a semantic text mining company we provided a tool that could extract a list of all of the important metadata values from a piece of text. This included things like lists of people, places, companies, landmarks, events, etc. present in the text. There was no artificial limit to how many of these things might be present. Unfortunately, when we worked with many companies we found their systems could only deal with a predefined number of items, and the user interfaces were not well-equipped to present or manage these lists of items.

Look for a content management system that has the flexibility to be modified over time. Inquire about how metadata is stored and what provisions exist for modifying and expanding it. Can a metadata field hold any number of values? Try to uncover these limitations in both storage and user interface.
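As a rough illustration of what "no artificial limit" can look like in storage, here is a minimal Python sketch. The `MetadataRecord` class and the field names are hypothetical, not taken from any particular product; the point is only that a new field, or a tenth value in an existing field, needs no schema change.

```python
from collections import defaultdict


class MetadataRecord:
    """Hypothetical schema-less metadata store for one piece of content."""

    def __init__(self):
        # Every field maps to a list, so no field has a fixed value count.
        self._fields = defaultdict(list)

    def add(self, field, value):
        """Add a value to a field, creating the field on first use."""
        if value not in self._fields[field]:
            self._fields[field].append(value)

    def values(self, field):
        """Return a copy of the field's values (empty list if unknown)."""
        return list(self._fields.get(field, []))

    def fields(self):
        """Return the names of all fields that currently exist."""
        return sorted(self._fields)


record = MetadataRecord()
# A text mining tool might return any number of people for one document.
for person in ["Ada Lovelace", "Alan Turing", "Grace Hopper"]:
    record.add("people", person)
record.add("places", "London")
# A field nobody planned for at deployment time; no downtime required.
record.add("landmarks", "Bletchley Park")
```

A real system also has to answer the user interface half of the question: how a form or a search facet presents a field that may hold one value or fifty.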

Selective presentation

Another problem that plagues many systems is the assumption that all metadata fields are equally important to all users. In reality, most users are only concerned with - or even prepared to understand - a subset of the metadata on a piece of content. Compare the key interests of your editorial staff to those of your marketing team or legal department. Are you able to present each potential viewer of content with the metadata that serves their needs, without making them wade through a lot of information that isn't important to them? I have sat through many analysis meetings where representatives of every stakeholder gather around a table and spend hours, days or weeks trying to agree upon a single comprehensive set of metadata.

Instead, look for your content management system to provide the capability to selectively present relevant metadata to different audiences. This should include the forms to view and edit metadata as well as a means to provide different search tools that make it easy to filter based on each category of users' requirements. Then you can maintain a large body of metadata appropriate to a wide range of requirements without overwhelming an individual user.
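One simple way to picture this is a set of per-audience views over a single shared metadata record. The sketch below is in Python; the field names, the audiences, and the `present` function are all hypothetical and only illustrate the idea.

```python
# Hypothetical full metadata set for one photo asset.
metadata = {
    "title": "Harbor at dawn",
    "byline": "J. Smith",
    "embargo_date": "2013-07-01",
    "rights_holder": "Example Photo Agency",
    "license_expires": "2015-12-31",
    "campaign": "Summer catalog",
}

# Hypothetical views: the fields each audience sees, in display order.
VIEWS = {
    "editorial": ["title", "byline", "embargo_date"],
    "legal": ["title", "rights_holder", "license_expires"],
    "marketing": ["title", "campaign"],
}


def present(metadata, audience):
    """Return only the fields configured for the given audience."""
    return {
        name: metadata[name]
        for name in VIEWS[audience]
        if name in metadata
    }


legal_view = present(metadata, "legal")
editorial_view = present(metadata, "editorial")
```

Because the views are plain configuration, adding a field for the legal department does not change what the editorial staff sees, and nobody has to agree on one comprehensive form.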

Context sensitivity

Today it is common for organizations to ask for a content management system that provides the tools to re-use content. Yet even when the requirement is specified and delivered, in practice the capability often goes unused. One of the hidden reasons for this is the inability of most systems to allow metadata to vary based on context.

In many situations a user may want to re-use a piece of content in a few places - but requires different metadata values in each place. As a simple example think of a photo caption. A caption appropriate in a novel may not work when the photo is used in a news article. Digital rights may be secured for the same photo multiple times, each needing its own context-specific representation. In countless situations like these, the inability to have some of the metadata vary by context means that users are forced to copy and paste content and cannot take advantage of the re-use tools provided by a system.
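The caption example can be sketched as base metadata plus per-context overrides. This is a minimal Python illustration under assumed names (`ReusableAsset`, the context labels, and the sample captions are all hypothetical); a real system would tie contexts to actual publications or products.

```python
from collections import ChainMap


class ReusableAsset:
    """Hypothetical asset stored once, with metadata that varies by context."""

    def __init__(self, content, base_metadata):
        self.content = content
        self.base = dict(base_metadata)
        self.overrides = {}  # context name -> dict of overridden fields

    def set_in_context(self, context, field, value):
        """Override one field for one context without touching the base."""
        self.overrides.setdefault(context, {})[field] = value

    def metadata_for(self, context):
        """Merge metadata for a context; overrides win over base values."""
        # ChainMap searches the context overrides first, then the base.
        return dict(ChainMap(self.overrides.get(context, {}), self.base))


photo = ReusableAsset(
    "harbor.jpg",
    {"caption": "The harbor as she first saw it.", "photographer": "J. Smith"},
)
photo.set_in_context(
    "news-article", "caption", "Fishing boats leave the harbor at dawn."
)

novel_meta = photo.metadata_for("novel")
news_meta = photo.metadata_for("news-article")
```

The photo itself exists once, so a correction to the image or to the photographer credit reaches every place it is used, while each place keeps its own caption.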

Even if the initial deployment of a system will not require contextual metadata you will probably find yourself wanting the capability at some point in the future.

Versioning... or not

Most content management inquiries will ask about versioning content. But what about metadata? In many systems, metadata is a required part of the version history of content - meaning that every change to a metadata field creates a new version of the content. This may be desirable for many of the fields, but in some cases the versions created by metadata changes generate so much noise in the content's version history that the versioning features become difficult or impossible to use.

At the other end of the spectrum are systems that do not version any metadata values. For important metadata values, this can be risky as there is no way to determine when or even if a metadata value was modified.

In reality, most organizations will have some metadata they want to keep in the version history and some that they do not. Unfortunately, it is generally only after they have put a tool into production that most people find out the capabilities of their system.
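To make the middle ground concrete, here is a small Python sketch in which only designated fields take part in the version history. The `VersionedItem` class and its field names are hypothetical and greatly simplified; it is one possible arrangement, not a description of how any specific product works.

```python
class VersionedItem:
    """Hypothetical content item with per-field versioning rules."""

    def __init__(self, body, metadata, versioned_fields):
        self.body = body
        self.metadata = dict(metadata)
        self.versioned_fields = set(versioned_fields)
        self.history = []  # list of snapshots, oldest first
        self._snapshot("created")

    def _snapshot(self, reason):
        """Record the body plus only the fields marked as versioned."""
        tracked = {
            name: value
            for name, value in self.metadata.items()
            if name in self.versioned_fields
        }
        self.history.append(
            {"reason": reason, "body": self.body, "metadata": tracked}
        )

    def set_metadata(self, field, value):
        """Update a field; create a version only if the field is versioned."""
        self.metadata[field] = value
        if field in self.versioned_fields:
            self._snapshot("metadata:" + field)


item = VersionedItem(
    "Chapter text...",
    {"rights": "print only", "view_count": 0},
    versioned_fields=["rights"],
)
item.set_metadata("view_count", 1)           # no new version, no noise
item.set_metadata("rights", "print and web")  # audited: new version
```

Here a frequently changing value such as a view counter never clutters the history, while a change to rights leaves a record of what the value was before.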

These four metadata requirements have a significant impact on content management and can have major implications for how a system is implemented and adapted over time. While not every one of them is a necessity for every solution, their absence may make it difficult or impossible to adapt or expand your system in the future.

You may be thinking of some other neglected metadata requirements you've run across. If so I'd like to hear about them.

Metadata Madness: What Publishers Already Knew

Posted by Barry Bealer on Jun 19, 2013, 8:45:00 AM

I find it almost comical that our mainstream media is latching onto (and blowing out of proportion) the report about the NSA poring over phone records and other data. First, metadata is not new. It may have been disguised as health records, or school records, or whatever, but it is not new. People didn't care about the information in years past because it was secured and locked away on a printed piece of paper in a file cabinet at your doctor's office or at your child's school. Fast forward to today, where many of our personal records, bills, and pretty much everything else are electronic, and you have a massive amount of metadata. Yes, there is a massive amount of this metadata that lives in our world, and yes, the NSA is not the only organization looking at it.

Twenty years ago when I worked at GE, we were hired by a well-known large bank to develop a data mining system that would be able to forecast the likelihood of a person defaulting on a loan or missing a credit card payment. This system aggregated a ton of metadata including financial credit scores, loan payment history, economic status, etc. This was a commercial business, not the government, but why is this any different from the NSA using phone records to secure our country? Aren't both organizations (banks and the NSA) invading our privacy? I am perplexed by our citizens who feel that our government is required to keep us safe, but don't want any inconvenience or intrusion in their lives. Meanwhile, public companies, advertisers, banks and pretty much every other large business are looking at your metadata to figure out your buying behavior. This is nothing new.

Up until a few weeks ago most people in the United States had no idea what metadata was and, frankly, probably could not have cared less because it was a techie thing. For most publishers, metadata is the backbone of their content. Publishers have invested heavily in metadata as their printed product revenue has evolved over time into electronic product revenue. We have touched on this subject several times over the past few years on this blog:

The Second Rule of Content Management: Enrich with Metadata - http://blog.reallysi.com/bid/92056/The-Second-Rule-of-Content-Management-Enrich-with-Metadata

Centralized Metadata, Content, and Assets: Paradise Lost - http://blog.reallysi.com/bid/41180/Centralizing-metadata-content-and-assets-Paradise-Lost-and-Regained

Metadata Lessons from Google Books - http://blog.reallysi.com/bid/40326/Metadata-lessons-from-Google-Books

Metadata management will continue to be a key part of publishers' publishing and product development processes. This is one of the main reasons we developed RSuite CMS. There was a significant void in the CMS market when it came to managing both content and metadata. We believe we have solved this issue with RSuite and welcome the opportunity to discuss our product with publishers who feel the need to apply and manage metadata more efficiently.

The recent elevation of the word "metadata" in the mainstream media probably has most publishers chuckling a bit, but the investment in metadata by publishers is very real and will continue as the ability to find content becomes ever more complex.
