Metadata lessons from Google Books

Posted by Lisa Bos on Feb 15, 2011 9:24:00 AM

This Salon interview with Geoffrey Nunberg about Google Books' unfortunate use of metadata is fascinating as an illustration of why a publisher implementing a CMS should focus as much (maybe more) on metadata as on anything else. Bad metadata leads to all sorts of problems, and unfortunately it's a self-reinforcing problem - bad leads to worse as users repeat mistakes, act on inaccurate search results, and ultimately come to distrust the system. By "focus on metadata" I mean publishers implementing CMS should take care in:

  • modeling metadata
  • the creation of controlled lists and taxonomies
  • the design of automated and manual tools for assigning metadata
  • the development of automated validation tools to ensure quality
  • the development of search that leverages metadata
  • user interface design to make metadata easily visible in various contexts (browse, edit, search results, ...) to encourage consistent usage and metadata correction/entry whenever it's convenient to the user

Here's Nunberg's original article in the Chronicle of Higher Education from August 2009 and a related blog post. This topic is obviously fascinating at face value as well - as it relates to the usefulness of Google Books for different usages by different users with different expectations. The comments to Nunberg's article/blog posts illustrate effectively that smart, well-intentioned people strongly disagree on the value of metadata or of particular types of metadata as compared to the benefits of "simply" making content available through fulltext search. This basic disagreement often shows up during design projects for RSuite CMS implementation. Leaders within a publisher need to reach agreement about which metadata will truly be of value internally and to readers and about which types of usage are most important to support. They also need to determine the cost/benefit ratio (metadata is often relatively expensive to do right). If they can't reach such agreements, then it's also unlikely they will consistently and usefully build and leverage tools for metadata in the first place - thus leading to a self-fulfilling prophecy on the part of the fulltext-instead-of-metadata advocates.

Of course, there's also a role here for the technology vendor like Really Strategies - we need to make it as easy as possible for publishers to take the steps on the bulleted list at the top of this post, so that the human effort required to make metadata really valuable is also really efficient.

Topics: content management, publishing, metadata, Google Books

Comment below