Monday, February 10, 2014

Strategy for Developing a Taxonomy

By David Shaw, PMP, PCIP

Tutorials available on YouTube


The strategy for developing a taxonomy should start with a statement of goals and purpose. Pick the objectives the taxonomy will support and limit the scope:
  • Limit domains, users, strategies
  • Identify target audiences, contributors, stakeholders, content sources, volume, overall objectives and related strategies
  • Do a “needs analysis”

The target audiences, contributors, stakeholders, content sources, and volumes should be identified in an environmental scan and inventory.

A needs analysis focuses on the requirements that follow from the goals, aspirations and needs of the users and/or the user community, and feeds them into the taxonomy analysis process. Its main purpose is to make sure the taxonomy satisfies its users. The output should be a statement of scope and objectives that establishes a framework for the development of the taxonomy.

Once the business need is understood, a ‘Build’ or ‘Buy’ decision can be made based on criteria such as these:
  • Does a standard already exist?
  • Can we use it as-is?
  • Can we adapt it?
  • Should we develop our own?

Every case is different, but in general a standard should be used as-is. Some users are probably already familiar with the standard, and changing it in any way will sow confusion. Maintenance of an adapted version may be difficult if the standard changes. If the adaptation is extensive, the effort involved may be as much as developing a purpose-built (bespoke) taxonomy.
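
As a rough illustration, the questions above can be read as a decision order. The function name, its parameters and the three outcome labels are hypothetical, and treating an extensive adaptation as a reason to build is an assumption drawn from the effort comparison above, not a fixed rule.

```python
def sourcing_decision(standard_exists: bool,
                      usable_as_is: bool,
                      adaptable: bool,
                      adaptation_extensive: bool) -> str:
    """Return 'adopt as-is', 'adapt' or 'build' (hypothetical labels)."""
    if not standard_exists:
        return "build"
    if usable_as_is:
        # Preferred outcome: users may already know the standard.
        return "adopt as-is"
    if adaptable and not adaptation_extensive:
        return "adapt"
    # Assumption: an extensive adaptation costs about as much as a bespoke build.
    return "build"


print(sourcing_decision(standard_exists=True, usable_as_is=False,
                        adaptable=True, adaptation_extensive=False))
```

In practice each answer is a judgment call made with stakeholders rather than a simple yes or no, so a sketch like this only records the order in which the questions are asked.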

A good taxonomy should have these characteristics with respect to the established goals and purposes:

Usability is a quality attribute assessing the ease of use and learnability of a human-made object. Key quality components in usability are learnability, efficiency, memorability, error propensity and feelings of satisfaction.

Manageability means the taxonomy can be managed and governed: it is easily handled, worked and shaped, and can readily be adapted to meet needs.

Flexibility is the capability to meet diverse needs, both now and in the future.

Comprehensiveness is the scope of coverage: the degree to which the taxonomy meets users' needs. This may include the expansion of terms using a thesaurus or other mechanisms to address differences in the usage of terminology.
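
A minimal sketch of thesaurus-based term expansion, assuming the thesaurus is a simple mapping from variant terms to one preferred term. The sample terms and the function names are made up for illustration; a real content management application may handle synonyms differently, or not at all.

```python
# Hypothetical thesaurus: each variant term maps to the preferred taxonomy term.
THESAURUS = {
    "automobile": "car",
    "auto": "car",
    "lorry": "truck",
}


def preferred_term(term: str, thesaurus: dict = THESAURUS) -> str:
    """Return the preferred term for a variant, or the cleaned-up term itself."""
    key = term.strip().lower()
    return thesaurus.get(key, key)


def expand_term(term: str, thesaurus: dict = THESAURUS) -> set:
    """Return the preferred term plus every variant that maps to it."""
    target = preferred_term(term, thesaurus)
    variants = {v for v, pref in thesaurus.items() if pref == target}
    return {target} | variants


print(preferred_term(" Lorry "))
print(sorted(expand_term("auto")))
```

Mapping variants to a preferred term lets users who use different words reach the same node, while the taxonomy itself keeps a single label for each concept.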

Effectiveness is the capability of producing a desired result. When something is deemed effective, it means it produces the intended or expected outcome.

Efficiency is the extent to which time, effort or cost is well used for the intended task or purpose. The web three-click rule is an example of an objective in efficiency.

Best practices for developing a taxonomy are:
  • Develop hierarchy
  • Normalize
  • Review
  • Test
  • Iterate

 
The process of developing a hierarchy is:
  • Identify and involve stakeholders, subject matter experts (SMEs) and end-users/customers.
  • Develop a consensus through iteration, and develop an evaluation plan and criteria.
  • Develop a governance framework and decide how to manage the life-cycle, change requests and history.
  • Conduct an audit to determine what taxonomies, tags, keywords, and controlled vocabularies are already in use internally; how content is generated, where it is located and how it is used; how the life-cycle of the content is managed; and what the business processes and workflows are. Include any known functional constraints in any planned application for managing content.
  • Draft a high-level architecture using knowledge gleaned in the audit and from research in external resources. Develop a broad, shallow taxonomy with no more than three levels and organized around major domains. Establish whether terms are meaningful and reconcile language issues and terminology. Balance the taxonomy and metadata – this is where art comes into play. In the absence of consensus, create a thesaurus – but note that planned applications might not support a thesaurus.
  • Normalize the tree by aggregating like terms. Remove duplicates and merge terms. Standardize terms and flatten the tree. Ideally it should be no more than 10-12 siblings wide and 3-4 levels deep, although scientific and engineering taxonomies are likely to be much larger. What they gain in coverage they sacrifice in effectiveness and efficiency. (Large models are less effective for many end-users because those users must have a deeper knowledge of the terminology in the domain.)
  • Review the resulting model with subject matter experts, and through peer review and voice-of-the-customer reviews. Test the taxonomy by prototyping applications in desktop exercises, then wash, rinse and repeat as many times as necessary to get it right. At least four iterations are usually required.
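
The normalization step can be partly checked by a script. The sketch below is only an illustration: the `Node` class, the function names and the sample terms are hypothetical, duplicates are assumed to be sibling terms that differ only in case or surrounding spaces, and the limits are the upper ends of the 10-12 and 3-4 guidelines above. The root node stands for the taxonomy itself and is not counted as a level.

```python
from dataclasses import dataclass, field

MAX_SIBLINGS = 12  # upper end of the 10-12 guideline
MAX_DEPTH = 4      # upper end of the 3-4 guideline


@dataclass
class Node:
    term: str
    children: list = field(default_factory=list)


def merge_duplicates(node: Node) -> Node:
    """Return a copy of the tree with duplicate sibling terms merged."""
    merged = {}
    for child in node.children:
        key = child.term.strip().lower()
        if key in merged:
            # Same term seen before: fold its children into the first occurrence.
            merged[key].children.extend(child.children)
        else:
            merged[key] = Node(child.term.strip(), list(child.children))
    return Node(node.term, [merge_duplicates(c) for c in merged.values()])


def shape_problems(node: Node, depth: int = 0, path: str = "") -> list:
    """List the nodes that are too wide or sit too deep (root is level 0)."""
    here = f"{path}/{node.term}"
    problems = []
    if len(node.children) > MAX_SIBLINGS:
        problems.append(f"{here}: {len(node.children)} children, limit is {MAX_SIBLINGS}")
    if depth > MAX_DEPTH:
        problems.append(f"{here}: level {depth}, limit is {MAX_DEPTH}")
    for child in node.children:
        problems.extend(shape_problems(child, depth + 1, here))
    return problems


draft = Node("Taxonomy", [
    Node("Finance", [Node("Budget")]),
    Node("finance ", [Node("Invoices")]),
    Node("HR"),
])
clean = merge_duplicates(draft)
print([c.term for c in clean.children])
print(shape_problems(clean))
```

A check like this only flags candidates for flattening or merging. Deciding which terms really mean the same thing still belongs in the review with subject matter experts.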