Sunday, December 16, 2012

Taxonomies & Their Cousins (Revised)


Taxonomy and metadata are closely related. A general understanding of different information schemes for organizing and classifying content helps in the design of a taxonomy. Best practice guidelines lead to a better result.

This is a revision of a blog article previously published here.

A PDF of the complete revised article is available by email from the author.

Saturday, December 15, 2012

Translations for Global Markets

On any multilingual project, translation is always the long tail. Same-day release to all global markets can be achieved using controlled language, machine translation and translation-management systems.

English documentation predominates around the world, yet more than 50% of readers of English web pages have English as a second language. Many eLearning courses, web pages and technical documents are available in English only, or the versions in other languages are delayed by many months or even years. On any multilingual project, translation is always the long tail. Translation cannot be done until the source language is finalized. However, the tail can be shortened if we do translations concurrently.



A PDF of the complete article is available by email from the author.

Thursday, December 13, 2012

Basic Content Re-Use

Abstract 

This white paper outlines some of the metadata principles and relevant standards for re-using content in an eLearning application, and for moving from Flash to responsive mobile content. A marker like this lcms indicates a reference to a required function in an LCMS.

A PDF of this article including a worked example is available by email from the author.

Introduction 

Industrial-scale content re-use (~1 million text objects) is well established in XML content-management systems such as Astoria, Ixiasoft, Lenya, MarkLogic, SigmaLink, SyberSafe, Vasont, XPP Professional Publisher and others. Apart from graphics, re-use has not been significant in learning content-management systems (LCMS) because little knowledge and expertise has transferred between these sectors, especially about automated assembly, synchronous translation and channel distribution.

Link management (discussed later) is an obvious requirement for re-use that is a missing function in most LCMS.

 Most LCMS content re-use has been based on manual operations, often using multiple copies of content assets. This creates a huge administrative burden and opportunities for error when revisions are necessary.

The other limiting factor is a lack of training and experience in both XML and content re-use in the eLearning development, human-resources and information technology (IT) communities. Content re-use is straightforward when a CMS has been set up in a user-friendly way. But the design of the content architecture to achieve this is complex and something of a black art.

Only a few LCMS vendors such as Giunti, OutStart Evolution, OutStart ForceTen, Thinking Cap and Xyleme are capable of supporting such a content architecture, and there are very few information architects who know how to design one. There are also very few IT departments that understand the technology – pushing SharePoint is not the answer. It’s best to do a small pilot project, iron out the bugs, train people, then scale rapidly to production.

Business Benefits 

There are numerous benefits to content re-use. The primary return on investment is reduced time to completion and reduced cost of production. With basic re-use it’s typical to reduce time and cost by 50%, and advanced techniques using concurrency can reduce these by 75-85%.

In some cases, for example in safety, operational and regulatory training, intangibles such as consistency and accuracy in content deliver a huge benefit through reduced liability and losses.

 • Tangible
  • Reduce time to market for new information products
  • Reduce service costs 
  • Decrease cost of translation and localization
  • Reduce cost in developing information
  • Easily create new eLearning or other information products as required
 • Intangible
  • Out-market and out-sell your competition
  • Increase consistency in information
  • Increase accuracy in information
  • Increase end-user productivity (customer satisfaction)
  • Better support your distribution channels
  • Increase accessibility, timeliness & customization for end users

 Understanding Content Re-Use  

The basic concepts in content re-use are the component or unit of re-use, often called the minimum re-usable unit (MRU), and variants. Figure 1 shows content units assembled into an edition, by which we mean the final, approved version, i.e., the published version. In the digital era, however, an edition is not yet a product for market. To go to market, the edition is submitted to a rendering engine that transforms it into different renditions suitable for different distribution channels, e.g., PDF for print or HTML for web pages.
 Fig 1 – Content management process
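
The step from edition to renditions can be illustrated with a minimal sketch. It assumes an edition is simply an ordered list of content units, and it uses two toy renderers (HTML and plain text) in place of a real rendering engine. The names ContentUnit, render_html, render_text and render are hypothetical and do not come from any particular CMS.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ContentUnit:
    """One approved unit of content (hypothetical structure)."""
    title: str
    body: str


def render_html(edition):
    """Render the edition as HTML sections, escaping special characters."""
    return "\n".join(
        f"<section><h2>{escape(unit.title)}</h2><p>{escape(unit.body)}</p></section>"
        for unit in edition
    )


def render_text(edition):
    """Render the edition as plain text with upper-case titles."""
    return "\n\n".join(f"{unit.title.upper()}\n{unit.body}" for unit in edition)


# One renderer per distribution channel; a real engine would add PDF and others.
RENDERERS = {"html": render_html, "text": render_text}


def render(edition, channel):
    """Transform a single edition into the rendition for one channel."""
    return RENDERERS[channel](edition)


edition = [
    ContentUnit("Safety", "Wear goggles & gloves."),
    ContentUnit("Start-up", "Press the green button."),
]
renditions = {channel: render(edition, channel) for channel in RENDERERS}
```

The point of the sketch is that the edition is authored and approved once, and every channel is produced from that same source.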
 


The source MRU content goes through revision and approval cycles. Each revision is logged in the CMS as a version. lcms Thus workflow and version control are significant aspects of an LCMS.

Content is re-used by pasting as an embedded link or by pasting as a copy. An embedded link (Figure 2) shows the content in situ, but it is protected so you can’t change it. You are not the owner of embedded content, so an LCMS should have link management. The issue is that when the owner changes this content, the new version might not be suitable for you.

lcms Link management allows you to specify conditions for linking, such as link to the most current version, the last published version, or some specific version. lcms Ideally it would also include a notification to you when this content has been changed, and a “do not delete” notification to the owner that you are using this content.
  Fig 2 – Embedded content is shown in a protected field
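
The linking conditions described above (most current, last published, or a specific version) can be sketched as a small resolver. This is an illustration under stated assumptions, not the behaviour of any actual LCMS: the Version and Link structures, the policy names and the resolve function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One logged revision of a content unit (hypothetical structure)."""
    number: int
    body: str
    published: bool


@dataclass(frozen=True)
class Link:
    """An embedded link with a linking condition (hypothetical policy names)."""
    policy: str  # "most_current", "last_published" or "specific"
    pinned: Optional[int] = None  # version number, used only by "specific"


def resolve(link, history):
    """Return the version of the owner's content that this link should show."""
    if link.policy == "most_current":
        return max(history, key=lambda v: v.number)
    if link.policy == "last_published":
        published = [v for v in history if v.published]
        if not published:
            raise LookupError("no published version exists yet")
        return max(published, key=lambda v: v.number)
    if link.policy == "specific":
        for version in history:
            if version.number == link.pinned:
                return version
        raise LookupError(f"version {link.pinned} not found")
    raise ValueError(f"unknown link policy: {link.policy}")


history = [
    Version(1, "Wear goggles.", published=True),
    Version(2, "Wear goggles and gloves.", published=True),
    Version(3, "Wear goggles, gloves and boots.", published=False),
]
```

With this history, a "most_current" link would show the unpublished draft (version 3), a "last_published" link would show version 2, and a link pinned to version 1 would be unaffected by the owner's later changes.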
 

Link management should also maintain relational integrity when content is moved, and transparently manage changes to the names of content assets. lcms This information is also important for where-used reports.

When you copy content, you become the new owner of the copy or asynchronous variant. You can do whatever you want with it, including versioning it through numerous revision cycles. In other words, a variant re-purposes content. lcms Another important LCMS function is to track source content to its variants. This is important in cases where, for example, the source got the name of the president wrong and the source and all variants must be corrected. A related case is when you want to track synchronous variants, typically translations.
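
Tracking a source to its variants can be sketched as a simple lineage registry. The sketch assumes every copy operation is recorded with the Asset IDs of the source and the new variant. The class name VariantRegistry and the sample IDs are hypothetical.

```python
from collections import defaultdict


class VariantRegistry:
    """Records which variants were copied from which source (illustrative only)."""

    def __init__(self):
        self._variants = defaultdict(list)

    def record_copy(self, source_id, variant_id):
        """Log that variant_id was created as a copy of source_id."""
        self._variants[source_id].append(variant_id)

    def affected_by(self, source_id):
        """Return the source plus every direct and indirect variant of it."""
        affected, seen, stack = [], set(), [source_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            affected.append(current)
            stack.extend(self._variants.get(current, []))
        return sorted(affected)


registry = VariantRegistry()
registry.record_copy("A-100", "A-200")  # a re-purposed copy of the source
registry.record_copy("A-100", "A-300")  # for example, a translation
registry.record_copy("A-200", "A-201")  # a copy of a copy

to_correct = registry.affected_by("A-100")
```

Here to_correct lists the source and all three variants, which is the set of assets that would need the corrected name.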

The definition of an MRU is a very important and difficult decision. It has to have the right granularity, i.e., size. If it is too large, there will be few opportunities for re-use. If it is too small, the administrative burden will be very high. An MRU must also have a stand-alone context so it can be used without an external introduction.

An MRU is not the required or only unit of re-use. Larger aggregates of content can be re-used. But with a few exceptions noted below, an MRU is the minimum content that should be re-used. Otherwise re-use becomes ad hoc copy and paste.

Best practice in technical documentation is that an MRU is a topic (chapter > section > topic). In some cases it can be as small as a paragraph, but this is usually for what is called standing copy or boilerplate. Extreme exceptions could also be certain names or definitions. lcms Ideally this would be based on a glossary look-up.

In eLearning the smallest MRU is a learning object (LO) representing a single learning objective (LOJ). Some LCMS define this as a page. A larger MRU could be a module consisting of several LOs or pages. In either scenario, the context should be stand-alone.

Other asset types such as graphics and videos can also be re-used, subject to the same ownership, linking, notification and variant considerations. In this scenario XMP metadata (see further below) is an important standard.

Asset ID

In a CMS, files are not stored under a human-readable file name. The stored file name is usually a unique random number or System ID. What we see, and think of as a file name, is a label or title held in a metadata field. For managing versions, variant copies and links, and for maintaining relational integrity, each asset should also have a unique Asset ID, also stored in a metadata field.

lcms There are different ways of establishing unique IDs but the easiest is to extract the System ID and put it in the Asset ID field. The label or title should conform to a naming convention.
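
As a rough sketch of this idea, the record below copies a generated System ID into the Asset ID field and keeps the visible label as ordinary metadata. The Asset class is hypothetical, the random UUID stands in for whatever System ID a real CMS assigns, and the naming convention (lower-case words joined by hyphens) is only an assumed example.

```python
import re
import uuid
from dataclasses import dataclass, field

# Assumed naming convention for labels: lower-case words joined by hyphens.
LABEL_PATTERN = re.compile(r"[a-z]+(-[a-z0-9]+)*")


@dataclass
class Asset:
    """A stored asset: the label is metadata, the IDs never change."""
    label: str
    system_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    asset_id: str = ""

    def __post_init__(self):
        # Easiest approach from the text: copy the System ID into the Asset ID.
        if not self.asset_id:
            self.asset_id = self.system_id


def follows_convention(label):
    """Check a label against the assumed naming convention."""
    return LABEL_PATTERN.fullmatch(label) is not None


asset = Asset(label="pump-startup-procedure")
original_id = asset.asset_id
asset.label = "pump-start-up-procedure"  # renaming changes only the label
```

Because links and variant records would refer to the Asset ID rather than the label, renaming the asset does not break them.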

Content Architecture

There is no ideal content architecture. This might be hard to grasp. Content cannot be normalised in the same way as a database schema. This is why it’s partly a black art that has to be learned more than taught.

A where-used report in a CMS is essential for maintaining content but doesn’t tell us very much about its information architecture or the structure of content products. The starting point for architecture is a content-map analysis of source content and its input authorities, and target information products (outputs).

A content map is a type of network diagram showing the relationships between content. It shows where we plan to use content, and identifies common content. It also begins to give some insight into the granularity and classification of content in taxonomies and metadata models, and eventually helps in structuring information products. It answers the question, how do we re-use content?

A content map begins to clarify:

  • What business problem are we trying to solve
  • What content is relevant
  • What information do we need about that content
  • New opportunities – how do we exploit the content
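
A content map can be prototyped long before any CMS is configured. The sketch below assumes the map is recorded as a plain dictionary from each target information product to the content units we plan to use in it. The product and unit names are invented for illustration.

```python
from collections import Counter

# Hypothetical content map: target information product -> planned content units.
content_map = {
    "Operator course": ["safety-rules", "start-up", "shutdown"],
    "Maintenance manual": ["safety-rules", "lubrication", "shutdown"],
    "Quick-reference card": ["start-up"],
}


def common_content(content_map):
    """Return the units planned for more than one product (re-use candidates)."""
    counts = Counter(
        unit for units in content_map.values() for unit in set(units)
    )
    return sorted(unit for unit, uses in counts.items() if uses > 1)


def planned_uses(content_map, unit):
    """Return the products in which a given unit is planned to appear."""
    return sorted(product for product, units in content_map.items() if unit in units)


candidates = common_content(content_map)
```

In this example the common content is safety-rules, shutdown and start-up, while lubrication is used in only one product and is therefore not a re-use candidate.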

lcms A taxonomy is a hierarchical folder structure with no duplicate nodes for classifying assets. There is a school that favours folksonomy tags (keywords) over taxonomy, but this implies that users know how to search efficiently using tags, and that they apply appropriate terms. A taxonomy is at least familiar to use, although an information architect is needed to develop it.
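
The "no duplicate nodes" rule is easy to check mechanically. The sketch below assumes the taxonomy is held as nested dictionaries, where each key is a node name and each value holds its child nodes. The node names are invented and the function names are hypothetical.

```python
from collections import Counter


def node_names(tree):
    """Yield every node name in the taxonomy, at any depth."""
    for name, children in tree.items():
        yield name
        yield from node_names(children)


def duplicate_nodes(tree):
    """Return the node names that appear more than once in the taxonomy."""
    counts = Counter(node_names(tree))
    return sorted(name for name, count in counts.items() if count > 1)


# A draft taxonomy that breaks the rule: "Shutdown" appears in two branches.
draft_taxonomy = {
    "Operations": {"Start-up": {}, "Shutdown": {}},
    "Maintenance": {"Lubrication": {}, "Shutdown": {}},
}

duplicates = duplicate_nodes(draft_taxonomy)
```

For the draft above, duplicates contains only "Shutdown", which signals that the node should live in one branch and be reached from the other by metadata or a link.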

An effective LCMS must have the capability to have separate taxonomies for content and courses. Typically the course taxonomy will mirror the course taxonomy on the learning management system (LMS) used for course delivery. This makes it easier to keep the two systems synchronised. Usually course taxonomies should be based on functional, not organizational, roles.

lcms Metadata is a flat classification whereas taxonomy is hierarchical. Metadata often has different aspects (facets). Most web sites use faceted navigation. We need metadata for:
  • Classifying
  • Searching
  • Filtering
  • Aggregating
  • Knowledge management 
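
Filtering and aggregating by facet can be sketched in a few lines. The example assumes each asset carries its metadata as a flat dictionary. The field names (type, language, rights) and the sample assets are hypothetical.

```python
from collections import Counter

# Hypothetical assets, each with flat (non-hierarchical) metadata.
assets = [
    {"title": "Safety rules", "type": "text", "language": "en", "rights": "internal"},
    {"title": "Règles de sécurité", "type": "text", "language": "fr", "rights": "internal"},
    {"title": "Pump diagram", "type": "graphic", "language": "en", "rights": "licensed"},
]


def filter_by_facets(assets, **facets):
    """Keep only the assets whose metadata matches every requested facet value."""
    return [
        asset
        for asset in assets
        if all(asset.get(name) == value for name, value in facets.items())
    ]


def aggregate(assets, facet):
    """Count the assets under each value of one facet."""
    return Counter(asset.get(facet) for asset in assets)


english_text = filter_by_facets(assets, type="text", language="en")
by_language = aggregate(assets, "language")
```

This is the same mechanism as faceted navigation on a web site: each facet narrows the result set independently of any folder hierarchy.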
Some possible metadata types are:
  • Administrative
  • Publication
  • Lifecycle
  • Rights
  • Applicability
  • Workflow
  • Descriptive
  • Structural
  • Navigational

A full discussion is beyond the scope of this paper but, for example, Applicability could be the course to which the content relates, and Rights is important metadata for managing intellectual property rights. From a re-use perspective, metadata answers these questions:
  • Is there information on this
  • Where is it
  • How do I find it
  • How do I retrieve it
  • How do I re-use it
  • How do I transform it
  • What standards are relevant

Standards

Finally, we have to know the relevant standards and the role they play in content architecture. These are the ones used in examples that follow:

SCORM 1.2 and SCORM 2004 are essential packaging standards for eLearning. They facilitate transferring information between conforming systems.

HTML5, CSS3 and JavaScript (dynamic HTML) are web page standards that are replacing the use of Flash FLV because Flash is not supported on most mobile devices.

JPEG progressive is a graphic file format that can be used in responsive web page design.

YAML (Yet Another Multicolumn Layout) is a CSS framework for responsive design.

Dublin Core is a publishing and metadata standard, and XMP is a metadata standard for managing graphics.
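
To give a feel for Dublin Core, the sketch below writes a few of its elements (title, creator, date, language) as XML using the Dublin Core element namespace. The wrapping metadata element and the function name are assumptions made for illustration, and a real LCMS would define its own record structure.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialise a dictionary of Dublin Core elements as an XML string."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


xml_text = dublin_core_record(
    {
        "title": "Basic Content Re-Use",
        "creator": "David Shaw",
        "date": "2012-12-13",
        "language": "en",
    }
)
```

The resulting record can be stored alongside an asset or exported with it, so that other conforming systems can read the same descriptive metadata.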

ePUB is a packaging standard for HTML electronic books.

MP4 is a container standard for AAC Audio files and H.264 Video files.

WebM is a container standard for Vorbis Audio files and VP8 Video files.

ZIP is a packaging standard for compressed content.

PDF is a portable document format. PDF/A is a version for archiving documents in records-management systems.

An Example

A PDF of this article including a worked example is available by email from the author.

David Shaw is an information architect with experience in content-management systems for eLearning and technical documentation.