Sunday, December 16, 2012
Taxonomies & Their Cousins (Revised)
Taxonomy and metadata are closely related. A general understanding of different information schemes for organizing and classifying content helps in the design of a taxonomy. Best practice guidelines lead to a better result.
This is a revision of a blog article previously published here.
A PDF of the complete revised article is available by email from the author.
Saturday, December 15, 2012
Translations for Global Markets
On any multilingual project, translation is always the long tail. Same day release to all global markets can be achieved using controlled language, machine translation, and translation-management systems.
English documentation predominates around the world, yet more than 50% of readers of English web pages have English as a second language. Many eLearning courses, web pages and technical documents are available in English only, or the other languages are delayed by many months or even years. On any multilingual project, translation is always the long tail. Translation cannot be completed until the source language is finalized. However, the tail can be shortened if we do translations concurrently.
A PDF of the complete article is available by email from the author.
Thursday, December 13, 2012
Basic Content Re-Use
Abstract
This white paper outlines some of the metadata principles and relevant standards for re-using content in an eLearning application, and for moving from Flash to responsive mobile content. A marker like this lcms indicates a reference to a required function in an LCMS. A PDF of this article including a worked example is available by email from the author.
Introduction
Industrial-scale (~1 million text objects) content re-use is well established in XML content-management systems such as Astoria, Ixiasoft, Lenya, MarkLogic, SigmaLink, SyberSafe, Vasont, XPP Professional Publisher and others. Apart from graphics, it has not been significant in learning content-management systems (LCMS) because not much knowledge and expertise has transferred between these sectors, especially about automated assembly, synchronous translation and channel distribution. Link management (discussed later) is an obvious requirement for re-use, yet it is missing from most LCMS.
Most LCMS content re-use has been based on manual operations, often using multiple copies of content assets. This creates a huge administrative burden and opportunities for error when revisions are necessary.
The other limiting factor is a lack of training and experience in both XML and content re-use in the eLearning development, human-resources and information technology (IT) communities. Content re-use is straightforward when a CMS has been set up in a user-friendly way. But the design of the content architecture to achieve this is complex and somewhat of a black art.
Only a few LCMS vendors such as Giunti, Outstart Evolution, OutStart ForceTen, Thinking Cap and Xyleme are capable of supporting such a content architecture, and there are very few information architects who know how. There are also very few IT departments who understand the technology – pushing SharePoint is not the answer. It’s best to do a small pilot project, iron out the bugs, train people, then scale rapidly to production.
Business Benefits
There are numerous benefits to content re-use. The primary return on investment is reduced time to completion and reduced cost of production. With basic re-use it’s typical to reduce time and cost by 50%, and advanced techniques using concurrency can reduce these by 75-85%. In some cases, for example in safety, operational and regulatory training, intangibles such as consistency and accuracy in content bring a huge benefit through reduced liability and losses.
• Tangible
- Reduce time to market for new information products
- Reduce service costs
- Decrease cost of translation and localization
- Reduce cost in developing information
- Easily create new eLearning or other information products as required
- Out-market and out-sell your competition
• Intangible
- Increase consistency in information
- Increase accuracy in information
- Increase end-user productivity (customer satisfaction)
- Better support your distribution channels
- Increase accessibility, timeliness & customization for end users
Understanding Content Re-Use
The basic concepts in content re-use are the component or unit of re-use, often called the minimum re-usable unit (MRU), and variants. Figure 1 shows content units assembled into an edition. By edition we mean the final, approved version, i.e., the published version. But in the digital era this is not yet a product for market. To go to market, the edition is submitted to a rendering engine that transforms it into different renditions suitable for different distribution channels, e.g., PDF for print or HTML for web pages.
Fig 1 – Content management process
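The step from content units to an edition to channel-specific renditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assemble_edition`, `render_html` and `render_text`, the `approved` flag and the sample units are all hypothetical and do not correspond to any particular CMS or rendering engine.

```python
from html import escape


def assemble_edition(units):
    """Keep only approved units, in order; the result stands in for an edition."""
    return [u for u in units if u["approved"]]


def render_html(edition):
    """Rendition for a web channel."""
    lines = []
    for unit in edition:
        lines.append(f"<h2>{escape(unit['title'])}</h2>")
        lines.append(f"<p>{escape(unit['body'])}</p>")
    return "\n".join(lines)


def render_text(edition):
    """Plain-text rendition, e.g. as input to a print workflow."""
    return "\n\n".join(f"{u['title'].upper()}\n{u['body']}" for u in edition)


# Hypothetical content units; only the approved one reaches the edition.
units = [
    {"title": "Lockout", "body": "Isolate power before service.", "approved": True},
    {"title": "Draft note", "body": "Not yet reviewed.", "approved": False},
]
edition = assemble_edition(units)
print(render_html(edition))
print(render_text(edition))
```

The same edition feeds both renderers, so a correction to a unit only has to be made once before re-rendering each channel.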
The source MRU content goes through revision and approval cycles. Each revision is logged in the CMS as a version. lcms Thus workflow and version control are significant aspects of an LCMS.
Content is re-used by pasting as an embedded link or by pasting as a copy. An embedded link (Figure 2) shows the content in situ, but it is protected so you can’t change it. You are not the owner of embedded content, so an LCMS should have link management. The issue is that when the owner changes this content it might no longer be suitable for you.
Fig 2 – Embedded content is shown in a protected field
Link management should also maintain relational integrity when content is moved, and transparently manage changes to the names of content assets. lcms This information is also important for where-used reports.
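A minimal sketch of link management, assuming links are stored against asset IDs rather than labels, shows why a rename does not break an embedded link and how a where-used report falls out of the same data. The class `LinkManager`, its methods and the sample IDs are hypothetical; a real LCMS would hold this in its repository.

```python
from collections import defaultdict


class LinkManager:
    """Tracks which containers embed which assets, keyed by asset ID."""

    def __init__(self):
        self.labels = {}                  # asset ID -> current label
        self.used_in = defaultdict(set)   # asset ID -> IDs of containers embedding it

    def register(self, asset_id, label):
        self.labels[asset_id] = label

    def embed(self, container_id, asset_id):
        # Both ends of a link must be known assets.
        for known in (container_id, asset_id):
            if known not in self.labels:
                raise KeyError(f"unknown asset: {known}")
        self.used_in[asset_id].add(container_id)

    def rename(self, asset_id, new_label):
        # Only the label changes; links are keyed by ID and stay intact.
        self.labels[asset_id] = new_label

    def where_used(self, asset_id):
        """Return the current labels of all containers that embed the asset."""
        return sorted(self.labels[c] for c in self.used_in.get(asset_id, ()))


lm = LinkManager()
lm.register("A1", "Safety warning")
lm.register("C1", "Forklift course")
lm.register("C2", "Crane course")
lm.embed("C1", "A1")
lm.embed("C2", "A1")
lm.rename("C2", "Overhead crane course")
print(lm.where_used("A1"))
```

Before changing the safety warning, its owner can run the report to see which courses will be affected.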
When you copy content, you become the new owner of the copy or asynchronous variant. You can do whatever you want with it, including versioning it through numerous revision cycles. In other words, a variant re-purposes content. lcms Another important LCMS function is to track source content to its variants. This is important in cases where, for example, the source got the name of the president wrong and the source and all variants must be corrected. A related case is when you want to track synchronous variants, typically translations.
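Tracking a source to its variants can be sketched as follows, assuming each variant records the ID of the asset it was copied from. The `Asset` fields, the function `all_variants` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    asset_id: str
    text: str
    source_id: Optional[str] = None   # None for an original source
    kind: str = "source"              # "source", "asynchronous" or "synchronous"


def all_variants(assets, source_id):
    """Return IDs of every variant derived, directly or indirectly, from source_id."""
    by_source = {}
    for a in assets:
        by_source.setdefault(a.source_id, []).append(a.asset_id)
    found, frontier = set(), [source_id]
    while frontier:
        current = frontier.pop()
        for child in by_source.get(current, []):
            if child not in found:      # guards against accidental cycles
                found.add(child)
                frontier.append(child)
    return sorted(found)


assets = [
    Asset("S1", "Welcome from our president"),
    Asset("V1", "Welcome (short form)", "S1", "asynchronous"),
    Asset("V2", "Bienvenue de notre président", "S1", "synchronous"),
    Asset("V3", "Welcome (mobile)", "V1", "asynchronous"),
    Asset("X1", "Unrelated topic"),
]
print(all_variants(assets, "S1"))
```

If the president's name is wrong in S1, the returned list identifies every copy and translation that needs the same correction.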
The definition of an MRU is a very important and difficult
decision. It has to have the right granularity, i.e., size. If it is too large,
there will be few opportunities for re-use. If it is too small, the
administrative burden will be very high. An MRU must also have a stand-alone
context so it can be used without an external introduction.
An MRU is not the required or only unit of re-use. Larger aggregates of content can be re-used. But with a few exceptions noted below, an MRU is the minimum content that should be re-used. Otherwise re-use becomes ad hoc copy and paste.
Best practice in technical documentation is that an MRU is a topic (chapter > section > topic). In some cases it can be as small as a paragraph, but this is usually for what is called standing copy or boilerplate. Extreme exceptions could also be certain names or definitions. lcms Ideally this would be based on a glossary look-up.
In eLearning the smallest MRU is a learning object (LO)
representing a single learning objective (LOJ). Some LCMS define
this as a page. A larger MRU could be a module consisting of several LO or
pages. In either scenario, the context should be stand-alone.
Other asset types such as graphics and videos can also be re-used, subject to the same ownership, linking, notification and variant discussion. In this scenario XMP metadata (see further below) is an important standard.
Asset ID
In a CMS, files are not stored under the file name that users see. The stored file name is usually a unique random number or System ID. What we see, and think of as a file name, is a label or title that is a metadata field. For managing versions, variant copies and links, and for maintaining relational integrity, each asset should also have a unique Asset ID, also stored in a metadata field. lcms There are different ways of establishing unique IDs, but the easiest is to extract the System ID and put it in the Asset ID field. The label or title should conform to a naming convention.
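As a rough sketch of the idea, the record below keeps the System ID, the Asset ID and the label as separate fields. The class `StoredAsset`, the use of a random UUID as the System ID and the lowercase-hyphenated naming convention in `LABEL_PATTERN` are assumptions made for illustration, not features of any specific CMS.

```python
import re
import uuid
from dataclasses import dataclass, field

# Hypothetical naming convention: lowercase words separated by hyphens.
LABEL_PATTERN = re.compile(r"^[a-z]+(-[a-z0-9]+)*$")


@dataclass
class StoredAsset:
    label: str                                   # what users see as the "file name"
    system_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    asset_id: str = ""                           # filled from the System ID if empty

    def __post_init__(self):
        if not LABEL_PATTERN.match(self.label):
            raise ValueError(f"label breaks naming convention: {self.label!r}")
        if not self.asset_id:
            # Easiest option described above: copy the System ID.
            self.asset_id = self.system_id


asset = StoredAsset("lockout-procedure")
original_id = asset.asset_id
asset.label = "lockout-procedure-v2"   # relabelling leaves the Asset ID untouched
print(asset.asset_id == original_id)
```

Because links and version history hang off the Asset ID, the label can change freely without breaking them.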
Content Architecture
There is no ideal content architecture. This might be hard
to grasp. Content cannot be normalised in the same way as a database schema.
This is why it’s partly a black art that has to be learned more than taught.
A where used report in a CMS is essential
for maintaining content but doesn’t tell us very much about its information architecture
or the structure of content products. The starting point for architecture is a
content-map analysis of source content and its input authorities, and target
information products (outputs).
A content map is a type of network diagram showing the
relationships between content. It shows where we plan to use content, and
identifies common content. It also begins to give some insight into the granularity
and classification of content in taxonomies and metadata models, and eventually
helps in structuring information products. It answers the question, how do we
re-use content?
A content map begins to clarify:
- What business problem are we trying to solve?
- What content is relevant?
- What information do we need about that content?
- New opportunities – how do we exploit the content?
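One small part of a content-map analysis, identifying common content, can be sketched as follows. The map is assumed to be a simple dictionary from each target information product to the content units it uses; the function `common_content` and the product and unit names are hypothetical.

```python
from collections import defaultdict


def common_content(content_map):
    """Return each unit used by more than one product, with the products that use it."""
    users = defaultdict(set)
    for product, units in content_map.items():
        for unit in units:
            users[unit].add(product)
    return {unit: sorted(products) for unit, products in users.items() if len(products) > 1}


content_map = {
    "Operator course": ["lockout", "pre-use inspection"],
    "Maintenance manual": ["lockout", "lubrication schedule"],
    "Quick reference card": ["pre-use inspection", "lockout"],
}
for unit, products in common_content(content_map).items():
    print(unit, "->", products)
```

Units that appear in the result are the first candidates for management as re-usable content.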
lcms A taxonomy is a hierarchical folder structure with no duplicate nodes for classifying assets. There is a school that favours folksonomy tags (keywords) over taxonomy, but this implies that users know how to search efficiently using tags, and that they apply appropriate terms. A taxonomy is at least familiar to use, although an information architect is needed to develop it.
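The "no duplicate nodes" rule can be illustrated with a small tree that refuses to add a node name that already exists anywhere in the hierarchy. This reading of the rule, the class `Taxonomy` and the node names are assumptions made for the sketch.

```python
class Taxonomy:
    """A hierarchy in which every node name may appear only once."""

    def __init__(self, root):
        self.parent = {root: None}   # node -> parent node (None for the root)

    def add(self, node, parent):
        if node in self.parent:
            raise ValueError(f"duplicate node: {node}")
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        self.parent[node] = parent

    def path(self, node):
        """Return the full path from the root down to the node."""
        parts = []
        while node is not None:
            parts.append(node)
            node = self.parent[node]
        return " > ".join(reversed(parts))


taxonomy = Taxonomy("Content")
taxonomy.add("Safety", "Content")
taxonomy.add("Lockout", "Safety")
print(taxonomy.path("Lockout"))
```

Because each node has exactly one place in the tree, an asset classified under it has exactly one path.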
An effective LCMS must have the capability to have separate
taxonomies for content and courses. Typically the course taxonomy will mirror
the course taxonomy on the learning management system (LMS) used for course
delivery. This makes it easier to keep the two systems synchronised. Usually
course taxonomies should be based on functional, not organizational, roles.
lcms Metadata is a flat classification whereas taxonomy is hierarchical. Metadata often has different aspects (facets). Most web sites use faceted navigation. We need metadata for:
- Classifying
- Searching
- Filtering
- Aggregating
- Knowledge management
Some possible metadata types are:
- Administrative
- Publication
- Lifecycle
- Rights
- Applicability
- Workflow
- Descriptive
- Structural
- Navigational
A full discussion is beyond the scope here but, for example, Applicability could be the course to which the content relates, and Rights is important metadata for managing intellectual property rights. From a re-use perspective, metadata answers these questions:
- Is there information on this?
- Where is it?
- How do I find it?
- How do I retrieve it?
- How do I re-use it?
- How do I transform it?
- What standards are relevant?
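The "how do I find it" question is where flat, faceted metadata earns its keep. The sketch below filters assets by any combination of facet values; the function `find_assets`, the facet names and the sample records are hypothetical and stand in for whatever metadata model an LCMS actually uses.

```python
def find_assets(assets, **facets):
    """Return labels of assets whose metadata matches every requested facet value."""
    return [
        a["label"]
        for a in assets
        if all(a["metadata"].get(name) == value for name, value in facets.items())
    ]


assets = [
    {"label": "lockout-procedure",
     "metadata": {"applicability": "Operator course", "rights": "internal", "lifecycle": "approved"}},
    {"label": "crane-photo",
     "metadata": {"applicability": "Operator course", "rights": "licensed", "lifecycle": "approved"}},
    {"label": "lubrication-schedule",
     "metadata": {"applicability": "Maintenance course", "rights": "internal", "lifecycle": "draft"}},
]
print(find_assets(assets, applicability="Operator course"))
print(find_assets(assets, rights="internal", lifecycle="approved"))
```

Each additional facet narrows the result, which is the behaviour users already know from faceted navigation on web sites.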
Standards
Finally, we have to know the relevant standards and the role
they play in content architecture. These are the ones used in examples that
follow:
SCORM 1.2 and SCORM 2004 are essential packaging standards for eLearning. They facilitate transferring information between conforming systems.
HTML5, CSS3 and JavaScript (dynamic HTML) are web page standards that are replacing the use of Flash FLV because Flash is not supported on mobile devices.
JPEG progressive is a graphic file format that can be used in responsive web page design.
YAML (Yet Another Multicolumn Layout) is a CSS framework for responsive design.
Dublin Core is a publishing and metadata standard, and XMP is a metadata standard for managing graphics.
ePUB is a packaging standard for HTML electronic books.
MP4 is a container standard for AAC Audio files and H.264 Video files.
WebM is a container standard for Vorbis Audio files and VP8 Video files.
ZIP is a packaging standard for compressed content.
PDF is a portable document format. PDF/A is a version for archiving documents in records-management systems.
An Example
A PDF of this article including a worked example is available by email from the author.
David Shaw is an information architect with experience in content-management systems for eLearning and technical documentation.