Galaxy Consulting

DITA, Metadata, and Taxonomy

2/22/2012

Component-oriented content creation enables more efficient content reuse and dynamic publishing in more languages at a lower cost. XML authoring is required for component content creation.

Research shows that organizations that use XML authoring are more mature than their peers with respect to the adoption of best practices for search and metadata. However, the use of native DITA (the Darwin Information Typing Architecture) metadata capabilities is rare, and many organizations are also missing out on opportunities to use taxonomy for reuse and improved findability.

In this post, I am going to describe metadata capabilities within DITA, discuss two major benefits that can be achieved by using descriptive metadata and taxonomy, and recommend some best practices for getting started with metadata for component-oriented content.

Finding content in your file system or content repository is hard enough when you’ve got simple text documents to deal with. When you are using DITA and other component-oriented XML standards, you increase the difficulty by two or three orders of magnitude, because you’re looking for smaller needles in bigger haystacks. Having thousands of media-independent content objects that can be shared and reused across multiple deliverables allows you to create more sophisticated knowledge products, but it definitely poses a challenge in findability for content authors.

Among its many features for content reuse, DITA provides content creators with a facility for tagging content objects with metadata. Metadata (data about the data) lets content authors and others who manage content describe what the content is about ("descriptive metadata"), as well as assign properties like who created the content, when, in what language, and for which audience ("administrative metadata").

A taxonomy is a hierarchical structure that organizes concepts and controls vocabulary. Taxonomies allow organizations to create and centrally manage important terms that can be applied to content as metadata. For example, a telecommunications manufacturer might have a taxonomy that includes concepts such as product categories (Mobile Phones, Wireless Routers, and so on), industries (Healthcare, Utilities, Transportation, and so on), or product models.
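
As a minimal sketch of this idea, a taxonomy can be modeled as a tree of concepts whose labels form the controlled vocabulary. The `Concept` class, the `path_to` helper, and the root label "Telecommunications" are hypothetical names chosen for illustration; the other terms come from the telecommunications example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One node in a hypothetical taxonomy tree."""
    label: str
    children: List["Concept"] = field(default_factory=list)

    def add(self, label: str) -> "Concept":
        """Create a child concept under this one and return it."""
        child = Concept(label)
        self.children.append(child)
        return child


def path_to(root: Concept, label: str) -> Optional[List[str]]:
    """Return the labels from the root down to `label`, or None if absent."""
    if root.label == label:
        return [root.label]
    for child in root.children:
        sub_path = path_to(child, label)
        if sub_path is not None:
            return [root.label] + sub_path
    return None


# Build the example hierarchy (the root label is an assumption).
taxonomy = Concept("Telecommunications")
products = taxonomy.add("Product Categories")
products.add("Mobile Phones")
products.add("Wireless Routers")
industries = taxonomy.add("Industries")
for name in ("Healthcare", "Utilities", "Transportation"):
    industries.add(name)

print(path_to(taxonomy, "Wireless Routers"))
# → ['Telecommunications', 'Product Categories', 'Wireless Routers']
```

Because every term lives in one centrally managed tree, a tag such as "Wireless Routers" always resolves to the same broader category, which is what makes the terms safe to apply as metadata.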

Once applied, this metadata and taxonomy can be leveraged by a search application to help users find and use content. Search engines can use taxonomy to organize search results in meaningful ways, such as refining search based upon certain properties ("faceted search") and suggesting related searches based upon relationships between search terms and other concepts in the taxonomy.

DITA and taxonomy are a natural fit. DITA creates a multitude of reusable components, and taxonomy helps describe and organize the components so that they may be readily found and reused by content authors and users.

Taxonomies and descriptive metadata are also a natural fit, since metadata-based search improves the findability of content objects.

DITA Support for Metadata

Compared to other XML standards, DITA provides a relatively rich and extensible framework for embedding metadata directly within the XML objects themselves. The embedded metadata can be used by processing tools such as the publishing tools in the DITA Open Toolkit (DITA-OT) to conditionally publish content or to create metadata in the final outputs, such as HTML.

DITA topics have a prolog section in which metadata can be specified; maps carry equivalent metadata in a topicmeta element. Within the prolog, the metadata section can define metadata about the topic itself such as the intended audience, the platform (for defining the applicability of the topic to specific hardware or operating systems), and so on. This metadata can be used for conditional publishing. For example, you can automate the production of a Linux version of your documentation by only outputting topics and maps that set platform to "Linux" in the metadata.
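
The Linux example can be sketched in a few lines of Python. This is an illustration only, not how any particular publishing tool works: DITA-OT builds commonly drive filtering with a DITAVAL file keyed on attributes, whereas this sketch simply reads the `platform` element from each topic's prolog. The sample topics, file names, and the `select_for_platform` helper are invented for the example, and the XML is not validated against the DITA DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topics; file names and content are invented.
TOPICS = {
    "install_linux.dita": """<topic id="install_linux">
  <title>Installing on Linux</title>
  <prolog>
    <metadata>
      <audience type="administrator"/>
      <prodinfo>
        <prodname>Example Product</prodname>
        <vrmlist><vrm version="1"/></vrmlist>
        <platform>Linux</platform>
      </prodinfo>
    </metadata>
  </prolog>
  <body><p>Run the installer script.</p></body>
</topic>""",
    "install_windows.dita": """<topic id="install_windows">
  <title>Installing on Windows</title>
  <prolog>
    <metadata>
      <prodinfo>
        <prodname>Example Product</prodname>
        <vrmlist><vrm version="1"/></vrmlist>
        <platform>Windows</platform>
      </prodinfo>
    </metadata>
  </prolog>
  <body><p>Run the setup wizard.</p></body>
</topic>""",
    "overview.dita": """<topic id="overview">
  <title>Overview</title>
  <body><p>No platform metadata here.</p></body>
</topic>""",
}


def platforms(topic_xml: str) -> set:
    """Collect the platform values declared in a topic's prolog metadata."""
    root = ET.fromstring(topic_xml)
    found = root.findall("./prolog/metadata/prodinfo/platform")
    return {el.text.strip() for el in found if el.text}


def select_for_platform(topics: dict, wanted: str) -> list:
    """Return the names of topics whose prolog declares the wanted platform."""
    return sorted(name for name, xml in topics.items() if wanted in platforms(xml))


print(select_for_platform(TOPICS, "Linux"))
# → ['install_linux.dita']
```

Whether topics with no platform metadata (such as the overview here) should be included in every build is a policy decision; this sketch excludes them.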

DITA objects can also embed administrative metadata about the author, copyright holder, source, publisher, and so on. Metadata can also contain descriptive keywords for the topic or map. Keywords or index terms are output to HTML or XHTML as metadata keywords to support search engines. Authors can also define index terms for the generation of back-of-book indices.

DITA also enables users to define custom metadata fields within the "othermeta" element. Like keywords, metadata defined as "othermeta" are output as HTML metadata elements but ignored for other types of output like PDF. Metadata is a powerful tool in helping to manage and publish DITA content.
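
To make the keyword and "othermeta" behavior concrete, here is a rough sketch that turns prolog metadata into HTML meta tags. It is an approximation written for this post, not the DITA Open Toolkit's actual transformation, and the exact tag names a real toolchain emits may differ. The sample topic, the author name, and the `html_meta_tags` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical topic with keywords and one custom "othermeta" field.
TOPIC = """<topic id="router_setup">
  <title>Setting up the router</title>
  <prolog>
    <author>Jane Doe</author>
    <metadata>
      <keywords>
        <keyword>wireless router</keyword>
        <keyword>setup</keyword>
      </keywords>
      <othermeta name="industry" content="Healthcare"/>
    </metadata>
  </prolog>
  <body><p>Connect the power cable.</p></body>
</topic>"""


def html_meta_tags(topic_xml: str) -> list:
    """Build HTML <meta> tags from a topic's keywords and othermeta fields."""
    root = ET.fromstring(topic_xml)
    metadata = root.find("./prolog/metadata")
    tags = []
    if metadata is None:
        return tags

    # All keywords are joined into a single "keywords" meta tag.
    keywords = [k.text.strip() for k in metadata.findall("./keywords/keyword") if k.text]
    if keywords:
        content = escape(", ".join(keywords), quote=True)
        tags.append('<meta name="keywords" content="%s">' % content)

    # Each othermeta element becomes its own name/content pair.
    for other in metadata.findall("./othermeta"):
        name = escape(other.get("name", ""), quote=True)
        content = escape(other.get("content", ""), quote=True)
        tags.append('<meta name="%s" content="%s">' % (name, content))
    return tags


for tag in html_meta_tags(TOPIC):
    print(tag)
# → <meta name="keywords" content="wireless router, setup">
# → <meta name="industry" content="Healthcare">
```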

Dynamic Publishing of Content

A major benefit of DITA is creating content that is media-independent. It also enables content objects to be organized by DITA maps, so that content can be recombined and re-sequenced into different deliverables. DITA maps provide flexibility.

Dynamic publishing lets content be chosen and presented to meet the unique needs of a user or situation. To best illustrate dynamic publishing, let’s compare it with static publishing of a help system.

In a statically published help system, the hierarchy of topics is fixed by the author and the selection of content is limited to what is in the DITA map at publish time. All of the related topics are manually linked. If an author wants to add a related topic, the author needs to manually add the link (or update the related-links table) and republish. The publishing process creates a deliverable that—while interactive—is static with respect to its contents and the relationships among them.

To create the same help system with dynamic publishing, the author would publish their content to a server but would not create the structure and relationships between topics at publish time. Instead, a taxonomy would specify the relationship between concepts and properties that are defined in metadata. The relationships among topics are generated at run time, based upon metadata on the topics. The richer the metadata and the more complete the taxonomy, the more sophisticated the user experience.
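
A small sketch shows what generating relationships at run time could look like. It assumes each published topic carries a set of taxonomy terms and treats two topics as related when they share at least one term; the topic identifiers, the tags, and the `related_topics` function are all hypothetical, and a real delivery server would likely use the taxonomy's hierarchy as well as exact matches.

```python
from typing import Dict, List, Set

# Hypothetical topics and the taxonomy terms applied to them as metadata.
TOPIC_TAGS: Dict[str, Set[str]] = {
    "configure_wifi": {"Wireless Routers", "Healthcare"},
    "reset_router": {"Wireless Routers"},
    "pair_handset": {"Mobile Phones", "Healthcare"},
    "billing_faq": {"Utilities"},
}


def related_topics(topic_id: str, tags_by_topic: Dict[str, Set[str]]) -> List[str]:
    """Rank other topics by how many taxonomy terms they share with topic_id."""
    own_tags = tags_by_topic[topic_id]
    scored = []
    for other_id, other_tags in tags_by_topic.items():
        if other_id == topic_id:
            continue
        shared = len(own_tags & other_tags)
        if shared:
            # Negative count so that sorting puts the most overlap first,
            # with ties broken alphabetically by topic id.
            scored.append((-shared, other_id))
    return [other_id for _, other_id in sorted(scored)]


print(related_topics("configure_wifi", TOPIC_TAGS))
# → ['pair_handset', 'reset_router']

# Publishing a new tagged topic changes the related links with no relinking.
TOPIC_TAGS["mount_router"] = {"Wireless Routers", "Healthcare"}
print(related_topics("configure_wifi", TOPIC_TAGS))
# → ['mount_router', 'pair_handset', 'reset_router']
```

The contrast with static publishing is in the last step: nobody edited a related-links table, yet the new topic appears as soon as it is tagged.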

You have probably experienced faceted search on consumer web sites, where you can refine search results by selecting specific values for different attributes, such as the number of megapixels for a camera. This experience is driven by metadata. With rich metadata on DITA content, we can create very sophisticated electronic content browsers, where metadata-based search creates browser-like user experiences.
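
The mechanics of faceted refinement are simple enough to sketch. The example below assumes that each topic's metadata has already been extracted into a flat record; the records, the facet names, and the `refine` and `facet_counts` helpers are invented for illustration, and a production search engine would compute the same thing from its index.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical metadata records, one per published topic.
CATALOG: List[Dict[str, str]] = [
    {"id": "t1", "audience": "administrator", "platform": "Linux", "product": "Wireless Routers"},
    {"id": "t2", "audience": "administrator", "platform": "Windows", "product": "Wireless Routers"},
    {"id": "t3", "audience": "novice", "platform": "Linux", "product": "Mobile Phones"},
    {"id": "t4", "audience": "novice", "platform": "Linux", "product": "Wireless Routers"},
]


def refine(items: List[Dict[str, str]], **selected: str) -> List[Dict[str, str]]:
    """Keep only the items whose metadata matches every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items: List[Dict[str, str]], facet: str) -> Counter:
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


linux_topics = refine(CATALOG, platform="Linux")
print(dict(facet_counts(linux_topics, "product")))
# → {'Wireless Routers': 2, 'Mobile Phones': 1}

print([item["id"] for item in refine(CATALOG, platform="Linux", audience="novice")])
# → ['t3', 't4']
```

The counts are what a faceted interface shows next to each refinement link, so the quality of the browsing experience depends directly on how consistently the metadata was applied.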

Best Practices

Start by identifying all your taxonomy use cases. You will be using taxonomy not only for authors to search content objects for reuse but also potentially for serving up content to users dynamically or in a faceted interface. These perspectives will provide you with the framework for your taxonomy.

Reuse existing vocabulary. Many organizations already use controlled vocabularies for some metadata fields such as organization, audience, platform, and product. Look to existing sources for tagging your content such as hierarchical product or system models (from engineering), or hierarchical task models (from instructional/task analysis from the training organization) as places to start building hierarchical descriptive taxonomies.

Authors are the best people to apply descriptive metadata. After all, they do the analysis to determine what content was required in the first place, so they have the best context for classifying it. However, don’t expect authors to tag a lot: automate tagging when possible, especially for administrative metadata (author, organization, creation date, language).

Leverage the technology. Many content management systems can integrate third-party classification servers for automating descriptive metadata. These servers can automatically apply metadata from a taxonomy or controlled vocabulary when content topics are checked in, then automatically populate subject metadata fields in the CMS. The metadata can in turn be reviewed and manually adjusted by authors. This metadata can be embedded into your DITA content for use in conditional publishing or to generate HTML tags in the final output to support search or dynamic publishing.
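
As a deliberately naive stand-in for a classification server, the sketch below suggests taxonomy terms by matching phrases from a controlled vocabulary against the topic text. Commercial classifiers rely on far richer text analytics; the vocabulary, the synonym lists, and the `suggest_tags` function here are assumptions made up for the example.

```python
import re
from typing import Dict, List

# Hypothetical controlled vocabulary: preferred term -> phrases that signal it.
VOCABULARY: Dict[str, List[str]] = {
    "Mobile Phones": ["mobile phone", "handset", "smartphone"],
    "Wireless Routers": ["wireless router", "wi-fi router"],
    "Healthcare": ["hospital", "clinic", "patient"],
}


def suggest_tags(text: str, vocabulary: Dict[str, List[str]]) -> List[str]:
    """Suggest preferred terms whose phrases occur in the text (simple plurals allowed)."""
    lowered = text.lower()
    suggestions = []
    for term, phrases in vocabulary.items():
        for phrase in phrases:
            # Whole-word match, optionally followed by a plural "s".
            pattern = r"\b" + re.escape(phrase) + r"s?\b"
            if re.search(pattern, lowered):
                suggestions.append(term)
                break
    return sorted(suggestions)


topic_text = "Place the wireless router away from patient monitoring equipment."
print(suggest_tags(topic_text, VOCABULARY))
# → ['Healthcare', 'Wireless Routers']
```

The output is a list of suggestions rather than final metadata, which leaves room for the author review step described above.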

The next frontier of DITA adoption is leveraging semantic technologies (taxonomies, ontologies and text analytics) to automate the delivery of targeted content. For example, a service incident from a customer is automatically matched with the appropriate response, which is authored and managed as a DITA topic. 
