
@ <XML> - About XML and its applications

Getting the most out of XML

April 29

Microsoft promises to support Interoperability

On 28 April 2009, I attended Microsoft's Interoperability Day in Singapore, a half-day seminar about Microsoft's interoperability initiative. I am surprised, and also pleased, that Microsoft has embarked on this strategy.

Microsoft will publish full documentation of how it has implemented the open standards, as well as how others can build add-ons for its products.

Microsoft Office is widely used in business, and open source office suites such as Open Office are also gaining acceptance. XML is the format of choice, but having two dominant document standards (Open XML and Open Document) is a hassle.

Microsoft's announcement that Microsoft Office 2007 Service Pack 2 will support ODF as a native format is a welcome sign. Microsoft also provides templates to help users create new documents that are compatible with both standards. This should encourage the wider use of XML document standards and a move away from proprietary binary formats.

For those still using Office 2003, a similar release is expected later this year. This is good news too.

My burning question: will a single document standard emerge in the future, and what difficulties would the industry have to overcome to reach that state? At this time no one is sure, and the emergence of new technologies may change the scene. At the moment, the panel members are not sure either. Microsoft has opted to support all the open standards.

In the meantime, two standards are needed, each created for a different purpose: OOXML (Open XML) takes care of legacy issues in the Microsoft Office world, while ODF (Open Document Format) caters for a general document format with no legacy strings attached.

Wait a minute, how about PDF/A? Currently most of us use PDF for document exchange in read-only mode. But this is going to change, as PDF is increasingly used for data entry (form-filling applications). Does this bring us to three document standards?

Recent developments in China on UOF (Uniform Office Format) add another headache. What about other countries? The primary reason China needs a different format is to handle Chinese characters and layout.

We will have to wait and see.

March 15

Document standards - the struggle for dominance

XML is a universally accepted data format, and XML-based standards have proliferated around the world in many different areas and industries, for example ebXML, RosettaNet, HL7 and PDX. In January 2002, the XML Working Group (XML WG) of the Information Exchange Technical Committee, Information Technology Standards Committee, initiated the XML Industrial Project (XIP) to develop an illustrative system that lets industries evaluate the use of XML for business document exchange between trading partners. Smaller companies, and especially non-IT companies, can benefit directly from this project: by using XIP to exchange business documents easily, companies can collaborate more closely. I will discuss XIP in the next issue.

At the XMLOne User Group seminar on 12 March, we focused on document standards, and Microsoft was invited to present OpenXML (see my photo album). We hope to get Sun, IBM and other ODF supporters to present ODF so that we can better understand the two standards.

The struggle continues ...

Recently, two XML document standards have come into the limelight: Open Document Format (ODF, now an OASIS standard) and OpenXML or Office Open XML (OOXML, now an Ecma standard). Both standards focus on document formats, the former for Open Office and the latter for Microsoft Office. ODF is also an ISO standard, while OpenXML is now in the ratification stage.

Traditionally, documents created by word processors are saved in binary formats, which makes it difficult for other programs to extract information from them. When a word processor saves documents in XML format, it creates new opportunities to use this information and makes further processing possible.
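To illustrate, here is a minimal Python sketch (standard library only) that pulls the plain text out of a word-processing document saved as Office Open XML (.docx). It assumes the body text lives in the part word/document.xml under the WordprocessingML namespace, with paragraphs in w:p elements and text in w:t elements. To stay self-contained, it builds a tiny hypothetical document in memory rather than reading a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample_docx() -> bytes:
    """Build a tiny, hypothetical .docx-like ZIP holding only word/document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Quotation from Supplier A</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Total: </w:t></w:r><w:r><w:t>1200</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each paragraph (w:p), joining its text runs (w:t)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


if __name__ == "__main__":
    for line in extract_paragraphs(make_sample_docx()):
        print(line)
```

A real document contains many more parts and formatting elements, but the text can still be reached with an ordinary ZIP reader and XML parser, which is much harder with a binary format.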

ODF and OOXML use different XML tag definitions (schemas), which makes them very different. Converters exist to convert from one format to the other; however, problems arise when it comes to preserving the visual presentation of a document.
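To make the schema difference concrete, here is a deliberately naive Python sketch of a text-only converter. In OOXML a paragraph is written as w:p containing runs (w:r) of text (w:t); in ODF it is text:p. The sketch maps one to the other and simply drops run properties such as w:b (bold). That dropped formatting is exactly where visual fidelity gets lost. The namespace URIs are those of the published schemas; the output is only an office:text fragment, not a complete ODF file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Use the conventional ODF prefixes when serialising
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def ooxml_to_odf_text(ooxml_xml: str) -> str:
    """Convert OOXML paragraphs into an ODF office:text fragment, text only.

    Formatting such as <w:b/> is ignored on purpose: a naive conversion
    keeps the words but loses the visual presentation.
    """
    src = ET.fromstring(ooxml_xml)
    out = ET.Element(f"{{{OFFICE_NS}}}text")
    for p in src.iter(f"{{{W_NS}}}p"):
        para = ET.SubElement(out, f"{{{TEXT_NS}}}p")
        para.text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    sample = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        "</w:body></w:document>"
    )
    print(ooxml_to_odf_text(sample))
```

The word "Hello" arrives in the ODF output, but it is no longer bold. A real converter has to map every such property, and layout features that have no exact counterpart are where the differences appear.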

In this issue I will just focus on OpenXML.

Is the visual aspect of a document important? What about compatibility with earlier versions of MS Office?

Open XML is designed to support all of the features in the Microsoft Office 97-2003 binary formats.  It is important that documents be converted to a new document format with minimal loss of information. However, gross layout differences can result from the compounding of any inaccuracies in the model employed for relative versus absolute positioning, margins, margin collapse, wrap modes, column layout, table layout, alignment, tabs, line spacing, baseline shifts, word spacing, character spacing, kerning, ligatures, hyphenation, etc. These differences in layout can change the meaning of a document worthy of archiving, since meaning is often conveyed by spatial relationships between elements of a document.

Compatibility with older formats is an important consideration because, without it, users may have to spend too much effort correcting problems introduced by the conversion process.

There is much debate as to whether OOXML duplicates the work of ODF (ODF was proposed earlier and became an OASIS standard and later an ISO standard). Supporters of ODF argue that OOXML should not be accepted as an ISO standard since there is already a document standard, ODF. The community is divided between these two camps. While the debate continues, I would like to look at the merits of OOXML.

Is there a chance for both to become standards?

Various document standards and specifications exist; these include HTML, XHTML, PDF and its subsets, DocBook, DITA, and RTF. Like the numerous standards that represent bitmapped images, including TIFF/IT, TIFF/EP, JPEG 2000, and PNG, each was created for a different set of purposes.

Open XML addresses the need for a standard that covers the features represented in the existing document corpus (especially for existing Microsoft Office binary format files). To the best of my knowledge, it is the only XML document format that supports every feature in the binary formats.

The Microsoft Office suite is widely used in most companies, and many editable documents have been created with it. Open XML ensures that these documents can be accessed and converted to a standard XML format for further information integration.

Low barrier to adoption

Although the Specification describes a large feature set, an Open XML conformant application need not support all of the features in the Specification. Open XML implementations can be very small and provide focused functionality, or they can encompass the full feature set. Furthermore, documents can also be easily converted to ODF or other XML formats if visual presentation accuracy is not needed.

Compactness

The OpenXML file format supports the creation of high-performance applications.  An OpenXML file is conventionally stored in a ZIP archive for purposes of packaging and compression, following the recommended implementation of the Open Packaging Conventions.
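A minimal Python sketch of the packaging idea, using only the standard zipfile module: each part is stored as a separate ZIP entry with DEFLATE compression, and the archive can then be listed part by part. The part names follow the usual .docx layout, but the contents are hypothetical placeholders, so the result is not a file Office would open.

```python
import io
import zipfile


def package_parts(parts: dict[str, str]) -> bytes:
    """Store each named XML part as a DEFLATE-compressed entry in a ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def describe(package: bytes) -> list[tuple[str, int, int]]:
    """List (part name, uncompressed size, compressed size) for every entry."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]


if __name__ == "__main__":
    # Hypothetical parts; real packages contain many more
    parts = {
        "[Content_Types].xml": '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"/>',
        "word/document.xml": "<p>Repeated paragraph text.</p>" * 200,
    }
    for name, raw, packed in describe(package_parts(parts)):
        print(f"{name}: {raw} bytes -> {packed} bytes")
```

Verbose, repetitive XML compresses very well, which is one reason the ZIP container keeps Open XML files compact.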

Modularity

The file structure is modular, which enables an application to accomplish many tasks by parsing or modifying only a small subset of a document.

Three features of the OpenXML format cooperate to provide this modularity.

·         A document is not monolithic; it is built out of multiple parts.

·         Relationships between parts are themselves stored in parts.

·         It supports random access to each part.
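The three features above can be sketched in Python. The program reads only the package-level relationships part (_rels/.rels), follows the officeDocument relationship to the main document part, and then reads just that one entry. Every other part, such as a large embedded image, is never decompressed. The relationship namespace and type URI are the ones used by .docx files; the sample package, its document text and the placeholder "image" bytes are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")


def make_sample_package() -> bytes:
    """Build a hypothetical package: a relationships part, a main part and a big unrelated part."""
    rels = (f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC}" Target="word/document.xml"/>'
            "</Relationships>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("_rels/.rels", rels)
        zf.writestr("word/document.xml", "<document>Hello</document>")
        zf.writestr("word/media/image1.png", bytes(100_000))  # placeholder bytes, not a real PNG
    return buf.getvalue()


def main_part_name(zf: zipfile.ZipFile) -> str:
    """Relationships are themselves a part: parse it to find the main document part."""
    rels = ET.fromstring(zf.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            return rel.get("Target", "").lstrip("/")
    raise KeyError("no officeDocument relationship found")


def read_main_part(package: bytes) -> bytes:
    """Random access: read only the relationships part and the main part."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(main_part_name(zf))


if __name__ == "__main__":
    print(read_main_part(make_sample_package()))
```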

Extensibility

Extensibility mechanisms built into the format guarantee room for innovation.

Conclusion

There is a compelling need for an open document-format standard capable of preserving the billions of documents that have been created in the pre-existing binary formats.

Standardizing the format specification and maintaining it over time ensures that multiple parties can safely rely on it, confident that further evolution will enjoy the checks and balances afforded by an open standards process.

This is my humble opinion. I will review ODF in the coming issue.

November 11

XML used in Windows Script Host (WSH)

Today I'll show an example of one place where XML is used.
 
XML provides the glue for combining different scripts into a Windows Script File (WSF) used by Windows Script Host. For example:
 
<?XML version="1.0" standalone="yes" ?>
<job>

  <?job error="true" debug="true"?>

  <comment>The following VBScript displays an information message</comment>
  <script language="VBScript">
    MsgBox "VBScript has displayed this message."
  </script>

  <comment>The following JScript displays an information message</comment>
  <script language="JScript">
    WScript.Echo("JScript has displayed this message.");
  </script>
</job>
 
The following table describes the function of each tag.

Tag                        Description
<?job ?>                   Enables or disables error handling and debugging for a specified job.
<?XML ?>                   Specifies the Windows Script File's XML level.
<comment> </comment>       Embeds comments within Windows Script Files.
<script> </script>         Identifies the beginning and end of a script within a Windows Script File.
<job> </job>               Identifies the beginning and end of a job inside a Windows Script File.
<package> </package>       Enables multiple jobs to be defined within a single Windows Script File.
<resource> </resource>     Defines static data (constants) that can be referenced by script within a Windows Script File.

This is an example of XML being used within the context of WSH to define the structure of Windows Script Files.
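Because a Windows Script File is XML, ordinary XML tools can inspect it. The following minimal Python sketch lists the scripting languages used by each job in a hypothetical WSF that uses <package>, <job>, <resource> and <script>. One assumption to note: the WSH-specific <?XML ?> directive is left out, because the XML specification reserves processing-instruction names matching "xml" in any letter case, and a strict parser such as the one behind Python's ElementTree will generally reject it. The <?job ?> instruction is harmless; ElementTree skips processing instructions by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical .wsf content with two jobs inside one <package>
WSF_SOURCE = """<package>
  <job id="vbs">
    <?job error="true" debug="true"?>
    <script language="VBScript">
      MsgBox "VBScript has displayed this message."
    </script>
  </job>
  <job id="js">
    <resource id="greeting">Hello from a resource</resource>
    <script language="JScript">
      WScript.Echo("JScript has displayed this message.");
    </script>
  </job>
</package>"""


def summarize_jobs(source: str) -> dict[str, list[str]]:
    """Map each job id to the scripting languages of its <script> elements."""
    root = ET.fromstring(source)
    # A file may contain a single <job> or a <package> holding several jobs
    jobs = [root] if root.tag == "job" else root.findall("job")
    return {
        job.get("id", "(unnamed)"): [s.get("language") for s in job.findall("script")]
        for job in jobs
    }


if __name__ == "__main__":
    print(summarize_jobs(WSF_SOURCE))
```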

 

 
November 09

XML Syntax

I would like to explain more about the XML syntax in this issue.

XML uses tags to mark up text. As a web developer, you're probably familiar with the concept of marked-up text. For those who are not, I will explain here.

 

Without much prompting, a human can deduce that this web page shows a comparison of the quotations from two suppliers. The HTML contains a series of “<>” notations called tags, which describe how the content should be displayed. Browsers use these tags to display the information. For example, the “<p>” and “</p>” pair marks a paragraph, and the browser displays the text between these tags as a separate block. The “<b>” and “</b>” pair indicates that the text must be displayed in bold, while the “<i>” and “</i>” pair tells the browser to display the text in italics.

HTML uses a set of predefined tag names. For more information about these tags, please visit the W3C web site. You do not strictly need to know them, as HTML web design tools add these tags automatically, but as a web developer you may find this knowledge handy.

XML allows you to create your own tags, for example <intro>Here is the introduction</intro>. Here the <intro> tag tells you the purpose of the markup; it is self-describing. You are not restricted to predefined names: you can use any name that follows the XML naming rules (for example, a name cannot contain spaces or start with a digit).
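Because the tag names are self-describing, a program can look content up by name. Here is a minimal Python sketch using the standard xml.etree.ElementTree module. The <article> and <body> tags, and the helper name section_text, are hypothetical, invented only for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def section_text(xml_text: str, tag: str) -> Optional[str]:
    """Return the text of the first element named `tag`, or None if there is none."""
    root = ET.fromstring(xml_text)
    element = root if root.tag == tag else root.find(f".//{tag}")
    return None if element is None else element.text


if __name__ == "__main__":
    article = ("<article><intro>Here is the introduction</intro>"
               "<body>Main text</body></article>")
    print(section_text(article, "intro"))
```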

This markup is intended for software applications to pick up the information when parsing the document. As a rule, an XML document must be well-formed.

What is well-formed?

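An XML document is well-formed if it meets the following criteria:

  1. The document contains one or more elements (I will explain the difference between tags and elements later).
  2. The document contains a single document element (the root element), which can contain other elements.
  3. Every element is closed correctly, and elements are properly nested inside one another.
  4. Element names are case-sensitive, so an opening tag and its closing tag must match exactly.
  5. Attribute values are enclosed in quotation marks (single or double). An empty value such as mode="" is allowed, but the quotation marks are still required.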
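A quick way to try these rules is to hand a snippet to an XML parser: a well-formed document parses, and anything else is rejected. The minimal Python sketch below uses the standard xml.etree.ElementTree module. The helper name is_well_formed is made up for this example, and a parser checks well-formedness only, not whether the tags mean anything.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<intro>Here is the introduction</intro>",  # well-formed
        "<intro>Here is the introduction</into>",   # closing tag does not match
        "<Intro>text</intro>",                      # names are case-sensitive
        "<a><b></a></b>",                           # improper nesting
        "<a/><b/>",                                 # two root elements
        "<transport mode=land/>",                   # unquoted attribute value
        '<transport mode=""/>',                     # empty but quoted: allowed
    ]
    for s in samples:
        print(is_well_formed(s), s)
```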

Tags and Elements - any difference?

I have used both "tag" and "element" when talking about XML documents. At a glance they seem interchangeable, but there is a slight difference between them.

An element consists of an opening tag, a closing tag and the content enclosed between them (an element with no content can also be written as a single self-closing tag, such as <br/>). So a tag is only one part of an element.

This example shows simple quotation information: the prices quoted by two different suppliers. You can develop software to compare the prices automatically and pick out the best price from this XML document.
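The original example document is not reproduced here, so the minimal Python sketch below assumes a hypothetical layout: a <quotation> root with one <supplier name="..."> element per supplier, each holding a <price>. It picks the lowest price. Prices are parsed with Decimal to avoid floating-point rounding, and the sketch assumes every supplier has a price.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical quotation document; element and attribute names are assumptions
QUOTATION = """<quotation item="Printer paper">
  <supplier name="Supplier A"><price currency="SGD">45.00</price></supplier>
  <supplier name="Supplier B"><price currency="SGD">42.50</price></supplier>
</quotation>"""


def best_offer(xml_text: str) -> tuple[str, Decimal]:
    """Return (supplier name, price) for the cheapest supplier."""
    root = ET.fromstring(xml_text)
    offers = [
        (Decimal(supplier.findtext("price").strip()), supplier.get("name"))
        for supplier in root.findall("supplier")
    ]
    price, name = min(offers)  # raises ValueError if there are no suppliers
    return name, price


if __name__ == "__main__":
    print(best_offer(QUOTATION))
```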

An attribute describes a property of an element.

Example:

In this example of types of transport, mode is an attribute. Notice that it appears inside the opening tag of the transport element. Its values (land, sea and air) must be enclosed in quotation marks.
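A program reads attributes separately from element content. The minimal Python sketch below assumes a hypothetical document in which several <transport> elements, each with a mode attribute, sit inside a <shipments> root; the root name and element contents are invented for illustration. It builds a lookup from each mode to the element's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only <transport> and its mode attribute come from the text
TRANSPORT = """<shipments>
  <transport mode="land">Truck</transport>
  <transport mode="sea">Container ship</transport>
  <transport mode="air">Cargo plane</transport>
</shipments>"""


def modes(xml_text: str) -> dict[str, str]:
    """Map each transport mode attribute to the element's text content."""
    root = ET.fromstring(xml_text)
    return {t.get("mode"): t.text for t in root.iter("transport")}


if __name__ == "__main__":
    print(modes(TRANSPORT))
```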

November 08

Web Vocabularies

As XML grows in popularity, the number of XML vocabularies has grown within various industry and community sectors. These vocabularies are used to store database information, exchange data and describe concepts. On the web, XML is used to overcome some of the shortcomings associated with HTML (HyperText Markup Language); XHTML was created specifically for the web.

XHTML is probably one of the most widespread uses of XML. Most modern web browsers support XHTML, for example Microsoft Internet Explorer 6 for Windows, Mozilla Firefox 1.x for Windows and Safari 1.x for Macintosh (Internet Explorer 6 displays XHTML only when it is served as text/html).

XHTML is HTML reformulated in XML. XHTML provides a number of benefits compared with HTML. The most important benefit is that it encourages separating presentation from content: XHTML Strict drops presentational elements such as <font> and leaves presentation to style sheets, whereas HTML pages often mix presentation with content. Why is this important? It allows you to use XML-specific tools and technologies to create modular documents. Most importantly, you can use your computer to extract data from these documents.
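Because an XHTML page is well-formed XML, a standard XML parser can pull data straight out of it. The minimal Python sketch below reads the rows of a table from a hypothetical XHTML page. It relies on the XHTML namespace http://www.w3.org/1999/xhtml, and it omits the DOCTYPE to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page content; the DOCTYPE is left out on purpose
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Quotations</title></head>
  <body>
    <table>
      <tr><td>Supplier A</td><td>45.00</td></tr>
      <tr><td>Supplier B</td><td>42.50</td></tr>
    </table>
  </body>
</html>"""


def table_rows(xhtml: str) -> list[list[str]]:
    """Return the cell text of every table row, in document order."""
    root = ET.fromstring(xhtml)
    return [
        [(td.text or "").strip() for td in tr.findall(f"{{{XHTML_NS}}}td")]
        for tr in root.iter(f"{{{XHTML_NS}}}tr")
    ]


if __name__ == "__main__":
    for row in table_rows(PAGE):
        print(row)
```

The same table written as loose HTML, for example with unclosed <td> tags, would be rejected by an XML parser. This is why the stricter XHTML syntax makes machine processing easier.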

Content refers to the basic data and structures that form the document. Presentation determines how these data structures appear on viewing devices, including fonts, colours, borders and other visual information. By keeping the two apart, you can repackage the content for different audiences or applications.

Separating presentation from content has the following benefits:

  1. Accessibility
  2. Targeted presentation using style sheets
  3. Streamlined maintenance
  4. Improved processing

Accessibility

XHTML enables devices such as screen readers and voice browsers to extract information for people with visual impairments.

Targeted presentation

When the content is separate from the presentation, you can reformat the content for different devices. For example, PDAs and other handheld mobile devices have smaller screens than standard computer monitors, so their presentation differs too. With separate style sheets, you do not need separate web pages for different devices.

Streamlined maintenance

Storing content and presentation separately makes the web site easier to maintain: a design change can be made once in a style sheet instead of on every page, which speeds up the site maintenance process.

Improved processing

You can develop applications that can extract information for further processing.

XHTML 2.0 is currently being drafted at the W3C. The current version is XHTML 1.1, which became a W3C Recommendation in May 2001.

 

Andy Tan