Schema Languages

In 2004 I attended OSCON in Portland, Oregon and made the following notes after one of the sessions there...

Have just attended the XML Schema languages tutorial. It was meant to be presented by Eric van der Vlist, but in his absence was presented by Micah Dubinko (W3C XForms project) and Kip Hampton (XML tech writer and AxKit developer). As the session notes weren't written by the presenters, the session didn't flow all that smoothly at first, but both guys turned out to be authorities (especially Micah), and each used and preferred a different schema method, which made it nice and balanced.

The talk focused on the four main XML validation approaches that have serious traction at the moment: the rules-based XSLT and Schematron methods, the grammar-based RELAX NG approach, and the W3C's descriptive XML Schema.
Rules based Schema languages

The rules-based approaches were covered first, both being relatively simple means of testing XML structure. The XSLT approach is just that: you build an XSLT file with a sequence of template elements that perform XPath queries, using axes tests to see whether the XML is structured as expected, i.e. does this element 'follow' that one, does a certain element contain a known attribute, are there any attributes, does it contain a certain number of child elements, and so on. This XSLT approach seems simple, is definitely capable (you can do pretty much any test) and is relatively easy to construct, but compared to the other methods it's definitely a little more fragile, it's wickedly verbose, it would be difficult to maintain, and it isn't very modular.

The other rules-based approach is Schematron. This is a sort of higher-level meta-XSLT language invented by Rick Jelliffe. It uses a small, simple set of XML elements that let you declare a list of tests or rules, plus the error message to spit out should any of these rules and assertions fail as the XML test document is parsed. It's much more concise than XSLT-only testing, and the tests themselves can use XPath queries to check more complex details of the XML instance you're testing. The really cool thing about Schematron is that the engine transforms it into XSLT first and then uses that XSLT to test the XML. Basically you get the flexible XSLT test file described above without all the hard work.
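To give a feel for the rules-based idea, here is a minimal Python sketch using only the standard library. It is not Schematron or XSLT; it just mimics the shape of a rule (a context to select, a test to run, a message to emit on failure). The sample document, the rule list and the `check` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """
<order id="42">
  <customer>Ada</customer>
  <item sku="A1"/>
  <item/>
</order>
"""

# Each rule is (context path, test function, message if the test fails).
# This mirrors the rule/assert shape of Schematron in plain Python;
# it is not Schematron syntax.
RULES = [
    (".", lambda e: "id" in e.attrib, "order must have an id attribute"),
    (".", lambda e: len(e.findall("customer")) == 1,
     "order must contain exactly one customer"),
    ("item", lambda e: "sku" in e.attrib, "item must have a sku attribute"),
]

def check(root, rules):
    """Return the message of every rule that fails on some context node."""
    errors = []
    for path, test, message in rules:
        for node in root.findall(path):
            if not test(node):
                errors.append(message)
    return errors

errors = check(ET.fromstring(DOC), RULES)
print(errors)
```

Only the second `item` breaks a rule here, so a single message comes back. Anything the rules don't mention (extra elements, extra attributes) passes silently, which is typical of this style of testing.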
Grammar based Schema languages

The language du jour is RELAX NG. Micah began to introduce the topic before the break but was obviously approached during the break by Mike Fitzgerald, who volunteered to cover the topic. He was bloody brilliant, and spoke frankly about some of the W3C politics that led to the establishment of the RELAX group. In a nutshell, RELAX was spawned when a couple of the core original developers of the W3C's XML spec, including James Clark, got peeved about the heavyweight implementation and the time it was taking to develop W3C XML Schema. Fitzgerald mentioned that he often hears of folks being press-ganged into using W3C XML Schema just because some PHB has read about it or likes the idea of it being rubber-stamped by the W3C, but the reality is that RELAX meets most of the common testing needs. Importantly, in the four years or so he's been on the XML-DEV list he has yet to see any valid or significant technical criticism or whingeing about RELAX.

RELAX NG is a much lighter-weight language. In RELAX you declare a list of patterns (actually everything is treated as a pattern). The patterns can be expressed in XML form (kinda representative of the actual data being checked) or in a compact form (where it looks kinda like a DTD). RELAX NG can check for unique ID-type values as it parses the XML. It has no XPath of its own, though Schematron rules (which do use XPath) can be embedded in a RELAX NG schema when you need that kind of test.
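The "everything is a pattern" idea can be sketched with a toy matcher in Python. This is an illustration only, not a RELAX NG implementation: it ignores attributes, text content, datatypes and namespaces, and all of the function names (`element`, `group`, `optional`, `one_or_more`, `text`, `valid`) are invented here. The comment shows roughly what the same schema looks like in RELAX NG compact syntax.

```python
import xml.etree.ElementTree as ET

# A pattern is a function (kids, i) -> set of child positions reachable after
# matching, starting at index i. An empty set means "no match".

def text():
    # Toy stand-in for a text pattern: consumes no child elements.
    # The actual text content is not inspected.
    return lambda kids, i: {i}

def element(name, content):
    def match(kids, i):
        if i < len(kids) and kids[i].tag == name:
            inner = list(kids[i])
            if len(inner) in content(inner, 0):  # content must use up all children
                return {i + 1}
        return set()
    return match

def group(*patterns):
    def match(kids, i):
        positions = {i}
        for p in patterns:  # each pattern continues from where the last ended
            positions = {j for pos in positions for j in p(kids, pos)}
        return positions
    return match

def optional(pattern):
    return lambda kids, i: {i} | pattern(kids, i)

def one_or_more(pattern):
    def match(kids, i):
        reached, frontier = set(), pattern(kids, i)
        while frontier - reached:  # stop once no new position is found
            reached |= frontier
            frontier = {j for pos in frontier for j in pattern(kids, pos)}
        return reached
    return match

def valid(root, pattern):
    return 1 in pattern([root], 0)

# Roughly equivalent compact syntax:
#   element addressBook {
#     element card { element name { text }, element email { text },
#                    element note { text }? }+ }
address_book = element("addressBook", one_or_more(
    element("card", group(
        element("name", text()),
        element("email", text()),
        optional(element("note", text())),
    ))
))

good = ET.fromstring(
    "<addressBook><card><name>Ada</name><email>a@example.org</email></card></addressBook>")
bad = ET.fromstring(
    "<addressBook><card><email>a@example.org</email><name>Ada</name></card></addressBook>")

print(valid(good, address_book))  # name then email: matches the grammar
print(valid(bad, address_book))   # email before name: does not match
```

Unlike the rules-based style, the grammar describes the whole document, so anything not covered by a pattern makes the document invalid.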
Libxml2
XML-Schema

This is the grand-daddy of them all. Have mentioned most of the negative things about it already, but one of the greatest complaints in the group today concerned the complexity of the XML Schema spec, with Micah saying he spent a good couple of months coming to grips with it. But of course there are positives. The typing is more low-level and granular, and it supports easy and often-used bounds and limits checking (e.g. maxOccurs, minOccurs). It is very modular and can import libraries of type components. Its descriptive nature is more object-oriented and potentially easier to use within OOP environments. It also supports pattern matching on values within its tests.
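As a rough illustration of what a minOccurs/maxOccurs check amounts to, here is a small Python sketch. In XML Schema these bounds are attributes on element declarations; the code below only mimics the counting, and the `BOUNDS` table, element names and `check_occurrences` helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical bounds for the children of <order>: name -> (minOccurs, maxOccurs).
# None stands in for maxOccurs="unbounded".
BOUNDS = {"customer": (1, 1), "item": (1, None), "note": (0, 1)}

def check_occurrences(parent, bounds):
    """Compare how often each child element occurs against its bounds."""
    counts = Counter(child.tag for child in parent)
    problems = []
    for name, (lo, hi) in bounds.items():
        n = counts[name]
        if n < lo:
            problems.append(f"{name}: expected at least {lo}, found {n}")
        elif hi is not None and n > hi:
            problems.append(f"{name}: expected at most {hi}, found {n}")
    return problems

order = ET.fromstring("<order><customer/><note/><note/></order>")
problems = check_occurrences(order, BOUNDS)
print(problems)
```

The sample order has no `item` and two `note` elements, so both bounds are reported.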
OPEN vs CLOSED schema testing

There are two general approaches to validating instance documents. Open schema tests are where you expect and allow anything to be parsed and only test for specific conditions. Closed testing is the opposite, where you deny everything except the things that pass your tests or assertions. I'll show yaz some examples when I get back, but basically these two approaches each have their merits and useful applications, and in practice both are often used. Plus, any of the above schema test methods can be used for both closed and open tests.
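One way to picture the difference, as a minimal Python sketch rather than anything shown in the session: the open check only looks for the conditions it cares about, while the closed check rejects anything it hasn't been told to allow. The document, the `ALLOWED` set and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one element nobody planned for.
DOC = ET.fromstring('<order id="42"><customer/><item/><gift-wrap/></order>')

def open_check(root):
    """Open: accept anything, flag only the specific conditions tested for."""
    errors = []
    if "id" not in root.attrib:
        errors.append("order needs an id")
    if root.find("customer") is None:
        errors.append("order needs a customer")
    return errors

# Closed: the only child elements that are permitted at all.
ALLOWED = {"customer", "item"}

def closed_check(root):
    """Closed: reject every child element that is not explicitly allowed."""
    return [f"unexpected element: {child.tag}"
            for child in root if child.tag not in ALLOWED]

print(open_check(DOC))    # the unknown <gift-wrap/> is ignored
print(closed_check(DOC))  # the unknown <gift-wrap/> is rejected
```

The same document passes the open test and fails the closed one, which is why the choice depends on how much you trust (or want to tolerate) content you didn't anticipate.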
Tools

The most often mentioned tool was the Trang utility, which allows you to convert one schema language to another, most often between DTDs, RELAX NG and/or XML Schema. The wicked thing about this (and it seems like a popular strategy with folks here) is that you can develop in the less complex RELAX NG, then use Trang to generate XML Schema, which can then be used for other purposes or delivered to other parties for their needs.
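The develop-in-RELAX-then-convert workflow might be scripted along these lines. This is a sketch under assumptions: it presumes Java is installed and that the Trang jar sits at `trang.jar` (a placeholder path), and the file names are invented. Trang works out the input and output formats from the file extensions.

```python
import subprocess

def trang_command(source, target, jar="trang.jar"):
    """Build the command line; Trang infers formats from the file extensions."""
    return ["java", "-jar", jar, source, target]

def convert(source, target, jar="trang.jar"):
    """Run the conversion. Needs Java and the Trang jar (path is an assumption)."""
    subprocess.run(trang_command(source, target, jar), check=True)

# Hypothetical file names: a compact-syntax RELAX NG schema in, XML Schema out.
cmd = trang_command("addressbook.rnc", "addressbook.xsd")
print(" ".join(cmd))
```

Calling `convert("addressbook.rnc", "addressbook.xsd")` would then write the generated XML Schema next to the source file.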
Which to Use?

This is definitely horses-for-courses material. A number of folks in the audience and the speakers themselves mentioned a preference for one or the other, but everyone agreed that each has its strengths and weaknesses, and that in a real-world development environment it's realistic to expect to be using possibly all of them where applicable. The general feeling, however, was that RELAX has basically all of the features that are typically required without the complexity of W3C XML Schema, and that if you were to use W3C XML Schema you ought to keep it simple and avoid complex type construction.