XML and Relational Databases
There is a wide hype about XML support in relational DBMSs nowadays. The basic feature of newest IBM’s system DB2 version 9 (a.k.a. ‘Viper’) is native XML storage support, they use a lot of XML-noise in their marketing program, write a lot of technical articles, attract scientists eyes and ears at conferences [1], etc. Actually, not only IBM’s developers do a lot of work trying to achieve a good support of XML in their DBMS products. Microsoft and Oracle obtained pretty good results during several years of XML type support development and I’m sure that Open Source people may learn many good things from that work. Well, what it all for? Is it really necessary to have abilities for working with XML in relational databases? Let me give several examples and decide yourself, yes or no.
I will give a very simple classification. It is based on dividing all major cases to as little as two parts, by direction of XML data flow (сan you suggest simpler way?). Then, I provide two types of examples, business-to-business (“B2B”) and ones from web development world (“webdev”). As for webdev, it’s pretty clear that any web developer can imagine plenty examples in addition to mine – and there is no strange here, because XML is widely used by web developers and actually, Internet is “native land” of XML.
At the end, I’ll give two remarks, which, I suppose, are pretty essential and should be taken into account if we want to talk about XML support in relational databases today.
Here is my simplest classification of XML-in-RDBMS use cases:
1) XML publishing, or, in other words, “SQL in, XML out”. This usually means that we have some relational database and want to export XML data, using SQL commands. Omitting some proprietary techniques, this usually implies that database implements features from SQL:2003 standard, part 14 “XML-Related Specifications (SQL/XML)”(1). This part defines a set of special SQL commands for exporting relational data as XML documents (well, actually, this can be not only documents, but XML fragments as well – e.g. sequences of XML elements).
B2B examples:
- e-commerce: some online shop wants to export its catalog to some aggregation service (e.g., Russian Yandex.Market accepts catalogs from big number of shops in its own XML format; so, that shop stores its data in relational database its maintainers may be interested in XML export functionality);
- extracting-transforming-loading (ETL) procedure in the world of Data Warehouses (DWs): one of the most common cases is the distributed system of a number of OLTP system and one DW center. In case of heterogeneous OLTP systems (different database structures or even different platforms) we have a difficult task – to find the most convenient way to push the data from these systems to the central DW periodically. This task is the part of ETL procedure and developers often use XML as an intermediate format to solve it. So, if that developers have SQL/XML publishing abilities in their database systems, they can easily write simple stored procedures and achieve mutual understanding with DW.
Webdev examples:
- you have a good content site and use relational database to store your articles/news/reviews; some day you understand that your site cannot live w/o RSS (RDF, Atom, FOAF or some of microformats, whatever) support. So, if your DBMS is able to do it, you just write new procedures wuth SQL/XML constructions and feel happy;
- you use some templating system based on XML; in this case it would be useful to have abilities to pass the data from relational database directly to templating system or, at least, to have the data in XML format returned by DBMS on query. It could make the life easier if programmer can define internal data structures (XML scheme) and pass the rules to the designer, who then use that rules during XSLT templates creation.
2) Storing XML values (documents, fragments) in the relational database (XML storage; XML type). This means that DBMS provides abilities to insert XML values in the relational (pure relational or, at least, at the logical level) tables and can process complex queries to the XML data. As for that queries, that can be, of course, usual SQL’s SELECTs, INSERTs, UPDATEs, DELETEs and, which is more interesting, queries in XPath and XQuery. Actually, support of the latest one usually means that we can forget about XPath as XQuery uses it heavily as embedded “sub-language”.
Before describing examples, I have to say that there is convention to define two classes of XML documents: data-centric and document-centric. In the documents from the first class the data is most significant while the structure is not (e.g., it may do not matter how XML elements are ordered in the document). Document-centric documents are those in which XML is used for its SGML-like capabilities, such as in user’s manuals; these XML documents can be used as configuration files or for definition of some application layer protocol.
So, in case of document-centric data one of possible situations would be the system where you intentionally combine two data model to achieve some level of flexibility (e.g., you want to store small config files in XML column and avoid using arrays or another composite relational types; or, maybe, you want to store WSDL files inside you relational tables). Let’s now be limited to data-centric class. Actually, you’ll probably find the following examples being symmetrical to that from the first section.
B2B examples:
- e-commerce: you are implementing the data aggregator with advanced search capabilities – for example, you are want to gather all market offers from your partners – online shops. If the latest are interested to work with you, the best way to follow is to create some XML schema (in DTD, XSD or another meta language) and to make your partners use it while providing the data to you. Then you have to options (surely, as we did it before, assume that you have relational DBMS): to parse and store the data using pure relational types or, what might be more attractive, to store the docs as is, which can mean that you will have the storage of initial content and nothing will be lost. Surely, if that online shops don’t want to use your format, you may consider an option to load their XHTML pages and store them and then use XML querying functions to retrieve the data you need;
- data warehouses: developer of DW itself can use XML type for initial storing for the documents just arrived, before they will be processed and loaded into DW with help of XML querying functions.
Webdev examples:
- web developer can store documents with XHTML tags inside relational database; a set of tools for work with tags (in other words, XML querying functionality) can be useful in many cases;
- returning to microformats, one possible problem would be parsing of hResume-s uploaded by site visitors to make their life easier while searching new job.
If you are interested in this theme, I recommend articles and reviews by Ronald Bourret [2].
Comments
(1) It’s worth to say that this year the new version of SQL/XML standard was published. In SQL:2003 the part 14 defined only publishing functions and described a quite weak data model – so, many people didn’t understand what it needed for, what is its goal and even what is the difference between SQL/XML and XQuery. The latest version (actually, now SQL standard is reffered to as SQL:200n, because ANSI and ISO decided to release different parts in different years separately – so, I preffer to call the latest version of part 14 “SQL/XML:2006”) introduces clear understanding of the data model and the roles that SQL/XML and XQuery play together. As for data model, W3C standard “XQuery 1.0 and XPath 2.0 Data Model (XDM)” [3] serves as the basis for its definition – so, SQL/XML functionality is closed on its data model (XML values as input and the values of the same type – as output). Also, one of the main changes in the standard is the new XMLQUERY function which is nothing else but the way to use XQuery in SQL context. So, now it’s clear that SQL/XML is a kinda “glue” between SQL and XQuery. Relational DBMS that implements SQL/XML:2006 can be considered as powerful system for working with two data models in one environment.
(2) What is the main difference between native XML databases and XML-enabled relational databases, if we consider physical aspects? The thing is that native XML databases are not limited with already created storage system and they concentrate on the nature of XML data. Do you remember how relational DBMS stores the tuples of data? The common approach is to store all attributes of the tuple sequentially, and each attribute in each tuple is stored at the address that has exact shift from the start of the tuple. Tthat shift doesn’t change from tuple to tuple in one relation, what allows to achieve the best performance in “flat” queries (in queries that are native for relational data model – when the result set is another set of tuples, that is, relation). XML data is completely different. For example Sedna XML DBMS tries to keep that XML elements closer to each other, which will most probably be requested sequentially – for example, during XML serialization process that would be “parent element and its first child” and “two adjacent elements with the same parent”. Furthermore, one of the most important tasks for the native XML storage implementation is improving the perfomance of partial XML document updates. This can be achieved, for example, with special partial-doc locking algorithms for transaction system [4]. So, if you have a set of large documents and need to be able to perform quick updates of their parts, you need to use the DBMS that has native XML storage system. This definitely doesn’t mean that XML-enabled systems where XML data is stored in relations are useless, I hope that examples I listed above are enough to understand it.
Links
[1] Native XML Support in DB2 Universal Database, Matthias Nikola, Bert Van der Linden, VLDB 2005.
[2] XML Database Products, Ronald Bourret, 05 June 2006.
[3] XQuery 1.0 and XPath 2.0 Data Model (XDM), W3C Candidate Recommendation, 11 July 2006.
[4] Transaction Isolation in the Sedna Native XML DBMS, Peter Pleshachkov, Leonid Novak, SYRCoDIS 2004.
