Updating semi-structured data

Amornsinlaphachai, Pensri (2007) Updating semi-structured data. Doctoral thesis, Northumbria University.

Abstract

The Web has had tremendous success in supporting the rapid and inexpensive exchange of information. A considerable body of data is exchanged in semi-structured form, such as the eXtensible Markup Language (XML). XML, an effective standard for representing and exchanging semi-structured data on the Web, is used in almost all areas of information technology. Most researchers in the XML area have concentrated on storing, querying and publishing XML, while few have paid attention to updating it; thus the XML update area is not fully developed. We propose a solution for updating XML as a representation of semi-structured data. XML is updated through an object-relational database (ORDB) to exploit the maturity of the relational engine and the newer object features of OR technology. The engine is used to enforce constraints while the XML is updated, whereas the object features are used to handle the hierarchical structure of XML. Updating XML via an ORDB makes it easier to join XML documents in an update, and joins of XML documents in turn make it possible to keep non-redundant data in multiple XML documents. This thesis contributes a solution for updating XML documents via an ORDB to advance our understanding of the XML update area. Rules for mapping XML structure and constraints to an ORDB schema are presented, and a mechanism for handling XML cardinality constraints is provided. An XML update language, an extension to XQuery, has been designed, and this language is translated into standard SQL executed on an ORDB. To handle the recursive nature of XML, a recursive function that updates XML data is translated into SQL commands equipped with a programming capability. A method is developed to reflect changes made in the ORDB back to the XML documents. A prototype of the solution has been implemented to help validate our approach. An experimental study evaluating the performance of XML update processing on the prototype has been conducted.
The experimental results show that updating multiple XML documents storing non-redundant data yields better performance than updating a single XML document storing redundant data; an ORDB can take advantage of this by caching data to a greater extent than a native XML database. Updating XML documents via an ORDB addresses several problems in existing update methods. Firstly, the preservation of XML constraints is handled by the ORDB engine. Secondly, non-redundant data is stored in linked XML documents; thus the problems of data inconsistency and low performance caused by data redundancy are solved. Thirdly, joins of XML documents are converted to joins of tables in SQL. Fourthly, the fields or tables involved in regular path expressions can be located quickly by using mapping data. Finally, a recursive function is translated into SQL commands equipped with a programming capability.
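
The general idea of updating linked XML documents through a database can be illustrated with a minimal sketch. It is not the thesis's mapping rules, update language or prototype: it assumes SQLite (a plain relational engine, not an ORDB) from the Python standard library, and the two sample documents, the table layout and the function names (shred, discount_city, publish_orders) are all hypothetical. It shows two linked documents being mapped to tables, an update that spans both documents being expressed as SQL over those tables, a constraint being enforced by the engine, and the changed rows being written back out as XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: two linked documents holding non-redundant data.
CUSTOMERS_XML = (
    "<customers>"
    "<customer id='c1'><city>Leeds</city></customer>"
    "<customer id='c2'><city>York</city></customer>"
    "</customers>"
)
ORDERS_XML = (
    "<orders>"
    "<order id='o1' customer='c1'><total>100</total></order>"
    "<order id='o2' customer='c2'><total>200</total></order>"
    "</orders>"
)


def shred(conn, customers_xml, orders_xml):
    """Map both documents to tables; constraints live in the schema."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, city TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        "id TEXT PRIMARY KEY, "
        "customer_id TEXT NOT NULL REFERENCES customer(id), "
        "total INTEGER NOT NULL CHECK (total >= 0))"
    )
    for c in ET.fromstring(customers_xml).findall("customer"):
        conn.execute(
            "INSERT INTO customer VALUES (?, ?)", (c.get("id"), c.findtext("city"))
        )
    for o in ET.fromstring(orders_xml).findall("order"):
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (o.get("id"), o.get("customer"), int(o.findtext("total"))),
        )


def discount_city(conn, city, percent):
    """Update orders using data from the customer document (a semi-join)."""
    conn.execute(
        "UPDATE orders SET total = total * (100 - ?) / 100 "
        "WHERE customer_id IN (SELECT id FROM customer WHERE city = ?)",
        (percent, city),
    )


def publish_orders(conn):
    """Reflect the current table contents back into an XML document."""
    root = ET.Element("orders")
    rows = conn.execute("SELECT id, customer_id, total FROM orders ORDER BY id")
    for order_id, customer_id, total in rows:
        order = ET.SubElement(root, "order", {"id": order_id, "customer": customer_id})
        ET.SubElement(order, "total").text = str(total)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(conn, CUSTOMERS_XML, ORDERS_XML)
    discount_city(conn, "Leeds", 10)
    print(publish_orders(conn))
```

In this sketch the link between the documents is the customer attribute on each order, so the city is stored once, in the customers document only. An update that would violate the schema, such as setting a negative total, is rejected by the database engine rather than by code that walks the XML.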

Item Type: Thesis (Doctoral)
Subjects: G400 Computer Science
G500 Information Systems
Department: Faculties > Engineering and Environment > Computer and Information Sciences
University Services > Graduate School > Doctor of Philosophy
Depositing User: EPrint Services
Date Deposited: 24 May 2010 10:42
Last Modified: 11 Oct 2022 09:30
URI: https://nrl.northumbria.ac.uk/id/eprint/3422
