Using Object-Relational Database Systems and XML in the Context of Mobile Environments
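The Relational Model and Object-Relational Databases
Introduction to the Relational Model
The relational model is a way of representing and manipulating data that is grounded in relational algebra and predicate logic.
It organizes data into relations, each of which is presented as a table.
It describes data in terms of entities, their attributes, and the relationships among them.
Its basic concepts are entities, relations, attributes, and constraints.
An entity is an object in the real world, such as a person, a book, or a car.
A relation is a two-dimensional table whose rows each describe one entity or one association between entities.
An attribute is a column of a relation and represents one characteristic of an entity.
A constraint is a restriction placed on a relation, such as a primary key, a foreign key, or a uniqueness constraint.
The advantages of the relational model include data that is easy to understand, data consistency, and data independence.
Data held in the relational model can be queried, modified, and deleted conveniently.
The relational model also supports integrity constraints and security controls.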
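As a minimal sketch of the constraints named above (primary key, foreign key, uniqueness), the following uses Python's standard `sqlite3` module. The `author` and `book` tables, their columns, and the helper `try_insert` are hypothetical names chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two relations: each row of "author" is an entity, each column an attribute.
conn.execute(
    "CREATE TABLE author ("
    " author_id INTEGER PRIMARY KEY,"   # primary key constraint
    " email TEXT NOT NULL UNIQUE,"      # uniqueness constraint
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE book ("
    " book_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(author_id))"  # foreign key
)
conn.execute("INSERT INTO author VALUES (1, 'ann@example.org', 'Ann')")
conn.execute("INSERT INTO book VALUES (10, 'Relational Basics', 1)")


def try_insert(sql):
    """Run an INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_email = try_insert("INSERT INTO author VALUES (2, 'ann@example.org', 'Bob')")
dangling_book = try_insert("INSERT INTO book VALUES (11, 'Orphan', 99)")

# A declarative query that follows the foreign key from book to author.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT b.title FROM book b JOIN author a ON a.author_id = b.author_id"
        " WHERE a.name = 'Ann'"
    )
]
print(duplicate_email, dangling_book, titles)
```

Both rejected inserts leave the stored data unchanged, which is the sense in which constraints protect consistency.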
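Object-Relational Databases (ORDB)
An object-relational database (ORDB) is an extension of the relational database that brings object-oriented features into the relational model.
An ORDB can store and query complex objects and supports concepts such as classes, inheritance, polymorphism, and encapsulation.
The core idea of an ORDB is to map entities to objects in the database, where each object has attributes and methods.
Inheritance and polymorphism allow more flexible data modeling and querying.
An ORDB also supports a mapping between classes in a programming language and relations in the database, which makes working with objects more convenient.
The advantages of an ORDB include stronger data modeling, more flexible queries, and better data encapsulation.
With an ORDB, object-oriented programs and the relational database can be integrated more closely, which can improve development efficiency and ease of use.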
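Relational Model vs. Object-Relational Database
The relational model and the object-relational database differ in data modeling and in querying.
Data Modeling
The relational model presents data as two-dimensional tables: each entity type corresponds to a relation, and each attribute corresponds to a column.
The relational model suits simple data structures, and relationships between data items are expressed through foreign keys.
The object-relational database adds object-oriented concepts on top of the relational model, so data can be represented and queried more flexibly.
It supports inheritance, polymorphism, and encapsulation, which makes it better suited to modeling complex data structures.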
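Design and Implementation of Object-Oriented Databases
With the arrival of the information age, applications of database systems of many kinds have multiplied rapidly.
The object-oriented database is a newer kind of database: it shares the characteristics of object-oriented programming languages and applies object-oriented techniques to database design, giving developers a more convenient, concise, and efficient way to program.
This article introduces the design and implementation of object-oriented databases.
I. Object-Oriented Database Design
Object-oriented database design uses an object-centered data model in which data is stored in an object store.
Compared with a traditional relational database, an object-oriented design reflects complex real-world relationships between objects more directly.
The designer has to classify the objects well, identify how they relate, and establish the links between them.
Classifying objects correctly and establishing their links is therefore the most critical step in object-oriented database design.
We first define the attributes and methods of each object and then establish the links between objects.
An attribute is similar to a field in a relational database; any data internal to the object can be defined as an attribute.
A method corresponds to a function in an object-oriented program; calling it performs the associated operation.
The key step before creating an object is thus to determine its attributes and methods.
For example, in a bank account system we can define an account object with the attributes account number, holder name, and balance, and the methods deposit, withdraw, and transfer.
Once the attributes and methods are settled, the corresponding classes can be created and the relationships between objects built, which together form the object-oriented database.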
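The bank account example above can be sketched as a class. This is an illustrative sketch only; the class name `Account`, the exception `InsufficientFunds`, and the sample account numbers are assumptions, not part of the original text.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal or transfer exceeds the balance."""


class Account:
    """An account object: attributes hold the state, methods change it."""

    def __init__(self, number, holder, balance=0):
        self.number = number      # attribute: account number
        self.holder = holder      # attribute: holder name
        self.balance = balance    # attribute: current balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self.balance:
            raise InsufficientFunds(self.number)
        self.balance -= amount

    def transfer(self, other, amount):
        # Withdraw first so a failed withdrawal leaves both accounts untouched.
        self.withdraw(amount)
        other.deposit(amount)


source = Account("A-1", "Li", balance=100)
target = Account("A-2", "Wang")
source.transfer(target, 30)
print(source.balance, target.balance)
```

The transfer method also shows a link between two objects: one account invokes a method on another.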
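II. Object-Oriented Database Implementation
There are two main ways to implement an object-oriented database: object-relational mapping (ORM) and an object database management system (ODMS).
ORM stores objects (for example, Java objects) in a relational database by mapping them to relations.
It maps domain objects to relational tables, so developers can access the data in the relational database much as they access ordinary Java objects.
An advantage of ORM is that Java programmers rarely need to write SQL by hand, which can reduce the coupling between application modules and the database schema.
An ODMS stores data directly as objects, so the data is not decomposed into tables that must be linked to one another as in a traditional relational database.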
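To make the ORM idea concrete, here is a deliberately tiny, hand-rolled sketch using only Python's standard library. It is not how any particular ORM product works; the `Customer` class and the helpers `create_table`, `save`, and `load_all` are hypothetical names introduced for illustration.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Customer:
    customer_id: int
    name: str
    balance: int


def table_name(cls):
    # Assumption: one class maps to one table named after the class.
    return cls.__name__.lower()


def create_table(conn, cls):
    """Create a table with one column per dataclass field."""
    columns = ", ".join(f.name for f in fields(cls))
    conn.execute(f"CREATE TABLE {table_name(cls)} ({columns})")


def save(conn, obj):
    """Store one object as one row."""
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {table_name(type(obj))} VALUES ({placeholders})", astuple(obj)
    )


def load_all(conn, cls):
    """Rebuild objects from rows, ordered by the first column."""
    columns = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(f"SELECT {columns} FROM {table_name(cls)} ORDER BY 1")
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn, Customer)
save(conn, Customer(2, "Wang", 30))
save(conn, Customer(1, "Li", 70))
customers = load_all(conn, Customer)
print(customers)
```

The calling code never writes SQL itself; the mapping layer generates it from the class definition, which is the convenience the text attributes to ORM.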
Glossary of computing terms: central processing unit; plug and play; electronic commerce; artificial intelligence; URL (Uniform Resource Locator); software life cycle; e-journal; FAQ (Frequently Asked Questions); network security; objective-oriented; Trojan horse; ink cartridge; object linking and embedding (OLE); procedural language; Windows socket; character set; cost effectiveness; run a computer program; event-driven programming; scroll bar; analogue computer; polymorphism; rotating shafts and gears; numerical approximation; equation; middleware; integrated circuit; transistor; fabricate; scalability; silicon substrate; throughput; operating instruction; microminiaturization; superconductivity; parser; convergence; Data Manipulation Language (DML); Distributed Component Object Model (DCOM); electronic bulletin board; Random Access Memory (RAM); assignment statement; relational database; data access; bootstrap sector; system expandability; entity relationship diagram (ERD); permanent table; clustered index; synchronization; File Transfer Protocol (FTP); load balancing; data encryption; graphical user interface (GUI); universal serial bus (USB); Open Systems Interconnection (OSI) Reference Model; virtual reality; liquid crystal display (LCD); database management system (DBMS); development tool; peer-to-peer network; prototype; nonvolatile (NV); peripheral equipment; multi-core processor; derived class; neural network; architecture framework; object-relational database; network adapter; embedded operating system; Digital Subscriber Line (DSL); metadata; predicate calculus.
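Classification of DBMSs
DBMSs are classified mainly by the format in which they store data, that is, by the kind of database.
The following four types are the main ones at present:
1. Hierarchical database (HDB): one of the oldest kinds of database, it represents data in a hierarchical (tree) structure.
Hierarchical databases were once the mainstream, but with the arrival and spread of relational databases they are now rarely used.
2. Relational database system (RDBS): currently the mainstream type, it includes Oracle, DB2, Sybase, Microsoft SQL Server, Microsoft Access, and MySQL.
3. Object-oriented database system (OODBS): this kind of system models data as objects and supports object classes, inheritance of class attributes, and subclasses.
4. Object-relational database system (ORDBS): it extends the traditional relational data model with richer data types such as tuples, arrays, and sets, together with the operations needed to process them. The resulting data model is called the object-relational data model, and a database system based on it is called an object-relational database system.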
An Introduction to Database Management Systems
Raghu Ramakrishnan
Source: Database Management Systems (3rd Edition), Wiley, 2004, pp. 5-12

A database (sometimes spelled data base), also called an electronic database, is any collection of data, or information, that is specially organized for rapid search and retrieval by a computer. Databases are structured to facilitate the storage, retrieval, modification, and deletion of data in conjunction with various data-processing operations. Databases can be stored on magnetic disk or tape, optical disk, or some other secondary storage device.

A database consists of a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.

Complex data relationships and linkages may be found in all but the simplest databases. The system software package that handles the difficult tasks associated with creating, accessing, and maintaining database records is called a database management system (DBMS). The programs in a DBMS package establish an interface between the database itself and the users of the database. (These users may be applications programmers, managers and others with information needs, and various OS programs.)

A DBMS can organize, process, and present selected data elements from the database. This capability enables decision makers to search, probe, and query database contents in order to extract answers to nonrecurring and unplanned questions that aren’t available in regular reports.
These questions might initially be vague and/or poorly defined, but people can “browse” through the database until they have the needed information. In short, the DBMS will “manage” the stored data items and assemble the needed items from the common database in response to the queries of those who aren’t programmers.

A database management system (DBMS) is composed of three major parts: (1) a storage subsystem that stores and retrieves data in files; (2) a modeling and manipulation subsystem that provides the means with which to organize the data and to add, delete, maintain, and update the data; and (3) an interface between the DBMS and its users.

Several major trends are emerging that enhance the value and usefulness of database management systems:
Managers, who require more up-to-date information to make effective decisions.
Customers, who demand increasingly sophisticated information services and more current information about the status of their orders, invoices, and accounts.
Users, who find that they can develop custom applications with database systems in a fraction of the time it takes to use traditional programming languages.
Organizations, which discover that information has a strategic value; they utilize their database systems to gain an edge over their competitors.

The Database Model
A data model describes a way to structure and manipulate the data in a database. The structural part of the model specifies how data should be represented (such as trees, tables, and so on). The manipulative part of the model specifies the operations with which to add, delete, display, maintain, print, search, select, sort, and update the data.

Hierarchical Model
The first database management systems used a hierarchical model, that is, they arranged records into a tree structure. Some records are root records and all others have unique parent records.
The structure of the tree is designed to reflect the order in which the data will be used; that is, the record at the root of a tree will be accessed first, then records one level below the root, and so on.

The hierarchical model was developed because hierarchical relationships are commonly found in business applications. As you know, an organization chart often describes a hierarchical relationship: top management is at the highest level, middle management at lower levels, and operational employees at the lowest levels. Note that within a strict hierarchy, each level of management may have many employees or levels of employees beneath it, but each employee has only one manager. Hierarchical data are characterized by this one-to-many relationship among data.

In the hierarchical approach, each relationship must be explicitly defined when the database is created. Each record in a hierarchical database can contain only one key field, and only one relationship is allowed between any two fields. This can create a problem because data do not always conform to such a strict hierarchy.

Relational Model
A major breakthrough in database research occurred in 1970 when E. F. Codd proposed a fundamentally different approach to database management called the relational model, which uses a table as its data structure.

The relational database is the most widely used database structure. Data is organized into related tables. Each table is made up of rows called records and columns called fields. Each record contains fields of data about some specific item. For example, in a table containing information on employees, a record would contain fields of data such as a person’s last name, first name, and street address.

Structured query language (SQL) is a query language for manipulating data in a relational database. It is nonprocedural, or declarative: the user need only give an English-like description that specifies the operation and the desired record or combination of records.
A query optimizer translates the description into a procedure to perform the database manipulation.

Network Model
The network model creates relationships among data through a linked-list structure in which subordinate records can be linked to more than one parent record. This approach combines records with links, which are called pointers. The pointers are addresses that indicate the location of a record. With the network approach, a subordinate record can be linked to a key record and at the same time itself be a key record linked to other sets of subordinate records. The network model historically has had a performance advantage over other database models. Today, such performance characteristics are only important in high-volume, high-speed transaction processing such as automatic teller machine networks or airline reservation systems.

Both hierarchical and network databases are application specific. If a new application is developed, maintaining the consistency of databases in different applications can be very difficult. For example, suppose a new pension application is developed. The data are the same, but a new database must be created.

Object Model
The newest approach to database management uses an object model, in which records are represented by entities called objects that can both store data and provide methods or procedures to perform specific tasks.

The query language used for the object model is the same object-oriented programming language used to develop the database application. This can create problems because there is no simple, uniform query language such as SQL. The object model is relatively new, and only a few examples of object-oriented databases exist. It has attracted attention because developers who choose an object-oriented programming language want a database based on an object-oriented model.

Distributed Database
A distributed database is one in which different parts of the database reside on physically separated computers.
One goal of distributed databases is access to information without regard to where the data might be stored. Keep in mind that once the users and their data are separated, communication and networking concepts come into play.

Distributed databases require software that resides partially in the larger computer. This software bridges the gap between personal and large computers and resolves the problems of incompatible data formats. Ideally, it would make the mainframe databases appear to be large libraries of information, with most of the processing accomplished on the personal computer.

A drawback to some distributed systems is that they are often based on what is called a mainframe-centric model, in which the larger host computer is seen as the master and the terminal or personal computer is seen as a slave. There are some advantages to this approach. With databases under centralized control, many of the problems of data integrity that we mentioned earlier are solved. But today’s personal computers, departmental computers, and distributed processing require computers and their applications to communicate with each other on a more equal, or peer-to-peer, basis. In a database, the client/server model provides the framework for distributing databases.

One way to take advantage of many connected computers running database applications is to distribute the application into cooperating parts that are independent of one another. A client is an end user or computer program that requests resources across a network. A server is a computer running software that fulfills those requests across a network. When the resources are data in a database, the client/server model provides the framework for distributing databases.

A file server is software that provides access to files across a network. A dedicated file server is a single computer dedicated to being a file server.
This is useful, for example, if the files are large and require fast access. In such cases, a minicomputer or mainframe would be used as a file server. A distributed file server spreads the files around on individual computers instead of placing them on one dedicated computer.

Advantages of the latter kind of server include the ability to store and retrieve files on other computers and the elimination of duplicate files on each computer. A major disadvantage, however, is that individual read/write requests are being moved across the network and problems can arise when updating files. Suppose a user requests a record from a file and changes it while another user requests the same record and changes it too. The solution to this problem is called record locking, which means that the first request makes other requests wait until the first request is satisfied. Other users may be able to read the record, but they will not be able to change it.

A database server is software that services requests to a database across a network. For example, suppose a user types in a query for data on his or her personal computer. If the application is designed with the client/server model in mind, the query language part on the personal computer simply sends the query across the network to the database server and requests to be notified when the data are found.

Examples of distributed database systems can be found in the engineering world. Sun’s Network File System (NFS), for example, is used in computer-aided engineering applications to distribute data among the hard disks in a network of Sun workstations.

Distributing databases is an evolutionary step because it is logical that data should exist at the location where they are being used. Departmental computers within a large corporation, for example, should have data reside locally, yet those data should be accessible by authorized corporate management when they want to consolidate departmental data.
DBMS software will protect the security and integrity of the database, and the distributed database will appear to its users as no different from the non-distributed database.
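The record locking described earlier for file servers can be sketched with one lock per record. This is an illustration under stated assumptions rather than how any particular file or database server works: the class `RecordStore`, the record key `"account-1"`, and the use of threads to stand in for two network users are all hypothetical.

```python
import threading


class RecordStore:
    """Keeps records in memory and serializes updates with one lock per record."""

    def __init__(self, records):
        self._records = dict(records)
        self._locks = {key: threading.Lock() for key in self._records}

    def read(self, key):
        # Readers are not blocked; they may see the value before or after an update.
        return self._records[key]

    def update(self, key, change):
        # The first updater holds the lock; later updaters wait until it is released.
        with self._locks[key]:
            current = self._records[key]
            self._records[key] = change(current)


store = RecordStore({"account-1": 0})


def user_session():
    # Each simulated user changes the same record 1000 times.
    for _ in range(1000):
        store.update("account-1", lambda balance: balance + 1)


users = [threading.Thread(target=user_session) for _ in range(2)]
for user in users:
    user.start()
for user in users:
    user.join()

final_balance = store.read("account-1")
print(final_balance)
```

Because every read-modify-write runs while the record's lock is held, neither user's change overwrites the other's.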
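Versions of the SQL Language
SQL (Structured Query Language) is a standardized language for managing relational databases.
It has gone through a number of versions, of which the most commonly cited are:
1. SQL-86: the earliest SQL standard, issued in 1986.
It defined the basic SQL syntax and features but was soon superseded by later versions.
2. SQL-89: a revision made in 1989 that added some features but introduced no major changes.
3. SQL-92, also known as SQL2: a major revision that added many features, including explicit JOIN syntax (with outer joins) and broader support for subqueries. Stored procedures were standardized later, in SQL/PSM.
Many database systems still take SQL-92 as their baseline.
4. SQL:1999: this version introduced significant additions, including object-relational features (such as user-defined types), triggers, and recursive queries.
5. SQL:2003: published in 2003, it extended SQL further with window functions, sequence generators, and XML support (SQL/XML).
6. SQL:2008: this version added further features, for example TRUNCATE TABLE and FETCH FIRST.
7. SQL:2011: this version added support for temporal data, along with other improvements.
8. SQL:2016: this version introduced JSON support and row pattern recognition, among other features.
9. SQL/MDA (2019): a part on multidimensional arrays published in 2019 rather than a full new edition; the next full edition, SQL:2023, added property graph queries and further JSON features.
Different database management systems implement different versions and subsets of the SQL standard, so there are differences in practice.
Overall, however, SQL is a powerful database query language that continues to evolve.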
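Recursive queries, one of the features listed above, can be tried with Python's standard `sqlite3` module, since SQLite implements `WITH RECURSIVE`. The `employee` table and its rows are hypothetical sample data for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT PRIMARY KEY, manager TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ada", None), ("Ben", "Ada"), ("Cy", "Ben"), ("Dee", "Ada")],
)

# Walk the reporting hierarchy from the top: the first SELECT seeds the result,
# the second SELECT is applied repeatedly to the rows found so far.
rows = conn.execute(
    """
    WITH RECURSIVE reports(name, depth) AS (
        SELECT name, 0 FROM employee WHERE manager IS NULL
        UNION ALL
        SELECT e.name, r.depth + 1
        FROM employee AS e JOIN reports AS r ON e.manager = r.name
    )
    SELECT name, depth FROM reports ORDER BY depth, name
    """
).fetchall()
print(rows)
```

The query returns every employee with their distance from the top of the hierarchy, something a single non-recursive join cannot express for hierarchies of unknown depth.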
THE RD-TREE: AN INDEX STRUCTURE FOR SETS

Joseph M. Hellerstein
University of Wisconsin, Madison

Avi Pfeffer
University of California, Berkeley

Abstract
The implementation of complex types in Object-Relational database systems requires the development of efficient access methods. In this paper we describe the RD-Tree, an index structure for set-valued attributes. The RD-Tree is an adaptation of the R-Tree that exploits a natural analogy between spatial objects and sets. A particular engineering difficulty arises in representing the keys in an RD-Tree. We propose several different representations, and describe the tradeoffs of using each. An implementation and validation of this work is underway in the SHORE object repository.

1. INTRODUCTION
Traditional relational database systems (RDBMSs) have excellent query processing capabilities, but suffer from a rigid and semantically impoverished data model. Work in Object Oriented databases (OODBMSs) has stressed the need for a richer data model that allows complex types. Among the requirements of this model is support for set-valued attributes, which are record elements of type set-of-x, where x is some type known to the system. These sets naturally occur in association with single objects, as, for example, the set of courses taken by a student, or the set of keywords in a document.

Much work has been done in carefully defining data models and languages for OODBMSs, but less attention has been paid to actually processing queries in such a system. There are several commercial OODBMS products that have limited or non-existent query processing and optimization facilities.

Object-Relational database systems (O-R systems) such as Postgres, Starburst and the commercial products UniSQL and Illustra [STON93], attempt to combine the richness of the OODBMS data model with the query processing performance of RDBMSs. In order to achieve this they require efficient support for manipulation of complex objects.
In particular, they must be able to evaluate predicates involving set-valued attributes efficiently. Natural queries on sets are not well supported in current O-R systems, because there are no efficient access methods for set-valued attributes.

In this paper we describe the RD-Tree, an index structure for sets. The RD-Tree is a variant of the R-Tree, a popular access method for spatial data [GUTT84]. RD stands for "Russian Doll", which describes the transitive containment relation that is fundamental to the tree structure. We discuss the engineering issues involved in representing the keys in an RD-Tree, and propose several representations. RD-Trees have been implemented in Illustra and in the SHORE object repository, and we plan extensive tests to demonstrate the validity of this approach and evaluate the various key representations.

1.1. SAMPLE QUERIES
To illustrate the types of queries that involve set predicates, we consider a sample database with two class definitions:

    STUDENT = [name: text, passed: set of COURSE]
    COURSE = [name: text, department: text, number: int, prerequisites: set of COURSE]

The predicates we want to evaluate on this database can be divided into several categories:

1) superset predicates

    select from STUDENT
    where STUDENT.passed ⊇ {CS186, CS162}†

This query selects all students who have passed CS186 and CS162. In the predicate of this query we are searching for supersets of a given set. An RD-Tree on the STUDENT.passed attribute will greatly facilitate evaluation of this predicate.

2) subset predicates

    select , COURSE.department, COURSE.number
    from COURSE
    where COURSE.prerequisites ⊆ {CS186, CS182}

This query selects all courses that can be taken by a student who has passed CS186 and CS182. In the predicate of this query we are searching for subsets of a given set. Unfortunately the RD-Tree does not work well on this type of predicate.
However, an inverted RD-Tree as described in section 4 can be used to evaluate subset predicates.

3) overlap predicates

    select from STUDENT
    where |STUDENT.passed ∩ {CS150, CS186, CS162}| ≥ 2

This query selects all students who have passed at least two of CS150, CS186 and CS162. RD-Trees are effective for overlap queries. If the degree of overlap required is equal to the cardinality of the given set, the predicate is equivalent to a superset predicate.

4) joins

    select , from STUDENT, COURSE
    where COURSE.prerequisites ⊆ STUDENT.passed

This query selects all (STUDENT, COURSE) pairs where the student has satisfied the prerequisites for the course. It can be executed as a nested-loop join where the outer relation is COURSE and the inner relation is STUDENT. The join then becomes a series of superset predicates for which an RD-Tree index on STUDENT.passed can be used.

† This is a minor abuse of notation. Sets of courses are normally represented by sets of object ids of course tuples.

1.2. RELATED WORK
There has been a fair amount of work done on access methods for nested attributes. Bertino and Kim [BERT89] proposed three indexing mechanisms for complex objects: the nested index, path index and multiindex. However, these access methods are not designed to support efficient evaluation of set predicates.

Ishikawa et al. [ISHI93] examined the use of signature file techniques for testing set inclusion. They provide a probabilistic algorithm that attempts to quickly determine whether one set is a subset of another. In some cases the algorithm will return "false"; in others it will return "don’t know" and other methods must be used for determining the answer. Thus signature files provide a useful filter for restricting set predicates. However, they are not an indexing method, as the entire signature file must be scanned when evaluating a predicate.

Signature file techniques and RD-Trees can coexist.
Signatures provide a way to quickly resolve a comparison of two specific sets, while the RD-Tree is an access method that guides the DBMS to the appropriate data. The RD-Tree can use signatures to perform individual set comparisons.

Inverted files are a popular technique for single-element lookups in a collection of set-valued attributes. See, for example, [BROW94]. Inverted files can be extended to search for supersets of a given set S with n elements, by performing a single-element lookup for each element in S and then taking the n-way intersection of the results. This can become inefficient if n is large, especially if the n result sets are large.

2. THE RD-TREE STRUCTURE
The structure of an RD-Tree is similar to that of an R-Tree. Leaf nodes in an R-Tree contain entries of the form (data object, bounding box). The bounding box is the smallest n-dimensional rectangle that contains the data object. Non-leaf nodes in an R-Tree contain entries of the form (child pointer, bounding box). In this case the bounding box is the smallest n-dimensional rectangle that contains all the bounding boxes of the entries in the child node.

Any data object indexed by the R-Tree is contained in its own bounding box and in the bounding box of all its ancestors in the tree. This transitive containment relation is at the heart of the R-Tree structure. It allows entire branches of the tree to be rejected when searching for a particular region in space. RD-Trees rely on a similar transitive containment relation, the set inclusion relation.

We refer to the sets being indexed by the RD-Tree as base sets, and the elements which comprise them as base elements. The set of all base elements is the universe. Every base set has a bounding set, which is the smallest set containing the base set that satisfies certain properties. This is analogous to the idea that the bounding box of a polygon is a rectangle, which is a polygon with special properties. The particular properties that the bounding set must satisfy depend on the choice of representation of keys, to be discussed in section 3.

Leaf nodes in an RD-Tree contain entries of the form (base set, bounding set). Non-leaf nodes contain entries of the form (child pointer, bounding set). The bounding set of a non-leaf node entry must contain all the bounding sets in the child node. Thus it is the bounding set of the union of the bounding sets in the child node. Each base set is contained in its bounding set, and each bounding set is contained in the bounding set of its parent, so the transitive set inclusion relation that is crucial to the RD-Tree structure exists.

To find all base sets which are supersets of a given set, begin at the root node of the RD-Tree. On any non-leaf node, examine the bounding set of each entry. If the bounding set is not a superset of the given set, then none of the base sets descended from it can be either, so the branch of the tree rooted at that entry can be discarded. If the bounding set is a superset of the given set, then its child node must be examined. On a leaf node, the base sets can be directly examined, but in some cases it may be advantageous to examine the bounding sets first.

Overlap searches are performed similarly. If the bounding set of a non-leaf entry does not overlap with the given set by the required amount, then the entire branch descended from that entry can be discarded from the search. If the bounding set of the entry being examined does overlap by the given amount, then its child node must be searched.

When an object is inserted into an R-Tree, the position of the object in the tree must be determined. On each non-leaf node, the insertion algorithm examines all the entries and chooses the one whose bounding box needs least enlargement to include the new object. An alternative heuristic would be to choose the rectangle with the largest overlap with the new object.
The analogous heuristic in the RD-Tree is to choose the bounding set whose intersection with the set to be inserted has the greatest cardinality.

Similarly, the algorithms to split a node in an R-Tree involve heuristics to make the two new nodes as disjoint as possible and minimize the areas of their bounding boxes. Analogous heuristics exist for the RD-Tree. It is desirable to place two entries whose bounding sets have large intersection in the same node, and entries whose bounding sets have little intersection in different nodes. The quadratic-cost R-Tree node splitting algorithm has a direct analog in RD-Trees. The linear-cost algorithm relies on the geometrical concepts of "high" and "low" and is not immediately generalizable to RD-Trees. However, for some representations of bounding sets described below, a linear-cost splitting algorithm can be used.

2.1. AN EXAMPLE
To illustrate how an RD-Tree can be used to index sets, consider an example with seven sets containing integers from 0 to 9:

    S1 = {1, 2, 3, 5, 6, 9}
    S2 = {1, 2, 5}
    S3 = {0, 5, 6, 9}
    S4 = {1, 4, 5, 8}
    S5 = {0, 9}
    S6 = {3, 5, 6, 7, 8}
    S7 = {4, 7, 9}

In this simple example, we define the bounding set of a set to be the set itself, and use an RD-Tree with up to two entries in each node. In practice RD-Trees would have a much larger number of entries in a node. The RD-Tree index to these sets is shown in figure 1.

[Figure 1. A sample RD-Tree. The root holds the bounding sets {0,1,2,3,5,6,9} and {1,3,4,5,6,7,8,9}. The left child holds {0,5,6,9}, over the leaf with S5 and S3, and {1,2,3,5,6,9}, over the leaf with S1 and S2. The right child holds {1,3,4,5,6,7,8}, over the leaf with S4 and S6, and {4,7,9}, over the leaf with S7.]

A search for all supersets of {2, 9} will begin at the root of the tree. Since {1, 3, 4, 5, 6, 7, 8, 9} is not a superset of {2, 9}, the right subtree is discarded. {0, 1, 2, 3, 5, 6, 9} is a superset of {2, 9} so the left child of the root is examined next. {0, 5, 6, 9} is not a superset of {2, 9}, so the left subtree is rejected, but {1, 2, 3, 5, 6, 9} is a superset, so its child is examined.
This is a leaf node, so the base sets can be examined directly, revealing the answer S1.

Insertion of the set S8 = {2, 4, 8, 9} would begin by choosing the right subtree at the root, because {2, 4, 8, 9} has three elements in common with {1, 3, 4, 5, 6, 7, 8, 9} and only two with {0, 1, 2, 3, 5, 6, 9}. Descending to the right child, we find that {2, 4, 8, 9} has two elements in common with both {1, 3, 4, 5, 6, 7, 8} and {4, 7, 9}. Since {4, 7, 9} is smaller, we choose the right branch again. Finally, the child node is a leaf, and S8 is inserted into the empty space.

2.2. STORING VARIABLE LENGTH KEYS
A technical issue that must be addressed when implementing RD-Trees is the fact that the keys describing the bounding sets may not all be the same size. In an R-Tree the bounding box is specified by a fixed number of n-dimensional points, so the keys are all the same size. However, some of the bounding set representations described in section 3 allow variable length keys.

This problem complicates the RD-Tree in several ways. First of all, the maximum number of keys that will fit on a page becomes variable, making analysis and tuning more difficult. A more serious difficulty is that an entry in a non-leaf node may change size whenever a base set is inserted into or deleted from one of its descendants. This may require that the entry be moved, and in some cases it will no longer fit in its node.

When this situation occurs, we remove the growing entry from the node it is currently in, and split the node. It is then unclear into which new node the growing entry should be inserted. Rather than attempt to calculate which node to use, we take advantage of a feature of R*-Trees [BECK90] which is available in SHORE. R*-Trees allow forced reinserts: that is, any entry can be inserted at any level of the tree. The tree chooses which node at that level will be used.

3. REPRESENTATION OF KEYS
The keys in an RD-Tree describe the bounding sets of entries. The precise definition of the bounding set depends on the choice of representation of the keys. A good representation will satisfy several criteria:

size
A good key will be small, so as to allow as many entries as possible in a node. This increases the fanout of the tree and reduces its height.

completeness
A good key will represent a set as completely as possible. In other words, the bounding set that it describes should contain as few elements as possible that are not in the set being bounded. The lossiness of a representation is a measure of the "noise-to-signal" ratio of the representation. It is defined by

\[ \text{lossiness} = \frac{|\,\text{bounding set} - \text{bounded set}\,|}{|\,\text{bounding set}\,|} \]

This is the probability that a single element will be in the bounding set but not in the bounded set. During a search for supersets of a set containing a single element, a branch of the tree will be selected for searching if the bounding set of the root of the branch contains the element. The lossiness of the representation indicates the probability that the branch is being searched unnecessarily. In a search for supersets of a set of cardinality n, the probability that the search of a branch is unnecessary equals the n-th power of the lossiness. Therefore lossiness is less of a factor when searching for supersets of large sets.

computation cost
A good key will allow efficient computation of the set inclusion and intersection operations. These operations are performed many times during a search and are critical to the performance of the RD-Tree.

3.1. COMPLETE REPRESENTATION
The first class of representations considered define the bounding set of a set to be exactly the same as the set itself.
These representations have zero lossiness. The disadvantage of these representations is that they are either too large to be practical, or make computation of the basic set operations very expensive.

The simplest approach is to directly represent the sets in the RD-Tree nodes. A set will typically be a list of simple objects if the base elements are of simple type, or a list of object ids if the base elements are complex objects. The list may be sorted to improve efficiency of computing the set operations.

This approach is impractical in all but the simplest databases, as the keys quickly become large. Even if the base sets are all small, their union may be large, and keys in higher levels of the tree represent unions of many sets. As the key size grows, fanout decreases, and when a key becomes larger than half a page, the index can no longer function.

A solution to this problem is to store keys externally to the RD-Tree, and to store a pointer to each entry's key in the tree. This may be combined with the previous approach so that small keys are stored directly in the tree, while large ones are referred to by pointer. While this does guarantee high fanout, it makes computation of set operations expensive. As keys become larger than half a page, every key comparison will require reading a page from disk.

If the universe of base elements is fixed and of small size, the sets can be represented by a bitmap. This has the advantages that keys have a fixed size no matter how large the set they represent, and that set computations are cheap bitwise operations. The universe must be small enough for several bitmaps to fit on each page. Also, there must be a guarantee that new elements will not be added to the universe after the tree has been created. If these conditions hold, this is a very practical approach.

3.2. BLOOM FILTERS AND SIGNATURES

A variation on the bitmap approach represents the keys by a bit vector that is smaller than the size of the universe.
In addition, new elements may be added to the universe at any time. This approach has the same advantages as the bitmap approach: fixed size keys and cheap computations. However, a degree of lossiness is introduced into the representation by the fact that individual bits no longer represent unique elements.

A set can be represented by a Bloom filter. This is a vector in which all bits are initially set to zero. Every element in the set is hashed to a particular bit in the filter, which is then set to one. Elements that hash to the same position cannot be differentiated from each other. Thus each bit represents all the elements that hash to that bit, and the lossiness of the Bloom filter is equal to

$$1 - \frac{\text{size of bit vector}}{\text{cardinality of universe}}$$

If the universe is significantly larger than the bit vector, the lossiness will be close to 1. Therefore this approach is only effective for queries that search for supersets of fairly large sets.

A more sophisticated technique is to use signatures as described in [ISHI93]. Every element has a signature, which is a pattern of bits in the bitmap that are set when that element is present. The signature of a set is a bitwise or of the signatures of the elements of the set. Since elements are mapped to collections of bits rather than individual bits, each element can have a unique signature even in a universe much larger than the size of the bitmap. Therefore the lossiness of this representation is much lower than the lossiness of Bloom filters. Nevertheless, a small amount of lossiness still exists due to the fact that the combined signature of all elements in a set may also include the signature of other elements.

3.3. RANGESETS

An alternative representation of sets that allows efficient processing of set operations is the rangeset [HELL93].
A range is an ordered pair of integers $(a, b)$ where $a \le b$, that represents the set of integers $x$ such that $a \le x \le b$. A rangeset is an ordered list of disjoint ranges $((a_1, b_1), \ldots, (a_n, b_n))$ such that $b_i < a_j$ whenever $i < j$. It represents the union of the sets represented by the ranges.

Any set of integers can be represented by a rangeset. For example, the sets of section 2.1 would be represented as

S1 = ((1, 3), (5, 6), (9, 9))
S2 = ((1, 2), (5, 5))
S3 = ((0, 0), (5, 6), (9, 9))
S4 = ((1, 1), (4, 5), (8, 8))
S5 = ((0, 0), (9, 9))
S6 = ((3, 3), (5, 8))
S7 = ((4, 4), (7, 7), (9, 9))

A set of integers may also be approximated by a rangeset consisting of fewer ranges. For example, S1 may be approximated by ((1, 6), (9, 9)), and S6 may be approximated by ((3, 8)).

A rangeset representation of RD-Tree keys would fix a maximum number of ranges n allowed in a rangeset describing a key. The bounding set of a set S is the smallest set containing S that can be described by a rangeset containing n or fewer ranges. Given any set S and a maximum number of ranges n, the bounding set of S can be computed by the algorithm described in the appendix.

If the universe consists of objects of some type other than integer, elements in the universe can be mapped to unique integers in the range (1, maxid), where maxid is the cardinality of the universe. A procedure for performing this mapping is described in [HELL93]. Once this mapping has been performed, sets in the universe can be represented by rangesets of integers.

The lossiness of a rangeset representation depends on the correlation factor of the universe, and on the choice of mapping of base elements to the integers.
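The rangeset definition above, together with the lossiness measure from the start of section 3, can be illustrated with a short sketch. It converts a set of integers to its exact rangeset and computes the lossiness of an approximating rangeset. The function names are hypothetical and the code is an illustration only, not taken from [HELL93].

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive pair (a, b) with a <= b


def to_rangeset(elements: Iterable[int]) -> List[Range]:
    """Exact rangeset: one range per maximal run of consecutive integers."""
    ranges: List[Range] = []
    for x in sorted(set(elements)):
        if ranges and x == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], x)  # extend the current run
        else:
            ranges.append((x, x))            # start a new run
    return ranges


def rangeset_size(ranges: List[Range]) -> int:
    """Number of integers covered by a rangeset of disjoint ranges."""
    return sum(b - a + 1 for a, b in ranges)


def lossiness(bounding: List[Range], bounded: Iterable[int]) -> float:
    """|bounding - bounded| / |bounding|, assuming bounded is a subset of bounding."""
    total = rangeset_size(bounding)
    return (total - len(set(bounded))) / total


S1 = {1, 2, 3, 5, 6, 9}
print(to_rangeset(S1))                  # [(1, 3), (5, 6), (9, 9)]
print(lossiness([(1, 6), (9, 9)], S1))  # one extra element (4) out of seven
```

Approximating S1 by ((1, 6), (9, 9)) admits the single extra element 4, so one seventh of the bounding set is noise.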
The correlation factor is the degree to which elements in the universe associate into groups. In other words, it describes the degree to which the presence of one element in a set influences the probability that another element will be present. Elements which are closely correlated should be mapped to integers that are close together. Ranges of integers will then represent groups of elements that commonly occur together. Sets will naturally cluster into ranges so that they can be well approximated by rangesets. We are currently investigating ways of choosing an effective mapping that takes advantage of high correlation.

The choice of the maximum number of ranges allowed in the rangesets describing bounding sets involves a tradeoff between decreasing lossiness and increasing fanout. Allowing more ranges reduces lossiness by making the bounding set approximate the bounded set more closely. However, it also increases the size of the key, thereby reducing fanout. The optimal key size will vary from key to key, as some sets can be approximated closely by a small number of ranges whereas others cannot. We are studying the use of both fixed-size and variable-size rangesets in RD-Trees.

3.4. COMBINED REPRESENTATIONS

Each representation has its advantages and disadvantages. The best representation for a key may vary from place to place within an RD-Tree. For example, sets on leaf nodes are likely to be small while sets near the root are likely to be large. Small sets may be best represented directly, while large ones are probably best represented by one of the approximations described above.

In addition, some of the representation methods have parameters that can be tweaked to different values at different places in the tree. One example is the maximum number of ranges allowed in a rangeset, as described above.
Another is the size of the bit vector used by the signature representation. The lossiness of a signature increases with the cardinality of the set being represented, and decreases with the size of the bit vector. Therefore smaller bit vectors may be more appropriate at lower levels of the tree, where sets being represented are smaller.

An RD-Tree that combines different representations may prove to be more efficient than one that uses a single representation. The granularity of variation of representation may be the individual entry, the node, or the tree level. Whatever the granularity, a record must be kept at appropriate points in the tree indicating the representation being used and the values of the parameters.

Another option is to store more than one representation for each key. A key for an entry could consist of a hint, in the form of one of the approximate bounding sets described above, together with a pointer to the complete set description. When performing a search on some predicate, first the predicate is evaluated for the bounding set. If the answer is negative, the entry can be rejected. If it is positive, the answer is checked against the complete set description. If the answer is still positive, the subtree rooted at that entry is searched. If it is now negative, an unnecessary search of the subtree has been avoided.

4. INVERTED RD-TREES

As mentioned above, RD-Trees do not perform well on subset predicates. If the bounding set of an entry is a subset of a given set, then all the base sets reached via that entry are also subsets. However, the branch of the tree rooted at that entry must still be explored in order to reach those base sets. If the bounding set is not a subset of the given set, the branch cannot be rejected, because it is possible that one of its descendants is a subset.

The inverted RD-Tree allows subset predicates to be evaluated.
Non-leaf nodes in an inverted RD-Tree contain entries of the form (child pointer, key), where key is the intersection of all keys in the child node. Thus the transitive containment relation is inverted, as parent keys are contained in the keys of all their children. If the key of any entry is not a subset of a given set, then none of its children can be either, and the branch of the tree rooted at that entry can be rejected from the search. An example of an inverted RD-Tree indexing the sets of section 2.1 is shown in figure 2.

A combined normal/inverted RD-Tree could also be constructed, by keeping both the union and intersection keys for each pointer. Unless the keys are stored externally, the combined RD-Tree would have lower fanout due to larger key size.

For both inverted and combined RD-Trees, a heuristic must be chosen for splitting tree pages. For the inverted tree, maximum size of intersection is the criterion that will produce the best overlap within a page, but it also maximizes the key size for representations with variable length keys. For the combined tree, minimizing the symmetric set difference $(A - B) \cup (B - A)$ appears to be a natural heuristic.

Mathematically, an inverted RD-Tree is equivalent to an RD-Tree on the complements of the base sets. This is because the complement of the intersection of two sets is the union of the complements of those sets. Theoretically, the inverted RD-Tree should perform as well on subset predicates as an RD-Tree performs on superset predicates.

However, in most practical domains, base sets will usually be small compared to the universe. In such situations the intersection of all sets on a node is likely to be very small. The intersections will degenerate to the null set on higher levels of the tree, making the index useless.
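The pruning rule for subset predicates can be sketched as follows. This is a minimal in-memory illustration, assuming keys are stored as complete sets; the `Node` class, the function `find_subsets`, and the grouping of S1 to S7 into leaves are hypothetical choices made for the example, not the authors' implementation. Keys are recomputed on demand for brevity, whereas a real index would store them in the entries.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, FrozenSet, List


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)                # non-leaf node
    base_sets: Dict[str, FrozenSet[int]] = field(default_factory=dict)  # leaf node

    def key(self) -> FrozenSet[int]:
        """Intersection of all keys (or base sets) directly below this node."""
        parts = list(self.base_sets.values()) or [c.key() for c in self.children]
        return reduce(frozenset.intersection, parts)


def find_subsets(node: Node, query: FrozenSet[int]) -> List[str]:
    """Return the names of all base sets that are subsets of query."""
    if node.base_sets:
        return [name for name, s in node.base_sets.items() if s <= query]
    found: List[str] = []
    for child in node.children:
        # If the intersection key is not contained in the query,
        # no base set below this child can be a subset: prune the branch.
        if child.key() <= query:
            found += find_subsets(child, query)
    return found


fs = frozenset
leaf1 = Node(base_sets={"S1": fs({1, 2, 3, 5, 6, 9}), "S2": fs({1, 2, 5})})
leaf2 = Node(base_sets={"S4": fs({1, 4, 5, 8}), "S6": fs({3, 5, 6, 7, 8})})
leaf3 = Node(base_sets={"S3": fs({0, 5, 6, 9}), "S5": fs({0, 9})})
leaf4 = Node(base_sets={"S7": fs({4, 7, 9})})
root = Node(children=[Node(children=[leaf1, leaf2]), Node(children=[leaf3, leaf4])])

print(find_subsets(root, fs({0, 1, 2, 5, 9})))  # ['S2', 'S5']
```

In this run the leaves with keys {5, 8} and {4, 7, 9} are never opened, because neither key is contained in the query. The keys of the two inner nodes are {5} and {9}, which shows how quickly intersections shrink toward the top of the tree.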
In domains with high correlation factor, where certain groups of elements are shared by many sets, intersections will degenerate more slowly, making the use of inverted RD-Trees more reasonable. It remains to be seen whether the inverted RD-Tree can be effective in a practical application.

[Figure 2. An inverted RD-Tree]

5. PROPOSED PERFORMANCE STUDY

We plan a detailed performance study that implements several of the representations described above and analyzes their performance on a variety of data. We will also compare the performance of RD-Trees and inverted RD-Trees to that of traditional set access methods such as signature files.

We will test our implementation on synthetic data that varies the following parameters:

    size of universe
    average base set size
    variance of base set size
    number of base elements in universe
    frequency distribution of base elements
    correlation factor

We will also test the RD-Tree on real undergraduate enrollment data from the University of Wisconsin. This database is similar to the one described in section 1.1, and will allow us to perform the queries presented there.

6. CONCLUSION

We have proposed and implemented the RD-Tree, an index structure for set-valued attributes. Studies are currently underway to test the effectiveness of this access method. Once the basic validity of our approach has been demonstrated, we will undertake a more detailed analysis to determine the most efficient implementation of RD-Trees in various domains.

The performance of an RD-Tree in a domain is likely to depend on the correlation factor of elements in the domain.
An interesting direction of future research is an investigation into this concept. Methods are needed to analyze and characterize the correlation factor of a universe, and to determine which elements are correlated with each other. We believe that a deep understanding of these issues will serve to elucidate the structure of collections of sets, and have practical applications for databases that manage such collections.

REFERENCES

[BROW94] Eric W. Brown, James P. Callan, and W. Bruce Croft, "Fast Incremental Indexing for Full-Text Information Retrieval", Proc. 20th Conference on Very Large Databases, Santiago, Chile, September 1994.

[BECK90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger, "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles", Proc. ACM-SIGMOD International Conference on Management of Data, Atlantic City, N.J., May 1990.

[BERT89] E. Bertino and W. Kim, "Indexing Techniques for Queries on Nested Objects", IEEE Trans. on Knowledge and Data Engineering 1(2):196-214, June 1989.

[GUTT84] Antonin Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching", Proc. ACM-SIGMOD International Conference on Management of Data, Boston, Mass., June 1984.

[HELL93] Joseph M. Hellerstein, "Rangesets: A Data Representation for Quick Query Processing on Nested Sets", working draft, University of Wisconsin, Madison, May 1993.

[ISHI93] Yoshiharu Ishikawa, Hiroyuki Kitagawa, and Nobuo Ohbo, "Evaluation of Signature Files as Set Access Facilities in OODBs", Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993.

[STON93] Michael Stonebraker, "The Miro DBMS", Proc. ACM-SIGMOD International Conference on Management of Data, Washington, D.C., May 1993.

APPENDIX: ALGORITHM FOR COMPUTING RANGESETS

Problem

Given a set S of n elements stored in sort-order, reduce it to a rangeset of k<n ranges such that a minimal number of new elements are introduced.
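The text breaks off before the algorithm itself is given. One way to solve the stated problem is sketched below, as an assumption rather than the authors' method: build the exact ranges, then close the smallest gaps between neighbouring ranges until at most k ranges remain. Closing a gap between (a, b) and (c, d) introduces c - b - 1 new elements, and the gaps are independent of each other, so closing the smallest ones minimizes the number of elements added. The function name `bounding_rangeset` is hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # inclusive pair (a, b) with a <= b


def bounding_rangeset(sorted_elems: Sequence[int], k: int) -> List[Range]:
    """Smallest rangeset with at most k ranges that covers sorted_elems."""
    if k < 1 or not sorted_elems:
        raise ValueError("need k >= 1 and a non-empty, sorted input")

    # Step 1: exact rangeset, one range per run of consecutive integers.
    runs: List[Range] = []
    for x in sorted_elems:
        if runs and x <= runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], max(x, runs[-1][1]))
        else:
            runs.append((x, x))

    excess = len(runs) - k
    if excess <= 0:
        return runs  # already fits, nothing is introduced

    # Step 2: pick the `excess` smallest gaps; gap i lies between runs[i] and runs[i + 1].
    gaps = sorted(range(len(runs) - 1), key=lambda i: runs[i + 1][0] - runs[i][1])
    to_close = set(gaps[:excess])

    # Step 3: merge neighbouring runs across every chosen gap.
    merged: List[Range] = [runs[0]]
    for i in range(1, len(runs)):
        if i - 1 in to_close:
            merged[-1] = (merged[-1][0], runs[i][1])
        else:
            merged.append(runs[i])
    return merged


print(bounding_rangeset([1, 2, 3, 5, 6, 9], 2))  # [(1, 6), (9, 9)]
print(bounding_rangeset([3, 5, 6, 7, 8], 1))     # [(3, 8)]
```

Both results agree with the approximations of S1 and S6 given in section 3.3.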
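IBM DB2 UDB Product Introduction

Preface

Database management systems, and relational databases in particular, are inseparable from IBM's database work. Over more than thirty years, from theoretical research to concrete system implementations, IBM database researchers have made major contributions to the development of database management systems.

Before the 1970s, the data structures used in databases were mainly hierarchical (as in the IBM IMS database) or network-based. In these databases, records were often linked to one another by pointers so that application programs could locate related data. In 1970, Dr. E. F. Codd of the IBM database research center proposed the relational database model in his paper [Codd70]. In this innovative theory, relationships between records are based on the values they share rather than on hidden pointers. Database queries can therefore be expressed in nonprocedural statements. Codd also showed that mathematical theory such as first-order predicate calculus could serve as the foundation for nonprocedural statements, and went on to develop the relational calculus [Codd71a] and the relational algebra [Codd71b], laying the theoretical foundation for the later development of relational databases. For this work Dr. E. F. Codd received the ACM Turing Award, the highest honor in computer science, in 1981.

In 1973, the IBM database research center in San Jose, California, began a large relational database research project, System R [Astrahan 76], to explore and verify the practical feasibility of relational databases with many users and large volumes of data. System R played a key catalytic role in the commercialization of relational databases. Under the leadership of Dr. D. Chamberlin, a System R research group invented SQL [Chamberlin74,76,80], a nonprocedural query language better suited to end users than the relational calculus or the relational algebra. SQL was designed for end users, with the goals of being simple, easy to learn, and easy to use. In addition, SQL integrates functions that were separate in earlier data management systems, such as query, data modification, data definition, and control, into a single language environment.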
Using Object-Relational Database Systems and XML in the Context of Mobile Environments

Guntram Flach
Computer Graphics Center Rostock, Joachim-Jungius-Str. 11, D-18059 Rostock, Germany
phone: +49 381 4024 156, Fax: +49 381 446088
e-mail: gf@rostock.zgdv.de

Keywords

Mobile access, Object-relational Databases, XML

Abstract

In this paper, we present an architecture for presentation, access and exchange of information in a mobile computing environment. In the project MOVI, a new approach is developed which enables applications on mobile computers to exchange and visualize multimedia objects with applications on stationary database servers via the Object Bus.

We focus our attention on new strategies for the adaptation and integration of database techniques for managing multimedia data types under the particular conditions of mobile infrastructures. We believe that object-relational database systems, e.g. Informix and Oracle8i, are ideal repositories on the stationary server side for next generation multimedia applications, especially in inter-networked LAN or WAN environments. Moreover, our approach illustrates the use and benefits of XML technology as a generic gateway to any mobile device.

The basic idea of this paper is the integration of object-relational database technology and XML techniques in a Mobility Information Center (MIC) as a framework for mobile environments. The major aim is to optimize mobile data exchange and presentation by using media specific queries, data preprocessing and reduction methods within the database system, as well as XML/XSL based data transformation and presentation on mobile end systems. All these methods are influenced by contexts like local resources, available communication channels, and user preferences.

1 Introduction

The project "Mobile Visualization" (MOVI)¹ [3, 4] focuses on investigations that enable the graphical presentation of scientific and other data, including multimedia data, in a mobile environment.
The presented architecture is one main part of the project.

In the following, we focus our attention on the work packages multimedia data management and access as well as data exchange. These work packages address the investigation and development of strategies for the efficient exchange of multimedia objects over wireless and wired networks. All strategies consider the dynamically changing resources of mobile end systems and network resources.

One typical mobile scenario for us is a person accessing very different public or private data stored on several globally distributed stationary data servers (SDS), the Infoverse, for example from a mobile end system (MES) using wireless networks like GSM (Global System for Mobile Communication). The information contained in the Infoverse could have multimedia character, e.g. WWW data including audio and video, and data with textual descriptions and images that are managed in different database systems.

Therefore we need a suitable architecture which includes the various methods for access, exchange, reduction and retrieval adapted to the mobile computing environment [10].

¹ This research is supported by the DFG (German Research Association) under contract UR 61/1-3.

At present in the field of database research, new architectures and models for multimedia database systems [1, 2] are being investigated and designed, especially on the basis of object-relational technology [11]. Such an object-relational database management system (ORDBMS) provides a suitable environment for using and managing multimedia information. Therefore, it must support the various multimedia data types (e.g. images, text, audio, video) in addition to providing facilities for traditional DBMS functions like database definition and creation, data retrieval, data access and organization, data independence, integrity control, data replication and concurrency support.
The functions of a multimedia DBMS basically resemble those of a traditional DBMS.

In the context of mobile environments, the question is how we can integrate known compression/conversion algorithms into the database system, or how we can map the already developed request processing and information retrieval methods to server-side built-in operations. The aim is to use the new possibilities of object-relational database systems as stationary data servers by integrating combined features from document/image management, information retrieval and mobile computing. Another issue in our approach is the support of several QoS levels. This means that the database system must support the specified QoS levels depending on user-defined parameters and the resources available on the mobile system.

In spite of the comfortable possibilities for management and retrieval of multimedia data in object-relational database systems [12], there is a need for application and device dependent presentation and visualisation of the complex data structures in mobile environments. A universal interface to the database system is required in order to use the object-relational multimedia functionality and to derive the application and device dependent visualisation. In this context XML, the eXtensible Markup Language [5], as the universal format for publishing and exchanging data, offers new solutions.

XML is rapidly gaining momentum in e-commerce and Internet-based information exchange, where its simplicity and custom-defined tags make it usable as a semantics-preserving data exchange format. The benefit of XML is that the concept of isolating source from target applies not only to the wireless world but also to any kind of data transformation, no matter how simple or complex. XML/XSL implements a generic gateway to any mobile device (e.g. PDAs, WAP phones).

However, to realize this potential, it is necessary to be able to extract structured data from XML documents and store it in a database, as well as to generate XML documents from data extracted from a database. Therefore we developed the database independent DaS utility (Database to XML Servlet), which offers this functionality.

The next section contains a brief survey of the MIC framework architecture for selection, exchange and presentation of information in a mobile computing environment.

2 Mobile Information Center

The Object Bus Architecture of the Mobility Information Center (MIC) has to serve as a flexible platform for the efficient exchange of user data and applications between stationary data servers and mobile end systems. Our main concept is an object oriented approach with the Object Bus (OBus) as one central feature. This Object Bus serves as a transparent layer for mobile communication and is responsible for the delivery of messages.

[Figure 1. Traversal and exchange of structured objects]

This object bus was extended in order to face and minimize the problems caused by limited bandwidth, end system resource limitations, and frequent disconnections. Basic components of this extended object bus are Network Schedulers and context-sensitive Request/Reply caches. They use techniques like priority- and QoS-controlled communication, transparent data compression and context-controlled caching to provide basic solutions for these problems. The second main feature is the introduction of Message Handlers (MH) that act in place of the communicating processes when exchanging structured objects.

A prototype of the Mobile Multimedia Information Center was built to demonstrate the use of our object bus architecture, the multimedia database access as well as the deployment of exchange strategies and the information retrieval.
The application consists of several modules: the user interface, a huge database holding actual multimedia data, and visualization tools.

[Figure 2. Multimedia application on a mobile client]

All these modules are implemented as CORBA clients and servers that communicate via the OBus. The complete database is stored at the stationary multimedia support server at the office or can be accessed via this server.

More details about the components of the Object Bus Architecture and the experimental validation results of our methods and communication protocols can be found in [3, 4].

Based on this architecture, we present in the next section our approach to integrating an object-relational database system as an extension of the Stationary Data Server (SDS) within the MOVI framework architecture.

3 ORDBMS: Challenges and Issues

At present, a new family of object-relational DBMSs (e.g. Informix, Oracle8, Universal DB2) provides important new extensions, especially multimedia extensions [1, 2, 11, 12]. These capabilities allow the behavior of multimedia objects to be implemented directly in the DBMS. Object-relational database systems support a number of multimedia data types, such as images, audio, or video, in their built-in, system-provided class hierarchies.

All of these features are very important for enabling the object-relational DBMS to handle a broader set of application requirements.

In the scope of the MOVI project, the aim is to integrate known compression or conversion algorithms into the database system, or to map the developed request processing and information retrieval methods to server-side built-in operations. This relieves the communication components of the MIC object bus architecture (e.g. Message Handlers) of this functionality and transfers it to the database back end (SDS).
One example is the set of functions included in the Image DataBlade Module, which we can use for data reduction based on the user preferences and the resources available on mobile end systems.

In addition, the querying and searching techniques in the DBMS need to be extended with information-retrieval capabilities. In information retrieval, the emphasis is more on finding the objects that satisfy a user's query as closely as possible, which reduces the amount of information to be transferred when the bandwidth of the communication channels is limited. By using specialized indexes for complex object retrieval, we can achieve large performance increases compared with non-indexed BLOB retrieval.

The technical realization of the above functionality is done as a part of the Stationary Data Server (SDS) in the system architecture of MOVI.

Besides our data preprocessing approach (e.g. image conversion/compression), we can use content based retrieval techniques on images (based on the Informix Image DataBlade) in order to reduce the amount of data that has to be transferred.

We believe that the use of an object-relational database system (e.g. Informix) for multimedia data management, data preprocessing, QoS and information retrieval will be an excellent, integrated approach in the MOVI framework [7]. On the other hand, there is the need for an application and device independent interface to the database system. Based on XML technology, we present in the next section our approach to the XML/XSL integration of the Stationary Data Server (SDS) within the MOVI framework architecture.

4 DAS: Database to XML Servlet

XML [5], the eXtensible Markup Language, has quickly emerged as the universal format for publishing and exchanging data in the World Wide Web. Although XML was originally conceived as a replacement for HTML, it has emerged as a generic data exchange format.
Its hierarchical structure and user-defined tags can be adapted to a wide variety of structured and semi-structured data, and many operations on XML documents (parsing, editing, validation, transformation) can be performed independently of the actual tags in the document.

A key advantage of using XML as a data source is that its presentation is separate from its structure and content. The XML data defines the structure and content, and then a stylesheet is applied to it to define the presentation. XML data can be presented in a variety of ways (both in appearance and organization) simply by applying different XSL² stylesheets [13] to it. For example, a different interface can be presented to different users based on user profile, browser type, or other criteria by defining a different stylesheet for each presentation style. Alternatively, stylesheets can be used to transform XML data into a format tailored to the specific application that receives and processes the data.

However, to realize this potential, it is necessary to be able to extract structured data from XML documents and store it in a database, as well as to generate XML documents from data extracted from a database. We need methods to convert XML documents to relational tuples (in both directions), to translate semi-structured queries over XML documents into SQL queries over tables, and to convert the results to XML.

² XSL provides for stylesheets that allow you to transform XML into HTML or other text-based formats, and to rearrange or filter data.

Therefore we developed the DaS utility [6] (Database to XML Servlet, see Figure 2), which solves these problems. The utility is based on widely accepted standards: JDBC³ for database access and DOM⁴ for XML document access.

In the scope of the MIC framework, the DaS utility makes it possible to exchange XML structures over the object bus. On the mobile end system (MES), device dependent XSLT stylesheets convert the XML document into the target presentation.
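The database-to-XML direction can be illustrated with a small sketch. The source states only that DaS is a servlet built on JDBC and DOM; the code below is a Python analogue using the standard `sqlite3` and `xml.etree.ElementTree` modules, not the DaS implementation. The function `query_to_xml`, the element names `resultset` and `row`, and the `image` table are hypothetical, and column names are assumed to be valid XML element names. The XSLT step on the mobile end system is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn: sqlite3.Connection, sql: str,
                 root_tag: str = "resultset", row_tag: str = "row") -> str:
    """Run a query and serialize the result as one XML element per row."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            # One child element per column; NULL becomes an empty element.
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table standing in for multimedia metadata on the data server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, width INTEGER)")
conn.executemany("INSERT INTO image (title, width) VALUES (?, ?)",
                 [("harbour", 640), ("map", 320)])

xml_doc = query_to_xml(conn, "SELECT id, title, width FROM image ORDER BY id")
print(xml_doc)
```

A design like this keeps the server-side output device independent: the same XML can later be rendered differently for each device by applying a different stylesheet.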
Device dependent stylesheets make it possible to present content in the format best suited to the target device (e.g. HTML, WML, plain text; see Figure 3), influenced by contexts like local resources and user preferences.

³ JDBC is a Java API for SQL-based access to relational databases.

⁴ DOM (Document Object Model) specifies an object model for XML documents, with objects for elements, PCDATA and entity references.

5 Conclusions

In this paper, we have described some parts of our framework approach based on object-relational databases and XML techniques. The major aim is to obtain a basic system for mobile multimedia information exchange, retrieval and presentation, allowing applications to access globally distributed multimedia information and to exchange it effectively with respect to mobile resources and other parameters of the global environment. Moreover, we considered especially the new possibilities of object-relational database technology, using content based information retrieval and reduction methods as well as media specific queries within the database system in order to reduce the amount of data to be transferred. With the integrated DaS utility we are able to generate XML documents from databases. As a result, based on the XML philosophy, we can achieve arbitrary data transformation and reduction as well as XSL controlled visualisation on any mobile device, such as PDAs and WAP phones.

Until now, we could only present some prototypes as part of the complete architecture, because the implementation is still in progress. Later we will perform an experimental validation of the complex framework architecture.

In future work, studies will be performed in order to develop a suitable transaction management [8, 9] in the context of a mobile environment. Traditional transaction mechanisms and criteria have to be adjusted to accommodate the limitations of a mobile computing environment.
The major aim is to develop a new approach to context based and transaction-driven multimedia data management with support for content based and media dependent request processing.

References

[1] Aberer, K., Thimm, H., Neuhold, E.J. Multimedia Database Systems. In: Handbook of Multimedia Computing, Borko Furht (Editor), CRC Press, 1998

[2] Aberer, K., Klas, W. Multimedia and its Impact on Database System Architectures. In: Apers, P.M.G., Blanken, H.M., Houtsma, M.A.W. (Eds.), Multimedia Databases in Perspective, Springer, 1997

[3] Bönigk, J., Lukas, U.v. Interactive Exchange of structured multimedia Data with mobile hosts. In: Global Communication Interactive, published by Hanson Cooke Ltd, 1997

[4] Bönigk, J., Flach, G. System Architecture and Strategies for the Exchange of structured multimedia Data in the Context of mobile Visualization. In: Proc. 5th Intl. Workshop on Mobile Multimedia Communication (MoMuc), October 1998

[5] Bray, T., Paoli, J., Sperberg-McQueen, C.M. Extensible markup language (xml) 1.0, Technical Report, World Wide Web Consortium, 1998, W3C Recommendation 10-Feb-98

[6] Courvoisier, T., Flach, G. Integration of relational data structures into XML applications - Database to XML Servlet, DaS. Proc. of GI Workshop "Internet Databases" (German), Berlin, 2000

[7] Flach, G., Günther, N. Architecture for the Interaction and Access on Multimedia Database Systems in the Context of Mobile Environments. Proc. of Int. Database Engineering and Application Symposium, IDEAS, Yokohama, Japan, 2000

[8] Narasayya, V.R. Distributed transactions in a Mobile Computing System. First IEEE Workshop on Mobile Computing Systems and Applications, 1994

[9] Pitoura, E., Bhargava, B. Revising Transaction Concepts for Mobile Computing. First IEEE Workshop on Mobile Computing Systems and Applications, 1994

[10] Satyanarayanan, M., Noble, B., Kumar, P., Price, M. Application-aware adaptation for mobile computing. Operating Systems Review, 29(1), January 1995

[11] Stonebraker, M. Object-Relational DBMSs: The Next Great Wave, Morgan Kaufmann, 1996

[12] Westermann, U., Klas, W. Architecture of a DataBlade Module for the Integrated Management of Multimedia Assets. First Int. Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM), October 1999

[13] World Wide Web Consortium, XSL Transformations (XSLT) Version 1.0, W3C Recommendation, 16 November 1999, /TR/1999/REC-xslt-19991116

[Figure 4. Content delivery for PDAs, Browser, WAP]