A DTD graph based XPath query subsumption test
- 格式:pdf
- 大小:1.24 MB
- 文档页数:15
xpath 常用语法一、XPath简介XPath是一种用于在XML文档中定位节点的语言。
它是一种基于路径表达式的查询语言,可以通过节点名称、属性、位置等信息来查找XML文档中的节点。
二、XPath语法1. 节点选择器- 通配符:使用星号(*)表示选择所有节点。
- 元素节点:使用元素名称选择节点。
- 属性节点:使用[@属性名]选择节点。
- 文本节点:使用text()选择节点。
2. 路径表达式- 相对路径:使用斜杠(/)表示从根节点开始的路径。
- 绝对路径:使用双斜杠(//)表示不考虑节点位置的路径。
- 父节点:使用双点(..)表示选取当前节点的父节点。
- 子节点:使用斜杠(/)表示选取当前节点的子节点。
- 兄弟节点:使用斜杠加节点名称(/节点名称)表示选取当前节点的同级节点。
3. 谓语- 谓语是用来过滤节点的条件表达式,可以在节点选择器后面使用方括号([])来添加谓语。
- 谓语可以使用比较运算符(=、!=、<、>等)和逻辑运算符(and、or)进行条件判断。
4. 逻辑运算符- and:逻辑与运算符,表示同时满足两个条件。
- or:逻辑或运算符,表示满足其中一个条件。
- not:逻辑非运算符,表示不满足条件。
5. 数字函数- count():返回指定节点集合的节点数量。
- sum():计算指定节点集合的数值之和。
- avg():计算指定节点集合的数值平均值。
- min():返回指定节点集合的最小值。
- max():返回指定节点集合的最大值。
6. 字符串函数- concat():连接两个字符串。
- contains():判断一个字符串是否包含另一个字符串。
- starts-with():判断一个字符串是否以另一个字符串开头。
- ends-with():判断一个字符串是否以另一个字符串结尾。
- substring():截取字符串的一部分。
- string-length():返回字符串的长度。
什么是xpath及xpath指的是什么XPath即为XML路径语言,它是一种用来确定XML文档中某部分位置的语言,以下是由店铺整理关于什么是xpath的内容,希望大家喜欢!xpath的含义XPath 使用路径表达式在 XML 文档中进行导航XPath 包含一个标准函数库XPath 是 XSLT 中的主要元素XPath 是一个 W3C 标准xpath的运算符xpath的表达式XPath 使用路径表达式来选取 XML 文档中的节点或者节点集。
这些路径表达式和我们在常规的电脑文件系统中看到的表达式非常相似。
路径表达式是从一个XML节点(当前的上下文节点)到另一个节点、或一组节点的书面步骤顺序。
这些步骤以“/”字符分开,每一步有三个构成成分:轴描述(用最直接的方式接近目标节点)节点测试(用于筛选节点位置和名称)节点描述(用于筛选节点的属性和子节点特征)一般情况下,我们使用简写后的语法。
虽然完整的轴描述是一种更加贴近人类语言,利用自然语言的单词和语法来书写的描述方式,但是相比之下也更加啰嗦。
xpath的存取函数名称说明fn:node-name(node) 返回参数节点的节点名称。
fn:nilled(node) 返回是否拒绝参数节点的布尔值。
fn:data(item.item,...) 接受项目序列,并返回原子值序列。
fn:base-uri()fn:base-uri(node)返回当前节点或指定节点的 base-uri 属性的值。
fn:document-uri(node) 返回指定节点的document-uri 属性的值。
xpath的语言升级在 W3C 建议下,XPath 1.0于 1999年 11月16日发表。
XPath 2.0 目前正在W3C审核过程的最终阶段。
XPath 2.0表达了XPath语言在大小与能力上显著的增加。
最值得一提的改变是XPath 2.0有了更丰富的型别系统;XPath 2.0支持不可分割型态,如在 XML Schema 内建型态定义一样,并且也可自纲要(schema)导入用户自定型别。
一、什么是XPath?XPath(XML Path Language)是一种用来确定XML文档中的特定部分的语言。
它是一种在XML文档中选取节点的能力。
二、 XPath的语法1. 选取节点:XPath使用路径表达式来选取XML文档中的节点或者元素。
要选取XML文档中的某个节点,可以使用路径表达式来表明。
2. 路径表达式:路径表达式是XPath中最重要的内容之一,由一个或多个步组成,用来定位XML文档中的节点或者元素。
路径表达式的基本语法为 nodename,表示选取此节点的所有子节点。
3. 谓语:谓语用来查找某个特定的节点或者含有某个特定值的节点。
4. 选取节点:XPath中用星号 * 来选取所有元素节点,用符号来选取属性。
三、 XPath的常用表达式1. 选取所有节点:选取XML文档中所有的节点,可以使用双斜杠 // 表示。
//title 表示选取XML文档中的所有 title 节点。
2. 选取特定节点:选取XML文档中指定的节点或者元素,可以使用节点名称来表示。
/bookstore/book 表示选取 bookstore 下的所有 book 节点。
3. 选取节点的属性:选取节点的属性可以使用符号, //book[id] 表示选取所有具有id属性的book节点。
4. 谓语:使用谓语可以筛选出符合条件的节点,//book[price>35.00] 表示选取价格大于35.00的所有book节点。
四、使用XPath的场景在实际开发中,XPath可以用于各种场景,包括但不限于:1. Web自动化测试:在自动化测试中,XPath可以用来定位网页元素,进行页面操作和验证。
2. 数据抓取:XPath可以用来定位和抓取网页中所需的数据,进行数据分析和处理。
3. XML文档处理:XPath可以用来处理XML文档,提取所需的信息,进行数据处理和转换。
五、XPath的局限性在使用XPath的过程中,也需要注意一些局限性,包括但不限于:1. 性能:使用XPath可能会影响性能,尤其是在大规模的文档上。
【xpath获取标签下所有文本的方法】一、什么是xpath?在学习xpath获取标签下所有文本的方法之前,首先要了解xpath是什么。
简单来说,xpath是一种用来在XML文档中进行导航和查询的语言。
它可以帮助我们定位XML文档中的节点,并获取节点的内容或属性。
在网页开发中,xpath也常常用来定位和提取HTML标签。
二、基本的xpath语法在使用xpath获取标签下所有文本的方法前,我们先来看一下xpath的基本语法。
xpath的语法包括节点选择、谓词、运算符等。
其中,节点选择是最基本的部分,可以用来定位节点并提取内容。
我们可以使用路径表达式来选取节点,使用“/”来表示根节点,使用“//”来表示从任意节点开始匹配,使用“.”来表示当前节点等等。
三、获取标签下所有文本的方法在网页开发中,我们经常会遇到需要获取某个标签下所有文本的情况。
这时,就可以借助xpath来实现。
如果我们想要获取某个div标签下所有文本,可以使用以下的xpath表达式://div//text()这个表达式的含义是:选取所有div节点下的所有文本。
通过这样的xpath表达式,我们就可以获取到该div标签下所有的文本内容。
四、个人观点和理解使用xpath获取标签下所有文本的方法在网页开发中是非常常见和实用的。
它可以帮助我们快速准确地定位到需要的内容,并进行提取和处理。
在实际应用中,我们还可以结合其他xpath语法和方法,来进一步实现对网页内容的抓取和分析。
xpath是一个非常强大的工具,能够大大提高我们的开发效率和准确性。
五、总结回顾通过本文的介绍,我们了解了什么是xpath,掌握了基本的xpath语法,以及如何使用xpath获取标签下所有文本的方法。
在实际的网页开发中,xpath是一个非常实用的工具,能够帮助我们快速准确地定位和提取所需的内容。
xpath的应用范围非常广泛,不光可以用在网页开发中,还可以用在各种XML文档的处理中。
xpath定位中详解id、starts-with、contains、text()和last。
1、XPATH使⽤⽅法使⽤XPATH有如下⼏种⽅法定位元素(相⽐CSS选择器,⽅法稍微多⼀点):a、通过绝对路径定位元素(不推荐!)WebElement ele = driver.findElement(By.xpath("html/body/div/form/input"));b、通过相对路径定位元素WebElement ele = driver.findElement(By.xpath("//input"));c、使⽤索引定位元素WebElement ele = driver.findElement(By.xpath("//input[4]"));d、使⽤XPATH及属性值定位元素WebElement ele = driver.findElement(By.xpath("//input[@id='fuck']"));//其他⽅法(看字⾯意思应该能理解吧)WebElement ele = driver.findElement(By.xpath("//input[@type='submit'][@name='fuck']"));WebElement ele = driver.findElement(By.xpath("//input[@type='submit' and @name='fuck']"));WebElement ele = driver.findElement(By.xpath("//input[@type='submit' or @name='fuck']"));e、使⽤XPATH及属性名称定位元素元素属性类型:@id 、@name、@type、@class、@tittle//查找所有input标签中含有type属性的元素WebElement ele = driver.findElement(By.xpath("//input[@type]"));f、部分属性值匹配WebElement ele = driver.findElement(By.xpath("//input[start-with(@id,'fuck')]"));//匹配id以fuck开头的元素,id='fuckyou'WebElement ele = driver.findElement(By.xpath("//input[ends-with(@id,'fuck')]"));//匹配id以fuck结尾的元素,id='youfuck'WebElement ele = driver.findElement(By.xpath("//input[contains(@id,'fuck')]"));//匹配id中含有fuck的元素,id='youfuckyou'g、使⽤任意值来匹配属性及元素WebElement ele = driver.findElement(By.xpath("//input[@*='fuck']"));//匹配所有input元素中含有属性的值为fuck的元素元素定位总结//注:本专题只介绍java版//By idWebElement ele = driver.findElement(By.id());//By NameWebElement ele = driver.findElement(By.id());//By classNameWebElement ele = driver.findElement(By.className());//By tabNameWebElement ele = driver.findElement(By.tagName());//By linkTextWebElement ele = driver.findElement(By.linkText());//By partialLinkTextWebElement ele = driver.findElement(By.partialLinkText());//通过部分⽂本定位连接//By cssSelectorWebElement ele = driver.findElement(By.cssSelector());//By XPATHWebElement ele = driver.findElement(By.xpath());=================================栗⼦=====================================1、id 获取id 的属性值2、starts-with 顾名思义,匹配⼀个属性开始位置的关键字 -- 模糊定位3、contains 匹配⼀个属性值中包含的字符串 -- 模糊定位4、text() 函数⽂本定位5、last() 函数位置定位eg<input id="su" class="bg s_btn btnhover" value="百度⼀下" type="submit"/>//*[@id='su'] 获取id 的属性为'su' 的值或//input[contains(@class,'bg s_btn')]<a class="lb" href="https:///v2/?login&tpl=mn&u=http%3A%2F%%2F" name="tj_login" onclick="return false;">登录</a>//a[starts-with(@name,'tj_lo')] 属性模糊定位//a[contains(@name,'tj_lo')] 属性模糊定位<a href="">百度搜索</a>//a[text()='百度搜索']或//a[contains(text(),"搜索")] --⽂本模糊定位<a id="setf" href="///cache/sethelp/help.html" onmousedown="returnns_c({'fm':'behs','tab':'favorites','pos':0})" target="_blank">把百度设为主页</a>//a[text()='把百度设为主页']/A/B/C[last()] 表⽰A元素→B元素→C元素的最后⼀个⼦元素,得到id值为e2的E元素。
Python xpath写法一、概述XPath(XML Path Language)是一门在 XML 文档中查找信息的语言,可以用来在 XML 文档中对元素和属性进行定位。
在 Python 中,使用 XPath 可以很方便地对 XML 或 HTML 文档进行解析和提取信息。
本文将介绍 Python 中使用 XPath 的写法,帮助读者更好地理解和应用这一技术。
二、导入相关库在使用 Python 进行 XPath 解析之前,需要导入相关的库。
通常情况下,我们会使用lxml 库进行XPath 解析。
在代码中需要先导入该库。
``` pythonfrom lxml import etree```三、XPath 基本写法在使用 Python 进行 XPath 解析时,需要掌握一些基本的写法规则。
下面将介绍几种常用的 XPath 写法。
1. 选取节点要选取节点,可以使用路径表达式。
路径表达式(Path Expression)用于选取 XML 文档中的节点或者节点集。
要选取 XML 文档中的所有<book> 节点,可以使用以下写法:``` pythonxpath = '//book'```2. 选取子节点如果要选取某个节点的子节点,可以使用斜杠(/)。
要选取 XML 文档中 <book> 节点的所有 <title> 子节点,可以使用以下写法:``` pythonxpath = '//book/title'```3. 选取父节点要选取某个节点的父节点,可以使用两个点(..)。
要选取 <title> 节点的父节点 <book>,可以使用以下写法:``` pythonxpath = '//title/..'```4. 选取指定属性的节点如果要选取具有指定属性的节点,可以使用方括号。
要选取所有带有category 属性的 <book> 节点,可以使用以下写法:``` pythonxpath = '//book[category]'```5. 选取指定条件的节点XPath 还支持使用谓语(Predicates)来选取满足指定条件的节点。
xpath的爬取规则
XPath是一种用于在XML和HTML文档中定位元素的语言。
在网络爬虫中,XPath可以用来提取网页中的信息。
以下是一些XPath的爬取规则:
1. 使用Chrome浏览器的开发者工具
在Chrome浏览器中,按下F12键可以打开开发者工具。
在Elements选项卡中,选中要提取的元素,在右侧的Styles选项卡中找到该元素的XPath。
在XPath前面加上“//”即可使用。
2. 使用XPath表达式
XPath表达式包括节点、轴、运算符、函数和变量。
通过使用XPath 表达式,可以快速定位和提取网页中的信息。
例如,要提取一个网页中的所有链接,可以使用以下XPath表达式:
//a/@href
该表达式中,“//a”表示所有的链接元素,“/@href”表示链接的URL属性。
3. 使用XPath语法
XPath语法包括节点、谓词、运算符和函数。
以下是一些常用的XPath语法:
- 节点选择器:使用“/”或“//”选择元素或属性节点。
- 谓词:用于筛选元素或属性节点。
- 运算符:包括算术运算符、比较运算符和逻辑运算符。
- 函数:包括字符串、数学、逻辑、日期等函数,可以用于处理提取的数据。
4. 注意XPath的路径
XPath的路径必须根据HTML结构正确编写。
如果路径错误,将无法提取到正确的数据。
可以使用Chrome浏览器的开发者工具检查元素的路径是否正确。
XPath的爬取规则需要结合实际情况进行灵活运用,提取网页中的信息。
xpath获取标签下所有文本的方法XPath是一种用于在XML文档中定位和选择节点的语言。
它是一种基于路径表达式的查询语言,非常适用于从XML文档中提取特定的数据。
在XPath中,可以使用特殊的路径表达式来获取标签下所有文本。
要理解如何使用XPath获取标签下所有文本,首先需要了解XPath 的基本语法。
XPath使用路径表达式来描述XML文档中的节点结构,并且支持在路径表达式上进行过滤和排序。
下面是XPath的基本语法:-单斜杠(/)用于从根节点开始选择;-双斜杠(//)用于在整个文档中选择节点;-节点名称用于选择具有特定名称的节点;-方括号([])用于添加限制条件,例如位置或属性;-逻辑运算符(and,or,not)用于将多个限制条件组合在一起。
为了获取标签下所有文本,可以使用XPath的text()函数。
text()函数用于获取节点的文本内容,可以通过将节点名称和text()函数组合在一起来获取标签下的所有文本。
下面是一个示例XML文档:```xml<books><book><title>Book 1</title><author>Author 1</author></book><book><title>Book 2</title><author>Author 2</author></book></books>```要获取所有书籍的标题,可以使用以下XPath表达式:```xpath//book/title/text()```该表达式使用双斜杠(//)选择整个文档中的所有book节点,然后使用/title/text()选择book节点下的title子节点,并获取其文本内容。
使用XPath的text()函数,可以获取标签下的所有文本内容。
xpath表达式XPath(XML路径语言)是一门在XML文档中查找信息的语言,它基于 XPointer 和在XML文档中使用路径表达式来查找节点。
XPath主要共四种类型的表达式:路径表达式、运算表达式、谓词表达式和函数表达式。
1. 路径表达式路径表达式是XPath中最基本的表达式类型。
它是一系列按特定顺序构造的标记,每个标记描述了一种“如何”或“何时”走下一步。
例如:/HTML/BODY/H1/P,这种表达式表示从HTML节点开始,然后匹配(即走向)包含在其中的BODY节点,再匹配H1节点,最后匹配P节点,我们就可以轻松地访问XML文档中的元素,属性和文本。
2. 运算表达式运算表达式包括算术运算符、比较运算符和逻辑运算符。
它们可用于将两个值(或变量)进行运算并产生结果。
例如,使用 XPATH 表达式 //BODY[2]可以比较值 2 与 BODY 元素内容,以确定是否存在第二个 BODY 元素。
3. 谓词表达式谓词表达式有助于筛选XML文档中的元素。
它们可以将给定的XML 值与另一个给定的值进行比较,以确定它们是否符合我们的需要。
如果一个指定的谓词为True,则结果集将包含相应的结果。
例如,/HTML/BODY/H1/P[id='doc1']将会精确地只返回ID属性等于 doc1 的P元素。
4. 函数表达式函数表达式用于将字符串处理为需要的格式,并进行一些特殊处理。
常用函数表达式有:string()-用于将传入参数转换成字符串;translate()-用于替换一个字符串中的字符;contains()-用于检查一个字符串中是否包含另一个字符串;concat()-用于连接两个或多个字符串;count()-用于获取列表长度;sum()-用于求和;and、not、or-用于实现布尔逻辑;substring()-可以裁剪一个字符串;substring-after()-用于提取字符串的结尾;substring-before()-用于提取字符串的开头;position()-可以返回当前节点在节点列表中的位置;last()-用于获取节点列表中最后一个元素;name()-返回当前节点的指定名称;text()-可以返回当前节点的所有文本;parent()-返回无当前元素的父元素等。
pythonxpath用法Python's xpath is a powerful library used for parsing XML and HTML documents, allowing users to extract specific data using XPath expressions. XPath is a query language for selecting nodes from an XML or HTML document.Here's how to use xpath in Python:1.Installation: Install the xpath library using pip:pip install xpath2.Import: Import the xpath module in your Python script:import xpath3.Parsing: Parse the XML or HTML document usingxpath.parse():doc = xpath.parse('example.xml')4.Querying: Use XPath expressions to query specificelements or attributes in the document:result = doc.find('//element[@attribute="value"]')5.Accessing Results: Access the results returned by thepythonCopy codefor element in result: print(element.text)6.Cleanup: Close the document parser when done:doc.close()Here's a simple example of using xpath to parse an XML document:# Parse the XML documentdoc = xpath.parse('example.xml')# Query specific elements using XPathresult = doc.find('//book[@category="programming"]')# Access and print resultsfor book in result:print("Title:", book.find('title').text)print("Author:", book.find('author').text)# Close the document parserdoc.close()Chinese Parsing: Python的xpath是一个强大的库,用于解析XML和HTML文档,允许用户使用XPath表达式提取特定数据。
A DTD Graph Based XPath Query Subsumption TestStefan Böttcher, Rita SteinmetzUniversity of PaderbornFaculty 5 (Computer Science, Electrical Engineering & Mathematics)Fürstenallee 11, D-33102 Paderborn, Germanyemail : stb@uni-paderborn.de, rst@uni-paderborn.deAbstract.XPath expressions play a central role in querying for XML frag-ments. We present a containment test of two XPath queries which checkswhether a new XPath query XP1 can reuse a previous query result XP2. Thekey idea is to transform XP1 into a graph which is used to search for sequencesof elements which are used in the XPath query XP2.1. Introduction1.1. Problem origin and motivationThe development of our XPath containment test was motivated by an XML data-base system that allows to cache and reuse previous query results (e.g. on a mobile client) for the evaluation of a new XPath query. Whenever we can prove that a previ-ous query result which is already stored on a client can be reused for a new XPath query, this may be a considerable advantage in comparison to shipping the fragment selected by an XPath query to the mobile client again. Our goal is to prove without access to data stored in the XML database, i.e. independent of the actual database state, that one XPath expression XP1 selects a subset of the data selected by another XPath expression XP2 – or as we say that XP1 is subsumed by XP2. We allow how-ever our tester to be incomplete, i.e., whenever our tester cannot efficiently decide whether or not an XPath query XP2 subsumes another query XP1, we allow the sub-sumption tester to return false, i.e., we assume that the previous query result cannot be reused. In this case, the query is sent to the database system. If however our algorithm returns true, we are then sure, that a previous query result can be reused. This paper focuses on the subsumption tester itself, whereas the reuse of old query results which are stored in the client’s cache for a new query is discussed in [2], and concurrency control issues, e.g. how we treat a cached query result, when concurrent transactions modify the original XML database fragment, are discussed in [3].1.2. Relation to other work and our focusOur contribution is related to other contributions to the area of containment tests for XPath and semi-structured data. Query containment on semi-structured data for other query languages has been examined e.g. by [4, 7]. In comparison, we examinethe XPath expressions themselves in order to decide whether or not the set of data se-lected by a new XPath query expression is a subset of the data selected by a previous XPath expression. For this purpose, we follow [5, 8, 9, 10], which also contribute to the solving of the containment problem for two XPath expressions under a given DTD. The focus of the contributions [5, 8, 9, 10] is to find a general solution to the containment tests for certain subclasses of XPath expressions, and to report on de-cidability results or to give upper and lower bounds for the complexity of the con-tainment test for certain subclasses of XPath expressions. However, in contrast to this, we focus on a fast decision as to whether or not an XPath query result can be reused and we allow our containment test to be incomplete.The other contributions (e.g. [5, 8, 9, 10]) use tree patterns in order to normalize the XPath query expressions and to compare them to the structure of the database. They consider the DTD either as a set of constraints that has to be met [5, 10] or as an automaton [9]. In contrast to this, we follow [1] and use the concept of a DTD graph to expand all paths that are selected by an XPath expression, and we right-shuffle all filter expressions within sets of selected paths. Our transformation of an XPath query into a graph is similar to the transformation of an XPath query into an automaton, which was used in [6] to decide whether an XML document fulfills this query. How-ever, in contrast to all other contributions, our approach combines a graph based search for paths (selected by XP1 but not by XP2) with the right-shuffling of predi-cate filters in such a way, that the containment test for XPath expressions including all axes (except preceding (-sibling) and following (-sibling)) can be reduced to a con-tainment test of filters combined with a containment test of possible filter positions. 1.3 The supported subset of XPath expressionsAn XPath expression is defined as being a sequence of location steps /<LocationStep1> / … / <LocationStepN>,where <LocationStepI> is defined asaxis-specifierI :: node-testI [predicate filterI].Because XPath is a very rich language and we have to have a balance between the allowed complexity of XPath expressions and the complexity of the subsumption test algorithm on two of these XPath expressions, we restrict the set of XPath expressions to the following set of allowed XPath expression s:1. Axis specifiersWe allow absolute or relative location paths with the following axis specifiers in their location steps: self, child, descendant, descendant-or-self, parent, ancestor, ancestor-or-self and attribute, and we forbid namespace, following (-sibling) and preceding (-sibling) axes.2. Node testsWe allow all node name tests except name-spaces, but we forbid node type tests like text(), comment(), processing-instruction() and node(). We allow wildcards, but only at the end of an XPath query.For example, when A is an attribute and E is an element, then ./@A, ./E, ./E/E, .//E , ./E/* and ./E/*//* are allowed XPath expressions.3. Predicate filtersWe restrict predicate filters of allowed XPath expressions to be either a simple predicate filter [B] with a simple filter expressions B or to be a compound predi-cate filter.3.1. Simple filter expressionsLet <path> be a relative XPath expression which uses the parent-axis, the at-tribute-axis or a child-axis location step, and let <value> be a constant.•<path> is a simple filter expression. For example, when A is an attribute and E is an element, then @A and ./E are allowed filter expressions which are usedto check for the existence of an attribute A or an element ./E respectively.•Each comparison <path> = <value> and each comparison <path> != <value> is a simple filter expression.1•‘not <path>’ is a simple predicate filter, which means that the element or at-tribute described by <path> does not exist.3.2. Compound predicate filtersIf [B1] and [B2] are allowed predicate filters, then ‘[B1] [B2]’, ‘[B1 and B2]’,‘[B1 or B2]’ and ‘[not (B1)]’ are also allowed predicate filters. In our subset of allowed predicate filters, ‘[B1] [B2]’ and ‘[B1 and B2]’ are equivalent, because we excluded the sibling-axes, i.e., we do not consider the order of sibling nodes.1.4 Basic definitions and problem descriptionWe use an XPath expression XP = // E1 / E2 // E3 // E4 / E5 in order to explain some basic terms that we use throughout the rest of the paper. The nodes selected by XP (in this case elements with the node name E5) can be reached from the root node by a path (that must contain at least the nodes ‘root’, E1, E2, E3, E4, and E5). All these paths will be called paths selected by XP or selected paths for short.Since every path to elements selected by XP contains an element E1 which is di-rectly followed by an element E2, we call E1/E2 an element sequence. A single ele-ment (like E3) which does not require another element to directly follow it, will also be called an element sequence (consisting only of this single element). So, the ele-ment sequences in this example are: E1/E2, E3, and E4/E5.The input of our tester consists of three parts: a DTD and two XPath expressions (XP1 and XP2) which are used for a query and a previous query result respectively. The goal is to prove, based on the DTD only, that the first XPath expression selects a subset of the node set selected by the second XPath expression. This should be done as efficiently as possible and without access to the actual XML database, because the subset test is completely executed on the client-side.Throughout this paper, we use the term XP1 is subsumed by XP2 as a short notation for “XP1 selects a subset of the node set selected by XP2 in all XML documents which are valid according to the given DTD”. Given this definition, we can also state the subsumption test as follows: XP1 is subsumed by XP2, if and only if a path to an 1 Note that we can also allow for more comparisons, e.g. comparisons which contain the op-erators ‘<’, ‘>’, ‘ ¶ RU µ ¶ DQG FRPSDULVRQV OLNH SDWK! FRPSDULVRQ RSHUDWRU! SDWK ! ,Q such a case the predicate filter tester which is used as part of our tester in Section 3.5 would have to check more complex formulas.element which is selected by XP1 and is not selected by XP2 can never exist in any valid XML document.Section 2 describes the preparation steps for the subsumption test, i.e., how we transform the DTD into a so called DTD graph which was introduced in [1], and how we use this DTD graph in order to normalize the XPath expressions. Section 3 out-lines our two major algorithms, i.e. how we compute a so called XP1 graph which de-scribes all paths selected by XP1, and how we try to place XP2 sequences on each path of the XP1 graph in such a way that the filter of each XP2 sequence subsumes an XP1 filter.2. DTD graph construction and normalization of XPath expressions2.1 Translating the DTD into a DTD graph and a set of associated DTD filtersA directed DTD graph is a directed graph G=(N,C) where each node E∈N corre-sponds to an element of the DTD and an edge c ∈ C, c=(E1,E2) from E1 to E2 exists for each element E2 that is used to define the element E1 in the DTD. For example (Example 1), Figure 1 shows a DTD and the corresponding DTD-graph:Figure 1. DTD and corresponding DTD graph of Example 1 The DTD graph can be used to check whether or not at least one path selected by XP1 (or XP2 respectively) exists. If there does not exist any path selected by XP1 (or XP2 respectively), then XP1 (or XP2) selects the empty node set. We consider this to be a special case, i.e., if XP2 selects the empty node set, we try to prove that XP1 also selects the empty node set. For the purpose of the discussion in the following sections, we assume that at least one path for XP1 (and XP2 respectively) exists.The concept of the DTD graph represents an upper bound for the computation of possible paths for XP1, but it does not yet contain all the concepts of a DTD. In order to distinguish between optional child elements and mandatory elements and in order to distinguish between disjunctions and conjunctions found in the DTD, etc., we can associate additional DTD filters to each element of the DTD (and to each node of the DTD graph respectively). For example (Example 2), a DTD rule< !element E1 ( E2 ,( E3+ | E4+ ), E5? ) >is translated into the following DTD filter for the element (or DTD graph node) E1: [ ./E2 and ( ./E3 xor ./E4 ) and unique(E2) and unique(E5) ] .The relative path ‘./E2’ which occurs in the filter requires every element E1 to have a child node E2, and unique(E2) states that there can not be more than one child node E2 per node E1. Since E5 is an optional child of E1, its existence is not required.A complete tester would have to add the DTD filter for a node to each XPath expres-sion which selects (or passes) this node. These DTD filters can be used as predicate filters to both XPath expressions whenever the edge is passed. These filters can also be used to eliminate so called forbidden paths, i.e. paths which are contained in the DTD graph, but which are not allowed when also filters or DTD filters are regarded. Given the rule of Example 2, all paths which require the existence of both children, E3 and E4, (e.g. //E1[./E3]/E4) are forbidden paths which can be discarded. However, an incomplete tester may ignore some or even all of these DTD con-straints for the following reason. The number of valid documents can only be in-creased and can never be decreased by ignoring a DTD constraint. When XP1 is sub-sumed by XP2 within this relaxed DTD (represented by the DTD graph) which allows for even more paths, then XP1 is also subsumed by XP2 under the more restrictive DTD which was originally given. Therefore, a successful proof of the subsumption of XP1 by XP2 within the relaxed DTD is sufficient for the reuse of a previous query re-sult.2.2 Element distance formulasA further preparation step involves the use of the DTD graph in order to compute all possible distances for each pair of elements and to store these distances in a dis-tance table, called the DTD distance table[1]. We use the distances for the right-shuffling of predicate filters in Section 3. For example, the distance from E1 to E2 in the DTD graph of Example 1 is any positive odd number of child-axis location steps, i.e., the distance table contains an entry “2*x+1 (x≥0)” which describes the set of all possible distances from E1 to E2. The distance table entry “2*x+1 (x≥0)” rep-resents a loop of 2 elements (E1 and E2) in the DTD graph with a minimum distance of 1. Whenever two elements (E1,E2) occur in the DTD graph in a single loop which contains ‘c’ elements and the shortest path from E1 to E2 has the length k, then the DTD distance entry for the distance from E1 to E2 is ‘c*x+k (x≥0)’. Since alternative paths or multiple loops which connect one element to another may exist, the general form of an entry of the DTD distance table is∑1≤i≤n ai*xi + k (xi≥0) or ... or ∑1≤j≤m bj*yj + p (yj≥0),where ai,bj,k,p are natural numbers or 0.Each name of a variable – x in the case of the previous example – is connected uniquely to one circle in the DTD graph and is called the circle variable of this circle.2.3 Transformation, normalization and simplification of XPath queriesBefore our two main algorithms start, we transform both XPath expressions into an equivalent normalized form by the following transformation steps. Firstly, relative XPath expressions are transformed into equivalent absolute XPath expressions. Sec-ondly, if the XPath expression does not start with a child-axis location step, we insert ‘/root’ at the beginning of the XPath expression, i.e., in front of the first location step of the XPath expression. ‘root’ corresponds to the root-node of the DTD, so that after this normalization all XPath expressions start with a child-axis location step ‘/root’.If the XPath expression contains one or more parent-axis location steps or ances-tor-axis location steps, these parent-axis location steps and ancestor-axis location steps are replaced form left to right according to the following rules. Let LS1,…,LSn be location steps which do neither use the parent-axis nor the ancestor-axis, and let XPtail be an arbitrary sequence of location steps. Then we replace /LS1/…/LSn/child::E[F]/../XPtail with /LS1/…/LSn[./E[F]]/XPtail .Similarly, in order to replace the first parent-axis location step in the XPath expres-sion /LS1/…/LSn/descendant::E[F]/../XPtail, we use the DTD graph in order to com-pute all parents P1,…,Pm of E which can be reached after LSn has been performed, and we replace the XPath expression with /LS1/…/LSn//(P1|…|Pm)[./E[F]]/XPtail .In order to substitute an ancestor location step ancestor::E[F] in an XPath expres-sion /LS1/…/LSn/ancestor::E[F]/XPtail, we use the DTD graph in order to compute all possible positions where E may occur between the ‘root’ and the element selected by LSn. Depending on the DTD graph, there may be more than one position, i.e. we replace the given XPath expression with( //E[F][/LS1/.../LSn] / XPtail ) | ( /LS1//E[F][/LS2/.../LSn]/XPtail ) | ... |( /LS1/.../LSn-1/E[F][/LSn]/XPtail ) .Similar rules can be applied in order to eliminate the ancestor-or-self-axis, the self-axis, and the descendent-axis, such that we finally have only child-axis and descen-dant-or-self-axis-location steps (and additional filters) within our XPath expressions.Finally, nested filter expressions are eliminated, e.g. a filter [./E1[./@a and not (@b=”3”) ] ] is replaced with a filter [ ./E1 and (./E1/@a and not ./E1/@b=”3”) ] . More general, a nested filter [./E1[F1]] is replaced with a filter [./E1 and F1’] where the filter expression F1’ is equal to F1 except that it adds the prefix ./E1 to each loca-tion path in F1 which is defined relative to E1. This approach to the unnesting of filter expressions can be extended to the other axes and to sequences of location steps, such that after these unnesting steps, we do not have any nested filters.3. The two major algorithms of our subsumption testFirstly, we construct a graph which contains the set of all possible paths for XP1 in any valid XML document according to the given DTD. Then, XP1 is subsumed by XP2, if and only if for all paths for XP1 which are allowed by the DTD the following holds: the path for XP1 contains all sequences of XP2 in the correct order and for each XP2 sequence which has a filter there exists a corresponding XP1 node with a filter which is as least as restrictive as the filter attached to the XP2 sequence. In other words, if one path selected by XP1 which does not contain all sequences of XP2 in the correct order is found, then XP1 is not subsumed by XP2.3.1 Extending the DTD graph to a graph for paths selected by XP1 (Algorithm 1)In order to represent the set of paths selected by XP1, we use a graph which we will call the (rolled out) XP1 graph in the remainder of the paper [1]. Each path se-lected by XP1 corresponds to one path from the root node of the XP1 graph to the node(s) in the XP1 graph that represent the selected node(s). The XP1 graph containsa superset of all paths selected by XP1, because some paths contained in the XP1 graph may be forbidden paths, i.e. paths that have predicate filters which are incom-patible with DTD constraints and/or the selected path itself (c.f. Section 2.1). We use the (rolled out) XP1 graph in order to check whether or not it contains all the se-quences of XP2, and if so, we are then sure that all of the paths selected by XP1 con-tain all the sequences of XP2.Example 3: Consider the DTD graph of Example 1 and an XPath expression ’XP1new=/root/E2/E1/E2//E3’, which requires that all XP1 paths start with the ele-ment sequence /root/E2/E1/E2 and end with the element E3. The rolled out XP1 graph for the XPath expression XP1new isFig. 2. XP1 graph of Example 3where node labels visualize the nodes of the paths selected by XP1 and the edge la-bels visualize the possible distances between two nodes.The (rolled out) XP1 graph depends on the DTD graph and on the given XPath ex-pression, i.e., the XP1 graph for the DTD graph given in Section 2.1 and the XPath expression XP1 = //E3 is identical to the DTD graph given in Example 1 (as long as we ignore the distance labels attached to the edges). The following algorithm com-putes the XP1 graph from a DTD graph given and an XPath expression XP1:G RAPH G ET XP1G RAPH(G RAPH DTD, XP ATH XP1)(1){(2) G RAPH XP1Graph = NEW G RAPH ( DTD.G ET R OOT() );(3) N ODE lastGoal = DTD.G ET R OOT();(4)while(not XP1.I S E MPTY()) {(5) N ODE goalElement = XP1.R EMOVE F IRST E LEMENT();(6)if (XP1.L OCATION S TEP B EFORE(goalElement) == ‘/’)(7) XP1Graph.A PPEND( N ODE(goalElement) );(8) else(9) XP1Graph.E XTEND((10) DTD.C OMPUTE R EDUCED DTD(lastGoal,goalElement));(11) lastGoal = goalElement;(12) }(13)return XP1Graph;(14)}The algorithm generates a sequence of XP1 graph nodes for each element se-quence of XP1. Furthermore, for each descendent-axis step E1//E2 which occurs in XP1, the algorithm inserts a subgraph of the DTD graph, called the reduced DTD graph for paths from E1 to E2. This reduced DTD graph represents all paths from E1 to E2 (which are allowed according to the DTD graph) between the nodes for E1 and E2. The method C OMPUTE R EDUCED DTD(lastGoal,goalElement)returns such asubgraph of the DTD graph which only contains edges which are part of a path from lastGoal to goalElement.If XP1 ends with //* (or /* respectively), i.e., XP1 is of the form XP1 = XP1’//*, the XP1graph is computed for XP1’. Afterwards one reduced DTD graph that con-tains all nodes that are successors of the endnode of the XP1’graph (or that contains all nodes that can be reached within one step from the end node respectively) is ap-pended to the end node of the XP1’graph and all these appended nodes are marked as end nodes of the XP1 graph.During the execution of Algorithm 1, we compute the distances between adjacent nodes and we attach them as labels to the edges between them. We distinguish two different types of edges: Edges which are generated by a method-call XP1Graph.A PPEND(N ODE(goalElement)) append only a single node and have the label “1”. However, edges which are copied from reduced DTD graphs may con-tain a distance formula which can be looked up in the DTD distance table.3.2 XP1 graph sequencesA path in the XP1 graph where each node except the last one and the first one has ex-actly one outgoing edge will be called a XP1 graph sequence in the remainder of this document.If one node N has more than one outgoing edges, but all of these edges point to nodes that have exactly the same node label, these nodes are as well added to the XP1 graph sequence that ends in N.For example the XP1 graph sequences of the XP1 graph of Example 3 are root→E2→E1→E2→E1 and E1→E3.3.3 Combining XP2 predicate filters within each XP2 sequenceBefore Algorithm 2 starts, we perform a further normalization step with the XPath expression XP2. Within each sequence of XP2 we shuffle all filters to the rightmost element that carries a filter expression itself, so that after this normalization step all filters within this sequence are attached to one element.To shuffle a filter one location-step to the right means to add one parent-axis loca-tion step to the path within the filter expression and to attach it to the next location step. For example the XPath expression XP2 = // E1[./@b] / E2[./@a] / E3 will be transformed into the equivalent XPath expression XP2’=//E1/E2[../@b and ./@a]/E3.3.4 Placing one XP2 element sequence with its filters in the XP1 graphWithin Algorithm 2 (Section 3.7), we need a procedure which we call B OOLEAN P LACE F IRST S EQUENCE(in XP1Graph,inout XP2,inout startNode). It tests for a given XP2 sequence and a given startNode within an XP1 graph whether this sequence can be placed successfully in the XP1 graph beginning at startNode,such that each filter of the XP2 sequence subsumes an XP1 filter (as outlined in Sec-tion 3.5).Because we want to place XP2 element sequences in paths selected by XP1, we de-fine the correspondence of XP2 elements and XP1 graph nodes as follows. An XP1 graph node and a node name which occurs in an XP2 location step correspond to each other, if and only if the node has a label which is equal to the element name of the lo-cation step. We say, a path (or a node sequence) in the XP1 graph and an element se-quence of XP2 correspond to each other, if the n-th node corresponds to the n-th ele-ment for all nodes in the XP1 graph node sequence and for all elements in the XP2 element sequence.The procedure P LACE F IRST S EQUENCE(…,…,…)checks whether or not each path in the XP1 graph which begins at startNode fulfils the following two conditions: firstly, the path has a prefix which corresponds to the first sequence of XP2 (i.e. the node sequences that correspond to the first element sequence of XP2 can not be cir-cumvented by any XP1 path), secondly, if the first sequence of XP2 has a filter, then this filter subsumes at least one filter given for each XP1 path.In general, there may exist more than one path in the XP1 graph which starts at startNode and corresponds to a given XP2 sequence, and therefore there may be more than one XP1 graph node which corresponds to the final node of the XP2 element se-quence. The procedure P LACE F IRST S EQUENCE(…,…,…)internally stores that final node which is nearest to the end node of the XP1 graph (we call it the last final node).2 If only one path that begins at startNode is found which does not have a prefix cor-responding to the first sequence of XP2 or which does not succeed in the filter impli-cation test described in Section 3.5, then the procedure P LACE F IRST S EQUENCE(…,…,…) does not change XP2, does not change startNode, and returns false.If however the XP2 sequence can be placed on all paths and the filter implication test is successful for all paths, then the procedure removes the first sequence from XP2, copies the last final node to the inout parameter startNode and returns true.3.5 A filter implication test for one XP2 element sequence and one path in the XP1 graphFor this section, let us consider only one XP2 sequence E1/…/En and only one path in the XP1 graph that starts at a given node corresponding to E1.As described in Section 2.3 the filters within one XP2 sequence are normalized, so that all filters are attached to exactly one element which we will call the current ele-ment. When given a startNode and a path of the XP1 graph, the node which corre-sponds to the current element is called the current node.Within the first step we shuffle all predicate filters of the XP1 XPath expression which are attached to nodes that are predecessors of the current node into the current node. To shuffle a filter expression from one node into another one means to simply attach (../)d at the beginning of the path expression inside this filter expression, 2 When we place the next XP2 sequence at or ‘behind’ this last final node, we are then sure, that this current XP2 sequence has been completely placed before the next XP2 sequence, whatever path XP1 will choose.whereas d is the distance from the first node to the other node. This distance can be calculated by summing up all distances of the paths that have to be passed from the first node to the second one.After shuffling an XP1 filter to the right, an implication test is performed on this right-shuffled XP1 filter [f1] and the XP2 filter [f2] attached to the current node.XP1 is subsumed by XP2, if and only if [f1] is at least as restrictive as [f2], i.e., f1⇒f2. For example, if the input contains only two filters [f1]=[../@a=”5”] and [f2]=[../@a], then [f1] is more restrictive than [f2], because the implication f1⇒f2 holds, but f2⇒f1 does not hold.Let d1 be a distance formula of a filter of XP1 and d2 be a distance formula of a filter in a location step of XP2, i.e. let XP1 have a filter [f1]=[(../)d fexp1i] and let XP2 have a filter [f1]=[(../)d fexp1i]. Both distance formulas d1 and d2 depend on (zero ore more) circle variables x1,…,xn of the XP1 graph, and d2 may additionally depend on x’ where (x≥x’≥0) and x is equal to one of the circle variables x1, …, xn. We say, a loop loop1=(../)d1is subsumed by a loop3loop2=(../)d2, if all distances which are selected by loop1 are also selected by loop2. More specifically, loop1 is subsumed by loop2, if for all possible tuples of values for [x1, …, xn] x’ can be found so that d1=d2 holds. Whenever a loop1 of a filter [f1i] of XP1 is subsumed by a loop2 of a filter [f2j] of XP2, then XP2 can place its filter [f2j] in such a way, that it is ap-plied to the same elements as [f1i].However, for a counter-example, let XP1 be //E3 [(../)2*x+1@a] (x≥0) and XP2 be //E3 [(../)2*x+3@a] (x≥0), i.e., the loop 2*x+1 is not subsumed by the loop 2*x+3, then XP1 can choose x=0 (i.e. place its filter [@a] to the parent of E3), but XP2 has to place its filter [@a] to some previous ancestor element, say Ep. Because XP1 has no filter placed on Ep, XP1 includes also elements Ep which do not have an attribute ‘a’, i.e., XP1 is not subsumed by XP2.Altogether, a filter [f1] of XP1 is subsumed by a filter [f2] of XP2 attached to the same node, if loop1 is subsumed by loop2 and fexp1i⇒fexp2j. Since both filter ex-pressions that occur in the implication fexp1i⇒fexp2j contain no loops, we can use a predicate tester for XPath expressions (e.g. [1]), which extends a theorem prover for Boolean logic in such a way that it takes the special features of simple XPath expres-sions into account (e.g. the tester has to consider that [not ./@a=”5” and not ./@a!=”5”] is equivalent to [not ./@a]).If the tester returns for one XP2 filter that this filter is subsumed by the XP1 filter, this XP2 filter is discarded. This is performed repeatedly until either all XP2 filters are discarded or all XP1 filters which are attached to nodes that are predecessors of the current node are shuffled into the current node.If then not all XP2 filters are discarded, in a second step, all these remaining filters are right-shuffled into the next node to which an XP1 filter is attached. It is again tested, if one of the XP2 filters can be discarded, as this XP2 filter subsumes the XP1 filter. This is as well performed until either all XP2 filters are discarded (then the fil-ter implication test returns true) or all XP1 filters have been checked and at least 3 The definition one loop is subsumed by another also includes paths without a loop, because distances can be a constant value.。