
A DTD Graph Based XPath Query Subsumption Test

Stefan Böttcher, Rita Steinmetz

University of Paderborn

Faculty 5 (Computer Science, Electrical Engineering & Mathematics)

Fürstenallee 11, D-33102 Paderborn, Germany

email : stb@uni-paderborn.de, rst@uni-paderborn.de

Abstract. XPath expressions play a central role in querying for XML fragments. We present a containment test of two XPath queries which checks whether a new XPath query XP1 can reuse a previous query result XP2. The key idea is to transform XP1 into a graph which is used to search for sequences of elements which are used in the XPath query XP2.

1. Introduction

1.1. Problem origin and motivation

The development of our XPath containment test was motivated by an XML database system that allows previous query results to be cached and reused (e.g. on a mobile client) for the evaluation of a new XPath query. Whenever we can prove that a previous query result which is already stored on a client can be reused for a new XPath query, this may be a considerable advantage in comparison to shipping the fragment selected by an XPath query to the mobile client again. Our goal is to prove without access to data stored in the XML database, i.e. independent of the actual database state, that one XPath expression XP1 selects a subset of the data selected by another XPath expression XP2, or, as we say, that XP1 is subsumed by XP2. However, we allow our tester to be incomplete, i.e., whenever our tester cannot efficiently decide whether or not an XPath query XP2 subsumes another query XP1, we allow the subsumption tester to return false, i.e., we assume that the previous query result cannot be reused. In this case, the query is sent to the database system. If, however, our algorithm returns true, we are sure that a previous query result can be reused. This paper focuses on the subsumption tester itself, whereas the reuse of old query results which are stored in the client’s cache for a new query is discussed in [2], and concurrency control issues, e.g. how we treat a cached query result when concurrent transactions modify the original XML database fragment, are discussed in [3].

1.2. Relation to other work and our focus

Our contribution is related to other contributions to the area of containment tests for XPath and semi-structured data. Query containment on semi-structured data for other query languages has been examined e.g. by [4, 7]. In comparison, we examine the XPath expressions themselves in order to decide whether or not the set of data selected by a new XPath query expression is a subset of the data selected by a previous XPath expression. For this purpose, we follow [5, 8, 9, 10], which also contribute to the solving of the containment problem for two XPath expressions under a given DTD. The focus of the contributions [5, 8, 9, 10] is to find a general solution to the containment tests for certain subclasses of XPath expressions, and to report on decidability results or to give upper and lower bounds for the complexity of the containment test for certain subclasses of XPath expressions. In contrast to this, we focus on a fast decision as to whether or not an XPath query result can be reused, and we allow our containment test to be incomplete.

The other contributions (e.g. [5, 8, 9, 10]) use tree patterns in order to normalize the XPath query expressions and to compare them to the structure of the database. They consider the DTD either as a set of constraints that has to be met [5, 10] or as an automaton [9]. In contrast to this, we follow [1] and use the concept of a DTD graph to expand all paths that are selected by an XPath expression, and we right-shuffle all filter expressions within sets of selected paths. Our transformation of an XPath query into a graph is similar to the transformation of an XPath query into an automaton, which was used in [6] to decide whether an XML document fulfills this query. However, in contrast to all other contributions, our approach combines a graph based search for paths (selected by XP1 but not by XP2) with the right-shuffling of predicate filters in such a way that the containment test for XPath expressions including all axes (except preceding (-sibling) and following (-sibling)) can be reduced to a containment test of filters combined with a containment test of possible filter positions.

1.3 The supported subset of XPath expressions

An XPath expression is defined as being a sequence of location steps /LS1/LS2/…/LSn, where each location step LSi is defined as

axis-specifier_i :: node-test_i [predicate-filter_i].

Because XPath is a very rich language, we have to balance the allowed complexity of XPath expressions against the complexity of the subsumption test algorithm on two of these XPath expressions. We therefore restrict XPath expressions to the following set of allowed XPath expressions:

1. Axis specifiers

We allow absolute or relative location paths with the following axis specifiers in their location steps: self, child, descendant, descendant-or-self, parent, ancestor, ancestor-or-self and attribute, and we forbid namespace, following (-sibling) and preceding (-sibling) axes.

2. Node tests

We allow all node name tests except namespaces, but we forbid node type tests like text(), comment(), processing-instruction() and node(). We allow wildcards, but only at the end of an XPath query.

For example, when A is an attribute and E is an element, then ./@A, ./E, ./E/E, .//E , ./E/* and ./E/*//* are allowed XPath expressions.

3. Predicate filters

We restrict predicate filters of allowed XPath expressions to be either a simple predicate filter [B] with a simple filter expression B or a compound predicate filter.

3.1. Simple filter expressions

Let <path> be a relative XPath expression which uses the parent-axis, the attribute-axis or a child-axis location step, and let <const> be a constant.

- <path> is a simple filter expression. For example, when A is an attribute and E is an element, then @A and ./E are allowed filter expressions which are used to check for the existence of an attribute A or an element ./E respectively.

- Each comparison <path> = <const> and each comparison <path> != <const> is a simple filter expression.¹

- ‘not <path>’ is a simple predicate filter, which means that the element or attribute described by <path> does not exist.

3.2. Compound predicate filters

If [B1] and [B2] are allowed predicate filters, then ‘[B1] [B2]’, ‘[B1 and B2]’, ‘[B1 or B2]’ and ‘[not (B1)]’ are also allowed predicate filters. In our subset of allowed predicate filters, ‘[B1] [B2]’ and ‘[B1 and B2]’ are equivalent, because we excluded the sibling-axes, i.e., we do not consider the order of sibling nodes.

1.4 Basic definitions and problem description

We use an XPath expression XP = // E1 / E2 // E3 // E4 / E5 in order to explain some basic terms that we use throughout the rest of the paper. The nodes selected by XP (in this case elements with the node name E5) can be reached from the root node by a path (that must contain at least the nodes ‘root’, E1, E2, E3, E4, and E5). All these paths will be called paths selected by XP or selected paths for short.

Since every path to elements selected by XP contains an element E1 which is directly followed by an element E2, we call E1/E2 an element sequence. A single element (like E3) which does not require another element to directly follow it will also be called an element sequence (consisting only of this single element). So, the element sequences in this example are: E1/E2, E3, and E4/E5.
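The decomposition into element sequences can be sketched in a few lines of Python. This is only an illustration for filter-free expressions built from child-axis and descendant-or-self steps; the function name element_sequences is an assumption of this sketch and does not appear in the paper.

```python
import re


def element_sequences(xpath: str) -> list[str]:
    """Split a filter-free XPath into its element sequences.

    Assumption of this sketch: the expression uses only the abbreviated
    steps '/' (child) and '//' (descendant-or-self) and has no filters.
    """
    compact = re.sub(r"\s+", "", xpath)
    # '//' separates element sequences; '/' joins elements inside a sequence.
    parts = (part.strip("/") for part in compact.split("//"))
    return [part for part in parts if part]


print(element_sequences("// E1 / E2 // E3 // E4 / E5"))
```

For the expression XP used above, the sketch yields the three sequences E1/E2, E3 and E4/E5.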

The input of our tester consists of three parts: a DTD and two XPath expressions (XP1 and XP2) which are used for a query and a previous query result respectively. The goal is to prove, based on the DTD only, that the first XPath expression selects a subset of the node set selected by the second XPath expression. This should be done as efficiently as possible and without access to the actual XML database, because the subset test is completely executed on the client-side.

Throughout this paper, we use the term XP1 is subsumed by XP2 as a short notation for “XP1 selects a subset of the node set selected by XP2 in all XML documents which are valid according to the given DTD”. Given this definition, we can also state the subsumption test as follows: XP1 is subsumed by XP2, if and only if a path to an element which is selected by XP1 and is not selected by XP2 can never exist in any valid XML document.

¹ Note that we can also allow for more comparisons, e.g. comparisons which contain the operators ‘<’, ‘>’, ‘≤’ or ‘≥’, and comparisons like <path> <comparison operator> <path>. In such a case the predicate filter tester which is used as part of our tester in Section 3.5 would have to check more complex formulas.

Section 2 describes the preparation steps for the subsumption test, i.e., how we transform the DTD into a so called DTD graph which was introduced in [1], and how we use this DTD graph in order to normalize the XPath expressions. Section 3 outlines our two major algorithms, i.e. how we compute a so called XP1 graph which describes all paths selected by XP1, and how we try to place XP2 sequences on each path of the XP1 graph in such a way that the filter of each XP2 sequence subsumes an XP1 filter.

2. DTD graph construction and normalization of XPath expressions

2.1 Translating the DTD into a DTD graph and a set of associated DTD filters

A DTD graph is a directed graph G=(N,C) where each node E∈N corresponds to an element of the DTD and an edge c ∈ C, c=(E1,E2) from E1 to E2 exists for each element E2 that is used to define the element E1 in the DTD. For example (Example 1), Figure 1 shows a DTD and the corresponding DTD graph:

Figure 1. DTD and corresponding DTD graph of Example 1

The DTD graph can be used to check whether or not at least one path selected by XP1 (or XP2 respectively) exists. If there does not exist any path selected by XP1 (or XP2 respectively), then XP1 (or XP2) selects the empty node set. We consider this to be a special case, i.e., if XP2 selects the empty node set, we try to prove that XP1 also selects the empty node set. For the purpose of the discussion in the following sections, we assume that at least one path for XP1 (and XP2 respectively) exists.
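As a rough illustration of the DTD graph definition, the following Python sketch collects one edge (E1,E2) for every element name E2 that occurs in the content model of E1. The DTD text in EXAMPLE_DTD is a hypothetical stand-in chosen to be consistent with the later examples (a loop between E1 and E2, and E3 below E1); it is not the DTD of Figure 1. The helper name dtd_graph and the regular expressions are likewise assumptions of this sketch, and attribute lists and entities are ignored.

```python
import re

# Hypothetical DTD, NOT the DTD of Figure 1.
EXAMPLE_DTD = """
<!ELEMENT root (E1 | E2)>
<!ELEMENT E1 (E2?, E3?)>
<!ELEMENT E2 (E1)>
<!ELEMENT E3 (#PCDATA)>
"""

ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+(.*?)>", re.S)
NAME = re.compile(r"[A-Za-z_][\w.-]*")
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}


def dtd_graph(dtd_text: str) -> dict[str, set[str]]:
    """Map each element name to the set of element names used to define it."""
    graph: dict[str, set[str]] = {}
    for name, content_model in ELEMENT_DECL.findall(dtd_text):
        children = set(NAME.findall(content_model)) - KEYWORDS
        graph.setdefault(name, set()).update(children)
        for child in children:
            graph.setdefault(child, set())  # every element becomes a node
    return graph


for element, children in sorted(dtd_graph(EXAMPLE_DTD).items()):
    print(element, "->", sorted(children))
```

The graph deliberately drops cardinalities and the distinction between ',' and '|', which is why it can only serve as an upper bound for the possible paths.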

The concept of the DTD graph represents an upper bound for the computation of possible paths for XP1, but it does not yet contain all the concepts of a DTD. In order to distinguish between optional child elements and mandatory elements and in order to distinguish between disjunctions and conjunctions found in the DTD, etc., we can associate additional DTD filters to each element of the DTD (and to each node of the DTD graph respectively). For example (Example 2), a DTD rule

<!ELEMENT E1 ( E2, ( E3+ | E4+ ), E5? )>

is translated into the following DTD filter for the element (or DTD graph node) E1: [ ./E2 and ( ./E3 xor ./E4 ) and unique(E2) and unique(E5) ] .

The relative path ‘./E2’ which occurs in the filter requires every element E1 to have a child node E2, and unique(E2) states that there cannot be more than one child node E2 per node E1. Since E5 is an optional child of E1, its existence is not required.
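To make the meaning of this DTD filter concrete, the following Python sketch evaluates it on the list of child element names of one E1 node. The function name satisfies_e1_filter and the list representation of the children are assumptions of this sketch; like the filter itself, the check ignores the order of the children.

```python
from collections import Counter


def satisfies_e1_filter(children: list[str]) -> bool:
    """Evaluate [./E2 and (./E3 xor ./E4) and unique(E2) and unique(E5)]."""
    count = Counter(children)

    def exists(name: str) -> bool:
        return count[name] > 0

    def unique(name: str) -> bool:
        return count[name] <= 1

    return (
        exists("E2")
        and (exists("E3") != exists("E4"))  # xor
        and unique("E2")
        and unique("E5")
    )


print(satisfies_e1_filter(["E2", "E3", "E3"]))  # E3 may repeat
print(satisfies_e1_filter(["E2", "E3", "E4"]))  # E3 and E4 together are forbidden
```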

A complete tester would have to add the DTD filter for a node to each XPath expression which selects (or passes) this node. These DTD filters can be added as predicate filters to both XPath expressions whenever the node is passed. These filters can also be used to eliminate so called forbidden paths, i.e. paths which are contained in the DTD graph, but which are not allowed when filters or DTD filters are also taken into account. Given the rule of Example 2, all paths which require the existence of both children, E3 and E4, (e.g. //E1[./E3]/E4) are forbidden paths which can be discarded.

However, an incomplete tester may ignore some or even all of these DTD constraints for the following reason. The number of valid documents can only be increased and can never be decreased by ignoring a DTD constraint. When XP1 is subsumed by XP2 within this relaxed DTD (represented by the DTD graph) which allows for even more paths, then XP1 is also subsumed by XP2 under the more restrictive DTD which was originally given. Therefore, a successful proof of the subsumption of XP1 by XP2 within the relaxed DTD is sufficient for the reuse of a previous query result.

2.2 Element distance formulas

A further preparation step involves the use of the DTD graph in order to compute all possible distances for each pair of elements and to store these distances in a distance table, called the DTD distance table [1]. We use the distances for the right-shuffling of predicate filters in Section 3. For example, the distance from E1 to E2 in the DTD graph of Example 1 is any positive odd number of child-axis location steps, i.e., the distance table contains an entry “2*x+1 (x≥0)” which describes the set of all possible distances from E1 to E2. The distance table entry “2*x+1 (x≥0)” represents a loop of 2 elements (E1 and E2) in the DTD graph with a minimum distance of 1. Whenever two elements (E1,E2) occur in the DTD graph in a single loop which contains ‘c’ elements and the shortest path from E1 to E2 has the length k, then the DTD distance entry for the distance from E1 to E2 is ‘c*x+k (x≥0)’. Since alternative paths or multiple loops which connect one element to another may exist, the general form of an entry of the DTD distance table is

∑1≤i≤n ai*xi + k (xi≥0) or ... or ∑1≤j≤m bj*yj + p (yj≥0),

where ai,bj,k,p are natural numbers or 0.

Each name of a variable – x in the case of the previous example – is connected uniquely to one circle in the DTD graph and is called the circle variable of this circle.
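A distance entry such as “2*x+1 (x≥0)” can be cross-checked by enumerating, up to some bound, the lengths of all walks between two nodes of the DTD graph. The Python sketch below does this for a hypothetical graph in which E1 and E2 form a loop of two elements; the graph, the name walk_lengths and the bound are assumptions of this sketch, and the enumeration does not derive the closed formula itself.

```python
# Hypothetical DTD graph (not Figure 1): E1 and E2 form a loop of 2 elements.
GRAPH = {
    "root": {"E1", "E2"},
    "E1": {"E2", "E3"},
    "E2": {"E1"},
    "E3": set(),
}


def walk_lengths(graph, source, target, max_len):
    """Return all n in 1..max_len such that target is reachable from
    source by exactly n child-axis steps."""
    lengths = set()
    frontier = {source}
    for length in range(1, max_len + 1):
        # Advance every node of the frontier by one child-axis step.
        frontier = {child for node in frontier for child in graph.get(node, ())}
        if target in frontier:
            lengths.add(length)
    return lengths


print(sorted(walk_lengths(GRAPH, "E1", "E2", 9)))
```

Under these assumptions the enumerated distances from E1 to E2 are exactly the odd numbers up to the bound, which agrees with the entry 2*x+1 (c=2, k=1).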

2.3 Transformation, normalization and simplification of XPath queries

Before our two main algorithms start, we transform both XPath expressions into an equivalent normalized form by the following transformation steps. Firstly, relative XPath expressions are transformed into equivalent absolute XPath expressions. Secondly, if the XPath expression does not start with a child-axis location step, we insert ‘/root’ at the beginning of the XPath expression, i.e., in front of the first location step of the XPath expression. ‘root’ corresponds to the root-node of the DTD, so that after this normalization all XPath expressions start with a child-axis location step ‘/root’.

If the XPath expression contains one or more parent-axis location steps or ancestor-axis location steps, these location steps are replaced from left to right according to the following rules. Let LS1,…,LSn be location steps which use neither the parent-axis nor the ancestor-axis, and let XPtail be an arbitrary sequence of location steps. Then we replace /LS1/…/LSn/child::E[F]/../XPtail with /LS1/…/LSn[./E[F]]/XPtail .

Similarly, in order to replace the first parent-axis location step in the XPath expression /LS1/…/LSn/descendant::E[F]/../XPtail, we use the DTD graph in order to compute all parents P1,…,Pm of E which can be reached after LSn has been performed, and we replace the XPath expression with /LS1/…/LSn//(P1|…|Pm)[./E[F]]/XPtail .

In order to substitute an ancestor location step ancestor::E[F] in an XPath expression /LS1/…/LSn/ancestor::E[F]/XPtail, we use the DTD graph in order to compute all possible positions where E may occur between the ‘root’ and the element selected by LSn. Depending on the DTD graph, there may be more than one position, i.e. we replace the given XPath expression with

( //E[F][/LS1/.../LSn] / XPtail ) | ( /LS1//E[F][/LS2/.../LSn]/XPtail ) | ... | ( /LS1/.../LSn-1/E[F][/LSn]/XPtail ) .

Similar rules can be applied in order to eliminate the ancestor-or-self-axis, the self-axis, and the descendant-axis, such that we finally have only child-axis and descendant-or-self-axis location steps (and additional filters) within our XPath expressions.
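The simplest of these rewriting rules, the replacement of /LS1/…/LSn/child::E[F]/../XPtail with /LS1/…/LSn[./E[F]]/XPtail, can be sketched in Python on a list of location steps. The Step class, the two function names and the string form of the filters are assumptions of this sketch; the descendant and ancestor cases, which need the DTD graph, are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    axis: str                      # "child" or "parent" in this sketch
    name: str                      # node test; ".." for a parent step
    filters: list[str] = field(default_factory=list)


def render(steps: list[Step]) -> str:
    """Print the steps in abbreviated XPath syntax."""
    return "".join(
        "/" + s.name + "".join(f"[{f}]" for f in s.filters) for s in steps
    )


def eliminate_first_parent_step(steps: list[Step]) -> list[Step]:
    """Rewrite .../LSn/child::E[F]/../tail into .../LSn[./E[F]]/tail."""
    for i, step in enumerate(steps):
        if step.axis != "parent":
            continue
        if i < 2 or steps[i - 1].axis != "child":
            raise ValueError("only the child::E[F]/.. pattern is handled here")
        e = steps[i - 1]
        nested = "./" + e.name + "".join(f"[{f}]" for f in e.filters)
        ls_n = steps[i - 2]
        # The filter of the removed steps moves onto LSn.
        merged = Step(ls_n.axis, ls_n.name, ls_n.filters + [nested] + step.filters)
        return steps[: i - 2] + [merged] + steps[i + 1 :]
    return steps  # no parent-axis step present


query = [
    Step("child", "root"),
    Step("child", "E1"),
    Step("child", "E2", ["./@a"]),
    Step("parent", ".."),
    Step("child", "E3"),
]
print(render(query), "=>", render(eliminate_first_parent_step(query)))
```

Applying the function repeatedly until no parent step remains corresponds to the left-to-right replacement described above, as long as every parent step follows a child-axis step.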

Finally, nested filter expressions are eliminated, e.g. a filter [./E1[./@a and not (@b="3") ] ] is replaced with a filter [ ./E1 and (./E1/@a and not ./E1/@b="3") ] . More generally, a nested filter [./E1[F1]] is replaced with a filter [./E1 and F1’] where the filter expression F1’ is equal to F1 except that it adds the prefix ./E1 to each location path in F1 which is defined relative to E1. This approach to the unnesting of filter expressions can be extended to the other axes and to sequences of location steps, such that after these unnesting steps, we do not have any nested filters.

3. The two major algorithms of our subsumption test

Firstly, we construct a graph which contains the set of all possible paths for XP1 in any valid XML document according to the given DTD. Then, XP1 is subsumed by XP2, if and only if for all paths for XP1 which are allowed by the DTD the following holds: the path for XP1 contains all sequences of XP2 in the correct order, and for each XP2 sequence which has a filter there exists a corresponding XP1 node with a filter which is at least as restrictive as the filter attached to the XP2 sequence. In other words, if one path selected by XP1 which does not contain all sequences of XP2 in the correct order is found, then XP1 is not subsumed by XP2.

3.1 Extending the DTD graph to a graph for paths selected by XP1 (Algorithm 1)

In order to represent the set of paths selected by XP1, we use a graph which we will call the (rolled out) XP1 graph in the remainder of the paper [1]. Each path selected by XP1 corresponds to one path from the root node of the XP1 graph to the node(s) in the XP1 graph that represent the selected node(s). The XP1 graph contains a superset of all paths selected by XP1, because some paths contained in the XP1 graph may be forbidden paths, i.e. paths that have predicate filters which are incompatible with DTD constraints and/or the selected path itself (cf. Section 2.1). We use the (rolled out) XP1 graph in order to check whether or not it contains all the sequences of XP2, and if so, we are then sure that all of the paths selected by XP1 contain all the sequences of XP2.

Example 3: Consider the DTD graph of Example 1 and an XPath expression ‘XP1new=/root/E2/E1/E2//E3’, which requires that all XP1 paths start with the element sequence /root/E2/E1/E2 and end with the element E3. The rolled out XP1 graph for the XPath expression XP1new is shown in Fig. 2, where node labels visualize the nodes of the paths selected by XP1 and the edge labels visualize the possible distances between two nodes.

Fig. 2. XP1 graph of Example 3

The (rolled out) XP1 graph depends on the DTD graph and on the given XPath expression. For example, the XP1 graph for the DTD graph given in Section 2.1 and the XPath expression XP1 = //E3 is identical to the DTD graph given in Example 1 (as long as we ignore the distance labels attached to the edges). The following algorithm computes the XP1 graph from a given DTD graph and an XPath expression XP1:

Graph GetXP1Graph(Graph DTD, XPath XP1)
{
    // start the XP1 graph at the root node of the DTD graph
    Graph XP1Graph = new Graph( DTD.GetRoot() );
    Node lastGoal = DTD.GetRoot();
    while (not XP1.IsEmpty()) {
        Node goalElement = XP1.RemoveFirstElement();
        if (XP1.LocationStepBefore(goalElement) == ‘/’)
            // child-axis step: append a single node
            XP1Graph.Append( Node(goalElement) );
        else
            // descendant step: insert the reduced DTD graph
            XP1Graph.Extend(
                DTD.ComputeReducedDTD(lastGoal, goalElement));
        lastGoal = goalElement;
    }
    return XP1Graph;
}

The algorithm generates a sequence of XP1 graph nodes for each element sequence of XP1. Furthermore, for each descendant-axis step E1//E2 which occurs in XP1, the algorithm inserts a subgraph of the DTD graph, called the reduced DTD graph for paths from E1 to E2. This reduced DTD graph represents all paths from E1 to E2 (which are allowed according to the DTD graph) between the nodes for E1 and E2. The method ComputeReducedDTD(lastGoal, goalElement) returns such a subgraph of the DTD graph which only contains edges which are part of a path from lastGoal to goalElement.
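One possible way to compute such a reduced DTD graph is sketched below in Python: an edge (p,c) lies on some path from lastGoal to goalElement exactly if p is reachable from lastGoal and goalElement is reachable from c. This is an assumption about how the method could be realized, not the implementation used in the paper; the graph is the same hypothetical one as in the earlier sketches, and the names reachable and compute_reduced_dtd are illustrative.

```python
# Hypothetical DTD graph (not Figure 1).
GRAPH = {
    "root": {"E1", "E2"},
    "E1": {"E2", "E3"},
    "E2": {"E1"},
    "E3": set(),
}


def reachable(graph, start):
    """Nodes reachable from start by zero or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def compute_reduced_dtd(graph, last_goal, goal_element):
    """Edges of graph that are part of some path from last_goal to goal_element."""
    forward = reachable(graph, last_goal)
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, set()).add(parent)
    backward = reachable(reverse, goal_element)  # nodes that can reach the goal
    return {
        (parent, child)
        for parent, children in graph.items()
        for child in children
        if parent in forward and child in backward
    }


print(sorted(compute_reduced_dtd(GRAPH, "E2", "E3")))
```

For the step E2//E3 of Example 3, the sketch keeps the loop between E1 and E2 and the edge from E1 to E3, and drops the two edges that leave the root.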

If XP1 ends with //* (or /* respectively), i.e., XP1 is of the form XP1 = XP1’//* (or XP1 = XP1’/*), the XP1 graph is computed for XP1’. Afterwards, one reduced DTD graph that contains all nodes that are successors of the end node of the XP1’ graph (or that contains all nodes that can be reached within one step from the end node respectively) is appended to the end node of the XP1’ graph, and all these appended nodes are marked as end nodes of the XP1 graph.

During the execution of Algorithm 1, we compute the distances between adjacent nodes and we attach them as labels to the edges between them. We distinguish two different types of edges: Edges which are generated by a method call XP1Graph.Append(Node(goalElement)) append only a single node and have the label “1”. However, edges which are copied from reduced DTD graphs may contain a distance formula which can be looked up in the DTD distance table.

3.2 XP1 graph sequences

A path in the XP1 graph where each node except the last one and the first one has exactly one outgoing edge will be called an XP1 graph sequence in the remainder of this document.

If one node N has more than one outgoing edge, but all of these edges point to nodes that have exactly the same node label, these nodes are also added to the XP1 graph sequence that ends in N.

For example the XP1 graph sequences of the XP1 graph of Example 3 are root→E2→E1→E2→E1 and E1→E3.

3.3 Combining XP2 predicate filters within each XP2 sequence

Before Algorithm 2 starts, we perform a further normalization step with the XPath expression XP2. Within each sequence of XP2 we shuffle all filters to the rightmost element that carries a filter expression itself, so that after this normalization step all filters within this sequence are attached to one element.

To shuffle a filter one location step to the right means to add one parent-axis location step to the path within the filter expression and to attach it to the next location step. For example, the XPath expression XP2 = // E1[./@b] / E2[./@a] / E3 will be transformed into the equivalent XPath expression XP2’=//E1/E2[../@b and ./@a]/E3.
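The right-shuffling within one XP2 sequence can be sketched in Python as follows. The sketch assumes that a sequence is given as a list of (element, filter paths) pairs and that every filter is a plain existence test on a relative path starting with ‘./’ or ‘../’; compound filters and comparisons are not handled. The names shift_right, combine_filters and render are assumptions of this sketch.

```python
def shift_right(path: str, distance: int) -> str:
    """Prefix a relative filter path with `distance` parent-axis steps."""
    if distance == 0:
        return path
    body = path[2:] if path.startswith("./") else path
    return "../" * distance + body


def combine_filters(sequence):
    """Move all filters of one sequence to the rightmost element that
    already carries a filter."""
    carriers = [i for i, (_, filters) in enumerate(sequence) if filters]
    if not carriers:
        return list(sequence)
    target = carriers[-1]
    combined = [
        shift_right(path, target - i) for i in carriers for path in sequence[i][1]
    ]
    return [
        (name, combined if i == target else [])
        for i, (name, _) in enumerate(sequence)
    ]


def render(sequence) -> str:
    return "//" + "/".join(
        name + (f"[{' and '.join(filters)}]" if filters else "")
        for name, filters in sequence
    )


xp2 = [("E1", ["./@b"]), ("E2", ["./@a"]), ("E3", [])]
print(render(xp2), "=>", render(combine_filters(xp2)))
```

For the example above, the sketch reproduces the expression XP2’ given in the text.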

3.4 Placing one XP2 element sequence with its filters in the XP1 graph

Within Algorithm 2 (Section 3.7), we need a procedure which we call Boolean PlaceFirstSequence(in XP1Graph, inout XP2, inout startNode). It tests for a given XP2 sequence and a given startNode within an XP1 graph whether this sequence can be placed successfully in the XP1 graph beginning at startNode, such that each filter of the XP2 sequence subsumes an XP1 filter (as outlined in Section 3.5).

Because we want to place XP2 element sequences in paths selected by XP1, we define the correspondence of XP2 elements and XP1 graph nodes as follows. An XP1 graph node and a node name which occurs in an XP2 location step correspond to each other, if and only if the node has a label which is equal to the element name of the location step. We say a path (or a node sequence) in the XP1 graph and an element sequence of XP2 correspond to each other, if the n-th node corresponds to the n-th element for all nodes in the XP1 graph node sequence and for all elements in the XP2 element sequence.

The procedure PlaceFirstSequence(…,…,…) checks whether or not each path in the XP1 graph which begins at startNode fulfils the following two conditions: firstly, the path has a prefix which corresponds to the first sequence of XP2 (i.e. the node sequences that correspond to the first element sequence of XP2 cannot be circumvented by any XP1 path); secondly, if the first sequence of XP2 has a filter, then this filter subsumes at least one filter given for each XP1 path.

In general, there may exist more than one path in the XP1 graph which starts at startNode and corresponds to a given XP2 sequence, and therefore there may be more than one XP1 graph node which corresponds to the final node of the XP2 element sequence. The procedure PlaceFirstSequence(…,…,…) internally stores that final node which is nearest to the end node of the XP1 graph (we call it the last final node).² If even one path that begins at startNode is found which does not have a prefix corresponding to the first sequence of XP2 or which does not succeed in the filter implication test described in Section 3.5, then the procedure PlaceFirstSequence(…,…,…) does not change XP2, does not change startNode, and returns false.

If however the XP2 sequence can be placed on all paths and the filter implication test is successful for all paths, then the procedure removes the first sequence from XP2, copies the last final node to the inout parameter startNode and returns true.

3.5 A filter implication test for one XP2 element sequence and one path in the XP1 graph

For this section, let us consider only one XP2 sequence E1/…/En and only one path in the XP1 graph that starts at a given node corresponding to E1.

As described in Section 3.3, the filters within one XP2 sequence are normalized, so that all filters are attached to exactly one element which we will call the current element. When given a startNode and a path of the XP1 graph, the node which corresponds to the current element is called the current node.

Within the first step we shuffle all predicate filters of the XP1 XPath expression which are attached to nodes that are predecessors of the current node into the current node. To shuffle a filter expression from one node into another one means to simply attach (../)^d, i.e. d parent-axis steps, at the beginning of the path expression inside this filter expression, where d is the distance from the first node to the other node. This distance can be calculated by summing up all distances of the paths that have to be passed from the first node to the second one.

² When we place the next XP2 sequence at or ‘behind’ this last final node, we are sure that this current XP2 sequence has been completely placed before the next XP2 sequence, whatever path XP1 will choose.

After shuffling an XP1 filter to the right, an implication test is performed on this right-shuffled XP1 filter [f1] and the XP2 filter [f2] attached to the current node.

XP1 is subsumed by XP2, if and only if [f1] is at least as restrictive as [f2], i.e., f1 ⇒ f2. For example, if the input contains only two filters [f1]=[../@a="5"] and [f2]=[../@a], then [f1] is more restrictive than [f2], because the implication f1 ⇒ f2 holds, but f2 ⇒ f1 does not hold.

Let d1 be a distance formula of a filter of XP1 and d2 be a distance formula of a filter in a location step of XP2, i.e. let XP1 have a filter [f1i]=[(../)^d1 fexp1i] and let XP2 have a filter [f2j]=[(../)^d2 fexp2j]. Both distance formulas d1 and d2 depend on (zero or more) circle variables x1,…,xn of the XP1 graph, and d2 may additionally depend on x’ where (x≥x’≥0) and x is equal to one of the circle variables x1, …, xn. We say a loop loop1=(../)^d1 is subsumed by a loop³ loop2=(../)^d2, if all distances which are selected by loop1 are also selected by loop2. More specifically, loop1 is subsumed by loop2, if for all possible tuples of values for [x1, …, xn] a value x’ can be found so that d1=d2 holds. Whenever a loop1 of a filter [f1i] of XP1 is subsumed by a loop2 of a filter [f2j] of XP2, then XP2 can place its filter [f2j] in such a way that it is applied to the same elements as [f1i].

However, for a counter-example, let XP1 be //E3 [(../)^(2*x+1) @a] (x≥0) and XP2 be //E3 [(../)^(2*x+3) @a] (x≥0), i.e., the loop 2*x+1 is not subsumed by the loop 2*x+3. Then XP1 can choose x=0 (i.e. place its filter [@a] on the parent of E3), but XP2 has to place its filter [@a] on some more distant ancestor element, say Ep. Because XP1 has no filter placed on Ep, XP1 also includes elements Ep which do not have an attribute ‘a’, i.e., XP1 is not subsumed by XP2.
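The loop subsumption condition can be explored with a small bounded check in Python. The sketch below covers only the case of a single circle variable x and reads the condition literally: for every x there must be an x’ with x≥x’≥0 such that d1=d2. It tests finitely many values and therefore illustrates the condition rather than proving it; the name loop_subsumed, the bound and the representation of distance formulas as Python functions are assumptions of this sketch.

```python
def loop_subsumed(d1, d2, bound=50):
    """Bounded check of loop subsumption for one circle variable.

    d1(x) is the XP1 distance formula, d2(x, x_prime) the XP2 distance
    formula. Returns False as soon as some x in 0..bound has no
    x_prime in 0..x with d1(x) == d2(x, x_prime).
    """
    return all(
        any(d1(x) == d2(x, x_prime) for x_prime in range(x + 1))
        for x in range(bound + 1)
    )


# Counter-example from the text: 2*x+1 is not subsumed by 2*x'+3,
# because for x=0 the XP1 distance 1 cannot be matched.
print(loop_subsumed(lambda x: 2 * x + 1, lambda x, xp: 2 * xp + 3))

# Identical loops: x' = x always matches.
print(loop_subsumed(lambda x: 2 * x + 1, lambda x, xp: 2 * xp + 1))
```

A distance without a loop is simply a constant function in this representation, which matches the remark that constant distances are included in the definition.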

Altogether, a filter [f1] of XP1 is subsumed by a filter [f2] of XP2 attached to the same node, if loop1 is subsumed by loop2 and fexp1i ⇒ fexp2j. Since both filter expressions that occur in the implication fexp1i ⇒ fexp2j contain no loops, we can use a predicate tester for XPath expressions (e.g. [1]), which extends a theorem prover for Boolean logic in such a way that it takes the special features of simple XPath expressions into account (e.g. the tester has to consider that [not ./@a="5" and not ./@a!="5"] is equivalent to [not ./@a]).

If the tester returns for one XP2 filter that this filter subsumes the XP1 filter, this XP2 filter is discarded. This is performed repeatedly until either all XP2 filters are discarded or all XP1 filters which are attached to nodes that are predecessors of the current node are shuffled into the current node.

If not all XP2 filters have been discarded by then, in a second step, all these remaining filters are right-shuffled into the next node to which an XP1 filter is attached. It is again tested whether one of the XP2 filters can be discarded because this XP2 filter subsumes the XP1 filter. This is also repeated until either all XP2 filters are discarded (then the filter implication test returns true) or all XP1 filters have been checked and at least one XP2 filter remains that does not subsume any XP1 filter (then the filter implication test returns false).

³ The definition of one loop being subsumed by another also includes paths without a loop, because distances can be a constant value.

We have a special case, if the XP2 sequence consists only of one element and this element corresponds to an XP1 graph node which lies on a circle, or if all elements of the XP2 sequence correspond to XP1 graph nodes which lie on exactly one circle. In comparison to an XP1 filter which is right-shuffled over a circle by adding c*x+k to the filter distance (where c is the number of elements in the circle, k is the shortest distance that the filter can be shuffled, and x is the circle variable which describes how often a particular path follows the circle of the XP1 graph), an XP2 filter as described above is right-shuffled by adding c*x’+k to the filter distance (where c, x, and k are defined as before and x≥x’≥0). While x describes the number of times XP2 has to pass the loop in order to select the same path as XP1, the x’ within (x≥x’≥0) describes the number of times the circle is passed after XP2 has set its filter.

3.6 Including DTD filters into the filter implication test

The DTD filter associated with a node can be used to improve the tester as follows. For each node on a path selected by XP1 (and XP2 respectively) the DTD filter [FDTD] associated with that node must hold. For the DTD given in Example 2, we conclude in Section 2.1 that there cannot exist a node E1 which has both a child node E3 and a child node E4, i.e. (ignoring the other DTD filter constraints for E1) the DTD filter for each occurrence of E1⁴ is [FDTD_E1]=[not (./E3 and ./E4)]. Let further an XP1 graph node E1 have a filter [F1]=[./E3], and let the corresponding element sequence of XP2 consist of only the element E1 with a filter [F2]=[not (./E4)]. We then can conclude that

FDTD_E1 and F1 ⇒ FDTD_E1 and F2 ,

i.e., with the help of the DTD filter, we can prove that the XP1 filter is at least as specific as the XP2 filter. Of course, the implication can be simplified to FDTD_E1 and F1 ⇒ F2 .
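For this small example, the implication can be checked by enumerating all truth assignments of the two existence tests ./E3 and ./E4. The Python sketch below is a brute-force stand-in for the predicate tester mentioned in Section 3.5, not the tester of [1]; the function name implies and the encoding of filters as Python functions over a dictionary of atoms are assumptions of this sketch.

```python
from itertools import product


def implies(premise, conclusion, atoms):
    """True if conclusion holds in every truth assignment satisfying premise."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if premise(assignment) and not conclusion(assignment):
            return False
    return True


ATOMS = ["./E3", "./E4"]


def f_dtd(v):
    return not (v["./E3"] and v["./E4"])  # [FDTD_E1]


def f1(v):
    return v["./E3"]                       # XP1 filter [F1]


def f2(v):
    return not v["./E4"]                   # XP2 filter [F2]


# With the DTD filter, the XP1 filter implies the XP2 filter.
print(implies(lambda v: f_dtd(v) and f1(v), f2, ATOMS))
# Without the DTD filter, the implication cannot be shown.
print(implies(f1, f2, ATOMS))
```

The second call shows why the DTD filter matters here: without it, a node E1 with both children E3 and E4 would be a counter-example.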

More generally, for each node E1 in the XP1 graph which is referred to by an XP2 filter, we can include the DTD filter [FDTD_E1] which is required for all elements E1, and right-shuffle it like an XP1 filter. That is how the filter implication test above and the Algorithm 2 described in the next section can be extended to include DTD filters.

3.7 Placing all XP2 sequences using the XP1 graph (Algorithm 2)

The following algorithm is used on an XPath expression XP2 and the XP1 graph, and it tests whether or not all paths in the XP1 graph contain all sequences of XP2 (in the correct order). If we find at least one path from the root node to the selected node in the XP1 graph which does not contain all the element sequences of XP2 in the correct order (and this path is not a forbidden path), then XP1 is not subsumed by XP2.

4 Note that the DTD filter has to be applied to each occurrence of a node E1, in comparison to an XP1 filter or an XP2 filter assigned to E1, both of which have to be applied to a single occurrence of E1 only.

The algorithm stated below first (line (3)) deals with the special case that XP2 consists of only one sequence, i.e. contains no descendant-axis location step. In this case, XP1 is subsumed by XP2 if the XP1 graph consists of only one path and this path corresponds to the sequence.

The main part of Algorithm 2 (starting at line (4)) solves the case that XP2 consists of more than one element sequence. A special treatment of the first sequence of XP2 (lines (6)-(8)) is required for the following reason. Because the first sequence of XP2 has to be placed in such a way that it starts at the root node and (if XP1 is subsumed by XP2) all paths selected by XP1 must start with this sequence, a prefix of each path which starts at the root node of the XP1 graph must correspond to the first element sequence of XP2.

This test is performed (at line (7)) by a call of the procedure BOOLEAN PLACEFIRSTSEQUENCE(…,…,…) which computes a new startNode and either returns true (i.e. the sequence has been placed successfully) or false (i.e. no place for the sequence has been found).

The middle sequences are placed by the middle part of the algorithm (lines (9)-(14)). In order to find the next possible startNode in the XP1 graph which corresponds to the first element of the currently first sequence of XP2, our algorithm calls a function SEARCH(in XP1Graph, in XP2, in startNode). As the next XP2 sequence must be contained in each path of the XP1 graph, the new startNode only has to be searched within an XP1 graph sequence. If such a next startNode does not exist, the function SEARCH(…,…,…) returns null.

Whenever the call of SEARCH(…,…,…) returns a startNode (which is different from null), this startNode is the candidate for placing the next XP2 sequence. That is why we call the procedure PLACEFIRSTSEQUENCE(…,…,…) again (line (12)). If this procedure call returns false, the current startNode is not a successful candidate for placing the sequence, and a call of NEXTNODE(…,…,…) looks for the next node that is common to all paths in the XP1 graph, as this is the first possible position of the new place for the XP2 sequence.
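The notion of a node that is common to all paths can be made concrete with a reachability test: a node lies on every path from a source to a target exactly when removing it disconnects the two. The following is a minimal sketch under that observation; the text above does not specify how NEXTNODE is implemented, so the adjacency-dictionary representation, the function names, and the example graph are assumptions for illustration only.

```python
def reachable(graph, source, target, removed=None):
    """Depth-first search from source to target that never enters `removed`."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for successor in graph.get(node, ()):
            if successor != removed and successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return False


def common_nodes(graph, source, target):
    """Nodes that lie on every path from source to target.
    Assumes that target is reachable from source."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    return {
        v for v in nodes
        if v in (source, target)
        or not reachable(graph, source, target, removed=v)
    }


# Hypothetical graph: two alternative branches b and c between a and d,
# followed by a circle d -> e -> d and an exit edge e -> f.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": ["d", "f"]}
print(sorted(common_nodes(graph, "a", "f")))
```

The test works on graphs with circles as well, because it only asks whether the target remains reachable, not how often a circle is passed.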

This middle part of the algorithm is continued until only one sequence remains in XP2. Similar to the first sequence of XP2, the last XP2 sequence is again a special case. However, this time it has to be ensured that each path which starts at the current startNode must have a suffix which corresponds to the last XP2 element sequence (if XP1 is subsumed by XP2), because the last XP2 sequence has to be placed in such a way that it terminates at the selected element of XP1 (line (15)).

This leads us to the following algorithm which decides (as long as filter implication tests are ignored) in polynomial time whether or not each path selected by XP1 contains all sequences of XP2 in the correct order.

(1)  BOOLEAN PLACEXP2(GRAPH XP1Graph, XPATH XP2)
(2)  { if (XP2.CONTAINSONLYONESEQUENCE())
(3)      return (XP1Graph consists only of one path and this path corresponds to XP2
                 and the XP2 filter subsumes an XP1 filter)
(4)    else // XP2 contains multiple sequences
(5)    { // place the first sequence of XP2:
(6)      startNode := XP1Graph.getROOT();
(7)      if (not PLACEFIRSTSEQUENCE(XP1Graph, XP2, startNode))
(8)        return false;
         // place middle sequences of XP2:
(9)      while (XP2.containsMoreThanOneSequence())
(10)     { startNode := SEARCH(XP1Graph, XP2, startNode);
(11)       if (startNode == null) return false;
(12)       if (not PLACEFIRSTSEQUENCE(XP1Graph, XP2, startNode))
(13)         startNode := NEXTNODE(XP1Graph, XP2, startNode);
(14)     }
         // place last sequence of XP2:
(15)     return (each path from startNode to one endNode
                 has a suffix corresponding to XP2 and
                 the XP2 filter subsumes an XP1 filter)
(16)   }
(17) }
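The placement idea can be tried out on explicitly enumerated paths. The sketch below is a deliberate simplification and not Algorithm 2 itself: it assumes that the paths of the XP1 graph are given as finite lists of element names (circles unrolled a fixed number of times), it ignores all filters, and it treats * as a wildcard inside a sequence instead of handling a trailing /* separately. All function names are hypothetical.

```python
def matches_at(path, seq, start):
    """True if seq occurs in path at index start ('*' matches any element)."""
    if start < 0 or start + len(seq) > len(path):
        return False
    return all(s == "*" or s == p for s, p in zip(seq, path[start:]))


def path_contains(path, seqs):
    """First sequence must be a prefix, last sequence a suffix, and the
    middle sequences must occur in between, in order, without overlap."""
    first, last = seqs[0], seqs[-1]
    if len(seqs) == 1:
        # No descendant axis: the path must correspond to the sequence.
        return len(path) == len(first) and matches_at(path, first, 0)
    if not matches_at(path, first, 0):
        return False
    pos = len(first)
    for seq in seqs[1:-1]:
        # Leftmost placement leaves the most room for later sequences.
        while pos + len(seq) <= len(path) and not matches_at(path, seq, pos):
            pos += 1
        if not matches_at(path, seq, pos):
            return False
        pos += len(seq)
    tail = len(path) - len(last)
    return tail >= pos and matches_at(path, last, tail)


def all_paths_contain(paths, seqs):
    return all(path_contains(path, seqs) for path in paths)


# XP2 sequences of /root//E2/E1//E2//E1/* and two assumed XP1 paths.
xp2_seqs = [["root"], ["E2", "E1"], ["E2"], ["E1", "*"]]
xp1_paths = [
    ["root", "E2", "E1", "E2", "E1", "E3"],
    ["root", "E2", "E1", "E2", "E1", "E2", "E1", "E3"],
]
print(all_paths_contain(xp1_paths, xp2_seqs))
```

Enumerating paths is only feasible for small examples; working on the XP1 graph, as the algorithm above does, avoids unrolling circles.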

If XP2 ends with //*, i.e. XP2 is of the form XP2 = XP2’//*, the algorithm is performed with XP2’ as above until XP2’ contains only one sequence, but the condition in line (15) is changed to (each path from startNode to one endNode contains a path corresponding to XP2 and each XP2 filter subsumes an XP1 filter).

If XP2 ends with /*, i.e. XP2 is of the form XP2 = XP2’/*, the algorithm is performed with XP2’ as above until XP2’ contains only one sequence, but the condition in line (15) is changed to (each path from startNode to a direct predecessor of one endNode has a suffix corresponding to XP2 and each XP2 filter subsumes an XP1 filter).

3.8 An extended example

We complete Section 3 with an extended example that includes all three major steps of Algorithm 2. Consider the DTD of Example 1 and the following XPath expressions

XP1 = / root / E2[./@b] / E1[./@c=7] / E2 // E3[../../@a=5] and

XP2 = // E2 / E1[./../@b] // E2[./@a] // E1 / * (Example 4).

Figure 3. XP1 graph of Example 4

Step 1: The XP1 graph is computed. Figure 3 shows which filters of XP1 are attached to which XP1 graph nodes. In order to be able to explain the algorithm, we have assigned in this example an id to each node of the XP1 graph. The two XP1 sequences are root→E2→E1→E2→E1 and E1→E3.

Step 2: XP2 is transformed into the equivalent XPath expression XP2 = / root // E2 / E1[(../)^1@b] // E2[./@a] // E1 / * .
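To see which element sequences Step 3 has to place, the normalized expression can be split at its descendant-axis steps. The following is a minimal sketch, assuming a normalized expression that starts with a single / and contains no nested brackets inside filters; the function name element_sequences is hypothetical, and the sketch simply drops the filters instead of right-shuffling them.

```python
import re


def element_sequences(xpath):
    """Split a normalized XPath expression into element sequences,
    one per run of child-axis steps between '//' separators."""
    compact = re.sub(r"\s+", "", xpath)               # remove whitespace
    no_filters = re.sub(r"\[[^\]]*\]", "", compact)   # drop [ ... ] filters
    if not no_filters.startswith("/") or no_filters.startswith("//"):
        raise ValueError("expected a normalized expression starting with '/'")
    return [part.split("/") for part in no_filters[1:].split("//")]


xp2 = "/ root // E2 / E1[(../)^1@b] // E2[./@a] // E1 / *"
print(element_sequences(xp2))
```

For the expression of Example 4 this yields the four sequences root, E2/E1, E2, and E1/* that are placed one after another in Step 3.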

Step 3: Algorithm 2 is started. For the first XP2 sequence (i.e. ‘root’), the corresponding node (i.e. the node with id 1) is found. The next sequence of XP2 to be placed is E2/E1[(../)^1@b], and one corresponding path in the XP1 graph is E2→E1, where E2 has id 2 and E1 has id 3. The node with id 3 is our current node. Now the XP1 filter [./@b] of node 2 is right-shuffled into the current node and thereby transformed into [(../)^1@b]. Because this filter is subsumed by the filter of the current XP2 sequence, the filter of the current XP2 sequence is discarded. Since each filter of XP2 is discarded, the sequence is successfully placed, and the startNode is set to be node 3.

The next sequence of XP2 to be placed is E2[./@a]. The first corresponding node is the node with id 4, which is now the current node. The filters of node 2 and node 3 are shuffled into the current node and are transformed into one filter [(../)^2@b and (../)^1@c=7], but this filter is not subsumed by the filter [./@a] of the current XP2 sequence. That is why the filter [./@a] is afterwards shuffled into the next node to which an XP1 filter is attached (i.e. into node 6). Thereby, the XP2 sequence filter [./@a] is transformed into [(../)^(2x’+2)@a] with x≥x’≥0. The distance formula contains the variable x’ because the XP2 sequence contains only one element, which corresponds to an XP1 graph node that is part of a circle. As x’ can be chosen to be 0 for each value x≥0, the filter attached to node 6 (which is equivalent to [(../)^2@a=5]) is subsumed by the filter of the current XP2 sequence. Altogether, the current XP2 element sequence is successfully placed, and the startNode is set to be node 4. As now only the sequence E1/* remains in XP2, which ends with /*, it is tested whether or not each path from the startNode (i.e. node 4) to the predecessor of the endNode (i.e. node 5) has a suffix which corresponds to E1, which is true in this case.

Therefore the result of the complete test is true, i.e., XP1 is subsumed by XP2. XP1 therefore selects a subset of the data selected by XP2.

4. Summary and Conclusions

We have developed a tester that checks whether or not a new XPath query XP1 is subsumed by a query XP2. Before we apply the two main algorithms of our tester, we normalize the XPath expressions XP1 and XP2 in such a way that we thereafter only have to consider child-axis and descendant-or-self-axis location steps. Furthermore, nested filters are unnested, and thereafter within each XP2 element sequence filters are right-shuffled into the right-most location step of this element sequence which contains a filter.

Different from other contributions to the XPath containment problem, we transform the DTD into a DTD graph and DTD filters, and we use this DTD graph in order to compute the so-called XP1 graph, i.e., a graph which contains all paths selected by XP1. This allows us to split the subsumption test into two parts: a placement test for XP2 element sequences in the XP1 graph and an implication test for filter expressions.

Our implication test on predicate filters tries to find an equivalent or more specific filter of XP1 (including the DTD filters) for each filter of XP2, whereas two predicate filters can be compared by independently examining the filter distance formulas and the filter expressions of these predicate filters. Finally, the choice of the implication tester for filter expressions is independent of our other ideas, i.e., we can choose a powerful but complex tester in order to test a larger subset of XPath expressions or we can choose a fast predicate tester, which is either incomplete or covers only a smaller subset of XPath.

The results presented here do not seem to be limited to DTDs, i.e., we consider the transfer of the presented results to XML Schema to be a challenging research topic.

References

[1] S. Böttcher, R. Steinmetz: Testing Containment of XPath Expressions in order to Reduce the Data Transfer to Mobile Clients. ADBIS 2003.

[2] S. Böttcher, A. Türling: XML Fragment Caching for Small Mobile Internet Devices. 2nd International Workshop on Web-Databases, Erfurt, October 2002. Springer, LNCS 2593, Heidelberg, 2003.

[3] S. Böttcher, A. Türling: Transaction Validation for XML Documents based on XPath. In: Mobile Databases and Information Systems. Workshop der GI-Jahrestagung, Dortmund, September 2002. Springer, Heidelberg, LNI-Proceedings P-19, 2002.

[4] Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi: View-Based Query Answering and Query Containment over Semistructured Data. DBPL 2001: 40-61.

[5] Alin Deutsch, Val Tannen: Containment and Integrity Constraints for XPath. KRDB 2001.

[6] Yanlei Diao, Michael J. Franklin: High-Performance XML Filtering: An Overview of YFilter. IEEE Data Engineering Bulletin, March 2003.

[7] Daniela Florescu, Alon Y. Levy, Dan Suciu: Query Containment for Conjunctive Queries with Regular Expressions. PODS 1998: 139-148.

[8] Gerome Miklau, Dan Suciu: Containment and Equivalence for an XPath Fragment. PODS 2002: 65-76.

[9] Frank Neven, Thomas Schwentick: XPath Containment in the Presence of Disjunction, DTDs, and Variables. ICDT 2003: 315-329.

[10] Peter T. Wood: Containment for XPath Fragments under DTD Constraints. ICDT 2003: 300-314.
