With the preva­lence of XML as the markup language for platform-in­de­pen­dent data exchanges, there is an in­creas­ing need for a standard that enables non-XML-based ap­pli­ca­tions to submit complex queries to XML documents.

Note

The Extensible Markup Language (XML for short) is a markup language for representing hierarchically structured data in text form. XML is equally easy for humans and machines to read. One of its uses is the exchange of data between computer systems on the World Wide Web.

The W3 Consortium developed the relevant standards for program-controlled access to XML documents, among them XQuery and XSLT. These provide interfaces with which applications can access XML documents, query their content or transform them. They rely on a standard that enables elements in XML documents to be addressed: the XPath path description language.

We’ll get you started with the XPath Data Model (XDM) and introduce you to the syntax that underlies the XPath expressions used to locate XML elements.

What is XPath?

XML Path Language (XPath) is a path de­scrip­tion language for XML documents developed by the W3 Con­sor­tium. XPath provides users with non-XML-based syntax that makes it possible to specif­i­cal­ly address the elements of an XML document.

XPath is normally embedded in a host language that processes the addressed XML elements. XQuery, for example, queries the XML elements addressed by XPath. XSLT uses the path language when transforming XML documents.

  • XPath: Nav­i­ga­tion in XML documents
  • XQuery: Queries for XML documents
  • XSLT: Trans­for­ma­tion of XML documents

XPath 3.1, the current version, is specified in the W3C recommendation of March 21, 2017.

Note

Despite ongoing de­vel­op­ment, numerous XSLT proces­sors, web browsers and ap­pli­ca­tions still only support the standard XPath 1.0 from the year 1999.

How Does XPath Work?

XPath is based on a data model that interprets an XML document as a tree of nodes. The tree structure of the XPath data model is comparable to the Document Object Model (DOM), which serves as the interface between HTML and dynamic JavaScript in the web browser.

XML elements are located by means of paths, modeled on the paths of the Unix directory system. The basic building blocks of such a localization path are nodes, axes, node tests and predicates.

Node Types

The individual elements of an XPath tree structure are referred to as nodes. The nodes are ordered both by their sequence in the document (document order) and by the nesting of the XML elements.

The XPath data model dis­tin­guish­es seven node types with different functions:

  • Element node
  • Document node (called root node before XPath 2.0)
  • Attribute node
  • Text node
  • Namespace node
  • Pro­cess­ing in­struc­tion node
  • Comment node

The following example il­lus­trates the XPath data model node types. The XML document below, used to exchange data for a book order, contains all seven node types.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE order SYSTEM "order.dtd">
<?xml-stylesheet type="text/css" href="style.css"?>
<!--This is a comment!-->
<order date="2019-02-01">
    <address xmlns:shipping="http://localhost/XML/delivery" xmlns:billing="http://localhost/XML/billing">
        <shipping:name>Ellen Adams</shipping:name>
        <shipping:street>123 Maple Street</shipping:street>
        <shipping:city>Mill Valley</shipping:city>
        <shipping:state>CA</shipping:state>
        <shipping:zip>10999</shipping:zip>
        <shipping:country>USA</shipping:country>
        <billing:name>Mary Adams</billing:name>
        <billing:street>8 Oak Avenue</billing:street>
        <billing:city>Old Town</billing:city>
        <billing:state>PA</billing:state>
        <billing:zip>95819</billing:zip>
        <billing:country>USA</billing:country>
    </address>
    <comment>Please use gift wrapping!</comment>
    <items>
        <book isbn="9781408845660">
            <title>Harry Potter and the Prisoner of Azkaban</title>
            <quantity>1</quantity>
            <priceus>22.94</priceus>
            <comment>Please confirm delivery date before Christmas.</comment>
        </book>
        <book isbn="9780544003415">
            <title>The Lord of the Rings</title>
            <quantity>1</quantity>
            <priceus>17.74</priceus>
        </book>
    </items>
</order>

Element Node

In the tree structure of the XPath data model, each element of an XML document corresponds to an element node. The XML declaration and the document type definition at the beginning of the document are exceptions: neither is an element.

XML de­c­la­ra­tion:

<?xml version="1.0" encoding="utf-8"?>

Document Type De­f­i­n­i­tion (DTD):

<!DOCTYPE order SYSTEM "order.dtd">

Element nodes begin with a start tag, finish with an end tag and are usually nested into each other.

The first element node in document order is referred to as the root element.

The XML document shown above, for example, contains the element node order as its root element. It is the parent element of the subordinate element nodes address, comment and items, some of which in turn contain additional element nodes as child elements.

Document Node

The root of the tree structure is referred to as the document node. In the XML document itself, it is neither displayed visually nor represented by text. It is a conceptual node that contains all other nodes of the document. The children of the document node are the root element and, where applicable, processing instruction nodes and comment nodes.
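
The children of the document node can be made visible with a few lines of Python. The sketch below uses xml.dom.minidom from the standard library, which implements the DOM rather than the XPath data model, so it is an approximation of the idea. The document is a shortened version of the order example (the DOCTYPE and most of the content are left out here).

```python
from xml.dom import Node, minidom

# Shortened order document: one processing instruction, one comment, one root element.
XML = (
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<!--This is a comment!-->"
    '<order date="2019-02-01"><comment>Please use gift wrapping!</comment></order>'
)

NODE_KINDS = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
}

doc = minidom.parseString(XML)

# The document node has no markup of its own; its children are the top-level nodes.
top_level = [(NODE_KINDS[n.nodeType], n.nodeName) for n in doc.childNodes]
for kind, name in top_level:
    print(f"{kind}: {name}")
```

The root element order appears as just one of three children of the document node, next to the processing instruction and the comment.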

Attribute Node

The at­trib­ut­es of an XML element are rep­re­sent­ed in the XPath data model as attribute nodes. Each attribute node consists of an iden­ti­fi­er and a value assigned to the attribute.

In the code example, the first element node with the name book has the attribute node isbn with the value 9781408845660.

<book isbn="9781408845660">

An attribute node belongs to its element node, but it does not count as a child of that element.

Text Node

Character data between the start and end tags of an element node are referred to as text nodes.

In the code example, the element node title contains the text node Harry Potter and the Prisoner of Azkaban.

<title>Harry Potter and the Prisoner of Azkaban</title>
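
How attributes and text differ from child elements can be tried out with Python's xml.etree.ElementTree. This is only a sketch: ElementTree does not model attribute and text nodes as separate node objects the way the XPath data model does, but it keeps them apart from the child elements in a similar way.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book isbn="9781408845660">'
    "<title>Harry Potter and the Prisoner of Azkaban</title>"
    "<quantity>1</quantity>"
    "</book>"
)

# Attributes are stored in a separate mapping on the element ...
isbn = book.get("isbn")

# ... and do not show up when iterating over the element's children.
child_names = [child.tag for child in book]

# The character data of the title element is exposed as .text.
title_text = book.find("title").text

print(isbn, child_names, title_text)
```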

Namespace Node

Element and attribute names in an XML document can be assigned to a namespace. The assignment is made with a namespace declaration, which often sits in the start tag of the root element.

If different namespaces are used within an XML document, each one is declared explicitly with the xmlns attribute or with an xmlns prefix in the start tag of the element in question. The xmlns attribute expects a Uniform Resource Identifier (URI) as its value, which identifies the namespace to be assigned. A namespace bound to an xmlns prefix can be used for the declaring element and for its child elements. Each namespace corresponds to a namespace node in the tree structure.

In the code example, two namespaces were declared for the XML element address with the attributes xmlns:shipping and xmlns:billing. The child elements of the address element carry the respective prefix.

<address xmlns:shipping="http://localhost/XML/delivery" xmlns:billing="http://localhost/XML/billing">
        <shipping:name>Ellen Adams</shipping:name>
        <shipping:street>123 Maple Street</shipping:street>
        <shipping:city>Mill Valley</shipping:city>
        <shipping:state>CA</shipping:state>
        <shipping:zip>10999</shipping:zip>
        <shipping:country>USA</shipping:country>
        <billing:name>Mary Adams</billing:name>
        <billing:street>8 Oak Avenue</billing:street>
        <billing:city>Old Town</billing:city>
        <billing:state>PA</billing:state>
        <billing:zip>95819</billing:zip>
        <billing:country>USA</billing:country>
    </address>

The xmlns prefix makes it possible to tell apart elements of the same name from different namespaces. The element street with the prefix shipping, for example, contains the street of the delivery address. The element street with the prefix billing, in contrast, contains the street of the billing address.
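
The following sketch shows how the two street elements can be told apart in a query, using Python's xml.etree.ElementTree. The prefixes ship and bill in the query mapping are chosen freely for this illustration; only the URIs have to match the ones declared in the document.

```python
import xml.etree.ElementTree as ET

address = ET.fromstring(
    '<address xmlns:shipping="http://localhost/XML/delivery"'
    ' xmlns:billing="http://localhost/XML/billing">'
    "<shipping:street>123 Maple Street</shipping:street>"
    "<billing:street>8 Oak Avenue</billing:street>"
    "</address>"
)

# Prefix-to-URI mapping for the query (prefix names are illustrative).
ns = {
    "ship": "http://localhost/XML/delivery",
    "bill": "http://localhost/XML/billing",
}

shipping_street = address.find("ship:street", ns).text
billing_street = address.find("bill:street", ns).text

# Internally, ElementTree stores the namespace URI as part of the element name.
qualified_name = address[0].tag

print(shipping_street, "|", billing_street, "|", qualified_name)
```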

Pro­cess­ing In­struc­tion Node

Processing instructions, which often sit outside the root element of an XML document, are referred to in XPath terminology as processing instruction nodes. A processing instruction begins with <? and ends with ?>.

In the code example presented above you find the following pro­cess­ing in­struc­tion:

<?xml-stylesheet type="text/css" href="style.css"?>

The XML declaration at the beginning of the XML file has the same syntactic form as a processing instruction. However, it does not count as a processing instruction node in the XPath data model.

Comment Node

XML document content marked as a comment will be processed by XPath as a comment node. In this situation, the node comprises only the marked character content, not the markup.

In the code example presented above, you find the following comment node:

This is a comment!

Lo­cal­iza­tion Path

Nodes are addressed with the help of a localization path (called a location path in the W3C specification). A localization path is an XPath expression that navigates through the tree structure and selects a desired node set. The node set is the result of the XPath expression.

Localization paths are evaluated from left to right. A distinction is made between absolute and relative localization paths. An absolute localization path begins at the document node; in this case, the XPath expression starts with a slash (/). A relative localization path begins at an arbitrary node within the tree structure. This starting point is called the context node.

A lo­cal­iza­tion path consists of in­di­vid­ual lo­cal­iza­tion steps that, as is the case when ad­dress­ing files in the directory system, are separated by a slash (/).
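
The role of the context node can be illustrated with a short Python sketch. It uses xml.etree.ElementTree, which supports only a limited subset of XPath and only relative paths on elements. The same one-step relative path is evaluated twice from different context nodes and yields different results. The document is a shortened version of the order example.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order>"
    "<comment>Please use gift wrapping!</comment>"
    "<items><book>"
    "<title>Harry Potter and the Prisoner of Azkaban</title>"
    "<comment>Please confirm delivery date before Christmas.</comment>"
    "</book></items>"
    "</order>"
)

relative_path = "comment"  # a single step, child::comment in full notation

# Context node: the root element order.
from_order = [c.text for c in order.findall(relative_path)]

# Context node: the book element, reached with a relative path of two steps.
book = order.find("items/book")
from_book = [c.text for c in book.findall(relative_path)]

print(from_order)
print(from_book)
```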

Each lo­cal­iza­tion step consists of up to three parts: the axis, the node test and an arbitrary number of pred­i­cates.

  • Axis: When choosing the axis, you determine the nav­i­ga­tion direction in the tree structure starting from the context or document node.
  • Node test: The node test corresponds to a filter with which you limit the nodes lying on the axis to the desired node set.
  • Pred­i­cates: Pred­i­cates enable you to again filter the nodes selected through the axis and node test.

The lo­cal­iza­tion path for an XPath ex­pres­sion is notated in ac­cor­dance with the following syntax:

axis::nodetest[predicate1][predicate2]…

Notation | Function
/ | Separator between two localization steps
:: | Separator between axis and node test

Axes

The XPath syntax enables navigation along the following axes.

Axis | Selected Nodes
child | All direct child nodes
parent | The direct parent node
descendant | All subordinate nodes
ancestor* | All superordinate nodes
following | All subsequent nodes in document order, with the exception of descendants
preceding* | All preceding nodes in document order, with the exception of ancestors
following-sibling | All subsequent nodes in the XML document that have the same parent node
preceding-sibling* | All preceding nodes in the XML document that have the same parent node
attribute | All attribute nodes of an element node
namespace | All namespace nodes of an element node. This axis is deprecated as of version 2.0
self | The context node itself
descendant-or-self | All subordinate nodes including the context node
ancestor-or-self* | All superordinate nodes including the context node

Note

The axes marked with an asterisk (*) are reverse axes, which navigate backward through the document. Not every tool supports every axis, so check which axes your XPath implementation provides before relying on them.

The following graph shows a schematic rep­re­sen­ta­tion of the most important axes in the XPath data model starting from the context node (red).

For example, the child:: axis selects all child elements of the context node D. The node set comprises the nodes E, H and I.
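
To make the axes more tangible, the following Python sketch reimplements a few of them by hand on top of xml.etree.ElementTree. It is an illustration under simplifying assumptions: only element nodes are considered, the helper names are made up for this example, and the document is a skeleton of the order example without attributes and text.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order><address/><comment/>"
    "<items><book><title/></book><book><title/></book></items>"
    "</order>"
)

# ElementTree keeps no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in order.iter() for child in parent}

def child_axis(node):
    return list(node)

def descendant_axis(node):
    return [n for n in node.iter() if n is not node]

def ancestor_axis(node):
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found  # nearest ancestor first

def sibling_axes(node):
    """Return (preceding siblings, following siblings), both in document order."""
    if node not in parent_of:
        return [], []
    siblings = list(parent_of[node])
    i = next(k for k, s in enumerate(siblings) if s is node)
    return siblings[:i], siblings[i + 1:]

def tags(nodes):
    return [n.tag for n in nodes]

items = order.find("items")
first_title = order.find("items/book/title")

print(tags(child_axis(items)))
print(tags(descendant_axis(items)))
print(tags(ancestor_axis(first_title)))
print([tags(part) for part in sibling_axes(items)])
```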

Node Test

With the node test you define a filter for the node set selected via the axis. According to the XPath spec­i­fi­ca­tion there are two possible filter criteria.

  • Node name: Specify a node name as a node test in order to choose all nodes with the cor­re­spond­ing name on the chosen axis.
  • Node type: Specify a node type as a node test in order to choose all nodes on the chosen axis with the cor­re­spond­ing type.

Node Names as a Filter Criterion

With the following lo­cal­iza­tion path, for example, you could choose—based on the code example presented above—all de­scen­dants with the name book starting from the document node.

/descendant::book

If, however, you would like to select the isbn attribute of every element node with the name book, you need a localization path with two localization steps.

/descendant::book/attribute::isbn
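
The two steps can be mirrored in Python. Note that xml.etree.ElementTree does not understand the full axis notation, so the sketch below approximates each step with the corresponding ElementTree call instead of evaluating the XPath expression itself. The document is a shortened version of the order example.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order><items>"
    '<book isbn="9781408845660"><title>Harry Potter and the Prisoner of Azkaban</title></book>'
    '<book isbn="9780544003415"><title>The Lord of the Rings</title></book>'
    "</items></order>"
)

# Step 1, roughly /descendant::book: all book elements below the root.
books = list(order.iter("book"))

# Step 2, roughly attribute::isbn: applied to every node selected in step 1.
isbns = [book.get("isbn") for book in books]

print(isbns)
```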

Node Type as Filter Criterion

If you’d like to define a node type as a filter criterion for selecting the node set, use one of the following functions as a node test:

Function | Selected Nodes
node() | All nodes on the chosen axis
text() | All text nodes on the chosen axis
comment() | All comment nodes on the chosen axis
processing-instruction() | All processing instruction nodes on the chosen axis

Note

XPath 1.0 already defines 27 functions in its core function library. Beginning with XPath 2.0 there are 111 functions available for specifying localization paths. You’ll find an overview in the W3C recommendation XPath and XQuery Functions and Operators 3.1 from March 21, 2017.
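
Filtering by node type can be sketched in Python with xml.dom.minidom, which keeps comments and text as nodes of their own. The helper functions descendants and by_type are made up for this illustration, and minidom implements the DOM, not the XPath data model. The example also shows that an element with the name comment is not a comment node.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<!--This is a comment!-->"
    "<order><comment>Please use gift wrapping!</comment>"
    "<items><book><title>The Lord of the Rings</title></book></items></order>"
)

def descendants(node):
    """Yield all nodes below the given node in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)

def by_type(node_type):
    """Return the character content of all descendants of the given node type."""
    return [n.data for n in descendants(doc) if n.nodeType == node_type]

comment_nodes = by_type(Node.COMMENT_NODE)  # compare //comment()
text_nodes = by_type(Node.TEXT_NODE)        # compare //text()

print(comment_nodes)
print(text_nodes)
```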

Node Test with Wild Card

If you use the placeholder * (asterisk) as the node test, all nodes on the selected axis that match the axis’ principal node type are chosen. For the attribute axis, the principal node type is the attribute node; for the namespace axis, it is the namespace node. For all other axes, it is the element node.

The following localization path, for example, selects all the attributes of the current context node:

attribute::*

Shortened Notation

For frequently used axes and localization steps, shortcuts were defined that can be used in XPath expressions as an alternative to the full notation.

Standard Notation | Shortcut | Example
child:: | (none) | child is the default axis, so the axis name can be omitted. The localization path child::book/child::title thus corresponds to the abbreviated form book/title.
attribute:: | @ | The attribute axis, including the separator, can be shortened to the @ symbol. The localization path book/attribute::isbn selects the isbn attribute node of the book element and reads book/@isbn in the shortened notation.
/descendant-or-self::node()/ | // | The localization step /descendant-or-self::node()/ selects the document node and all its descendants and is abbreviated with //. Instead of /descendant-or-self::node()/child::book, write //book in shortened form. The localization path selects all book elements in the document.
parent::node() | .. | The localization step parent::node() selects the parent node of the context node and is shortened to ..
self::node() | . | The localization step self::node() selects the current context node and is shortened to .
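
Python's xml.etree.ElementTree understands several of these shortcuts, which makes it convenient for trying them out. The sketch below rests on two assumptions: ElementTree supports only a limited subset of XPath (for example, // needs a leading dot and @ can only be used inside a predicate), and the isbn attribute of the second book was removed here so that the attribute test has a visible effect.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    '<order date="2019-02-01"><items>'
    '<book isbn="9781408845660"><title>Harry Potter and the Prisoner of Azkaban</title></book>'
    "<book><title>The Lord of the Rings</title></book>"
    "</items></order>"
)

# The child axis stays implicit: items/book/title
titles = [t.text for t in order.findall("items/book/title")]

# // stands for descendant-or-self::node()/
all_books = order.findall(".//book")

# @ stands for the attribute axis, used here inside a predicate
books_with_isbn = order.findall(".//book[@isbn]")

# .. stands for parent::node(): step from each title back up to its book
parents = order.findall(".//title/..")

print(titles)
print(len(all_books), len(books_with_isbn), [p.tag for p in parents])
```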

Pred­i­cates

With predicates, you define further filter criteria for the node set selected through the axis and node test.

Predicates form the optional third part of a localization step and are written in square brackets. The filter criteria within the brackets are formulated as expressions that can contain, among other things, path expressions, functions, operators and strings.

The XPath syntax supports universal pred­i­cates and numerical pred­i­cates.

Universal Pred­i­cates

Expressions in universal predicates filter the node set that has been selected through the axis and node test by yielding a Boolean value (true or false) for each node in the selection. All nodes with the value true are part of the result set.

Expressions for universal predicates are formulated with the help of operators. They are used to select nodes with specific content or properties, for example all nodes that contain a particular character string, a particular attribute value or a particular child element (perhaps at a specific position).

The following tables give you an overview of the operators that are available. There is a dis­tinc­tion between arith­metic operators, logical operators and re­la­tion­al operators.

Arithmetic Operators | Function
+ | Addition
- | Subtraction
* | Multiplication
div | Floating-point division
mod | Modulo (remainder of a division)

Relational Operators | Function
= | Equal
!= | Not equal
< | Less than; must be escaped as &lt; within XSLT
> | Greater than; escaping as &gt; within XSLT is recommended
<= | Less than or equal; must be escaped as &lt;= within XSLT
>= | Greater than or equal; escaping as &gt;= within XSLT is recommended

Logical Operators | Function
and | Logical AND
or | Logical OR
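
The two operators div and mod behave slightly differently from what you may know from other languages. In XPath 1.0, div is a floating-point division and mod returns the remainder of a truncating division, so the result takes the sign of the dividend. The following Python sketch mimics this behavior for non-zero divisors; the function names xpath_div and xpath_mod are made up for this illustration.

```python
import math

def xpath_div(a: float, b: float) -> float:
    """Mimic XPath 1.0 `a div b` for a non-zero divisor."""
    return float(a) / float(b)

def xpath_mod(a: float, b: float) -> float:
    """Mimic XPath 1.0 `a mod b`: the result takes the sign of the dividend."""
    return math.fmod(a, b)

print(xpath_div(7, 2))   # 3.5
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0, whereas Python's own -5 % 2 yields 1
```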

In the following example, the predicate [title="Harry Potter and the Prisoner of Azkaban"] restricts the result set to the element node book whose child element title contains the string Harry Potter and the Prisoner of Azkaban.

Note

The example corresponds to the XPath 3 syntax, which may not be supported by all online tools. You can reproduce the query presented here with, for example, the following online tester: http://videlibri.sourceforge.net/cgi-bin/xidelcgi.

/order/items/book[title="Harry Potter and the Prisoner of Azkaban"]

We have now chosen the element node book, which contains the data for the Harry Potter book.

<book isbn="9781408845660">
    <title>Harry Potter and the Prisoner of Azkaban</title>
    <quantity>1</quantity>
    <priceus>22.94</priceus>
    <comment>Please confirm delivery date before Christmas.</comment>
</book>

Another child element of this element node is the comment element. To select its content, the localization path only needs to be extended by two localization steps.

/order/items/book[title="Harry Potter and the Prisoner of Azkaban"]/comment/text()

With the localization step comment (abbreviated form of child::comment), we navigate to the book element’s child element of that name and select its text node with the text() function. This corresponds to the following string:

Please confirm delivery date before Christmas.
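
If no XPath tester is at hand, the query can be approximated with Python's xml.etree.ElementTree. Two assumptions apply: ElementTree only accepts relative paths on elements, so the path starts below the root element order, and it has no text() step, so the .text attribute takes its place. The document is a shortened version of the order example.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order>"
    "<comment>Please use gift wrapping!</comment>"
    "<items>"
    '<book isbn="9781408845660">'
    "<title>Harry Potter and the Prisoner of Azkaban</title>"
    "<comment>Please confirm delivery date before Christmas.</comment>"
    "</book>"
    '<book isbn="9780544003415"><title>The Lord of the Rings</title></book>'
    "</items>"
    "</order>"
)

# Relative to the context node order; the predicate compares the title's text.
path = "items/book[title='Harry Potter and the Prisoner of Azkaban']/comment"

comment = order.find(path)
comment_text = comment.text  # stands in for the final text() step

print(comment_text)
```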

If a predicate consists solely of a path expression, it is called an existence test. The following localization path, for example, tests which book elements in the XML document presented above contain one or more child elements with the name comment.

Shortened notation:

//book[comment]

Standard notation:

/descendant-or-self::node()/child::book[child::comment]

The lo­cal­iza­tion path //book[comment] selects all nodes with the name book that have a child element with the name comment.
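
The existence test is part of the XPath subset that Python's xml.etree.ElementTree supports, so it can be tried out directly. In this sketch, the only deviation from the shortened notation is the leading dot that ElementTree requires before //. The document is a shortened version of the order example.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order>"
    "<comment>Please use gift wrapping!</comment>"
    "<items>"
    '<book isbn="9781408845660">'
    "<title>Harry Potter and the Prisoner of Azkaban</title>"
    "<comment>Please confirm delivery date before Christmas.</comment>"
    "</book>"
    '<book isbn="9780544003415"><title>The Lord of the Rings</title></book>'
    "</items>"
    "</order>"
)

# Keeps only those book elements that have at least one comment child.
books_with_comment = order.findall(".//book[comment]")
matching_isbns = [book.get("isbn") for book in books_with_comment]

print(matching_isbns)
```

The comment element directly below order does not influence the result, because the predicate is evaluated with each book element as the context node.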

Numerical Pred­i­cates

Numerical predicates enable you to address nodes by their position. The following localization path, for example, selects the second node with the name book, counted in document order among the children of its parent:

//book[2]

Strictly speaking, the predicate [2] is the abbreviated form of [position()=2]. XPath thus first selects all nodes with the name book and then keeps only the node for which the expression position()=2 yields the Boolean value true.

Note

Unlike indexing in most programming languages, which starts at 0, XPath numbering begins at 1.
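
The difference in numbering becomes obvious when positional predicates are used from Python. The sketch uses xml.etree.ElementTree, which supports integer positions and last() in predicates. The document is a shortened version of the order example.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order><items>"
    "<book><title>Harry Potter and the Prisoner of Azkaban</title></book>"
    "<book><title>The Lord of the Rings</title></book>"
    "</items></order>"
)

first = order.find(".//book[1]")      # XPath positions start at 1
second = order.find(".//book[2]")
last = order.find(".//book[last()]")  # last() addresses the final position

# The same nodes via a Python list, whose indices start at 0.
books = order.findall(".//book")

print(first.find("title").text)
print(second.find("title").text)
```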

Ad­di­tion­al In­for­ma­tion on XML Path Language

On the W3C website you will find an overview of the current de­vel­op­ment status of XML Path language as well as all released standards and designs.

Free in­for­ma­tion and tools for using XPath for web ap­pli­ca­tions are available to you at MDN Web Docs as well as in the Microsoft Developer Network.
