Rust-Sxd xpath: sxd-xpath — An XPath library in Rust

SXD-XPath

An XML XPath library in Rust.

Overview

The project is broken into two crates:

  1. document - Basic DOM manipulation and reading/writing XML from strings.
  2. xpath - Implementation of XPath 1.0 expressions.

There are also scattered utilities for playing around at the command line.

In the future, I hope to add support for XSLT 1.0.

Goals

This project has a lofty goal: replace libxml and libxslt.

Contributing

  1. Fork it ( https://github.com/shepmaster/sxd-xpath/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Add a failing test.
  4. Add code to pass the test.
  5. Commit your changes (git commit -am 'Add some feature')
  6. Ensure tests pass.
  7. Push to the branch (git push origin my-new-feature)
  8. Create a new Pull Request

License

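Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.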

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.

Comments

  • Implement XPath on traits instead of concrete types

    Sep 26, 2017

    sxd-xpath works with sxd-dom. Data needs to be converted to an sxd-dom before an XPath can be run on it.

    If sxd-xpath worked on traits, it could be used on any data structure that implements those traits.

    The traits might look something like this:

    pub trait Node<'a> {
        fn as_attribute(&self) -> Option<&Self>
        where
            Self: Attribute<'a>,
        {
            None
        }
        fn as_element(&self) -> Option<&Self>
        where
            Self: Element<'a>,
        {
            None
        }
        fn as_text(&self) -> Option<&Self>
        where
            Self: Text<'a>,
        {
            None
        }
    }
    
    pub trait QName {
        fn namespace_uri(&self) -> Option<&str>;
        fn local_part(&self) -> &str;
    }
    
    pub trait NamedNode<'a>: Node<'a> {
        type QName: QName;
        fn name(&self) -> &Self::QName;
    }
    
    pub trait Attribute<'a>: NamedNode<'a> {
        type AttributeValue: Into<String>;
        fn value(&self) -> &Self::AttributeValue;
    }
    
    pub trait Element<'a>: NamedNode<'a> {
        type Attribute: Attribute<'a> + 'a;
        type AttributeIter: Iterator<Item = &'a Self::Attribute>;
        type Child: Node<'a> + 'a;
        type ChildIter: Iterator<Item = &'a Self::Child>;
    
        fn attributes(&'a self) -> Self::AttributeIter;
        fn children(&'a self) -> Self::ChildIter;
    }
    
    pub trait Text<'a>: Node<'a> {
        fn data(&self) -> &str;
    }
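To illustrate what the proposal would buy, here is a minimal sketch of a query written once against a trait and then run on an arbitrary tree type. Everything in it is hypothetical: the `XmlElement` trait is a cut-down stand-in for the traits above, and `descendants_named`, `Tag` and `tag` are invented for this example. None of it is part of sxd-xpath.

```rust
/// Hypothetical, simplified element trait (not part of sxd-xpath).
pub trait XmlElement {
    fn local_name(&self) -> &str;
    fn child_elements(&self) -> Vec<&Self>;
}

/// Collects every descendant element whose local name equals `name`,
/// visiting parents before their children (document order). This is
/// roughly what an XPath such as `.//name` selects.
pub fn descendants_named<'a, E: XmlElement>(root: &'a E, name: &str) -> Vec<&'a E> {
    let mut found = Vec::new();
    for child in root.child_elements() {
        if child.local_name() == name {
            found.push(child);
        }
        found.extend(descendants_named(child, name));
    }
    found
}

/// An arbitrary user-defined tree that knows nothing about any DOM crate.
struct Tag {
    name: String,
    children: Vec<Tag>,
}

impl XmlElement for Tag {
    fn local_name(&self) -> &str {
        &self.name
    }
    fn child_elements(&self) -> Vec<&Self> {
        self.children.iter().collect()
    }
}

/// Small helper to build the example tree.
fn tag(name: &str, children: Vec<Tag>) -> Tag {
    Tag { name: name.to_string(), children }
}

fn main() {
    let doc = tag("library", vec![
        tag("book", vec![tag("title", vec![])]),
        tag("shelf", vec![tag("book", vec![])]),
    ]);
    let books = descendants_named(&doc, "book");
    println!("found {} book elements", books.len());
}
```

The query function only depends on the trait, so no conversion into a separate DOM is needed before searching.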
    
  • Computation of document order needs to be cached

    Mar 1, 2018

    This program performs very poorly. Ultimately, it's because Value::into_string is being called repeatedly, and each call triggers the computation of the nodes in document order. "Thankfully", I realized this problem when adding it:

    https://github.com/shepmaster/sxd-xpath/blob/350c51e89bccd0271898b46caa9c074f4ef31927/src/nodeset.rs#L349

    extern crate sxd_document;
    extern crate sxd_xpath;
    
    use std::fs::File;
    use std::io::Read;
    use std::collections::HashMap;
    use std::borrow::Cow;
    use sxd_document::dom::{Document, Element};
    use sxd_xpath::{Context, Factory, Value};
    use sxd_xpath::nodeset::Node;
    use sxd_document::parser;
    
    type DynResult<T> = Result<T, Box<dyn ::std::error::Error>>;
    
    fn main() {
        let filename = "radlex.owl";
        println!("Reading file");
        let mut f = File::open(filename).unwrap();
        let mut data = String::new();
        f.read_to_string(&mut data).unwrap();
        let package = parser::parse(&data).unwrap();
        build_rid_index(&package.as_document()).unwrap();
    }
    
    /// Build a dictionary of an RID to its respective XML element
    fn build_rid_index<'d>(
        radlex: &'d Document<'d>,
    ) -> DynResult<HashMap<Cow<'d, str>, Element<'d>>> {
        let root = radlex.root();
    
        let mut ctx = Context::new();
        ctx.set_namespace("xsp", "http://www.owl-ontologies.com/2005/08/07/xsp.owl#");
        ctx.set_namespace("xsd", "http://www.w3.org/2001/XMLSchema#");
        ctx.set_namespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ctx.set_namespace("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        ctx.set_namespace("owl", "http://www.w3.org/2002/07/owl#");
        ctx.set_namespace("swrl", "http://www.w3.org/2003/11/swrl#");
        ctx.set_namespace("swrlb", "http://www.w3.org/2003/11/swrlb#");
    
        println!("Building query");
        let factory = Factory::new();
        let xpath = factory.build("/rdf:RDF/*[starts-with(@rdf:ID, 'RID')]")?;
        let xpath = xpath.expect("No XPath was compiled");
    
        println!("Evaluating query");
        let value = xpath.evaluate(&ctx, root)?;
    
        println!("Building dictionary");
        if let Value::Nodeset(nodeset) = value {
            let dict: HashMap<_, _> = nodeset
                .into_iter()
                .filter_map(|x| match x {
                    Node::Element(e) => Some(e),
                    _ => None,
                })
                .map(|e| {
                    let rid = e.attributes()
                        .into_iter()
                        .find(|x| x.name().local_part() == "ID")
                        .unwrap()
                        .value();
                    (rid.into(), e)
                })
                .collect();
    
            Ok(dict)
        } else {
            panic!()
        }
    }
    

    Source file

    /cc @Enet4
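One way to picture the requested caching is a lazily initialised field that stores the sorted order the first time it is needed. The sketch below is only an assumption about the general shape, not the crate's implementation: `CachedNodeset` is a hypothetical type, a `u32` stands in for a node's position in the document, and `std::cell::OnceCell` requires Rust 1.70 or later.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::HashSet;

/// Hypothetical node set; `u32` stands in for a node's document position.
struct CachedNodeset {
    nodes: HashSet<u32>,
    /// Filled in on the first request for document order.
    ordered: OnceCell<Vec<u32>>,
    /// Counts how often the expensive sort actually ran (for the demo only).
    sorts: Cell<usize>,
}

impl CachedNodeset {
    fn new(nodes: HashSet<u32>) -> Self {
        CachedNodeset { nodes, ordered: OnceCell::new(), sorts: Cell::new(0) }
    }

    /// Returns the nodes in document order, sorting only on the first call.
    fn document_order(&self) -> &[u32] {
        self.ordered.get_or_init(|| {
            self.sorts.set(self.sorts.get() + 1);
            let mut sorted: Vec<u32> = self.nodes.iter().copied().collect();
            sorted.sort_unstable();
            sorted
        })
    }

    /// The first node in document order, which is what a string
    /// conversion of a node set is based on.
    fn first(&self) -> Option<u32> {
        self.document_order().first().copied()
    }
}

fn main() {
    let set = CachedNodeset::new(vec![30, 10, 20].into_iter().collect());
    for _ in 0..1000 {
        set.first();
    }
    println!("first = {:?}, sorts performed = {}", set.first(), set.sorts.get());
}
```

The cache is only valid while the set is not mutated; a real implementation would have to clear it whenever a node is added or removed.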

  • Consider switching Nodeset to a set that preserves insertion order

    Mar 17, 2018

    This will "do the right thing" in many simple XPaths, but still wouldn't guarantee that the entire thing is in order.
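A minimal sketch of such a set, assuming a plain `Vec` plus `HashSet` combination (the `OrderedSet` type is hypothetical and the integers stand in for document positions), shows both the benefit and the limitation:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical set that remembers the order in which items were first added.
struct OrderedSet<T> {
    seen: HashSet<T>,
    order: Vec<T>,
}

impl<T: Hash + Eq + Clone> OrderedSet<T> {
    fn new() -> Self {
        OrderedSet { seen: HashSet::new(), order: Vec::new() }
    }

    /// Adds `item` if it is not present yet; returns whether it was inserted.
    fn insert(&mut self, item: T) -> bool {
        if self.seen.insert(item.clone()) {
            self.order.push(item);
            true
        } else {
            false
        }
    }

    /// The items in insertion order.
    fn as_slice(&self) -> &[T] {
        &self.order
    }
}

fn main() {
    let mut set = OrderedSet::new();
    // Two sub-results that are each in document order: [1, 3] and [2].
    for position in vec![1, 3, 3, 2] {
        set.insert(position);
    }
    // Insertion order is kept, but the combined result is not sorted.
    println!("{:?}", set.as_slice());
}
```

A single axis walk usually inserts nodes in document order already, so simple paths come out right; combining several partial results, as in the example, still needs an explicit sort.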

  • Add missing docs

    May 28, 2018

                                                                                                                                                                                                           
  • nested queries

    Apr 28, 2020

    Could I perform further queries on the result of a previous query, for example by casting the result to a document somehow? I cannot find any mention of it. Thanks.

  • This might be a bug, or more likely something I don't understand :)

    Jun 13, 2020

    OK, so I'm trying to parse MARC 21 XML records. I should mention that I'm doing this to learn Rust and MARC, both of which are quite complicated.

    However, I've got two examples: one that works, and one where I change this:

        <record
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
                xmlns="http://www.loc.gov/MARC21/slim">    
    

    to simply <record>.

    The latter works and the first one just returns an empty result:

    Example1:

    fn do_something_with_metadata() {
    
        let package = parser::parse(r#"<?xml version="1.0" encoding="UTF-8"?>
        <record
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
                xmlns="http://www.loc.gov/MARC21/slim">    
    
    
          <leader>00446nim a22001813  45  </leader>
          <controlfield tag="001">11469144</controlfield>
          <controlfield tag="003">SE-LIBR</controlfield>
          <controlfield tag="005">20171005164357.0</controlfield>
          <controlfield tag="007">sd|||||||||||||||||||||</controlfield>
          <controlfield tag="008">091027s2001    xx |||_j_|||||_|| _|swe||</controlfield>
          <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">(LibraSE)24008</subfield>
          </datafield>
          <datafield tag="084" ind1=" " ind2=" ">
            <subfield code="a">Yq</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Astrid Lindgrens favoriter: CD 1 Sånger</subfield>
          </datafield>
          <datafield tag="260" ind1=" " ind2=" ">
            <subfield code="a">Stockholm :</subfield>
            <subfield code="b">Bonnier Music,</subfield>
            <subfield code="c">2001</subfield>
          </datafield>
          <datafield tag="300" ind1=" " ind2=" ">
            <subfield code="a">1 CD</subfield>
          </datafield>
          <datafield tag="653" ind1=" " ind2=" ">
            <subfield code="a">CD-bok</subfield>
          </datafield>
          <datafield tag="653" ind1=" " ind2=" ">
            <subfield code="a">Barn och ungdom</subfield>
          </datafield>
          <datafield tag="852" ind1=" " ind2=" ">
            <subfield code="h">Hcf/LC</subfield>
            <subfield code="l">AST</subfield>
          </datafield>
          <datafield tag="942" ind1=" " ind2=" ">
            <subfield code="c">BARN LJUD</subfield>
          </datafield>
          <datafield tag="999" ind1=" " ind2=" ">
            <subfield code="c">9782</subfield>
            <subfield code="d">9782</subfield>
          </datafield>
        </record>"#).expect("failed to parse XML");
        let document = package.as_document();
        let value = evaluate_xpath(&document, "/record/leader").expect("XPath evaluation failed");
    
        println!("Found: {}({:?})", value.string(),value);
    
    }
    

    Output:

    Found: (Nodeset(Nodeset { nodes: {} }))
    Found: (Nodeset(Nodeset { nodes: {} }))
    Found: (Nodeset(Nodeset { nodes: {} }))
    Found: (Nodeset(Nodeset { nodes: {} }))
    Found: (Nodeset(Nodeset { nodes: {} }))
    

    Example2:

    fn do_something_with_metadata() {
    
        let package = parser::parse(r#"<?xml version="1.0" encoding="UTF-8"?>
        <record>
    
          <leader>00446nim a22001813  45  </leader>
          <controlfield tag="001">11469144</controlfield>
          <controlfield tag="003">SE-LIBR</controlfield>
          <controlfield tag="005">20171005164357.0</controlfield>
          <controlfield tag="007">sd|||||||||||||||||||||</controlfield>
          <controlfield tag="008">091027s2001    xx |||_j_|||||_|| _|swe||</controlfield>
          <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">(LibraSE)24008</subfield>
          </datafield>
          <datafield tag="084" ind1=" " ind2=" ">
            <subfield code="a">Yq</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Astrid Lindgrens favoriter: CD 1 Sånger</subfield>
          </datafield>
          <datafield tag="260" ind1=" " ind2=" ">
            <subfield code="a">Stockholm :</subfield>
            <subfield code="b">Bonnier Music,</subfield>
            <subfield code="c">2001</subfield>
          </datafield>
          <datafield tag="300" ind1=" " ind2=" ">
            <subfield code="a">1 CD</subfield>
          </datafield>
          <datafield tag="653" ind1=" " ind2=" ">
            <subfield code="a">CD-bok</subfield>
          </datafield>
          <datafield tag="653" ind1=" " ind2=" ">
            <subfield code="a">Barn och ungdom</subfield>
          </datafield>
          <datafield tag="852" ind1=" " ind2=" ">
            <subfield code="h">Hcf/LC</subfield>
            <subfield code="l">AST</subfield>
          </datafield>
          <datafield tag="942" ind1=" " ind2=" ">
            <subfield code="c">BARN LJUD</subfield>
          </datafield>
          <datafield tag="999" ind1=" " ind2=" ">
            <subfield code="c">9782</subfield>
            <subfield code="d">9782</subfield>
          </datafield>
        </record>"#).expect("failed to parse XML");
        let document = package.as_document();
        let value = evaluate_xpath(&document, "/record/leader").expect("XPath evaluation failed");
    
        println!("Found: {}({:?})", value.string(),value);
    
    }
    

    Returns:

    Found: 00446nim a22001813  45  (Nodeset(Nodeset { nodes: {Element(Element { name: QName { namespace_uri: None, local_part: "leader" } })} }))
    Found: 00446nim a22001813  45  (Nodeset(Nodeset { nodes: {Element(Element { name: QName { namespace_uri: None, local_part: "leader" } })} }))
    Found: 00446nim a22001813  45  (Nodeset(Nodeset { nodes: {Element(Element { name: QName { namespace_uri: None, local_part: "leader" } })} }))
    Found: 00446nim a22001813  45  (Nodeset(Nodeset { nodes: {Element(Element { name: QName { namespace_uri: None, local_part: "leader" } })} }))
    Found: 00446nim a22001813  45  (Nodeset(Nodeset { nodes: {Element(Element { name: QName { namespace_uri: None, local_part: "leader" } })} }))
    

    I'm not sure why it doesn't like that part of record, it just doesn't :)

    Kind regards /Jacob
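The difference between the two examples is consistent with how XPath 1.0 compares names: a name test matches on the pair of namespace URI and local part, and an unprefixed step such as `leader` always means "no namespace", even when the document declares a default namespace with `xmlns="..."`. The sketch below models only that comparison; `ExpandedName`, `name` and `name_test_matches` are hypothetical helpers, not sxd-xpath API.

```rust
/// Hypothetical expanded name: an optional namespace URI plus a local part.
#[derive(Debug, Clone, PartialEq)]
struct ExpandedName {
    namespace_uri: Option<String>,
    local_part: String,
}

const MARC: &str = "http://www.loc.gov/MARC21/slim";

fn name(namespace_uri: Option<&str>, local_part: &str) -> ExpandedName {
    ExpandedName {
        namespace_uri: namespace_uri.map(|uri| uri.to_string()),
        local_part: local_part.to_string(),
    }
}

/// A name test selects a node only if both the namespace URI and the
/// local part are equal.
fn name_test_matches(test: &ExpandedName, node: &ExpandedName) -> bool {
    test == node
}

fn main() {
    // <record xmlns="..."> puts its unprefixed child elements in the MARC namespace.
    let leader_in_example1 = name(Some(MARC), "leader");
    // Without the xmlns declaration the element has no namespace.
    let leader_in_example2 = name(None, "leader");

    // The unprefixed XPath step `leader` never carries a namespace.
    let unprefixed_step = name(None, "leader");
    // A step such as `marc:leader`, with `marc` bound to the MARC namespace.
    let prefixed_step = name(Some(MARC), "leader");

    println!("example 1, /record/leader: {}", name_test_matches(&unprefixed_step, &leader_in_example1));
    println!("example 2, /record/leader: {}", name_test_matches(&unprefixed_step, &leader_in_example2));
    println!("example 1, prefixed step:  {}", name_test_matches(&prefixed_step, &leader_in_example1));
}
```

In practice this means binding a prefix of your choice to the MARC namespace in a Context (as the set_namespace calls elsewhere on this page do) and writing the path with that prefix on every step.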

  • Implement EXSLT functions, e.g. exsl:node-set ?

    Mar 29, 2017

    I have an XSLT script that uses the node-set function defined by EXSLT. It works in Firefox, which has some EXSLT support. Depending on how hard this would be to implement (node-set shouldn't be too hard), I think having it in this Rust library would be great!

  • expression::Error is private

    Jan 2, 2017

    This is unfortunate because now there's a private type in the public Expression trait that I can't implement From<expression::Error> for.

    Update: It kind of works if you chain multiple into calls together. Still kind of bad if you wanted to use try! or the question mark operator, as that's completely impossible with the current situation.
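For context, the question mark operator converts errors through `From`, so the library's error type has to be nameable by the caller. The sketch below uses a hypothetical `library` module with a public `Error` enum to show the pattern that a private type rules out; none of these names come from sxd-xpath.

```rust
mod library {
    /// Stand-in for an error type that a library exposes publicly.
    #[derive(Debug, PartialEq)]
    pub enum Error {
        NotAbsolute,
    }

    /// Toy "evaluation": accepts absolute paths and returns their length.
    pub fn evaluate(expr: &str) -> Result<usize, Error> {
        if expr.starts_with('/') {
            Ok(expr.len())
        } else {
            Err(Error::NotAbsolute)
        }
    }
}

/// The application's own error type.
#[derive(Debug, PartialEq)]
enum AppError {
    XPath(library::Error),
}

// This impl is only possible because `library::Error` is public.
impl From<library::Error> for AppError {
    fn from(error: library::Error) -> Self {
        AppError::XPath(error)
    }
}

fn run(expr: &str) -> Result<usize, AppError> {
    // `?` calls `From::from` to turn `library::Error` into `AppError`.
    let length = library::evaluate(expr)?;
    Ok(length)
}

fn main() {
    println!("{:?}", run("/a"));
    println!("{:?}", run("a"));
}
```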

    bug 
  • Unable to evaluate a namespaced xpath

    Feb 20, 2017

    Hello,

    I get an error when I try to evaluate a namespaced XPath:

    $ cargo run
       Compiling quick-error v1.1.0
       Compiling typed-arena v1.2.0
       Compiling peresil v0.3.0
       Compiling sxd-document v0.2.0
       Compiling sxd-xpath v0.4.0 (https://github.com/shepmaster/sxd-xpath.git#16b17d77)
       Compiling rust v0.1.0 (file:///home/sanpi/test/rust)
        Finished debug [unoptimized + debuginfo] target(s) in 8.74 secs
         Running `target/debug/rust`
    thread 'main' panicked at 'No namespace for prefix', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/option.rs:715
    note: Run with `RUST_BACKTRACE=1` for a backtrace.
    
    extern crate sxd_document;
    extern crate sxd_xpath;
    
    fn main() {
        let package = ::sxd_document::parser::parse(r#"
    <d:multistatus xmlns:d="DAV:" xmlns:cs="http://calendarserver.org/ns/">
        <d:response>
            <d:href>/</d:href>
            <d:propstat>
                <d:prop>
                    <d:current-user-principal>
                        <d:href>/principals/users/johndoe/</d:href>
                    </d:current-user-principal>
                </d:prop>
                <d:status>HTTP/1.1 200 OK</d:status>
            </d:propstat>
        </d:response>
    </d:multistatus>"#).unwrap();
    
        let document = package.as_document();
    
        ::sxd_xpath::evaluate_xpath(&document, "/d:multistatus")
            .unwrap();
    }
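The panic message suggests that the prefix `d` was never bound for the XPath evaluation: prefixes declared in the XML document (`xmlns:d="DAV:"`) are not automatically visible to the expression, which resolves its prefixes against its own set of bindings. The following sketch models such a lookup with a hypothetical `Namespaces` type (not the crate's Context) and reports an unknown prefix as an error value instead of panicking.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the prefix-to-URI bindings of an evaluation context.
struct Namespaces {
    bindings: HashMap<String, String>,
}

impl Namespaces {
    fn new() -> Self {
        Namespaces { bindings: HashMap::new() }
    }

    /// Binds `prefix` to `uri` for use in XPath expressions.
    fn set_namespace(&mut self, prefix: &str, uri: &str) {
        self.bindings.insert(prefix.to_string(), uri.to_string());
    }

    /// Resolves `prefix:local` or `local` into (namespace URI, local part).
    /// An unbound prefix is reported as an error.
    fn resolve(&self, qname: &str) -> Result<(Option<String>, String), String> {
        match qname.find(':') {
            None => Ok((None, qname.to_string())),
            Some(colon) => {
                let prefix = &qname[..colon];
                let local = &qname[colon + 1..];
                match self.bindings.get(prefix) {
                    Some(uri) => Ok((Some(uri.clone()), local.to_string())),
                    None => Err(format!("No namespace for prefix '{}'", prefix)),
                }
            }
        }
    }
}

fn main() {
    let mut namespaces = Namespaces::new();
    // Without a binding, the prefix in `/d:multistatus` cannot be resolved.
    println!("{:?}", namespaces.resolve("d:multistatus"));

    // After binding the prefix, the step resolves to the DAV: namespace.
    namespaces.set_namespace("d", "DAV:");
    println!("{:?}", namespaces.resolve("d:multistatus"));
}
```

With the real library, the equivalent step is to build a Context, call set_namespace on it for each prefix used in the expression, and evaluate the compiled XPath against that context.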
    
    question 
  • Relicense under dual MIT/Apache-2.0

    Jan 8, 2016

    Why?

    The MIT license requires reproducing countless copies of the same copyright header with different names in the copyright field, for every MIT library in use. The Apache license does not have this drawback, and has protections from patent trolls and an explicit contribution licensing clause. However, the Apache license is incompatible with GPLv2. This is why Rust is dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for GPLv2 compat), and doing so would be wise for this project. This also makes this crate suitable for inclusion in the Rust standard distribution and other projects using dual MIT/Apache.

    How?

    To do this, get explicit approval from each contributor of copyrightable work (as not all contributions qualify for copyright) and then add the following to your README:

    ## License
    
    Licensed under either of
     * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
     * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
    at your option.
    
    ### Contribution
    
    Unless you explicitly state otherwise, any contribution intentionally submitted
    for inclusion in the work by you shall be dual licensed as above, without any
    additional terms or conditions.
    

    and in your license headers, use the following boilerplate (based on that used in Rust):

    // Copyright (c) 2015 t developers
    // Licensed under the Apache License, Version 2.0
    // <LICENSE-APACHE or
    // http://www.apache.org/licenses/LICENSE-2.0> or the MIT
    // license <LICENSE-MIT or http://opensource.org/licenses/MIT>,
    // at your option. All files in the project carrying such
    // notice may not be copied, modified, or distributed except
    // according to those terms.
    

    And don't forget to update the license metadata in your Cargo.toml!

    Contributor checkoff

    • [x] @shepmaster
    • [x] @carols10cents
    • [x] @flying-sheep
    • [x] @Boddlnagg
    • [x] @vky
    • [x] @messense
    • [x] @ljedrz
  • Functionality to specify a default namespace.

    Apr 9, 2017

    Hi. Maybe the functionality is already there and I'm missing something, but essentially I was parsing this XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
    <area id="a1411661-be21-4290-8dc1-50f3d8e3ea67" type-id="6fd8f29a-3d0a-32fc-980d-ea697b69da78" type="City">
        <name>Honolulu</name>
        <sort-name>Honolulu</sort-name>
    </area>
    </metadata>
    

    and I had some issues selecting elements due to the namespace. After seeing the test and #108, I figured out that I can use a Context with

    context.set_namespace("mb", "http://musicbrainz.org/ns/mmd-2.0#");
    

    and then use mb: for all elements in my XPath queries, e.g. //mb:area/mb:name/text(). However, this is a bit cumbersome, so I wanted to ask whether functionality to specify a default namespace, which would be used for unqualified selectors in XPath queries, is desirable. Or would this break the XPath spec somehow?

    (I might implement this functionality, I just wanted to ask in advance for some thoughts so I don't end up coding for nothing. ^^)

  • Add a function for simple absolute XPath evaluation.

    Dec 27, 2016

    The existing evaluate function requires lots of research, configuration and additional imports to use, while the poor souls that have to process XML usually just need to execute simple XPath expressions. The evaluate_absolute_xpath function only needs a Document and an XPath &str to work, and it returns a Value in wrappers that accommodate the various possible errors.

    In addition, this extends the expression::Error struct with an InvalidXPath variant so that potential XPath-related parser::ParseErr errors can be handled. It is not perfect; instead of a String I would prefer to relay ParseErr, but that would break the ExpressionError's Hash property: deriving Hash would fail because Token has an f64 variant.
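The trade-off can be reproduced in a few lines. `f64` implements neither `Hash` nor `Eq`, so any type that contains one cannot derive them, while a `String` description can. The `Token` and `ExprError` types below are simplified, hypothetical stand-ins for the ones mentioned above, and `invalid_xpath` and `hash_of` are helpers invented for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical token type; the `f64` payload is why it cannot derive
/// `Hash` or `Eq`.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Name(String),
}

/// Hypothetical error enum that stays hashable by storing a message
/// instead of the offending token.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ExprError {
    NotANodeset,
    InvalidXPath(String),
}

/// Converts the unhashable token into a hashable description.
fn invalid_xpath(unexpected: &Token) -> ExprError {
    ExprError::InvalidXPath(format!("unexpected token: {:?}", unexpected))
}

/// Computes a hash with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let from_number = invalid_xpath(&Token::Number(1.5));
    let from_name = invalid_xpath(&Token::Name("foo".to_string()));
    println!("{:?} -> {}", from_number, hash_of(&from_number));
    println!("{:?} -> {}", from_name, hash_of(&from_name));
    println!("{:?} -> {}", ExprError::NotANodeset, hash_of(&ExprError::NotANodeset));
}
```

The cost of the `String` approach is that callers can no longer match on the structured parse error, which is the imperfection the description points out.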

    The last modification in the integration.rs file is just a deletion of an unused trait import (Expression).
