rust-htmlescape: encoding/decoding HTML entities

An HTML entity encoding library for Rust


Example usage

All examples assume that `extern crate htmlescape;` and `use htmlescape::{relevant functions here};` are present.

### Encoding

htmlescape::encode_minimal() encodes an input string using a minimal set of HTML entities.

let title = "Cats & dogs";
let tag = format!("<title>{}</title>", encode_minimal(title));
assert_eq!(tag, "<title>Cats &amp; dogs</title>");
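
To make "a minimal set" concrete, here is a minimal sketch of the idea in plain Rust. It is not the crate's implementation: `encode_minimal_sketch` is a hypothetical name, and the five characters it escapes (including the `&#x27;` spelling for the apostrophe) are assumptions about what a minimal set typically covers.

```rust
/// Hypothetical stand-in for illustration only; not part of htmlescape.
/// Escapes an assumed minimal set: & < > " '
fn encode_minimal_sketch(input: &str) -> String {
    // Reserve at least the input length; escaped output can only grow.
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            // Everything else passes through unchanged.
            _ => out.push(c),
        }
    }
    out
}
```

Under those assumptions, `encode_minimal_sketch("Cats & dogs")` yields `Cats &amp; dogs`, the same text the example above expects from the real function.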

There is also an htmlescape::encode_attribute() function for encoding strings that will be used as HTML attribute values.

### Decoding

htmlescape::decode_html() decodes an encoded string, replacing HTML entities with the corresponding characters. Named, hex, and decimal entities are supported. A Result value is returned, with either the decoded string in Ok, or an error in Err.

let encoded = "Cats&#x20;&amp;&#32;dogs";
let decoded = match decode_html(encoded) {
  Err(reason) => panic!("Error {:?} at character {}", reason.kind, reason.position),
  Ok(s) => s
};
assert_eq!(decoded, "Cats & dogs");
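
How named, hex, and decimal entities can be told apart is easier to see in code. The following is a rough sketch under stated assumptions, not the crate's decoder: `decode_sketch`, `decode_entity`, and `SketchErr` are hypothetical names, only five named entities are recognized, and error positions are byte offsets of the offending `&`.

```rust
/// Hypothetical error type for this sketch; the payload is the byte offset of '&'.
#[derive(Debug, PartialEq)]
enum SketchErr {
    UnknownEntity(usize),
    Unterminated(usize),
}

/// Resolves the text between '&' and ';' to a character, if recognized.
fn decode_entity(name: &str) -> Option<char> {
    if let Some(num) = name.strip_prefix('#') {
        // "#x20" / "#X20" is hexadecimal, "#32" is decimal.
        let code = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
            Some(hex) => u32::from_str_radix(hex, 16).ok()?,
            None => num.parse::<u32>().ok()?,
        };
        return char::from_u32(code);
    }
    // A tiny named-entity table, assumed for illustration.
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        _ => None,
    }
}

/// Decodes entities in `input`, failing on the first malformed or unknown one.
fn decode_sketch(input: &str) -> Result<String, SketchErr> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    let mut pos = 0; // byte offset of `rest` within `input`
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after.find(';').ok_or(SketchErr::Unterminated(pos + amp))?;
        let ch = decode_entity(&after[..semi]).ok_or(SketchErr::UnknownEntity(pos + amp))?;
        out.push(ch);
        // Skip past '&', the entity body, and ';'.
        let consumed = amp + 1 + semi + 1;
        pos += consumed;
        rest = &rest[consumed..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With this sketch, `decode_sketch("Cats&#x20;&amp;&#32;dogs")` returns `Ok` with `Cats & dogs`, while a bare ampersand such as the one in `R&D` is reported as an error rather than passed through.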

### Avoiding allocations

Both the encoding and decoding functions are available in forms that take a Writer for output rather than returning a String. These versions can be used to avoid allocation and copying if the returned String was just going to be written to a Writer anyway.
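
The writer-based approach can be sketched as follows. This is an illustration only: `write_escaped` is a hypothetical helper, not one of the crate's functions, and it assumes the same five-character minimal set as before. It writes unescaped runs and entities straight to any `std::io::Write` sink, so no intermediate String is built.

```rust
use std::io::{self, Write};

/// Hypothetical helper for illustration; not part of htmlescape.
/// Streams escaped output into `out` without allocating a String.
fn write_escaped<W: Write>(input: &str, out: &mut W) -> io::Result<()> {
    let mut last = 0; // start of the pending run of unescaped bytes
    for (i, b) in input.bytes().enumerate() {
        let entity = match b {
            b'&' => "&amp;",
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'"' => "&quot;",
            b'\'' => "&#x27;",
            _ => continue,
        };
        // All escaped bytes are ASCII, so `i` is always a char boundary.
        out.write_all(input[last..i].as_bytes())?;
        out.write_all(entity.as_bytes())?;
        last = i + 1;
    }
    // Flush whatever follows the last escaped character.
    out.write_all(input[last..].as_bytes())
}
```

Any `Write` implementor works as the sink, for example a `Vec<u8>`, a file, or a locked `stdout`.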

Comments

  • Improve encode speed by ~50%

    Sep 22, 2017

    Moved entity lookup into match arms instead of using a function that does a binary search in a static array, boxes up the result, and then immediately unboxes it in the match.

    Before cargo bench

    running 4 tests
    test bench_decode_attribute ... bench:   2,585,276 ns/iter (+/- 1,174,543) = 93 MB/s
    test bench_decode_minimal   ... bench:   1,314,817 ns/iter (+/- 793,375) = 89 MB/s
    test bench_encode_attribute ... bench:   2,103,300 ns/iter (+/- 874,982) = 54 MB/s
    test bench_encode_minimal   ... bench:   1,799,386 ns/iter (+/- 889,072) = 63 MB/s
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out
    

    After cargo bench

    running 4 tests
    test bench_decode_attribute ... bench:   2,273,943 ns/iter (+/- 986,740) = 106 MB/s
    test bench_decode_minimal   ... bench:   1,292,612 ns/iter (+/- 840,655) = 91 MB/s
    test bench_encode_attribute ... bench:   1,528,778 ns/iter (+/- 725,928) = 74 MB/s
    test bench_encode_minimal   ... bench:   1,177,384 ns/iter (+/- 560,557) = 96 MB/s
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out
    
  • Support invalid entities

    Nov 30, 2017

    Allow decoding valid entities while leaving invalid parts intact; for example, convert "& &#" to "&&#"

  • Add decoding function that ignores unknown/malformed entities instead of failing

    Dec 1, 2017

                                                                                                                                                                                                           
  • html escape feature to handle quotes in custom manner

    Jun 25, 2019

    Thanks for providing this library. We are using it to build PHP-style html_entities support with custom handling of quotes, and we need an option to leave quotes untouched when the user wants that, similar to https://www.php.net/manual/en/function.htmlentities.php

    Constant Name | Description
    -- | --
    ENT_COMPAT | Will convert double-quotes and leave single-quotes alone.
    ENT_QUOTES | Will convert both double and single quotes.
    ENT_NOQUOTES | Will leave both double and single quotes unconverted.
    

    Is it possible to have this feature in this library?

  • escaping/unescaping should return `Cow`

    Feb 13, 2020

    ... to avoid reallocations when nothing has changed

    https://doc.rust-lang.org/std/borrow/enum.Cow.html

  • `htmlescape::DecodeErr` should implement `std::error::Error`

    May 15, 2020

    In order to be usable with common error-handling libraries

  • hex entities are not decoded

    Nov 30, 2017

    decode_html: cannot decode "Lead Data Scientist – R&D,Machine Learning – Big Data An" reason:DecodeErr { position: 30, kind: UnknownEntity }

    Could I get a support for those? I'd expect this to be decoded to dash unicode character.

  • Fix README

    Nov 1, 2017

    Headings in the README are not properly rendered by GitHub without spaces after ###

  • Change package name

    Oct 19, 2014

    I had trouble using Cargo with this package name and got some comments in #rust channel on IRC to change the package name.

  • Register with Travis CI

    Oct 16, 2013

    See http://hiho.io/rust-ci/

    Necessary steps: http://hiho.io/rust-ci/help/

    enhancement 
  • Fix examples in README.md

    Aug 5, 2013

    The encoded/decoded versions of &amp; and > are compared.

    Maybe change > to &?

    bug 
  • why not publish to crates.io?

    Dec 7, 2016

                                                                                                                                                                                                           
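
The `Cow` request in the comments above can be illustrated with a short sketch. It shows only the general technique and makes no claim about how the crate would implement it: `escape_cow` is a hypothetical function, and the escaped set is the same assumed five characters. The input is borrowed when nothing needs escaping, and a new String is allocated only when a special character is found.

```rust
use std::borrow::Cow;

/// Hypothetical function for illustration; not part of htmlescape.
/// Returns the input unchanged (borrowed) when no escaping is needed.
fn escape_cow(input: &str) -> Cow<'_, str> {
    // Find the first character that needs escaping, if any.
    let first = match input.find(|c: char| matches!(c, '&' | '<' | '>' | '"' | '\'')) {
        Some(i) => i,
        None => return Cow::Borrowed(input), // fast path: no allocation
    };
    let mut out = String::with_capacity(input.len() + 8);
    // Copy the clean prefix in one step, then escape the remainder.
    out.push_str(&input[..first]);
    for c in input[first..].chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Callers that only read the result can treat it as a `&str` through deref, and `into_owned()` gives a String when ownership is needed.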