bson-rust: Encoding and decoding support for BSON in Rust

bson-rs

Installation

This crate works with Cargo and is published on crates.io. Add it to your Cargo.toml like so:

[dependencies]
bson = "0.14"

Overview of BSON Format

BSON, short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a datetime type and a binary data type.

// JSON equivalent
{"hello": "world"}

// BSON encoding
\x16\x00\x00\x00                   // total document size
\x02                               // 0x02 = type String
hello\x00                          // field name
\x06\x00\x00\x00world\x00          // field value
\x00                               // 0x00 = type EOO ('end of object')
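
The byte layout above can be reproduced by hand. The following is a minimal sketch using only the standard library; `encode_string_doc` is a hypothetical helper written for illustration and is not part of this crate. It assumes the key contains no NUL bytes and that all lengths fit in an i32.

```rust
/// Hand-encodes a BSON document holding a single string field.
/// Hypothetical helper for illustration; not part of the bson crate.
fn encode_string_doc(key: &str, value: &str) -> Vec<u8> {
    let mut body = Vec::new();
    body.push(0x02); // element type: UTF-8 string
    body.extend_from_slice(key.as_bytes()); // field name...
    body.push(0x00); // ...terminated by a NUL byte

    // The string length is a little-endian i32 that counts the trailing NUL.
    let str_len = (value.len() + 1) as i32;
    body.extend_from_slice(&str_len.to_le_bytes());
    body.extend_from_slice(value.as_bytes());
    body.push(0x00); // string terminator
    body.push(0x00); // end of document

    // The total size is a little-endian i32 that includes its own four bytes.
    let total = (body.len() + 4) as i32;
    let mut out = total.to_le_bytes().to_vec();
    out.extend_from_slice(&body);
    out
}
```

Calling `encode_string_doc("hello", "world")` yields the 22 bytes listed above, which is where the leading `\x16` (decimal 22) comes from.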

BSON is the primary data representation for MongoDB, and this crate is used in the mongodb driver crate in its API and implementation.

For more information about BSON itself, see bsonspec.org.

Usage

BSON values

Many different types can be represented as a BSON value, including 32-bit and 64-bit signed integers, 64-bit floating point numbers, strings, datetimes, embedded documents, and more. To see a full list of possible BSON values, see the BSON specification. The various possible BSON values are modeled in this crate by the Bson enum.

Creating Bson instances

Bson values can be instantiated directly or via the bson! macro:

let string = Bson::String("hello world".to_string());
let int = Bson::Int32(5);
let array = Bson::Array(vec![Bson::Int32(5), Bson::Boolean(false)]);

let string: Bson = "hello world".into();
let int: Bson = 5i32.into();

let string = bson!("hello world");
let int = bson!(5);
let array = bson!([5, false]);

bson! supports both array and object literals, and it automatically converts any values specified to Bson, provided they are Into<Bson>.

Bson value unwrapping

Bson has a number of helper methods for accessing the underlying native Rust types. These helpers can be useful in circumstances in which the specific type of a BSON value is known ahead of time.

e.g.:

let value = Bson::Int32(5);
let int = value.as_i32(); // Some(5)
let bool = value.as_bool(); // None

let value = bson!([true]);
let array = value.as_array(); // Some(&Vec<Bson>)

BSON documents

BSON documents are ordered maps of UTF-8 encoded strings to BSON values. They are logically similar to JSON objects in that they can contain subdocuments, arrays, and values of several different types. This crate models BSON documents via the Document struct.

Creating Documents

Documents can be created directly either from a byte reader containing BSON data or via the doc! macro:

let mut bytes = hex::decode("0C0000001069000100000000").unwrap();
let doc = Document::decode(&mut bytes.as_slice()).unwrap(); // { "i": 1 }

let doc = doc! {
   "hello": "world",
   "int": 5,
   "subdoc": { "cat": true },
};

doc! works similarly to bson!, except that it always returns a Document rather than a Bson.
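
To see what the hex string in the first example contains, here is a small standard-library sketch that parses a document holding exactly one int32 field. `decode_single_i32` is a hypothetical helper for illustration only; the crate's own decoder handles every element type and reports proper errors rather than returning None.

```rust
use std::convert::TryInto;

/// Parses a BSON document that holds exactly one int32 field.
/// Hypothetical helper; returns None for anything else or for malformed input.
fn decode_single_i32(bytes: &[u8]) -> Option<(String, i32)> {
    // The first four bytes are the total document size, little-endian.
    let size = i32::from_le_bytes(bytes.get(..4)?.try_into().ok()?);
    if size < 0 || size as usize != bytes.len() {
        return None;
    }
    // 0x10 is the element type for a 32-bit integer.
    if *bytes.get(4)? != 0x10 {
        return None;
    }
    // The field name is a NUL-terminated string.
    let rest = &bytes[5..];
    let nul = rest.iter().position(|&b| b == 0)?;
    let key = String::from_utf8(rest[..nul].to_vec()).ok()?;
    // The value is four little-endian bytes.
    let value_bytes = rest.get(nul + 1..nul + 5)?;
    let value = i32::from_le_bytes(value_bytes.try_into().ok()?);
    // Exactly one byte must remain: the 0x00 document terminator.
    if rest.len() != nul + 6 || rest[nul + 5] != 0x00 {
        return None;
    }
    Some((key, value))
}
```

Feeding it the twelve bytes `0C 00 00 00 10 69 00 01 00 00 00 00` gives the key "i" and the value 1, matching the `{ "i": 1 }` comment in the example.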

Document member access

Document has a number of methods on it to facilitate member access:

let doc = doc! {
   "string": "string",
   "bool": true,
   "i32": 5,
   "doc": { "x": true },
};

// attempt to get values as untyped Bson
let none = doc.get("asdfadsf"); // None
let value = doc.get("string"); // Some(&Bson::String("string"))

// attempt to get values with explicit typing
let string = doc.get_str("string"); // Ok("string")
let subdoc = doc.get_document("doc"); // Ok(&Document({ "x": true }))
let error = doc.get_i64("i32"); // Err(...)
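
The difference between the untyped and typed accessors can be illustrated with a toy model. Everything in this sketch (`ToyDoc`, `Value`, `AccessError`) is hypothetical and standard-library only; it is not the crate's implementation, just one way to picture why `get` returns an Option while the typed getters return a Result that separates a missing key from a value of the wrong type.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Int32(i32),
    Bool(bool),
}

#[derive(Debug, PartialEq)]
enum AccessError {
    NotPresent,     // the key is not in the document
    UnexpectedType, // the key exists but holds a different type
}

/// A toy ordered document: the Vec preserves insertion order.
struct ToyDoc(Vec<(String, Value)>);

impl ToyDoc {
    /// Untyped access: Some(value) if the key exists, None otherwise.
    fn get(&self, key: &str) -> Option<&Value> {
        self.0.iter().find(|(k, _)| k.as_str() == key).map(|(_, v)| v)
    }

    /// Typed access: succeeds only if the key exists and holds a string.
    fn get_str(&self, key: &str) -> Result<&str, AccessError> {
        match self.get(key) {
            Some(Value::Str(s)) => Ok(s.as_str()),
            Some(_) => Err(AccessError::UnexpectedType),
            None => Err(AccessError::NotPresent),
        }
    }

    /// Typed access: succeeds only if the key exists and holds an int32.
    fn get_i32(&self, key: &str) -> Result<i32, AccessError> {
        match self.get(key) {
            Some(Value::Int32(i)) => Ok(*i),
            Some(_) => Err(AccessError::UnexpectedType),
            None => Err(AccessError::NotPresent),
        }
    }
}
```

In this model, asking for "i32" with `get_str` fails with UnexpectedType, which parallels the `get_i64("i32")` error in the example above.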

Modeling BSON with strongly typed data structures

While it is possible to work with documents and BSON values directly, it will often introduce a lot of boilerplate for verifying the necessary keys are present and their values are the correct types. serde provides a powerful way of mapping BSON data into Rust data structures largely automatically, removing the need for all that boilerplate.

e.g.:

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() {
    // Some BSON input data as a `Bson`.
    let bson_data: Bson = bson!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    // Deserialize the Person struct from the BSON data, automatically
    // verifying that the necessary keys are present and that they are of
    // the correct types.
    let mut person: Person = bson::from_bson(bson_data).unwrap();

    // Do things just like with any other Rust data structure.
    println!("Redacting {}'s record.", person.name);
    person.name = "REDACTED".to_string();

    // Get a serialized version of the input data as a `Bson`.
    let redacted_bson = bson::to_bson(&person).unwrap();
}

Any types that implement Serialize and Deserialize can be used in this way. Doing so helps separate the "business logic" that operates over the data from the (de)serialization logic that translates the data to/from its serialized form. This can lead to more clear and concise code that is also less error prone.

Breaking Changes

In the BSON specification, unsigned integer types such as u32 are unsupported. In older versions of this crate (< v0.8.0), using serde to serialize unsigned integer types into BSON stored them as Bson::Double. Since v0.8.0, this behavior has been removed, and serializing an unsigned integer type to BSON simply returns an error. #72

For backward compatibility, we've provided the module bson::compat::u2f to explicitly serialize unsigned integer types into BSON's floating point value as follows:

#[test]
fn test_compat_u2f() {
    #[derive(Serialize, Deserialize, Eq, PartialEq, Debug)]
    struct Foo {
        #[serde(with = "bson::compat::u2f")]
        x: u32
    }

    let foo = Foo { x: 20 };
    let b = bson::to_bson(&foo).unwrap();
    assert_eq!(b, Bson::Document(doc! { "x": Bson::Double(20.0) }));

    let de_foo = bson::from_bson::<Foo>(b).unwrap();
    assert_eq!(de_foo, foo);
}

In this example, we added an attribute #[serde(with = "bson::compat::u2f")] on field x, which will tell serde to use the bson::compat::u2f::serialize and bson::compat::u2f::deserialize methods to process this field.
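
Storing an unsigned integer as a double is only safe while the value fits in the 53-bit significand of an f64. The sketch below is independent of this crate and uses hypothetical helper names; it only demonstrates the arithmetic: every u32 survives the round trip, while large u64 values may not.

```rust
/// Converts a u32 to the f64 a BSON double would hold, then back.
fn u32_roundtrip(x: u32) -> u32 {
    let as_double = f64::from(x); // lossless: every u32 fits in 53 bits
    as_double as u32
}

/// The same round trip for u64, which can lose precision.
fn u64_roundtrip(x: u64) -> u64 {
    let as_double = x as f64; // rounds when x needs more than 53 bits
    as_double as u64
}
```

For example, `u32_roundtrip(u32::MAX)` returns `u32::MAX`, whereas `u64_roundtrip((1 << 53) + 1)` does not return its input. If exact 64-bit values matter, converting to a signed integer with a checked conversion such as `i64::try_from` may be a better fit than a double.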

Contributing

We encourage and would happily accept contributions in the form of GitHub pull requests. Before opening one, be sure to run the tests locally; check out the testing section for information on how to do that. Once you open a pull request, your branch will be run against the same testing matrix that we use for our continuous integration system, so it is usually sufficient to only run the integration tests locally against a standalone. Remember to always run the linter tests before opening a pull request.

Running the tests

Integration and unit tests

To actually run the tests, you can use cargo like you would in any other crate:

cargo test --verbose # runs against localhost:27017

Linter Tests

Our linter tests use the nightly version of rustfmt to verify that the source is formatted properly and the stable version of clippy to statically detect any common mistakes. You can use rustup to install them both:

rustup component add clippy --toolchain stable
rustup component add rustfmt --toolchain nightly

To run the linter tests, run the check-clippy.sh and check-rustfmt.sh scripts in the .evergreen directory:

bash .evergreen/check-clippy.sh && bash .evergreen/check-rustfmt.sh

Continuous Integration

Commits to master are run automatically on Evergreen.

Comments

  • Map undefined to None

    May 18, 2020

    Using bson with the current mongodb driver, if a Deserialize target has an Option<> field and the database holds an undefined value for it, the deserializer tries to deserialize undefined as the content of the Option. I think most people would expect undefined to be deserialized to None when possible.

    Also, could you consider making a release soon? Having any undefined value in the database makes it impossible to work with the current mongodb driver. Thanks.

  • Any reason for not providing a Uuid wrapper just like UtcDateTime?

    May 25, 2020

    Hi there! Thanks for all of your great work!

    In my project, I want to use Uuid as a document field, but it does not seem to be easy. When I serialize a Uuid into Bson, it is always serialized as a String because of its original implementation.

    I think this situation is pretty similar to DateTime<Utc> whose thin wrapper is provided by this crate.

    Is there any reason for not providing Uuid wrapper?

    If it is provided, it would be very useful for me.

    Sorry if I missed something. I'm pretty new to this crate.

    Thanks.

  • Bump dependencies for hex and md5 crates

    Jun 5, 2020

    • hex -> 0.4.2
    • md5 -> 0.7.0

    The dependency updates should be innocuous and bring us to the latest version of these crates.

  • Error converting a JSON string to a Document using Serde

    Jun 5, 2020

    Hi everyone, I am trying to convert a JSON string to a Document, but I get an error when a key's value is an integer. If I put the number in quotes, it works.

    This works: let d: Document = serde_json::from_str(r#"{ "id" : "1" }"#).unwrap();

    This doesn't work: let d: Document = serde_json::from_str(r#"{ "id" : 1 }"#).unwrap();

    I receive this message below:

    thread '' panicked at 'called Result::unwrap() on an Err value: Error("invalid type: integer 1, expected a signed integer", line: 1, column: 10)' note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

  • The ability to custom tag DateTime<Utc> for automatic conversion.

    Jun 9, 2020

    Here's an example of a collection with records expiring by expires_at:

    #[collection = "subscriptions"]
    #[derive(Serialize, Deserialize, Debug, Clone)]
    pub struct Subscription {
        #[serde(rename = "_id")]
        pub id: Option<ObjectId>,
        pub expires_at: DateTime<Utc>,
    }
    

    In order for that to work I need to use the Bson::UtcDateTime helper (now DateTime), but in that case when I deliver this model to the frontend I end up with

        "expires_at": {
            "$date": {
                "$numberLong": 1591700287095
            }
        },
    

    instead of a proper date. Duplicating the object just for the sake of date serialization is not the most convenient approach.

    What would be nice to have is some sort of annotation that would tell Bson Encoder/Decoder that this DateTime field has to be de/serialized as Bson date, while serde_json will serialize it as usual.

    The same applies to ObjectId, currently if one wants a readable hex string instead of a json object there's a lot of duplication involved.

  • How to merge two BSON documents?

    Jun 14, 2020

    let doc1 = doc! {
      "field1": "value1"
    };

    let doc2 = doc! {
      "field2": "value2"
    };
    

    Merged doc:

    let merged_doc = doc! {
      "field1": "value1",
      "field2": "value2",
    };
    

    I know there is an API for inserting a single field: doc1.insert("field2", "value2"); It would be great to have an API like this: doc1.merge(doc2)

  • How do I decode binary data?

    Mar 8, 2018

    Hi,

    I'm implementing an auth mechanism based on bson and sodiumoxide, and I'm having trouble decoding the challenge. Sodiumoxide treats plaintexts as a &[u8], so I'd like to use a binary string instead of a Unicode string for the challenge, but I can't seem to get bson to decode it correctly.

    Code

    With a "standard" string, everything works fine:

    #[derive(Serialize, Deserialize, Debug)]
    pub struct AuthChallenge {
        pub challenge: String,
    }
    
    // [...]
    
        let challenge: AuthChallenge = bson::from_bson(
            bson::Bson::Document(
                bson::decode_document(
                    &mut Cursor::new(&message.payload[..])
                ).chain_err(|| "Could not decode bson")?
            )
        ).expect("Decoding failed");
        info!("Challenge: {:?}", challenge.challenge);
    

    This yields:

    2018-03-08T08:22:04+01:00 - INFO - Received message ReqAuthentication (payload: [53, 0, 0, 0, 2, 99, 104, 97, 108, 108, 101, 110, 103, 101, 0, 33, 0, 0, 0, 49, 56, 55, 54, 50, 98, 57, 56, 98, 55, 99, 51, 52, 99, 50, 53, 98, 102, 57, 100, 99, 51, 49, 53, 52, 101, 52, 97, 53, 99, 97, 51, 0, 0])
    2018-03-08T08:22:04+01:00 - INFO - Challenge: "18762b98b7c34c25bf9dc3154e4a5ca3"
    

    However, if I change String to Vec<u8> in the struct and change the server side to send a binary string (5) instead of a standard string (2), I get this:

    2018-03-08T08:28:12+01:00 - INFO - Received message ReqAuthentication (payload: [53, 0, 0, 0, 5, 99, 104, 97, 108, 108, 101, 110, 103, 101, 0, 32, 0, 0, 0, 0, 54, 55, 98, 54, 100, 53, 50, 99, 50, 101, 48, 51, 52, 52, 56, 49, 98, 52, 57, 101, 102, 51, 56, 56, 101, 100, 100, 54, 51, 98, 50, 102, 0])
    thread 'main' panicked at 'Decoding failed: InvalidType("a sequence")', /checkout/src/libcore/result.rs:906:4
    

    I'm also having trouble encoding the signature, because to_bson would always complain that there are no unsigned types in bson, so I ended up doing the encoding manually:

        bson::Bson::Binary(
            bson::spec::BinarySubtype::Generic,
            Vec::from(&signature[..])
        )
    

    Am I doing it wrong, or does bson not currently support binary strings correctly? Can I help in fixing it somehow?

  • Implemented `Default` for `Bson`, `Document`, and `ObjectId`

    Sep 29, 2017

    Implemented Default for Bson, Document, and ObjectId. I also fixed some warnings with the tests.

  • how to deserialize or serialize a DateTime<UTC>?

    Mar 15, 2017

    I can't get it to work with https://serde.rs/custom-date-format.html.

  • Bugs in decoder found by fuzzing

    Mar 14, 2017

    Found the following:

    • [x] "thread '' panicked at 'No such local time'" From: chrono-0.2.25/src/offset/mod.rs:151 via src/decoder/mod.rs:172
    • [x] "thread '' panicked at 'attempt to multiply with overflow'" - src/decoder/mod.rs:172
    • [x] "thread '' panicked at 'attempt to subtract with overflow'" src/decoder/mod.rs:45
    • [ ] "AddressSanitizer failed to allocate 0xffffffff93000000 bytes" (whatever that means in real life)

    Full logs: https://gist.github.com/killercup/5e8623e0d8b0fe9868b45eb223ef51d8 (See last few lines for inputs used, in bytes or base64)

    See https://github.com/rust-fuzz/targets/pull/51 for sources, I ran it with

    $ env ASAN_OPTIONS="detect_odr_violation=0 allocator_may_return_null=1" ./run-fuzzer.sh bson read_bson
    

    cc https://github.com/rust-fuzz/targets/issues/39

  • use random byte array instead of process_id and machine_id

    Jan 14, 2019

    Hello o/

    I made a PR last week to add compatibility with WASM compilation, adding a WASM-specific method that uses random byte arrays in place of a call to libc and getting the hostname. In the discussion there, @saghm pointed out that MongoDB drivers are now transitioning to use 5 random bytes instead of process_id and machine_id (spec). This PR implements that.

    In addition to using random bytes, this PR also:

    • [x] removes getter method for process_id (spec outlines that random value must not be accessible)
    • [x] removes process_id and machine_id entirely
    • [x] adds a check to travis for a WASM build

    This would probably be a major version bump in semver terms, but I am not sure how you'd want to handle that (or deprecation, for that matter).

    Thank you for your time!

  • Bson to Document

    Sep 15, 2015

    What would be the best way to convert a bson to a document that is suitable for inserting into the database?
