Rust-PumpkinDB v0.2.0: PumpkinDB — an event sourcing database engine

icon
Latest Release: v0.2.0

In this release, the most notable change was splitting of PumpkinDB into multiple crates. This enables PumpkinDB users to build custom systems (embedded and client-server ones) that use PumpkinDB as an engine. Also, pumpkinscript and pumpkindb_client crates allow building Rust applications that use PumpkinDB as remote clients. pumpkindb_mio_server crate simplifies developing custom PumpkinDB server builds.

A confusing "word" term has been changed to "instruction" to provide a clearer indication of the concept behind it.

We've also introduced more conventions around numbers (especially signed and unsigned sized integers, as well as floats) and added respective instructions to the library.

Initial UUID support has been added. Cursor API surface has been reduced and is much easier to comprehend now.

Internally, a lot has changed. A new messaging system (it is now available not only through the server component, but pumpkindb_engine as well). Dispatcher trait allows defining custom sets of instruction "modules" and is flexible enough to support both static and dynamic dispatching.

And, of course, a number of bugs has been spotted and some of them were even fixed. We've also optimized a number of things and found out that there's a lot more to optimize in the upcoming versions!

This is the first release with published library crates on crates.io

Also, this release includes statically built versions of PumpkinDB server and terminal for Linux (using musl). This is an experimental build as in the future we might need to depend on some dynamically libraries.

New contributors: alex, Christoph Herzog, Guillaume Gauvrit, Matteo Semenzato and Stuart Hinson.

Thank you!

SHA-256 checksums:

| File | Checksum | |------|----------| | pumpkindb-v0.2.0-x86_64-apple-darwin.tar.gz | f7e81dfa0bfc02de31b4bd70c5b6a13e95a44550d33d53a77179f5c23a9efde4 | | pumpkindb-v0.2.0-x86_64-pc-windows-msvc.zip | cf4b0fb7e893fdcd4fc7881b01e7c4bf045594e85bdd73cc88d2fec8ac2418a1 | | pumpkindb-v0.2.0-x86_64-unknown-linux-gnu.tar.gz | efcca81f4b482d6a69bfe08f935d2da61c8f7e2262798dcd93a656aeb8c58e15 | | pumpkindb-v0.2.0-x86_64-unknown-linux-musl.tar.gz | 4dc32dbf05dbe0df727128c7455c5482d3366ebf1a32f7308d22120a53c67388 |

Source code(tar.gz)
Source code(zip)
pumpkindb-v0.2.0-x86_64-apple-darwin.tar.gz(2.19 MB)
pumpkindb-v0.2.0-x86_64-pc-windows-msvc.zip(1.87 MB)
pumpkindb-v0.2.0-x86_64-unknown-linux-gnu.tar.gz(3.92 MB)
pumpkindb-v0.2.0-x86_64-unknown-linux-musl.tar.gz(3.96 MB)

PumpkinDB

Gitter chat Code Triagers OpenCollective OpenCollective

Build status (Linux) Build Status
Build status (Windows) Windows Build status
Project status Usable, between alpha and beta
Production-readiness Depends on your risk tolerance

PumpkinDB is an immutable ordered key-value database engine, featuring:

  • ACID transactions
  • Persistent storage
  • An embedded programming language (PumpkinScript)
  • Binary keys and values (allows any encoding to be used: JSON, XML, Protobuf, Cap'n Proto, etc.)
  • Standalone and embedded scenarios

Why immutable?

Simply put, the data replaced is data deleted and is therefore, an unsafe way to manage data. Bugs, misunderstanding, changing scope and requirements and other factors might influence what data (and especially past data) means and how can it be used.

By guaranteeing the immutability of key's value once it is set, PumpkinDB forces its users to think of their data through a temporal perspective.

This approach is highly beneficial for implementing event sourcing and similar types of architectures.

What is PumpkinDB?

PumpkinDB is essentially a database programming environment, largely inspired by core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn't allow overriding values once they are set. Core motivation for immutability was that with the cost of storage declining, erasing data is effectively a strategical mistake.

While not intended for general purpose programming, its main objective is to facilitate building specialized application-specific and generic databases with a particular focus on immutability and processing data as close to storage as possible, incurring as little communication penalty as possible.

Applications communicate with PumpkinDB by sending small PumpkinScript programs over a network interface (or API when using PumpkinDB as an embedded solution).

PumpkinDB offers a wide array of primitives for concurrency, storage, journalling, indexing and other common building blocks.

Why is it a database engine?

The core ideas behind PumpkinDB stem from the so called lazy event sourcing approach which is based on storing and indexing events while delaying domain binding for as long as possible. That said, the intention of this database is to be a building block for different kinds of architectures, be it classic event sourcing (using it as an event store), lazy event sourcing (using indices) or anything else. It's also possible to implement different approaches within a single database for different parts of the domain.

Instead of devising custom protocols for talking to PumpkinDB, the protocol of communication has become a pipeline to a script executor. This offers us enormous extension and flexibility capabilities.

While an external application can talk to PumpkinDB over a network connection, PumpkinDB's engine itself is embeddable and can be used directly. Currenly, it is available for Rust applications only, but this may one day extend to all languages that can interface with C.

Client libraries

Language Library Status
Rust pumpkindb_client Early release (0.2.0)
Java pumpkindb-client Pre-release

Trying it out

You can download PumpkinDB releases from GitHub.

Docker

You can try out latest PumpkinDB HEAD revision by using a docker image:

$ docker pull pumpkindb/pumpkindb

Alternatively, you can build the image yourself:

$ docker build . -t pumpkindb/pumpkindb

Run the server:

$ docker run -p 9981:9981 -ti pumpkindb/pumpkindb
2017-04-12T02:52:47.440873517+00:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-12T02:52:47.440983318+00:00 INFO pumpkindb - Starting up
2017-04-12T02:52:47.441122740+00:00 INFO pumpkindb_engine::storage - Available disk space is approx. 56Gb, setting database map size to it
2017-04-12T02:52:47.441460231+00:00 INFO pumpkindb - Starting 4 schedulers
2017-04-12T02:52:47.442375937+00:00 INFO pumpkindb - Listening on 0.0.0.0:9981

Finally, connect to it using pumpkindb-term:

$ docker run -ti pumpkindb/pumpkindb pumpkindb-term 172.17.0.1:9981 # replace IP with the docker host IP

Building from the source code

You are also welcome to clone the repository and build it yourself. You will need Rust Nightly to do this. The easiest way to get it is to use rustup

$ rustup install nightly
$ rustup override set nightly # in PumpkinDB directory

After that, you can run PumpkinDB server this way:

$ cargo build --all
$ ./target/debug/pumpkindb
2017-04-03T10:43:49.667667-07:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-03T10:43:49.668660-07:00 INFO pumpkindb - Starting up
2017-04-03T10:43:49.674139-07:00 INFO pumpkindb_engine::storage - Available disk space is approx. 7Gb, setting database map size to it
2017-04-03T10:43:49.675759-07:00 INFO pumpkindb - Starting 8 schedulers
2017-04-03T10:43:49.676113-07:00 INFO pumpkindb - Listening on 0.0.0.0:9981

You can connect to it using pumpkindb-term:

$ ./target/debug/pumpkindb-term
Connected to PumpkinDB at 0.0.0.0:9981
To send an expression, end it with `.`
Type \h for help.
PumpkinDB> ["Name" HLC CONCAT "Jopn Doe" ASSOC COMMIT] WRITE.

PumpkinDB> ["Name" HLC CONCAT "John Doe" ASSOC COMMIT] WRITE.

PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP CURSOR/VAL] READ (Get last value).
"John Doe"
PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP DUP CURSOR/PREV DROP CURSOR/VAL] READ (Get previous value).
"Jopn Doe"

(The above example shows how one can query and navigate for values submitted at a different time, using low level primitives).

You can change some of the server's parameters by creating pumpkindb.toml:

[storage]
path = "path/to/db"
# By default, mapsize will equal to the size of
# available space on the disk, except on Windows,
# where default would be 1Gb.
# `mapsize` is a theoretical limit the database can
# grow to. However, on Windows, this also means that
# the database file will take that space.
# This parameter allows to specify the mapsize
# in megabytes.
# mapsize = 2048

[server]
port = 9981

Components

PumpkinDB project is split into a couple of separate components (crates):

  • pumpkinscript — PumpkinScript parser. Allows to convert text PumpkinScript form into binary one.
  • pumpkindb_engine — Core PumpkinDB library. Provides PumpkinScript scheduler, and a standard library of instructions
  • pumpkindb_mio_server — Async MIO-based PumpkinDB server library. Useful for building custom PumpkinProtocol-compatible servers.
  • pumpkindb_client — PumpkinProtocol client library.
  • pumpkindb_server — Stock PumpkinDB server. Built on top of pumpkindb_mio_server.
  • pumpkindb_term — console-based PumpkinDB server client.
  • doctests — a small utility to run instructions doctests.

Contributing

This project is in its very early days and we will always be welcoming contributors.

Our goal is to encourage frictionless contributions to the project. In order to achieve that, we use Unprotocols C4 process. Please read it, it will answer a lot of questions. Our goal is to merge pull requests as quickly as possible and make new stable releases regularly.

In a nutshell, this means:

  • We merge pull requests rapidly (try!)
  • We are open to diverse ideas
  • We prefer code now over consensus later

To learn more, read our contribution guidelines

We also maintain a list of issues that we think are good starters for new contributors.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

Sponsors

Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]

Comments

  • Problem: switching storage backends is nearly impossible
    Problem: switching storage backends is nearly impossible

    Jun 30, 2017

    Currently, lmdb is very entangled with PumpkinDB's code. Should we want to add another backend (existing or a new one we develop for PumpkinDB), it will require some serious modifications.

    help wanted PumpkinScript Storage dependency 
    Reply
  • Problem: different users would use different naming conventions
    Problem: different users would use different naming conventions

    Jun 30, 2017

    For example, they would use different prefixes for "constant" instructions, "private" instructions, namespaces, etc.

    Proposed solution: develop a "standard convention" proposal

    WhatCanIStartWith? 
    Reply
  • Problem:
    Problem: "builtins" files don't allow for computed constants

    Jun 30, 2017

    Currently, all "builtin" instruction files (currently using .builtins extension) with instruction / implementation pairs, handle these as rough equivalents of DEF. This works most of the time, except that in some cases it would be preferable to avoid recomputing values that are static in nature.

    Proposed solution: find a way to have a notation for a semantic equivalent of SET and implement it

    enhancement PumpkinScript WhatCanIStartWith? 
    Reply
  • Problem: PumpkinDB is too low-level
    Problem: PumpkinDB is too low-level

    Jul 4, 2017

    For most end-user cases, PumpkinDB is too "close to the metal" to be easily plugged into a business application, especially if "event sourcing" type of scenarios are considered.

    Proposed solution: implement a higher level database on top of PumpkinDB

    discussion 
    Reply
  • Problem: lazy builtins! macro defers syntax checking until run time
    Problem: lazy builtins! macro defers syntax checking until run time

    Jul 10, 2017

    Currently, builtins! uses lazy_static! to prepare BUILTINS map of instructions. This means that until handle_builtins is first called, nothing is going to get parsed. So if there's, for example, a parsing mistake, the whole engine will crash.

    Proposed solution: find a way to parse builtins during compile time

    PumpkinScript WhatCanIStartWith? 
    Reply
  • Problem: Rust nightly after 2017-06-20 affects benchmarks negatively
    Problem: Rust nightly after 2017-06-20 affects benchmarks negatively

    Aug 3, 2017

    Running cargo bench acker in the pumpkindb_engine crate:

    nightly-2017-06-20: (last known "good" nightly)

    test script::tests::ackermann                                                  ... bench:  32,562,056 ns/iter (+/- 4,313,209)
    test script::tests::ackermann_stack                                            ... bench:  24,391,250 ns/iter (+/- 4,758,534)
    

    Most likely "first offender"

    nightly-2017-06-21: (https://github.com/rust-lang/rust/commit/445077963)

    test script::tests::ackermann                                                  ... bench:  99,398,843 ns/iter (+/- 5,556,874)
    test script::tests::ackermann_stack                                            ... bench:  51,625,752 ns/iter (+/- 3,971,108)
    

    Random builds:

    2017-07-24:

    test script::tests::ackermann                                                  ... bench:  49,617,433 ns/iter (+/- 6,028,955)
    test script::tests::ackermann_stack                                            ... bench:  36,797,438 ns/iter (+/- 4,442,438)
    

    2017-07-01:

    test script::tests::ackermann                                                  ... bench:  97,953,353 ns/iter (+/- 5,756,554)
    test script::tests::ackermann_stack                                            ... bench:  51,599,955 ns/iter (+/- 3,756,541)
    

    2017-07-12:

    test script::tests::ackermann                                                  ... bench:  50,183,660 ns/iter (+/- 7,364,549)
    test script::tests::ackermann_stack                                            ... bench:  39,244,258 ns/iter (+/- 6,607,862)
    

    2017-06-23:

    test script::tests::ackermann                                                  ... bench: 111,698,229 ns/iter (+/- 17,495,313)
    test script::tests::ackermann_stack                                            ... bench:  58,807,694 ns/iter (+/- 4,591,336)
    

    Last build:

    2017-08-02:

    test script::tests::ackermann                                                  ... bench:  49,253,683 ns/iter (+/- 9,255,754)
    test script::tests::ackermann_stack                                            ... bench:  35,636,600 ns/iter (+/- 5,089,613)
    

    Proposed solution: figure out what happened in the first place, why did it get so much slower?

    Reply
  • Problem: Missing input / output elements from queue operations
    Problem: Missing input / output elements from queue operations

    Aug 3, 2017

    Queue operation doc pages are missing their input / output sections. A quick update to this will make it easier to understand their usage.

    Reply
  • Is this project abandoned?
    Is this project abandoned?

    Feb 9, 2018

    There hasn't been any activity for the last half year and documentation is still very sparse. Is this already dead? Looked like a useful software.

    Reply
  • Problem: does not build on Linux anymore
    Problem: does not build on Linux anymore

    Mar 21, 2020

    Hi, I worked on making PumpkinDB compile on Linux again. I also removed deprecation warnings along the way and adopted stabilized features regarding slice patterns in Rust. As I'm not very familiar with contributing to foreign code bases please check and provide me feedback in case this pull request is strange or wrong.

    I would like to see this pull request merge thus giving PumpkinDB kind of revival. What do you think?

    Thank you.

    Kind regards

    Tobias

    BTW I also checked build with Travis beforehand.

    Reply
  • `Read` on uninitialized buffer may cause UB ( `PacketReader::read()` )
    `Read` on uninitialized buffer may cause UB ( `PacketReader::read()` )

    Jan 31, 2021

    Hello :crab:, we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.

    Issue Description

    https://github.com/PumpkinDB/PumpkinDB/blob/f37b0b586357cafd2fd0eba8bea9072665af0b00/pumpkindb_client/src/packet.rs#L50-L55

    PacketReader::read() method creates an uninitialized buffer and passes it to user-provided Read implementation. This is unsound, because it allows safe Rust code to exhibit an undefined behavior (read from uninitialized memory).

    This part from the Read trait documentation explains the issue:

    It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

    How to fix the issue?

    The Naive & safe way to fix the issue is to always zero-initialize a buffer before lending it to a user-provided Read implementation. Note that this approach will add runtime performance overhead of zero-initializing the buffer.

    As of Jan 2021, there is not yet an ideal fix that works in stable Rust with no performance overhead. Below are links to relevant discussions & suggestions for the fix.

    Reply
  • ASL 2.0
    ASL 2.0

    Mar 1, 2017

    Is it possible to have ASL 2.0 also as a license?

    question 
    Reply
  • Problem: sized INTs with different signs don't compare well
    Problem: sized INTs with different signs don't compare well

    Apr 30, 2017

    Wherever possible, typing conventions in PumpkinScript enforce lexicographical value sorting, i.e. if a value is considered to be smaller, it should appear earlier during cursor iteration.

    Recent changes introduced in #258 break this convention:

    PumpkinDB> -100i8 +1i8 LT?.
    0x00 (incorrect)
    PumpkinDB> -100i8 -1i8 LT?.
    0x01 (correct)
    PumpkinDB> -2i8 -0i8 LT?.
    0x00 (incorrect)
    

    Proposed solution:

    I am not sure if there is a good solution to this, given how signed integers are encoded, but I think at the very least it is worth to highlight this fact (if we don't find any reasonable solution, we should explicitly document this).

    It might also be valuable to add INTxx/LT? and GT? since INT/LT? and GT? don't work on sized ints and LT and GT? don't produce desirable solution.

    Any other thoughts?

    cc @Matt8898

    Reply
  • Problem: expressing numbers in text form is inconvenient
    Problem: expressing numbers in text form is inconvenient

    Feb 10, 2017

    One has to make sure that they are hexadecimal and little-endian.

    Proposed solution: prohibit words to start with - and 0..9 and allow these to be interpreted as bigint/biguints.

    PumpkinScript 
    Reply
  • Problem: implementing words without subwords is difficult
    Problem: implementing words without subwords is difficult

    Feb 17, 2017

    Technically speaking, loaded words can define their own words but they will leak into the remainder of the program, which less than ideal.

    Using stack alone requires a significant amount of juggling that makes writing code incredibly frustrating. I personally believe that the value of stack based programming languages is in concatenative abilities (composition via stack), not just being able to do everything by juggling items in the stack.

    Solution: make all SETs and DEFs done within any closure local to that closure.

    This means that if eventually we'll need to do code injection that's not injecting what effectively amounts to closures, we'll need to have a separate pass result type that can indicate that.

    PumpkinScript 
    Reply
  • Problem: lexicographical sorting for signed integers doesn't always work
    Problem: lexicographical sorting for signed integers doesn't always work

    May 4, 2017

    Solution: Flip the first bit of sized INTs and encode unsized INTs as two's complement.

    Reply
  • Problem: Compile times are slow
    Problem: Compile times are slow

    Feb 26, 2017

    Compiling PumpkinDB takes a while, sometimes making it pretty annoying to work on.

    There are a few possible reasons as to why this could be:

    1 Lots of macros being used; 1 Lots of manual in-lining; 1 Code size.

    Maybe compile time can be brought down by breaking the project into sub-crates? I'm not sure how cargo goes about compiling then.

    build 
    Reply
  • Problem: in multi-VM setup, multiple write transactions are possible
    Problem: in multi-VM setup, multiple write transactions are possible

    Feb 13, 2017

    It is not a problem right now, but soon enough multi-VM setups will be a thing (use all the cores!) and they need to synchronize their access to writing, since lmdb only permits one active writer

    Storage 
    Reply
  • Problem: HLC invariants can be broken (#32) and too much use of static
    Problem: HLC invariants can be broken (#32) and too much use of static

    Mar 12, 2017

    Solution: This PR introduces a memory mapped meta database that is used by modules to persist whatever state they might need to reconstruct their previous state after a restart. For now this is only done for HLCs (see 77b2506).

    Additionally, remove excessive usage of lazy_static (06e1156, 5200d65, 4c7065f).

    Reply
  • Problem: wall clock going back in time between restarts will break HLC guarantee
    Problem: wall clock going back in time between restarts will break HLC guarantee

    Feb 13, 2017

    Basically, it means that after a restart values returned by HLC will be less than those generated before.

    Proposed solution: let timestamp module persist last generated HLC.

    Letting it store in the same database is rather intricate: there might be no write txn, the write txn might never succeed, etc. So I propose that PumpkinDB maintains a separate "meta" database within the environment with its own transactions.

    Any better solution (that would involve I/O penalty?)

    bug 
    Reply
  • Problem: lexicographical sorting for signed integers doesn't always work
    Problem: lexicographical sorting for signed integers doesn't always work

    May 2, 2017

    Due to the way signed integers are encoded (two's complement), even if we flip MSB to be 0 for negative and 1 for positive, it would "still incorrectly compare negative INTs, for example: 0b00000001 (-1) 0b00000010 (-2) Using LT? and GT? with this encoding would mean that -2 > -1 (the same problem we have right now with INTs), and changing LT? and GT? to take the sign bit into account would break comparisons between unsigned INTs, because 0b01111111(unsigned 127) would be interpreted as -127." @Matt8898 (see #265)

    It is an important issue as order key navigation is fairly critical to overall usefulness of PumpkinDB as a database engine.

    Proposed solution (long term): explore the direction of maintaining multiple b-trees or collation sequencing for key ranges.

    Any other, less invasive solutions?

    PumpkinScript 
    Reply