Rust-Weld: weld — High-performance runtime for data analytics applications

Weld

Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a common intermediate representation, and optimizing across each framework.

Modern analytics applications combine multiple functions from different libraries and frameworks to build complex workflows. Even though individual functions can achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. Weld solves this problem by lazily building up a computation for the entire workflow, and then optimizing and evaluating it only when a result is needed.
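
The lazy-evaluation idea can be illustrated with a toy sketch in plain Python. This is not Weld's API or IR: the LazyVec class and its map and evaluate methods are hypothetical names made up for this example. It only shows the general pattern of recording operations first and running them in one pass when a result is requested.

```python
from typing import Callable, List, Optional


class LazyVec:
    """Toy lazy vector (hypothetical, not part of Weld)."""

    def __init__(self, data: List[float],
                 ops: Optional[List[Callable[[float], float]]] = None) -> None:
        self._data = data
        self._ops = list(ops) if ops else []

    def map(self, fn: Callable[[float], float]) -> "LazyVec":
        # Record the operation; no data is touched yet.
        return LazyVec(self._data, self._ops + [fn])

    def evaluate(self) -> List[float]:
        # Apply every recorded operation in a single pass over the data,
        # so no intermediate list is materialized between steps.
        out = []
        for x in self._data:
            for fn in self._ops:
                x = fn(x)
            out.append(x)
        return out


pipeline = LazyVec([1.0, 2.0, 3.0]).map(lambda x: x + 1.0).map(lambda x: x * 2.0)
result = pipeline.evaluate()
print(result)  # [4.0, 6.0, 8.0]
```

In this sketch the two map calls cost nothing until evaluate() runs, which is where a real system would have the chance to optimize the whole pipeline.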

You can join the discussion on Weld on our Google Group or post on the Weld mailing list at [email protected].

Building

To build Weld, you need the latest stable version of Rust and LLVM/Clang++ 6.0.

To install Rust, follow the steps here. You can verify that Rust was installed correctly on your system by typing rustc into your shell. If you already have Rust and rustup installed, you can upgrade to the latest stable version with:

rustup update stable

MacOS LLVM Installation

To install LLVM on macOS, first install Homebrew. Then:

brew install llvm@6

Weld's dependencies require llvm-config on $PATH, so you may need to create a symbolic link so the correct llvm-config is picked up (note that you might need to add sudo at the start of this command):

ln -sf `brew --prefix llvm@6`/bin/llvm-config /usr/local/bin/llvm-config

To make sure this worked correctly, run llvm-config --version. You should see 6.0.x.

Ubuntu LLVM Installation

To install LLVM on Ubuntu, add the LLVM 6.0 apt repository for your release and then install the packages with apt-get:

On Ubuntu 16.04 (Xenial):

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-6.0 main"
sudo apt-get update
sudo apt-get install llvm-6.0-dev clang-6.0

On Ubuntu 14.04 (Trusty):

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-6.0 main"

# gcc backport is required on 14.04, for libstdc++. See https://apt.llvm.org/
sudo apt-add-repository "deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu trusty main"
sudo apt-get update
sudo apt-get install llvm-6.0-dev clang-6.0

Weld's dependencies require llvm-config, so you may need to create a symbolic link so the correct llvm-config is picked up. sudo may be required:

ln -s /usr/bin/llvm-config-6.0 /usr/local/bin/llvm-config

To make sure this worked correctly, run llvm-config --version. You should see 6.0.x or newer.
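
If you would rather script this check, the following is a minimal sketch. The helper names parse_llvm_version, is_supported, and check_installed_llvm are hypothetical, and the "6.0 or newer" threshold is taken from the sentence above rather than from Weld's build scripts.

```python
import re
import subprocess
from typing import Tuple


def parse_llvm_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from llvm-config --version output such as '6.0.1'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized llvm-config output: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: Tuple[int, int]) -> bool:
    # Assumption: 6.0.x or newer is acceptable, as stated in the text.
    return version >= (6, 0)


def check_installed_llvm() -> bool:
    """Run the llvm-config found on $PATH and report whether it looks usable."""
    output = subprocess.run(
        ["llvm-config", "--version"],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_supported(parse_llvm_version(output))
```

Calling check_installed_llvm() raises FileNotFoundError when no llvm-config is on $PATH, which usually means the symbolic link is still missing.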

You will also need zlib:

sudo apt-get install zlib1g-dev

Building Weld

With LLVM and Rust installed, you can build Weld. Clone this repository, set the WELD_HOME environment variable, and build using cargo:

git clone https://www.github.com/weld-project/weld
cd weld/
export WELD_HOME=`pwd`
cargo build --release

Weld builds two dynamically linked libraries (.so files on Linux and .dylib files on Mac): libweld and libweldrt.
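
To confirm the build produced both libraries, a small sketch like the one below can help. It assumes Cargo's default release output directory, target/release under WELD_HOME, which will not hold if CARGO_TARGET_DIR or a custom target directory is configured. The function names are hypothetical.

```python
import os
from typing import List


def expected_libraries(weld_home: str, system: str) -> List[str]:
    """Return the library paths a release build is expected to produce."""
    # .so on Linux and .dylib on macOS ("Darwin"), as described above.
    extensions = {"Linux": ".so", "Darwin": ".dylib"}
    if system not in extensions:
        raise ValueError(f"unsupported platform: {system}")
    # Assumption: Cargo's default output directory for --release builds.
    build_dir = os.path.join(weld_home, "target", "release")
    return [
        os.path.join(build_dir, name + extensions[system])
        for name in ("libweld", "libweldrt")
    ]


def missing_libraries(weld_home: str, system: str) -> List[str]:
    """List the expected libraries that are not present on disk."""
    return [p for p in expected_libraries(weld_home, system) if not os.path.isfile(p)]
```

For example, missing_libraries(os.environ["WELD_HOME"], platform.system()) (with platform imported) returns an empty list when both files exist.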

Finally, run the unit and integration tests:

cargo test

Documentation

The Rust Weld crate is documented here.

The docs/ directory contains documentation for the different components of Weld.

  • language.md describes the syntax of the Weld IR.
  • api.md describes the low-level C API for interfacing with Weld.
  • python.md gives an overview of the Python API.
  • tutorial.md contains a tutorial for how to build a small vector library using Weld.

Python Bindings

Weld's Python bindings are in python, with examples in examples/python.

Grizzly

Grizzly is a subset of Pandas integrated with Weld. Details on how to use Grizzly are in python/grizzly. Some example workloads that make use of Grizzly are in examples/python/grizzly. To run Grizzly, you will also need the WELD_HOME environment variable to be set, because Grizzly needs to find its own native library through this variable.
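
A script that uses Grizzly can fail early with a clear message when the variable is missing. The sketch below is only an illustration: resolve_weld_home is a hypothetical helper, not something Grizzly provides.

```python
import os
from typing import Mapping


def resolve_weld_home(env: Mapping[str, str]) -> str:
    """Return the normalized WELD_HOME path, or raise if it is unset or empty."""
    home = env.get("WELD_HOME", "")
    if not home:
        raise RuntimeError(
            "WELD_HOME is not set; export it to the root of the Weld checkout"
        )
    # Drop trailing slashes and redundant separators.
    return os.path.normpath(home)
```

Calling resolve_weld_home(os.environ) before importing Grizzly turns a missing variable into an explicit error instead of a library-loading failure.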

Testing

cargo test runs unit and integration tests. A test name substring filter can be used to run a subset of the tests:

cargo test <substring to match in test name>

Tools

This repository contains a number of useful command line tools which are built automatically with the main Weld repository, including an interactive REPL for inspecting and debugging programs. More information on those tools can be found under docs/tools.md.

Comments

  • Weld as a Foundation for ML in Rust

    Dec 1, 2019

    Hello, thank you for your work. I'm currently looking for a suitable solution for building Machine Learning foundation libraries in Rust.

    I'm evaluating how Weld could help in this space by generating optimised code for CPU and GPU by leveraging its IR. However, I couldn't find much information about how to use Weld to generate GPU code.

    Is this something in progress? Where can I find more info about it?

  • unable to import grizzly.grizzly

    Dec 10, 2019

    Hi, I followed the instructions for Mac installation and ran setup.py for "pyweld" and "grizzly". Importing grizzly.grizzly failed because grizzly.py could not import utils.py, seriesweld.py, and so on.
    The imports worked once I manually changed them to "import grizzly.utils" and "import grizzly.seriesweld". But then another error appeared: "dlopen(/usr/local/lib/python3.7/site-packages/grizzly/numpy_weld_convertor.dylib, 6): image not found". The same error also showed up for 'libweld.dylib', which I solved by manually copying the library to that path.

  • Is there any benchmark between pandas and grizzly?

    Jan 9, 2020

    thx

  • Incorrect website grizzly install command

    Mar 9, 2020

    According to https://www.weld.rs/grizzly/ the command to get started is $ pip install grizzly, but this doesn't seem to be correct: the grizzly package on PyPI is some sort of USB driver.

    I believe the documentation should say pip install pygrizzly, but that is failing for me, too.

  • "simplify_assignments" creates infinite loop with iterate

    Mar 18, 2020

    For example, the following code loops forever because of an incorrect variable deletion in the SIR:

    |e: vec[i8]|
    let lenString = len(e);
    iterate(lenString - 1L, |p| {p - 1L, p > 7L})
    
  • Build error while on Fedora

    Apr 25, 2020

    While running the python setup.py develop command under weld-python on Fedora, we hit the following error:

    Fresh uuid v0.7.4
           Fresh llvm-sys v60.4.1
       Compiling weld v0.4.0 (/home/zyu/weld/weld)
         Running `/home/zyu/weld/weld-python/target/debug/build/weld-6c83ee8deea2cb14/build-script-build`
    error: failed to run custom build command for `weld v0.4.0 (/home/zyu/weld/weld)`
    
    Caused by:
      process didn't exit successfully: `/home/zyu/weld/weld-python/target/debug/build/weld-6c83ee8deea2cb14/build-script-build` (exit code: 101)
    --- stdout
    cargo:rustc-env=BUILD_ID=dcbba9a
    
    cargo:rustc-link-lib=dylib=stdc++
    make: Entering directory '/home/zyu/weld/weld/llvmext'
    clang++-9.0 -O3 -fno-use-cxa-atexit -I/usr/include -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/include -std=c++11   -fno-exceptions -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -c llvmext.cpp -o /home/zyu/weld/weld-python/target/debug/build/weld-0807948382241f60/out/llvmext.o
    make: Leaving directory '/home/zyu/weld/weld/llvmext'
    
    --- stderr
    make: clang++-9.0: Command not found
    make: *** [Makefile:26: llvmext.o] Error 127
    thread 'main' panicked at 'assertion failed: status.success()', /home/zyu/weld/weld/build.rs:41:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    
    warning: build failed, waiting for other jobs to finish...
    
  • Python encoder

    Mar 22, 2017

    @sppalkia: just as an FYI, not ready to merge.

    Major issues right now:

    • Some benchmarks don't yet work correctly
    • Encoder / decoder much slower than C++ encoder / decoder

  • Annotations

    May 17, 2017

  • String ~ vec[i8] comparisons Python3

    Oct 30, 2018

    Am attempting in baloo to encode strings to Weld for e.g. sr[sr != 'abc'] to work, however there seems to be a bug somewhere. Are vec[i8] <comparison> vec[i8] expected to work correctly at the Weld level?

    For example:

    // _inp2 here is the index associated with the _inp0 strings data
    |_inp0: vec[vec[i8]], _inp1: vec[i8], _inp2: vec[i64]| let obj100 = (_inp0);
    let obj101 = (map(
        obj100,
        |a: vec[i8]| 
            a != _inp1
    ));
    result(
        for(
            zip(_inp2, obj101),
            appender[i64],
            |b: appender[i64], i: i64, e: {i64, bool}| 
                if (e.$1, 
                    merge(b, e.$0), 
                    b)
        )
    )
    

    This only seems to work when _inp1 is of length 1. So for:

    sr = Series(np.array(['abc', 'Burgermeister', 'b'], dtype=np.bytes_))
    sr[sr != 'b']  # will correctly return the first 2 elements
    sr[sr != 'abc']  # does not; (returns all elements)
    

    The most likely culprit is the encoding with Python3. The only changes I made are essentially moving from PyString_AsString and PyString_Size to the PyBytes_* equivalents (in the .cpp file) and encoding the str to utf-8, e.g. abc.encode('utf-8') (in the encoders.py file):

    extern "C"
    weld::vec<uint8_t> str_to_weld_char_arr(PyObject* in) {
      int64_t dimension = (int64_t) PyBytes_Size(in);
      weld::vec<uint8_t> t;
      t.size = dimension;
      t.ptr = (uint8_t*) PyBytes_AsString(in);
      return t;
    }
    
    ...
    if isinstance(obj, str):
            numpy_to_weld = self.utils.str_to_weld_char_arr
            numpy_to_weld.restype = WeldVec(WeldChar()).ctype_class
            numpy_to_weld.argtypes = [py_object]
    
            return numpy_to_weld(obj.encode('utf-8'))
    

    Note that

    1. En-/decoding numpy arrays of bytes works fine with the grizzly encoders (and using PyBytes_FromStringAndSize instead of PyString_FromStringAndSize).
    2. Also toyed around with modifying WeldChar.ctype_class to c_char_p as opposed to c_wchar_p which seemed more appropriate yet produces the same result.
    3. Encoding as ascii would probably be more appropriate, since Weld can't handle unicode from what I can tell. Nevertheless, the tested data is ascii.
    4. This is with the master branch Weld.

    Any feedback/idea on what the issue might be?

  • Apply transforms as part of optimization passes

    May 3, 2017

  • Build refactor

    Sep 27, 2017

    1. Removes the make commands from build.rs that were used to build the convertor dylib.
    2. Changes the package name from grizzly to pygrizzly.
    3. Adds a binary extension for libweld in pyweld/setup.py, which somehow allowed auditwheel to stop complaining about libweld.so being part of the python wheel. I could run auditwheel successfully and it changed the platform tag, which allowed me to upload to pypi.

  • Codegen cleanup

    Jan 3, 2017

    On leg 2 of my winter break journey across the US, I made a prototype of a new LLVM code generator using llvm-rs. Three major changes:

    1. Code generation through builders. Instead of generating LLVM code strings, builders provide a more type-safe (both at compile time and runtime) and concise means of creating an IR.
    2. New code execution runtime. The llvm-rs JitEngine is pretty comparable to what's already implemented in easy_ll, except it integrates well with the code builders instead of relying on a string intermediary.
    3. Revamped the REPL to actually produce output and use a command line parser.