Strymonas Streams: stream fusion, to completeness

strymonas is a streaming library design for OCaml and Scala that offers support for fast, bulk, in-memory processing. It is developed using the state of the art facilities of Multi-Stage Programming (MSP) for each language. The utmost goal of the library is to offer a streaming API that achieves stream fusion at the highest level without altering the compiler backend. It covers the combination of many interesting (and challenging) combinators. The OCaml flavor, depends on BER MetaOCaml, OCaml’s dialect for MSP. The Scala one depends on scala-lms.

News

OCaml repository Build Status Gitter Star

Scala repository Build Status Gitter Star

Getting Started

The project consists of three separate projects that you can clone separately:

git clone https://github.com/strymonas/staged-streams.ocaml
git clone https://github.com/strymonas/staged-streams.scala
git clone https://github.com/strymonas/java8-benchmarks

Prerequisites

To run each one of the three benchmark and test suites, please install the following:

  1. Java 8 (JVM version of at least 1.8.0 65): from your system’s package manager
  2. Install maven the Apache build manager for Java projects
  3. OCaml: from your system’s package manager
  4. OPAM: after you install OPAM you will need to initialize it with opam init and resolve any dependencies with:
    • either the opam depext command
    • or your system’s package manager (e.g., OML’s dependencies)
  5. sbt: install following sbt’s documentation

How to compile and run the benchmarks

MetaOCaml

The following will enable MetaOCaml, install dependencies, make the project and run unit tests.

$ opam update
$ opam switch 4.02.1+BER
$ eval "opam config env"
$ cd staging-streams.ocaml
$ opam pin add staged-streams http://github.com/strymonas/staged-streams.ocaml.git

To run the benchmarks via command line use the commands below:

$ ./benchmark_batteries.byte
$ ./benchmark_stream.byte

LMS

The following will fire-up the sbt console, compile and run the test suite.

$ cd staging-streams.scala
$ sbt
$ test # to run the property tests or
$ testOnly <name of test> # to run the property tests (tab completion works)

To run the benchmarks via the sbt command line:

$ cd staging-streams.scala
$ sbt
$ jmh:run -i 10 -wi 10 -f1 .* # to run all benchmarks

Java 8

To run the benchmarks via command line:

$ cd java8-benchmarks
$ mvn clean install
$ java -Xms6g -Xmx6g -XX:-TieredCompilation -jar target/benchmarks.jar -i 10 -wi 10 -f1 .* # to run all benchmarks

Hello World

Developers can use our library as any other streaming/collection library. We assume the reader is familiar with: i) streaming APIs and ii) Multi-Stage Programming.

Firstly, the API is similar with F#’s Seq type, Java 8’s Stream API, collection APIs from OCaml and OCaml Batteries and many more. Our library supports all the core combinators for pipeline construction. Creation of streams is realized with of_arr (from arrays) and unfold (for infinite streams). We support: fold, map, filter, take, flat_map and zip_with.

Secondly, Multi-Stage Programming (MSP) is the core technique that this library follows. MSP is a meta-programming technique that allows a disciplined and safe form of run-time code generation. Staging, refers to the distinction of stages between compile- and run-time with more stages in-between. When more information about run-time values is available, staging can make a program profit in performance, by partially evaluating.

In staged-streams, this kind of (dynamic) information consists of the structure of the pipeline, the in-between combinators and the bodies of the lambdas used.

We prompt the user to read about MSP in the paper A Gentle Introduction to Multi-stage Programming, on BER MetaOCaml and on Scala-Lightweight Modular Staging (LMS) which we use for our Scala implementation. The details of our technique and the benefits on the high level of stream fusion that is achieved, are included in the POPL17 paper.

Hello World in OCaml!

In the following example we create a simple stream from an array of six elements in OCaml using MetaOCaml’s staging annotations.

open Stream_combinators

let example =
    of_arr .<[| 0;1;2;3 |]>.
      |> filter (fun x   -> .<.~x mod 2 = 0>.)
      |> fold   (fun z a -> .<.~a :: .~z>.) .<[]>.

Runcode.run example;;

When the execution reaches the run method, an optimized version of the stream will be compiled (emitted) and executed (for optimal results, native code compilation must be used using ocamlopt). The resulted code will consist of a tight, for-loop.

Hello World in Scala!

With Scala LMS, the same example is demonstrated in the snippet below.

def example (xs : Rep[Array[Int]]) : Rep[Int] = Stream[Int](xs)
      .filter(d => d % 2 == 0)
      .fold(0, ((a : Rep[Int]) => (b : Rep[Int]) => a + b))

val example_s = compile(example)

example_s(Array(0, 1, 2, 3))

For more examples see the unit directories with the unit tests for OCaml and Scala.

Bugs and Feedback

To discuss bugs and improvements please use our Github Issues for each corresponding flavor, for discussions and recommendations please use our gitter pages.

Team