Getting started with the Streamlio Sandbox (preview)

Run a single-node sandbox preview version of Streamlio inside a Docker container

We’re currently hard at work building Streamlio’s enterprise-grade real-time platform. For now, you can try out Streamlio Sandbox (preview) locally on your laptop.

What the sandbox contains

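The Streamlio Sandbox (preview) combines three platforms into a single Docker image:

  • Apache Pulsar (incubating), for messaging
  • Heron, for stream processing
  • Apache BookKeeper, for durable log storage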

The Docker image also contains an example word count Heron topology. That topology does the following:

  • consumes randomly chosen sentences published to a Pulsar topic by a Pulsar producer
  • splits incoming sentences into individual words
  • counts occurrences of each word over an aggregation time interval
  • periodically publishes those counts to a Pulsar topic that is then read by a Pulsar consumer
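To make those steps concrete, here is a minimal pure-Python sketch of the same logic. It is not the sandbox's topology (that is written in Java and runs on Heron); the 10-second window length and the function names are assumptions chosen for illustration. It splits sentences into words, buckets the counts into tumbling time windows, and formats each count as a JSON string similar to the messages the consumer prints.

```python
import json
from collections import Counter, defaultdict


def count_words_by_window(events, window_seconds=10):
    """Count words per tumbling time window.

    `events` is an iterable of (timestamp_seconds, sentence) pairs.
    The window length is a hypothetical value; the real topology's
    interval may differ.
    """
    windows = defaultdict(Counter)
    for timestamp, sentence in events:
        # Align each event to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        # Split the sentence into individual words and count them.
        windows[window_start].update(sentence.split())
    return windows


def to_messages(counts):
    """Format one window's counts as JSON strings, one per word."""
    return [json.dumps({"word": word, "count": n}) for word, n in sorted(counts.items())]


if __name__ == "__main__":
    events = [
        (0.5, "the cow jumped over the moon"),
        (3.2, "i am at two with nature"),
        (11.0, "the cow jumped over the moon"),
    ]
    for start, counts in sorted(count_words_by_window(events).items()):
        print(start, to_messages(counts))
```

In a real stream the "periodically publishes" step would emit each window once it closes; the sketch simply returns all windows at the end.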

Here’s a diagram of the sandbox:

Figure 1. Architecture of the Streamlio Sandbox

The producer.py and consumer.py scripts are processes that you’ll run from outside the Docker container; the gray section in the middle shows everything inside the container.

Setup

Your initial setup steps depend on how you choose to run the sandbox. Whichever method you use, you'll need Python 2.7 or later and the pulsar-client library, which you can install using pip:

$ pip install pulsar-client
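Before you continue, you can confirm that your interpreter can find the client library. This small sketch assumes Python 3 and relies on the fact that the pulsar-client package installs a module named `pulsar`; it only checks importability and does not connect to anything.

```python
import importlib.util
import sys


def pulsar_client_available():
    """Return True if the `pulsar` module (from pulsar-client) can be imported."""
    return importlib.util.find_spec("pulsar") is not None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if pulsar_client_available():
        print("pulsar-client is installed")
    else:
        print("pulsar-client not found; run: pip install pulsar-client")
```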

Running the sandbox

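There are three ways to run the sandbox:

  • using the prebuilt Docker image from Docker Hub
  • on a Kubernetes cluster
  • by building the Docker image from source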

Run the sandbox image using Docker

To run the Streamlio sandbox using Docker, you'll need to have Docker installed for your platform.

The Docker image for the Streamlio sandbox is available via Docker Hub. You can run it using this command:

$ docker run -d \
  --name streamlio-sandbox \
  -p 9000:9000 \
  -p 8889:8889 \
  -p 6650:6650 \
  -p 8080:8080 \
  -p 8000:8000 \
  streamlio/sandbox

If you’d prefer to build the Docker image from source rather than pulling from Docker Hub, see the instructions below.

To confirm that the container is running, use docker ps, which should output something like this:

CONTAINER ID        IMAGE               ...
c90100be5ea8        streamlio/sandbox   ...
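If you'd rather script that check, here is a sketch that shells out to the Docker CLI with Python's `subprocess` module. It uses the `docker ps` options `--filter` and `--format`; the helper name is hypothetical, and it returns False when Docker isn't installed or the daemon can't be reached.

```python
import shutil
import subprocess


def container_is_running(name="streamlio-sandbox"):
    """Return True if a running container with exactly this name exists."""
    if shutil.which("docker") is None:
        return False  # Docker CLI not installed
    try:
        result = subprocess.run(
            ["docker", "ps", "--filter", "name=" + name, "--format", "{{.Names}}"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    if result.returncode != 0:
        return False  # e.g. the Docker daemon is not running
    # The name filter also matches partial names, so compare exactly.
    return name in result.stdout.split()


if __name__ == "__main__":
    print("running" if container_is_running() else "not running")
```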

Shut down and remove the container

Once you’re finished experimenting with the Streamlio sandbox, you can kill the running container:

$ docker kill streamlio-sandbox

Once the container has stopped, you can remove it:

$ docker rm streamlio-sandbox

Run the sandbox on Kubernetes

You can run the Streamlio sandbox on a running Kubernetes cluster using just a few kubectl commands. First, apply the YAML configuration:

$ kubectl apply -f \
  https://raw.githubusercontent.com/streamlio/sandbox/master/kubernetes/streamlio-sandbox.yaml

The streamlio/sandbox Docker image is fairly large, so it may take a minute or more to pull the image and start the pod. You can watch the progress like this:

$ kubectl get pods -w -l app=streamlio-sandbox
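If you'd rather wait from a script than watch interactively, this sketch polls the pod's phase with `kubectl get pods`, using the same label selector plus a JSONPath expression for `.status.phase`. The helper names, timeout, and polling interval are assumptions, and it requires `kubectl` to be configured for your cluster.

```python
import shutil
import subprocess
import time


def pod_phase(label="app=streamlio-sandbox"):
    """Return the phase (e.g. 'Pending', 'Running') of the first matching pod, or None."""
    if shutil.which("kubectl") is None:
        return None
    try:
        result = subprocess.run(
            ["kubectl", "get", "pods", "-l", label,
             "-o", "jsonpath={.items[0].status.phase}"],
            capture_output=True, text=True, timeout=15,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    phase = result.stdout.strip()
    # A non-zero exit (e.g. no matching pod) or empty output means "unknown".
    return phase if result.returncode == 0 and phase else None


def wait_until_running(timeout_seconds=300, interval_seconds=5):
    """Poll until the pod reports 'Running' or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if pod_phase() == "Running":
            return True
        time.sleep(interval_seconds)
    return False
```

Calling `wait_until_running()` before the port-forward step blocks until the pod is up, or returns False after five minutes.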

Once the STATUS column shows Running, you can connect to the pod using kubectl's port-forward command:

$ kubectl port-forward \
  $(kubectl get pods \
    -l app=streamlio-sandbox \
    -o=jsonpath='{.items[0].metadata.name}') \
  9000:9000 \
  8889:8889 \
  6650:6650 \
  8080:8080 \
  8000:8000

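This forwards every port the example needs from the pod to localhost. The port-forward command keeps running in the foreground, so leave it running (in a separate terminal, for example) while you continue with the rest of the example.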

When you’re finished, you can remove the sandbox from your cluster:

$ kubectl delete -f \
  https://raw.githubusercontent.com/streamlio/sandbox/master/kubernetes/streamlio-sandbox.yaml

Build and run the sandbox image from source

If you prefer to build the Docker image from source rather than pulling from Docker Hub, follow the instructions below.

Maven

Maven is required to build the Heron topology from source (the topology is written in Java). If Maven isn't installed on your system, follow the installation instructions on the Apache Maven website.

Clone the repo

$ git clone https://github.com/streamlio/sandbox.git
$ cd sandbox

Build and copy the Heron Topology

$ mvn package
$ cp target/heron-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar ./docker/

Build the Docker Image

$ docker build docker/ -t streamlio-sandbox:latest

Check to make sure that the image has been created:

$ docker images streamlio-sandbox

Run a container based on the image

$ docker run -d \
  --name streamlio-sandbox \
  -p 9000:9000 \
  -p 8889:8889 \
  -p 6650:6650 \
  -p 8080:8080 \
  -p 8000:8000 \
  streamlio-sandbox:latest

Check that the container is running using docker ps.

Ports explanation

As you can see, the container publishes several ports. The table below lists the component that uses each one.

Component           Port
----------------    ----
Heron API Server    9000
Heron UI            8889
Pulsar Broker       6650
Pulsar Admin        8080
Pulsar UI           8000
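With the container (or kubectl port-forward) running, you can check that each of these ports accepts TCP connections on localhost. This sketch uses only the standard `socket` module and copies the port mapping from the table; a successful connection shows only that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports from the table above.
SANDBOX_PORTS = {
    "Heron API Server": 9000,
    "Heron UI": 8889,
    "Pulsar Broker": 6650,
    "Pulsar Admin": 8080,
    "Pulsar UI": 8000,
}


def port_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_ports(ports=SANDBOX_PORTS):
    """Map each component name to whether its port is reachable."""
    return {name: port_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, port in SANDBOX_PORTS.items():
        status = "open" if port_open(port) else "closed"
        print(f"{name:<18}{port:>6}  {status}")
```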

Run the producer and consumer scripts

The sandbox repository includes two Python scripts, producer.py and consumer.py, that act as a Pulsar producer and consumer, respectively. You can fetch them like this:

$ wget https://raw.githubusercontent.com/streamlio/sandbox/master/producer.py
$ wget https://raw.githubusercontent.com/streamlio/sandbox/master/consumer.py

Once the sandbox is running, start the consumer. Give the sandbox a few seconds to finish starting up first:

$ python consumer.py

If you get an error along the lines of Exception: Pulsar error: ConnectError, wait a few seconds and try again. If that doesn't work, run docker ps (or kubectl get pods if you're using Kubernetes) to check the status of the sandbox.
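If you write your own client scripts, a small retry loop can absorb that startup delay. In this sketch the helper name, attempt count, and delay are assumptions; `connect` stands for whatever call fails while the broker is still starting, such as a function that creates a pulsar-client consumer.

```python
import time


def call_with_retry(connect, attempts=5, delay_seconds=3.0, retry_on=(Exception,)):
    """Call `connect()` until it succeeds, waiting between failed attempts.

    The default catches any Exception because the error quoted above
    ("Exception: Pulsar error: ConnectError") is a plain Exception.
    Narrow `retry_on` if your client raises a more specific type.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            if attempt == attempts:
                raise  # give up and surface the last error
            print(f"attempt {attempt} failed ({exc}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)
```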

Initially, no messages will be published to the topic that the consumer is listening on. This will change when you start up the producer:

$ python producer.py

Once you start up the producer, you should begin to see messages like this via the consumer:

2017-08-13 16:23:44,561 INFO : Received message '{ "word" : "am" , "count" : 11 }'
2017-08-13 16:23:44,561 INFO : Received message '{ "word" : "an" , "count" : 11 }'
2017-08-13 16:23:44,563 INFO : Received message '{ "word" : "the" , "count" : 42 }'
2017-08-13 16:23:44,565 INFO : Received message '{ "word" : "doctor" , "count" : 11 }'
2017-08-13 16:23:44,565 INFO : Received message '{ "word" : "with" , "count" : 11 }'
2017-08-13 16:23:44,577 INFO : Received message '{ "word" : "moon" , "count" : 10 }'
2017-08-13 16:23:44,578 INFO : Received message '{ "word" : "at" , "count" : 11 }'
2017-08-13 16:23:44,578 INFO : Received message '{ "word" : "snow" , "count" : 11 }'
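Each message payload is a small JSON document. A consumer that receives and decodes these messages can be sketched with the pulsar-client API (`pulsar.Client`, `Client.subscribe`, `Consumer.receive`, `Consumer.acknowledge`). This is not the contents of consumer.py: the topic name `persistent://sample/standalone/ns1/wordcount` is inferred from the dashboard namespace described later in this guide, and the subscription name is hypothetical. Decoding lives in its own function so it can be tried without a broker.

```python
import json

# Assumed values: check consumer.py for the names the sandbox actually uses.
SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://sample/standalone/ns1/wordcount"
SUBSCRIPTION = "wordcount-sketch"  # hypothetical subscription name


def decode_count(payload):
    """Turn a payload such as b'{ "word" : "am" , "count" : 11 }' into ('am', 11)."""
    record = json.loads(payload.decode("utf-8"))
    return record["word"], record["count"]


def consume_forever():
    """Print word counts as they arrive; requires a running sandbox."""
    import pulsar  # imported here so decode_count works without pulsar-client

    client = pulsar.Client(SERVICE_URL)
    consumer = client.subscribe(TOPIC, SUBSCRIPTION)
    try:
        while True:
            msg = consumer.receive()  # blocks until a message arrives
            word, count = decode_count(msg.data())
            print(word, count)
            consumer.acknowledge(msg)
    finally:
        client.close()
```

Call `consume_forever()` while the sandbox is running and stop it with Ctrl+C.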

The producer, in turn, should be producing output like this:

2017-08-13 16:23:39,156 INFO : Sending message - four score and seven years ago
2017-08-13 16:23:39,217 INFO : Sending message - i am at two with nature
2017-08-13 16:23:39,277 INFO : Sending message - i am at two with nature
2017-08-13 16:23:39,338 INFO : Sending message - four score and seven years ago
2017-08-13 16:23:39,398 INFO : Sending message - an apple a day keeps the doctor away
2017-08-13 16:23:39,461 INFO : Sending message - the cow jumped over the moon
2017-08-13 16:23:39,524 INFO : Sending message - snow white and the seven dwarfs
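The sentences in this output repeat, which suggests the producer picks from a fixed set at random. A producer along those lines can be sketched with `Client.create_producer` and `Producer.send`. The sentence list is taken from the output above; the topic name `persistent://sample/standalone/ns1/sentences` and the roughly 60 ms send interval (read off the log timestamps) are assumptions, not values confirmed from producer.py.

```python
import random
import time

# Assumed values: check producer.py for the names the sandbox actually uses.
SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://sample/standalone/ns1/sentences"

SENTENCES = [
    "four score and seven years ago",
    "i am at two with nature",
    "an apple a day keeps the doctor away",
    "the cow jumped over the moon",
    "snow white and the seven dwarfs",
]


def next_sentence(rng=random):
    """Pick one of the sample sentences at random."""
    return rng.choice(SENTENCES)


def produce_forever(interval_seconds=0.06):
    """Publish random sentences until interrupted; requires a running sandbox."""
    import pulsar  # imported here so next_sentence works without pulsar-client

    client = pulsar.Client(SERVICE_URL)
    producer = client.create_producer(TOPIC)
    try:
        while True:
            sentence = next_sentence()
            print("Sending message -", sentence)
            producer.send(sentence.encode("utf-8"))  # payloads are bytes
            time.sleep(interval_seconds)
    finally:
        client.close()
```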

If your output looks something like that, then the sandbox is working! That means that you now have an end-to-end, real-time, stateful processing platform powered by Apache Pulsar (incubating), Heron, and Apache BookKeeper running on your laptop.

Examine the running topology

The Heron UI is a browser dashboard that you can use to examine numerous aspects of running topologies. While the Docker image is running, you can access the UI at http://localhost:8889. The Heron UI page for the word count topology should look like this:

Figure 2. The Heron UI

Examine Pulsar topics

The Pulsar Dashboard gives you insight into Pulsar topics. The sandbox uses two topics, sentences and wordcount. To see stats for both, open http://localhost:8000/stats/namespace/sample/standalone/ns1 in your browser.

The Pulsar Dashboard updates once every minute.
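If you want fresher numbers than the dashboard provides, the Pulsar Admin port (8080) serves topic stats over REST. This sketch assumes the v1 admin path layout used by Pulsar releases of this period, `/admin/persistent/{property}/{cluster}/{namespace}/{topic}/stats`, with the property, cluster, and namespace taken from the dashboard URL above. If the request returns 404, check the admin REST API documentation for your Pulsar version.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8080"  # Pulsar Admin port from the ports table


def topic_stats_url(topic, property_="sample", cluster="standalone", namespace="ns1"):
    """Build the (assumed) v1 admin URL for a persistent topic's stats."""
    return f"{ADMIN_URL}/admin/persistent/{property_}/{cluster}/{namespace}/{topic}/stats"


def fetch_topic_stats(topic):
    """GET the stats JSON for a topic; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(topic_stats_url(topic), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_topic_stats("sentences").get("msgRateIn")` should return the inbound message rate, assuming the stats response includes the `msgRateIn` field.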

You can see the input and output topics in Pulsar:

Figure 3. Pulsar Dashboard topics page

You can also drill down into the stats of the input topic queue (named sentences):

Figure 4. Pulsar Dashboard sentences topic drilldown

You can also look at the wordcount topic, which contains the word count results:

Figure 5. Pulsar Dashboard word count topic drilldown