What is VALO?

Valo is a distributed computation engine for streams of data. It combines analytical functions and machine learning algorithms with a unified approach to real-time and historical queries, backed by heterogeneous storage engines and an advanced execution engine, all within a single easy-to-deploy platform.

How can I get started?

Just click the button to download a Valo node and get started. All the APIs are ready to use out of the box, from creating streams of data to running queries.

The only prerequisite is the Oracle JRE 8. Valo runs on Linux, Windows and OS X.

Note that the download is for evaluation purposes only. See our pricing plans and the limitations on clustering.

I've got a question for you; how can I get in touch?

Great! Sling us a message at info@valo.io and we’ll be in touch.

You can also find us at the meet-ups we support in Malaga: Malaga Scala Developers, Malaga JUG, Yes We Tech and DataBeersMLG.

Can Valo do both real-time and historical analysis?

You bet. Queries are handled by a new generation of execution engine which automatically runs the same processing in memory on real-time data streams or against the built-in storage engines. This means excellent locality, running the analysis against the data on disk and taking advantage of any indices present.

No complex interaction between multiple systems or nasty surprises after streaming terabytes of data across the network.

In most cases, moving from real-time to historical is as simple as adding “from historical” to the start of the query.
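As an illustrative sketch (the stream path and field names here are hypothetical examples, not taken from the official documentation), a real-time query and its historical counterpart might look like:

```
// real-time: runs in memory against the live stream
from /streams/demo/infrastructure/cpu
select timestamp, usage

// historical: the same query, run against the storage engines
from historical /streams/demo/infrastructure/cpu
select timestamp, usage
```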

What sort of data is handled by the storage engines?

Data ingestion is simple: just an HTTP POST. We support JSON, CSV, YAML and CBOR documents. To perform historical queries, you need to decide which repository to use. We currently have two:

  • Semi-Structured Repo - indexes document-style data using Lucene
  • Time-Series Repo - column-oriented store

Selecting the repository that most closely matches the structure of your data allows Valo to execute your queries as optimally as possible. See Repositories for more information.

What algorithms are built-in? Can I add my own?

There are loads of built-in functions and algos, ranging from CountDistinct through linear statistics to online anomaly detection in time series and adaptive histograms.

The SDK to add custom functions will be available in the second half of 2016. Once added, custom functions are treated as “native”, optimally distributed and executed just like the built-ins.

How do I access the Valo API? What languages do you support?

Valo is accessible through an HTTP REST API, usable from any language, and ships with a wealth of command-line tools.

Query results are returned as an EventStream and can be consumed directly by JavaScript running in a browser.

We’re putting the finishing touches to a Scala client, with Java and .NET coming this year. We also plan to integrate with Python soon.

Get in touch at info@valo.io and let us know what languages you’d like to see supported.

This thing is awesome; how can I take it to production?

Please get in touch at info@valo.io. The single node can handle a lot of data, but for full scalability and resiliency you’ll need to be running a cluster of nodes. We’re looking to run trial deployments in the first half of 2016 to keep risk to a minimum.

See Cluster Architecture for more information.

How do I upload data to Valo?

If you already have a stream created, it’s as simple as an HTTP POST:

curl -X POST -H "Content-Type: application/json" -d '{"hello": "world"}' http://<valo-host>:<port>/streams/<tenant>/<collection>/<stream>

Replace the placeholder URL with the URI of your stream.

Creating the stream is just as easy. See the Stream Reference.
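The same POST is easy to script. Here is a minimal Python sketch using only the standard library; the endpoint URL is a placeholder, not an official path, so substitute your node's host, port and stream URI:

```python
import json
from urllib import request

# Placeholder endpoint: substitute your node's host, port and stream URI.
VALO_URL = "http://localhost:8888/streams/demo/infrastructure/hello"

# The event to ingest, serialised as a JSON document.
event = {"hello": "world"}
body = json.dumps(event).encode("utf-8")

# Build the POST request with the JSON content type.
req = request.Request(
    VALO_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a Valo node running, send it:
# with request.urlopen(req) as resp:
#     print(resp.status)
```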

We’ve also got a quick start guide for feeding Valo from Logstash.

How do I run a query?

Queries are written in our query language. It’s been designed to be familiar to users of SQL and LINQ whilst succinctly representing the operations available.

Submitting a query and consuming its results takes just two HTTP requests.
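As a sketch of that two-request flow, here is how it might look from Python. The endpoints, payload shape and query text below are assumptions for illustration, not the documented API; the network calls are left commented out so the sketch stands alone:

```python
import json
from urllib import request

BASE = "http://localhost:8888"  # placeholder host and port

# 1) Register the query (the payload shape is illustrative).
query = {"query": "from /streams/demo/infrastructure/cpu select timestamp, usage"}
create = request.Request(
    BASE + "/execution/session",  # hypothetical endpoint
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# 2) Start it and consume the resulting EventStream.
start = request.Request(
    BASE + "/execution/session/<id>/start",  # hypothetical endpoint
    method="PUT",
)

# With a running node:
# request.urlopen(create)   # returns the session id
# request.urlopen(start)    # begins streaming results
```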

Valo is a distributed system. What's the cluster architecture?

Valo is highly available for writes and eventually consistent (an AP system). There is no single point of failure, and stream data is replicated across multiple nodes.

It is elastically scalable, allowing the addition of new nodes to handle increased demand and storage.

Clustering is undergoing extensive testing right now to check these properties hold up against production loads. Please get in touch with us for more information.