SCALABLE AND RESILIENT BIG DATA ANALYTICS

Combining analytical functions and machine learning with an advanced execution engine and big data storage

It’s fast, seriously fast

Valo’s unique architecture and stream processing ability mean this system truly is real-time.

Machine learning algorithms

Bring your own algorithms or use in-built analytical functions to perform complex analysis.

Big data storage

Take advantage of the immutable, append-only data model to optimally index and retrieve time series and semi-structured data.

Highly available and scalable

All nodes have their own computation and storage ability, making Valo scalable and resilient while maintaining high throughput and low latency.

All of your data and analytics in one place

Multiple sources and types of data are easily ingested, and query results can be used to create new streams of data to query against.

Open API

Easy to integrate with your existing tools and architectures for both input and output of your data and analytics.

ARCHITECTURE

Data organisation

Valo treats all data as streams: immutable, append-only sequences of events. Immutability ensures data integrity and a comprehensive audit trail.
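
As a rough illustration of what an immutable, append-only sequence of events means in practice, here is a minimal Python sketch. It is not Valo's implementation; the Event and EventStream names and their methods are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class Event:
    """A single event; frozen so its attributes cannot be reassigned."""
    offset: int
    payload: Mapping[str, Any]


class EventStream:
    """Hypothetical append-only stream: events can be added and read, never changed."""

    def __init__(self) -> None:
        self._events = []

    def append(self, payload: dict) -> Event:
        # Copy the payload and expose it read-only, so later changes made by
        # the caller cannot alter what was recorded.
        event = Event(len(self._events), MappingProxyType(dict(payload)))
        self._events.append(event)
        return event

    def read(self, start: int = 0) -> Tuple[Event, ...]:
        # A tuple is returned so callers cannot mutate the underlying log.
        return tuple(self._events[start:])


stream = EventStream()
stream.append({"host": "node-1", "cpu": 12})
stream.append({"host": "node-1", "cpu": 48})
for event in stream.read():
    print(event.offset, dict(event.payload))
```

Because nothing is ever updated in place, the full history of the stream remains available for auditing.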

Valo also enables users to assign metadata to ‘contributors’ (publishers of data) and intelligently organise contributors into ‘domains’, which are logical groupings of streams.

CONTRIBUTORS

Decentralised cluster

Valo simplifies the data analytics stack into one decentralised cluster, which is easy to monitor. The cluster scales to any volume of data whilst maintaining high throughput, low latency and resiliency.

With a symmetric masterless architecture, every node can perform the same tasks, with no single point of failure.

Efficient query execution

Valo processes queries as part of an analytical pipeline. Because Valo knows where and how data is stored, query execution takes advantage of data locality and minimises network transfers.

All nodes are equal

VALO'S DECENTRALISED CLUSTER

The clustered node architecture makes Valo resilient and elastic, meaning it can easily be scaled up and back down as required. Simply increase the number of nodes to increase the number of queries, streams and contributors processed.

Scalable

Every node is equal in its ability to store, process streams and analyse data. This means less data movement is required, making computations faster and providing higher performance for distributed computing.

Capable

Valo is an AP system, prioritising availability and partition tolerance. If a node goes down, the failed part of the system is partitioned off and Valo continues to process data successfully across the remaining system.

Available

SEE VALO IN ACTION

WATCH VIDEO

KEEP UP TO DATE

Connect with us and receive the latest info on Valo

DEVELOPERS

For evaluation or application development only. Single node with no clustering.

TRY ME

PRODUCTION

For all production usage, including maintenance and support.

CONTACT US
VERSION 1.5.1 May 17, 2017

Collectors

  • Collectors are now working with Valo clusters and there are no more data delays occurring due to null states.

 

Contributors

  • Contributor names have stopped causing parser errors.

 

Errors

  • The VendorPayloadMalformed error no longer requires an id field.
  • DirectedParser is now throwing the correct error messages for data types.
  • Sigar access violation no longer occurring on Windows 10.
  • Unhandled timeouts and NullPointerExceptions caused by an issue in the AkkaActorMonitoringView no longer occur in the Monitoring API.
  • The code now handles the exceptions thrown by unsupported platforms in Sigar.
  • Normalisation errors are no longer logged.
  • Akka warning no longer logged at startup when a TSR actor is first created.

 

Query

  • Very large queries should no longer stall due to lack of memory or GC issues.
  • Queries were occasionally generating NullPointer and VBOR exceptions. The problem was arising when running slow queries and has now been corrected.
  • Queries with order by now work on a cluster as well as on a single node.
  • Historical queries with joins are no longer causing Valo to crash.
  • Query errors previously reported as unknown are now showing detailed information about the errors.
  • Online Algorithm initialisation arguments are now case-insensitive in queries.
  • Time Window queries are now working in clusters. Bulk messages sent by the node registry are now handled correctly and delivered to the target node. Stale nodes are no longer halting the execution of other nodes.
  • Window JoinActor is now properly handling CloseWindowRequest messages and there are no more reported errors.
  • Queries are now returning correct results on a group by clause whenever results are expected.
  • Queries with count distinct are no longer crashing.
  • ifNull function is now available for string values.
  • Monitoring API, monitoring/system/node/runtime, now shows correct CPU values.
  • The number of file handles used is now shown correctly when querying the monitoring runtime.
  • Unusual characters no longer prevent a query from parsing correctly and any error messages coming from queries containing incorrect characters are now more meaningful.
  • JVM no longer crashes when Valo performs joins on large sets of historic data.
  • There is no longer any problem running historical queries with parameters.
  • Queries with an into clause are now parsing correctly.
  • Basic time projection queries are now running without errors.
  • Queries with an order by clause are now returning results when followed by additional query clauses.
  • There are no more issues with queries on time windows that use nested fields.
  • Nulls are now supported in scalar functions.

 

Schema

  • geoPoint, geoRectangle, and geoCircle data types now accept latitude and longitude parameters in the order lat / long.
  • It is now possible to submit a payload with a value larger than a Java double or long when the schema type is string.
  • EncodedVectorClock is no longer failing silently.

 

Search API

  • Logging now added to the SSR indexer.

 

Streams

  • Stream actor names now have a UUID which ensures uniqueness and eliminates duplication errors.
  • Stream actor names no longer add invalid characters.
  • Stream and tagging error counters now update whenever an error occurs.

 

Retention Policies

  • GET now returns the default policy documents instead of a 404.
  • Retention policies may now be created for nested fields.

 

Repository

  • There is now no mismatch between IO partition file sizes and internal partition size in TSR.
  • Array fields published in Valo are now persisted in SSR.

 

Performance

  • We have removed trait implementations of the UTF8ByteSequences that were causing performance issues.
  • We have made performance improvements to the SSR/EE which eliminate the problem of queries running extremely slowly on small datasets.

 

Upgrade (Lucene)

  • Valo now uses Lucene version 6.3. This fixes issues with IPv6 hostname data.

DOWNLOAD
VERSION 1.3 Nov 04, 2016

We are pleased to announce that Valo version 1.3 is now available for download. This release includes enhancements to better support ingestion of large amounts of data from external sources and large CSV data sets, further optimisation of disk and memory usage for better data organisation and storage, and the ability to perform joins on queries that contain aggregations. We’ve also taken on board feedback from users to fix those pesky bugs.

 

Transport improvements

Improvements have been made to the way Valo ingests data from external sources, such as ITRS Geneos.  Higher rates of data ingestion will no longer cause poll messages to stack up and eventually trigger an out of memory exception.

 

Symbol type

A new sub-type of the string type, Symbol, has been added.  Symbols are atomic, case-sensitive strings representing an identifier (such as an error or status code).  Unlike regular strings, symbols are never tokenised, but are indexed as-is in their entirety.  This dramatically reduces the storage requirements for these texts, but means they cannot be partially searched for; thus an “InvalidParameter” symbol would not be found by a search for “Invalid”.

 

Symbols are defined by setting the type-params of the string type.  E.g.:

 


{
    "type": "string",
    "type-params": {
        "sub-type": "symbol"
    }
}
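
To make the search behaviour of symbols concrete, the following Python sketch models an index that stores each symbol whole. It is only an illustration of exact-match indexing under that assumption, not Valo's indexer; the SymbolIndex class and its methods are hypothetical.

```python
from collections import defaultdict


class SymbolIndex:
    """Hypothetical exact-match index: each symbol is stored whole, never tokenised."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, event_id, symbol):
        # The symbol is used as the key exactly as given (case-sensitive).
        self._postings[symbol].add(event_id)

    def search(self, term):
        # Only a full, exact match finds anything; .get avoids creating
        # empty entries for unknown terms.
        return set(self._postings.get(term, ()))


index = SymbolIndex()
index.add(1, "InvalidParameter")
index.add(2, "Timeout")

print(index.search("InvalidParameter"))  # {1}
print(index.search("Invalid"))           # set()
print(index.search("invalidparameter"))  # set()
```

Storing one key per distinct symbol, rather than one entry per token, is what keeps the index small, at the cost of partial matching.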

 

Asian language analysers

Streams can now be configured with different language analysers and will be indexed accordingly.  This will allow character-based languages such as Chinese to be searched correctly.

 

Improved messaging

Information and error messages have been significantly improved to more easily resolve issues such as invalid queries and references.

 

Other highlights

  • Disk and memory usage has been further improved
  • Joins can now be performed on queries that contain aggregations
  • Large CSV data sets can now be written without causing an out of memory exception
  • The execution engine output now contains the node address so that it can be referred to outside a firewall
  • Time-series repository write failures (such as when the disk runs out of space) are now rolled back to the last known good state
  • Temporary network failures will not cause nodes to be marked as permanently unavailable
  • Boolean expressions are now correctly aliased, allowing queries like from /streams/demo/infrastructure/cpu select user > 10
  • Distribution policy documents with numbers passed as strings will be interpreted correctly

We’re always keen to hear about your experiences using Valo and any suggestions on new features you would like to be considered for future releases, so please get in touch.

VERSION 1.2 Jul 20, 2016

Valo 1.2 is now released.  This release is packed full of new features, primarily around scalability, resilience and improvements to the back end infrastructure of Valo, plus a multitude of fixes in all areas of the application.

 

Command line interface improvements

The Valo executable has been renamed from valo-cluster-app to simply valo. The command line interface has also been significantly improved, particularly when managing larger clusters.  

Full documentation on running Valo and the relevant command line arguments can be found here.

 

Full clustering support

Previously we recommended single node clusters for testing purposes.  This restriction is now lifted, and Valo fully supports multi-node clusters.  For best performance we recommend keeping one node per physical box, though there is no restriction on running multiple nodes on a single machine for testing.

 

Dynamic scaling

Nodes can now be added and removed from a cluster dynamically using the new command line interface.  This will not affect the flow of data into the Valo cluster and the repositories will be rebalanced automatically.  Provided sufficient data redundancy has been configured, this will not result in any data loss.

 

Data atomicity

Writes in Valo are now guaranteed to be atomic, which means that events such as a power outage or hardware failure will not result in data corruption or loss.

 

Extended character support

Character support for streams, fields and custom functions has been extended.  Their names are now case-sensitive and support Unicode characters, with the following restrictions:

  • Fields cannot start with __ (2 underscores); these are considered ‘private’
  • Cannot be an expression constant (true, false), in any letter case
  • Cannot be an operator: +, -, *, /, %, ||, &&, !, >=, <=, !=, ==, >, <, NOT, and, or
  • Cannot contain any operator symbol: +, -, *, /, %, ||, &&, !, >=, <=, !=, ==, >, <
  • All characters must be valid Java identifier characters, where Character.isJavaIdentifierPart is true
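
The restrictions above can be sketched as a small pre-flight check in Python. This is an illustrative approximation, not Valo's own validation: the name_problems function is hypothetical, the word operators (NOT, and, or) are assumed to be compared case-insensitively, and Character.isJavaIdentifierPart is only approximated using Unicode general categories.

```python
import unicodedata

CONSTANTS = {"true", "false"}
SYMBOL_OPERATORS = ["+", "-", "*", "/", "%", "||", "&&", "!",
                    ">=", "<=", "!=", "==", ">", "<"]
WORD_OPERATORS = {"not", "and", "or"}  # assumption: matched in any letter case

# Approximation of Character.isJavaIdentifierPart: letters, letter numbers,
# digits, combining marks, connector punctuation and currency symbols.
_ID_PART_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
                       "Nd", "Mn", "Mc", "Pc", "Sc"}


def is_identifier_part(ch):
    return unicodedata.category(ch) in _ID_PART_CATEGORIES


def name_problems(name, is_field=True):
    """Return a list of human-readable rule violations (empty if none found)."""
    problems = []
    if is_field and name.startswith("__"):
        problems.append("fields starting with __ are considered private")
    if name.lower() in CONSTANTS:
        problems.append("name is an expression constant")
    if name in SYMBOL_OPERATORS or name.lower() in WORD_OPERATORS:
        problems.append("name is an operator")
    if any(op in name for op in SYMBOL_OPERATORS):
        problems.append("name contains an operator symbol")
    if not name or not all(is_identifier_part(ch) for ch in name):
        problems.append("name contains an invalid identifier character")
    return problems


for candidate in ["cpu_user", "__internal", "True", "user-name", "and"]:
    print(candidate, name_problems(candidate))
```

Returning every violation, rather than stopping at the first, makes it easier to fix a rejected name in one pass.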

 

Breaking changes

Please note that due to the scale and scope of the new features in 1.2, there have been a number of breaking changes to the repository.  When performing an upgrade, first remove the old version completely, using the data tool to back up the repositories if necessary, then install Valo 1.2.  Any backed up data can then be reimported using the data tool.  In-place upgrades will not work between 1.2 and earlier versions.

The configuration file, application.conf, has also been improved significantly with new parameters and options.  These are documented inline.

 

Other highlights

  • Disk and memory usage has been significantly improved
  • Query nodes provide better error handling so that malformed data will not cause a query to abort
  • The standard query operators (such as +, -, ==, and !=) now handle null and missing values.
  • Different data types such as ints, shorts and strings are now automatically converted when required for certain query functions
  • The output of certain streaming algorithms such as TopK can now be chained using additional selects
  • The TopK algorithm now consumes significantly less memory
  • TopK can now be performed against historical data
  • Historical SSR queries will now execute significantly faster
  • Historical SSR queries will perform correctly when grouped by a contributor
  • Subtracting datetimes is now performed in the correct order
  • Dates and datetimes can be compared correctly
  • Additional information such as OS and JVM versions is now available in the application log file
  • Unsorted SSR data is now handled correctly when executed in a time window
  • Daemonised queries are now persisted when Valo is restarted
  • All REST requests can be logged to a file if the parameter valo.frontend.debug.dump-all-requests is set in application.conf
  • CSVs with empty columns will no longer cause a memory overflow
  • POSTing to a stream that does not exist now returns 404
  • When starting a node, front-end services are initialised last, after any data transfers have been completed
  • Large log files will now be rolled over into the correct folder
  • Large payloads no longer cause extremely slow SSR queries
  • The SSR now only loads the stream schema when the local node is reachable (e.g. has joined the cluster)
  • Unreachable nodes no longer cause certain queries to time out
  • Port collisions are now handled gracefully
  • Information and debug messages are no longer logged by default
  • Extremely large data sets no longer cause an out of memory exception under certain circumstances
  • Sorting is now done using a disk-based buffer, avoiding an out of memory exception on large data sets
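
For readers unfamiliar with disk-based sorting, the general technique can be sketched as an external merge sort: sort small chunks in memory, spill each sorted run to a temporary file, then merge the runs lazily. The Python below is a generic illustration of that idea, not Valo's implementation; the external_sort function and its chunk_size parameter are hypothetical.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(values, chunk_size=1000):
    """Yield integers from `values` in ascending order, spilling sorted runs to disk."""
    runs = []
    iterator = iter(values)
    try:
        while True:
            # Only `chunk_size` items are held in memory at any one time.
            chunk = sorted(islice(iterator, chunk_size))
            if not chunk:
                break
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(f"{value}\n" for value in chunk)
            run.seek(0)
            runs.append(run)
        # heapq.merge reads each run lazily, one line at a time.
        yield from heapq.merge(*(map(int, run) for run in runs))
    finally:
        for run in runs:
            run.close()


print(list(external_sort([5, 3, 9, 1, 7, 2], chunk_size=2)))  # [1, 2, 3, 5, 7, 9]
```

Memory use is bounded by the chunk size plus one buffered value per run, regardless of the total number of values sorted.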