Distribution Policy

A distribution policy specifies how stream data will be distributed amongst cluster nodes.

Distribution policies only apply to streams whose data is stored for historical analysis (as specified via a repository mapping).

If no distribution policy is specified, the following default policy will be used:

{
  "replicas": 3,
  "ring_size": 128,
  "hash_payload": { "bytes": 128 }
}

There are two fields common to all distribution policies:

replicas specifies the number of cluster nodes that will store each stream event. Larger values require more storage (when considering the cluster as a whole), but provide better durability and availability. This field can be omitted if you want to use the default value of 3.

ring_size specifies the number of segments into which a stream’s data will be bucketed for distribution. This value should be a small multiple of the maximum number of cluster nodes you expect to have. This field can be omitted if you want to use the default value of 128.
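
For example, if you expect your cluster to grow to at most 16 nodes, a ring_size of 64 (a small multiple of 16) would be a reasonable choice. The sketch below is illustrative only and uses the hash-based method described in the next section:

{
  "replicas": 3,
  "ring_size": 64,
  "hash_payload": { "bytes": 128 }
}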

There are currently three distribution methods, discussed in the following sections.

Distribute by hashing event content

This method hashes the first n bytes of an event posted to a stream (after the event has been normalised into a compact internal data format) and uses this value to decide which cluster nodes will store the event.

Events will be distributed amongst cluster nodes in a fairly uniform (if somewhat arbitrary) manner.

Use this method if your data has no time field, or if the majority of the analysis you will perform is not time-based.

An example policy document looks like this:

{
  "replicas": 3,
  "ring_size": 128,
  "hash_payload": { "bytes": 128 }
}

bytes specifies the number of bytes that will be used to compute the hash. This field can be omitted if you want to use the default value of 128.
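
Because replicas, ring_size, and bytes all have defaults, the default policy shown earlier can be written much more compactly. The following is a minimal sketch, assuming omitted fields fall back to the defaults described above:

{
  "hash_payload": {}
}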

Distribute by time

This method takes the value of a datetime, date, or time field, reduces its granularity (to group events that occurred at a similar time), then uses the result to decide which cluster nodes will store the event.

Events will be distributed amongst cluster nodes in time buckets, so the distribution can be skewed if events are concentrated in a few buckets.

Use this method if the majority of the analysis you will perform is time-based.

An example policy document looks like this:

{
  "replicas": 3,
  "ring_size": 128,
  "time_range": {
    "field": "sampleTime",
    "size": 30,
    "unit": "minutes"
  }
}

field specifies the field to be used to distribute the event.

size specifies the number of units by which to bucket the event.

unit specifies the unit by which to bucket the event. Valid units include:

  • “seconds”
  • “minutes”
  • “hours”
  • “days”
  • “weeks”
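
For instance, to bucket events by calendar day instead of 30-minute windows, a policy might look like the sketch below (the createdAt field name is illustrative only):

{
  "replicas": 3,
  "ring_size": 128,
  "time_range": {
    "field": "createdAt",
    "size": 1,
    "unit": "days"
  }
}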

Distribute by stream URI

This method hashes the entire stream URI and uses this value to decide which cluster nodes will store the event.

Events will be distributed to at most replicas cluster nodes, each of which will contain all of the stream's events.

Use this method if you want to restrict the number of cluster nodes that may be involved in storing and analysing a stream’s events.

An example policy document looks like this:

{
  "replicas": 3,
  "ring_size": 128,
  "stream_name": {}
}
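
As with the other methods, replicas and ring_size can be omitted to fall back to their defaults; a minimal sketch of such a policy (assuming the defaults described earlier apply) would be:

{
  "stream_name": {}
}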