Valo is designed to handle the temporary failure of a node, including network reachability issues and machine crashes. Each node in the cluster tracks the reachability of the other nodes through an internal heartbeating and failure detection system. If a node becomes unreachable, other nodes act as stand-ins for API operations. Once the node becomes reachable again, data moves back to the correct owner.
All of this happens under the hood, so API clients do not have to account for it. The most important thing to consider is the number of replicas configured for a stream, which determines how many nodes can be unreachable in the cluster whilst queries for that stream still succeed.
Writes to Valo succeed under all failure conditions. However, if the number of unreachable nodes is greater than or equal to the number of replicas configured, the data written will not appear in queries until a short time after those nodes recover, because Valo must first hand the data off internally to the correct owner.
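The visibility rule above can be written as a single comparison. The following Python sketch is only an illustration of that rule as stated in this section; the function name and parameters are hypothetical and are not part of the Valo API.

```python
def writes_visible_immediately(replicas: int, unreachable: int) -> bool:
    """Return True if newly written data should be queryable straight away.

    Hypothetical helper: it only encodes the rule described in the text.
    Writes always succeed; visibility in queries is delayed until hand-off
    when the number of unreachable nodes is greater than or equal to the
    number of replicas configured for the stream.
    """
    return unreachable < replicas


# Walk through a stream configured with 2 replicas (an example value).
for unreachable in range(4):
    if writes_visible_immediately(2, unreachable):
        status = "visible immediately"
    else:
        status = "visible after hand-off"
    print(f"replicas=2, unreachable={unreachable}: {status}")
```

When choosing a replica count for a stream, this comparison makes the trade-off explicit: each additional replica allows one more node to be unreachable before newly written data is delayed in queries.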
To see data moving through the hand-off process, use the monitoring API at