Search API (SSR)

Query Structure

{
    "query": {
        "base": "search('tolkien')",
        "filters": [],
        "useLatestIndexGeneration": true
    }
}

field                     description
base                      optional query expression
filters                   optional filters
useLatestIndexGeneration  true if the search should be applied on the latest index generation
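
As a rough illustration, the structure above could be assembled in Python as follows. The helper name build_query is hypothetical and not part of Valo; the sketch assumes that optional fields may simply be left out of the body.

```python
import json

def build_query(base=None, filters=None, use_latest_index_generation=None):
    """Assemble the "query" object; every part is treated as optional."""
    query = {}
    if base is not None:
        query["base"] = base
    query["filters"] = list(filters or [])
    if use_latest_index_generation is not None:
        query["useLatestIndexGeneration"] = use_latest_index_generation
    return {"query": query}

body = build_query("search('tolkien')", use_latest_index_generation=True)
print(json.dumps(body, indent=4))
```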

Query expression

A query expression is a regular Valo expression, for example:

tradeid == 'ABC123559'

counter > 10
counter > 2 && toFind == 'needle'
counter < 8 || toFind == 'needle'

year(fieldName) > 2014
tradeid == 'ABC123559' && year(ts) > 2015

search('needle')
search(fieldName, 'needle')

search('transaction completed') && year(ts) > 2012
search('started completed') && search('192.0.10.*')

prefix(fieldA, 'error') && search(fieldB, 'trade*')

counter > 8 && prefix(fieldName, 'find')

Text search is the most powerful kind of query the SSR supports. The search function translates to a native Lucene search, which supports terms, wildcards and fuzzy matching.

Supported functions

Function                Description
year(datetime_field)    Extracts the year part of a datetime
month(datetime_field)   Extracts the month part of a datetime
day(datetime_field)     Extracts the day-of-month part of a datetime
prefix(field, prefix)   Matches field values that start with the given prefix
like(field, pattern)    Matches field values against a given pattern (see below for syntax).
search(pattern)         Returns all data matching the given pattern (see below for syntax).
search(field, pattern)  Returns all data where the values for a field match a given pattern (see below for syntax).

Supported Search Patterns

Pattern   Description
te?t      Single character wildcard. Matches text and test.
test*     Multiple character wildcard. Matches test, tests and tester.
te*t      Multiple character wildcard. Matches text, test and tenet.
roam~     Fuzzy search. Matches roam, foam and roams.
test~0.8  Fuzzy search, specifying similarity between 0 and 1.
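
To get a feel for the two wildcard forms, the sketch below mimics them with Python's standard fnmatch module. This is only an approximation for illustration: it is not how the SSR or Lucene evaluates patterns, fnmatch has extra syntax (such as character classes in square brackets) that does not apply here, and fuzzy patterns (~) are not covered.

```python
from fnmatch import fnmatchcase

# Candidate words per pattern; the last word in each list should NOT match.
examples = {
    "te?t": ["text", "test", "tenet"],
    "test*": ["test", "tests", "tester", "text"],
    "te*t": ["text", "test", "tenet", "tests"],
}

# ? stands for exactly one character, * for zero or more characters.
matches = {
    pattern: [word for word in words if fnmatchcase(word, pattern)]
    for pattern, words in examples.items()
}

for pattern, matched in matches.items():
    print(pattern, "->", matched)
```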

Facet

POST /ssr/:tenant/_facet

Gets taxonomy and range facets.

Example request:

POST /ssr/demo/_facet HTTP/1.1
Content-Type: application/json

{
  "uris"  : [
    "/streams/demo/infrastructure/log"
  ],
  "domain": "/domain/demo/support/windows-servers",
  "query": {
    "base": "tradeId == 'ABC123443'",
    "filters": []
  },
  "facets" : [
     { "type": "taxonomy", "dimension": "date", "topN": 10, "depth": 3 }
  ]
}

field   description                                     default
uris    the streams to search across
query   the query
domain  the domain from which results will be returned  optional
facets  the facets to return

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
   "facets": [
     { "name": "date", "count": 40, "children": [
            { "value": "2015", "count": 12, "children":[
                  {"value": "June", "count": 7, "children": [
                       { "value": "2", "count": 1, "children":[] },
                       { "value": "3", "count": 3, "children":[] },
                       { "value": "5", "count": 3, "children":[] }
                    ]
                  },
                  {"value": "July", "count": 5, "children": [
                       { "value": "12", "count": 3, "children":[] },
                       { "value": "23", "count": 2, "children":[] }
                    ]
                  }
                ]
            }
        ]
     }
   ]
}

field   description
facets  the facets
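
Because the response is a tree, clients often want it as flat rows. The following sketch walks the nested children lists and emits one (path, count) pair per node. The function name flatten_facets is hypothetical, and the sample data is a shortened version of the response above.

```python
def flatten_facets(facets):
    """Return (path, count) rows for every node in a facet response."""
    rows = []

    def walk(node, path):
        # Each child extends the path by its own value.
        for child in node.get("children", []):
            child_path = path + [child["value"]]
            rows.append((child_path, child["count"]))
            walk(child, child_path)

    for facet in facets:
        rows.append(([facet["name"]], facet["count"]))
        walk(facet, [facet["name"]])
    return rows

sample = [
    {"name": "date", "count": 40, "children": [
        {"value": "2015", "count": 12, "children": [
            {"value": "June", "count": 7, "children": []},
            {"value": "July", "count": 5, "children": []},
        ]},
    ]},
]

rows = flatten_facets(sample)
for path, count in rows:
    print("/".join(path), count)
```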

The following facets are supported:

Taxonomy facet

The taxonomy facet returns counts for a predefined stream taxonomy.

{
  "facets": [
    { "type": "taxonomy", "dimension": "os", "topN": 10, "depth": 3, "include": [
          { "path": ["2015","7","8"] }
       ]
    }
  ]
}

field         description                                                        default
dimension     a reference to a stream taxonomy
topN          the top N facet values to return (by count)                        5
depth         specifies how deep into the taxonomy hierarchy to retrieve values  3
include.path  in addition to the default values returned by topN, specifies additional paths to include in the results

Range facets

Range facets do not need a predefined taxonomy definition.

{
  "facets": [
    { "type": "range", "field": "memory.installed", "ranges": [
            {"label": "0GB-8GB", "from": 0, "fromInclusive": false, "to": 8000, "toInclusive": true},
            {"label": "8GB-16GB", "from": 8000, "fromInclusive": false, "to": 16000, "toInclusive": true}
        ]
    }
  ]
}

field                 description                                    default
field                 the payload field the range facet is based on
ranges.label          the facet label                                from-to
ranges.from           the min range value.
ranges.fromInclusive  true if from is inclusive                      true
ranges.to             the max range value.
ranges.toInclusive    true if to is inclusive                        true
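
The inclusive flags decide which bucket a boundary value falls into. The sketch below is a client-side illustration of that rule, not the server's implementation; the function names are hypothetical, the ranges are illustrative, and the fallback label format is an assumed reading of the from-to default in the table.

```python
def in_range(value, r):
    """Apply the from/to bounds, honouring the inclusive flags (default true)."""
    if r.get("fromInclusive", True):
        lower_ok = value >= r["from"]
    else:
        lower_ok = value > r["from"]
    if r.get("toInclusive", True):
        upper_ok = value <= r["to"]
    else:
        upper_ok = value < r["to"]
    return lower_ok and upper_ok

def label_for(value, ranges):
    """Return the label of the first range containing value, or None."""
    for r in ranges:
        if in_range(value, r):
            # Assumption: a missing label defaults to "<from>-<to>".
            return r.get("label", f'{r["from"]}-{r["to"]}')
    return None

ranges = [
    {"label": "0GB-8GB", "from": 0, "fromInclusive": False, "to": 8000, "toInclusive": True},
    {"label": "8GB-16GB", "from": 8000, "fromInclusive": False, "to": 16000, "toInclusive": True},
]

for installed in (0, 4096, 8000, 8001, 16000):
    print(installed, label_for(installed, ranges))
```

With these flags a value of exactly 8000 lands in the first range only, because the second range excludes its lower bound.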

Histogram

Attention

Only simple time range histograms have been implemented so far.

The underlying SSR technology is capable of N dimensional histograms with both fixed and range buckets. This will be exposed further in future iterations of the API.

POST /ssr/:tenant/_timeRangeHistogram

Creates a histogram based on the time range given and the counts of data payloads stored within the range.

The results can be filtered in the same manner as _search.

The number of buckets returned in the histogram can be controlled in two ways:

  • maxNumberOfBuckets: Valo uses the frequency that gives the closest number of buckets to the number requested
  • frequency: Valo uses the frequency provided. This may lead to a very large number of buckets (e.g. a range of 2 days with buckets of seconds)

The following values can be requested and returned for the histogram frequency:

  • YEAR
  • MONTH
  • DAY
  • HOUR
  • MINUTE
  • SECOND
  • MILLIS

If neither setting is passed, the frequency is determined by Valo.

The client can also pass a specific field name to use when determining the range. If no field is passed, the primary timestamp field is used. If tsField is set, data from streams that do not contain that field will not be included in the response.
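
A small sketch of how a client might assemble the request body while guarding against misspelt frequency values. The function name is hypothetical, and the field names (frequency, maxNumberOfBuckets, tsField) follow the example requests in this section.

```python
FREQUENCIES = ("YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND", "MILLIS")

def histogram_request(uris, base, start, end, frequency=None,
                      max_number_of_buckets=None, ts_field=None):
    """Build a _timeRangeHistogram body; bucket settings are omitted unless given."""
    if frequency is not None and frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    body = {
        "uris": list(uris),
        "query": {"base": base},
        "range": {"start": start, "end": end},
    }
    if frequency is not None:
        body["frequency"] = frequency
    if max_number_of_buckets is not None:
        body["maxNumberOfBuckets"] = max_number_of_buckets
    if ts_field is not None:
        body["tsField"] = ts_field
    return body

body = histogram_request(
    ["/streams/demo/infrastructure/cpu"],
    "search('tolkien')",
    "2014-08-24T15:10:00Z",
    "2014-08-24T15:15:00Z",
    max_number_of_buckets=200,
)
print(body)
```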

Example Request (no bucket frequency):

POST /ssr/demo/_timeRangeHistogram HTTP/1.1

{
    "uris": [
        "/streams/demo/infrastructure/cpu",
        "/streams/demo/infrastructure/mem"
    ],
    "query": {
        "base": "search('tolkien')",
        "filters": [
            { "type": "taxonomy", "dimension": "Date", "drilldown": ["2015", "7", "20"] }
        ]
    },
    "range": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z"
    },
    "tsField": "sampleTime",
    "domain": "/domains/demo/support/windows-servers"
}

Example Request (maximum number of buckets):

POST /ssr/demo/_timeRangeHistogram HTTP/1.1

{
    "uris": [
        "/streams/demo/infrastructure/cpu",
        "/streams/demo/infrastructure/mem"
    ],
    "query": {
        "base": "search('tolkien')"
    },
    "range": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z"
    },
    "maxNumberOfBuckets": 200
}

Example Request (bucket frequency):

POST /ssr/demo/_timeRangeHistogram HTTP/1.1

{
    "uris": [
        "/streams/demo/infrastructure/cpu",
        "/streams/demo/infrastructure/mem"
    ],
    "query": {
        "base": "search('tolkien')"
    },
    "range": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z"
    },
    "frequency": "SECOND"
}

Example Response:

{
    "histogram": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z",
        "frequency": "MINUTE",
        "buckets": [100, 25, 67, 12, 109]
    }
}

field    description
uris     the streams to query
query    the query
range    the start and end (both inclusive) for the search
tsField  the field to use to determine the range. default: primary timestamp in schema   optional
domain   filter results to only those originating from contributors in the given domain  optional
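
The response carries only counts, so a client has to derive the time each bucket covers. The sketch below does this under two assumptions that the text above does not state explicitly: buckets are contiguous and in time order, and the first bucket begins at start. YEAR and MONTH are left out because they are not fixed-length steps.

```python
from datetime import datetime, timedelta, timezone

# Fixed-length frequencies only; YEAR and MONTH would need calendar arithmetic.
STEP = {
    "DAY": timedelta(days=1),
    "HOUR": timedelta(hours=1),
    "MINUTE": timedelta(minutes=1),
    "SECOND": timedelta(seconds=1),
    "MILLIS": timedelta(milliseconds=1),
}

def bucket_starts(histogram):
    """Pair each bucket count with the (assumed) start time of its bucket."""
    start = datetime.strptime(histogram["start"], "%Y-%m-%dT%H:%M:%SZ")
    start = start.replace(tzinfo=timezone.utc)
    step = STEP[histogram["frequency"]]
    return [(start + i * step, count)
            for i, count in enumerate(histogram["buckets"])]

response = {
    "histogram": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z",
        "frequency": "MINUTE",
        "buckets": [100, 25, 67, 12, 109],
    }
}

starts = bucket_starts(response["histogram"])
for when, count in starts:
    print(when.isoformat(), count)
```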

Discover

POST /ssr/:tenant/_discover

Searches for documents that have been stored within a given time range in the SSR repository.

The search can be limited to a single domain, in which case results are returned for each stream to which contributors in the domain have contributed.

With no domain, results are returned for each stream under the tenant specified in the URI.

The following data are returned in the results:

  • A count histogram of the results across all streams that have payloads matching the search.
  • The schema for each stream matching the search.
  • Facets (values and counts) for each taxonomy defined on each stream from above.
  • The actual results matching the search, returned per stream from above.

The number of buckets can be configured as per _timeRangeHistogram. Note that the settings are passed under the histogram object.

The number of results and facets returned for each stream can be configured:

  • Facets are returned in order of count, with the highest returned first.
  • The results are returned in order of relevance, again with the most relevant returned first.

The intention is that this end-point can be used as a springboard for additional queries using the other end-points in the SSR API. See below.

Example request:

POST /ssr/demo/_discover HTTP/1.1
Content-Type: application/json

{
    "query": {
        "base": "search('tradeID 30204')",
        "filters": [
            { "type": "taxonomy", "dimension": "Date", "drilldown": ["2015", "7", "20"] }
        ]
    },
    "range": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z"
    },
    "histogram": {
        "frequency": "SECOND"
    },
    "domain": "/domain/demo/support/windows-servers",
    "facets": {
        "topN": 10
    },
    "results": {
        "topN": 10
    }
}

Top level request fields

field                         description
query                         the query
range                         the start and end (both inclusive) for the search
histogram.frequency           the frequency of buckets to use                                                     optional
histogram.maxNumberOfBuckets  the maximum number of buckets to return. Frequency will be selected automatically   optional
facets.topN                   the number of facets to return (ordered by those with the most values)
results.topN                  the number of results to return (ordered by the most relevant)
domain                        filter results to only those originating from contributors in the given domain      optional
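
For illustration, a discover call could be prepared from Python with the standard library as sketched below. The host and port in BASE_URL are placeholders and the function name is hypothetical; the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8888"  # assumption: placeholder host and port

def discover_request(tenant, base, start, end, facets_top_n=10,
                     results_top_n=10, frequency=None, domain=None):
    """Prepare a POST to /ssr/:tenant/_discover without sending it."""
    body = {
        "query": {"base": base, "filters": []},
        "range": {"start": start, "end": end},
        "facets": {"topN": facets_top_n},
        "results": {"topN": results_top_n},
    }
    # Histogram settings live under the "histogram" object for this end-point.
    if frequency is not None:
        body["histogram"] = {"frequency": frequency}
    if domain is not None:
        body["domain"] = domain
    return Request(
        f"{BASE_URL}/ssr/{tenant}/_discover",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = discover_request(
    "demo",
    "search('tradeID 30204')",
    "2014-08-24T15:10:00Z",
    "2014-08-24T15:15:00Z",
    frequency="SECOND",
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it to a running instance.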

Example Response:

See above for the range of frequency values that can be returned.

{
    "histogram": {
        "start": "2014-08-24T15:10:00Z",
        "end": "2014-08-24T15:15:00Z",
        "frequency": "MINUTE",
        "buckets": [100, 25, 67, 12, 109]
    },
    "streams": [
        {
            "name": "/streams/demo/infrastructure/mem",
            "atLeastNMoreResults": 100,
            "schema": "schema of stream omitted for brevity",
            "facets": [
                { "name": "time", "count": 50, "children": [
                    { "value": "15", "count": 50, "children": [
                        { "value": "10", "count": 4,  "children": [] },
                        { "value": "11", "count": 6,  "children": [] },
                        { "value": "12", "count": 5,  "children": [] },
                        { "value": "13", "count": 10, "children": [] },
                        { "value": "14", "count": 5,  "children": [] },
                        { "value": "15", "count": 15, "children": [] }
                    ]}
                ]},
                { "name": "vendor", "count": 80, "children": [
                    { "value": "Corsair", "count": 20, "children": [] },
                    { "value": "Kingston", "count": 20, "children": [] },
                    { "value": "SanDisk", "count": 20, "children": [] },
                    { "value": "Toshiba", "count": 20, "children": [] }
                ]}
            ],
            "values": [
                {
                    "score": 0.76792556,
                    "data": "data payload matching schema omitted for brevity"
                },
                {
                    "score": 0.76792556,
                    "data": "data payload matching schema omitted for brevity"
                }
            ]
        }
    ]
}

Using discover results in subsequent searches

The _discover end-point is designed to be used as a springboard for subsequent operations using _search. Some examples are included below:

Requesting more search results:

The initial discover response contains topN results per stream, as well as an atLeastNMoreResults field indicating how many more results can be requested.

Requesting more results is as simple as issuing a search with a count and from.

Discover Response

{
    "streams": [
        {
            "name": "/streams/demo/infrastructure/cpu",
            "atLeastNMoreResults": 100,
            "facets": "omitted for brevity",
            "values": [
                { "score": 0.76, "data": "..." },
                { "score": 0.75, "data": "..." },
                { "score": 0.74, "data": "..." },
                { "score": 0.73, "data": "..." },
                { "score": 0.72, "data": "..." },
                { "score": 0.71, "data": "..." },
                { "score": 0.70, "data": "..." },
                { "score": 0.69, "data": "..." },
                { "score": 0.68, "data": "..." }
            ]
        }
    ]
}

Search request

{
  "uris"  : ["/streams/demo/infrastructure/cpu"],
  "query": {
    "base": "query from discover request",
    "filters": []
  },
  "count" : 10,
  "from"  : 10
}
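
The hand-off from discover to search can be sketched as a small helper. The function name is hypothetical, and the sketch assumes that from is a zero-based offset, so the next page starts at the number of values already received.

```python
def more_results_request(stream, base, page_size=10):
    """Return a _search body for the next page of one discover stream, or None."""
    if stream.get("atLeastNMoreResults", 0) <= 0:
        return None  # nothing further to fetch for this stream
    return {
        "uris": [stream["name"]],
        "query": {"base": base, "filters": []},
        "count": page_size,
        # Assumption: "from" skips the results the client already holds.
        "from": len(stream["values"]),
    }

stream = {
    "name": "/streams/demo/infrastructure/cpu",
    "atLeastNMoreResults": 100,
    "values": [{"score": 0.76, "data": "..."}] * 10,
}

request = more_results_request(stream, "search('tolkien')")
print(request)
```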

Filtering results by facet

Similarly, the facet results can be used as a filter in _search. Simply add a group filter to the search matching the parts of the hierarchy required.

In the example below we limit the results under the cpu stream to just those from 15:10 and with vendor SanDisk.

Note that the results will differ once the filter is added. As such, from is 0 in the request, logically retrieving the first page of the new result set. The atLeastNMoreResults value from the discover response is no longer relevant.

Discover Response

{
    "streams": [
        {
            "name": "/streams/demo/infrastructure/cpu",
            "atLeastNMoreResults": 100,
            "facets": [
                { "name": "time", "count": 50, "children": [
                    { "value": "15", "count": 50, "children": [
                        { "value": "10", "count": 4,  "children": [] },
                        { "value": "11", "count": 6,  "children": [] },
                        { "value": "12", "count": 5,  "children": [] },
                        { "value": "13", "count": 10, "children": [] },
                        { "value": "14", "count": 5,  "children": [] },
                        { "value": "15", "count": 15, "children": [] }
                    ]}
                ]},
                { "name": "vendor", "count": 80, "children": [
                    { "value": "Corsair", "count": 20, "children": [] },
                    { "value": "Kingston", "count": 20, "children": [] },
                    { "value": "SanDisk", "count": 20, "children": [] },
                    { "value": "Toshiba", "count": 20, "children": [] }
                ]}
            ]
        }
    ]
}

Search request

{
    "uris"  : ["/streams/demo/infrastructure/cpu"],
    "query": {
        "base": "query from discover request",
        "filters": [
            {
                "type": "group",
                "clause": "and",
                "filters": [
                    { "type": "taxonomy", "dimension": "time", "drilldown": ["15", "10"] },
                    { "type": "taxonomy", "dimension": "vendor", "drilldown": ["SanDisk"] }
                ]
            }
        ]
    },
    "count" : 10,
    "from"  : 0
}
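
The group filter in the request above can be generated from the facet values a user picks. The following is a minimal sketch with a hypothetical helper name; it only builds the filter structure shown in this section.

```python
def facet_filter(selections, clause="and"):
    """Build a group filter from (dimension, drilldown path) pairs."""
    return {
        "type": "group",
        "clause": clause,
        "filters": [
            {"type": "taxonomy", "dimension": dimension, "drilldown": list(path)}
            for dimension, path in selections
        ],
    }

# Hour 15, minute 10 on the "time" taxonomy, plus vendor SanDisk.
group = facet_filter([("time", ["15", "10"]), ("vendor", ["SanDisk"])])

search_body = {
    "uris": ["/streams/demo/infrastructure/cpu"],
    "query": {"base": "query from discover request", "filters": [group]},
    "count": 10,
    "from": 0,  # the filtered result set is new, so start from the first page
}
print(search_body)
```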