Valo Export Tool

Note

The export tool is currently unofficial and has not undergone extensive testing at this stage.

Background

A simple export tool exists to help in the following scenarios:

  • You want to export one or more Valo streams in a specific format
  • You want to re-import data (either into a new Valo instance or after a breaking change)
  • You want to create a standardised data set to share with others!
  • You have changed the cluster distribution policy and want to replay data into the cluster

Note on Breaking Changes

Please note that we do not want to make breaking changes; however, in early versions they may be necessary to facilitate more optimal storage formats, to make indexing changes, or to fix critical bugs. Eventually Valo will be self-updating, which will largely mitigate the need for this tool. If you have any questions or concerns, please contact us.

Using the tool

The Valo Data Tool is usable from the command line.

Exporting a stream can be as simple as:

valo-data-tool export /streams/demo/infrastructure/cpu

This invokes the ‘export’ command, which exports the cpu stream in the default export format, ‘json’.

Alternatively:

valo-data-tool export /streams/demo/infrastructure/cpu json

This explicitly sets the format to json; it can be any data format supported by Valo.

The csv export format only supports flat schemas, i.e. it does not support hierarchical data. Use a different data format, such as json, for hierarchical data. For all supported formats see data formats.
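
For example, a document shaped like the one below (the field names are purely illustrative, not taken from a real Valo stream) nests an object inside the record, so it can be exported as json but has no flat csv representation:

{
  "host": "web-01",
  "timestamp": "2016-03-01T10:49:01Z",
  "cpu": {
    "user": 12.5,
    "system": 3.2
  }
}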

Global Arguments

The global arguments control default behaviours and include:

  • -host tells the exporter the host of your Valo instance (default ‘localhost’)
  • -port tells the exporter the port of your Valo instance (default 8888)
  • -format sets the default format (default is json)
  • -path sets the default output path (default is the current path ‘.’)

For example:

valo-data-tool -host myhost -port 8001 -path ./myexports/ export /streams/demo/infrastructure/cpu json

This example changes the default host, port and path for the export command that follows.
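
The -format flag behaves the same way. Since the export command falls back to the default format when the positional format argument is omitted, the following sketch (assuming the cpu stream’s schema is flat enough for csv) should export in csv without naming the format in the export command itself:

valo-data-tool -format csv export /streams/demo/infrastructure/cpu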

Commands

The data tool utilises a command interface, allowing it to host multiple data-related tasks. The available commands can be listed using:

valo-data-tool show

This produces output describing the available commands:

10:49:01.560 INFO  [ShowCommand                               ] -  Commands:
10:49:01.560 INFO  [ShowCommand                               ] -   - export: Exports a specific valo stream in a specific format
10:49:01.560 INFO  [ShowCommand                               ] -   - show: lists all commands registered in the data tool

The main ‘useful’ command currently is ‘export’ (there will be others).

The Export Command

The export command extracts an entire stream from Valo into a standardised representation, including re-import scripts.

As in the examples above:

valo-data-tool export /streams/demo/infrastructure/cpu json

This exports the cpu stream in json format. The output file will be named demo.infrastructure.cpu-json.tar.gz.

In addition, an md5 file will be created so that file integrity can be checked later.
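
One way to use it is a manual comparison such as the following, a sketch assuming the checksum file sits alongside the archive with a .md5 suffix (check the actual file name the tool writes):

md5sum demo.infrastructure.cpu-json.tar.gz
cat demo.infrastructure.cpu-json.tar.gz.md5    # the two hashes should match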

To override the default path, use:

valo-data-tool export /streams/demo/infrastructure/cpu json -path newPath

If you don’t want to export the data itself, but just the schema, mappings, contributors, taxonomies, etc., use the -nodata option. E.g.:

valo-data-tool export /streams/demo/infrastructure/cpu json -nodata
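
Combined with the global arguments described earlier, this lets you, for instance, pull only the stream definitions from another instance (the host name here is illustrative):

valo-data-tool -host prod-valo-01 export /streams/demo/infrastructure/cpu json -nodata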

The output is compressed into a tar.gz file.

Within this file you will find the following:

  • import.sh - A script used to re-import data
  • manifest.json - A manifest describing all files generated during the export and their individual md5 hashes
  • repo-mapping.json - Repository mapping as json for the selected stream
  • schema.json - Schema definition file for the stream
  • taxonomy.json - Taxonomy definitions (optional)
  • tags.json - Tags definitions (optional)

In addition, you will find a data directory containing data files in the chosen format. Each file contains 100,000 rows/documents, so for 1M rows you will see 10 data files. Data will not be included if the -nodata option is passed to the command.

If contributors are present on the stream, you will see a contributors directory containing all definitions for contributor types and contributor instances. Please note that contributors are not stream-specific, so exporting two or more streams may result in duplicate scripts.
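
Putting this together, a decompressed export might look something like the layout below. The extraction directory and the file names under data/ and contributors/ are illustrative; only the top-level files are documented above.

demo.infrastructure.cpu-json/
├── import.sh
├── manifest.json
├── repo-mapping.json
├── schema.json
├── taxonomy.json          (optional)
├── tags.json              (optional)
├── data/
│   ├── data-00000.json    (100,000 rows/documents per file)
│   └── data-00001.json
└── contributors/          (only if the stream has contributors)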

To re-import, just run the import.sh script after decompressing (ensure Valo is running!). You may manually change the host/port in this file if you need to import into a different Valo instance from the one used for the export.
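
For example, a minimal re-import session might look like this (the extraction directory name is assumed, as in the layout sketch above; adjust to match your archive):

tar -xzf demo.infrastructure.cpu-json.tar.gz    # decompress the export
cd demo.infrastructure.cpu-json                 # assumed extraction directory
./import.sh                                     # Valo must be running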