
Logstash walkthrough

Background

In this example we will use Logstash to parse an Apache HTTP Server log file and feed it into Valo for analysis.

Walkthrough

  1. Download and unpack Logstash.

    Download the latest version of Logstash, and unzip it. We will refer to the folder you unzip it to as your “Logstash folder”.

    If you are not familiar with Logstash you may want to work through the Logstash tutorial before continuing with this walkthrough.

  2. Start Valo.

    Start a single-node instance of Valo as per the instructions TODO LINK DOC HERE

  3. Configure a Valo stream.

    Logstash will POST data to a Valo stream in the same way as any other REST client. Before it can do that, we need to configure the stream.

    In this example Logstash will feed Valo with log data produced by an Apache HTTP Server, so we will use /streams/demo/infrastructure/apache as the stream’s URI.

    The schema fields required depend on the content of the Logstash configuration file (as discussed in the next section).

    Using your HTTP tool of choice (e.g. Postman) create the stream with the following command:

    PUT http://localhost:8888/streams/demo/infrastructure/apache HTTP/1.1
    Content-Type: application/json
    
    {
        "schema": {
            "version": "1.0.0",
            "config": {},
            "topDef": {
                "type": "record",
                "properties": {
                    "utctimestamp": { "type": "datetime" },
                    "version": { "type": "int" },
                    "host": { "type": "string" },
                    "path": { "type": "string" },
                    "message": { "type": "string" },
                    "clientip": { "type": "string" },
                    "ident": { "type": "string" },
                    "auth": { "type": "string" },
                    "timestamp": { "type": "string" },
                    "verb": { "type": "string", "optional": "true" },
                    "request": { "type": "string", "optional": "true" },
                    "httpversion": { "type": "double", "optional": "true" },
                    "rawrequest": { "type": "string", "optional": "true" },
                    "response": { "type": "int" },
                    "bytes": { "type": "int", "optional": "true" }
                }
            }
        }
    }
    
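    For reference, a single event that Logstash will POST to this stream looks roughly like the sketch below. The field values are hypothetical and for illustration only; exactly which optional fields appear depends on how the grok pattern matches each log line.

```python
import json

# Hypothetical event matching the stream schema above; values illustrate the
# field types only, they are not real Logstash output.
event = {
    "utctimestamp": "1995-07-01T04:00:01.000Z",   # datetime derived from the Apache timestamp
    "version": 1,
    "host": "my-machine",                          # assumption: the machine running Logstash
    "path": "/data/sample-apache-log",             # assumption: path to the log file
    "message": '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
               '"GET /history/apollo/ HTTP/1.0" 200 6245',
    "clientip": "199.72.81.55",
    "ident": "-",
    "auth": "-",
    "timestamp": "01/Jul/1995:00:00:01 -0400",
    "verb": "GET",                  # optional: absent when the request line is malformed
    "request": "/history/apollo/",  # optional
    "httpversion": 1.0,             # optional: note the schema declares a double
    "response": 200,
    "bytes": 6245,                  # optional: absent when the log records "-"
}
print(json.dumps(event, indent=2))
```

    Optional fields are simply omitted from the JSON body when they have no value, rather than being sent as null.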

    Next, configure the stream to store data in Valo’s Semi-structured repository with the following command:

    PUT http://localhost:8888/streams/demo/infrastructure/apache/repository HTTP/1.1
    Content-Type: application/json
    
    {
        "name"   : "ssr"
    }
    

    If you are interested in why log data is more suited to the Semi-structured repository, refer to Deciding which repository to use.

  4. Prepare the sample data

    Download the following two files into a new folder; we will refer to this folder as your “sample data” folder:

    1. The sample Logstash configuration file logstash-valo-sample.conf.
    2. The sample Apache HTTP Server log file sample-apache-log.zip. This file was provided by NASA (we’re using the first 100,000 lines of “NASA_access_log_Jul95”).

    Unzip “sample-apache-log.zip” so that your “sample data” folder contains “sample-apache-log”.

    You can delete “sample-apache-log.zip” as it is no longer needed.
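    If you would rather reproduce the sample file from NASA’s full “NASA_access_log_Jul95” yourself, the truncation is a single head command. The sketch below first generates a stand-in file so the commands are runnable anywhere; with the real log downloaded, only the head line is needed:

```shell
# Stand-in for NASA_access_log_Jul95 (the real file is far larger than 100,000 lines).
seq 1 200000 > NASA_access_log_Jul95

# Keep only the first 100,000 lines, as used in this walkthrough.
head -n 100000 NASA_access_log_Jul95 > sample-apache-log

wc -l < sample-apache-log
```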

  5. Run Logstash.

    If you open “logstash-valo-sample.conf”, you will see that it specifies:

    1. What log files to read - in this case a log file produced by an Apache HTTP Server.
    2. How to parse the log files - in this case we are using the COMMONAPACHELOG format as described here.
    3. How to send the log data to Valo.

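    The COMMONAPACHELOG pattern is what populates the clientip, ident, auth, timestamp, verb, request, httpversion, response and bytes fields in the stream schema from step 3. As a rough illustration (a simplified sketch, not the actual grok implementation, which is more permissive), a Python regex extracting the same fields from a common-format line might look like this:

```python
import re

# Simplified approximation of Logstash's COMMONAPACHELOG grok pattern.
COMMON_APACHE_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

# A line in the format used by the NASA sample log.
line = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245')

fields = COMMON_APACHE_LOG.match(line).groupdict()
print(fields["clientip"], fields["verb"], fields["response"])
```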
    To launch Logstash and have it feed data into Valo:

    1. Open “logstash-valo-sample.conf” and replace “<path_to_log_file>” with the fully qualified path to your “sample data” folder.
    2. Open a terminal window / command prompt.
    3. Navigate to your “sample data” folder.
    4. Execute the following command:
    <logstash folder>/bin/logstash --config logstash-valo-sample.conf
    
  6. Run some queries.

    It will take a little while for Logstash to push all of the log data into Valo, but you can start running queries immediately.

    As an example you can run the following to see how many log entries have been imported into Valo:

    from historical /streams/demo/infrastructure/apache
    select count() as requests, min(utctimestamp) as first, max(utctimestamp) as last
    

    To execute this query through cURL, open a terminal and:

    1. Create a session:

    curl -X POST http://localhost:8888/execution/demo/sessions
    

    Response:

    { "session": "/execution/demo/sessions/a6fe854e-9190-467b-bb40-827d842830b4" }
    

    2. Submit the query:

    Create the following file, query.json:

    {
      "id": "test_query",
      "body": "from historical /streams/demo/infrastructure/apache select count() as requests, min(utctimestamp) as first, max(utctimestamp) as last"
    }
    

    And submit it, replacing the session URI with the one returned by your own session-creation call:

    curl -H "Content-Type: application/json" -X PUT --data @query.json http://localhost:8888/execution/demo/sessions/a6fe854e-9190-467b-bb40-827d842830b4
    

    Response:

    [
      {
        "id": "test_query",
        "dependencies": [],
        "query": {
          "state": "initialised",
          "outputs": [
            {
              "type": "OUTPUT_CHANNEL",
              "id": "af98acbb-722a-4c3d-aacb-0135c8c314eb",
              "outputUri": "/output/demo/6daccb6739b02bacda2d4c37428263d3/5264726f-e17b-4bb4-bc44-f45be8fdb44e",
              "outputType": "UNBOUNDED",
              "schema": {
                "version": "",
                "config": {
                  "key": []
                },
                "topDef": {
                  "type": "record",
                  "properties": {
                    "text": {
                      "type": "string"
                    },
                    "createdAt": {
                      "type": "datetime",
                      "annotations": [
                        "urn:itrs:default-timestamp"
                      ]
                    },
                    "user": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          ]
        }
      }
    ]
    

    3. Open another terminal and execute the following command to open the output channel (using the output channel id returned in the previous response):

    curl -X GET http://localhost:8888/output/execution/af98acbb-722a-4c3d-aacb-0135c8c314eb
    

    Response: Letsa’ Go!

    Don’t close this terminal; it is waiting to receive the query results, which arrive once the query is started in step 4.

    4. Return to the first terminal and start the query that was submitted in step 2:

    curl -X PUT http://localhost:8888/execution/demo/sessions/a6fe854e-9190-467b-bb40-827d842830b4/queries/test_query/_start
    

    The query output will appear in the terminal opened in step 3.

    You can also run the following to see the number of responses for each HTTP response code:

    from historical /streams/demo/infrastructure/apache
    group by response
    select response, count() as responseCount
    
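    To illustrate what that group by computes, here is the equivalent aggregation done directly over a few raw log lines in Python (the lines are samples in the NASA log’s format):

```python
import re
from collections import Counter

# Minimal extraction of the HTTP response code: the three-digit number
# that follows the quoted request line.
RESPONSE = re.compile(r'" (\d{3}) ')

lines = [
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    'burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 304 0',
]

# Group by response code and count, mirroring the Valo query above.
counts = Counter(RESPONSE.search(line).group(1) for line in lines)
print(counts)
```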

Resetting Logstash state

Logstash uses “sincedb” to remember what data it has already processed.

If you need to reset this state you can do this on Linux or OSX with the following command:

rm ~/.sincedb_*

For more information see http://stackoverflow.com/questions/19546900/how-to-force-logstash-to-reparse-a-file