The ELK Stack with Beats: Creating Logstash Filters

On my test ELK server, I have a script running every five minutes on a cron job:

echo "$(date +%Y-%m-%d.%H:%M:%S) $(ps aux | grep 'kibana.*node' | grep -v grep | awk '{ print $4 }' | tr -d ' ')" >> "/var/log/node.percentmem.log"
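For reference, the cron entry driving it might look like this (the script path and file name here are hypothetical — adjust to wherever you saved the one-liner):

```shell
# /etc/cron.d/node-percentmem (hypothetical file name)
# run the logging script every five minutes as root
*/5 * * * * root /usr/local/bin/node-percentmem.sh
```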

This produces output that looks like this:

2016-03-17.12:55:01 9.3
2016-03-17.13:00:01 9.6
2016-03-17.13:05:01 9.8
2016-03-17.13:10:01 10.1
2016-03-17.13:15:01 10.4
2016-03-17.13:20:01 10.4
2016-03-17.13:25:01 10.7
2016-03-17.13:30:02 10.9
2016-03-17.13:35:01 11.2
2016-03-17.13:40:01 11.5
2016-03-17.13:45:01 11.8
2016-03-17.13:50:01 12.0
2016-03-17.13:55:01 12.3
2016-03-17.14:00:01 12.5
2016-03-17.14:05:01 12.8
2016-03-17.14:10:01 13.1
2016-03-17.14:15:01 13.5
2016-03-17.14:20:01 4.2

This is of interest because Kibana uses Node, and Node's memory usage is insane. To import this with logstash I created the file /etc/logstash/conf.d/03-local-nodemem.conf:

input {
    file {
        path => "/var/log/node.percentmem.log"
        type => "nodemem" # not an existing type, going to build our own filter
    }
}
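Note that by default the file input behaves like tail -f and only picks up lines written after logstash starts. If you want it to ingest the existing contents of the log as well, the plugin takes a start_position option — a sketch (the sincedb_path line is only for repeated testing, since it stops logstash remembering how far it has read):

```
input {
    file {
        path => "/var/log/node.percentmem.log"
        type => "nodemem"
        start_position => "beginning"     # read the file from the top on first run
        # sincedb_path => "/dev/null"     # uncomment while testing to re-read on every restart
    }
}
```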

This tells logstash to pull in this newly created log, but for it to be useful we have to create a filter to interpret the data. I put this in /etc/logstash/conf.d/13-nodemem-filter.conf:

filter {
  if [type] == "nodemem" {
    grok {
      match => { "message" => "%{YEAR:year}-%{MONTH:month-%{MONTHDAY:monthday}.%{TIME:time} %{NUMBER:percentmem}" }
    }
  }
}

To begin creating filters like this, you'll really want to spend a bit of time reading either /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.2/patterns/grok-patterns or https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns , both of which list the default pattern types that logstash understands.
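For a taste, the entries relevant to this filter look roughly like this in that file (quoted from memory — check your own copy, as the regexes may differ slightly between versions):

```
YEAR (?>\d\d){1,2}
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
NUMBER (?:%{BASE10NUM})
```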

The filter I created above doesn't work. Finding that out is hard enough; finding out why is even trickier. The first error I managed to spot on my own: I'm missing a closing brace on "MONTH:month". For the second, I needed some external assistance: I went to GrokConstructor, where you can paste in some of the log you want to grok along with the pattern (use only the content inside the quotes after "message", so %{YEAR: ... :percentmem} WITHOUT the quotes, which will break it).

GrokConstructor seems considerably superior to GrokDebug (listed as comparable in the logstash documentation) because it shows partial matches and makes attempts on every line you supply — GrokDebug simply succeeds or fails, which is much harder to debug, and only works on the first supplied line. With GrokConstructor, I saw that the match was breaking on "MONTH", which should in fact be "MONTHNUM": "MONTH" matches written-out month names, while "MONTHNUM" matches the two-digit number we need. Here's the corrected filter:

filter {
  if [type] == "nodemem" {
    grok {
      match => { "message" => "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:monthday}.%{TIME:time} %{NUMBER:percentmem}" }
    }
  }
}
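Once you have a candidate pattern, you can also sanity-check it locally without restarting the service by piping a sample line through logstash's -e flag (the path assumes the same /opt/logstash install as above):

```shell
echo "2016-03-17.12:55:01 9.3" | /opt/logstash/bin/logstash -e '
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:monthday}.%{TIME:time} %{NUMBER:percentmem}" }
  }
}
output { stdout { codec => rubydebug } }'
```

If the pattern matches, the rubydebug output shows each captured field; if it doesn't, the event comes out tagged _grokparsefailure.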

At that point, the log is being parsed correctly ... but despite the fact that you've clearly processed the value with the "NUMBER" pattern, it is not in fact a number. It's still a string. This matters if you want Kibana to graph the value: we have to convince it that the field is numeric. So replace "%{NUMBER:percentmem}" with "%{NUMBER:percentmem:float}". This is probably still an imperfect solution, but it's the best I've managed so far.
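An alternative to the :float suffix — and arguably a more explicit one — is to leave the grok capture alone and convert the field afterwards with a mutate filter. A sketch:

```
filter {
  if [type] == "nodemem" {
    mutate {
      convert => { "percentmem" => "float" }   # cast the captured string to a float
    }
  }
}
```

Either way, be aware that Elasticsearch won't retroactively change the mapping of a field it has already indexed as a string; with logstash's default daily indices, the numeric type only takes effect from the next new index onward.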