The ELK Stack with Beats: Feeding Logstash with Beats (Insecure - so far)


Now we've got a rudimentary working ELK stack, but the promise of ELK is in analyzing and comparing data from multiple machines. And for that, we need a way of moving data (usually logs) from their servers to the ELK machine:

"The Beats are open source data shippers that you install as agents on your servers to send different types of operational data to Elasticsearch. Beats can send data directly to Elasticsearch or send it to Elasticsearch via Logstash, which you can use to enrich or archive the data."

(source: https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html)

I'm not interested in Topbeat or Packetbeat right now (you can probably guess what they ship), just moving log files:

"Filebeat is a log data shipper initially based on the Logstash-Forwarder source code. Installed as an agent on your servers, Filebeat monitors the log directories or specific log files, tails the files, and forwards them either to Logstash for parsing or directly to Elasticsearch for indexing."

Note that, like most things on Linux, the default transport is unencrypted, so once the basic tests work we'll set up TLS encryption for beats.

We'd better check that logstash is ready to deal with beats input:

# cd /opt/logstash/bin
# ./plugin list
... [ very long list ]
# ./plugin list | grep beats
logstash-input-beats

That's the plugin we're looking for. Add the --verbose flag to see version numbers. Note that the flag belongs to the list subcommand, not to ./plugin itself, so it goes after list (./plugin list --verbose). If you don't see the logstash-input-beats plugin, install it:

# cd /opt/logstash/bin
# ./plugin install logstash-input-beats

You may want to run this anyway to get the latest version of the plugin: the Debian package shipped with one, but a more recent version was available (I'm now at 2.1.3). I had a LOT of trouble getting this setup working, and things seemed to go more smoothly after this upgrade, though I have no concrete evidence that the upgrade itself changed anything.

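We need to prepare logstash to listen for beats connections. I added the following to /etc/logstash/conf.d/beats.conf, as it seems logstash reads the files in that folder on startup (if I sound surprisingly skeptical, it's because this has been a long and painful process), and keeping each input type in its own file appears to ease configuration. Bear in mind that logstash merges every file in that folder into a single pipeline, so filters and outputs defined in other files there apply to beats events too.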

# filename: /etc/logstash/conf.d/beats.conf
# from https://www.elastic.co/guide/en/beats/libbeat/1.1/logstash-installation.html
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}

So logstash will listen on port 5044 for incoming beats input. Make sure you punch a hole through your firewall for that port. Note that I've added a stdout output, which writes to /var/log/logstash/logstash.stdout while I'm testing. It can be incredibly verbose, which makes it a very bad idea in the long term, but for testing it has proven to be an essential tool (really: configuring this setup was several days of back-and-forth between the configs on my machines, elastic.co's documentation, Google, and stackoverflow/serverfault - you'll want that log).

Having done that, (re)start logstash. systemctl restart logstash works whether or not it's already running.
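
Once logstash is back up, it can save some head-scratching to confirm from the filebeat machine that the port is actually reachable before touching any filebeat config. The following is only a sketch: port_open is a made-up helper name, it relies on bash's /dev/tcp feature and the coreutils timeout command, and elktest is simply the ELK hostname used in this post.

```shell
# Return success if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout; port_open is a
# hypothetical helper, not part of any Elastic tooling.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# elktest:5044 is the ELK host and beats port used in this post; adjust to taste.
if port_open elktest 5044; then
  echo "port 5044 is reachable"
else
  echo "port 5044 is NOT reachable (logstash down, or firewall in the way?)"
fi
```

A successful connection only shows that something is listening on the port, not that it speaks the beats protocol, so the logstash logs are still the final word.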

On the machine that's to run filebeat (I've started a new virtual machine called "beatbox" - again with Apache installed as a data source - you may handle this differently), add this line to /etc/apt/sources.list:

deb https://packages.elastic.co/beats/apt stable main

As with previous elastic.co products, we need their APT key before proceeding with a normal installation:

# wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
# apt-get update
E: The method driver /usr/lib/apt/methods/https could not be found.
N: Is the package apt-transport-https installed?

Huh, never seen that before. Install the requested package since elastic.co has (again, with their charmingly quirky inconsistency) chosen to make this one repo https: even though all their others are http: ...

# apt-get install apt-transport-https
...
# apt-get update
...
# apt-get install filebeat

Positively minuscule at 14M! (At least compared to other elastic.co products.) And it doesn't require Java or (J)Ruby?

Continuing their amazing history of inconsistency, this package's binary is where it should be, which is to say /usr/bin/filebeat. This makes it their first product that's actually on the PATH. The config file is also in a good location, /etc/filebeat/filebeat.yml. This file is 400 lines long, but if you remove all the comments and empty lines, it boils down to this:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/*.log
      input_type: log
  registry_file: /var/lib/filebeat/registry
output:
  elasticsearch:
    hosts: ["localhost:9200"]
shipper:
logging:
  files:
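
The comment-stripped view above can be reproduced with a one-line grep. This is a rough sketch rather than the exact command I used: strip_comments is a hypothetical helper name, and the demo runs against a throwaway file so that it works on a machine without filebeat installed.

```shell
# Print a file without blank lines and without full-line comments.
# strip_comments is a hypothetical helper name used only for this example.
strip_comments() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demo on a throwaway file; on the filebeat machine you would pass
# /etc/filebeat/filebeat.yml instead.
sample=$(mktemp)
printf '%s\n' '# a comment' '' 'filebeat:' '  # indented comment' '  prospectors:' > "$sample"
stripped=$(strip_comments "$sample")
echo "$stripped"
rm -f "$sample"
```

Only whole-line comments are removed; a comment trailing a setting on the same line is left alone.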

We need to change this a bit:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/*.log
        - /var/log/apache2/*.log # NEW!
      input_type: log
  registry_file: /var/lib/filebeat/registry
output:
  logstash: # not "elasticsearch"!
    hosts: ["elktest:5044"] # change to your hostname, note change of port
shipper:
logging:
  files:

Note the addition of the apache2/ folder. This is because filebeat's "prospector" does NOT recurse directories, so every folder you're interested in needs its own path entry.
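
The same effect is easy to see with ordinary shell globs. This is an analogy, not a statement about filebeat's internals: it assumes filebeat's path patterns behave like single-level shell globs in this respect, and the directory and file names below are invented for the demo.

```shell
# Build a throwaway tree that mimics /var/log (names are invented for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/apache2"
touch "$demo/auth.log" "$demo/apache2/access.log"

# A single-level pattern only matches files directly inside the directory...
top_level=$(cd "$demo" && ls *.log)
# ...so the subdirectory needs a pattern of its own.
both=$(cd "$demo" && ls *.log apache2/*.log)

echo "one pattern:"
echo "$top_level"
echo "two patterns:"
echo "$both"
rm -rf "$demo"
```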

Note that the "files:" section under "logging:" is empty. This should theoretically mean it logs to /var/log/syslog (again, their stupid inconsistency: all their other programs log to /var/log/<programname>/). I've tried setting the logging through the config file (examples are included in their massive example config), and I've tried it without settings. I don't think I've ever seen any output: I don't know if this is because filebeat is an exceptionally "quiet" program, or I've never caused it to fail, or because its logging is failing completely.

To test the configuration file:

# /usr/bin/filebeat -c /etc/filebeat/filebeat.yml -configtest

Because consistency is overrated, this exits silently on success (their other products say something like "Config OK"). This is more unixy, but less consistent. To confirm this, I put some garbage at the beginning of the config and got:

Loading config file error: YAML config parsing failed on /etc/filebeat/filebeat.yml: yaml: line 3: did not find expected <document start>. Exiting.
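
If the silence bothers you, a small wrapper can turn the exit status into a message. This is a sketch under one assumption: that filebeat follows the usual convention of exiting with status 0 on success and non-zero on failure. check_config is a hypothetical helper name.

```shell
# Run a config-test command and report the result explicitly.
# check_config is a hypothetical wrapper; it assumes the wrapped command
# exits with status 0 on success and non-zero on failure.
check_config() {
  if "$@"; then
    echo "Config OK"
  else
    echo "Config FAILED" >&2
    return 1
  fi
}

# Intended use on the filebeat machine (commented out here because it needs
# filebeat installed):
# check_config /usr/bin/filebeat -c /etc/filebeat/filebeat.yml -configtest

# Stand-in commands exercise both branches without filebeat:
check_config true
check_config false || echo "caught a failing config test"
```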

An examination of the contents of the package (dpkg -L filebeat | less) shows that they've included a systemd service file, so:

# systemctl enable filebeat
# systemctl start filebeat

On the ELK Stack machine, I got this in the /var/log/logstash/logstash.log file (multiple entries, presumably one for each request made to Apache on the filebeat machine):

{:timestamp=>"2016-03-04T02:24:31.784000-0500", :message=>"Beats Input: Remote connection closed", :peer=>"192.168.168.169:38435", :exception=>#<Lumberjack::Beats::Connection::ConnectionClosed: Lumberjack::Beats::Connection::ConnectionClosed wrapping: Lumberjack::Beats::Parser::UnsupportedProtocol, unsupported protocol 72>, :level=>:warn}

This may have been because I was initially using a filebeat.yml output stanza that looked like this:

output:
  elasticsearch:
    hosts: ["elktest:5044"]

Right host, right port, wrong output: it should say "logstash:", not "elasticsearch:".
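
A guess at what "unsupported protocol 72" means (an assumption on my part, not checked against the plugin's source): if the number is the first byte the beats input received where it expected a protocol version, it can be decoded as an ASCII character. The snippet below does only that decoding.

```shell
# Decode a decimal byte value into its ASCII character via an octal escape.
byte=72
char=$(printf "\\$(printf '%03o' "$byte")")
echo "byte $byte is ASCII '$char'"
```

An "H" would fit an HTTP request arriving on the beats port, which is what you'd expect if filebeat's elasticsearch output (which speaks HTTP) is pointed at logstash's beats input.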

Next error:

{:timestamp=>"2016-03-07T16:29:05.665000-0500", :message=>"Beats Input: Remote connection closed", :peer=>"192.168.168.169:38501", :exception=>#<Lumberjack::Beats::Connection::ConnectionClosed: Lumberjack::Beats::Connection::ConnectionClosed wrapping: EOFError, End of file reached>, :level=>:warn}

At this point I actually did the logstash plugin upgrade described above, and after restarting filebeat on one machine and logstash on the ELK host, I seemed to have an almost working system. The reason for the "almost" is this:

{
       "message" => "Mar  8 11:17:01 beatbox CRON[893]: pam_unix(cron:session): session opened for user root by (uid=0)",
      "@version" => "1",
    "@timestamp" => "2016-03-08T16:17:10.098Z",
          "beat" => {
        "hostname" => "beatbox",
            "name" => "beatbox"
    },
         "count" => 1,
        "fields" => nil,
    "input_type" => "log",
        "offset" => 4240,
        "source" => "/var/log/auth.log",
          "type" => "log",
          "host" => "beatbox",
          "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ]
}

This entry is from auth.log, not Apache - and no Apache entries were arriving. Note the "_grokparsefailure" tag, which hardly seems like a good thing: it means a grok filter (presumably the Apache pattern already in my logstash config) failed to match the line. I could see all this only because I insisted on dual output during debugging: the entry above is from /var/log/logstash/logstash.stdout, which is populated because I'm sending output to elasticsearch AND stdout. The good news is that logstash is receiving data from filebeat! This is also the point at which I realized that filebeat's "prospector" doesn't recurse and added the - /var/log/apache2/*.log line to filebeat.yml, which fixed that problem (and Apache's logs are "grokked" correctly).

Eventually I concluded I wasn't overly concerned about auth.log being parsed properly - what I most wanted was the Apache logs. I can either worry about grokking the auth.log entries later or simply drop them.

I thought this would be the last blog post in this series, but this is already quite large so I've decided that the process of securing Beats will be another, separate entry.


Continue to The ELK Stack with Beats: Securing the Beats-to-Logstash Connection, the next article in this series.