Saltstack Prometheus exporter which exposes metrics from the Salt eventbus (EventPublisher)

  • By Kevin Petremann
  • Last update: Nov 28, 2022
  • Comments: 1

status Go CI GitHub

Salt Exporter

Quickstart

Salt Exporter is a Prometheus export for Salt events. It exposes relevant metrics regarding jobs and results.

You just need to run the exporter on the same server than the Salt Master.

This project is ready to use, but is still being battle tested and metrics are subject to evolve.

Notes:

This has only be tested on Linux

If you did not setup the external_auth parameter, you need to run the exporter with the same user running the Salt Master. If it is set, any user will do the trick.

Features

Supported tags:

  • salt/job/<jid>/new
  • salt/job/<jid>/ret/<*>

It extracts and exposes:

  • the execution module, to function label
  • the states when using state.sls/state.apply/state.highstate, to state label

Exposed metrics

Example

execution modules:

# HELP salt_expected_responses_total Total number of expected minions responses
# TYPE salt_expected_responses_total counter
salt_expected_responses_total{function="cmd.run", state=""} 6
salt_expected_responses_total{function="test.ping", state=""} 6

# HELP salt_function_responses_total Total number of response per function processed
# TYPE salt_function_responses_total counter
salt_function_responses_total{function="cmd.run",state="",success="true"} 6
salt_function_responses_total{function="test.ping",state="",success="true"} 6

# HELP salt_new_job_total Total number of new job processed
# TYPE salt_new_job_total counter
salt_new_job_total{function="cmd.run",state="",success="false"} 3
salt_new_job_total{function="test.ping",state="",success="false"} 3

# HELP salt_responses_total Total number of response job processed
# TYPE salt_responses_total counter
salt_responses_total{minion="local",success="true"} 6
salt_responses_total{minion="node1",success="true"} 6

states:

salt_expected_responses_total{function="state.apply",state="highstate"} 1
salt_expected_responses_total{function="state.highstate",state="highstate"} 2
salt_expected_responses_total{function="state.sls",state="test"} 1

salt_function_responses_total{function="state.apply",state="highstate",success="true"} 1
salt_function_responses_total{function="state.highstate",state="highstate",success="true"} 2
salt_function_responses_total{function="state.sls",state="test",success="true"} 1

salt_new_job_total{function="state.apply",state="highstate",success="false"} 1
salt_new_job_total{function="state.highstate",state="highstate",success="false"} 2
salt_new_job_total{function="state.sls",state="test",success="false"} 1

salt/job/<jid>/new

It increases:

  • salt_new_job_total
  • salt_expected_responses_total by the number of target in the new job

salt/job/<jid>/ret/<*>

Usually, it will increase the salt_responses_total (per minion) and salt_function_responses_total (per function) counters.

However, if it is a feedback of a scheduled job, it increases salt_scheduled_job_return_total instead.

Why separating salt_responses_total and salt_scheduled_job_return_total

One of the goal is to be able to calculate the number of missed response, without doing the matching manually between the target of a new job and the minion responses.

This is why scheduled job are in a dedicated metric. Once scheduled, a job is only executing autonomously on Minion side, hence there is no new job request for each scheduled job response. Said differently, if there was no differences made, we would end up with more responses than expected responses.

Estimate missing responses

Missing responses can be simply calculated by doing the difference between salt_expected_responses_total and salt_responses_total.

It can be joined on function label to have details per executed module.

Upcoming features

  • details per state module when state.single is used
  • metric regarding IPC connectivity
  • support the runners
  • be able to enable/disable metrics
  • webserver: listen address and port configurable
  • webserver: basic auth/TLS

Estimated performance

According some simple benchmark, for a simple event, it takes:

  • ~60us for parsing
  • ~9us for converting to Prometheus metric

So with a security margin, we can estimate an event should take 100us maximum.

Roughly, the exporter should be able to handle about 10kQps.

For a base of 1000 Salt minions, it should be able to sustain 10 jobs per minion per second, which is a quite high for Salt.

If needed, the exporter can easily scale more up by doing the parsing in dedicated coroutines, the limiting factor being the Prometheus metric update (~9us).

Download

salt-exporter.zip

Comments(1)

  • 1

    Add support of tag with 20 characters or less

    According the Salt docstring here:

    In the new style, when the tag is longer than 20 characters, an end of tag
    string is appended to the tag given by the string constant TAGEND, that is, two
    line feeds '\n\n'.  When the tag is less than 20 characters then the tag is
    padded with pipes "|" out to 20 characters as before.  When the tag is exactly
    20 characters no padded is done.
    

    Currently the exporter only looks for the '\n\n' delimiter.

    Tag with 20 characters or less support must be added.