Welcome to the StackLight Infrastructure Alerting plugin for Fuel documentation!¶
Overview¶
Introduction¶
The StackLight Infrastructure Alerting plugin is used to install and configure Nagios, which provides the alerting and escalation functionality of the so-called Logging, Monitoring, and Alerting Toolchain of Mirantis OpenStack.
Nagios is a key component of the LMA Toolchain project of Mirantis OpenStack, as shown in the figure below.

Requirements¶
The StackLight Infrastructure Alerting plugin 0.10.0 has the following requirements:
Requirement | Version/Comment |
---|---|
Disk space | The plugin’s specification requires provisioning at least 15 GB of disk space for the system, 10 GB for the logs, and 20 GB for Nagios. Therefore, the installation of the plugin will fail if there is less than 45 GB of disk space available on the node. |
Hardware configuration | The hardware configuration (RAM, CPU, disk) required by this plugin depends on the size of your cloud environment and other parameters like the retention period of the data. A typical setup would at least require a quad-core server with 8 GB of RAM and fast disks (ideally, SSDs). |
Mirantis OpenStack | 8.0, 9.0 |
The StackLight Collector Plugin | 0.10 |
The StackLight InfluxDB Grafana Plugin | 0.10 This is optional and only needed if you want to create alarms in Nagios for time-series stored in InfluxDB. |
Limitations¶
The StackLight Infrastructure Alerting plugin 0.10.0 has the following limitation:
- If Nagios is installed on several nodes for high availability, the alerts history will be lost in case of a server failover.
Release notes¶
Version 0.10.1¶
Version 0.10.0¶
The StackLight Infrastructure Alerting plugin 0.10.0 contains the following updates:
- Added support for LDAP(S) authentication to access the Nagios web UI. The nagiosadmin user is still created statically and is the only user who has the admin privileges by default.
- Added support for TLS encryption to access the Nagios web UI. A PEM file (obtained by concatenating the SSL certificates with the private key) must be provided in the settings of the plugin to configure the TLS termination.
- Bug fixes:
- Fixed an issue where Apache could not handle the passive checks workload in large deployments. See #1552772.
Version 0.8.0¶
The initial release of the plugin.
Licenses¶
Third-party components¶
Name | Project website | License |
---|---|---|
Nagios | https://www.nagios.org/ | GPLv2 |
Apache HTTP server | http://httpd.apache.org | Apache v2 |
Puppet modules¶
Name | Project website | License |
---|---|---|
puppetlabs-apache | https://github.com/puppetlabs/puppetlabs-apache | Apache v2 |
puppetlabs-concat | https://github.com/puppetlabs/puppetlabs-concat | Apache v2 |
puppetlabs-stdlib | https://github.com/puppetlabs/puppetlabs-stdlib | Apache v2 |
leinaddm-htpasswd | https://github.com/leinaddm/puppet-htpasswd | Apache v2 |
References¶
- The StackLight Infrastructure Alerting Plugin project at GitHub
- The StackLight Collector Plugin project at GitHub
- The StackLight InfluxDB-Grafana Plugin project at GitHub
- The official Nagios documentation
Installing and configuring StackLight Infrastructure Alerting plugin for Fuel¶
Introduction¶
You can install the StackLight Infrastructure Alerting plugin using one of the following options:
- Install using the RPM file
- Install from source
The following is a list of software components installed by the StackLight Infrastructure Alerting plugin:
Component | Version |
---|---|
Nagios | v3.5.1 for Ubuntu (64-bit) |
Apache | Version coming with the Ubuntu distribution |
Install using the RPM file¶
To install the StackLight Infrastructure Alerting Plugin using the RPM file of the Fuel plugins catalog:
Go to the Fuel Plugins Catalog.
From the Filter drop-down menu, select the Mirantis OpenStack version you are using and the Monitoring category.
Download the RPM file.
Copy the RPM file to the Fuel Master node:
[root@home ~]# scp lma_infrastructure_alerting-0.10-0.10.0-0.noarch.rpm \
    root@<Fuel Master node IP address>:
Install the plugin using the Fuel Plugins CLI:
[root@fuel ~]# fuel plugins --install \
    lma_infrastructure_alerting-0.10-0.10.0-0.noarch.rpm
Verify that the plugin is installed correctly:
[root@fuel ~]# fuel plugins --list
id | name                        | version | package_version
---|-----------------------------|---------|----------------
1  | lma_infrastructure_alerting | 0.10.0  | 4.0.0
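If you want to script this verification, the listing can be parsed as a pipe-delimited table. The following is a minimal sketch, assuming the output has the layout shown above; the function names and the `SAMPLE` text are illustrative and are not part of the Fuel CLI.

```python
def parse_plugin_table(text):
    """Parse pipe-delimited `fuel plugins --list` output into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ---|--- separator row.
        if not line or set(line) <= set("-|"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells          # first non-separator row holds column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def is_installed(text, name, version):
    """Return True if a plugin with this name and version is listed."""
    return any(
        row.get("name") == name and row.get("version") == version
        for row in parse_plugin_table(text)
    )


# Sample text copied from the listing above (assumed layout).
SAMPLE = """\
id | name                        | version | package_version
---|-----------------------------|---------|----------------
1  | lma_infrastructure_alerting | 0.10.0  | 4.0.0
"""

print(is_installed(SAMPLE, "lma_infrastructure_alerting", "0.10.0"))
```

In practice you would feed the function the captured output of the command rather than a hard-coded sample.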
Install from source¶
Alternatively, you may want to build the plugin RPM file from source if, for example, you want to test the latest features of the master branch or customize the plugin.
Note
Running a Fuel plugin that you built yourself is at your own risk and will not be supported.
To install the StackLight Infrastructure Alerting Plugin from source, first prepare an environment to build the RPM file. The recommended approach is to build the RPM file directly onto the Fuel Master node, so that you will not have to copy that file later on.
To prepare an environment and build the plugin:
Install the standard Linux development tools:
[root@home ~]# yum install createrepo rpm rpm-build dpkg-devel
Install the Fuel Plugin Builder. To do that, first get pip:
[root@home ~]# easy_install pip
Then install the Fuel Plugin Builder (the fpb command line) with pip:
[root@home ~]# pip install fuel-plugin-builder
Note
You may also need to build the Fuel Plugin Builder if the package version of the plugin is higher than the package version supported by the Fuel Plugin Builder you get from PyPI. For instructions on how to build the Fuel Plugin Builder, see the Install Fuel Plugin Builder section of the Fuel Plugin SDK Guide.
Clone the plugin repository:
[root@home ~]# git clone \
    https://github.com/openstack/fuel-plugin-lma-infrastructure-alerting.git
Verify that the plugin is valid:
[root@home ~]# fpb --check ./fuel-plugin-lma-infrastructure-alerting
Build the plugin:
[root@home ~]# fpb --build ./fuel-plugin-lma-infrastructure-alerting
To install the plugin:
Now that you have created the RPM file, install the plugin using the fuel plugins --install command:
[root@fuel ~]# fuel plugins --install ./fuel-plugin-lma-infrastructure-alerting/*.rpm
Plugin configuration¶
To configure the StackLight Infrastructure Alerting plugin:
Create a new environment as described in Create a new OpenStack environment.
In the Fuel web UI, click the Settings tab and select the Other category.
Scroll down through the settings until you find The StackLight Infrastructure Alerting Plugin section.
Select The StackLight Infrastructure Alerting Plugin and fill in the required fields as indicated below.
- If required, override the Nagios web interface self-generated password.
- Select the types of notifications that you want to receive by email (CRITICAL, WARNING, UNKNOWN, RECOVERY).
- Specify the recipient email address for the alerts.
- Specify the sender email address for the alerts.
- Specify the SMTP server address and port.
- Specify the SMTP authentication method.
- Specify the SMTP username and password. This is not required if the authentication method is None.
Select Enable TLS for Nagios if you want to encrypt your Nagios web UI credentials (username, password). Then, fill in the required fields as indicated below.
- Specify the DNS name of the Nagios web UI. This parameter is used to create a link from within the Fuel dashboard to the Nagios web UI.
- Specify the location of the PEM file, which contains the certificate and the private key of the server that will be used in TLS handshakes with the client.
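The release notes describe the PEM file as the SSL certificates concatenated with the private key. A minimal sketch of that concatenation is shown below; the function name `concat_pem` and the file names in the comments are hypothetical, and the sanity checks are only a convenience, not a validation of the certificate material.

```python
def concat_pem(certificates, private_key):
    """Join PEM certificate texts and a PEM private key into one bundle.

    The certificates come first and the private key last, each block
    terminated by exactly one newline.
    """
    certs = list(certificates)
    if not certs:
        raise ValueError("at least one certificate is required")
    for cert in certs:
        if "-----BEGIN CERTIFICATE-----" not in cert:
            raise ValueError("input does not look like a PEM certificate")
    if "PRIVATE KEY-----" not in private_key:
        raise ValueError("input does not look like a PEM private key")
    # Normalize surrounding whitespace so PEM markers stay on their own lines.
    return "".join(block.strip() + "\n" for block in certs + [private_key])


# Example usage with hypothetical file names:
#   from pathlib import Path
#   bundle = concat_pem([Path("nagios.crt").read_text()],
#                       Path("nagios.key").read_text())
#   Path("nagios.pem").write_text(bundle)
```

Because the resulting file contains the private key, it is worth restricting its permissions before uploading it in the plugin settings.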
Select Use LDAP for Nagios Authentication if you want to authenticate through LDAP to the Nagios Web UI. Then, fill in the required fields as indicated below.
- Select LDAPS if you want to enable LDAP authentication over SSL.
- Specify one or several LDAP server addresses separated by a space. These addresses must be accessible from the node where Nagios is installed. Addresses outside the management network are not routable by default (see the note below).
- Specify the LDAP server port number or leave it empty to use the defaults.
- Specify the Bind DN of a user who has search privileges on the LDAP server.
- Specify the password of the user identified by Bind DN above.
- Specify the User search base DN in the Directory Information Tree (DIT) from where to search for users.
- Specify a valid User search filter to search for users. The search should return a unique user entry.
You can further restrict access to the Nagios web UI to those users who are members of a specific LDAP group. However, with the Nagios web UI there is no notion of privileged (admin) access.
- Select Enable group-based authorization to restrict the access to a group of users.
- Specify the LDAP attribute in the user entry to identify the group of users.
- Specify the DN of the LDAP group that has access to the Nagios web UI.
Configure your environment as described in Configure your Environment.
Note
By default, StackLight is configured to use the management network of the so-called Default Node Network Group. While this default setup may be appropriate for small deployments or evaluation purposes, it is recommended that you do not use this network for StackLight in production. Instead, create a network dedicated to StackLight. Using a dedicated network for StackLight should improve performance and reduce the monitoring footprint. It will also facilitate access to the Nagios web UI after deployment.
Click the Nodes tab and assign the Infrastructure_Alerting role to the node or multiple nodes where you want to install the plugin.
The example below shows that the Infrastructure_Alerting role is assigned to three nodes alongside the Elasticsearch_Kibana role and the InfluxDB_Grafana role. The three plugins of the LMA toolchain back-end servers are installed on the same nodes.
Note
Nagios clustering for high availability requires assigning the Infrastructure_Alerting role to three different nodes. You can add or remove nodes with the Infrastructure_Alerting role after deployment.
If required, adjust the disk partitioning as described in Configure disk partitioning.
By default, the StackLight Infrastructure Alerting plugin allocates:
- 20% of the first available disk for the operating system by honoring a range of 15 GB minimum and 50 GB maximum
- 10 GB for /var/log
- At least 20 GB for the Nagios data in /var/nagios
The deployment will fail if the above requirements are not met.
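The allocation rules above can be worked through with a short calculation. This is a sketch only: it assumes a single disk whose remaining space, after the operating system and /var/log, goes to /var/nagios, which may not match how the plugin actually lays out multiple disks. The function and constant names are illustrative.

```python
OS_SHARE_PERCENT = 20   # share of the first available disk for the OS
OS_MIN_GB = 15          # lower bound for the OS partition
OS_MAX_GB = 50          # upper bound for the OS partition
LOG_GB = 10             # /var/log
NAGIOS_MIN_GB = 20      # minimum for /var/nagios


def plan_partitions(disk_gb):
    """Return the GB allocated to each partition, or raise if the disk is too small."""
    # 20% of the disk, clamped to the 15..50 GB range.
    os_gb = min(max(disk_gb * OS_SHARE_PERCENT / 100, OS_MIN_GB), OS_MAX_GB)
    # Assumption: whatever is left after the OS and logs is given to Nagios.
    nagios_gb = disk_gb - os_gb - LOG_GB
    if nagios_gb < NAGIOS_MIN_GB:
        raise ValueError(
            f"{disk_gb} GB is not enough: only {nagios_gb} GB left for /var/nagios"
        )
    return {"os": os_gb, "/var/log": LOG_GB, "/var/nagios": nagios_gb}


for size in (45, 100, 400):
    print(size, plan_partitions(size))
```

With these rules, 45 GB is the smallest disk that passes, which matches the 15 GB + 10 GB + 20 GB figure given in the requirements.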
Deploy your environment as described in Deploy an OpenStack environment.
Plugin verification¶
Depending on the number of nodes and deployment setup, deploying a Mirantis OpenStack environment may take 20 minutes to several hours. Once the deployment is complete, you should see a deployment success notification message with a link to the Nagios web UI as shown below.

Click Nagios. Once authenticated, you should be redirected to the Nagios home page as shown below.

Note
The username is nagiosadmin by default. The password is defined in the settings.
Note
If Nagios is installed on the management network, you may not have direct access to the Nagios web UI. Extra network configuration may be required to create an SSH tunnel to the management network.
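One way to reach the web UI in that situation is SSH local port forwarding through a host that can reach the management network. The sketch below only builds the `ssh` command line; the addresses, the local port 8080, and the choice of `root` as the user are placeholders, not values taken from this guide. The remote port 80 assumes that TLS is not enabled.

```python
import shlex


def tunnel_command(jump_host, nagios_host, local_port=8080, remote_port=80, user="root"):
    """Build an ssh argv that forwards local_port to nagios_host:remote_port."""
    return [
        "ssh",
        "-N",                                          # no remote command, forwarding only
        "-L", f"{local_port}:{nagios_host}:{remote_port}",
        f"{user}@{jump_host}",
    ]


# Hypothetical addresses: 10.20.0.2 as the jump host, 192.168.0.5 as the Nagios VIP.
print(shlex.join(tunnel_command("10.20.0.2", "192.168.0.5")))
```

While the tunnel is open, the Nagios web UI would be reachable on the chosen local port of your workstation.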
Using StackLight Infrastructure Alerting plugin for Fuel¶
Using Nagios¶
The StackLight Infrastructure Alerting plugin configures Nagios to display the health status of all the nodes and services running in the OpenStack environment. The alarms, or service checks in Nagios terms, are created in passive mode, which means that the actual checks are not performed by Nagios itself, but by the Collector and Aggregator agents of the LMA toolchain.
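To make the idea of a passive check concrete, the sketch below formats a result in the standard Nagios external command syntax (`PROCESS_SERVICE_CHECK_RESULT`). It only illustrates what a passive result looks like; how the Collector and Aggregator actually deliver results to Nagios is not described here, and the host and service names in the example are hypothetical.

```python
import time

# Standard Nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_service_result(host, service, state, output, timestamp=None):
    """Format a passive service check result as a Nagios external command line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Fields are separated by semicolons and the command ends at the newline,
    # so neither may appear in the host or service name.
    for field in (host, service):
        if ";" in field or "\n" in field:
            raise ValueError(f"invalid character in field: {field!r}")
    # Collapse the plugin output to a single line.
    message = " ".join(output.splitlines())
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{message}"
    )


# Hypothetical environment ID 1 and service name "nova".
print(passive_service_result("00-global-clusters-env1", "nova", "OK", "all good"))
```

Nagios records the state carried by such a result without running any check command itself, which is why a missing result eventually shows up as UNKNOWN.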
To get an overview of your OpenStack environment:
Log in to the Fuel web UI.
Click Dashboard.
Click Nagios.
Click the Services link in the left panel of the Nagios web UI. You should see the following page:
In this dashboard, there are two virtual hosts representing the health status of the so-called global clusters and node clusters entities:
- 00-global-clusters-env${ENVID} is used to represent the aggregated health status of global clusters, such as ‘Nova’, ‘Keystone’, ‘RabbitMQ’, and others.
- 00-node-clusters-env${ENVID} is used to represent the aggregated health status of node clusters, such as ‘Controller’, ‘Compute’, and ‘Storage’.
The virtual hosts section contains a list of checks received for each of the nodes provisioned in the environment. These checks may vary depending on the role of the node being monitored.
Alerting is enabled by default for the global cluster entities. For the nodes and clusters of nodes, alerting is disabled by default to avoid alert fatigue, since these alerts are usually not representative of a critical condition affecting the overall health status of the global cluster entities.
To enable alerting for nodes and clusters:
Click a particular service.
Click the Enable notifications for this service link within the Service Commands panel as shown below.
There is a direct dependency between the configuration of the passive checks in Nagios and the configuration of the alarms in the Collectors. For details, see the Configuring alarms section in the StackLight Collector documentation.
A change in /etc/hiera/override/alarming.yaml or /etc/hiera/override/gse_filters.yaml on any of the nodes monitored by StackLight requires reconfiguring Nagios. It also implies that these two files must be kept strictly identical on all the nodes of the environment, including those where Nagios is installed. StackLight provides Puppet artifacts to help you with that task. To reconfigure the passive checks in Nagios when /etc/hiera/override/alarming.yaml or /etc/hiera/override/gse_filters.yaml is modified, run the following command on all the nodes where Nagios is installed:
# puppet apply --modulepath=/etc/fuel/plugins/\
lma_infrastructure_alerting-<version>/puppet/modules:/etc/puppet/modules \
/etc/fuel/plugins/lma_infrastructure_alerting-<version>/puppet/manifests/nagios.pp
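Since the two override files must be identical on every node, it can help to compare their digests before running Puppet. The following sketch groups nodes by the SHA-256 digest of a file's content. How the content is collected from each node (for example, over SSH) is left out and is an assumption of this example, as are the node names.

```python
import hashlib
from collections import defaultdict


def find_drift(contents_by_node):
    """Group node names by the digest of their copy of a file.

    Returns a list of groups; a single group means all copies are identical.
    """
    groups = defaultdict(list)
    for node, content in sorted(contents_by_node.items()):
        groups[hashlib.sha256(content).hexdigest()].append(node)
    return list(groups.values())


# Hypothetical copies of alarming.yaml fetched from three nodes.
copies = {
    "node-1": b"alarms: []\n",
    "node-2": b"alarms: []\n",
    "node-13": b"alarms: [cpu]\n",
}
groups = find_drift(copies)
if len(groups) > 1:
    print("alarming.yaml differs between:", groups)
```

The same check would be repeated for gse_filters.yaml.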
Configuring service checks using the InfluxDB metrics¶
You can also configure Nagios to perform active checks, which StackLight does not perform by default, using the metrics stored in InfluxDB time-series. For example, you could define active checks to be notified when the CPU activity of a particular process is too high.
Consider the following scenario:
- You want to monitor the Elasticsearch server.
- The CPU activity of the Elasticsearch server is captured in a time-series stored in InfluxDB.
- You want to receive an alert at the ‘warning’ level when the CPU load exceeds 30% of system activity.
- You want to receive an alert at the ‘critical’ level when the CPU load exceeds 50% of system activity.
The steps to create such alarms in Nagios are as follows:
Connect to each of the nodes running Nagios.
Install the Nagios plugin for querying InfluxDB:
[root@node-13 ~]# pip install influx-nagios-plugin
Define the command and the service check in the /etc/nagios3/conf.d/influxdb_services.conf file:
# Replace <INFLUXDB_HOST>, <INFLUXDB_USER> and <INFLUXDB_PASSWORD> by
# the appropriate values for your deployment
define command {
  command_line /usr/local/bin/check_influx \
    -h <INFLUXDB_HOST> -u <INFLUXDB_USER> -p <INFLUXDB_PASSWORD> -d lma \
    -q "select max(value) from lma_components_cputime_syst \
    where time > now() - 5m and service='$ARG1$' \
    group by time(5m) limit 1" \
    -w $ARG2$ -c $ARG3$
  command_name check_cpu_metric
}
define service {
  service_description Elasticsearch system CPU
  host node-13
  check_command check_cpu_metric!elasticsearch!30!50
  use generic-service
}
Verify that the Nagios configuration is valid:
[root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg
[snip]
Total Warnings: 0
Total Errors: 0
No serious problems were detected during the pre-flight check.
Restart the Nagios server:
[root@node-13 ~]# crm resource restart nagios3
Go to the Nagios Web UI to verify that the service check has been added.
You can define additional service checks for different nodes or node groups using the same check_influx command. To define new service checks, provide the following required arguments:
- A valid InfluxDB query that returns only one row with a single value. See the InfluxDB documentation to learn how to use the InfluxDB query language.
- A range specification for the warning threshold.
- A range specification for the critical threshold.
Note
Threshold ranges are defined following the Nagios format.
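The range syntax is easy to get wrong, so here is a minimal sketch of how a Nagios-style range is interpreted. It follows the rules of the Nagios plugin development guidelines as I understand them (`10`, `10:`, `~:10`, `10:20`, `@10:20`). It is not the code used by `check_influx`, and the function names are illustrative.

```python
import math


def parse_range(spec):
    """Return (low, high, inside) for a Nagios threshold range string."""
    inside = spec.startswith("@")       # "@" inverts the range: alert when inside
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        if low_text == "~":
            low = -math.inf             # "~" means negative infinity
        else:
            low = float(low_text) if low_text else 0.0
        high = float(high_text) if high_text else math.inf
    else:
        low, high = 0.0, float(spec)    # a bare number N means the range 0..N
    if low > high:
        raise ValueError(f"invalid range: {spec!r}")
    return low, high, inside


def alerts(spec, value):
    """Return True if value triggers an alert for the given range."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


print(alerts("30", 42.0))   # above the 0..30 range
print(alerts("30:", 42.0))  # within the 30..infinity range
```

Note the difference between `30`, which alerts when the value rises above 30, and `30:`, which alerts when the value drops below 30.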
Using an external SMTP server with STARTTLS¶
If your SMTP server requires STARTTLS, you must manually adjust the Nagios configuration after the deployment of your environment.
Note
Prior to enabling STARTTLS, configure the SMTP Authentication method parameter in the plugin’s settings to use either Plain, Login or CRAM-MD5.
Log in to the LMA Infrastructure Alerting node.
Open the cmd_notify-service-by-smtp-with-long-service-output.cfg file in the /etc/nagios3/conf.d/ directory for editing.
Add the -S smtp-use-starttls option to the mail command. For example:
define command{
  command_name notify-service-by-smtp-with-long-service-output
  command_line /usr/bin/printf "%b" "***** Nagios *****\n\n"\
    "Notification Type: $NOTIFICATIONTYPE$\n\n"\
    "Service: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\n"\
    "State: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\n"\
    "Additional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$\n" | \
    /usr/bin/mail -s "** $NOTIFICATIONTYPE$ "\
    "Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" \
    -r 'nagios@localhost' \
    -S smtp="smtp://<SMTP_HOST>" \
    -S smtp-auth=<SMTP_AUTH_METHOD> \
    -S smtp-auth-user='<SMTP_USER>' \
    -S smtp-auth-password='<SMTP_PASSWORD>' \
    -S smtp-use-starttls \
    $CONTACTEMAIL$
}
Note
If the server certificate is not present in the standard directory, for example, /etc/ssl/certs on Ubuntu, specify its location by adding the -S ssl-ca-file=<FILE> option.
To disable the verification of the SSL/TLS server certificate altogether, add the -S ssl-verify=ignore option instead.
Verify that the Nagios configuration is correct:
[root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg
Restart the Nagios service:
[root@node-13 ~]# crm resource restart nagios3
Troubleshooting¶
If you cannot access the Nagios web UI, use the following troubleshooting tips.
Verify that the StackLight Collector is able to connect to the Nagios VIP address on port 80.
Verify that the Nagios configuration is valid:
[root@node-13 ~]# nagios3 -v /etc/nagios3/nagios.cfg
[snip]
Total Warnings: 0
Total Errors: 0
No serious problems were detected during the pre-flight check.
Verify that the Nagios server is up and running:
[root@node-13 ~]# crm resource status nagios3
resource nagios3 is NOT running
If Nagios is not running, start it:
[root@node-13 ~]# crm resource start nagios3
Verify that Apache is up and running:
[root@node-13 ~]# crm resource status apache2-nagios
If Apache is not running, start it:
[root@node-13 ~]# crm resource start apache2-nagios
Look for errors in the Nagios /var/nagios/nagios.log log file.
Look for errors in the Apache log files:
/var/log/apache2/nagios_error.log
/var/log/apache2/nagios_wsgi_access.log
/var/log/apache2/nagios_wsgi_error.log
Nagios may report a host or service state as UNKNOWN, for example:
- ‘UNKNOWN: No datapoint have been received ever’
- ‘UNKNOWN: No datapoint have been received over the last X seconds’
Both cases indicate that Nagios does not receive regular passive checks from the StackLight Collector. This may be due to different issues, for example:
- The ‘hekad’ process fails to communicate with Nagios
- The ‘collectd’ and/or ‘hekad’ processes have crashed
- One or several alarm rules are misconfigured
For solutions, see the Troubleshooting section in the StackLight Collector plugin documentation.