Overview

ACCESS Operations uses Nagios to monitor services that are accessible over a network port and to verify that they are operational. When a problem is detected, an e-mail is sent to the relevant parties, or a notice can be posted to an ACCESS Slack channel.
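
To illustrate what a port-level check involves, here is a minimal Python sketch that attempts one TCP connection and reports a Nagios-style status. It is not the check ACCESS Operations actually runs. The function name check_tcp_port and the default timeout are hypothetical, chosen only for this example. The status codes follow the general Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import socket
from typing import Tuple

# Nagios plugin exit-code convention: 0 = OK, 2 = CRITICAL.
OK = 0
CRITICAL = 2


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> Tuple[int, str]:
    """Try one TCP connection and return a (status_code, message) pair.

    Hypothetical helper for illustration; the timeout default is an assumption.
    """
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the context manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepted a TCP connection"
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


# Example use (placeholder host name, not an ACCESS service):
#   status, message = check_tcp_port("service.example.org", 443)
#   print(message)
```

A real Nagios plugin would print the message and exit with the status code, which Nagios then uses to decide whether to send a notification.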

Requesting Access

Operators of ACCESS services can request that Nagios monitor their service by opening an Integration and Operations Request at https://operations.access-ci.org/help and selecting the Issue Type "Operations: Tools - ...".

Operators of ACCESS services may also request access to the Nagios web interface to view service monitoring results by opening an Integration and Operations Request at https://operations.access-ci.org/help and selecting the Issue Type "Operations: Tools - ...".

Usage Instructions

The ACCESS Monitoring Nagios web interface is available at https://monitor.access-ci.org/. The configuration and infrastructure files are managed in GitHub using Terraform and Ansible, and technical information for developers and system administrators is kept in the Nagios-related GitHub repositories. For any monitoring request, follow the steps listed above to open an Integration and Operations Request.

Log in through the web interface using your ACCESS CI (XSEDE) identity.

Once you are logged in, the Hosts and Services tabs on the left-hand side are the most important, as that is where the host and service checks live.

Monitored Hosts

This section covers all current hosts with associated services being monitored by Nagios. Some hosts are routinely shut down or are temporary, depending on whether the host is a production server or a development server that runs only for specific development tasks. Otherwise, the status, alerts, and related details are viewable in the web interface mentioned above.