Chapter 2. Software Monitoring

Table of Contents

System Log File Monitoring
SNMP and JMX Monitoring
Monitoring Metrics for Storage Nodes
Monitoring Metrics for Replication Nodes
Monitoring Metrics for Administration Nodes

As a distributed system, the Oracle NoSQL Database is composed of several software components, each of which exposes unique metrics that can be monitored, interpreted, and used to understand the general health, performance, and operational capability of the NoSQL Database cluster.

This section describes best practices for monitoring the Oracle NoSQL Database software components. Although the Oracle NoSQL Database itself has several software dependencies (for example, the Java Virtual Machine, the operating system, and NTP), this section focuses solely on the NoSQL components.

There are three basic mechanisms for monitoring the health of the NoSQL Database:

The following sections discuss each of these monitoring techniques in detail and illustrate how each can be used to detect failures in NoSQL Database components.

System Log File Monitoring

The Oracle NoSQL Database is composed of the following components, and each component produces log files that can be monitored:

  • Replication Nodes – Service read and write requests from API calls. Replication nodes for a particular shard are laid out on different storage nodes (physical servers) by the topology manager, so the log files for the nodes in each shard are spread across multiple machines.

  • Storage Node Agents – Manage the replication nodes that are running on each storage node. The Storage Node Agent (SNA) maintains its own log recording the state of each replication node it manages. You can think of the SNA log as a high-level log of replication node activity on a particular storage node.

  • Admin Nodes – Administrative nodes execute commands issued from the administrative command line interface. Long-running plans are also staged from the administrative nodes. Administrative nodes also maintain a consolidated log of all the other logs in the Oracle NoSQL cluster.

All of these log files are located in the KVROOT/kvstore/log directory on the machine where the component is running. Use the following steps to find the machines that run the components of the cluster:

  1. Run the ping command against any machine in the cluster: java -jar kvstore.jar ping -host <any machine in the cluster> -port <the port number used to initialize the KVStore>

  2. The ping output lists each storage node (snXX) along with the replication nodes (rgXX-rnXX) running on that host. XX denotes the unique number that NoSQL Database assigns to each component. For replication nodes, rg stands for replication group and denotes the shard number, while rn denotes the replication node number within that shard.

  3. Admin Nodes – Identifying the nodes in the cluster that run administrative services is more challenging. One approach is a script that runs ps axww on every host in the cluster and searches the output for processes whose command lines contain both kvstore.jar and -class Admin.
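The steps above can be sketched in Python as follows. This is a minimal, hypothetical helper rather than part of the product: it assumes only that component identifiers appear in the ping output in the snXX and rgXX-rnXX forms described above, and that Admin processes can be recognized by kvstore.jar and -class Admin on their command lines. All function names are illustrative.

```python
import re
import subprocess

# Identifier forms described in the text: storage nodes are snXX and
# replication nodes are rgXX-rnXX (rg = replication group, i.e. shard; rn = node).
SN_PATTERN = re.compile(r"\bsn(\d+)\b")
RN_PATTERN = re.compile(r"\brg(\d+)-rn(\d+)\b")


def parse_component_ids(ping_output):
    """Return (storage node ids, replication node ids) found in ping output."""
    storage_nodes = {f"sn{m.group(1)}" for m in SN_PATTERN.finditer(ping_output)}
    replication_nodes = {
        f"rg{m.group(1)}-rn{m.group(2)}" for m in RN_PATTERN.finditer(ping_output)
    }
    return storage_nodes, replication_nodes


def group_by_shard(replication_nodes):
    """Map shard number -> sorted replication node numbers within that shard."""
    shards = {}
    for rn_id in replication_nodes:
        m = RN_PATTERN.fullmatch(rn_id)
        shards.setdefault(int(m.group(1)), []).append(int(m.group(2)))
    return {shard: sorted(nodes) for shard, nodes in shards.items()}


def find_admin_processes(ps_lines):
    """Keep process lines that mention both kvstore.jar and -class Admin."""
    return [
        line for line in ps_lines if "kvstore.jar" in line and "-class Admin" in line
    ]


def local_admin_processes():
    """Run `ps axww` on this host only; a cluster-wide script repeats this per host."""
    result = subprocess.run(["ps", "axww"], capture_output=True, text=True, check=True)
    return find_admin_processes(result.stdout.splitlines())
```

A monitoring script would feed the captured ping output to parse_component_ids to learn which log files to expect on each host, and would run local_admin_processes (for example over a remote shell) on every host to locate the Admin services.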

The Oracle NoSQL Database maintains a single consolidated log of every node in the cluster, which can be found on any node running an administrative service. Although this is a convenient single place to monitor for errors, it is not guaranteed to be complete. The consolidated view is assembled from log messages sent over the network, so transient network failures, packet loss, and high network utilization can leave it out of date or missing entries. Therefore, we recommend monitoring every host in the cluster, and every type of log file on each host.
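One way to act on this recommendation is to enumerate, on each host, the log files for every component type. The Python sketch below is hypothetical: it assumes that log file names begin with the component identifier (for example rg1-rn1_0.log, sn1_0.log, admin1_0.log), and it builds the log directory from KVROOT following the KVROOT/kvstore/log layout given earlier, with the kvstore part exposed as a parameter in case your installation differs. Check the assumed naming against an actual log directory before relying on it.

```python
import re
from collections import defaultdict
from pathlib import Path

# ASSUMPTION (hypothetical naming): each log file name starts with the
# component id followed by an underscore, e.g. "rg1-rn1_0.log".
COMPONENT_PATTERNS = [
    ("replication_node", re.compile(r"rg\d+-rn\d+_")),
    ("storage_node_agent", re.compile(r"sn\d+_")),
    ("admin", re.compile(r"admin\d+_")),
]


def log_dir(kvroot, store_dir="kvstore"):
    """Build the log directory path KVROOT/<store_dir>/log."""
    return Path(kvroot) / store_dir / "log"


def classify_log_names(names):
    """Group *.log file names by component type; unmatched names go to 'other'."""
    groups = defaultdict(list)
    for name in sorted(names):
        if not name.endswith(".log"):
            continue
        kind = next(
            (k for k, pattern in COMPONENT_PATTERNS if pattern.match(name)), "other"
        )
        groups[kind].append(name)
    return dict(groups)


def classify_host_logs(kvroot, store_dir="kvstore"):
    """Classify the log files currently present in this host's log directory."""
    directory = log_dir(kvroot, store_dir)
    return classify_log_names(p.name for p in directory.iterdir() if p.is_file())
```

Running classify_host_logs on every host gives a per-host inventory of replication node, SNA, and Admin logs, so a monitor can alert when an expected log type is missing rather than relying only on the consolidated log.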

Generally speaking, any log message with a level of SEVERE should be considered a potentially critical event and worthy of generating a systems management notification. Later sections of this document illustrate how to correlate specific SEVERE exceptions with hardware component failures.
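A minimal SEVERE-level watcher might look like the following Python sketch. It assumes that each log record line contains the level name SEVERE as a separate word; that assumption and the function names are illustrative, not taken from the product. It remembers a byte offset per file so that repeated runs report only newly appended entries, and it restarts from the beginning if a file has shrunk (for example after rotation).

```python
import os
import re

# ASSUMPTION: the java.util.logging level name appears as a separate word in
# each log line, e.g. "... SEVERE [rg1-rn1] ...". Adjust if your format differs.
SEVERE_PATTERN = re.compile(r"\bSEVERE\b")


def severe_lines(lines):
    """Return the lines (without trailing newlines) that carry the SEVERE level."""
    return [line.rstrip("\n") for line in lines if SEVERE_PATTERN.search(line)]


def scan_new_severe(path, offset=0):
    """Return (new SEVERE lines, next offset) for data appended after `offset`.

    Only complete lines are consumed; a partially written last line is left
    for the next call. If the file is now shorter than `offset` (for example
    after log rotation), scanning restarts from the beginning of the file.
    """
    if offset > os.path.getsize(path):
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available yet
    text = data[:end].decode("utf-8", errors="replace")
    return severe_lines(text.splitlines()), offset + end
```

A scheduler would call scan_new_severe periodically for each log file on each host, store the returned offset, and forward any returned lines to the systems management tool as notifications.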

SNMP and JMX Monitoring

Oracle NoSQL Database can also be monitored through SNMP- or JMX-based system management tools. For SNMP-based tools, the Oracle NoSQL MIB is located in the lib directory of the installation, alongside the product's JAR files.