Node Inspection Service
Aptos nodes collect metrics and system information while running. These metrics provide a way to track, monitor and inspect the health and performance of the node dynamically, at runtime. Node metrics and system information can be queried or exported via an inspection service that runs on each node.
You can configure various aspects of the node inspection service. This document describes how to expose and view the metrics locally, on the node itself. You can also view these metrics remotely by making the port publicly accessible via firewall rules. Generally, validator nodes do not expose these metrics, for security reasons, whereas fullnodes do so that the health checker can verify them.
If you do make the inspection service port publicly accessible on your validator node, we recommend disabling that access when not in use.
Examining node metrics
If you'd like to examine the metrics of your node (validator or fullnode), start running a node and review the inspection service locally by loading this URL in your browser:
http://localhost:9101/metrics
This will display the values of all the metrics and counters of your node at the time you queried it. To see updates to these values, simply refresh the page.
Likewise, if you wish to view the metrics in JSON format, visit the following URL:
http://localhost:9101/json_metrics
See additional configuration details below.
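The same endpoint can also be read from a script instead of a browser. The following is a minimal sketch, assuming the `/metrics` endpoint returns the Prometheus text format (one `name{labels} value` sample per line, without trailing timestamps), which this document does not state explicitly. The helper names `parse_metrics` and `fetch_metrics` are hypothetical and chosen only for illustration.

```python
import urllib.request


def parse_metrics(text):
    """Parse metrics text into a {series: value} dictionary.

    Assumption: each sample line looks like `name{labels} value`,
    with no trailing timestamp. Comment lines start with '#'.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line;
        # everything before it is the series name, including any labels.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            # Skip lines that do not end in a number.
            continue
    return samples


def fetch_metrics(url="http://localhost:9101/metrics", timeout=5.0):
    """Fetch and parse the metrics of a locally running node."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` on a machine with a running node should return a dictionary keyed by series name, for example `aptos_state_sync_version{type="synced"}`. Calling it again later gives a second sample to compare against.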
Change inspection service port
The inspection service should run on all nodes by default, at port 9101. To change the port the inspection service listens on (e.g., to 1000), add the following to your node configuration file:
inspection_service:
  port: 1000
Expose system configuration
The inspection service also provides a way to examine the configuration of your node at runtime (i.e., the configuration settings that your node started with).
By default, the configuration endpoint is disabled as it may expose potentially sensitive information about the configuration of your node, e.g., file paths and directories. We recommend enabling this endpoint only if the inspection service is not publicly accessible.
To enable this feature, add the following to your node configuration file:
inspection_service:
  expose_configuration: true
And visit the configuration URL:
http://localhost:9101/configuration
Expose system information
The inspection service also provides a way to examine the system information of your node at runtime (i.e., build and hardware information). Simply visit the following URL:
http://localhost:9101/system_information
If you'd like to disable this endpoint, add the following to your node configuration file:
inspection_service:
  expose_system_information: false
The system information displayed here is not guaranteed to be 100% accurate due to limitations in the way this information is collected. We recommend treating the information as an estimate and not worrying about minor inaccuracies.
Understand node metrics
When you visit the metrics endpoint, you will notice that there are a large number of metrics and counters being produced by your node. Most of these metrics and counters are useful only for blockchain development and diagnosing hard-to-find issues. As a result, we recommend that node operators ignore most metrics and pay attention to only the key metrics presented below:
As Aptos continues to grow and develop the blockchain software, many metrics will come and go. As a result, we recommend relying on the presence of only the metrics explicitly mentioned below. All other metrics should be considered unstable and may be changed/removed without warning.
Consensus
If you are running a validator node, the following consensus metrics are important:
- aptos_consensus_proposals_count: Counts the number of times the node sent a block proposal to the network. The count will increase only when the validator is chosen to be a proposer, which depends on the node's stake and leader election reputation. You should expect this metric to increase at least once per hour.
- aptos_consensus_last_committed_round: Tracks the last committed round of the node. During consensus, we expect this value to increase once per consensus round, which should be multiple times per second. If this does not happen, it is likely the node is not participating in consensus.
- aptos_consensus_timeout_count: Counts the number of times the node locally timed out while trying to participate in consensus. If this counter increases, it is likely the node is not participating in consensus and may be having issues, e.g., network difficulties.
- aptos_state_sync_executing_component_counters{label="consensus"}: This counter increases a few times per second as long as the node is participating in consensus. When this counter stops increasing, it means the node is not participating in consensus, and has likely fallen back to state synchronization (e.g., because it fell behind the rest of the validators and needs to catch up).
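The consensus checks described above amount to comparing two samples taken some time apart. The following is a rough sketch, assuming each sample is a plain dictionary mapping a series name to its value and that the two consensus metrics appear without extra labels. The function name `consensus_warnings` is hypothetical.

```python
ROUND_KEY = "aptos_consensus_last_committed_round"
TIMEOUT_KEY = "aptos_consensus_timeout_count"


def consensus_warnings(previous, current):
    """Compare two metric samples and return a list of warning strings.

    Assumption: `previous` and `current` are {series: value} dictionaries
    sampled at least a few seconds apart on a validator node.
    """
    warnings = []
    # The committed round should advance several times per second.
    if current.get(ROUND_KEY, 0.0) <= previous.get(ROUND_KEY, 0.0):
        warnings.append("last committed round did not advance")
    # Any growth in local timeouts suggests trouble participating.
    if current.get(TIMEOUT_KEY, 0.0) > previous.get(TIMEOUT_KEY, 0.0):
        warnings.append("consensus timeouts increased")
    return warnings
```

An empty list means neither condition was observed between the two samples. It does not by itself prove that the node is healthy.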
State sync
If you are running a fullnode (or a validator that still needs to synchronize to the latest blockchain state), the following state sync metrics are important:
1. aptos_state_sync_version{type="synced"}: This metric displays the current synced version of the node, i.e., the number of transactions the node has processed. If this metric stops increasing, it means the node is not syncing. Likewise, if this metric doesn't increase faster than the rate at which new transactions are committed to the blockchain, it means the node is unlikely to get and stay up-to-date with the rest of the network. Note: if you've selected to use fast sync, this metric won't increase until all states have been downloaded, which may take some time. See (3) below.
2. aptos_data_client_highest_advertised_data{data_type="transactions"}: This metric displays the highest version synced and advertised by the peers that your node is connected to. As a result, when this metric is higher than aptos_state_sync_version{type="synced"} (above) it means your node can see new blockchain data and will sync the data from its peers.
3. aptos_state_sync_version{type="synced_states"}: This metric counts the number of states that have been downloaded while a node is fast syncing. If this metric doesn't increase, and aptos_state_sync_version{type="synced"} doesn't increase (from above), it means the node is not syncing at all and an issue has likely occurred.
4. aptos_state_sync_bootstrapper_errors and aptos_state_sync_continuous_syncer_errors: If your node is facing issues syncing (or is seeing transient failures), these metrics will increase each time an error occurs. The error_label inside these metrics will display the error type.
Compare the synced version shown by aptos_state_sync_version{type="synced"} with the highest version shown on the Aptos Explorer to see how far behind the latest blockchain version your node is. Remember to select the correct network that your node is syncing to (e.g., mainnet).
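The state sync comparisons above can be expressed as simple arithmetic on two samples. The sketch below assumes each sample is a dictionary keyed by the exact series strings shown in this section. A real node may attach additional labels, in which case the keys would need adjusting. The function names are hypothetical.

```python
SYNCED = 'aptos_state_sync_version{type="synced"}'
SYNCED_STATES = 'aptos_state_sync_version{type="synced_states"}'
ADVERTISED = 'aptos_data_client_highest_advertised_data{data_type="transactions"}'


def versions_behind(sample):
    """Return how far the synced version trails the highest advertised one.

    Returns None when either metric is missing from the sample.
    """
    synced = sample.get(SYNCED)
    advertised = sample.get(ADVERTISED)
    if synced is None or advertised is None:
        return None
    return max(0, int(advertised) - int(synced))


def is_syncing(previous, current):
    """True when the synced version or the fast-sync state count advanced."""
    return any(
        current.get(key, 0.0) > previous.get(key, 0.0)
        for key in (SYNCED, SYNCED_STATES)
    )
```

Note that `versions_behind` measures the gap to the node's peers, not to the network as a whole, so comparing against the Aptos Explorer remains a useful cross-check.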
Networking
The following network metrics are important, for both validators and fullnodes:
- aptos_connections{direction="inbound"} and aptos_connections{direction="outbound"}: These metrics count the number of peers your node is connected to, as well as the direction of the network connection. An inbound connection means that a peer (e.g., another fullnode) has connected to you. An outbound connection means that your node has connected to another node (e.g., connected to a validator fullnode).
  - If your node is a validator, the sum of both inbound and outbound connections should be equal to the number of other validators in the network. Note that only the sum of these connections matters. It does not matter whether all connections are inbound or all are outbound.
  - If your node is a fullnode, the number of outbound connections should be > 0. This will ensure your node is able to synchronize. Note that the number of inbound connections matters only if you want to act as a seed in the network and allow other nodes to connect to you, as discussed in Fullnode Network Connections.
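The two connection rules above can be captured in a small check. This is an illustrative sketch only: the function name, the `role` strings, and the `other_validators` parameter are assumptions introduced here, and the number of other validators has to come from somewhere outside the metrics endpoint.

```python
def connections_ok(role, inbound, outbound, other_validators=None):
    """Apply the connection rules for a validator or a fullnode.

    role: "validator" or "fullnode" (hypothetical labels).
    inbound / outbound: values of the two aptos_connections series.
    other_validators: number of other validators in the network,
        required only when role is "validator".
    """
    if role == "validator":
        if other_validators is None:
            raise ValueError("other_validators is required for a validator")
        # Only the sum matters, not the split between directions.
        return inbound + outbound == other_validators
    if role == "fullnode":
        # At least one outbound connection is needed to synchronize.
        return outbound > 0
    raise ValueError(f"unknown role: {role!r}")
```

For example, `connections_ok("fullnode", inbound=0, outbound=3)` evaluates to `True`, because inbound connections are not required for a fullnode to synchronize.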
Mempool
The following mempool metrics are important:
- core_mempool_index_size{index="system_ttl"}: This metric displays the number of transactions currently sitting in the mempool of the node and waiting to be committed to the blockchain:
  - If your node is a fullnode, it's highly unlikely that this metric will be > 0, unless transactions are actively being sent to your node via the REST API and/or other fullnodes that have connected to you. Most fullnode operators should ignore this metric.
  - If your node is a validator, you can use this metric to see if transactions from your node's mempool are being included in the blockchain (e.g., if the count decreases). Likewise, if this metric only increases, it means that either: (i) your node is unable to forward transactions to other validators to be included in the blockchain; or (ii) that the entire blockchain is under heavy load and may soon become congested.
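The "only increases" condition for a validator's mempool can be tested over a short series of readings. The sketch below assumes you have already collected several values of the mempool size metric at regular intervals. The function name is hypothetical, and the number of readings to collect is a judgment call this document does not specify.

```python
def only_increases(readings):
    """True when every reading is strictly larger than the one before it.

    readings: mempool size values in the order they were sampled.
    Fewer than two readings cannot show a trend, so the result is False.
    """
    if len(readings) < 2:
        return False
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))
```

A `True` result over a long enough window matches the warning sign described above. A single dip in the series means transactions were leaving the mempool at some point.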
REST API
The following REST API metrics are important:
- aptos_api_requests_count{method="GET"} and aptos_api_requests_count{method="POST"}: These metrics count the number of REST API GET and POST requests that have been received via the node's REST API. This allows you to monitor and track the amount of REST API traffic on your node. You can also further use the operation_id in the metric to monitor the types of operations the requests are performing.
- aptos_api_response_status_count: This metric counts the number of response types that were sent for the REST API. For example, aptos_api_response_status_count{status="200"} counts the number of requests that were successfully handled with a 200 response code. You can use this metric to track the success and failure rate of the REST API traffic.
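A success rate can be derived from the status counters with a few lines of code. The following is a sketch, assuming a sample is a dictionary mapping series names to values and that each status series carries a three-digit `status` label as in the example above. Treating every 2xx code as a success is a choice made here, not something this document prescribes.

```python
import re

# Matches series such as aptos_api_response_status_count{status="200"},
# tolerating other labels before or after the status label.
STATUS_PATTERN = re.compile(
    r'^aptos_api_response_status_count\{.*status="(\d{3})".*\}$'
)


def success_rate(sample):
    """Return the fraction of responses with a 2xx status, or None if no data."""
    total = 0.0
    successful = 0.0
    for series, value in sample.items():
        match = STATUS_PATTERN.match(series)
        if match is None:
            continue
        total += value
        if match.group(1).startswith("2"):
            successful += value
    return successful / total if total else None
```

Because these counters accumulate from the moment the node starts, the result is a lifetime rate. To measure a recent window, subtract an earlier sample's counts from a later one before computing the ratio.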