HeadGym
Tags: #monitoring #metrics #observability #logs

Monitoring

The Monitoring module provides comprehensive metrics and observability for your clusters and instances.

Overview

The Monitoring page displays real-time metrics, historical trends, and recent logs to help you understand system performance.

Key Metrics

Average CPU Utilization

Average percentage of CPU capacity in use across all clusters, with a trend indicator showing the change from the previous period.

Average Memory Utilization

Average percentage of memory in use across all clusters, with a trend indicator showing the change from the previous period.
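This page does not document how the averages or the trend indicator are computed. The following is a minimal sketch under two assumptions: each cluster contributes one utilization percentage for the period, and every cluster is weighted equally. The trend is shown as the change from the previous period in percentage points. `average_utilization`, `trend`, and the cluster names are hypothetical. The same calculation applies to both CPU and memory.

```python
def average_utilization(per_cluster: dict[str, float]) -> float:
    """Mean utilization (%) across clusters, each cluster weighted equally.

    Assumption: one utilization value per cluster for the period.
    """
    if not per_cluster:
        return 0.0
    return sum(per_cluster.values()) / len(per_cluster)


def trend(current: float, previous: float) -> float:
    """Change from the previous period in percentage points (positive = rising)."""
    return current - previous


# Hypothetical per-cluster CPU utilization for the current and previous periods.
current = average_utilization({"prod-a": 72.0, "prod-b": 48.0, "staging": 30.0})
previous = average_utilization({"prod-a": 65.0, "prod-b": 45.0, "staging": 31.0})
print(f"Average CPU: {current:.1f}% ({trend(current, previous):+.1f} pts)")
# Average CPU: 50.0% (+3.0 pts)
```

With an equal-weight mean, a cluster that reports samples more often does not carry more weight than one that reports less often. If the platform averages raw samples instead, the figures could differ.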

Network I/O

Total network throughput showing:

  • Inbound - Incoming data rate
  • Outbound - Outgoing data rate
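This page does not describe how the platform derives these rates. One common approach, assumed in this sketch, reads cumulative byte counters at two points in time and divides the difference by the elapsed seconds. `CounterSample`, `throughput_mb_s`, and the decimal megabyte (1 MB = 1,000,000 bytes) are assumptions made for illustration.

```python
from dataclasses import dataclass

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


@dataclass
class CounterSample:
    timestamp_s: float  # seconds since epoch
    bytes_in: int       # cumulative bytes received
    bytes_out: int      # cumulative bytes sent


def throughput_mb_s(prev: CounterSample, curr: CounterSample) -> tuple[float, float]:
    """Return (inbound, outbound) rates in MB/s between two counter readings.

    Counter resets (e.g. after a restart) are not handled in this sketch.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    inbound = (curr.bytes_in - prev.bytes_in) / elapsed / BYTES_PER_MB
    outbound = (curr.bytes_out - prev.bytes_out) / elapsed / BYTES_PER_MB
    return inbound, outbound


a = CounterSample(0.0, 0, 0)
b = CounterSample(10.0, 50_000_000, 20_000_000)
print(throughput_mb_s(a, b))  # (5.0, 2.0)
```

Adding up the per-cluster inbound and outbound rates would give an overall total across clusters.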

Error Rate

Percentage of requests or operations that resulted in errors. Lower is better.
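The error rate is the number of errors divided by the total number of requests or operations, multiplied by 100. This page does not say whether the cross-cluster figure pools the raw counts or averages the per-cluster percentages. The sketch below pools the counts, and both helper names are hypothetical.

```python
def error_rate(errors: int, total: int) -> float:
    """Percentage of requests or operations that resulted in errors."""
    if total == 0:
        return 0.0  # no traffic in the period: report 0% instead of dividing by zero
    return 100.0 * errors / total


def pooled_error_rate(counts: list[tuple[int, int]]) -> float:
    """Error rate across several clusters from (errors, total) pairs.

    Sums the counts before dividing, so busy clusters weigh more than idle ones.
    """
    errors = sum(e for e, _ in counts)
    total = sum(t for _, t in counts)
    return error_rate(errors, total)


print(error_rate(12, 4_800))                          # 0.25
print(pooled_error_rate([(10, 1_000), (0, 9_000)]))   # 0.1
```

Averaging the two per-cluster percentages in the second example (1.0% and 0.0%) would give 0.5%. That overstates the rate, because almost all of the traffic went to the cluster with no errors.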

Utilization Over Time Chart

The main chart displays CPU and memory utilization over time. You can select different time ranges:

  • Last 1 hour
  • Last 6 hours
  • Last 24 hours (default)
  • Last 7 days
  • Last 30 days
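Choosing a time range amounts to keeping only the samples that fall inside a trailing window ending now. The following minimal sketch maps each option to a `timedelta`. The range keys such as `"24h"` and the `in_range` helper are hypothetical and are not taken from the platform.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical keys for the time-range options listed above.
TIME_RANGES = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),  # the page's default range
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def in_range(samples, range_key="24h", now=None):
    """Keep (timestamp, value) samples that fall inside the selected window."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - TIME_RANGES[range_key]
    return [(ts, v) for ts, v in samples if start <= ts <= now]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
samples = [(now - timedelta(hours=h), 50.0) for h in (0, 2, 12, 30)]
print(len(in_range(samples, "6h", now)))   # 2
print(len(in_range(samples, "24h", now)))  # 3
```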

The chart shows:

  • Blue bars - CPU utilization
  • Violet bars - Memory utilization
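This page does not explain how samples are grouped into bars. One plausible approach, assumed here, splits the selected range into fixed-width buckets and averages CPU and memory within each bucket, so each bucket becomes one pair of bars. `bucket_averages` and the one-hour bucket width are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_averages(samples, start, bucket):
    """Group (timestamp, cpu, mem) samples into fixed-width buckets from `start`.

    Returns [(bucket_start, avg_cpu, avg_mem)] sorted by time; empty buckets are omitted.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # bucket index -> [cpu sum, mem sum, count]
    for ts, cpu, mem in samples:
        if ts < start:
            continue  # ignore samples before the selected range
        idx = int((ts - start) / bucket)  # timedelta / timedelta -> float
        acc = sums[idx]
        acc[0] += cpu
        acc[1] += mem
        acc[2] += 1
    return [
        (start + idx * bucket, c / n, m / n)
        for idx, (c, m, n) in sorted(sums.items())
    ]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
samples = [
    (start + timedelta(minutes=m), cpu, mem)
    for m, cpu, mem in [(0, 40.0, 60.0), (30, 60.0, 70.0), (70, 20.0, 50.0)]
]
bars = bucket_averages(samples, start, timedelta(hours=1))
# First hour: CPU 50.0, memory 65.0; second hour: CPU 20.0, memory 50.0
```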

Recent Logs

The Recent Logs panel shows the latest log entries from the platform. Each entry includes:

  • Timestamp - When the log was generated
  • Level - Log severity (info, warn, error)
  • Message - The log content
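If you collect these entries for your own analysis, one way to represent them is a small record type with the three fields listed above. The following sketch filters entries by minimum severity and puts the newest first. `LogEntry`, `at_least`, and the severity ordering info < warn < error are assumptions, and the messages are placeholders rather than real platform logs.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed ordering of the three levels shown in the panel.
SEVERITY = {"info": 0, "warn": 1, "error": 2}


@dataclass
class LogEntry:
    timestamp: datetime
    level: str    # one of "info", "warn", "error"
    message: str


def at_least(entries, min_level="warn"):
    """Return entries at or above `min_level`, newest first."""
    threshold = SEVERITY[min_level]
    kept = [e for e in entries if SEVERITY[e.level] >= threshold]
    return sorted(kept, key=lambda e: e.timestamp, reverse=True)


entries = [
    LogEntry(datetime(2024, 1, 1, 9, 0), "info", "example info message"),
    LogEntry(datetime(2024, 1, 1, 9, 5), "warn", "example warning message"),
    LogEntry(datetime(2024, 1, 1, 9, 7), "error", "example error message"),
]
for e in at_least(entries, "warn"):
    print(e.timestamp.isoformat(), e.level.upper(), e.message)
```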

Cluster Metrics Table

Detailed metrics for each cluster:

| Column | Description |
|---|---|
| Cluster | Cluster name |
| CPU | CPU utilization (%) |
| Memory | Memory utilization (%) |
| Network In | Incoming network traffic (MB/s) |
| Network Out | Outgoing network traffic (MB/s) |
| Error Rate | Percentage of requests or operations that resulted in errors |
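Each row of the table can be modeled as one record with a field per column. Sorting the records by a single field then shows which clusters are under the most pressure. `ClusterRow`, `top_by`, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterRow:
    cluster: str
    cpu_pct: float
    memory_pct: float
    network_in_mb_s: float
    network_out_mb_s: float
    error_rate_pct: float


def top_by(rows, field, n=3):
    """Return the n rows with the highest value for `field`."""
    return sorted(rows, key=lambda r: getattr(r, field), reverse=True)[:n]


rows = [
    ClusterRow("prod-a", 82.0, 71.0, 12.5, 9.1, 0.4),
    ClusterRow("prod-b", 45.0, 88.0, 3.2, 2.0, 1.9),
    ClusterRow("staging", 20.0, 35.0, 0.8, 0.5, 0.0),
]
print([r.cluster for r in top_by(rows, "memory_pct", n=2)])  # ['prod-b', 'prod-a']
```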

Filtering Clusters

Use the filter input to search for specific clusters by name.
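This page does not specify the matching rules for the filter. The sketch below assumes a case-insensitive substring match on the cluster name, and `filter_clusters` is a hypothetical helper.

```python
def filter_clusters(names, query):
    """Return cluster names containing `query`, ignoring case (assumed behavior)."""
    q = query.strip().lower()
    if not q:
        return list(names)  # an empty filter shows every cluster
    return [n for n in names if q in n.lower()]


print(filter_clusters(["prod-a", "Prod-B", "staging"], "prod"))  # ['prod-a', 'Prod-B']
```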

Exporting Metrics

Click the Export button to download metrics data in a standard format for external analysis.

Using Metrics for Troubleshooting

High CPU or memory utilization may indicate:

  • Increased workload demand
  • Memory leaks
  • Inefficient resource allocation
  • Application issues
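To turn these symptoms into checks you can run over a series of utilization samples, the following sketch implements two simple heuristics. The first flags utilization that stays above a threshold for several consecutive samples. The second fits a least-squares slope, because a steady rise in memory while the workload is flat may point to a leak. The 85% threshold, the five-sample run length, and both function names are hypothetical, and neither check confirms a root cause on its own.

```python
def sustained_above(values, threshold=85.0, min_consecutive=5):
    """True if `values` stays above `threshold` for at least `min_consecutive` samples in a row."""
    run = 0
    for v in values:
        run = run + 1 if v > threshold else 0
        if run >= min_consecutive:
            return True
    return False


def slope_per_sample(values):
    """Least-squares slope of values against sample index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    # Because sum(i - x_mean) == 0, the y-mean term drops out of the numerator.
    num = sum((i - x_mean) * y for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


cpu = [70, 88, 90, 91, 89, 92, 87]
memory = [40, 42, 45, 47, 50, 52, 55]
print(sustained_above(cpu))       # True
print(slope_per_sample(memory))   # 2.5 (percentage points per sample)
```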

Use the metrics in conjunction with:

  • Alerts - Set up proactive notifications
  • Events - Check which platform events coincide with metric changes
  • Instances - Drill down to specific instances