Alert Management

Alerts Dashboard, Setup, and Firing Alerts

Alerts Dashboard (Grafana)

The Alerts Dashboard provides a consolidated view of the alerts configured for your infrastructure. This dashboard integrates Grafana with Prometheus and Alert Manager, offering visual insights into the current state of alerts and metrics.

Integrated Grafana Dashboards: The dashboard includes various embedded Grafana panels, each tailored to specific metrics such as cluster health, application resource usage, storage utilization, and scaling activities.
Live Data Feed: Prometheus scrapes metrics from your cluster and evaluates them against the alert rules; Alert Manager then handles the alerts that fire. The dashboard shows this information in real time.
Customizable Views: Users can switch between different Grafana panels to focus on specific alerts or metrics relevant to their current requirements.

How to Use

On the main Scoutflo screen > Open 'Alerts Overview'.
Select the cluster you want to monitor from the drop-down.
Use the embedded Grafana panels to inspect the health and performance of your cluster.
Monitor live alerts triggered by Prometheus and Alert Manager directly on the dashboard.

Alerts Setup with Prometheus and Alert Manager (Cluster Settings Page)

The Alerts Setup page is part of the Cluster Settings. This page provides 15-20 pre-configured base Prometheus alert rule templates tailored for your cluster. Users can customize these rules or create new ones based on their requirements.

Pre-configured Templates: These templates are designed to address common cluster alerting needs. Examples include:
- High CPU usage threshold.
- Memory allocation breaches.
- Storage nearing capacity.
Editable Parameters: Users can modify thresholds, add conditions, or adjust alerting intervals to suit their infrastructure.
Rule Management: A user-friendly interface allows easy management of all alert rules from one place.
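
As a rough illustration of what one of these templates might contain, the sketch below builds a high-CPU rule in the standard Prometheus rule-file shape (groups, rules, alert, expr, for, labels, annotations). The group name, the alert name, the 80% default threshold, the 5m hold time, and the helper build_cpu_rule are assumptions made for this example; the actual Scoutflo templates may differ.

```python
import json


def build_cpu_rule(threshold_percent: float = 80.0, hold_for: str = "5m") -> dict:
    """Return a rule group in the shape of a Prometheus rule file.

    The names, threshold, and expression are illustrative assumptions,
    not the actual Scoutflo template.
    """
    # CPU usage per node, derived from node_exporter's idle CPU counter.
    expr = (
        "100 - (avg by (instance) "
        '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) '
        f"> {threshold_percent:g}"
    )
    return {
        "groups": [
            {
                "name": "example-cluster-alerts",  # hypothetical group name
                "rules": [
                    {
                        "alert": "HighCpuUsage",  # hypothetical alert name
                        "expr": expr,
                        # The condition must hold this long before the alert fires.
                        "for": hold_for,
                        "labels": {"severity": "warning"},
                        "annotations": {
                            "summary": "CPU usage above "
                            f"{threshold_percent:g}% on {{{{ $labels.instance }}}}"
                        },
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Printed as JSON for readability; rule files are normally written as YAML.
    print(json.dumps(build_cpu_rule(threshold_percent=85), indent=2))
```

Changing threshold_percent or hold_for here corresponds to the kind of threshold and interval adjustments described under Editable Parameters.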

How to Use

Go to the 'My Clusters' screen > Click on the cluster you want to edit alerts for.
You will be redirected to the Post Deployment screen of that cluster. From there, navigate to the Cluster Settings page.
Review the list of pre-configured alert rules.
Click the 'Edit' button at the top right, then edit any rule's parameters or conditions.
Click the 'Push' button to apply the changes to the alert rule on your infrastructure.
Use the 'Create' button to create new alerts.

Firing Alerts Page (Cluster Post-Deployment)

The Firing Alerts Page provides real-time visibility into all active alerts triggered by Prometheus and Alert Manager. This page lists alerts that are currently firing due to resource usage exceeding predefined thresholds.

Live Alert Feed: The page dynamically updates as new alerts are fired or resolved.
Alert Details: Each alert entry includes:
- Resource affected (e.g., CPU, Memory, Storage).
- Current value and threshold breach details.
- Timestamp of when the alert was triggered.
Slack Notifications: Users receive immediate Slack notifications for all fired alerts, ensuring timely action.
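
To make the alert details above concrete, here is a minimal sketch of how one alert entry could be turned into a Slack message. Slack incoming webhooks accept a JSON body with a "text" field; everything else here, including the alert field names (name, resource, value, threshold, fired_at) and the helper format_slack_message, is hypothetical and does not describe how Scoutflo builds its notifications.

```python
import json
from datetime import datetime, timezone


def format_slack_message(alert: dict) -> dict:
    """Build a Slack incoming-webhook payload from one alert entry.

    The keys read from `alert` are hypothetical; they mirror the details
    listed for each alert entry, not a real schema.
    """
    # Normalize the trigger time to UTC so every message uses the same zone.
    fired_at = alert["fired_at"].astimezone(timezone.utc)
    text = (
        f"{alert['name']} is firing\n"
        f"Resource: {alert['resource']}\n"
        f"Current value: {alert['value']:.1f}% (threshold {alert['threshold']:.1f}%)\n"
        f"Triggered at: {fired_at.strftime('%Y-%m-%d %H:%M:%S UTC')}"
    )
    return {"text": text}


if __name__ == "__main__":
    example = {
        "name": "HighCpuUsage",  # hypothetical alert name
        "resource": "CPU",
        "value": 91.2,
        "threshold": 80.0,
        "fired_at": datetime(2024, 5, 1, 10, 0, 0, tzinfo=timezone.utc),
    }
    # This only prints the payload; sending it would require a webhook URL.
    print(json.dumps(format_slack_message(example), indent=2))
```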

How to Use

Go to the 'My Clusters' screen > Click on the cluster you want to view alerts for.
You will be redirected to the Post Deployment screen of that cluster. From there, navigate to the 'Alerts' section.
View the list of active alerts, sorted by severity and timestamp.
Monitor the resolution status as alerts are cleared.
Click the 'Graph Link' on an alert to view that metric's data over time.
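
The ordering described in the steps above can be sketched as follows. The alert objects use the labels and startsAt fields found in Alert Manager's alert listings; the severity levels, the choice of newest-first within a severity, and the helper sort_alerts are assumptions for illustration, since the page's exact ordering rules are not documented here.

```python
from datetime import datetime

# Assumed severity levels, most severe first. Unknown severities sort last.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


def sort_alerts(alerts: list[dict]) -> list[dict]:
    """Order alerts by severity (most severe first), then newest first."""

    def sort_key(alert: dict) -> tuple:
        severity = alert["labels"].get("severity", "")
        rank = SEVERITY_RANK.get(severity, len(SEVERITY_RANK))
        # Timestamps are RFC 3339; swap the trailing "Z" for an explicit
        # offset so older Python versions can parse them.
        started = datetime.fromisoformat(alert["startsAt"].replace("Z", "+00:00"))
        return (rank, -started.timestamp())

    return sorted(alerts, key=sort_key)


if __name__ == "__main__":
    sample = [
        {"labels": {"alertname": "StorageNearCapacity", "severity": "warning"},
         "startsAt": "2024-05-01T10:05:00Z"},
        {"labels": {"alertname": "HighCpuUsage", "severity": "critical"},
         "startsAt": "2024-05-01T10:00:00Z"},
        {"labels": {"alertname": "HighMemoryUsage", "severity": "critical"},
         "startsAt": "2024-05-01T10:10:00Z"},
    ]
    for alert in sort_alerts(sample):
        print(alert["labels"]["severity"], alert["labels"]["alertname"])
```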

Alerting Workflow:
1. Prometheus scrapes metrics from your cluster.
2. Prometheus evaluates these metrics against the defined alert rules and passes firing alerts to Alert Manager.
3. Alerts are visualized on the Alerts Dashboard and the Firing Alerts Page.
4. Notifications are sent via Slack and displayed in Grafana.

Integration Points:
- Grafana Dashboards: For visual monitoring.
- Slack: For instant alert notifications.
- Cluster Settings: For rule configuration and customization.
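
The alerting workflow can be illustrated with a small simulation of how a threshold rule moves through the Prometheus alert states (inactive, pending, firing). This is a simplified sketch: real rules use a time duration for the hold period rather than a sample count, and the Rule class, the evaluate function, and the sample values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    threshold: float  # the rule is breached when a sample exceeds this value
    for_samples: int  # consecutive breaching samples required before firing


def evaluate(rule: Rule, samples: list[float]) -> list[str]:
    """Return the alert state after each scraped sample."""
    states = []
    breaches = 0
    for value in samples:
        # Count consecutive breaches; any sample within bounds resets the count.
        breaches = breaches + 1 if value > rule.threshold else 0
        if breaches == 0:
            states.append("inactive")
        elif breaches < rule.for_samples:
            states.append("pending")
        else:
            states.append("firing")
    return states


if __name__ == "__main__":
    cpu_rule = Rule(name="HighCpuUsage", threshold=80.0, for_samples=3)
    cpu_samples = [70.0, 85.0, 90.0, 92.0, 60.0]
    for value, state in zip(cpu_samples, evaluate(cpu_rule, cpu_samples)):
        print(f"{value:5.1f}% -> {state}")
```

Only samples that reach the firing state would show up on the Firing Alerts Page or produce a notification; the pending state is why a brief spike does not immediately raise an alert.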

This feature ensures comprehensive monitoring and alerting for your Kubernetes clusters, empowering users to maintain infrastructure health and respond quickly to issues.

