Event Prioritization and Alerting
- most logging systems categorize each event by severity
Syslog severity levels:
| Code | Level | Interpretation |
|---|---|---|
| 0 | Emergency | The system is unusable (e.g., kernel panic). |
| 1 | Alert | A fault requiring immediate remediation has occurred. |
| 2 | Critical | A fault that will require immediate remediation is likely to develop. |
| 3 | Error | A nonurgent fault has developed. |
| 4 | Warning | A nonurgent fault is likely to develop. |
| 5 | Notice | A state that could potentially lead to an error condition has developed. |
| 6 | Informational | A normal but reportable event has occurred. |
| 7 | Debug | Verbose status conditions used during development and testing. |
Logging level is the threshold for storing or forwarding an event message based on its severity index or value.
- aka severity level
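- determines the maximum severity code that is recorded or forwarded: events whose code is at or below the configured level are kept (e.g., a level of 4 keeps codes 0 through 4 and drops Notice, Informational, and Debug)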
- configured on each host
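
The threshold behavior can be sketched in a few lines of Python. This is an illustration only, not how any particular syslog daemon is implemented: the `Severity` enum, the `should_record` helper, and the sample events are all hypothetical names made up for this example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity codes; a lower number means a more severe event."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def should_record(event_severity: Severity, logging_level: Severity) -> bool:
    """Return True when the event's code is at or below the configured level."""
    return event_severity <= logging_level


# Hypothetical host configured with a logging level of Warning (4).
host_level = Severity.WARNING

# Made-up sample events for illustration.
events = [
    (Severity.CRITICAL, "disk array degraded"),
    (Severity.WARNING, "CPU temperature rising"),
    (Severity.INFORMATIONAL, "user logged in"),
    (Severity.DEBUG, "cache lookup trace"),
]

# Only events with a code of 4 or lower pass the threshold.
recorded = [msg for sev, msg in events if should_record(sev, host_level)]
```

Raising the level toward Debug (7) records more events; lowering it toward Emergency (0) records fewer.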
Alerting
- an automated event management system can generate alerts
- indicate when certain event types of a given severity are encountered
- can be generated by setting thresholds for performance counters
- e.g., packet loss, link bandwidth drops, number of sessions established, delay/jitter in real-time apps, etc.
- can reveal an anomaly
- patterns of behavior or usage that are not consistent with normal activity
- network monitors support heartbeat tests
- receive an alert if a device or server stops responding to probes
- need to strike the right balance of alerts (too many get ignored, too few miss real problems)
- alert means that the system has matched some sort of pattern or filter that should be recorded and highlighted
- notification means that the system sends a message to advertise the occurrence of the alert
- need a process for acknowledging and dismissing alerts
- serious alert may need to be processed as an incident and assigned a job ticket
- false positive can be dismissed
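
As a rough sketch of the flow described above (a threshold match raises an alert, a notification advertises it, and an operator then tickets or dismisses it), the following Python models one performance counter. The counter name, the threshold value, the status strings, and every function here are assumptions made for illustration, not the API of any real monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """A recorded match against a threshold (hypothetical structure)."""
    counter: str
    value: float
    threshold: float
    status: str = "open"  # assumed states: open, ticketed, dismissed


def check_counter(counter: str, value: float, threshold: float) -> Optional[Alert]:
    """Return an Alert when the counter exceeds its threshold, else None."""
    if value > threshold:
        return Alert(counter, value, threshold)
    return None


def notify(alert: Alert) -> str:
    """Build the notification message that advertises the alert."""
    return f"ALERT {alert.counter}={alert.value} exceeds {alert.threshold}"


def triage(alert: Alert, false_positive: bool) -> Alert:
    """Dismiss a false positive; otherwise escalate to an incident ticket."""
    alert.status = "dismissed" if false_positive else "ticketed"
    return alert


# Made-up reading: 7.5% packet loss against an assumed 5% threshold.
alert = check_counter("packet_loss_pct", 7.5, 5.0)
if alert is not None:
    message = notify(alert)
    triage(alert, false_positive=False)
```

Keeping the alert (the recorded match) separate from the notification (the message sent) makes it possible to tune how noisy notifications are without losing the underlying record.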