Event Prioritization and Alerting


  • most logging systems assign each event a category, such as a severity level

Syslog severity levels:

Code  Level          Interpretation
0     Emergency      The system is unusable (kernel panic).
1     Alert          A fault requiring immediate remediation has occurred.
2     Critical       A fault that will require immediate remediation is likely to develop.
3     Error          A nonurgent fault has developed.
4     Warning        A nonurgent fault is likely to develop.
5     Notice         A state that could potentially lead to an error condition has developed.
6     Informational  A normal but reportable event has occurred.
7     Debug          Verbose status conditions used during development and testing.

Logging level is the threshold for storing or forwarding an event message based on its severity index or value.

  • aka severity level
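  • sets the highest severity code (that is, the least severe level) that is recorded or forwarded
    • e.g., a logging level of 4 captures codes 0 through 4 and ignores codes 5 through 7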
  • configured on each host
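
The threshold behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular logging product's implementation: the names Severity, should_log, and filter_events are hypothetical, and the only assumption is that lower codes mean more severe events, as in the syslog table.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity codes; a lower number means a more severe event."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def should_log(event_severity: Severity, logging_level: Severity) -> bool:
    """Return True when the event's code is at or below the configured level."""
    return event_severity <= logging_level


def filter_events(events, logging_level):
    """Keep only the (severity, message) pairs that pass the threshold."""
    return [(sev, msg) for sev, msg in events if should_log(sev, logging_level)]


# Hypothetical sample events, for illustration only.
sample = [
    (Severity.ERROR, "interface down"),
    (Severity.NOTICE, "configuration changed"),
    (Severity.DEBUG, "packet trace"),
]
kept = filter_events(sample, Severity.WARNING)
```

With the level set to Warning (4), the Error (3) event is kept, while the Notice (5) and Debug (7) events are dropped.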

Alerting

  • automated event management system can generate alerts
    • indicate when certain event types of a given severity are encountered
    • can be generated by setting thresholds for performance counters
      • e.g., packet loss, link bandwidth drops, number of sessions established, delay/jitter in real-time apps, etc.
    • can reveal an anomaly
      • patterns of behavior or usage that are not consistent with normal activity
    • network monitors support heartbeat tests
      • receive an alert if a device or server stops responding to probes
  • need the right balance of alerts
    • too many alerts and important ones get overlooked; too few and significant events go unnoticed
  • alert means that the system has matched some sort of pattern or filter that should be recorded and highlighted
  • notification means that the system sends a message to advertise the occurrence of the alert
  • need a process for acknowledging and dismissing alerts
    • serious alert may need to be processed as an incident and assigned a job ticket
    • false positive can be dismissed
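
As a rough sketch of how these ideas fit together, the Python below separates the alert (a recorded match against a threshold or a failed heartbeat) from the notification (the message sent about it), and adds a simple acknowledge/dismiss/escalate step. Everything here is hypothetical: the class names, the metric name, the threshold value, and the ticket identifier are assumptions for illustration, and a real monitoring system would send notifications by email, SMS, or a similar channel rather than appending them to a list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    source: str
    metric: str
    value: float
    status: str = "open"          # open, dismissed, or incident
    ticket: Optional[str] = None  # job ticket reference once escalated


@dataclass
class AlertManager:
    thresholds: dict  # metric name -> maximum acceptable value (assumed)
    alerts: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def _raise(self, source, metric, value, reason):
        # The alert records the match; the notification advertises it.
        alert = Alert(source, metric, value)
        self.alerts.append(alert)
        self.notifications.append(f"ALERT {source}: {reason}")
        return alert

    def check(self, source, metric, value):
        """Raise an alert when a performance counter exceeds its threshold."""
        limit = self.thresholds.get(metric)
        if limit is not None and value > limit:
            return self._raise(source, metric, value,
                               f"{metric}={value} exceeds {limit}")
        return None

    def heartbeat(self, source, responded):
        """Raise an alert when a device stops responding to probes."""
        if not responded:
            return self._raise(source, "heartbeat", 0.0, "no response to probe")
        return None

    def dismiss(self, alert):
        """Close a false positive."""
        alert.status = "dismissed"

    def escalate(self, alert, ticket):
        """Treat a serious alert as an incident and attach a job ticket."""
        alert.status = "incident"
        alert.ticket = ticket


mgr = AlertManager(thresholds={"packet_loss_pct": 2.0})
quiet = mgr.check("router1", "packet_loss_pct", 0.5)   # below threshold
loss = mgr.check("router1", "packet_loss_pct", 5.0)    # above threshold
down = mgr.heartbeat("server1", responded=False)
mgr.dismiss(loss)                 # judged a false positive
mgr.escalate(down, "TICKET-1")    # hypothetical ticket identifier
```

Keeping alerts and notifications in separate collections mirrors the distinction in the notes: every notification corresponds to an alert, but the alert persists with its own status until someone dismisses it or processes it as an incident.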