Event Prioritization and Alerting

Syslog severity levels:

Code	Level	Interpretation
0	Emergency	The system is unusable (kernel panic).
1	Alert	A fault requiring immediate remediation has occurred.
2	Critical	A fault that will require immediate remediation is likely to develop.
3	Error	A nonurgent fault has developed.
4	Warning	A nonurgent fault is likely to develop.
5	Notice	A state that could potentially lead to an error condition has developed.
6	Informational	A normal but reportable event has occurred.
7	Debug	Verbose status conditions used during development and testing.

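The logging level is the threshold for storing or forwarding an event message based on its severity code. Lower codes are more severe, so a logging level of 4 (Warning) stores or forwards events at levels 0 through 4 and drops levels 5 through 7.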
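
As a rough illustration of the threshold rule, the sketch below filters a few events against a logging level. The `should_log` helper and the sample events are hypothetical and only illustrate the comparison; they are not taken from any particular syslog implementation.

```python
# Syslog severity codes from the table above; a lower code means more severe.
SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}


def should_log(event_severity: int, logging_level: int) -> bool:
    """Return True if the event is at least as severe as the threshold."""
    return event_severity <= logging_level


# Hypothetical sample events: (severity code, message).
events = [
    (3, "disk write failed"),
    (6, "user logged in"),
    (4, "disk 90% full"),
]

# With the logging level set to Warning (4), only codes 0-4 are kept.
kept = [msg for sev, msg in events if should_log(sev, SEVERITY["warning"])]
print(kept)
```

Raising the logging level toward 7 keeps more messages; lowering it toward 0 keeps only the most severe ones.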

Alerting

an automated event management system can generate alerts
- indicate when certain event types of a given severity are encountered
- can be generated by setting thresholds for performance counters
  - e.g., packet loss, link bandwidth drops, number of sessions established, delay/jitter in real-time apps, etc.
- can reveal an anomaly
  - patterns of behavior or usage that are not consistent with normal activity
- network monitors support heartbeat tests
  - receive an alert if a device or server stops responding to probes
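
The threshold and heartbeat ideas above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Alert` class, the function names, the 1% and 5% packet-loss limits, the three-missed-probes rule, and the severity codes assigned to each condition. A real monitoring product would define its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    severity: int   # syslog severity code, 0-7
    source: str     # device or server that triggered the alert
    message: str


def check_packet_loss(source: str, loss_percent: float,
                      warn_at: float = 1.0,
                      crit_at: float = 5.0) -> Optional[Alert]:
    """Compare a performance counter against two assumed thresholds."""
    if loss_percent >= crit_at:
        return Alert(2, source,
                     f"packet loss {loss_percent:.1f}% at or above {crit_at:.1f}%")
    if loss_percent >= warn_at:
        return Alert(4, source,
                     f"packet loss {loss_percent:.1f}% at or above {warn_at:.1f}%")
    return None  # counter is within normal range, no alert


def check_heartbeat(source: str, probe_results: List[bool],
                    max_missed: int = 3) -> Optional[Alert]:
    """Alert if the most recent max_missed probes all went unanswered.

    probe_results holds True for an answered probe, newest result last.
    """
    recent = probe_results[-max_missed:]
    if len(recent) == max_missed and not any(recent):
        return Alert(1, source, f"no response to last {max_missed} probes")
    return None


print(check_packet_loss("router1", 7.5))
print(check_heartbeat("server1", [True, False, False, False]))
```

Requiring several consecutive missed probes, rather than one, is one simple way to avoid alerting on a single dropped packet.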
need the right balance of alerts: too many and real problems get lost in the noise, too few and faults go unnoticed
alert means that the system has matched some sort of pattern or filter that should be recorded and highlighted
notification means that the system sends a message to advertise the occurrence of the alert
need a process for acknowledging and dismissing alerts
- serious alert may need to be processed as an incident and assigned a job ticket
- false positive can be dismissed
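
To make the alert/notification distinction and the acknowledge/dismiss process concrete, here is a minimal sketch. The `AlertQueue` class, its method names, the `notify_level` default, and the ticket ID format are all hypothetical; the `sent` list merely stands in for a real notification channel such as email or SMS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    ESCALATED = "escalated"    # processed as an incident with a job ticket
    DISMISSED = "dismissed"    # for example, a false positive


@dataclass
class AlertRecord:
    alert_id: int
    severity: int
    message: str
    state: State = State.OPEN
    ticket: Optional[str] = None


class AlertQueue:
    def __init__(self, notify_level: int = 3) -> None:
        self.notify_level = notify_level          # assumed notification threshold
        self.records: Dict[int, AlertRecord] = {}
        self.sent: List[str] = []                 # stand-in for email/SMS delivery

    def raise_alert(self, severity: int, message: str) -> AlertRecord:
        record = AlertRecord(len(self.records) + 1, severity, message)
        self.records[record.alert_id] = record    # the alert: recorded and highlighted
        if severity <= self.notify_level:         # the notification: a message sent out
            self.sent.append(f"[severity {severity}] {message}")
        return record

    def acknowledge(self, alert_id: int) -> None:
        self.records[alert_id].state = State.ACKNOWLEDGED

    def escalate(self, alert_id: int, ticket: str) -> None:
        record = self.records[alert_id]
        record.state = State.ESCALATED
        record.ticket = ticket

    def dismiss(self, alert_id: int) -> None:
        self.records[alert_id].state = State.DISMISSED


queue = AlertQueue()
serious = queue.raise_alert(2, "core switch unreachable")
noise = queue.raise_alert(5, "unusual login time")

queue.acknowledge(serious.alert_id)
queue.escalate(serious.alert_id, "TICKET-0001")   # hypothetical ticket ID
queue.dismiss(noise.alert_id)                     # judged a false positive

print(queue.sent)
print(serious.state, noise.state)
```

Both events are recorded as alerts, but only the more severe one produces a notification, which is one way to keep the volume of messages manageable.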