Detection Engineering Process

Key Terminology

Detection / Rule is the search syntax in your SIEM that identifies malicious activity
Alert is the output generated when a detection fires
Ticket is what gets created in your ticketing system (alerts may or may not become tickets based on defined criteria)

These terms are often used interchangeably but represent distinct stages in the pipeline.
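A minimal Python sketch of the three stages may make the distinction concrete. The rule names, event fields, and the ticketing criterion (only high-severity alerts become tickets) are hypothetical stand-ins; real ticketing criteria are whatever your team defines. The sketch shows that detections produce alerts, and that not every alert becomes a ticket.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    rule_name: str
    severity: str
    event: dict


def run_detection(rule_name: str, predicate: Callable[[dict], bool],
                  severity: str, events: list[dict]) -> list[Alert]:
    """The detection (rule) is the search logic; every matching event yields an alert."""
    return [Alert(rule_name, severity, e) for e in events if predicate(e)]


def should_ticket(alert: Alert) -> bool:
    """Hypothetical ticketing criterion: only high-severity alerts become tickets."""
    return alert.severity == "high"


events = [  # hypothetical sample events
    {"process.name": "psexec.exe", "process.args": ["-u", "admin", "-p", "secret"]},
    {"process.name": "notepad.exe", "process.args": []},
]
alerts = (
    run_detection("psexec_plaintext_password",
                  lambda e: "-p" in e["process.args"], "high", events)
    + run_detection("notepad_launch",
                    lambda e: e["process.name"] == "notepad.exe", "low", events)
)
tickets = [a for a in alerts if should_ticket(a)]
print(len(alerts), len(tickets))  # 2 1
```

Two detections fired, producing two alerts, but only one met the ticketing criterion.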

The Scientific Method as Detection Engineering

Detection engineering benefits from the same structured, rigorous approach used in science
The scientific method maps neatly onto the detection lifecycle:

Scientific Method	Detection Engineering Equivalent
Observation	Detection Story (initial input)
Research / Hypothesis	Research phase
Experimentation	Query building + back testing
Validation / Analysis	Canary testing
Reporting / Conclusions	Documentation
Theory (repeatedly tested)	Onboarded, continuously improved detection

A high-quality detection is the equivalent of a scientific theory:

well-researched, verifiable, reproducible, and continuously improved over time

Why Follow a Structured Process?

Better-defined scope
- Know what you’re looking for from the start
Fewer missed steps
- A framework prevents you from skipping critical phases
Higher overall detection quality
- Research and documentation are built in, not bolted on

The Detection Engineering Process

Detection Story
Research
Build the Query
Back Test
Build a Canary
Documentation
Onboard
Continuous Improvement

Detection Story (Initial Input)

The detection story is the formalized input that kicks off the process.

Think of it as a structured intake ticket

A good detection story includes:

Reason for the detection
- e.g., observed malicious IOC, customer request, blog/research
Data sources available
- e.g., Windows event logs, EDR telemetry
Example logic
Expected volume in your environment
Supporting artifacts
- links to IOCs, sample commands, related reports
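One way to enforce that structure is to model the intake as a record and bounce any story with empty fields. This is a sketch under the assumption that all five fields are mandatory; the field names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionStory:
    reason: str              # e.g., observed IOC, customer request, blog/research
    data_sources: list[str]  # e.g., Windows event logs, EDR telemetry
    example_logic: str
    expected_volume: str
    artifacts: list[str]     # links to IOCs, sample commands, related reports

    def missing_fields(self) -> list[str]:
        """Names of intake fields left empty; a non-empty result means send it back."""
        return [name for name, value in vars(self).items() if not value]


# The vague request from the tip below fails intake
vague = DetectionStory("make a detection for PSExec", [], "", "", [])
print(vague.missing_fields())
# ['data_sources', 'example_logic', 'expected_volume', 'artifacts']

# A structured story based on the example scenario (values are illustrative)
story = DetectionStory(
    reason="PSExec -p plaintext password observed in a real attack",
    data_sources=["Windows event logs", "EDR telemetry"],
    example_logic='process.name : "psexec.exe" AND process.args : "-p"',
    expected_volume="near zero",
    artifacts=["observed PSExec command line", "batch script used to copy malware"],
)
print(story.missing_fields())  # []
```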

Tip

Don’t accept a ticket that just says “make a detection for PSExec.”

Require structure upfront or you’ll waste time guessing intent or build something entirely unnecessary.

Example Scenario: Detection Story

IOCs observed as part of malicious traffic

A command was observed during a real attack where PSExec was used to authenticate with a plaintext password via the -p flag, then copy malware to a list of hosts using a batch script.

Research

A good detection requires a full understanding of the technique
Research informs every decision downstream
Research the artifacts in question and document your findings as you go
Record the avenues you explore and the sources you check

Key principles:

Understand what you’re looking for fully before writing a query
Identify flags, behaviors, and patterns specific to the technique
Research whether the behavior should ever occur legitimately in your environment
Watch your scope — it’s easy to drift from “PSExec plaintext auth” into “all PSExec” or “PSExec + file share interaction,” which are separate detections

Example Scenario: Research

Research reveals the -p flag passes the password in plaintext on the command line

Best practice is to omit -p and enter the password at PSExec’s interactive prompt

Legitimate use of -p in a production environment is essentially nonexistent

which makes this a strong signal

Build the Query

The query is the core of your detection
It should be directly informed by your research
- what to look for, possible exclusions, etc.
Without a good query, you don’t have a detection
Prototype queries as you go and evaluate the results

Balance is key

Too Broad	Just Right	Too Narrow
All PSExec activity	PSExec process name + `-p` argument	PSExec with `-p` and a specific known password
High volume, burnout risk	Balance between fidelity and volume	Too specific, misses variants (quoted args, file-based passwords)
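The three columns can be sketched as predicates run over the same events. The sample events and the "known password" are hypothetical; the point is only how volume and coverage shift with scope.

```python
def too_broad(e):
    # All PSExec activity
    return e["process.name"] == "psexec.exe"


def just_right(e):
    # PSExec process name + "-p" as its own argument
    return too_broad(e) and "-p" in e["process.args"]


def too_narrow(e):
    # PSExec + "-p" + the one password seen in the incident (hypothetical value)
    return just_right(e) and "Winter2024!" in e["process.args"]


events = [  # hypothetical sample data
    {"process.name": "psexec.exe", "process.args": ["\\\\host1", "cmd.exe"]},
    {"process.name": "psexec.exe", "process.args": ["\\\\host2", "-s", "ipconfig"]},
    {"process.name": "psexec.exe",
     "process.args": ["\\\\host3", "-u", "admin", "-p", "Winter2024!", "cmd.exe"]},
    {"process.name": "psexec.exe",
     "process.args": ["\\\\host4", "-u", "admin", "-p", "Spring2025!", "cmd.exe"]},
]

for rule in (too_broad, just_right, too_narrow):
    print(rule.__name__, sum(rule(e) for e in events))
# too_broad 4
# just_right 2
# too_narrow 1
```

The narrow rule misses host4 only because the attacker used a different password, which is exactly the variant problem in the right-hand column.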

Practical tips:
- Lowercase the process name field to avoid case-sensitivity mismatches
- Consider PSExec clone/rename variants
- Look at process arguments as an array, not just a string match
- Think about corroborating signals (e.g., PSEXESVC service installation)
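The tips above can be sketched in Python, assuming ECS-style field names (`process.name`, `process.args`, `host.name`, `service.name`). The list of clone and renamed binaries is an assumption to be replaced with the results of your own research, and the PSEXESVC check applies to PsExec itself, not necessarily to its clones.

```python
import ntpath

# Assumed name list; extend it from your own research into clones and renamed copies
PSEXEC_NAMES = {"psexec.exe", "psexec64.exe", "paexec.exe"}


def normalize_name(raw: str) -> str:
    """Drop any directory and lowercase, so 'C:\\Tools\\PsExec64.exe' -> 'psexec64.exe'."""
    return ntpath.basename(raw).lower()


def matches(event: dict) -> bool:
    name = normalize_name(event.get("process.name", ""))
    args = event.get("process.args", [])
    # Compare against the argument list, so "-p" only matches as a standalone
    # argument, never as a substring of some other argument
    return name in PSEXEC_NAMES and "-p" in args


def corroborated(target_host: str, service_installs: list[dict]) -> bool:
    """Corroborating signal: a PSEXESVC service install seen on the target host."""
    return any(
        s.get("host.name") == target_host
        and s.get("service.name", "").upper() == "PSEXESVC"
        for s in service_installs
    )


evt = {"process.name": "C:\\Tools\\PsExec64.exe",
       "process.args": ["\\\\srv01", "-u", "admin", "-p", "pw", "cmd.exe"]}
print(matches(evt))  # True
print(corroborated("srv01", [{"host.name": "srv01", "service.name": "PSEXESVC"}]))  # True
```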

Example Scenario: KQL Query
process.name : "psexec.exe" AND process.args : "-p"
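This catches any PSExec execution with the -p argument regardless of other arguments, assuming process.name is lowercased (see the practical tips above); on a case-sensitive field, the default PsExec.exe spelling would not match "psexec.exe"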

not too specific, not too broad

Back Test

Estimate alert volume yourself before the SOC has to bring issues to you
Before going live, validate your query against historical data
- Run the query against 90 days of data
  - 30 days is the absolute minimum
- Identify noise, known-good activity, or filters you need to apply
- If results are unexpectedly high, consider:
  - Dropping or deprioritizing the detection
  - Lowering its alert priority
  - Reformulating the query
    - stack the data
    - find common fields in legitimate events
    - filter accordingly
- Document your results
  - use as evidence:
    - to propose a severity/priority
    - If it fires later, you have proof of due diligence
  - screenshot the query, time range, and result count
Zero results in a back test is either:
- great news (high fidelity)
- or a sign something is broken
Verify by confirming the query would have caught known-bad traffic
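"Stacking" means counting how often each value of a field appears across the back-test hits, so recurring legitimate patterns stand out. A minimal sketch with hypothetical hits and field names:

```python
from collections import Counter


def stack(hits: list[dict], field: str) -> list[tuple[str, int]]:
    """Count each value of `field` across back-test hits, most common first."""
    return Counter(h.get(field, "<missing>") for h in hits).most_common()


hits = [  # hypothetical back-test results
    {"user.name": "svc_deploy", "host.name": "build01"},
    {"user.name": "svc_deploy", "host.name": "build02"},
    {"user.name": "svc_deploy", "host.name": "build01"},
    {"user.name": "jdoe", "host.name": "ws042"},
]
print(stack(hits, "user.name"))  # [('svc_deploy', 3), ('jdoe', 1)]
print(stack(hits, "host.name"))  # [('build01', 2), ('build02', 1), ('ws042', 1)]
```

A dominant value such as a service account is a candidate for a narrowly scoped filter, but only after confirming that the activity is actually legitimate.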

Example Scenario: Back Test

A 90-day back test returned zero results after excluding a known-malicious username

every hit before the exclusion came from that account, which confirms the query correctly identified the original attack traffic and surfaced nothing else
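The scenario's logic can be sketched as two back-test runs: one without the exclusion, to prove the query still catches the known-bad traffic, and one with it, to measure everything else. The events and the `attacker_acct` username are hypothetical.

```python
def query(e: dict) -> bool:
    return e["process.name"] == "psexec.exe" and "-p" in e["process.args"]


def back_test(events: list[dict], exclude_users: tuple[str, ...] = ()) -> list[dict]:
    """Run the query over historical events, dropping any excluded usernames."""
    return [e for e in events if query(e) and e["user.name"] not in exclude_users]


history = [  # hypothetical stand-in for 90 days of process events
    {"process.name": "psexec.exe", "user.name": "attacker_acct",
     "process.args": ["\\\\h1", "-u", "x", "-p", "y", "c.bat"]},
    {"process.name": "psexec.exe", "user.name": "it_admin",
     "process.args": ["\\\\h2", "-s", "cmd.exe"]},
    {"process.name": "svchost.exe", "user.name": "SYSTEM", "process.args": []},
]

all_hits = back_test(history)
remaining = back_test(history, exclude_users=("attacker_acct",))

# Zero results is only meaningful if the query still catches the known-bad traffic
assert all_hits, "query missed the known attack: something is broken"
assert all(h["user.name"] == "attacker_acct" for h in all_hits)
print(len(all_hits), len(remaining))  # 1 0
```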

Build a Canary

A canary is code that executes on a schedule to generate the exact traffic your detection is supposed to catch.

A canary validates that your detection continues to work after deployment
- Without one, a broken detection could go unnoticed for months
Canary logic can be extremely complex or very simple
It runs on a scheduled interval
If the canary doesn’t trigger an alert, you get notified of the failure
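A minimal sketch of one canary cycle, assuming three hypothetical hooks you would supply: one that generates the traffic and returns a marker identifying it, one that asks your SIEM whether the expected alert fired, and one that sends the failure notification. Scheduling is left to whatever runs jobs in your environment.

```python
import time
from typing import Callable


def run_canary(generate_traffic: Callable[[], str],
               alert_fired: Callable[[str], bool],
               notify: Callable[[str], None],
               timeout_s: int = 600,
               poll_s: int = 30,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """One canary cycle: generate the traffic, wait for the alert, report failure."""
    marker = generate_traffic()
    waited = 0
    while True:
        if alert_fired(marker):
            return True
        if waited >= timeout_s:
            break
        sleep(poll_s)
        waited += poll_s
    notify(f"Canary failed: no alert for {marker!r} within {timeout_s}s")
    return False


# Usage with stand-in hooks; a scheduler (cron, Task Scheduler, etc.) would call this on an interval
failures = []
ok = run_canary(lambda: "canary-001", lambda m: False, failures.append,
                timeout_s=60, poll_s=30, sleep=lambda s: None)
print(ok, len(failures))  # False 1
```

The `sleep` parameter is injectable only so the cycle can be exercised without real waiting.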

Canary tiers:

Best: Dedicated canary infrastructure with scheduled runs and failure alerting
- removes environmental variables
Good: Regularly scheduled manual runs
- still catches broken detections over time
Avoid: Testing once at deploy time and never again

Tip

Generate traffic as close to the real thing as possible

echoing a command to the CLI may not produce the same log artifacts as actually running the tool

Reference Atomic Red Team (ART) for a library of pre-built canary-style test scripts

If replaying captured malicious traffic, ensure nothing in your environment has changed that could make the test pass artificially every time

Example Scenario: Build a Canary

Simply run psexec.exe -p <password> ... in a controlled test environment on a schedule

Confirm it triggers the detection

psexec \\remote_computer -u username -p password command
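A sketch of the canary as code, building the same command as an argument list and actually executing it rather than echoing it. The host, account, and password are hypothetical canary-only values, and the runner assumes PsExec is on the PATH of a Windows test host.

```python
import subprocess
import sys


def build_psexec_canary(remote: str, username: str, password: str, command: str) -> list[str]:
    """Argument list mirroring: psexec \\\\remote_computer -u username -p password command"""
    return ["psexec.exe", f"\\\\{remote}", "-u", username, "-p", password, command]


def run_psexec_canary(argv: list[str]) -> int:
    """Execute PsExec for real (not echo it), so genuine process-creation logs are produced."""
    if sys.platform != "win32":
        raise RuntimeError("PsExec canary must run on a Windows test host")
    return subprocess.run(argv, check=False).returncode


# Hypothetical canary-only host and account; never reuse real credentials here
argv = build_psexec_canary("canary-host01", "canary_user", "CanaryOnly!1", "whoami")
```

On the scheduled run, `run_psexec_canary(argv)` would be called from the controlled test environment, and the resulting alert checked as part of the canary cycle.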

Documentation

Documentation is arguably the most important step
Good documentation should cover:
- What the detection is looking for and why
- Why specific fields were included or excluded
- How a SOC analyst should investigate the alert
- Blind spots: how could an attacker evade this detection?
- MITRE ATT&CK technique mappings

Alerting & Detection Strategy (ADS)

Open-source framework from Palantir
covers everything from the high-level goal to specific technical logic, blind spots, and investigation steps
ADS Framework (palantir/alerting-detection-strategy-framework)
Fill it out from the beginning and update it as you move along
You can prompt an LLM with your detection logic and have it draft an ADS document for you
- Review and refine the draft
- it won’t be perfect, but it can save significant time on formatting and structure
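Since the advice is to fill the ADS out from the start, a small helper that renders an ADS skeleton with TODO placeholders can double as a progress checklist. The section names below are taken from the Palantir ADS framework as best understood; verify them against the repository before relying on this. The title and draft text are hypothetical.

```python
from typing import Optional

ADS_SECTIONS = [
    "Goal",
    "Categorization",
    "Strategy Abstract",
    "Technical Context",
    "Blind Spots and Assumptions",
    "False Positives",
    "Validation",
    "Priority",
    "Response",
    "Additional Resources",
]


def ads_skeleton(title: str, filled: Optional[dict] = None) -> str:
    """Render a Markdown ADS document; sections not yet written are marked TODO."""
    filled = filled or {}
    lines = [f"# {title}", ""]
    for section in ADS_SECTIONS:
        lines += [f"## {section}", filled.get(section, "TODO"), ""]
    return "\n".join(lines)


def unfinished(filled: dict) -> list[str]:
    """Sections still missing content, in document order."""
    return [s for s in ADS_SECTIONS if not filled.get(s)]


draft = {
    "Goal": "Detect PSExec executions that pass a plaintext password via -p.",
    "Validation": "Scheduled canary runs psexec with -p against a test host.",
}
print(ads_skeleton("PSExec Plaintext Password", draft))
print(unfinished(draft))
```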

Onboard

Onboarding is enabling the rule in the production side of the SIEM

Checklist:

Paste in the final, validated query
Configure suppression correctly
- e.g., suppress by host.name + user.name to avoid duplicate alerts for the same actor
Configure throttling appropriately
Create a change ticket or pull request if your environment requires it
Run a burn-in period
- detection is enabled in production
- but doesn’t produce alerts yet
- observe output in real conditions
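Burn-in can be sketched as a flag on the deployed rule: while it is set, matches are recorded for review instead of being sent as alerts. The class and field names are hypothetical; real SIEMs expose this differently (or not at all).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProductionRule:
    name: str
    predicate: Callable[[dict], bool]
    burn_in: bool = True  # enabled in production, but alerts are held back
    shadow_hits: list[dict] = field(default_factory=list)

    def evaluate(self, event: dict) -> bool:
        """Return True if an alert should be raised; during burn-in, only record the hit."""
        if not self.predicate(event):
            return False
        if self.burn_in:
            self.shadow_hits.append(event)  # review these to judge real-world volume
            return False
        return True


rule = ProductionRule("psexec_plaintext_password",
                      lambda e: "-p" in e.get("process.args", []))
evt = {"process.args": ["\\\\srv01", "-u", "admin", "-p", "pw", "cmd.exe"]}
print(rule.evaluate(evt), len(rule.shadow_hits))  # False 1
rule.burn_in = False  # burn-in complete: start alerting
print(rule.evaluate(evt))  # True
```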

Tip

Suppressing by host + user is about as specific as you want to go

Too broad (e.g., suppress by organization) and you might get one alert for 10 simultaneous threat actors
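A sketch of key-based suppression shows why the key matters. It keeps the first alert per key and drops repeats inside a time window; the `org` field and the sample alerts are hypothetical, and throttling is not modeled.

```python
def suppress(alerts: list[dict], key_fields: tuple[str, ...], window_s: int) -> list[dict]:
    """Keep the first alert per suppression key, dropping repeats within window_s seconds."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = tuple(alert.get(f) for f in key_fields)
        prev = last_kept.get(key)
        if prev is None or alert["ts"] - prev >= window_s:
            kept.append(alert)
            last_kept[key] = alert["ts"]
    return kept


alerts = [  # hypothetical: one actor firing twice, then a second, unrelated actor
    {"ts": 0, "org": "acme", "host.name": "ws01", "user.name": "alice"},
    {"ts": 10, "org": "acme", "host.name": "ws01", "user.name": "alice"},
    {"ts": 20, "org": "acme", "host.name": "ws07", "user.name": "mallory"},
]
print(len(suppress(alerts, ("host.name", "user.name"), 3600)))  # 2
print(len(suppress(alerts, ("org",), 3600)))  # 1: mallory's activity is hidden
```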

Continuous Improvement

Detection engineering doesn’t end at deploy
- it is a continuous process
Treat your detection like a living document
Sources of improvement:
- SOC analyst feedback
  - they see the alerts daily and will identify noise or gaps
- New research
  - techniques evolve, tools get renamed, evasions emerge
- Structural changes
  - field mappings change, data sources shift
For large query changes, repeat the back test and burn-in period
Risk management:
- Every filter or exception you add accepts some risk
- Understand and document that tradeoff
- Defense-in-depth means a single filter won’t make or break you
  - but be intentional about what you’re accepting

Example Scenario: Continuous Improvement

An admin has a legitimate need to run PSExec in exactly the way the detection alerts on

We accept that risk

and filter out that admin’s activity

documenting it as excluded because it is known, risk-accepted activity
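A sketch of a documented, narrowly scoped exclusion: every exclusion carries its risk-acceptance reason, and matching requires all listed fields, so the accepted risk stays small. Scoping by user plus host, and the account and host names, are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exclusion:
    match: dict[str, str]  # every field/value pair must match, keeping the exclusion narrow
    reason: str            # the documented risk acceptance


def apply_exclusions(hits: list[dict], exclusions: list[Exclusion]):
    """Split hits into alerts and excluded events, keeping the reason for each exclusion."""
    alerts, excluded = [], []
    for hit in hits:
        matched = next((x for x in exclusions
                        if all(hit.get(k) == v for k, v in x.match.items())), None)
        if matched is None:
            alerts.append(hit)
        else:
            excluded.append((hit, matched.reason))
    return alerts, excluded


exclusions = [Exclusion(
    match={"user.name": "it_admin01", "host.name": "jump01"},  # hypothetical admin + host
    reason="Known admin workflow requires psexec -p; risk accepted",
)]
hits = [
    {"user.name": "it_admin01", "host.name": "jump01"},
    {"user.name": "it_admin01", "host.name": "ws042"},  # same admin, unexpected host: still alerts
]
alerts, excluded = apply_exclusions(hits, exclusions)
print(len(alerts), len(excluded))  # 1 1
```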

Summary

A detection that completes this full lifecycle is the equivalent of a scientific theory:

✅ Carefully scoped
✅ Well-researched
✅ Experimentally validated
✅ Documented thoroughly
✅ Continuously tested and improved

“If you have a structured process for your detection engineering and you do it well and it’s well thought out, you will have a much better output at the end.”
— Hayden Covington

Resources

ADS Framework: https://github.com/palantir/alerting-detection-strategy-framework
Atomic Red Team: https://github.com/redcanaryco/atomic-red-team
Detection as Code (Splunk blog): Search “Splunk detection as code”
Wade Wells’ ADS Custom GPT: generates ADS docs from detection logic

Reference

Presenter: Hayden Covington — SOC Analyst & Detection Engineer, Black Hills Information Security
Source: BHIS Webcast — The Detection Engineering Process

The Detection Engineering Process w/ Hayden Covington

adam's notes
