ELI5: How to investigate a SOC incident? (5 Steps)

cypienta
Jun 16

At Cypienta, I'm fortunate to work with many of the best SOC analysts, hunters, and detection engineers out there, as we help them automatically find the relevant data points right in their SIEM, SOAR, XDR, and case management tools, without searches, queries, or rules.

As I prepare for our Black Hat & DEF CON presentations, I went through some personal notes, so I thought I'd share my ELI5 (explain like I'm 5) of what I took from Fortune 100 SOC internal frameworks, guidelines, and documented workflows on how to deal with an incident when it hits your queue:

1. Understand the Incident

Answer the following:

What happened & what threat behavior is this? (abstract to something like MITRE ATT&CK TTPs)

Which systems, networks, business units, orgs, identities, or assets are involved? (Use something like a CMDB to understand involved entities' classifications, roles, owners & attributes)
When did it happen?

Example: WHAT="Suspicious outbound traffic" from a WHICH="service acc external user endpoint" at WHEN="2:00 AM Yesterday".

Splunk Enterprise Security SIEM: Splunk Enterprise Security Analytic Stories (more SPL content: https://github.com/splunk/security_content)
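To make the what/which/when answers easy to carry into the next steps, here is a minimal Python sketch of how you might write them down as one record. The IncidentSummary class, its field names, and the date are all made up for illustration; they are not taken from any SIEM or case management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentSummary:
    """Hypothetical record holding the three answers from step 1."""
    what: str                  # observed behavior, in plain words
    which: list[str]           # involved entities (systems, identities, assets)
    when: datetime             # when the activity was observed
    ttps: list[str] = field(default_factory=list)  # ATT&CK technique IDs, once mapped

    def one_liner(self) -> str:
        entities = ", ".join(self.which)
        return f"{self.what} from {entities} at {self.when:%Y-%m-%d %H:%M}"


summary = IncidentSummary(
    what="Suspicious outbound traffic",
    which=["service acc external user endpoint"],
    when=datetime(2024, 6, 15, 2, 0),  # placeholder date standing in for "2:00 AM yesterday"
)
print(summary.one_liner())
```

The ttps list starts empty on purpose: the mapping to techniques is an analyst judgment, so it gets filled in once you have decided what behavior you are looking at.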

2. Gather Related Evidence

Inspect, or at least eyeball, the relevant situational context around that time period:

Logs from affected systems & apps among others

Events from network traffic capture among others

Alerts from endpoint, network, or cloud security among others
Service tickets & CMDB data for the affected entities among others
Threat intelligence feeds among others
Verify with the user if activity was legitimate

Determining the search time period:

- Start and end times of the suspicious activity

- Events leading up to the incident

- Subsequent actions after detection

- No silver bullet, but start with 5 minutes around the incident and widen to a week or more as needed

Example: Query & review the relevant firewall logs showing outbound traffic from this system, plus EDR events from this endpoint, in the 5 minutes around the incident.

Palo Alto Networks Cortex XSIAM Response Process: Palo Alto Networks Malware Playbook
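The "start small, then widen" advice on the search time period can be sketched in a few lines of Python. Only the 5 minute starting point and the "week or more" upper end come from the notes above; the steps in between, the WINDOW_STEPS name, and the timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical widening schedule: only the first step (5 minutes) and the
# rough upper end (a week) come from the text; the middle steps are assumed.
WINDOW_STEPS = [
    timedelta(minutes=5),
    timedelta(hours=1),
    timedelta(days=1),
    timedelta(weeks=1),
]


def search_windows(incident_time, steps=WINDOW_STEPS):
    """Yield (start, end) pairs centered on the incident, narrowest first."""
    for step in steps:
        yield incident_time - step, incident_time + step


incident_time = datetime(2024, 6, 15, 2, 0)  # placeholder timestamp
windows = list(search_windows(incident_time))
for start, end in windows:
    print(f"search {start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Each pair can be dropped into whatever query language your log source uses. Stop widening as soon as you have enough context to form a hypothesis.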

3. Formulate investigational hypotheses & possible threat scenarios

Any outliers or anomalies? Any indicators or patterns of attack/compromise? Initial signs?

From your acumen, experience & training, what do you think is happening here?

Does it look like Malware, Phishing, Insider threat, vulnerability exploit, etc.?

Example Hypothesis 1: The endpoint is infected with malware communicating with a C2 server.

Example Hypothesis 2: User credentials were compromised and used to exfiltrate data.

Example Hypothesis 3: The outbound traffic is from a legitimate but misconfigured application.

Oak Ridge National Lab ORNL Paper — From our good friends and supporters at ORNL. ref: Bobby Bridges et al. 2023
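If it helps to keep the hypotheses honest, you can track them as simple records with a status, so none of them gets silently dropped. This is only a sketch: the Hypothesis and Status names are hypothetical, and the statements are the three example hypotheses from this step.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    CONFIRMED = "confirmed"
    REFUTED = "refuted"


@dataclass
class Hypothesis:
    statement: str
    checks: list[str]                 # what to look at to test it
    status: Status = Status.OPEN
    notes: list[str] = field(default_factory=list)


hypotheses = [
    Hypothesis("Endpoint is infected with malware communicating with a C2 server",
               ["network traffic for C2 patterns"]),
    Hypothesis("User credentials were compromised and used to exfiltrate data",
               ["user activity for anomalies"]),
    Hypothesis("Outbound traffic is from a legitimate but misconfigured application",
               ["application logs and settings"]),
]

open_items = [h.statement for h in hypotheses if h.status is Status.OPEN]
print(f"{len(open_items)} hypotheses still open")
```

Every hypothesis starts as OPEN and only moves to CONFIRMED or REFUTED once evidence backs the change, with the reasoning recorded in notes.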

4. Confirm or refute Hypotheses & Identify False Positives

For each hypothesis:

Collect and analyze evidence and revise or form new hypotheses if needed.
Cross-reference alerts with other data sources & check for known benign patterns
Determine what happened and why, and document findings and steps taken

Example: For Hypothesis 1, check network traffic for C2 patterns. For Hypothesis 2, review user activity for anomalies that suggest compromised credentials. For Hypothesis 3, examine application logs and settings for misconfigurations.

MITRE ATT&CK Flow: MITRE ATT&CK Flow (more: https://center-for-threat-informed-defense.github.io/attack-flow/)
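As one concrete way to "check network traffic for C2 patterns", here is a rough Python sketch that scores how regular the gaps between outbound connections are. Very regular intervals are one common sign of beaconing, but not the only one. The beacon_score function, the sample timestamps, and the CV_THRESHOLD cutoff are all assumptions for illustration, not a vetted detection.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Return the coefficient of variation of gaps between connections.

    Values near 0 mean very regular intervals, which is one (not the only)
    sign of automated beaconing. Returns None if there are too few events.
    """
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


# Made-up connection times in seconds, for illustration only.
regular = [0, 60, 120, 181, 240, 300]
bursty = [0, 3, 5, 200, 204, 900]

CV_THRESHOLD = 0.1  # hypothetical cutoff; tune against your own traffic

for name, series in (("regular", regular), ("bursty", bursty)):
    score = beacon_score(series)
    flagged = score is not None and score < CV_THRESHOLD
    print(f"{name}: cv={score:.3f} flagged={flagged}")
```

A low score supports the malware/C2 hypothesis but does not confirm it on its own, since plenty of legitimate software polls on a timer. Cross-reference it with known benign patterns before you call it.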

5. R^3: Respond, Report, and Review

Contain the incident: Isolate systems, block traffic
Eradicate: Remove malicious artifacts, fix issues
Recover: Restore operations, verify remediation
Document the incident, investigation, findings, and actions
Review and learn from the incident to improve detection content, observability, playbook, and response process
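For the "document" part, a small action log that you fill in as you go makes the final write-up much easier. Below is a minimal Python sketch, assuming a made-up Action record and placeholder timestamps; the phase names mirror the list above.

```python
from dataclasses import dataclass
from datetime import datetime

PHASES = ("contain", "eradicate", "recover", "document", "review")


@dataclass
class Action:
    phase: str
    description: str
    at: datetime


def render_log(actions):
    """Group recorded actions by phase, in the order the phases are listed."""
    lines = []
    for phase in PHASES:
        in_phase = sorted((a for a in actions if a.phase == phase), key=lambda a: a.at)
        for action in in_phase:
            lines.append(f"[{phase}] {action.at:%H:%M} {action.description}")
    return "\n".join(lines)


# Placeholder entries, recorded out of order on purpose.
actions = [
    Action("eradicate", "Remove malicious artifacts", datetime(2024, 6, 15, 4, 0)),
    Action("contain", "Isolate endpoint and block outbound traffic", datetime(2024, 6, 15, 2, 30)),
]
report = render_log(actions)
print(report)
```

Entries can be logged in any order during the incident; the rendered log always comes out phase by phase, which is usually how the report gets read.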

If you noticed, this is similar to the NIST & SANS incident handling processes, but it is more tactical & operational, based on the processes of some of the best SOCs out there.

BONUS: AI can help!

You are a real one for reading this far! So, if you want to make it easier, consider using our FOREVER FREE Cyber LLM chatbots like https://hf.co/chat/assistant/6692ea1980d075bf4961ecdf and https://hf.co/chat/assistant/6692eb85ce7a1a25328ab049 to help you with the following use cases:

Automated incident report generation: LLMs can be used to create detailed and structured incident reports based on raw data and logs
Pattern recognition: The models can identify patterns and anomalies in security data, potentially uncovering hidden threats or attack vectors
Natural language querying: LLMs enable analysts to interact with security data using natural language queries, making it easier to extract relevant information
Context-aware analysis: These models can provide contextual information about threats, helping analysts understand the broader implications of an incident
Recommendation generation: LLMs can suggest potential remediation steps or mitigation strategies based on the analysis of the incident
Knowledge augmentation: The models can supplement an analyst's knowledge by providing relevant information from vast databases of security information
Automated triage: LLMs can help prioritize incidents based on their severity and potential impact, allowing analysts to focus on the most critical issues

And if you want a private, full data correlation & contextualization pipeline of fine-tuned models that scales to your big data, holds context and knowledge, and is more configurable, auditable, transparent, and reliable, then schedule a free trial of our solution at cypienta.com/trial or get started right away for free by following the docs at docs.cypienta.com

Cypienta Correlation models: Because we don't have a SOC talent shortage, we just have an elite unicorn talent shortage!

That's all folks! Until the next ELI5!