Definition
SPL (Search Processing Language) is the query language built into Splunk for searching, filtering, transforming, and visualizing machine data. Every interaction with data in Splunk — from a simple keyword search to a complex detection rule running on a scheduled alert — is expressed in SPL. It is the primary language used by SOC analysts, detection engineers, and threat hunters who work within the Splunk ecosystem.
SPL operates on events — individual records of machine data indexed by Splunk from sources like firewalls, endpoint agents, authentication systems, cloud platforms, application logs, and network devices. Each event has a timestamp, raw text, and a set of extracted fields. SPL lets you search those events, filter them by field values, transform them through statistical aggregation, enrich them with external data, and format the results for analysis or alerting.
As the dominant SIEM platform in enterprise security — Splunk holds significant market share across Fortune 500 security operations — SPL is one of the most widely used languages in cybersecurity. Proficiency in SPL is a core skill for SOC analysts, detection engineers, and incident responders. The language's power comes from its expressiveness: a single SPL query can search billions of events, extract fields with regex, correlate across multiple data sources, compute statistics, and format results for a dashboard — all in one pipeline.
SPL vs SQL
Analysts coming from a database background often ask how SPL compares to SQL. While both are query languages that search and aggregate data, their design philosophies differ fundamentally.
| Aspect | SPL | SQL |
|---|---|---|
| Paradigm | Pipe-based (sequential transformation) | Declarative (describe desired result) |
| Data model | Time-series events (semi-structured) | Relational tables (structured) |
| Time handling | Implicit time range on every search | Explicit WHERE on timestamp columns |
| Schema | Schema-on-read (fields extracted at search time) | Schema-on-write (schema defined before ingest) |
| Text search | Native full-text keyword search | LIKE operator or full-text extensions |
| Regex | Built-in rex command for extraction | REGEXP_EXTRACT or similar functions |
| Chaining | Commands chained left-to-right with `\|` | Subqueries, CTEs, JOINs |
| Aggregation | stats count by field | SELECT count(*) GROUP BY field |
The most important difference is the pipe-based paradigm. In SQL, you describe the result you want and the engine decides how to get it. In SPL, you describe a sequence of transformations — each command operates on the output of the previous one, flowing left-to-right through the pipeline. This makes SPL more intuitive for iterative data exploration: you start broad, progressively filter and transform, and watch the results change at each stage.
The schema-on-read model means Splunk does not require you to define field names and types before ingesting data. Raw log text is indexed as-is, and fields are extracted at search time — either automatically (for common formats like JSON, CSV, key-value pairs) or manually (using rex or field extraction rules). This flexibility is what allows Splunk to ingest virtually any machine data without preprocessing, but it also means that field names vary between data sources and environments.
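Schema-on-read is easy to demonstrate outside Splunk. The sketch below (plain Python, with an invented firewall log line) pulls key=value fields out of raw text on demand, roughly the way Splunk's automatic key-value extraction works at search time:

```python
import re

# Raw event text, indexed as-is; no schema was declared at ingest time.
raw = "Oct 12 09:14:02 fw01 action=blocked src_ip=10.0.0.5 dest_port=445"

# "Search-time" extraction: key=value pairs are parsed only when queried.
fields = dict(re.findall(r"(\w+)=(\S+)", raw))

print(fields["action"])  # blocked
print(fields["src_ip"])  # 10.0.0.5
```

The same raw text could yield different fields tomorrow with a different extraction rule, which is the flexibility (and the field-naming inconsistency) described above.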
The Pipe Operator
The pipe (|) is the fundamental organizing principle of SPL. Every SPL query is a pipeline: the initial search retrieves events, and each subsequent pipe command transforms the result set before passing it to the next command. This is directly analogous to Unix shell pipelines.
```spl
index=windows sourcetype=WinEventLog:Security EventCode=4625
| where src_ip!="127.0.0.1"
| stats count as failed_attempts by src_ip, user
| where failed_attempts > 10
| sort - failed_attempts
| table src_ip, user, failed_attempts
```
Reading this pipeline left-to-right, top-to-bottom:
- Search — Retrieve all Windows Security events with EventCode 4625 (failed logon)
- Where — Filter out localhost events
- Stats — Count failed attempts grouped by source IP and user
- Where — Keep only rows with more than 10 failures (threshold)
- Sort — Order by failure count descending
- Table — Format output as a clean table
Each pipe command receives the result set from the previous command. The initial search might return millions of raw events. After stats, the result set is reduced to one row per unique (src_ip, user) pair. After the second where, only the rows exceeding the threshold remain. This progressive refinement is what makes SPL pipelines readable and debuggable — you can truncate the pipeline at any point to see intermediate results.
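For readers who think in code, the same progressive refinement can be mimicked in plain Python (event values invented for illustration); each stage mirrors one pipe command:

```python
from collections import Counter

# Toy event set standing in for EventCode=4625 results (values invented).
events = (
    [{"src_ip": "203.0.113.9", "user": "admin"}] * 12
    + [{"src_ip": "127.0.0.1", "user": "svc"}] * 30
    + [{"src_ip": "198.51.100.4", "user": "bob"}] * 3
)

# | where src_ip!="127.0.0.1"
events = [e for e in events if e["src_ip"] != "127.0.0.1"]

# | stats count as failed_attempts by src_ip, user
counts = Counter((e["src_ip"], e["user"]) for e in events)

# | where failed_attempts > 10   then   | sort - failed_attempts
rows = sorted(
    ((ip, user, n) for (ip, user), n in counts.items() if n > 10),
    key=lambda r: -r[2],
)

print(rows)  # [('203.0.113.9', 'admin', 12)]
```

Millions of events in, one summarized row out: that reduction at each stage is exactly what makes truncating an SPL pipeline mid-way such an effective debugging technique.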
Key SPL Commands
SPL has hundreds of commands, but security operations rely on a core set. These are the commands that appear in the vast majority of detection rules, hunting queries, and forensic investigations.
search
The implicit first command. Retrieves events matching keywords, field values, and boolean conditions. Every SPL query begins with a search; the search keyword itself is implied and can be omitted at the start of a query.
```spl
index=firewall action=blocked src_ip=10.0.0.* dest_port=445
index=proxy http_method=POST uri_path="*/upload*" bytes_out>1000000
```
where
Boolean filtering using comparison operators, functions, and logical connectives. Unlike search, where treats unquoted strings as field names, enabling field-to-field comparisons.
```spl
| where user!=src_user                     # Field-to-field comparison
| where cidrmatch("10.0.0.0/8", src_ip)    # CIDR matching
| where like(uri_path, "%/admin/%")        # Pattern matching
| where isnull(user) OR len(user)<3        # Null checks + functions
```
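SPL's cidrmatch() has a close standard-library analogue in Python's ipaddress module, which is handy for prototyping the same membership test outside Splunk:

```python
import ipaddress

def cidrmatch(cidr: str, ip: str) -> bool:
    """Rough Python equivalent of SPL's cidrmatch(cidr, ip)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

print(cidrmatch("10.0.0.0/8", "10.42.7.1"))   # True
print(cidrmatch("10.0.0.0/8", "192.0.2.10"))  # False
```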
stats
The workhorse aggregation command. Computes statistical functions grouped by one or more fields. This is the foundation of threshold-based detections — "alert when count exceeds N."
```spl
| stats count as attempts by src_ip, user          # Count per group
| stats dc(dest_ip) as unique_targets by src_ip    # Distinct count
| stats values(dest_port) as ports by src_ip       # List unique values
| stats earliest(_time) as first, latest(_time) as last by session_id
| stats avg(response_time) as avg_rt, max(response_time) as max_rt by endpoint
```
eval
Creates or transforms fields using expressions, functions, and conditional logic. Essential for enrichment, normalization, and computed fields.
```spl
| eval duration = round((latest - earliest) / 60, 2)    # Duration in minutes
| eval severity = case(count>100, "critical",
                       count>50, "high",
                       count>10, "medium",
                       true(), "low")                   # Conditional assignment
| eval domain = lower(mvindex(split(url, "/"), 2))      # Extract + normalize domain
| eval is_internal = if(cidrmatch("10.0.0.0/8", src_ip), "yes", "no")
```
rex
Extracts fields from raw text or existing fields using named capture groups in regular expressions. Critical when the data does not have clean field extractions.
```spl
# Extract base64 payload from command line
| rex field=CommandLine "(?i)-enc(?:odedcommand)?\s+(?P<encoded_payload>[A-Za-z0-9+/=]+)"

# Extract domain from URL
| rex field=url "https?://(?P<domain>[^/]+)"

# Extract username from email
| rex field=email "(?P<username>[^@]+)@(?P<email_domain>.+)"
```
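rex uses PCRE-style named capture groups, and Python's re module accepts the same (?P&lt;name&gt;...) syntax, so patterns can be tried in a REPL before they go into a query (sample values invented):

```python
import re

# Same named-group pattern as the URL example above, tried on a sample value.
url_match = re.search(r"https?://(?P<domain>[^/]+)",
                      "https://cdn.example.net/payload.bin")
print(url_match.group("domain"))  # cdn.example.net

# And the email pattern, on an invented address.
email_match = re.search(r"(?P<username>[^@]+)@(?P<email_domain>.+)",
                        "alice@corp.example")
print(email_match.group("username"))      # alice
print(email_match.group("email_domain"))  # corp.example
```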
lookup
Enriches events with data from CSV lookup tables or KV store collections. Used to add context like asset ownership, threat intelligence indicators, or geographic data.
```spl
# Enrich with asset inventory
| lookup asset_inventory ip as src_ip OUTPUT asset_owner, asset_criticality

# Check against threat intel list
| lookup threat_intel_iocs indicator as dest_ip OUTPUT threat_name, confidence

# Add geographic context
| lookup geo_ip ip as src_ip OUTPUT country, city, latitude, longitude
```
table
Formats the result set as a table with specified columns. Used at the end of a pipeline to select and order the fields you want to see.
```spl
| table _time, src_ip, user, action, dest_ip, dest_port, bytes_out
```
eventstats and streamstats
These commands compute statistics without reducing the result set — the aggregated values are added as new fields to each event. This enables contextual enrichment: "show me each event alongside the average for its group."
```spl
# Add group average to each event for anomaly detection
| eventstats avg(bytes_out) as avg_bytes, stdev(bytes_out) as stdev_bytes by user
| where bytes_out > (avg_bytes + 3*stdev_bytes)

# Running count for session analysis
| streamstats count as event_sequence by session_id
```
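What streamstats adds to each event can be mimicked with a running counter in ordinary code; note that the rows are annotated, not reduced (session IDs invented):

```python
from collections import defaultdict

events = [
    {"session_id": "A"}, {"session_id": "A"},
    {"session_id": "B"}, {"session_id": "A"},
]

# | streamstats count as event_sequence by session_id
seen = defaultdict(int)
for e in events:
    seen[e["session_id"]] += 1
    e["event_sequence"] = seen[e["session_id"]]  # new field; row count unchanged

print([e["event_sequence"] for e in events])  # [1, 2, 1, 3]
```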
Security Detection in SPL
In security operations, SPL is primarily used to write detection rules — queries that run on a schedule (typically every 5-15 minutes) and generate alerts when suspicious or malicious activity is detected. Splunk's correlation search and alert framework executes these SPL queries against incoming data and triggers actions (email, webhook, SOAR playbook) when results are returned.
Detection rules in SPL generally follow one of several patterns:
Pattern matching
The simplest pattern — search for events that match specific field values associated with known malicious behavior.
```spl
index=windows sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1
  Image="*\\certutil.exe"
  (CommandLine="*-urlcache*" OR CommandLine="*-decode*" OR CommandLine="*-encode*")
| table _time, ComputerName, User, ParentImage, CommandLine
```
Threshold-based
Aggregate events and alert when a count or metric exceeds a defined threshold. Most brute force, spray, and scan detections follow this pattern.
```spl
index=windows sourcetype=WinEventLog:Security EventCode=4625
| stats count as failed_logins, dc(user) as targeted_users by src_ip
| where failed_logins > 20 AND targeted_users > 5
| eval attack_type = if(targeted_users > 10, "password_spray", "brute_force")
```
Statistical anomaly
Compare current behavior to a baseline to detect deviations. This pattern catches activity that is not inherently malicious but is abnormal for the environment.
```spl
index=proxy
| stats sum(bytes_out) as total_bytes by user
| eventstats avg(total_bytes) as avg_bytes, stdev(total_bytes) as stdev_bytes
| eval z_score = round((total_bytes - avg_bytes) / stdev_bytes, 2)
| where z_score > 3
| table user, total_bytes, avg_bytes, z_score
```
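The z-score arithmetic behind this pattern is plain statistics. A Python sketch with invented per-user byte totals shows the same three-sigma test:

```python
from statistics import mean, stdev

# bytes_out per user: 19 typical users plus one heavy sender (invented data).
totals = {f"user{i:02d}": 100 for i in range(19)}
totals["mallory"] = 10_000

avg = mean(totals.values())
sd = stdev(totals.values())

# | eval z_score = (total_bytes - avg_bytes) / stdev_bytes  | where z_score > 3
outliers = [u for u, b in totals.items() if (b - avg) / sd > 3]
print(outliers)  # ['mallory']
```

One caveat worth knowing: with very few users, a single outlier inflates the standard deviation enough that no z-score can reach 3, so baselines need a reasonable population size to be meaningful.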
Sequence / correlation
Correlate events across multiple data sources or time windows to detect multi-stage attacks. These are the most powerful and most complex detection patterns.
```spl
# Detect: failed logins followed by successful login from same source
index=windows sourcetype=WinEventLog:Security (EventCode=4625 OR EventCode=4624)
| stats count(eval(EventCode=4625)) as failures,
        count(eval(EventCode=4624)) as successes,
        latest(_time) as last_event by src_ip, user
| where failures > 5 AND successes > 0
| eval status = "brute_force_success"
| table src_ip, user, failures, successes, last_event, status
```
Example: Credential Dumping Detection
The following SPL detection rule targets credential dumping via LSASS process access — a technique used by tools like Mimikatz, mapped to MITRE ATT&CK T1003.001 (OS Credential Dumping: LSASS Memory).
```spl
index=windows sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=10
  TargetImage="*\\lsass.exe"
  GrantedAccess IN ("0x1010", "0x1038", "0x1fffff", "0x1410", "0x143a")
  NOT SourceImage IN (
      "*\\csrss.exe", "*\\wininit.exe", "*\\wmiprvse.exe",
      "*\\svchost.exe", "*\\MsMpEng.exe", "*\\taskhostw.exe"
  )
| eval access_hex = GrantedAccess,
       risk = case(
           GrantedAccess="0x1fffff", "critical",
           GrantedAccess="0x1038", "high",
           true(), "medium"
       )
| stats count as access_count,
        values(GrantedAccess) as access_flags,
        latest(_time) as last_seen
        by ComputerName, SourceImage, SourceUser
| where access_count > 0
| table last_seen, ComputerName, SourceUser, SourceImage, access_flags, access_count
```
This query works through several layers:
- EventCode=10 — Sysmon Process Access events, logged when one process opens a handle to another
- TargetImage=lsass.exe — Only events where the target is the Local Security Authority Subsystem Service, which stores credentials in memory
- GrantedAccess IN (...) — Specific access masks associated with credential dumping (PROCESS_VM_READ, PROCESS_QUERY_INFORMATION, etc.)
- NOT SourceImage IN (...) — Exclusion of legitimate system processes that routinely access LSASS
- eval risk — Risk scoring based on the specific access mask (0x1fffff is PROCESS_ALL_ACCESS, the most suspicious)
- stats — Aggregation to deduplicate and summarize the activity per source process
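The GrantedAccess values are bitmasks of Windows process access rights (constants from winnt.h); a small helper, sketched here in Python, makes them readable during triage:

```python
# Subset of Win32 PROCESS_* access-right bits (winnt.h values).
FLAGS = {
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

def decode(mask: int) -> list[str]:
    """Return the names of the known access bits set in a GrantedAccess mask."""
    return [name for bit, name in sorted(FLAGS.items()) if mask & bit]

print(decode(0x1010))  # ['PROCESS_VM_READ', 'PROCESS_QUERY_LIMITED_INFORMATION']
print(decode(0x1038))  # adds PROCESS_VM_OPERATION and PROCESS_VM_WRITE
```

Reading a memory dump of LSASS requires at least PROCESS_VM_READ, which is why 0x1010 and 0x1038 appear in the detection's allowlist of suspicious masks.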
SPL Best Practices
Filter early, transform late
The most impactful performance optimization in SPL is filtering as early as possible in the pipeline. Every event that passes through a pipe command consumes CPU and memory. Specify index, sourcetype, and key field values in the initial search to minimize the data set before any transformations.
```spl
# Bad: transforms all events, then filters
index=windows | where EventCode=4625 AND src_ip!="127.0.0.1"

# Good: filters in the search, reducing data immediately
index=windows sourcetype=WinEventLog:Security EventCode=4625 src_ip!="127.0.0.1"
```
Use time ranges explicitly
Every Splunk search has a time range. For scheduled detection rules, set it explicitly (e.g., earliest=-15m latest=now) rather than relying on the time picker. This ensures the rule searches a consistent window regardless of how it is invoked.
Avoid wildcards at the start of values
Leading wildcards (*malicious.exe) prevent Splunk from matching indexed terms efficiently. They are often unavoidable for path-valued fields, but keep the literal portion as long and specific as possible: prefer field="*\\malicious.exe", anchored on the path separator and full filename, over a looser pattern like field="*malicious*" — the longer literal gives the search optimizer more context to discard non-matching events cheaply.
Use stats over transaction
The transaction command is powerful but expensive — it holds all related events in memory. For most correlation and grouping tasks, stats with earliest(_time), latest(_time), values(), and dc() produces equivalent results with far better performance.
Comment your detections
SPL supports inline comments enclosed in triple backticks, for example ``` exclude service accounts; see ticket SOC-123 ```; some deployments also provide a comment("...") macro. For complex detections, add comments explaining the logic, expected false positives, and the MITRE ATT&CK technique mapping.
Test with | stats count first
Before deploying a detection rule, append | stats count and run it over 7-30 days. This tells you the expected alert volume. A rule that produces 500 alerts per day is not a detection — it is noise. Tune the thresholds and filters until the volume is actionable.
SPL and Sigma
SPL is a vendor-specific language that runs only in Splunk. Sigma is a vendor-neutral YAML format that describes detection logic without any platform syntax. The two are complementary, not competing.
In practice, many detection engineering teams write their canonical rules in Sigma and convert to SPL for deployment in Splunk. This approach provides portability (the same rule can be converted to KQL for Sentinel or Lucene for Elastic), version control (YAML diffs cleanly in Git), and community leverage (thousands of SigmaHQ rules can be imported directly).
However, some SPL capabilities cannot be expressed in Sigma. Advanced subsearches, custom macros, KV store lookups, the tstats command for accelerated data models, and complex eval logic do not have Sigma equivalents. Rules that require these features must be written natively in SPL. The general guidance: use Sigma for portable detection logic, and native SPL for platform-specific enrichment, correlation, and performance optimization.
Sigma tells you what to detect. SPL tells Splunk how to detect it. A mature detection engineering program uses both — Sigma as the source of truth, SPL as the deployment artifact. See: What Are Sigma Rules?
How Threadlinqs Generates SPL
Threadlinqs Intelligence provides production-ready SPL detection rules for every threat in the platform. Each rule is written by analysts who validate the detection logic against real-world telemetry, with proper field mappings for common Splunk configurations (Sysmon, Windows Security Events, CIM-compliant data models).
The platform currently delivers over 1,800 detection rules across SPL, KQL, and Sigma formats. Each SPL rule includes the index and sourcetype specifications, field-level filtering, false positive exclusions, and a final table command that formats the output for analyst review. Rules are mapped to specific MITRE ATT&CK techniques, severity levels, and the threat intelligence report that motivated the detection.
For Splunk shops, SPL rules from Threadlinqs can be copied directly into correlation searches or saved searches. The MCP server and API enable automated retrieval — pull SPL rules by threat ID, MITRE technique, or severity and programmatically deploy them into your Splunk environment, keeping your detection library current as new threats emerge.
Frequently Asked Questions
What is SPL in Splunk?
SPL (Search Processing Language) is Splunk's query language for searching, filtering, transforming, and visualizing machine data. It uses a pipe-based syntax where commands chain together — the output of one command becomes the input of the next, similar to Unix shell pipelines. SPL is used for everything from ad-hoc log searches to production detection rules, dashboards, and scheduled alerts.
Is SPL similar to SQL?
SPL and SQL share conceptual similarities — both query data, filter with conditions, and aggregate results. However, SPL is pipe-based (commands chain left-to-right with the | operator) while SQL is declarative (SELECT/FROM/WHERE blocks). SPL is optimized for time-series log data with implicit time ranges and schema-on-read, while SQL targets relational tables with predefined schemas. SPL handles semi-structured and unstructured data natively, which SQL does not.
What are the most important SPL commands for security?
The essential commands are: search (filtering raw events), where (boolean filtering on fields), stats (aggregation for threshold-based detections), eval (field calculation and conditional logic), rex (regex field extraction), lookup (enrichment from external tables), table (output formatting), and eventstats/streamstats for contextual enrichment without reducing the result set.
How do I convert a Sigma rule to SPL?
Use pySigma with the Splunk backend: install with pip install pySigma pySigma-backend-splunk, then run sigma convert -t splunk -p sysmon rule.yml. The -p flag specifies a processing pipeline that maps generic Sigma field names to your Splunk field names and sourcetypes. Threadlinqs provides pre-converted SPL alongside Sigma and KQL for every detection rule in its library.