VNX-PY-015 – Python ReDoS via User-Controlled Regex
Overview
This rule detects Python code that passes user-controlled input (such as Flask/Django request parameters) directly into regular expression functions like re.compile(), re.match(), re.search(), or re.fullmatch(). When an attacker can control the regex pattern, they can craft a pathological expression that causes catastrophic backtracking, consuming CPU and memory until the application becomes unresponsive.
Severity: High | CWE: CWE-1333 – Inefficient Regular Expression Complexity
Why This Matters
Regular Expression Denial of Service (ReDoS) is a practical, low-effort attack. A single HTTP request with a carefully crafted regex pattern can:
- Pin a CPU core at 100% for seconds, minutes, or longer
- Block the event loop or worker thread handling the request, causing cascading timeouts
- Bring down an entire application if enough malicious requests arrive in parallel
- Bypass rate limiters, since the attack payload is tiny but the compute cost is enormous
Python’s re module uses a backtracking engine, which is vulnerable to patterns with nested quantifiers or overlapping alternations. Even patterns that look harmless, such as (a+)+b, can take exponential time on an input that almost matches, for example a long run of "a" characters with no trailing "b".
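To make the growth concrete, the following minimal sketch times a failing match of (a+)+b against runs of "a" of increasing length. The helper name time_failed_match and the chosen input sizes are illustrative assumptions, and absolute timings will vary by machine and Python version.

```python
import re
import time

# Nested quantifier: the inner a+ and the outer (...)+ can split a run of
# "a" characters in exponentially many ways before the engine gives up.
PATTERN = re.compile(r"(a+)+b")


def time_failed_match(n: int) -> float:
    """Return the seconds spent trying to match n 'a' characters with no trailing 'b'."""
    subject = "a" * n
    start = time.perf_counter()
    result = PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no "b", so the match must fail
    return elapsed


if __name__ == "__main__":
    # Each step adds only four characters to the input.
    for n in (10, 14, 18, 22):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

Keep the sizes small when experimenting: each extra character roughly doubles the work, so inputs of only a few dozen characters can run for a very long time.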
What Gets Flagged
This rule flags lines where a request parameter is passed directly to a re function:
    # Flagged: user-controlled regex pattern from Flask request
    pattern = re.compile(request.args.get("q"))

    # Flagged: search with request form data
    result = re.search(request.form["pattern"], text)

    # Flagged: match with request data
    if re.match(request.data, some_string):
        ...

    # Flagged: fullmatch with JSON body
    re.fullmatch(request.json["regex"], input_text)
The rule applies only to .py files.
Remediation
Never let users supply raw regex patterns. If you need user-driven search, use literal string matching (str.find(), str.count(), or fnmatch for glob patterns) instead of regex:

    # Safe: literal substring search
    results = [item for item in items if query in item.lower()]

Use the google-re2 library for linear-time matching. The RE2 engine does not backtrack and matches in time linear in the input length, which eliminates catastrophic backtracking. It does not support backreferences or lookaround, so patterns that use them are rejected at compile time:

    import re2

    # Safe: RE2 guarantees linear-time matching
    pattern = re2.compile(user_input)
    result = pattern.search(text)

Install with:

    pip install google-re2

If you must use Python’s re module, validate and constrain the pattern. Escape user input with re.escape() if it should be treated as a literal:

    import re

    # Safe: escape treats the input as a literal string, not a pattern
    safe_pattern = re.escape(request.args.get("q", ""))
    results = re.findall(safe_pattern, text)

Set a timeout for regex operations. The standard re module does not accept a timeout argument; the third-party regex module does, and raises TimeoutError when the limit is exceeded:

    import regex

    try:
        result = regex.search(fixed_pattern, text, timeout=1.0)
    except TimeoutError:
        # Pattern evaluation exceeded the timeout
        abort(400, "Search pattern too complex")

Reject patterns that contain dangerous constructs. If users must supply patterns, reject those with nested quantifiers or overlapping alternations, such as (a+)+, (a*)*, or (a|aa)*, before compilation.
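The pre-compilation check described above could look roughly like the sketch below. It is a coarse heuristic, not a complete ReDoS detector: the names is_suspicious_pattern and MAX_PATTERN_LENGTH and the limit of 100 characters are assumptions made for illustration, and the check only recognizes a quantified group whose body contains + or *.

```python
import re

# Hypothetical limit chosen for illustration; tune it for your application.
MAX_PATTERN_LENGTH = 100

# A quantified, non-nested group whose body itself contains + or *,
# for example (a+)+ or (a*)*. This is a first-pass filter only.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def is_suspicious_pattern(pattern: str) -> bool:
    """Return True if the user-supplied pattern should be rejected before compilation."""
    # Check the length first: it bounds the cost of the heuristic scan below
    # as well as the size of anything compiled later.
    if len(pattern) > MAX_PATTERN_LENGTH:
        return True
    return _NESTED_QUANTIFIER.search(pattern) is not None


if __name__ == "__main__":
    for candidate in ("(a+)+b", "(a*)*", "(abc)+", "hello.*world"):
        verdict = "reject" if is_suspicious_pattern(candidate) else "allow"
        print(f"{candidate!r}: {verdict}")
```

This check misses overlapping alternations such as (a|aa)* and can flag harmless patterns that contain escaped + or * characters inside a group, so it is best treated as one layer alongside a linear-time engine or a timeout rather than as a replacement for them.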
References
- CWE-1333: Inefficient Regular Expression Complexity
- OWASP: Regular Expression Denial of Service (ReDoS)
- CAPEC-197: Exponential Data Expansion
- MITRE ATT&CK T1499.004 – Application or System Exploitation
- google-re2 Python Library
- Python re Module Documentation
- OWASP ASVS V5 – Validation, Sanitization, and Encoding