Catalog Format
All AIBOM detection is driven by a declarative catalog so it can be maintained over time without code changes. The builtin catalog is embedded in the binary (internal/aibom/catalog/*.json). You can extend or replace it at runtime:
vulnetix aibom --catalog ./my-rules.json # merge over the builtin (override by id)
vulnetix aibom --catalog ./only.json --no-builtin-catalog # replace entirely
A catalog file is JSON with any of three top-level arrays: tools, libraries, model_families.
Tool entry
{
"id": "cursor", // unique id (override key)
"name": "Cursor",
"vendor": "Anysphere",
"type": "cli-agent | ide | ide-extension | service | convention",
"homepage": "https://cursor.com",
"env": ["CURSOR_*"], // env var NAMES (exact, or globs with *). Values are never read.
"paths": { // category -> repo-relative path globs (* , ** , ? supported)
"config": [".cursor/**"],
"instructions": [".cursorrules"],
"ignore": [".cursorignore"],
"skills": [".cursor/skills/**"]
// also: agents, commands, hooks, plugins, steering, memory, prompts, marketplace
},
"model_config_extractors": [ // optional: pull model names from this tool's config files
{"file_glob": ".cursor/config.json", "json_key": "model"}
// or: {"file_glob": "...", "pattern": "regex-with-one-capture-group"}
],
"commit_patterns": [ // optional: identify commits authored by this agent
"(?i)co-authored-by:\\s*cursor\\b" // matched against author/committer identity + message
]
}
A tool needs at least one primary (tool-specific) path, env/home, or commit hit to be reported. Cross-tool convention files (AGENTS.md, .mcp.json) are surfaced only through dedicated type: "convention" entries, so a single shared file does not light up every tool.
Library entry
{
"id": "openai-python",
"name": "openai",
"provider": "OpenAI",
"languages": ["python"], // canonical ecosystem keys (javascript covers ts/js)
"purl_names": {"pypi": "openai"}, // ecosystem -> package name (for the component purl)
"import_patterns": ["(?m)^\\s*(?:from|import)\\s+openai\\b"], // confirm the SDK is used
"model_extractors": [
{"param": "model", "pattern": "\\bmodel\\s*=\\s*[\"']([^\"']+)[\"']", "task": "chat"}
]
}
Every model_extractors[].pattern must have exactly one capture group — the model-name literal. Anchoring on the parameter (not the value) is what makes unknown / future model names detectable.
Model family
Family hints map a model-name prefix to a provider/family. They only enrich confidence — an unknown literal is still emitted.
{"prefix_regex": "^claude-", "provider": "Anthropic", "family": "Claude"}
All patterns are Go RE2 (no backreferences/lookaround). The catalog is validated (every pattern compiles, every extractor has a capture group) at load time and by just gen-aibom.