Current Version: 1.0
This document describes what we're building, in what order, and why.
Given a SecID string, return the URL(s) where that resource can be found.
This is the simplest useful thing SecID can do, and it's the foundation everything else builds on.
secid:advisory/mitre.org/cve#CVE-2024-1234
→ https://www.cve.org/CVERecord?id=CVE-2024-1234
secid:weakness/mitre.org/cwe#CWE-79
→ https://cwe.mitre.org/data/definitions/79.html
secid:control/nist.gov/800-53@r5#AC-1
→ https://csrc.nist.gov/projects/cprt/catalog#/cprt/framework/version/SP_800_53_5_1_1/home?element=AC-1
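The examples above suggest how a SecID breaks into parts. As a minimal sketch, parsing might look like the following. The normative grammar lives in SPEC.md; the `SecID` class, the `parse_secid` function, and the field names are hypothetical and inferred only from these three examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecID:
    """Parts of a SecID; field names are illustrative, not taken from the spec."""
    type: str
    namespace: str
    name: str
    version: Optional[str] = None
    subpath: Optional[str] = None


def parse_secid(text: str) -> SecID:
    """Split 'secid:type/namespace/name[@version][#subpath]' into its parts."""
    prefix = "secid:"
    if not text.startswith(prefix):
        raise ValueError(f"not a SecID: {text!r}")
    # Everything after the first '#' is the subpath (e.g. CVE-2024-1234).
    body, _, subpath = text[len(prefix):].partition("#")
    # Assumption: the remainder is exactly type/namespace/name.
    parts = body.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected type/namespace/name in {text!r}")
    type_, namespace, name = parts
    # Assumption: an optional version is attached to the name with '@'.
    name, _, version = name.partition("@")
    return SecID(type_, namespace, name, version or None, subpath or None)


print(parse_secid("secid:control/nist.gov/800-53@r5#AC-1"))
```

Splitting on the first `#` before anything else keeps characters such as `:` or `/` inside the subpath from confusing the rest of the parse.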
URL resolution delivers immediate value with minimal complexity:
- Useful on day one - People can start using SecIDs to link to security resources
- Tests the registry - Every namespace must define resolution rules, validating the data model
- Foundation for everything else - Relationships, overlays, and applications all need resolution
- Clear success criteria - Either the URL works or it doesn't
Simple case (most namespaces): String substitution. The registry file contains a URL template:
# registry/advisory/org/mitre.md (cve source)
urls:
  lookup: "https://www.cve.org/CVERecord?id={id}"

Resolution: extract CVE-2024-1234 from the subpath, substitute into template.
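A minimal sketch of that substitution step, assuming the registry has already been loaded into a dictionary. The `URL_TEMPLATES` mapping, its key format, and the `resolve` function are hypothetical; only the CVE template comes from the registry snippet above.

```python
# Hypothetical in-memory stand-in for registry data. The key format is an
# assumption; the template is the CVE example from the registry snippet.
URL_TEMPLATES = {
    "advisory/mitre.org/cve": "https://www.cve.org/CVERecord?id={id}",
}


def resolve(secid: str, templates: dict = URL_TEMPLATES) -> str:
    """Resolve a SecID to a URL by substituting its subpath into a template."""
    prefix = "secid:"
    if not secid.startswith(prefix):
        raise ValueError(f"not a SecID: {secid!r}")
    # Split 'advisory/mitre.org/cve#CVE-2024-1234' into source and subpath.
    source, _, item_id = secid[len(prefix):].partition("#")
    if not item_id:
        raise ValueError(f"no subpath to substitute in {secid!r}")
    if source not in templates:
        raise LookupError(f"no URL template registered for {source!r}")
    # Plain string replacement, so other braces in a template are left alone.
    return templates[source].replace("{id}", item_id)


print(resolve("secid:advisory/mitre.org/cve#CVE-2024-1234"))
# https://www.cve.org/CVERecord?id=CVE-2024-1234
```

Sources whose URL uses only part of the identifier (the CWE URL above uses 79, not CWE-79) would need an extraction step before substitution; that is not shown here.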
Complex case (no direct URL): Some resources don't have predictable URLs. For these, we provide search instructions that humans and AI agents can follow:
# Example: a resource without direct linking
resolution:
  type: search
  instructions: "Search the vendor's security portal for the advisory ID"
  search_url: "https://example.com/security/search?q={id}"

| Priority | Deliverable | Why This Order |
|---|---|---|
| 1 | Registry data | Foundation - libraries need data to resolve against |
| 2 | Python library | Security community standard; threat intel, SIEM, AI/ML pipelines |
| 3 | npm/TypeScript library | Web applications, CI/CD integrations, broad developer reach |
| 4 | REST API | Unlocks every other language without waiting for native libraries |
| 5 | Go library | Cloud-native security tools (Trivy, Grype, Falco), Kubernetes ecosystem |
| 6 | Rust library | Memory-safe systems tools, growing security tooling adoption |
| 7 | Java library | Enterprise SAST/DAST tools, legacy integration |
| 8 | C#/.NET library | Windows/enterprise ecosystem |
Registry first because everything depends on it. A library without data is useless.
Python second because the security community runs on Python. Threat intelligence platforms, SIEM integrations, vulnerability scanners, AI/ML pipelines - Python is the lingua franca.
npm/TypeScript third because it covers web applications and has the broadest developer reach. Security dashboards, CI/CD integrations, and developer tools often use JavaScript/TypeScript.
REST API fourth because it's a force multiplier. Once the API exists, any language can consume SecID - Ruby, PHP, shell scripts, anything that can make HTTP requests. This reduces pressure to ship every native library immediately.
Go fifth because cloud-native security infrastructure runs on Go. Tools like Trivy, Grype, and Falco would benefit from native SecID support, and Go is common for CLI tools and microservices.
Rust, Java, C#/.NET later because their communities can use the REST API until native libraries ship. These are important for completeness but not blockers for adoption.
A SecID isn't just an identifier - it's a handle that gives you everything you need to understand and work with security knowledge.
Today, security data is fragmented. CVEs live in one place, CWEs in another, controls in spreadsheets, regulations in PDFs. Finding information requires knowing where to look. Understanding it requires domain expertise. Connecting it requires manual effort.
SecID changes this. When you have a SecID, you can:
- Find it - Get the URL or search instructions
- Understand it - Read a description of what it is
- Read it - Get the actual content (where licensing permits)
- Interpret it - Understand what the fields mean
- Use it - Know what to do with this data
- Connect it - See related concepts, mitigations, and examples
This is AI-first infrastructure - but not AI-only. The primary consumer is AI agents that need to navigate security knowledge autonomously. When an agent receives a SecID response, it should be self-describing - the agent knows what it has, how to interpret it, and what to do with it.
Traditional tools are first-class consumers too. SecID identifiers work in:
- SIEMs and SOC platforms - Correlate alerts across vulnerability, weakness, and technique taxonomies
- GRC tools - Map controls to regulations to compliance evidence
- Vulnerability scanners - Link findings to weaknesses, techniques, and remediations
- SBOMs and VEX documents - Reference advisories with consistent identifiers
- Asset inventories - Tag systems with applicable controls and regulations
- Policy automation - Define rules that reference specific controls or requirements
AI agents accelerate adoption because they can consume SecID immediately without organizational buy-in. But the long-term value is infrastructure that humans, traditional tools, and AI all use together.
We're building this in layers:
- v1.0: URL resolution + descriptions (where to find it, what it is)
- v1.x: Raw content with licensing (the actual text, properly attributed)
- v2.x: Metadata wrapper (interpretation and usage guidance for AI)
- Future: Relationships and overlays (connections and enrichment)
SecID isn't just a spec - it's a complete system for working with security knowledge. We're building in two parallel tracks:
CONTENT TRACK DATA LAYERS
(what you get back) (connections & context)
┌─────────────────────────────────────┐ ┌─────────────────────────────────┐
│ Normalized Content (future) │ │ Overlays (future) │
│ - JSON container with schema │ │ - Quality flags │
│ - Interpretation guidance │ │ - Cross-references │
│ - Usage instructions for AI │ │ - Organizational context │
├─────────────────────────────────────┤ ├─────────────────────────────────┤
│ Raw Content (future) │ │ Relationships (future) │
│ - Actual control/weakness text │ │ - CVE ↔ CWE ↔ ATT&CK │
│ - License information │ │ - Control → Weakness │
│ - Source attribution │ │ - Technique → Mitigation │
├─────────────────────────────────────┤ └─────────────────────────────────┘
│ Description (v1.0) │ ↑
│ - What this thing is │ │ Independent tracks
│ - Human/AI readable summary │ │ (can develop in parallel)
├─────────────────────────────────────┤ │
│ URL Resolution (v1.0) │ ← WE ARE HERE
│ - Where to find it │
│ - Search instructions if no URL │
├─────────────────────────────────────┤
│ Registry (v1.0) │ ← WE ARE HERE
│ - Namespace definitions │
│ - Resolution rules │
│ - ID patterns and examples │
├─────────────────────────────────────┤
│ Specification (complete) │
│ - Identifier format │
│ - Type definitions │
│ - Naming conventions │
└─────────────────────────────────────┘
A SecID isn't just an identifier - it's a handle that gives you everything you need to understand and work with that security concept. When an AI agent receives a SecID response, it should be able to:
- Find it - URL or search instructions
- Understand it - Description of what it is
- Read it - Actual content (where licensing permits)
- Interpret it - Schema, guidance on what fields mean
- Use it - Instructions on what to do with this data
- Connect it - Related concepts, mitigations, examples
Example future response:
{
  "secid": "secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12",
  "urls": {
    "lookup": "https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4",
    "api": "https://api.secid.dev/v1/control/cloudsecurityalliance.org/ccm/IAM-12"
  },
  "description": "Identity & Access Management control requiring multi-factor authentication for all interactive access to cloud services.",
  "content": {
    "raw": {
      "title": "IAM-12: Multi-Factor Authentication",
      "control_text": "Multi-factor authentication shall be implemented for all interactive access...",
      "implementation_guidance": "...",
      "audit_guidance": "..."
    },
    "license": "CC BY-NC-SA 4.0",
    "attribution": "Cloud Security Alliance",
    "retrieved": "2024-01-15"
  },
  "relationships": {
    "mitigates": ["secid:weakness/mitre.org/cwe#CWE-308", "secid:weakness/mitre.org/cwe#CWE-287"],
    "related_controls": ["secid:control/nist.gov/800-53@r5#IA-2"],
    "attacked_by": ["secid:ttp/mitre.org/attack#T1078"]
  },
  "meta": {
    "schema": "https://secid.dev/schemas/control/v1",
    "interpretation": "This is a technical control requiring MFA. The 'control_text' field contains the normative requirement. Check 'implementation_guidance' for how to implement, 'audit_guidance' for how to verify compliance.",
    "usage": "Use this to verify MFA requirements in cloud environments. Compare against your current authentication configuration.",
    "spec": "https://secid.dev/spec",
    "api_docs": "https://secid.dev/api"
  }
}

This response is self-describing - an AI receiving it knows what it has, how to interpret it, and what to do with it. The raw content stays raw; we add context through metadata, not transformation.
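To illustrate what "self-describing" could mean for a consumer, here is a hedged sketch of how an agent or tool might read such a response. The `summarize_response` function and the shape of its return value are hypothetical; the sample is a trimmed copy of the example response above. Every layer is read with a default because early versions may return only URLs and a description.

```python
import json

# Trimmed copy of the example future response shown above.
SAMPLE = """
{
  "secid": "secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12",
  "urls": {"lookup": "https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4"},
  "description": "Identity & Access Management control requiring multi-factor authentication for all interactive access to cloud services.",
  "content": {"raw": {"title": "IAM-12: Multi-Factor Authentication"}, "license": "CC BY-NC-SA 4.0"},
  "relationships": {
    "mitigates": ["secid:weakness/mitre.org/cwe#CWE-308", "secid:weakness/mitre.org/cwe#CWE-287"],
    "related_controls": ["secid:control/nist.gov/800-53@r5#IA-2"]
  }
}
"""


def summarize_response(raw_json: str) -> dict:
    """Pull out the find / understand / read / connect pieces of a response."""
    doc = json.loads(raw_json)
    content = doc.get("content", {})
    # Flatten every relationship list into one sorted list of related SecIDs.
    related = sorted(
        target
        for targets in doc.get("relationships", {}).values()
        for target in targets
    )
    return {
        "secid": doc["secid"],
        "lookup_url": doc.get("urls", {}).get("lookup"),
        "description": doc.get("description"),
        "has_content": "raw" in content,
        "license": content.get("license"),
        "related": related,
    }


summary = summarize_response(SAMPLE)
print(summary["lookup_url"])
print(summary["related"])
```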
Return where to find it and what it is:
{
  "secid": "secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12",
  "urls": { "lookup": "..." },
  "description": "Identity & Access Management control requiring multi-factor authentication..."
}

Add actual content where licensing permits:
{
  "content": {
    "raw": { "title": "...", "control_text": "...", "guidance": "..." },
    "license": "CC BY-NC-SA 4.0",
    "attribution": "Cloud Security Alliance"
  }
}

Why this matters: Some sources are hard to access programmatically:
- CSA CCM/AICM are in spreadsheets
- ISO standards are behind paywalls
- Vendor advisories require authentication
- Data is buried in HTML tables or nested pages
We respect licensing - include license info, proper attribution, and only redistribute what's permitted.
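One way to make that rule mechanical is to attach content only when the license is on an explicit allow-list and attribution is present. This is a sketch under stated assumptions: the `with_content` function and the `REDISTRIBUTABLE` set are hypothetical, and which licenses belong on such a list is a policy decision this document does not settle.

```python
# Hypothetical allow-list; the single entry is the license used in the
# examples in this document, included purely as a placeholder.
REDISTRIBUTABLE = {"CC BY-NC-SA 4.0"}


def with_content(response: dict, raw: dict, license_name: str,
                 attribution: str, allowed: set = REDISTRIBUTABLE) -> dict:
    """Return a copy of the response, adding content only if licensing permits."""
    result = dict(response)
    if license_name not in allowed or not attribution:
        # Fall back to URL + description only.
        return result
    result["content"] = {
        "raw": raw,
        "license": license_name,
        "attribution": attribution,
    }
    return result


base = {
    "secid": "secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12",
    "urls": {"lookup": "..."},
}
print(with_content(base, {"title": "..."}, "CC BY-NC-SA 4.0", "Cloud Security Alliance"))
```

Because the check fails closed, an unknown license or missing attribution yields the plain v1.0-style response rather than an error.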
Wrap raw content in a JSON container with interpretation and usage guidance:
{
  "content": {
    "raw": { "title": "...", "control_text": "...", "guidance": "..." },
    "license": "CC BY-NC-SA 4.0",
    "attribution": "Cloud Security Alliance"
  },
  "meta": {
    "schema": "https://secid.dev/schemas/control/v1",
    "interpretation": "This is a technical control requiring MFA. The 'control_text' field contains the normative requirement, 'guidance' contains implementation suggestions.",
    "usage": "Use this to verify MFA requirements in cloud environments. Compare against your current authentication configuration.",
    "spec": "https://secid.dev/spec",
    "api_docs": "https://secid.dev/api"
  }
}

Why this matters: Raw data alone isn't enough for AI agents. They need:
- Schema link to understand structure
- Interpretation guidance for what fields mean
- Usage instructions for what to do with the data
- The content stays raw - we're adding context, not transforming it
Connect SecIDs to each other: CVE → CWE weakness, weakness → control mitigation, technique → weakness exploit.
Why independent? Relationship design benefits from real-world usage. We can ship content before relationships are fully designed.
See RELATIONSHIPS.md for exploratory thinking.
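Since the relationship layer is not designed yet, the following is only a sketch of one possible shape: relationships as (source, relation, target) triples with a breadth-first walk over them. The triple format and the `build_index` and `reachable` helpers are hypothetical; the edges reuse the identifiers from the example response earlier in this document.

```python
from collections import defaultdict, deque

IAM12 = "secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12"

# Hypothetical edge list built from the example response's relationships.
EDGES = [
    (IAM12, "mitigates", "secid:weakness/mitre.org/cwe#CWE-308"),
    (IAM12, "mitigates", "secid:weakness/mitre.org/cwe#CWE-287"),
    (IAM12, "related_controls", "secid:control/nist.gov/800-53@r5#IA-2"),
    (IAM12, "attacked_by", "secid:ttp/mitre.org/attack#T1078"),
]


def build_index(edges):
    """Group outgoing (relation, target) pairs by source SecID."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[source].append((relation, target))
    return index


def reachable(start, edges):
    """Return every SecID reachable from start by following edges forward."""
    index = build_index(edges)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in index.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


print(sorted(reachable(IAM12, EDGES)))
```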
Add metadata without modifying sources: cross-references, quality flags, severity adjustments, organizational context.
Why independent? Same reason - usage will inform design. Overlays can be added to any response once the infrastructure exists.
See OVERLAYS.md for exploratory thinking.
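As a rough illustration of "add metadata without modifying sources", an overlay could be merged into a response at read time while the source record is left untouched. The overlay record shape (`secid` and `data` keys), the `overlays` response field, and the `apply_overlays` function are all hypothetical; the real design is still exploratory.

```python
import copy


def apply_overlays(response: dict, overlays: list) -> dict:
    """Return a copy of the response with matching overlay data attached."""
    # Deep copy so the source response is never mutated.
    result = copy.deepcopy(response)
    matching = [
        copy.deepcopy(overlay["data"])
        for overlay in overlays
        if overlay["secid"] == response["secid"]
    ]
    if matching:
        result["overlays"] = matching
    return result


source = {"secid": "secid:advisory/mitre.org/cve#CVE-2024-1234", "urls": {"lookup": "..."}}
overlays = [
    {"secid": "secid:advisory/mitre.org/cve#CVE-2024-1234",
     "data": {"quality_flags": ["needs-review"]}},
    {"secid": "secid:weakness/mitre.org/cwe#CWE-79",
     "data": {"organizational_context": "not applicable"}},
]
print(apply_overlays(source, overlays))
```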
The initial seeding serves multiple purposes:
- Stress test the spec: Do our naming conventions hold up? Are there edge cases we missed?
- Learn the landscape: What databases exist? How do they relate? What's the coverage?
- Build the graph: Relationships need entities on both ends. More entities = richer graph.
- Demonstrate value: A spec with 10 examples is theoretical. A spec with 1000 entities is useful.
- Attract contributors: People contribute to living projects, not empty frameworks.
Phase 1: Core Security Infrastructure (50-100 entities)
The foundations everything else references:
| Category | Examples | Why First |
|---|---|---|
| Vuln databases | CVE, NVD, GHSA, OSV, CNVD, EUVD | Core references |
| Weakness taxonomies | CWE, OWASP Top 10 | Vulnerability classification |
| Attack frameworks | ATT&CK, ATLAS, CAPEC | Threat modeling |
| Scoring systems | CVSS, EPSS | Severity/priority |
| Organizations | MITRE, NIST, FIRST, OWASP | Governance/authority |
Status: Largely complete in current files
Phase 2: AI/ML Security Ecosystem (100-200 entities)
Deep coverage of AI security landscape:
| Category | Examples | Why |
|---|---|---|
| AI vendors | OpenAI, Anthropic, Google, Meta | Products to track |
| AI products | GPT-4, Claude, Gemini, Llama | Vulnerability targets |
| AI frameworks | LangChain, LlamaIndex, AutoGPT | Supply chain |
| AI security tools | Garak, PyRIT, Promptfoo | Testing ecosystem |
| AI standards | NIST AI RMF, ISO 42001 | Compliance landscape |
| AI research | Adversarial ML papers, jailbreak repos | Knowledge sources |
Why prioritize AI? This is our eventual differentiator. Deep AI coverage establishes expertise.
Phase 3: Vendor Security Programs (200-500 entities)
Major vendors and their security infrastructure:
| Category | Examples | Why |
|---|---|---|
| Vendor PSIRTs | Microsoft, Google, Red Hat, Cisco | Advisory sources |
| Bug bounty programs | HackerOne, Bugcrowd hosted programs | Disclosure channels |
| Vendor advisories | MSRC, RHSA, DSA | Enrichment sources |
| Cloud security | AWS Security Hub, Azure Defender | Platform-specific |
Why vendors? Vendor advisories are a massive source of vulnerability data that often has richer context than NVD.
Phase 4: Broader Security Ecosystem (500-1000+ entities)
Long tail of security knowledge:
| Category | Examples | Why |
|---|---|---|
| Security tools | Nmap, Metasploit, Burp Suite | Referenced in vulns |
| Security standards | PCI-DSS, HIPAA, SOC 2 | Compliance mapping |
| Threat intel | MISP, OpenCTI, threat feeds | Future: threat intelligence |
| Research groups | Google P0, Microsoft MSTIC | Attribution |
| Conferences | DEF CON, Black Hat, RSA | Community nodes |
The act of adding entities teaches us:
Naming edge cases:
- What about AT&T? → att (remove special chars)
- What about CERT/CC vs US-CERT? → Need aliasing strategy
- What about acquired companies? → Historical entities need tracking
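The first two edge cases can be sketched as a normalization step plus an alias lookup. This is an assumption-laden illustration, not the spec's naming rule: `normalize_name` and the `ALIASES` table are hypothetical, and the only behavior taken from the list above is that AT&T becomes att by removing special characters.

```python
import re

# Hypothetical alias table mapping alternative spellings to one canonical
# name. The single entry is for illustration only.
ALIASES = {
    "certcoordinationcenter": "certcc",
}


def normalize_name(raw: str) -> str:
    """Lowercase a name, drop non-alphanumeric characters, then apply aliases."""
    slug = re.sub(r"[^a-z0-9]", "", raw.lower())
    return ALIASES.get(slug, slug)


for name in ["AT&T", "CERT/CC", "CERT Coordination Center", "US-CERT"]:
    print(name, "->", normalize_name(name))
```

Distinct organizations such as CERT/CC and US-CERT stay distinct here; aliasing is applied only to alternative spellings of the same entity.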
Relationship patterns:
- Most vulns have CWE mappings... AI vulns are newer and still being classified
- GHSA cross-references CVE... except for ecosystem-specific issues
- Multiple sources may provide different severity assessments... need reconciliation tracking
Coverage status:
- CWE has 4 AI-specific entries (e.g., CWE-1427 for prompt injection), but gaps remain
- ATT&CK and ATLAS continue expanding
- AI security taxonomies are still maturing
Data quality observations:
- Processing backlogs can delay enrichment data
- Cross-references between databases occasionally need correction
- Different sources may assess severity differently
This learning feeds back into spec refinement and overlay priorities.
| Deliverable | Status | Notes |
|---|---|---|
| Specification (SPEC.md) | Complete | Open for public comment |
| Registry structure | Complete | 700+ namespace definitions (YAML + JSON) |
| Type documentation | Complete | All 10 types documented |
| Design documentation | Complete | RATIONALE, DESIGN-DECISIONS, STRATEGY |
| Namespace documentation | Complete | _index.md files for advisory namespaces |
| Deliverable | Status | Success Criteria |
|---|---|---|
| Registry data (500+ namespaces) | Done (700+ namespaces) | Every namespace has URL resolution rules + description |
| Format metadata | Done | parsability, schema, parsing_instructions, auth on URL objects. Schemas as reference entries. Parsing instruction docs in docs/parsers/. API supports ?parsability=structured filtering. |
| REST API + MCP server | Live | secid.cloudsecurityalliance.org — MCP server shipped first, REST API followed |
| Compliance test suite | Not started | Canonical test cases built during API development; doubles as conformance spec for third-party implementations |
| Python library (secid) | Not started | pip install secid enables parsing and resolution |
| npm/TypeScript library (secid) | Not started | npm install secid enables parsing and resolution |
| Go library | Not started | Native Go support for cloud-native tools |
| Rust library | Not started | Native Rust support for systems tools |
| Java library | Not started | Native Java support for enterprise tools |
| C#/.NET library | Not started | Native .NET support for Windows ecosystem |
Claude Code skills support the registry workflow. Skills are built incrementally during API development, not as standalone deliverables — each new namespace, conversion, or test case teaches the skills what they need to cover.
| Skill | Purpose | Status |
|---|---|---|
| Registry Research | Research sources, create/update .md registry files, determine resolution strategy | Active |
| Registry Formalization | Convert .md to .json, validate against JSON Schema, ensure cross-format consistency | Active |
| Registry Validation | Validate registry entries against the JSON schema and naming conventions | Active — first non-stub skill |
| Compliance Testing | Run canonical test suite against resolver implementations, diagnose failures | Stub (accumulates test cases as edge cases are discovered) |
| SecID User | Consuming SecID as an end user via the live service | Active (SecID-Service is live) |
Registry quality depends on validation. Our approach uses AI as a first-class participant in the validation process.
The workflow:
- Goal discovery - Given a SecID like secid:advisory/redhat.com/errata#RHSA-2024:1234, ask AI: "What would you typically want to do with this?" The most likely answer: "Find the URL for this RHSA."
- Codify the goal - That answer becomes the success criterion: resolution must produce a working URL.
- Add resolution rules - Create/update the registry entry with URL templates and patterns.
- Verify it works - AI tests the resolution against real identifiers, confirms URLs resolve.
- Iterate - If edge cases fail, refine the rules.
Why AI-assisted?
- Scale: 500+ namespaces can't be manually validated continuously
- Consistency: AI applies the same verification logic everywhere
- Discovery: AI can identify what users would expect before we build it
- Maintenance: AI can detect URL rot and resolution failures over time
This isn't "AI does everything" - it's AI as a team member that handles the tedious verification work that humans would skip or do inconsistently.
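The "verify it works" step lends itself to a mechanical check that any reviewer, human or AI, can run. Below is a minimal offline sketch that compares what a template produces against known-good URLs. The entry shape (`source`, `template`, `examples`) and the `check_entry` function are hypothetical; confirming that the URLs actually load would be a separate, network-dependent step not shown here.

```python
def check_entry(entry: dict) -> list:
    """Return failure messages for one registry entry (empty list = pass)."""
    failures = []
    for example in entry["examples"]:
        actual = entry["template"].replace("{id}", example["id"])
        if actual != example["expected_url"]:
            failures.append(
                f"{entry['source']}#{example['id']}: "
                f"got {actual}, expected {example['expected_url']}"
            )
    return failures


# Hypothetical entry using the CVE example from this document.
cve_entry = {
    "source": "advisory/mitre.org/cve",
    "template": "https://www.cve.org/CVERecord?id={id}",
    "examples": [
        {"id": "CVE-2024-1234",
         "expected_url": "https://www.cve.org/CVERecord?id=CVE-2024-1234"},
    ],
}
print(check_entry(cve_entry))
```

Returning messages instead of raising lets a single run report every failing example across the registry, which suits the iterate step.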
| Deliverable | Status | Success Criteria |
|---|---|---|
| Content ingestion (CSA CCM/AICM) | Planned | Spreadsheet data extracted, licensed properly |
| Content ingestion (NIST 800-53) | Planned | Control text available via API |
| Content ingestion (CWE/ATT&CK) | Planned | Weakness/technique descriptions included |
| License tracking | Planned | Every content response includes license + attribution |
| API content endpoints | Planned | ?include=content returns raw text |
| Deliverable | Status | Success Criteria |
|---|---|---|
| JSON schemas for each type | Planned | Documented, versioned schemas for controls, weaknesses, etc. |
| Metadata wrapper | Planned | Raw content wrapped with interpretation + usage guidance |
| Relationship layer | Planned | Connect CVE↔CWE↔ATT&CK, enable graph queries |
| Overlay layer | Planned | Quality flags, cross-references, organizational context |
| Deliverable | Depends On | Value |
|---|---|---|
| Web interface | REST API | Browse and search security knowledge visually |
| AI-powered assistant | All of the above | Natural language queries over security knowledge |
| Knowledge graph UI | Relationships | Visualize connections between security concepts |
SecID is designed as a federated ecosystem with multiple independent components:
| Component | What It Is | Can Be Multiple? |
|---|---|---|
| SecID Standard | The identifier specification (secid:type/namespace/name#subpath) | One canonical spec, versioned |
| SecID Registries | Namespace definitions, resolution rules | Yes - private registries, organizational overlays |
| Relationship Databases | Connections between identifiers | Yes - different sources, perspectives |
| Enrichment Databases | Metadata, annotations, context | Yes - organizational data, private enrichments |
| SecID APIs | Services that resolve and query | Yes - different providers, implementations |
Federation means: Organizations can run their own registries, databases, and APIs that overlay or extend the canonical data. A company might maintain private namespace definitions, internal relationship mappings, or proprietary enrichments - all compatible with the public ecosystem.
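A layered lookup is one simple way to picture this. The sketch below uses Python's standard `collections.ChainMap` so that a private registry is consulted before the canonical one. The registry contents, the key format, and the precedence order (private first) are assumptions for illustration; the document does not specify how federated registries are merged.

```python
from collections import ChainMap

# Hypothetical registries reduced to source -> URL template mappings.
CANONICAL = {
    "advisory/mitre.org/cve": "https://www.cve.org/CVERecord?id={id}",
}
PRIVATE = {
    # Made-up internal namespace using the placeholder search URL.
    "advisory/example.com/internal": "https://example.com/security/search?q={id}",
}


def federated_registry(*layers: dict) -> ChainMap:
    """Combine registries; earlier layers take precedence over later ones."""
    return ChainMap(*layers)


registry = federated_registry(PRIVATE, CANONICAL)
print(registry["advisory/mitre.org/cve"])         # falls through to canonical
print(registry["advisory/example.com/internal"])  # served by the private layer
```

`ChainMap` keeps the underlying dictionaries separate, so the canonical data is never modified by a private layer.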
SecID identifiers are for structured security knowledge with defined namespaces. Arbitrary URLs are explicitly NOT part of the identifier specification (no secid:url/... type). However, APIs and databases can support URL queries:
| Component | SecID Identifiers | Arbitrary URLs |
|---|---|---|
| SecID Standard | ✅ Defines these | ❌ Explicitly excluded |
| SecID Registry | ✅ Contains these | ❌ Not applicable |
| Our API | ✅ Must support | ✅ Probably will support |
| Our Relationship DB | ✅ Must include | ✅ Probably will include |
| Our Enrichment DB | ✅ Must include | ✅ Probably will include |
Why this separation? URLs are already globally unique identifiers - wrapping them in secid:url/... adds complexity without value. But APIs and databases can accept URLs as query inputs and store relationships/enrichments for arbitrary web content. This keeps the spec clean while enabling practical use cases like "what do we know about this Stack Overflow answer?"
See SPEC.md Section 1.3 for the full rationale.
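In practice, an API that accepts both kinds of input needs to tell them apart before routing the query. Here is a small sketch of that dispatch, assuming only that SecIDs start with `secid:` and that URL inputs are http or https. The `classify_query` function and its return labels are hypothetical.

```python
from urllib.parse import urlsplit


def classify_query(value: str) -> str:
    """Label a query input as 'secid' or 'url', or raise if it is neither."""
    if value.startswith("secid:"):
        return "secid"
    parts = urlsplit(value)
    # Assumption: only absolute http(s) URLs are accepted as query inputs.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    raise ValueError(f"neither a SecID nor an http(s) URL: {value!r}")


print(classify_query("secid:weakness/mitre.org/cwe#CWE-79"))
print(classify_query("https://example.com/answers/1"))
```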
Our goal is to make SecID as easy to consume as possible. We're building:
| Repository | Purpose | Status |
|---|---|---|
| SecID (this repo) | Spec, registry, operations docs | Active |
| SecID-Service | Hosted API + MCP server | Live |
| SecID-Website | Documentation and registry browser | Planned |
| SecID-Client-SDK | Client libraries + AI instructions | Planned |
Reference client libraries and AI-consumable instructions:
- Python (pip install secid) and npm/TypeScript (npm install secid) for SEO and discoverability
- AI instructions for generating clients in any language
- Test fixtures extracted from the registry
We support the llms.txt standard for AI-friendly content discovery. The website provides /llms.txt with structured links to key resources, enabling AI agents to efficiently understand SecID.
See INFRASTRUCTURE.md for technical details on hosting and architecture.
| Indicator | How We'll Know |
|---|---|
| Resolution works | Given any registered SecID, we return a working URL |
| Libraries are usable | pip install secid and npm install secid work out of the box |
| Coverage is comprehensive | Major advisory sources, weakness taxonomies, and control frameworks covered |
| Community adoption | External projects start using SecID identifiers |
| Indicator | Meaning |
|---|---|
| Naming conventions stable | No major spec changes needed after seeding |
| Edge cases documented | Spec handles exceptions gracefully |
| Resolution rules tested | URL templates produce valid, working links |
Things we'll learn as we build v1.0:
- Resolution edge cases: What happens when a vendor changes their URL structure?
- Deprecation: How do we handle databases that shut down or get acquired?
- Search fallback: When direct URLs aren't possible, what search instructions work best for AI agents?
- Update frequency: How often do registry files need refresh?
- Library scope: Should libraries include validation, or just parsing and resolution?
These will be answered empirically, not theoretically.