Companion PR
Motivation
The Model Catalog currently serves ML model metadata — discovery, filtering, and source aggregation for models across multiple providers. As the AI ecosystem grows, users need to manage a broader range of assets beyond models: MCP servers, datasets, prompt templates, AI agents, evaluation benchmarks, and more.
Rather than building separate services for each asset type, this proposal evolves the catalog into a generic, extensible platform where each asset type is a self-contained plugin. The architecture must satisfy two constraints:
- Zero breaking changes for existing Model Catalog consumers — all current API paths, schemas, and behaviors are preserved.
- Minimal effort to add new catalog types — adding a new AI asset should require defining a schema and implementing data providers, not rebuilding infrastructure.
Architecture Overview
The system is built around a unified catalog server that orchestrates multiple catalog plugins within a single process.
┌──────────────────────────────────────────────────────────────┐
│                        catalog-server                        │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │    Model    │  │     MCP     │  │   Dataset   │  ...      │
│  │   Plugin    │  │   Plugin    │  │   Plugin    │           │
│  │             │  │             │  │             │           │
│  │ /api/model_ │  │ /api/mcp_   │  │ /api/data_  │           │
│  │ catalog/v1  │  │ catalog/v1  │  │ catalog/v1  │           │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘           │
│         │                │                │                  │
│  ┌──────┴────────────────┴────────────────┴──────┐           │
│  │            Shared Database (GORM)             │           │
│  │          SQLite / MySQL / PostgreSQL          │           │
│  └───────────────────────────────────────────────┘           │
│                                                              │
│  Health: /healthz, /readyz     Plugins: /api/plugins         │
└──────────────────────────────────────────────────────────────┘
Each plugin is an independent catalog type with its own API routes, database tables, data providers, and OpenAPI specification. Plugins share the database connection, configuration system, and HTTP server infrastructure.
Plugin Lifecycle
Plugins follow a well-defined lifecycle managed by the server:
Compile-time                                Runtime
────────────                                ───────
                                       ┌───────────────┐
import _ ──── init() ──── Register()   │     Load      │
                                       │ sources.yaml  │
                                       └──────┬────────┘
                                              │
                                       ┌──────▼────────┐
                                       │    Init()     │ ← Per-plugin config
                                       └──────┬────────┘
                                              │
                                       ┌──────▼────────┐
                                       │ Migrations()  │ ← Schema setup
                                       └──────┬────────┘
                                              │
                                       ┌──────▼────────┐
                                       │ RegisterRoutes│ ← Mount HTTP API
                                       └──────┬────────┘
                                              │
                                       ┌──────▼────────┐
                                       │    Start()    │ ← Hot-reload, watchers
                                       └──────┬────────┘
                                              │
                                       ┌──────▼────────┐
                                       │   Healthy()   │ ← Readiness probes
                                       └───────────────┘
Plugins register at compile-time via Go init() functions — a blank import in the server's main package is all that's needed. At runtime, the server reads sources.yaml to determine which plugins have configuration, initializes them, applies their database migrations, mounts their routes, and starts background operations.
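As a rough illustration of compile-time registration, the sketch below shows a process-wide registry that plugins populate from `init()`. The names `Plugin`, `Register`, `Registered`, and `mcpPlugin` are hypothetical; in the real layout the plugin type would live in its own package and be pulled in by a blank import from the server's main package.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Plugin is a stand-in for the real plugin interface; only Name is needed here.
type Plugin interface {
	Name() string
}

var (
	mu       sync.Mutex
	registry = map[string]Plugin{}
)

// Register adds a plugin to the process-wide registry.
// It panics on duplicates so naming conflicts surface at startup.
func Register(p Plugin) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[p.Name()]; dup {
		panic("plugin registered twice: " + p.Name())
	}
	registry[p.Name()] = p
}

// Registered returns the registered plugin names in sorted order.
func Registered() []string {
	mu.Lock()
	defer mu.Unlock()
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// mcpPlugin would normally live in a separate package.
type mcpPlugin struct{}

func (mcpPlugin) Name() string { return "mcp" }

// init runs before main, which is why a blank import is enough to register.
func init() { Register(mcpPlugin{}) }

func main() {
	fmt.Println(Registered()) // prints [mcp]
}
```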
Plugin Interface
Every plugin implements a core interface with lifecycle methods: initialization, route registration, database migrations, health checks, and graceful shutdown. Two optional interfaces allow plugins to customize their API base path and their configuration key in sources.yaml, enabling backward-compatible naming.
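One possible shape for the core interface and the two optional interfaces is sketched below. The method names, the `BasePathProvider` and `SourceKeyProvider` interface names, and the default base path pattern are all assumptions made for illustration; only the general mechanism (optional interfaces detected via type assertion) follows from the description above.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// CatalogPlugin is a hypothetical rendering of the core lifecycle interface.
type CatalogPlugin interface {
	Name() string
	Init(ctx context.Context) error
	Migrations() []string // placeholder; real migrations are likely richer
	RegisterRoutes(mux *http.ServeMux)
	Healthy() bool
	Shutdown(ctx context.Context) error
}

// Optional interfaces (names assumed): a plugin implements them only to override defaults.
type BasePathProvider interface{ BasePath() string }
type SourceKeyProvider interface{ SourceKey() string }

// basePath uses the plugin's override if present, else an assumed default pattern.
func basePath(p CatalogPlugin) string {
	if bp, ok := p.(BasePathProvider); ok {
		return bp.BasePath()
	}
	return "/api/" + p.Name() + "_catalog/v1alpha1"
}

// sourceKey returns the sources.yaml section the plugin reads.
func sourceKey(p CatalogPlugin) string {
	if sk, ok := p.(SourceKeyProvider); ok {
		return sk.SourceKey()
	}
	return p.Name()
}

// modelPlugin overrides only the config key, keeping the legacy "models" section.
type modelPlugin struct{}

func (modelPlugin) Name() string                   { return "model" }
func (modelPlugin) Init(context.Context) error     { return nil }
func (modelPlugin) Migrations() []string           { return nil }
func (modelPlugin) RegisterRoutes(*http.ServeMux)  {}
func (modelPlugin) Healthy() bool                  { return true }
func (modelPlugin) Shutdown(context.Context) error { return nil }
func (modelPlugin) SourceKey() string              { return "models" }

func main() {
	p := modelPlugin{}
	fmt.Println(basePath(p), sourceKey(p))
}
```

Keeping the overrides in separate interfaces means most plugins implement only the core methods and inherit the naming defaults.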
Runtime Configuration
Plugins are configured via a unified sources.yaml file that maps plugin names to their data sources:
apiVersion: catalog.kubeflow.org/v1alpha1
kind: CatalogSources
catalogs:
  models:
    sources:
      - id: "huggingface"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/models.yaml"
  mcp:
    sources:
      - id: "internal-servers"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/mcp-servers.yaml"
Each plugin reads only its own section. Source definitions support include/exclude glob patterns for filtering, multiple provider types (YAML, HTTP, etc.), and per-source enable/disable toggles.
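The include/exclude filtering and the enable toggle could look roughly like this. The `Source` fields, the precedence rule (exclude wins over include), and the treatment of an empty include list as "accept everything" are assumptions for this sketch; glob matching uses the standard library's `path.Match`, where `*` does not cross a `/`.

```go
package main

import (
	"fmt"
	"path"
)

// Source is a simplified, hypothetical source definition.
type Source struct {
	ID      string
	Enabled bool
	Include []string // glob patterns; empty means "include everything" (assumed)
	Exclude []string // glob patterns; checked first (assumed precedence)
}

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// Accepts decides whether an entity name from this source should be loaded.
func (s Source) Accepts(name string) bool {
	if !s.Enabled {
		return false
	}
	if matchesAny(s.Exclude, name) {
		return false
	}
	return len(s.Include) == 0 || matchesAny(s.Include, name)
}

func main() {
	src := Source{
		ID:      "huggingface",
		Enabled: true,
		Include: []string{"meta-llama/*"},
		Exclude: []string{"*/*-draft"},
	}
	for _, name := range []string{"meta-llama/Llama-3", "meta-llama/Llama-3-draft", "other/model"} {
		fmt.Println(name, src.Accepts(name))
	}
}
```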
The Model Catalog as a Plugin
The existing Model Catalog is wrapped into a plugin with no API changes. All current paths are preserved:
GET /api/model_catalog/v1alpha1/models
GET /api/model_catalog/v1alpha1/sources
GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}
GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}/artifacts
The legacy catalog infrastructure — loader, providers, database service — is reused as-is inside the plugin. No client-side changes are required. The plugin simply wraps the existing code behind the CatalogPlugin interface, delegating lifecycle events to the existing initialization and shutdown logic.
An optional interface allows the plugin named "model" to read from the "models" section in sources.yaml, maintaining backward compatibility with existing configuration files.
Generic Catalog Framework
A shared framework in pkg/catalog/ provides type-parameterized building blocks that any plugin can use:
- Loader: generic data loader that fetches entities and artifacts from multiple sources concurrently, persists them via callbacks, and supports hot-reload with file watching.
- Provider Registry: typed provider system where each provider type (YAML, HTTP, etc.) registers a function that knows how to fetch data from a source.
- Source Configuration: shared source definition with include/exclude glob patterns, property bags, and enable/disable state.
- Filter Engine: SQL-like filterQuery parameter available on all list endpoints, supporting comparison operators (=, !=, >, <, >=, <=), pattern matching (LIKE, ILIKE), set membership (IN), and logical combinators (AND, OR).
- Pagination: consistent page-token-based pagination and ordering via the BaseResourceList response envelope.
These building blocks ensure that every catalog type gets the same query capabilities, pagination behavior, and data loading patterns without reimplementing them.
Plugin API Schema Strategy
Each plugin owns its OpenAPI specification, and a merge process produces a unified spec for documentation and validation.
Schema Ownership
api/openapi/
├── src/lib/common.yaml          ← Shared schemas (BaseResource, MetadataValue, etc.)
├── src/catalog.yaml             ← Main catalog spec (model paths)
└── catalog-spec.yaml            ← Merged unified spec (all plugins)

catalog/plugins/mcp/
└── api/openapi/
    ├── src/
    │   ├── lib/ → <repo-root>/api/openapi/src/lib/   ← Symlink to shared schemas
    │   ├── generated/components.yaml                 ← Generated entity schemas
    │   └── openapi.yaml                              ← Plugin paths and operations
    └── openapi.yaml                                  ← Merged plugin spec
Shared schemas like BaseResource, BaseResourceList, and MetadataValue live in a central common.yaml. Plugin schemas reference them via a symlink, ensuring a single source of truth. Entity schemas use allOf composition with BaseResource, inheriting standard fields (id, name, description, timestamps, custom properties) without duplication.
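On the Go side, allOf composition maps naturally onto struct embedding, since encoding/json flattens embedded struct fields into the enclosing object. The sketch below illustrates that mapping only; whether the generated models embed BaseResource this way is an assumption, and the `transportType` property is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BaseResource holds a subset of the shared fields, for illustration.
type BaseResource struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
}

// McpServer composes BaseResource the way an allOf schema would.
type McpServer struct {
	BaseResource
	TransportType string `json:"transportType"` // hypothetical entity-specific property
}

// encode marshals the entity; embedded fields appear at the top level of the JSON object.
func encode(s McpServer) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	s := McpServer{
		BaseResource:  BaseResource{ID: "1", Name: "example"},
		TransportType: "stdio",
	}
	fmt.Println(encode(s))
	// {"id":"1","name":"example","transportType":"stdio"}
}
```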
Merge Process
A merge script combines all plugin specs into a single unified OpenAPI document:
- Auto-discover plugin specs under catalog/plugins/*/api/openapi/openapi.yaml
- Prefix plugin-specific schemas to avoid name collisions (e.g., McpServer becomes Mcp_McpServer); common schemas are excluded from prefixing
- Absolutize paths using each plugin's server base URL (e.g., /mcpservers becomes /api/mcp_catalog/v1alpha1/mcpservers)
- Prefix operation IDs for uniqueness
- Resolve external references to common schemas into local #/components/schemas/ references
- Deep-merge all plugin specs with the main catalog spec
The result is a single catalog-spec.yaml that can be used for documentation, client generation, and CI validation. A --check mode verifies the committed spec matches the generated output.
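The two renaming steps (schema prefixing and path absolutizing) can be illustrated with a pair of small functions. This is a sketch in Go for consistency with the rest of the codebase; it is not the merge script itself, whose implementation language and internals are not described here, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixSchema renames a plugin-specific schema, leaving shared schemas untouched.
func prefixSchema(prefix, name string, common map[string]bool) string {
	if common[name] {
		return name
	}
	return prefix + "_" + name
}

// absolutizePath joins a plugin's server base URL with one of its relative paths,
// normalizing the slash at the join point.
func absolutizePath(base, p string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(p, "/")
}

func main() {
	common := map[string]bool{"BaseResource": true, "BaseResourceList": true, "MetadataValue": true}

	fmt.Println(prefixSchema("Mcp", "McpServer", common))    // Mcp_McpServer
	fmt.Println(prefixSchema("Mcp", "BaseResource", common)) // BaseResource
	fmt.Println(absolutizePath("/api/mcp_catalog/v1alpha1", "/mcpservers"))
	// /api/mcp_catalog/v1alpha1/mcpservers
}
```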
Breaking Change Avoidance
- The main catalog API paths and schemas remain untouched — the model catalog's API surface is identical before and after the refactor
- New plugins add paths under their own base URL (e.g., /api/mcp_catalog/v1alpha1/...), never modifying existing paths
- Shared schemas are additive: new fields can be added to BaseResource without breaking existing consumers
- The merge process is purely additive: it unions paths and schemas from all plugins
catalog-gen — Deterministic Scaffolding
catalog-gen is a CLI tool for scaffolding new catalog plugins. It follows the same philosophy as kubebuilder for Kubernetes: define your schema declaratively, generate the boilerplate, focus on business logic.
Commands
| Command | Purpose |
| --- | --- |
| catalog-gen init <name> --entity=<Entity> --package=<pkg> | Scaffold a complete plugin |
| catalog-gen generate | Regenerate non-editable files from catalog.yaml |
| catalog-gen add-property <name> <type> | Add a property to the entity schema |
| catalog-gen add-artifact <name> | Add an artifact type |
| catalog-gen add-artifact-property <artifact> <name> <type> | Add a property to an artifact |
| catalog-gen gen-testdata | Generate sample test data |
What Gets Generated
From a single catalog.yaml definition, the tool generates:
- Plugin lifecycle: plugin.go and register.go implementing the CatalogPlugin interface
- Entity and artifact models: Go structs with GORM tags for database persistence
- Database schema: datastore specification and migration setup
- Repositories: type-safe database access layer
- OpenAPI specification: entity schemas with allOf composition against BaseResource, list/get operations, filtering and pagination parameters
- Data providers: YAML file provider (with HTTP provider template available)
- Filter mappings: field-to-database-column mappings enabling filterQuery on all list endpoints
- Makefile: build, test, and OpenAPI code generation targets
- Agentic workflows: Claude Code commands and skills for post-boilerplate development
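As an illustration of what a filter mapping does, the sketch below translates a single comparison on an API field into a parameterized SQL fragment. It is deliberately minimal: the real filter engine also handles ILIKE, IN, AND, and OR, and the names here (`filterColumns`, `whereClause`, the `transportType` field) are hypothetical.

```go
package main

import "fmt"

// filterColumns maps API field names to database columns.
// "transportType" is a hypothetical property used only for illustration.
var filterColumns = map[string]string{
	"name":          "name",
	"transportType": "transport_type",
}

// filterOps lists the operators this sketch accepts.
var filterOps = map[string]bool{
	"=": true, "!=": true, ">": true, "<": true, ">=": true, "<=": true, "LIKE": true,
}

// whereClause builds a parameterized condition for one comparison.
// Unknown fields and operators are rejected so user input never reaches the SQL text.
func whereClause(field, op string, value any) (string, []any, error) {
	col, ok := filterColumns[field]
	if !ok {
		return "", nil, fmt.Errorf("unknown filter field %q", field)
	}
	if !filterOps[op] {
		return "", nil, fmt.Errorf("unsupported operator %q", op)
	}
	return col + " " + op + " ?", []any{value}, nil
}

func main() {
	clause, args, err := whereClause("transportType", "=", "stdio")
	fmt.Println(clause, args, err)
}
```

A clause and argument list in this form can be passed to a query builder that accepts placeholder-style conditions, which keeps values out of the SQL string.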
Deterministic Output
The generation is fully deterministic: the same catalog.yaml always produces the same output. Files are split into two categories:
- Non-editable (regenerated on every catalog-gen generate): models, repositories, OpenAPI components, filter mappings, loader, plugin registration
- Editable (created once during init, never overwritten): service implementations, providers, Makefile, OpenAPI main spec
This separation means developers can safely re-run generation after schema changes without losing their custom business logic.
Agentic Workflows for Post-Boilerplate Steps
Code generation handles the deterministic parts — schemas, models, database access, API specs. But some steps require judgment: implementing business logic, wiring up custom providers, writing tests. These are handled by agentic workflows generated alongside the plugin code.
Generated AI-Assistant Integration
Each plugin gets a .claude/ directory with:
- Slash commands (/add-property, /add-artifact, /regenerate, /fix-build, /gen-testdata): quick actions that modify catalog.yaml and regenerate code
- Skills: detailed step-by-step guides for common development tasks, with context about the plugin's architecture, type system, and conventions
- CLAUDE.md: per-plugin architecture summary that provides an AI agent with full context about the plugin's structure, property types, filtering syntax, and development workflow
The Deterministic + Agentic Split
The approach deliberately separates concerns:
| Layer | Approach | Examples |
| --- | --- | --- |
| Schema → Boilerplate | Deterministic (catalog-gen) | Models, repositories, OpenAPI specs, filter mappings |
| Boilerplate → Working Plugin | Agentic (AI-assisted) | Service implementation, provider logic, test data, build fixes |
The deterministic layer ensures consistency and reproducibility. The agentic layer handles the creative, context-dependent work where an AI assistant can follow generated skill instructions to complete the integration — wiring artifact repositories, implementing conversion functions, adding custom providers, and fixing build errors after schema changes.
Adding a New Catalog Type
The end-to-end workflow for adding a new catalog type:
1. Initialize          catalog-gen init <name> --entity=<Entity> --package=<pkg>
         ↓
2. Define schema       Edit catalog.yaml (add properties, artifacts)
         ↓
3. Regenerate          catalog-gen generate
         ↓
4. Implement logic     Write service impl + data providers
                       (guided by generated agentic skills)
         ↓
5. Register plugin     Add blank import to catalog-server/main.go
         ↓
6. Configure sources   Add plugin section to sources.yaml
         ↓
7. Merge API specs     make api/openapi/catalog-spec.yaml
         ↓
8. Generate handlers   make gen/openapi-server (in plugin dir)
Steps 1–3 and 7–8 are driven by catalog-gen commands and make targets. Step 4 is where the developer (or AI agent) focuses their effort, guided by the generated commands and skills. Steps 5–6 are small configuration changes: one blank import and one new section in sources.yaml.
The result is a new catalog type with its own API, database tables, data providers, filtering, pagination, and OpenAPI documentation — fully integrated into the unified catalog server.
MCP Plugin — First New AI Asset (Example)
The MCP (Model Context Protocol) plugin serves as the first non-model catalog type, validating that the plugin system can accommodate arbitrary AI assets.
It was scaffolded with catalog-gen and completed in the following steps:
- Run catalog-gen init mcp --entity=McpServer --package=.../catalog/plugins/mcp
- Edit catalog.yaml to add MCP-specific properties
- Run catalog-gen generate to regenerate with the new properties
- Implement the YAML provider for MCP server data
- Add the plugin import to catalog-server/main.go
- Configure sources in sources.yaml
The MCP plugin now serves its own API at /api/mcp_catalog/v1alpha1/mcpservers, with full filtering, pagination, and source management — running alongside the Model Catalog in the same server process.
This demonstrates that adding a new AI asset type is primarily a schema definition exercise, with the infrastructure provided by the framework and the integration steps guided by the generated agentic workflows.