[Proposal] Plugin-Based Catalog Architecture and catalog-gen tool to generate new catalog types #2220

---------------------------------------------

**Companion [PR](https://github.com/kubeflow/model-registry/pull/2219)**

---------------------------------------------

## Motivation

The Model Catalog currently serves ML model metadata — discovery, filtering, and source aggregation for models across multiple providers. As the AI ecosystem grows, users need to manage a broader range of assets beyond models: MCP servers, datasets, prompt templates, AI agents, evaluation benchmarks, and more.

Rather than building separate services for each asset type, this proposal evolves the catalog into a **generic, extensible platform** where each asset type is a self-contained plugin. The architecture must satisfy two constraints:

1. **Zero breaking changes** for existing Model Catalog consumers — all current API paths, schemas, and behaviors are preserved.
2. **Minimal effort to add new catalog types** — adding a new AI asset should require defining a schema and implementing data providers, not rebuilding infrastructure.

## Architecture Overview

The system is built around a **unified catalog server** that orchestrates multiple catalog plugins within a single process.

```
┌──────────────────────────────────────────────────────────────┐
│                      catalog-server                          │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │   Model     │  │    MCP      │  │  Dataset    │  ...      │
│  │   Plugin    │  │   Plugin    │  │  Plugin     │           │
│  │             │  │             │  │             │           │
│  │ /api/model_ │  │ /api/mcp_   │  │ /api/data_  │           │
│  │ catalog/    │  │ catalog/    │  │ catalog/    │           │
│  │ v1alpha1    │  │ v1alpha1    │  │ v1alpha1    │           │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘           │
│         │                │                │                  │
│  ┌──────┴────────────────┴────────────────┴──────┐           │
│  │          Shared Database (GORM)               │           │
│  │     SQLite / MySQL / PostgreSQL               │           │
│  └───────────────────────────────────────────────┘           │
│                                                              │
│  Health: /healthz, /readyz    Plugins: /api/plugins          │
└──────────────────────────────────────────────────────────────┘
```

Each plugin is an independent catalog type with its own API routes, database tables, data providers, and OpenAPI specification. Plugins share the database connection, configuration system, and HTTP server infrastructure.

### Plugin Lifecycle

Plugins follow a well-defined lifecycle managed by the server:

```
Compile-time                              Runtime
────────────                              ───────
                                          ┌───────────────┐
   import _  ──── init() ──── Register()  │  Load         │
                                          │  sources.yaml │
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Init()      │  ← Per-plugin config
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │  Migrations() │  ← Schema setup
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │ RegisterRoutes│  ← Mount HTTP API
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Start()     │  ← Hot-reload, watchers
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Healthy()   │  ← Readiness probes
                                          └───────────────┘
```

Plugins register at **compile-time** via Go `init()` functions — a blank import in the server's main package is all that's needed. At **runtime**, the server reads `sources.yaml` to determine which plugins have configuration, initializes them, applies their database migrations, mounts their routes, and starts background operations.
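
The two phases can be sketched roughly as follows. This is a minimal illustration, not the code from the companion PR: the `Plugin` interface, `Register`, `StartConfigured`, and `tracePlugin` are hypothetical names, and the method signatures are assumptions that only follow the step names in the lifecycle diagram.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin is a hypothetical, reduced lifecycle contract. Method names follow
// the lifecycle diagram; the signatures are assumptions.
type Plugin interface {
	Name() string
	Init() error
	Migrations() error
	RegisterRoutes() error
	Start() error
}

// registry is filled at compile time by init() functions.
var registry = map[string]Plugin{}

// Register makes a plugin known to the server. Registering the same name
// twice is treated as a programming error.
func Register(p Plugin) {
	if _, dup := registry[p.Name()]; dup {
		panic("plugin registered twice: " + p.Name())
	}
	registry[p.Name()] = p
}

// StartConfigured runs the lifecycle steps, in order, for every registered
// plugin that has a section in the configuration. It returns the names of
// the plugins it started.
func StartConfigured(configured map[string]bool) ([]string, error) {
	names := make([]string, 0, len(registry))
	for name := range registry {
		if configured[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names) // deterministic start order
	for _, name := range names {
		p := registry[name]
		steps := []func() error{p.Init, p.Migrations, p.RegisterRoutes, p.Start}
		for _, step := range steps {
			if err := step(); err != nil {
				return nil, fmt.Errorf("plugin %s: %w", name, err)
			}
		}
	}
	return names, nil
}

// tracePlugin is a placeholder that records which lifecycle steps ran.
type tracePlugin struct {
	name  string
	trace []string
}

func (p *tracePlugin) Name() string          { return p.name }
func (p *tracePlugin) Init() error           { p.trace = append(p.trace, "Init"); return nil }
func (p *tracePlugin) Migrations() error     { p.trace = append(p.trace, "Migrations"); return nil }
func (p *tracePlugin) RegisterRoutes() error { p.trace = append(p.trace, "RegisterRoutes"); return nil }
func (p *tracePlugin) Start() error          { p.trace = append(p.trace, "Start"); return nil }

var (
	mcp     = &tracePlugin{name: "mcp"}
	dataset = &tracePlugin{name: "dataset"}
)

// In a real plugin package, this init() would run because the server's main
// package contains a blank import of that package.
func init() {
	Register(mcp)
	Register(dataset)
}

func main() {
	// Only "mcp" has a configuration section, so "dataset" stays idle.
	started, err := StartConfigured(map[string]bool{"mcp": true})
	fmt.Println(started, err, mcp.trace, dataset.trace)
}
```

In this sketch a registered plugin without a configuration section is simply skipped, which matches the description of the server consulting `sources.yaml` before initializing anything.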

### Plugin Interface

Every plugin implements a core interface with lifecycle methods: initialization, route registration, database migrations, health checks, and graceful shutdown. Two optional interfaces allow plugins to customize their API base path and their configuration key in `sources.yaml`, enabling backward-compatible naming.
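
One way the optional interfaces could work is through Go type assertions with a fallback default. The sketch below is an assumption-laden illustration: `SourceKeyProvider`, `BasePathProvider`, `sourceKey`, and `basePath` are hypothetical names, the core interface is reduced to `Name()`, and the default path pattern is inferred from the two paths shown in this proposal.

```go
package main

import "fmt"

// CatalogPlugin is reduced to Name() for this sketch; the real interface
// also carries the lifecycle methods.
type CatalogPlugin interface {
	Name() string
}

// SourceKeyProvider is an assumed optional interface: a plugin implements it
// to read a differently named section of sources.yaml.
type SourceKeyProvider interface {
	SourceKey() string
}

// BasePathProvider is an assumed optional interface: a plugin implements it
// to override its API base path.
type BasePathProvider interface {
	BasePath() string
}

// sourceKey returns the plugin's configuration key, defaulting to its name.
func sourceKey(p CatalogPlugin) string {
	if sk, ok := p.(SourceKeyProvider); ok {
		return sk.SourceKey()
	}
	return p.Name()
}

// basePath returns the plugin's API prefix, defaulting to a pattern inferred
// from the paths used in this proposal.
func basePath(p CatalogPlugin) string {
	if bp, ok := p.(BasePathProvider); ok {
		return bp.BasePath()
	}
	return "/api/" + p.Name() + "_catalog/v1alpha1"
}

// modelPlugin keeps reading the legacy "models" section.
type modelPlugin struct{}

func (modelPlugin) Name() string      { return "model" }
func (modelPlugin) SourceKey() string { return "models" }

// mcpPlugin relies on the defaults.
type mcpPlugin struct{}

func (mcpPlugin) Name() string { return "mcp" }

func main() {
	for _, p := range []CatalogPlugin{modelPlugin{}, mcpPlugin{}} {
		fmt.Printf("%s: key=%s path=%s\n", p.Name(), sourceKey(p), basePath(p))
	}
}
```

Keeping the overrides in separate interfaces means a new plugin implements nothing extra unless it needs backward-compatible naming.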

### Runtime Configuration

Plugins are configured via a unified `sources.yaml` file that maps plugin names to their data sources:

```yaml
apiVersion: catalog.kubeflow.org/v1alpha1
kind: CatalogSources
catalogs:
  models:
    sources:
      - id: "huggingface"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/models.yaml"
  mcp:
    sources:
      - id: "internal-servers"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/mcp-servers.yaml"
```

Each plugin reads only its own section. Source definitions support include/exclude glob patterns for filtering, multiple provider types (YAML, HTTP, etc.), and per-source enable/disable toggles.
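
As a rough illustration of how include/exclude patterns and the enable toggle might combine, the sketch below uses the standard library's `path.Match`. The `Source` struct, its field names, and the rule that exclusions take precedence are assumptions made for the example, not the framework's actual definitions.

```go
package main

import (
	"fmt"
	"path"
)

// Source is a hypothetical, reduced source definition.
type Source struct {
	ID      string
	Enabled bool
	Include []string // glob patterns; empty means "include everything"
	Exclude []string // glob patterns; assumed to win over Include
}

// matchAny reports whether name matches at least one pattern.
func matchAny(patterns []string, name string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, fmt.Errorf("bad pattern %q: %w", p, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// Accepts reports whether an entity name passes the source's filters.
func (s Source) Accepts(name string) (bool, error) {
	if !s.Enabled {
		return false, nil
	}
	excluded, err := matchAny(s.Exclude, name)
	if err != nil || excluded {
		return false, err
	}
	if len(s.Include) == 0 {
		return true, nil
	}
	return matchAny(s.Include, name)
}

func main() {
	src := Source{
		ID:      "huggingface",
		Enabled: true,
		Include: []string{"granite-*"},
		Exclude: []string{"*-draft"},
	}
	for _, name := range []string{"granite-7b", "granite-7b-draft", "llama-3"} {
		ok, err := src.Accepts(name)
		fmt.Println(name, ok, err)
	}
}
```

Note that `path.Match` does not let `*` cross a `/`, so names of the form `org/model` would need patterns such as `org/*` or a different glob library.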

## The Model Catalog as a Plugin

The existing Model Catalog is wrapped into a plugin with **no API changes**. All current paths are preserved:

- `GET /api/model_catalog/v1alpha1/models`
- `GET /api/model_catalog/v1alpha1/sources`
- `GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}`
- `GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}/artifacts`

The legacy catalog infrastructure — loader, providers, database service — is reused as-is inside the plugin. No client-side changes are required. The plugin simply wraps the existing code behind the `CatalogPlugin` interface, delegating lifecycle events to the existing initialization and shutdown logic.

An optional interface allows the plugin named "model" to read from the "models" section in `sources.yaml`, maintaining backward compatibility with existing configuration files.

## Generic Catalog Framework

A shared framework in `pkg/catalog/` provides type-parameterized building blocks that any plugin can use:

- **Loader** — Generic data loader that fetches entities and artifacts from multiple sources concurrently, persists them via callbacks, and supports hot-reload with file watching.
- **Provider Registry** — Typed provider system where each provider type (YAML, HTTP, etc.) registers a function that knows how to fetch data from a source.
- **Source Configuration** — Shared source definition with include/exclude glob patterns, property bags, and enable/disable state.
- **Filter Engine** — SQL-like `filterQuery` parameter available on all list endpoints, supporting comparison operators (`=`, `!=`, `>`, `<`, `>=`, `<=`), pattern matching (`LIKE`, `ILIKE`), set membership (`IN`), and logical combinators (`AND`, `OR`).
- **Pagination** — Consistent page token-based pagination and ordering via `BaseResourceList` response envelope.

These building blocks ensure that every catalog type gets the same query capabilities, pagination behavior, and data loading patterns without reimplementing them.
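
To make "type-parameterized" concrete, here is a minimal sketch of what a typed provider registry could look like with Go generics. `ProviderRegistry`, `ProviderFunc`, `Source`, and the stub provider are hypothetical; the actual types in `pkg/catalog/` may differ, and a real provider function would likely also take a `context.Context`.

```go
package main

import "fmt"

// Source is a reduced, hypothetical source definition.
type Source struct {
	ID         string
	Type       string
	Properties map[string]string
}

// ProviderFunc fetches entities of type T from one source.
type ProviderFunc[T any] func(src Source) ([]T, error)

// ProviderRegistry maps a provider type name (for example "yaml") to the
// function that knows how to fetch data for that type.
type ProviderRegistry[T any] struct {
	providers map[string]ProviderFunc[T]
}

// NewProviderRegistry returns an empty registry for entity type T.
func NewProviderRegistry[T any]() *ProviderRegistry[T] {
	return &ProviderRegistry[T]{providers: map[string]ProviderFunc[T]{}}
}

// Register associates a provider type name with its fetch function.
func (r *ProviderRegistry[T]) Register(typ string, fn ProviderFunc[T]) {
	r.providers[typ] = fn
}

// Fetch dispatches to the provider registered for the source's type.
func (r *ProviderRegistry[T]) Fetch(src Source) ([]T, error) {
	fn, ok := r.providers[src.Type]
	if !ok {
		return nil, fmt.Errorf("source %q: unknown provider type %q", src.ID, src.Type)
	}
	return fn(src)
}

// McpServer is a placeholder entity type.
type McpServer struct{ Name string }

func main() {
	reg := NewProviderRegistry[McpServer]()
	// A stub standing in for a real YAML reader.
	reg.Register("yaml", func(src Source) ([]McpServer, error) {
		return []McpServer{{Name: "loaded from " + src.Properties["yamlCatalogPath"]}}, nil
	})

	servers, err := reg.Fetch(Source{
		ID:         "internal-servers",
		Type:       "yaml",
		Properties: map[string]string{"yamlCatalogPath": "./data/mcp-servers.yaml"},
	})
	fmt.Println(servers, err)
}
```

Because the registry is parameterized on the entity type, a provider for one catalog cannot be registered against another by accident; the compiler rejects the mismatch.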

## Plugin API Schema Strategy

Each plugin owns its OpenAPI specification, and a merge process produces a unified spec for documentation and validation.

### Schema Ownership

```
api/openapi/
├── src/lib/common.yaml              ← Shared schemas (BaseResource, MetadataValue, etc.)
├── src/catalog.yaml                 ← Main catalog spec (model paths)
└── catalog-spec.yaml                ← Merged unified spec (all plugins)

catalog/plugins/mcp/
└── api/openapi/
    ├── src/
    │   ├── lib/ → <repo-root>/api/openapi/src/lib/   ← Symlink to shared schemas
    │   ├── generated/components.yaml                  ← Generated entity schemas
    │   └── openapi.yaml                               ← Plugin paths and operations
    └── openapi.yaml                                   ← Merged plugin spec
```

**Shared schemas** like `BaseResource`, `BaseResourceList`, and `MetadataValue` live in a central `common.yaml`. Plugin schemas reference them via a symlink, ensuring a single source of truth. Entity schemas use `allOf` composition with `BaseResource`, inheriting standard fields (id, name, description, timestamps, custom properties) without duplication.
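
For readers more at home in Go than in OpenAPI, `allOf` composition is roughly analogous to struct embedding: the entity gains the base fields without redeclaring them, and `encoding/json` flattens them into one object. The sketch below is only an analogy. The generated models are not shown in this proposal, `Transport` is a hypothetical MCP-specific property, and `BaseResource` is cut down to three fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BaseResource is a reduced stand-in for the shared schema.
type BaseResource struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
}

// McpServer embeds BaseResource, mirroring allOf composition.
type McpServer struct {
	BaseResource
	Transport string `json:"transport,omitempty"` // hypothetical property
}

// encode marshals a server to JSON and returns it as a string.
func encode(s McpServer) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	s := McpServer{
		BaseResource: BaseResource{ID: "1", Name: "files"},
		Transport:    "stdio",
	}
	// Base and entity fields appear side by side in a single JSON object.
	fmt.Println(encode(s))
}
```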

### Merge Process

A merge script combines all plugin specs into a single unified OpenAPI document:

1. **Auto-discover** plugin specs under `catalog/plugins/*/api/openapi/openapi.yaml`
2. **Prefix** plugin-specific schemas to avoid name collisions (e.g., `McpServer` becomes `Mcp_McpServer`); common schemas are excluded from prefixing
3. **Absolutize** paths using each plugin's server base URL (e.g., `/mcpservers` becomes `/api/mcp_catalog/v1alpha1/mcpservers`)
4. **Prefix** operation IDs for uniqueness
5. **Resolve** external references to common schemas into local `#/components/schemas/` references
6. **Deep-merge** all plugin specs with the main catalog spec

The result is a single `catalog-spec.yaml` that can be used for documentation, client generation, and CI validation. A `--check` mode verifies the committed spec matches the generated output.
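
Steps 2, 3, and 6 can be sketched as small string and map operations. The helpers below (`prefixSchema`, `absolutize`, `mergePaths`) are hypothetical and operate on plain maps rather than parsed OpenAPI documents; the operation names used as map values are placeholders. The actual merge script may be written in a different language and handle many more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// commonSchemas lists shared schema names that are never prefixed.
var commonSchemas = map[string]bool{
	"BaseResource":     true,
	"BaseResourceList": true,
	"MetadataValue":    true,
}

// prefixSchema namespaces a plugin-specific schema (step 2).
func prefixSchema(plugin, name string) string {
	if plugin == "" || commonSchemas[name] {
		return name
	}
	return strings.ToUpper(plugin[:1]) + plugin[1:] + "_" + name
}

// absolutize joins a plugin's server base URL and a relative path (step 3).
func absolutize(baseURL, p string) string {
	return strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(p, "/")
}

// mergePaths unions src into dst (step 6). A path defined twice is rejected
// before anything is copied, so the merge stays purely additive.
func mergePaths(dst, src map[string]string) error {
	for p := range src {
		if _, exists := dst[p]; exists {
			return fmt.Errorf("path %q is defined by more than one spec", p)
		}
	}
	for p, op := range src {
		dst[p] = op
	}
	return nil
}

func main() {
	fmt.Println(prefixSchema("mcp", "McpServer"))    // plugin schema gets a prefix
	fmt.Println(prefixSchema("mcp", "BaseResource")) // common schema is left alone

	merged := map[string]string{"/api/model_catalog/v1alpha1/models": "listModels"}
	pluginPaths := map[string]string{
		absolutize("/api/mcp_catalog/v1alpha1", "/mcpservers"): "listMcpServers",
	}
	fmt.Println(mergePaths(merged, pluginPaths), len(merged))
}
```

Rejecting collisions instead of overwriting is one way to enforce that a plugin can add to the unified spec but never alter an existing path.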

### Breaking Change Avoidance

- The main catalog API paths and schemas remain **untouched** — the model catalog's API surface is identical before and after the refactor
- New plugins add paths under their own base URL (e.g., `/api/mcp_catalog/v1alpha1/...`), never modifying existing paths
- Shared schemas evolve additively: new optional fields can be added to `BaseResource` without breaking existing consumers
- The merge process is purely additive: it unions paths and schemas from all plugins

## `catalog-gen` — Deterministic Scaffolding

`catalog-gen` is a CLI tool for scaffolding new catalog plugins. It follows the same philosophy as kubebuilder for Kubernetes: define your schema declaratively, generate the boilerplate, focus on business logic.

### Commands

| Command | Purpose |
|---------|---------|
| `catalog-gen init <name> --entity=<Entity> --package=<pkg>` | Scaffold a complete plugin |
| `catalog-gen generate` | Regenerate non-editable files from `catalog.yaml` |
| `catalog-gen add-property <name> <type>` | Add a property to the entity schema |
| `catalog-gen add-artifact <name>` | Add an artifact type |
| `catalog-gen add-artifact-property <artifact> <name> <type>` | Add a property to an artifact |
| `catalog-gen gen-testdata` | Generate sample test data |

### What Gets Generated

From a single `catalog.yaml` definition, the tool generates:

- **Plugin lifecycle** — `plugin.go` and `register.go` implementing the `CatalogPlugin` interface
- **Entity and artifact models** — Go structs with GORM tags for database persistence
- **Database schema** — Datastore specification and migration setup
- **Repositories** — Type-safe database access layer
- **OpenAPI specification** — Entity schemas with `allOf` composition against `BaseResource`, list/get operations, filtering and pagination parameters
- **Data providers** — YAML file provider (with HTTP provider template available)
- **Filter mappings** — Field-to-database-column mappings enabling `filterQuery` on all list endpoints
- **Makefile** — Build, test, and OpenAPI code generation targets
- **Agentic workflows** — Claude Code commands and skills for post-boilerplate development

### Deterministic Output

The generation is fully deterministic: the same `catalog.yaml` always produces the same output. Files are split into two categories:

- **Non-editable** (regenerated on every `catalog-gen generate`): models, repositories, OpenAPI components, filter mappings, loader, plugin registration
- **Editable** (created once during `init`, never overwritten): service implementations, providers, Makefile, OpenAPI main spec

This separation means developers can safely re-run generation after schema changes without losing their custom business logic.
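
The write policy behind this split might look like the sketch below, which overwrites non-editable files on every run and leaves existing editable files alone. It is an illustration under assumptions: `GeneratedFile`, `writeFile`, `demo`, and the file paths are hypothetical, and `catalog-gen` may implement the rule differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// GeneratedFile is a hypothetical description of one output file.
type GeneratedFile struct {
	Path     string
	Content  string
	Editable bool // true: created once, never overwritten afterwards
}

// writeFile applies the split and reports whether the file was written.
func writeFile(root string, f GeneratedFile) (bool, error) {
	full := filepath.Join(root, f.Path)
	if f.Editable {
		_, err := os.Stat(full)
		if err == nil {
			return false, nil // keep the developer's version
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return false, err
		}
	}
	if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(full, []byte(f.Content), 0o644); err != nil {
		return false, err
	}
	return true, nil
}

// outputs returns the file set a given generator version would produce.
func outputs(version string) []GeneratedFile {
	return []GeneratedFile{
		{Path: "models/entity.go", Content: "// generated " + version},
		{Path: "service/service.go", Content: "// scaffold " + version, Editable: true},
	}
}

// demo simulates an initial run, a developer edit, and a later regeneration
// in a temporary directory. It returns the final contents of both files.
func demo() (string, string, error) {
	root, err := os.MkdirTemp("", "catalog-gen-sketch")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	for _, f := range outputs("v1") {
		if _, err := writeFile(root, f); err != nil {
			return "", "", err
		}
	}
	servicePath := filepath.Join(root, "service", "service.go")
	if err := os.WriteFile(servicePath, []byte("// custom logic"), 0o644); err != nil {
		return "", "", err
	}
	for _, f := range outputs("v2") {
		if _, err := writeFile(root, f); err != nil {
			return "", "", err
		}
	}
	model, err := os.ReadFile(filepath.Join(root, "models", "entity.go"))
	if err != nil {
		return "", "", err
	}
	service, err := os.ReadFile(servicePath)
	if err != nil {
		return "", "", err
	}
	return string(model), string(service), nil
}

func main() {
	model, service, err := demo()
	fmt.Println(model, "|", service, "|", err)
}
```

After the second run the model file carries the new generated content while the service file still holds the developer's edit.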

## Agentic Workflows for Post-Boilerplate Steps

Code generation handles the deterministic parts — schemas, models, database access, API specs. But some steps require judgment: implementing business logic, wiring up custom providers, writing tests. These are handled by **agentic workflows** generated alongside the plugin code.

### Generated AI-Assistant Integration

Each plugin gets a `.claude/` directory with:

- **Slash commands** (`/add-property`, `/add-artifact`, `/regenerate`, `/fix-build`, `/gen-testdata`) — quick actions that modify `catalog.yaml` and regenerate code
- **Skills** — detailed step-by-step guides for common development tasks, with context about the plugin's architecture, type system, and conventions
- **`CLAUDE.md`** — per-plugin architecture summary that provides an AI agent with full context about the plugin's structure, property types, filtering syntax, and development workflow

### The Deterministic + Agentic Split

The approach deliberately separates concerns:

| Layer | Approach | Examples |
|-------|----------|---------|
| Schema → Boilerplate | **Deterministic** (`catalog-gen`) | Models, repositories, OpenAPI specs, filter mappings |
| Boilerplate → Working Plugin | **Agentic** (AI-assisted) | Service implementation, provider logic, test data, build fixes |

The deterministic layer ensures consistency and reproducibility. The agentic layer handles the creative, context-dependent work where an AI assistant can follow generated skill instructions to complete the integration — wiring artifact repositories, implementing conversion functions, adding custom providers, and fixing build errors after schema changes.

## Adding a New Catalog Type

The end-to-end workflow for adding a new catalog type:

```
1. Initialize          catalog-gen init <name> --entity=<Entity> --package=<pkg>
                       ↓
2. Define schema       Edit catalog.yaml (add properties, artifacts)
                       ↓
3. Regenerate          catalog-gen generate
                       ↓
4. Implement logic     Write service impl + data providers
                       (guided by generated agentic skills)
                       ↓
5. Register plugin     Add blank import to catalog-server/main.go
                       ↓
6. Configure sources   Add plugin section to sources.yaml
                       ↓
7. Merge API specs     make api/openapi/catalog-spec.yaml
                       ↓
8. Generate handlers   make gen/openapi-server (in plugin dir)
```

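Steps 1, 3, 7, and 8 are single commands, and step 2 is a declarative schema edit that `catalog-gen add-property` and `catalog-gen add-artifact` can also perform. Step 4 is where the developer (or AI agent) focuses their effort, guided by the generated commands and skills. Steps 5 and 6 are small configuration changes: one blank import and one section in `sources.yaml`.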

The result is a new catalog type with its own API, database tables, data providers, filtering, pagination, and OpenAPI documentation — fully integrated into the unified catalog server.

## MCP Plugin — First New AI Asset (Example)

The MCP (Model Context Protocol) plugin serves as the first non-model catalog type, validating that the plugin system can accommodate arbitrary AI assets.

It was scaffolded with `catalog-gen` and completed in six steps:

1. `catalog-gen init mcp --entity=McpServer --package=.../catalog/plugins/mcp`
2. Edit `catalog.yaml` to add MCP-specific properties
3. `catalog-gen generate` to regenerate with new properties
4. Implement the YAML provider for MCP server data
5. Add the plugin import to `catalog-server/main.go`
6. Configure sources in `sources.yaml`

The MCP plugin now serves its own API at `/api/mcp_catalog/v1alpha1/mcpservers`, with full filtering, pagination, and source management — running alongside the Model Catalog in the same server process.
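
A client could query this endpoint as sketched below. The path and the `filterQuery` parameter come from this proposal; the host `localhost:8080`, the helper `listURL`, and the sample filter are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// listURL builds a list request for MCP servers with a filterQuery applied.
func listURL(host, filter string) string {
	q := url.Values{}
	q.Set("filterQuery", filter)
	u := url.URL{
		Scheme:   "http",
		Host:     host,
		Path:     "/api/mcp_catalog/v1alpha1/mcpservers",
		RawQuery: q.Encode(), // percent-encodes quotes, spaces, and '%'
	}
	return u.String()
}

func main() {
	// Filter on the standard "name" field inherited from BaseResource.
	fmt.Println(listURL("localhost:8080", "name LIKE 'git%'"))
}
```

Building the query with `url.Values` avoids hand-escaping the `%` wildcard and the quotes in the filter expression.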

This demonstrates that adding a new AI asset type is primarily a schema definition exercise, with the infrastructure provided by the framework and the integration steps guided by the generated agentic workflows.

