[Proposal] Plugin-Based Catalog Architecture and catalog-gen tool to generate new catalog types #2220

@Al-Pragliola

Companion PR


Motivation

The Model Catalog currently serves ML model metadata — discovery, filtering, and source aggregation for models across multiple providers. As the AI ecosystem grows, users need to manage a broader range of assets beyond models: MCP servers, datasets, prompt templates, AI agents, evaluation benchmarks, and more.

Rather than building separate services for each asset type, this proposal evolves the catalog into a generic, extensible platform where each asset type is a self-contained plugin. The architecture must satisfy two constraints:

  1. Zero breaking changes for existing Model Catalog consumers — all current API paths, schemas, and behaviors are preserved.
  2. Minimal effort to add new catalog types — adding a new AI asset should require defining a schema and implementing data providers, not rebuilding infrastructure.

Architecture Overview

The system is built around a unified catalog server that orchestrates multiple catalog plugins within a single process.

┌──────────────────────────────────────────────────────────────┐
│                      catalog-server                          │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │   Model     │  │    MCP      │  │  Dataset    │  ...      │
│  │   Plugin    │  │   Plugin    │  │  Plugin     │           │
│  │             │  │             │  │             │           │
│  │ /api/model_ │  │ /api/mcp_   │  │ /api/data_  │           │
│  │ catalog/v1  │  │ catalog/v1  │  │ catalog/v1  │           │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘           │
│         │                │                │                  │
│  ┌──────┴────────────────┴────────────────┴──────┐           │
│  │          Shared Database (GORM)               │           │
│  │     SQLite / MySQL / PostgreSQL               │           │
│  └───────────────────────────────────────────────┘           │
│                                                              │
│  Health: /healthz, /readyz    Plugins: /api/plugins          │
└──────────────────────────────────────────────────────────────┘

Each plugin is an independent catalog type with its own API routes, database tables, data providers, and OpenAPI specification. Plugins share the database connection, configuration system, and HTTP server infrastructure.

Plugin Lifecycle

Plugins follow a well-defined lifecycle managed by the server:

Compile-time                              Runtime
────────────                              ───────
                                          ┌───────────────┐
   import _  ──── init() ──── Register()  │  Load         │
                                          │  sources.yaml │
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Init()      │  ← Per-plugin config
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │  Migrations() │  ← Schema setup
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │ RegisterRoutes│  ← Mount HTTP API
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Start()     │  ← Hot-reload, watchers
                                          └──────┬────────┘
                                                 │
                                          ┌──────▼────────┐
                                          │   Healthy()   │  ← Readiness probes
                                          └───────────────┘

The set of plugins is fixed at compile time: a blank import in the server's main package pulls in the plugin package, whose Go init() function calls Register() when the binary starts. At runtime, the server reads sources.yaml to determine which plugins have configuration, initializes them, applies their database migrations, mounts their routes, and starts background operations.

Plugin Interface

Every plugin implements a core interface with lifecycle methods: initialization, database migrations, route registration, background startup, health checks, and graceful shutdown. Two optional interfaces let a plugin customize its API base path and its configuration key in sources.yaml, enabling backward-compatible naming.
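
One way to express such optional interfaces in Go is a type assertion with a fallback, sketched below. BasePathProvider, SourceKeyProvider, and the default base path convention (modeled on /api/mcp_catalog/v1alpha1) are assumptions for illustration; the model plugin shows how the name "model" can map to the "models" section of sources.yaml.

```go
package main

import "fmt"

// Core is a trimmed, hypothetical stand-in for the CatalogPlugin interface.
type Core interface {
	Name() string
}

// BasePathProvider is a hypothetical optional interface for overriding the API base path.
type BasePathProvider interface {
	BasePath() string
}

// SourceKeyProvider is a hypothetical optional interface for overriding the sources.yaml key.
type SourceKeyProvider interface {
	SourceKey() string
}

// basePath uses the override when present, else an assumed naming convention.
func basePath(p Core) string {
	if bp, ok := p.(BasePathProvider); ok {
		return bp.BasePath()
	}
	return "/api/" + p.Name() + "_catalog/v1alpha1"
}

// sourceKey uses the override when present, else the plugin name.
func sourceKey(p Core) string {
	if sk, ok := p.(SourceKeyProvider); ok {
		return sk.SourceKey()
	}
	return p.Name()
}

type mcpPlugin struct{}

func (mcpPlugin) Name() string { return "mcp" }

// modelPlugin keeps reading the legacy "models" section.
type modelPlugin struct{}

func (modelPlugin) Name() string      { return "model" }
func (modelPlugin) SourceKey() string { return "models" }

func main() {
	for _, p := range []Core{mcpPlugin{}, modelPlugin{}} {
		fmt.Println(p.Name(), basePath(p), sourceKey(p))
	}
}
```

Keeping the overrides out of the core interface means new plugins implement nothing extra, and only legacy-named plugins opt in.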

Runtime Configuration

Plugins are configured via a unified sources.yaml file that maps plugin names to their data sources:

apiVersion: catalog.kubeflow.org/v1alpha1
kind: CatalogSources
catalogs:
  models:
    sources:
      - id: "huggingface"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/models.yaml"
  mcp:
    sources:
      - id: "internal-servers"
        type: "yaml"
        properties:
          yamlCatalogPath: "./data/mcp-servers.yaml"

Each plugin reads only its own section. Source definitions support include/exclude glob patterns for filtering, multiple provider types (YAML, HTTP, etc.), and per-source enable/disable toggles.
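
A minimal sketch of include/exclude filtering, assuming includes are applied first (an empty include list admits everything), excludes second, and that a missing enabled flag means enabled. The Source type and its field names are hypothetical; the example uses Go's path.Match, whose * does not cross a / separator.

```go
package main

import (
	"fmt"
	"path"
)

// Source mirrors one sources.yaml entry; field names are hypothetical.
type Source struct {
	ID       string
	Type     string
	Enabled  *bool    // nil means enabled (assumed default)
	Includes []string // glob patterns; empty admits every name
	Excludes []string // glob patterns checked after includes
}

// IsEnabled applies the per-source toggle.
func (s Source) IsEnabled() bool { return s.Enabled == nil || *s.Enabled }

// Allows reports whether an entity name passes the include/exclude filters.
func (s Source) Allows(name string) bool {
	if len(s.Includes) > 0 && !matchAny(s.Includes, name) {
		return false
	}
	return !matchAny(s.Excludes, name)
}

// matchAny treats a malformed pattern as a non-match.
func matchAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	src := Source{ID: "example", Type: "yaml", Includes: []string{"acme/*"}, Excludes: []string{"*/*-base"}}
	for _, n := range []string{"acme/model-a", "acme/model-a-base", "other/model-b"} {
		fmt.Println(n, src.IsEnabled() && src.Allows(n))
	}
}
```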

The Model Catalog as a Plugin

The existing Model Catalog is wrapped into a plugin with no API changes. All current paths are preserved:

  • GET /api/model_catalog/v1alpha1/models
  • GET /api/model_catalog/v1alpha1/sources
  • GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}
  • GET /api/model_catalog/v1alpha1/sources/{source_id}/models/{model_name}/artifacts

The legacy catalog infrastructure — loader, providers, database service — is reused as-is inside the plugin. No client-side changes are required. The plugin simply wraps the existing code behind the CatalogPlugin interface, delegating lifecycle events to the existing initialization and shutdown logic.

An optional interface allows the plugin named "model" to read from the "models" section in sources.yaml, maintaining backward compatibility with existing configuration files.

Generic Catalog Framework

A shared framework in pkg/catalog/ provides type-parameterized building blocks that any plugin can use:

  • Loader — Generic data loader that fetches entities and artifacts from multiple sources concurrently, persists them via callbacks, and supports hot-reload with file watching.
  • Provider Registry — Typed provider system where each provider type (YAML, HTTP, etc.) registers a function that knows how to fetch data from a source.
  • Source Configuration — Shared source definition with include/exclude glob patterns, property bags, and enable/disable state.
  • Filter Engine — SQL-like filterQuery parameter available on all list endpoints, supporting comparison operators (=, !=, >, <, >=, <=), pattern matching (LIKE, ILIKE), set membership (IN), and logical combinators (AND, OR).
  • Pagination — Consistent page token-based pagination and ordering via BaseResourceList response envelope.

These building blocks ensure that every catalog type gets the same query capabilities, pagination behavior, and data loading patterns without reimplementing them.
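
The proposal does not say how page tokens are encoded. A common choice, sketched here as an assumption, is keyset pagination: the token is an opaque base64 encoding of the last row's (order value, id), and the next page starts strictly after it. In SQL this roughly corresponds to WHERE (name, id) > (?, ?) ORDER BY name, id LIMIT ?; the in-memory version below keeps the example self-contained. The cursor type and page function are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"sort"
)

// cursor is the hypothetical content of an opaque page token.
type cursor struct {
	OrderValue string `json:"v"`
	ID         string `json:"id"`
}

func encodeToken(c cursor) string {
	b, _ := json.Marshal(c) // marshaling two string fields cannot fail
	return base64.RawURLEncoding.EncodeToString(b)
}

func decodeToken(tok string) (cursor, error) {
	var c cursor
	b, err := base64.RawURLEncoding.DecodeString(tok)
	if err != nil {
		return c, fmt.Errorf("invalid page token: %w", err)
	}
	if err := json.Unmarshal(b, &c); err != nil {
		return c, fmt.Errorf("invalid page token: %w", err)
	}
	return c, nil
}

type item struct{ ID, Name string }

// page returns up to size items ordered by (Name, ID) after the token's
// position, plus the next token ("" on the last page).
func page(all []item, size int, token string) ([]item, string, error) {
	if size < 1 {
		return nil, "", fmt.Errorf("page size must be positive")
	}
	sorted := append([]item(nil), all...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Name != sorted[j].Name {
			return sorted[i].Name < sorted[j].Name
		}
		return sorted[i].ID < sorted[j].ID
	})
	start := 0
	if token != "" {
		c, err := decodeToken(token)
		if err != nil {
			return nil, "", err
		}
		// First row strictly after the cursor; the predicate is monotonic over the sort order.
		start = sort.Search(len(sorted), func(i int) bool {
			s := sorted[i]
			return s.Name > c.OrderValue || (s.Name == c.OrderValue && s.ID > c.ID)
		})
	}
	end := start + size
	if end >= len(sorted) {
		return sorted[start:], "", nil
	}
	last := sorted[end-1]
	return sorted[start:end], encodeToken(cursor{OrderValue: last.Name, ID: last.ID}), nil
}

func main() {
	all := []item{{"3", "c"}, {"1", "a"}, {"2", "b"}}
	p1, tok, _ := page(all, 2, "")
	p2, tok2, _ := page(all, 2, tok)
	fmt.Println(p1, p2, tok2 == "")
}
```

Unlike an offset, a keyset cursor does not skip or repeat rows when entities are inserted between page requests, which matters when hot-reload can change the data underneath a client.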

Plugin API Schema Strategy

Each plugin owns its OpenAPI specification, and a merge process produces a unified spec for documentation and validation.

Schema Ownership

api/openapi/
├── src/lib/common.yaml              ← Shared schemas (BaseResource, MetadataValue, etc.)
├── src/catalog.yaml                 ← Main catalog spec (model paths)
└── catalog-spec.yaml                ← Merged unified spec (all plugins)

catalog/plugins/mcp/
└── api/openapi/
    ├── src/
    │   ├── lib/ → <repo-root>/api/openapi/src/lib/   ← Symlink to shared schemas
    │   ├── generated/components.yaml                  ← Generated entity schemas
    │   └── openapi.yaml                               ← Plugin paths and operations
    └── openapi.yaml                                   ← Merged plugin spec

Shared schemas like BaseResource, BaseResourceList, and MetadataValue live in a central common.yaml. Plugin schemas reference them via a symlink, ensuring a single source of truth. Entity schemas use allOf composition with BaseResource, inheriting standard fields (id, name, description, timestamps, custom properties) without duplication.

Merge Process

A merge script combines all plugin specs into a single unified OpenAPI document:

  1. Auto-discover plugin specs under catalog/plugins/*/api/openapi/openapi.yaml
  2. Prefix plugin-specific schemas to avoid name collisions (e.g., McpServer becomes Mcp_McpServer); common schemas are excluded from prefixing
  3. Absolutize paths using each plugin's server base URL (e.g., /mcpservers becomes /api/mcp_catalog/v1alpha1/mcpservers)
  4. Prefix operation IDs for uniqueness
  5. Resolve external references to common schemas into local #/components/schemas/ references
  6. Deep-merge all plugin specs with the main catalog spec
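
Steps 2, 3, and 5 are plain string rewrites. The sketch below assumes the plugin name is capitalized to form the prefix (as in McpServer becoming Mcp_McpServer), that the set of unprefixed common schemas is known up front, and that only external references pointing at common.yaml are localized without a prefix. The function names are hypothetical, and step 4 (operation ID prefixing) is omitted because its format is not specified.

```go
package main

import (
	"fmt"
	"strings"
)

// commonSchemas are shared schemas from common.yaml that keep their names (assumed list).
var commonSchemas = map[string]bool{
	"BaseResource":     true,
	"BaseResourceList": true,
	"MetadataValue":    true,
}

const localPrefix = "#/components/schemas/"

// prefixSchema (step 2): "McpServer" in plugin "mcp" becomes "Mcp_McpServer".
func prefixSchema(plugin, name string) string {
	if plugin == "" || commonSchemas[name] {
		return name
	}
	return strings.ToUpper(plugin[:1]) + plugin[1:] + "_" + name
}

// rewriteRef (step 5, plus step 2 for local refs): refs into common.yaml
// become local refs; refs to the plugin's own schemas pick up the prefix.
func rewriteRef(plugin, ref string) string {
	i := strings.Index(ref, localPrefix)
	if i < 0 {
		return ref // not a schema reference; leave it alone
	}
	name := ref[i+len(localPrefix):]
	if i > 0 && strings.Contains(ref[:i], "common.yaml") {
		return localPrefix + name
	}
	return localPrefix + prefixSchema(plugin, name)
}

// absolutize (step 3): join the plugin's server base URL and a relative path.
func absolutize(baseURL, p string) string {
	return strings.TrimSuffix(baseURL, "/") + "/" + strings.TrimPrefix(p, "/")
}

func main() {
	fmt.Println(prefixSchema("mcp", "McpServer"))
	fmt.Println(rewriteRef("mcp", "lib/common.yaml#/components/schemas/BaseResource"))
	fmt.Println(absolutize("/api/mcp_catalog/v1alpha1", "/mcpservers"))
}
```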

The result is a single catalog-spec.yaml that can be used for documentation, client generation, and CI validation. A --check mode verifies the committed spec matches the generated output.

Breaking Change Avoidance

  • The main catalog API paths and schemas remain untouched — the model catalog's API surface is identical before and after the refactor
  • New plugins add paths under their own base URL (e.g., /api/mcp_catalog/v1alpha1/...), never modifying existing paths
  • Shared schemas are additive — new fields can be added to BaseResource without breaking existing consumers
  • The merge process is purely additive: it unions paths and schemas from all plugins

catalog-gen — Deterministic Scaffolding

catalog-gen is a CLI tool for scaffolding new catalog plugins. It follows the same philosophy as kubebuilder for Kubernetes: define your schema declaratively, generate the boilerplate, focus on business logic.

Commands

Command                                                       Purpose
───────                                                       ───────
catalog-gen init <name> --entity=<Entity> --package=<pkg>     Scaffold a complete plugin
catalog-gen generate                                          Regenerate non-editable files from catalog.yaml
catalog-gen add-property <name> <type>                        Add a property to the entity schema
catalog-gen add-artifact <name>                               Add an artifact type
catalog-gen add-artifact-property <artifact> <name> <type>    Add a property to an artifact
catalog-gen gen-testdata                                      Generate sample test data

What Gets Generated

From a single catalog.yaml definition, the tool generates:

  • Plugin lifecycle: plugin.go and register.go implementing the CatalogPlugin interface
  • Entity and artifact models — Go structs with GORM tags for database persistence
  • Database schema — Datastore specification and migration setup
  • Repositories — Type-safe database access layer
  • OpenAPI specification — Entity schemas with allOf composition against BaseResource, list/get operations, filtering and pagination parameters
  • Data providers — YAML file provider (with HTTP provider template available)
  • Filter mappings — Field-to-database-column mappings enabling filterQuery on all list endpoints
  • Makefile — Build, test, and OpenAPI code generation targets
  • Agentic workflows — Claude Code commands and skills for post-boilerplate development
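
A hypothetical sketch of how a generated field-to-column mapping could back filterQuery: once the query is parsed into comparisons, each field is looked up in the mapping (unmapped fields are rejected) and turned into a parameterized SQL fragment, so user input never reaches the SQL text. The filterColumns contents and the predicate function are illustrative; parsing AND/OR and translating ILIKE for SQLite and MySQL, which lack that operator, are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// filterColumns is a hypothetical generated mapping from filter fields to columns.
var filterColumns = map[string]string{
	"name":                 "name",
	"transport":            "transport",
	"createTimeSinceEpoch": "create_time_since_epoch",
}

var allowedOps = map[string]bool{
	"=": true, "!=": true, ">": true, "<": true, ">=": true, "<=": true,
	"LIKE": true, "ILIKE": true, "IN": true,
}

// predicate turns one parsed comparison into a parameterized SQL fragment.
func predicate(field, op string, value any) (string, []any, error) {
	col, ok := filterColumns[field]
	if !ok {
		return "", nil, fmt.Errorf("unknown filter field %q", field)
	}
	op = strings.ToUpper(op)
	if !allowedOps[op] {
		return "", nil, fmt.Errorf("unsupported operator %q", op)
	}
	if op == "IN" {
		vals, ok := value.([]any)
		if !ok || len(vals) == 0 {
			return "", nil, fmt.Errorf("IN needs a non-empty list")
		}
		placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(vals)), ", ")
		return col + " IN (" + placeholders + ")", vals, nil
	}
	return col + " " + op + " ?", []any{value}, nil
}

func main() {
	sql, args, err := predicate("transport", "=", "stdio")
	fmt.Println(sql, args, err)
	sql, args, err = predicate("name", "in", []any{"a", "b"})
	fmt.Println(sql, args, err)
}
```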

Deterministic Output

The generation is fully deterministic: the same catalog.yaml always produces the same output. Files are split into two categories:

  • Non-editable (regenerated on every catalog-gen generate): models, repositories, OpenAPI components, filter mappings, loader, plugin registration
  • Editable (created once during init, never overwritten): service implementations, providers, Makefile, OpenAPI main spec

This separation means developers can safely re-run generation after schema changes without losing their custom business logic.

Agentic Workflows for Post-Boilerplate Steps

Code generation handles the deterministic parts — schemas, models, database access, API specs. But some steps require judgment: implementing business logic, wiring up custom providers, writing tests. These are handled by agentic workflows generated alongside the plugin code.

Generated AI-Assistant Integration

Each plugin gets a .claude/ directory with:

  • Slash commands (/add-property, /add-artifact, /regenerate, /fix-build, /gen-testdata) — quick actions that modify catalog.yaml and regenerate code
  • Skills — detailed step-by-step guides for common development tasks, with context about the plugin's architecture, type system, and conventions
  • CLAUDE.md — per-plugin architecture summary that provides an AI agent with full context about the plugin's structure, property types, filtering syntax, and development workflow

The Deterministic + Agentic Split

The approach deliberately separates concerns:

Layer                           Approach                       Examples
─────                           ────────                       ────────
Schema → Boilerplate            Deterministic (catalog-gen)    Models, repositories, OpenAPI specs, filter mappings
Boilerplate → Working Plugin    Agentic (AI-assisted)          Service implementation, provider logic, test data, build fixes

The deterministic layer ensures consistency and reproducibility. The agentic layer handles the creative, context-dependent work where an AI assistant can follow generated skill instructions to complete the integration — wiring artifact repositories, implementing conversion functions, adding custom providers, and fixing build errors after schema changes.

Adding a New Catalog Type

The end-to-end workflow for adding a new catalog type:

1. Initialize          catalog-gen init <name> --entity=<Entity> --package=<pkg>
                       ↓
2. Define schema       Edit catalog.yaml (add properties, artifacts)
                       ↓
3. Regenerate          catalog-gen generate
                       ↓
4. Implement logic     Write service impl + data providers
                       (guided by generated agentic skills)
                       ↓
5. Register plugin     Add blank import to catalog-server/main.go
                       ↓
6. Configure sources   Add plugin section to sources.yaml
                       ↓
7. Merge API specs     make api/openapi/catalog-spec.yaml
                       ↓
8. Generate handlers   make gen/openapi-server (in plugin dir)

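Steps 1, 3, 7, and 8 are single commands, and step 2 is a declarative schema edit that the add-property and add-artifact commands can also perform. Step 4 is where the developer (or AI agent) focuses their effort, guided by the generated commands and skills. Steps 5 and 6 are small configuration changes.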

The result is a new catalog type with its own API, database tables, data providers, filtering, pagination, and OpenAPI documentation — fully integrated into the unified catalog server.

MCP Plugin — First New AI Asset (Example)

The MCP (Model Context Protocol) plugin serves as the first non-model catalog type, validating that the plugin system can accommodate arbitrary AI assets.

It was generated entirely via catalog-gen:

  1. catalog-gen init mcp --entity=McpServer --package=.../catalog/plugins/mcp
  2. Edit catalog.yaml to add MCP-specific properties
  3. catalog-gen generate to regenerate with new properties
  4. Implement the YAML provider for MCP server data
  5. Add the plugin import to catalog-server/main.go
  6. Configure sources in sources.yaml

The MCP plugin now serves its own API at /api/mcp_catalog/v1alpha1/mcpservers, with full filtering, pagination, and source management — running alongside the Model Catalog in the same server process.

This demonstrates that adding a new AI asset type is primarily a schema definition exercise, with the infrastructure provided by the framework and the integration steps guided by the generated agentic workflows.
