Add Plugins Support #1000

Open

alyssacgoins wants to merge 4 commits into opendatahub-io:main from alyssacgoins:mlflow-plugin-configs

Conversation

@alyssacgoins
Contributor

@alyssacgoins alyssacgoins commented Mar 26, 2026

The issue resolved by this Pull Request:

Resolves #<issue_number>

Description of your changes:

Add support for plugins, following kubeflow/pipelines#13005 and kubeflow/pipelines#13052

Testing instructions

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • New Features

    • MLflow integration added for Data Science Pipelines Applications.
    • CRD exposes spec.apiServer.mlflow with IntegrationMode (AUTODETECT, DISABLED) and InjectUserEnvVars flag.
    • API server can auto-detect and inject MLflow endpoint into its runtime config when enabled.
    • RoleBindings added to grant MLflow integration permissions to the API server and pipeline runner.
    • Runtime now watches MLflow resources to enable integration.
  • Chores

    • Release parameters updated to include an MLFLOW option.

@openshift-ci openshift-ci bot requested review from HumairAK and hbelmiro March 26, 2026 21:18
@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds optional MLflow integration across API, CRD schema, deepcopy, controller logic, and templates: introduces MLflowConfig and IntegrationMode types and constants; registers MLflow CR type in controller cache; extends DSPA params to resolve MLflow endpoint and build a plugin JSON which is written into apiserver config.json via new util helpers that update ConfigMap JSON fields; adds RBAC RoleBindings templates for MLflow; updates go.mod and release params.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Critical findings

  • Inconsistent error handling for MLflow integration: endpoint retrieval failures are logged and treated as non-fatal, while build/update failures are returned. Choose a single fail/degrade policy and implement it. (CWE-703)
  • scripts/release/params.py sets OTHER_OPTIONS['MLFLOW'] = "{\"}\"", which is a malformed JSON string; correct it to a valid placeholder or the intended payload.
  • Retrieved MLflow Status.Address.URL is used without validation or normalization (scheme/host/path). Validate and parse via net/url before embedding. (CWE-20)
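The net/url validation the last finding calls for could look like the following sketch. The helper name `validateEndpoint` and the normalization choices (allowed schemes, trailing-slash trim) are illustrative assumptions, not the PR's code:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validateEndpoint illustrates validating/normalizing an address retrieved
// from a CR status before embedding it in plugin config: parse it, require
// an http(s) scheme and a host, and trim the trailing slash so later path
// joins behave predictably.
func validateEndpoint(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", fmt.Errorf("parse mlflow endpoint: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q in endpoint %q", u.Scheme, raw)
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint %q has no host", raw)
	}
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u.String(), nil
}

func main() {
	ep, err := validateEndpoint("https://mlflow.example.svc:8080/")
	fmt.Println(ep, err)
}
```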

Code quality findings

  • UpdateConfigMapByName mutates JSON in ConfigMap without schema validation; validate structure (or against a JSON Schema) before writing to avoid corrupting config.json.
  • UpdateConfigMapByName performs non-transactional read-modify-write using client.Update without retry-on-conflict or optimistic concurrency control; switch to controllerutil.CreateOrUpdate or implement retry with resourceVersion to prevent lost updates. (CWE-362)
  • Error messages lack consistent wrapping/context; wrap underlying errors (e.g., fmt.Errorf("update configmap: %w", err)) to preserve chains.
  • BuildMLflowPluginConfigJson embeds user-derived values (endpoint, CA path, inject flag) into JSON; validate/escape inputs to prevent malformed JSON or injection.
  • DeepCopy methods were added for MLflowConfig but ensure any future fields (slices/maps) get correct deep-copy semantics.
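The retry-on-conflict and error-wrapping findings above share a common shape. In the operator this would be done with client-go's retry.RetryOnConflict around client.Update; the sketch below models the same pattern with a plain closure and a sentinel error, since the Kubernetes client is stubbed out here:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict stands in for a 409 Conflict from the API server.
var errConflict = errors.New("conflict")

// updateWithRetry re-runs a read-modify-write closure while it reports a
// conflict, backing off between attempts, and wraps the final error so the
// caller keeps the error chain (the fmt.Errorf("...: %w", err) style the
// review recommends).
func updateWithRetry(attempts int, backoff time.Duration, update func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = update(); !errors.Is(err, errConflict) {
			return err // success, or a non-retryable error
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("update configmap: giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := updateWithRetry(5, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errConflict // first two writes lose the race
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

With real client-go types, the closure would re-read the ConfigMap (refreshing resourceVersion), re-apply the JSON mutation, and call client.Update.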

Architectural notes

  • Adding MLflow CR type to controller cache without selectors increases controller memory and watch cardinality; consider selective caching/indexing or label-based filters for large clusters.
  • Using a shared config.json key in a ConfigMap as the integration transport is brittle and racy. Prefer an atomic/versioned config API (or per-tenant keys) and use server-side apply or CreateOrUpdate to avoid conflicts.
  • New dependency on MLflow operator API in go.mod pins a specific revision; verify compatibility across target operator versions and CI images.
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The PR title 'Add Plugins Support' is vague and overly broad; while plugins are added, the changeset is specifically focused on MLflow plugin configuration, not general plugins support. Revise the title to be more specific, e.g., 'Add MLflow plugin configuration support' or 'Enable MLflow integration via plugin framework'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@alyssacgoins alyssacgoins marked this pull request as draft March 26, 2026 21:18
@dsp-developers
Contributor

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/1000/head
git checkout -b pullrequest 0eb20289080ad354fd1e01550faa70d52b3bb6e5
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-1000"

More instructions here on how to deploy and test a Data Science Pipelines Application.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (3)
controllers/config/defaults.go (1)

168-170: DefaultPluginsEnabled = true enables feature by default.

With plugins enabled by default but configuration broken due to path mismatches, users may see unexpected behavior. Consider defaulting to false until the feature is stable, or ensure the configuration plumbing is fixed first.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/config/defaults.go` around lines 168 - 170, DefaultPluginsEnabled
is set to true which turns plugins on by default despite DefaultPlugins being
empty and potential path/config mismatches; change the default to false by
updating the DefaultPluginsEnabled constant to false (or alternatively fix the
configuration/path plumbing before enabling by ensuring DefaultPlugins is
populated and plugin path resolution in the plugin initialization code is
validated), and add a short comment near DefaultPlugins and
DefaultPluginsEnabled explaining the safe default choice until plugin config is
stable.
config/samples/dspa-simple/dspa_simple.yaml (1)

7-7: Sample enables plugins without demonstrating configuration.

Enabling pluginsEnabled: true without showing the corresponding plugins field may confuse users. Consider adding a commented example of the plugins field or setting pluginsEnabled: false until the feature is documented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/samples/dspa-simple/dspa_simple.yaml` at line 7, The sample sets
pluginsEnabled: true but provides no plugins configuration, which can confuse
users; update the sample by either setting pluginsEnabled: false or add a
commented example showing the plugins field (e.g., a commented plugins: section
with one or two plugin entries and their minimal config) so users see how to
enable and configure plugins; refer to the pluginsEnabled and plugins keys when
making the change.
controllers/util/util.go (1)

105-111: Simplify return statement; consider conflict-retry guidance.

The function can be reduced to a single return statement. Additionally, per the call site in controllers/dspipeline_params.go:694, there's no retry on 409 Conflict. With MAX_CONCURRENT_RECONCILES=10, concurrent updates to the same ConfigMap will race. Consider documenting that callers must handle conflicts or implement retry-with-backoff within this helper.

Simplify
 func UpdateConfigMap(ctx context.Context, cfgMap *v1.ConfigMap, client client.Client) error {
-	err := client.Update(ctx, cfgMap)
-	if err != nil {
-		return err
-	}
-	return nil
+	return client.Update(ctx, cfgMap)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/util/util.go` around lines 105 - 111, The UpdateConfigMap helper
currently returns an explicit nil after updating; simplify it to a single return
by returning the result of client.Update(ctx, cfgMap) directly (function:
UpdateConfigMap). Also address concurrent 409 Conflict races (given
MAX_CONCURRENT_RECONCILES) by either documenting that callers must handle
conflicts or implementing retry-with-backoff inside UpdateConfigMap (use
client-go's retry.RetryOnConflict or a small loop that checks errors.IsConflict
and retries with exponential backoff) so concurrent updates to the same
ConfigMap are retried safely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1/dspipeline_types.go`:
- Around line 58-64: The Plugins field lacks kubebuilder validation and should
be constrained (add a +kubebuilder:validation:MaxLength=<reasonable limit> and
optionally a pattern/format marker) to prevent unbounded strings; add this
validation marker above the Plugins declaration. The PluginsEnabled bool must
distinguish "unset" from explicit false — change the type to *bool
(PluginsEnabled *bool `json:"pluginsEnabled,omitempty"`) or remove `omitempty`,
and update any controller logic that sets p.PluginsEnabled (refer to the
PluginsEnabled field and the controller code that assigns
config.DefaultPluginsEnabled) to first check for a user-provided value (nil vs
non-nil) before applying the default; ensure kubebuilder validation markers
(e.g., +kubebuilder:validation:Optional) remain consistent.

In `@config/base/kustomization.yaml`:
- Around line 184-190: The PLUGINS vars entry has fieldref wrongly nested inside
objref; update the PLUGINS vars block (the entry named PLUGINS) so that fieldref
is a sibling of objref (matching the structure used by WEBHOOK_ANNOTATIONS) —
move the fieldref.node (fieldpath: data.PLUGINS) out from under
objref.apiVersion into the same level as objref, ensuring the var uses objref:
{kind: ConfigMap, name: dspo-parameters, apiVersion: v1} and a separate
fieldref: {fieldpath: data.PLUGINS} so $(PLUGINS) substitution works.

In `@config/configmaps/files/config.yaml`:
- Line 29: The YAML places the Plugins key under DSPO (DSPO.Plugins) while the
code expects DSPO.ApiServer.Plugins (PluginsConfigName in
controllers/config/defaults.go), so update one side to match: either move the
Plugins entry in config.yaml under an ApiServer mapping (so the key becomes DSPO
-> ApiServer -> Plugins) or change PluginsConfigName to "DSPO.Plugins" in
defaults.go; adjust whichever file you modify to preserve the exact key string
used by Viper and ensure the names match (reference: PluginsConfigName constant
and the Plugins key in config.yaml).

In
`@config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml`:
- Around line 919-927: Remove the internal "todo: in this approach..." sentence
from the plugins field description, add a maxLength constraint to the plugins
schema (e.g., add maxLength: 4096 on the string items or on the string property
for plugins) to prevent arbitrarily large values that can stress etcd, and set
an explicit default for pluginsEnabled by adding default: true to the
pluginsEnabled property so the schema and description are consistent; update the
descriptions accordingly for plugins and pluginsEnabled while keeping the
existing field names (plugins, pluginsEnabled).

In `@config/manager/manager.yaml`:
- Around line 88-89: The deployed manager injects an env var named PLUGINS but
the code in main.go reads os.Getenv("DSPO_APISERVER_PLUGINS"), so the plugin
configuration is never seen; fix this by renaming the environment variable key
in the manager manifest from PLUGINS to DSPO_APISERVER_PLUGINS (update the
manifest entry that currently reads - name: PLUGINS value: $(PLUGINS)) so it
matches the code path that checks DSPO_APISERVER_PLUGINS in main.go and enables
proper MLflow plugin detection.

In `@controllers/config/defaults.go`:
- Line 112: The constant PluginsConfigName currently has the wrong config path
string; update the value of PluginsConfigName to "DSPO.Plugins" to match the
existing YAML structure (or alternatively align the YAML to include an ApiServer
nesting), ensuring any code that references PluginsConfigName uses the corrected
path.

In `@controllers/dspipeline_params.go`:
- Around line 494-498: The conditional is reading dsp.Spec.PluginsEnabled
directly while later logic uses p.PluginsEnabled, causing divergence; update
SetupObjectParams so p.PluginsEnabled is assigned from dsp.Spec.PluginsEnabled
early (fix the assignment around the existing initialization code) and change
the block that clears plugins to check p.PluginsEnabled (not
dsp.Spec.PluginsEnabled); also ensure when p.PluginsEnabled is false you set
p.Plugins = "" to prevent injection downstream (refer to SetupObjectParams,
p.PluginsEnabled, p.Plugins, and dsp.Spec.PluginsEnabled).
- Around line 640-641: p.PluginsEnabled is always set to
config.DefaultPluginsEnabled, ignoring dsp.Spec.PluginsEnabled; change the
assignment so p.PluginsEnabled reflects the spec when present and falls back to
the config default otherwise. Specifically, replace the unconditional
p.PluginsEnabled = config.DefaultPluginsEnabled with logic that checks
dsp.Spec.PluginsEnabled (handle it as *bool if the API uses a pointer): if
dsp.Spec.PluginsEnabled != nil set p.PluginsEnabled = *dsp.Spec.PluginsEnabled
else set it = config.DefaultPluginsEnabled; ensure subsequent code uses
p.PluginsEnabled rather than reading dsp.Spec.PluginsEnabled directly.
- Around line 677-701: The ConfigMap update sequence around util.GetConfigMap ->
updateJSON -> util.UpdateConfigMap risks TOCTOU conflicts and silently proceeds
on updateJSON errors; change the block that runs when p.PluginsEnabled &&
p.Plugins != "" to first validate that p.Plugins is well-formed JSON (e.g.,
attempt to unmarshal into interface{}) before calling updateJSON, then perform
the read-modify-write inside a retry loop that re-reads the ConfigMap (using
util.GetConfigMap), checks/uses cm.ResourceVersion, calls updateJSON, and
retries the UpdateConfigMap on conflict (409) a few times; also immediately
return the updateJSON error instead of only logging it (so update failures don’t
continue with an empty/invalid updatedServerCfg). Ensure you reference
updateJSON, util.GetConfigMap, util.UpdateConfigMap, cm.Data["config.json"], and
p.Plugins in the fix.

In `@main.go`:
- Around line 289-311: ParsePlugins may return nil and plugins["mlflow"] will
panic; after calling ParsePlugins check for nil (e.g., if plugins == nil { log
the parse error and exit or treat as empty map }) before indexing. Also make the
MLflow CR absence decisive: in the block that attempts to Get the MLflow CR
(mlflowCR via mgr.GetAPIReader().Get) convert the current non-fatal
setupLog.Error into a fatal startup error (or return the error from main) if the
mlflow plugin is required, so the operator does not continue without the
required CR; alternatively, if non-fatal, document and handle it downstream in
the reconciler. Finally, confirm the CR name/namespace/scope assumptions used
when setting mlflowCR (Name "mlflow" and empty Namespace) match the MLflow CRD
scope and adjust the ObjectKey/namespace accordingly.
- Around line 331-346: ParsePlugins currently unmarshals pluginStr without size
checks and returns nil on error; add defensive validation and richer logging:
define a constant maxPluginsSize (e.g., 64KB), check len(pluginStr) <=
maxPluginsSize at the top of ParsePlugins and log/return an empty map if
exceeded, then call json.Unmarshal into envelope wrapped in a short-decorated
error log that includes the input length and a truncated preview (not the full
content) on failure, and on any unmarshal or type-assertion failure return an
empty map[string]interface{} (not nil) so callers won’t panic; update uses of
variables envelope, pluginStr and setupLog in ParsePlugins accordingly.
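A hardened ParsePlugins along the lines this prompt describes might look like the sketch below. The size cap and the never-nil return are the review's suggestions; everything else (names, logging elided) is an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxPluginsSize caps input size, per the review's suggested 64KB limit.
const maxPluginsSize = 64 * 1024

// parsePlugins never returns nil: oversized input, invalid JSON, or a JSON
// null all yield an empty map, so callers can index the result safely.
func parsePlugins(pluginStr string) map[string]interface{} {
	empty := map[string]interface{}{}
	if len(pluginStr) > maxPluginsSize {
		return empty // real code would also log the oversize input length
	}
	var envelope map[string]interface{}
	if err := json.Unmarshal([]byte(pluginStr), &envelope); err != nil || envelope == nil {
		return empty // real code would log a truncated preview of the input
	}
	return envelope
}

func main() {
	p := parsePlugins(`{"mlflow":{"endpoint":"http://mlflow:8080"}}`)
	_, ok := p["mlflow"] // safe even when the key is absent
	fmt.Println(ok, len(parsePlugins("{\"}\"")))
}
```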

In `@scripts/release/params.py`:
- Around line 49-51: The DSPO_APISERVER_PLUGINS default is malformed JSON
("{\"}\"") which causes ParsePlugins (main.go:335) json.Unmarshal to fail and
disables plugins; update the DSPO_APISERVER_PLUGINS entry to a valid empty JSON
object ("{}") so ParsePlugins receives valid JSON and plugin initialization
(including MLflow detection) works correctly.

---

Nitpick comments:
In `@config/samples/dspa-simple/dspa_simple.yaml`:
- Line 7: The sample sets pluginsEnabled: true but provides no plugins
configuration, which can confuse users; update the sample by either setting
pluginsEnabled: false or add a commented example showing the plugins field
(e.g., a commented plugins: section with one or two plugin entries and their
minimal config) so users see how to enable and configure plugins; refer to the
pluginsEnabled and plugins keys when making the change.

In `@controllers/config/defaults.go`:
- Around line 168-170: DefaultPluginsEnabled is set to true which turns plugins
on by default despite DefaultPlugins being empty and potential path/config
mismatches; change the default to false by updating the DefaultPluginsEnabled
constant to false (or alternatively fix the configuration/path plumbing before
enabling by ensuring DefaultPlugins is populated and plugin path resolution in
the plugin initialization code is validated), and add a short comment near
DefaultPlugins and DefaultPluginsEnabled explaining the safe default choice
until plugin config is stable.

In `@controllers/util/util.go`:
- Around line 105-111: The UpdateConfigMap helper currently returns an explicit
nil after updating; simplify it to a single return by returning the result of
client.Update(ctx, cfgMap) directly (function: UpdateConfigMap). Also address
concurrent 409 Conflict races (given MAX_CONCURRENT_RECONCILES) by either
documenting that callers must handle conflicts or implementing
retry-with-backoff inside UpdateConfigMap (use client-go's retry.RetryOnConflict
or a small loop that checks errors.IsConflict and retries with exponential
backoff) so concurrent updates to the same ConfigMap are retried safely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 8b23755a-45ff-481b-9028-8479d5f73b1b

📥 Commits

Reviewing files that changed from the base of the PR and between e5d4c5a and 0eb2028.

📒 Files selected for processing (12)
  • api/v1/dspipeline_types.go
  • config/base/kustomization.yaml
  • config/base/params.env
  • config/configmaps/files/config.yaml
  • config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml
  • config/manager/manager.yaml
  • config/samples/dspa-simple/dspa_simple.yaml
  • controllers/config/defaults.go
  • controllers/dspipeline_params.go
  • controllers/util/util.go
  • main.go
  • scripts/release/params.py

Contributor

@hbelmiro hbelmiro left a comment


  • Can we move the MLflow check to the reconciler? That way CRs can be added/modified on the fly and users can have status feedback.
  • I'd remove PluginsEnabled. If Plugins is empty or absent, plugins are disabled.
  • Probably out of scope: It would be nice to have admin (operator level) granularity on what plugins are allowed by DSPAs

@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

@alyssacgoins
Contributor Author

  • Can we move the MLflow check to the reconciler? That way CRs can be added/modified on the fly and users can have status feedback.
  • I'd remove PluginsEnabled. If Plugins is empty or absent, plugins are disabled.
  • Probably out of scope: It would be nice to have admin (operator level) granularity on what plugins are allowed by DSPAs

Hey @hbelmiro thanks for the review.

  • Added MLflow check to reconciler, I agree that makes more sense.
  • The Jira requires a new DSPA CRD field to opt out of the MLflow [or general plugin] integration. I think it makes sense for the DSPO operator to propagate the PLUGINS value through to DSPA, and for users to be able to specify pluginsEnabled on a per-DSPA basis. What are your thoughts?
  • That would be super helpful. I don't think we have the time on this PR to implement this, but we can make a follow-up.

Comment on lines +60 to +64
Plugins string `json:"plugins,omitempty"`

// Enable plugins. Defaults to True.
// +kubebuilder:validation:Optional
PluginsEnabled bool `json:"pluginsEnabled,omitempty"`
Contributor

After discussing with Matt, we've agreed to just surface explicit configuration options catered to MLflow specifically. The fields we are going to need are:

spec:
  # If mlflow field is omitted, the default behavior is mlflow.integrationMode = AutoDetect
  mlflow:
    integrationMode: # *enum type, accepts one of "Disabled" or "AutoDetect", when omitted assume "AutoDetect"
    injectUserEnvVars: # *bool type, by default this value is false, when omitted the value is false.

The behavior works as follows:

integrationMode: Disabled

DSP API Server does not enable MLflow plugin configuration.

integrationMode: AutoDetect

For each DSPA reconcile loop, DSPO needs to enable MLflow plugin only if MLflow is deployed on the RHOAI cluster.

On startup, DSPO should look up the MLflow Custom Resource's Status and cache the endpoint. That link points to the cluster-internal address that can be leveraged by KFP in RHOAI.

Add a cache TTL, and use this cache in each reconcile to determine whether MLflow is deployed (the check is only needed when integrationMode is AutoDetect).

injectUserEnvVars: false

DSPA sets the equivalent MLflow plugin setting injectUserEnvVars to false. This doesn't exist yet — you will need to add this to your KFP PR for launcher/driver as a top level plugin setting for MLflow.

When this is false, KFP will not update the user container's environment variable (the executor container). We want this behavior by default when unspecified.

injectUserEnvVars: true

Conversely, when the user opts in to this by setting true, then propagate this to the KFP config so that KFP can then have driver set these on the executor container.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if integrationMode: AutoDetect but we are unable to find the endpoint, then we should surface a DSPA status condition on this with a helpful user facing message. In this case plugin integration is not configured for KFP.

dspaStatus.SetObjStoreReady()
}

if params.PluginsEnabled && params.Plugins != "" {
Contributor

Right now I don't think mlflowCR is doing anything.

As mentioned earlier, use the explicit mlflow fields suggested above. You do not need to fetch the MLflow CR on every reconcile; use the semantics described in the earlier comment: do it once at DSPO startup, and again each time the cache expires.

dspaStatus.SetObjStoreReady()
}

if params.PluginsEnabled && params.Plugins != "" {
Contributor

I think your approach needs to change here. Take a look at how plugin settings are configured at the API server level. This is the config that is present here, and it is what you need to update when mlflow integrationMode is set to AutoDetect and the MLflow instance exists on the cluster (i.e. you have successfully retrieved the mlflow endpoint).

Now the only thing you need to dynamically fetch is the endpoint (as mentioned earlier) from the MLflow CR. The rest you can infer/hardcode (as constants):

Timeout: 30s (or have llm suggest a reasonable timeout)
TLS: the dspa custom bundle path is sufficient
WorkspacesEnabled: true
AuthType: kubernetes
ExperimentDescription: "Created by AI Pipelines."

This can go straight in to the pipeline server config, and the integration will be enabled.

Proxy *ProxyConfig `json:"proxy,omitempty"`

// Plugins is a string list of JSON configs for each API server plugin
Plugins string `json:"plugins,omitempty"`
Contributor

.

Contributor

Ignore this comment, I added this to negate the accidental approve earlier.

@alyssacgoins alyssacgoins force-pushed the mlflow-plugin-configs branch 2 times, most recently from e830308 to eb0cd45 on March 30, 2026 20:55
@alyssacgoins alyssacgoins marked this pull request as ready for review March 31, 2026 16:02
@openshift-ci openshift-ci bot requested review from mprahl and sduvvuri1603 March 31, 2026 16:02
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (6)
api/v1/dspipeline_types.go (2)

59-59: ⚠️ Potential issue | 🟠 Major

Fix the MLflow JSON tag before this becomes part of the public API.

json:"mlflowConfig,omitEmpty" uses a non-existent encoding/json option, so omitempty is not applied, and config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml:919-930 already describes a different MLflow wire shape. Align the tag with the intended API/CRD name and add the usual kubebuilder marker on the field.

As per coding guidelines, "Kubebuilder validation markers on all user-facing fields."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/dspipeline_types.go` at line 59, The MLflow field's JSON tag is wrong
and missing a kubebuilder marker: update the MLflow field (MLflow *MLflowConfig)
to use the correct wire name and omitempty (e.g. json:"mlflow,omitempty" to
match the CRD) and add the kubebuilder validation marker (e.g.
+kubebuilder:validation:Optional) immediately above the field so it is treated
as optional in generated CRDs; then regenerate CRDs to verify the wire shape
matches the existing manifests.

459-469: ⚠️ Potential issue | 🔴 Critical

MLflowConfig cannot be generated into a CRD as written.

The anonymous IntegrationMode field has no JSON tag, which is already tripping controller-gen in CI, and type IntegrationMode int will marshal as 0/1 instead of the string enum the CRD/API exposes. Make IntegrationMode a string type and give both fields explicit JSON tags, defaults, and enum markers.

As per coding guidelines, "Kubebuilder validation markers on all user-facing fields" and "Enum validation where applicable."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/dspipeline_types.go` around lines 459 - 469, The MLflowConfig struct
and IntegrationMode type must be updated so the CRD can be generated: change
type IntegrationMode from int to string, keep the enum constants (AutoDetect,
Disabled) as string values, add explicit json tags on both IntegrationMode and
InjectUserEnvVars in MLflowConfig (e.g., `integrationMode` and
`injectUserEnvVars`), and add kubebuilder validation markers on the fields —
include an enum marker for IntegrationMode listing "AutoDetect" and "Disabled"
and a default marker (e.g., default=AutoDetect), plus a default/optional marker
for InjectUserEnvVars as appropriate so controller-gen can produce correct CRD
validation.
controllers/dspipeline_params.go (3)

675-675: ⚠️ Potential issue | 🔴 Critical

Handle omitted MLflow config before dereferencing it.

p.MLflow = dsp.Spec.MLflow followed by p.MLflow.IntegrationMode panics whenever the field is omitted, because p.MLflow stays nil. Initialize a default config or guard the pointer before the mode check.

Minimal fix
-	p.MLflow = dsp.Spec.MLflow
+	if dsp.Spec.MLflow != nil {
+		p.MLflow = dsp.Spec.MLflow.DeepCopy()
+	} else {
+		p.MLflow = &dspa.MLflowConfig{
+			IntegrationMode: dspa.AutoDetect,
+		}
+	}
@@
-		if p.MLflow.IntegrationMode == dspa.AutoDetect {
+		if p.MLflow != nil && p.MLflow.IntegrationMode == dspa.AutoDetect {

Also applies to: 704-704

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/dspipeline_params.go` at line 675, p.MLflow is assigned from
dsp.Spec.MLflow but may be nil, causing a panic when later accessing
p.MLflow.IntegrationMode; modify the assignment sites (where p.MLflow =
dsp.Spec.MLflow is set) to either (a) check for nil and set p.MLflow to a
default MLflow config struct when dsp.Spec.MLflow == nil, or (b) guard
subsequent accesses by checking if p.MLflow != nil before reading
IntegrationMode (e.g., use conditional logic in the code path that reads
p.MLflow.IntegrationMode). Apply the same fix at both locations referenced (the
assignment at p.MLflow = dsp.Spec.MLflow and the second occurrence) so no
dereference happens on a nil pointer.

712-721: ⚠️ Potential issue | 🔴 Critical

This MLflow config is built and patched through the wrong path.

controllers/util/util.go:93-133 shows UpdateConfigMapByName uses field both as the ConfigMap data key and the JSON key passed to updateJSON, so passing "config.json" here cannot write the intended MLflow entry. This block also runs before CustomServerConfig is defaulted later in the function, and the empty if err != nil {} discards update failures. Build the final config.json after defaults are applied and write it through a helper that separates dataKey from jsonKey.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/dspipeline_params.go` around lines 712 - 721, The MLflow plugin
config is being built and patched too early and via the wrong JSON key: move the
BuildMLflowPluginConfigJson(...) + config patching logic to after
CustomServerConfig is defaulted, and replace the util.UpdateConfigMapByName(...)
call with a call to a helper that accepts separate dataKey and jsonKey (rather
than passing "config.json" for both) so the MLflow entry is written into the
correct JSON field; also handle and surface any error from the update (remove
the empty if err != nil {} and log/return the error). Ensure you reference
BuildMLflowPluginConfigJson, p.APIServer.CustomServerConfig.Name, p.Namespace
and util.UpdateConfigMapByName (or the new helper) when making the change.

704-715: ⚠️ Potential issue | 🟠 Major

Do not continue after MLflow endpoint/config generation fails.

This block only logs RetrieveMLflowEndpoint / BuildMLflowPluginConfigJson failures and keeps reconciling. That leaves the DSPA looking healthy while MLflow integration is skipped, and pluginCfg can still be built from an empty endpoint. Return an error or set a failing status condition here instead of continuing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/dspipeline_params.go` around lines 704 - 715, The code continues
reconciliation when RetrieveMLflowEndpoint or BuildMLflowPluginConfigJson fail
(leaving pluginCfg empty); change the error handling inside the
p.MLflow.IntegrationMode == dspa.AutoDetect branch so that on error you do not
proceed: after calling RetrieveMLflowEndpoint and after
BuildMLflowPluginConfigJson check the returned err and either return that error
from the enclosing reconcile function or set a failing DSPA status condition and
return, ensuring you do not continue with an empty mlflowEndpoint or pluginCfg;
target the calls to RetrieveMLflowEndpoint, BuildMLflowPluginConfigJson, and the
pluginCfg/mlflowEndpoint variables for the fix.
scripts/release/params.py (1)

50-50: ⚠️ Potential issue | 🟠 Major

MLFLOW defaults to malformed JSON.

"{\"}\"" expands to {"}", so this is neither an empty object ({}) nor a valid quoted JSON payload ("{}"). If the release params consumer parses MLFLOW as JSON like the neighboring config values, plugin initialization will fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/release/params.py` at line 50, The MLFLOW default value is malformed
("{\"}\"") and should be a valid JSON payload; update the MLFLOW entry in
params.py (the "MLFLOW" mapping) to a proper JSON string representing an empty
object — e.g., replace "{\"}\"" with "{}" so consumers that JSON-parse the value
will receive an empty object.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/dspipeline_params.go`:
- Line 45: The import github.com/opendatahub-io/mlflow-operator/api/v1 used in
controllers/dspipeline_params.go is not present in go.mod; update your module
dependencies by adding github.com/opendatahub-io/mlflow-operator at an
appropriate version (pin a specific release or commit) to go.mod and run go mod
tidy to populate go.sum; ensure the module version you choose is compatible with
the mlflowv1 import used in that file and commit the updated go.mod and go.sum
alongside this change.

---

Duplicate comments:
In `@api/v1/dspipeline_types.go`:
- Line 59: The MLflow field's JSON tag is wrong and missing a kubebuilder
marker: update the MLflow field (MLflow *MLflowConfig) to use the correct wire
name and omitempty (e.g. json:"mlflow,omitempty" to match the CRD) and add the
kubebuilder validation marker (e.g. +kubebuilder:validation:Optional)
immediately above the field so it is treated as optional in generated CRDs; then
regenerate CRDs to verify the wire shape matches the existing manifests.
- Around line 459-469: The MLflowConfig struct and IntegrationMode type must be
updated so the CRD can be generated: change type IntegrationMode from int to
string, keep the enum constants (AutoDetect, Disabled) as string values, add
explicit json tags on both IntegrationMode and InjectUserEnvVars in MLflowConfig
(e.g., `integrationMode` and `injectUserEnvVars`), and add kubebuilder
validation markers on the fields — include an enum marker for IntegrationMode
listing "AutoDetect" and "Disabled" and a default marker (e.g.,
default=AutoDetect), plus a default/optional marker for InjectUserEnvVars as
appropriate so controller-gen can produce correct CRD validation.

In `@controllers/dspipeline_params.go`:
- Line 675: p.MLflow is assigned from dsp.Spec.MLflow but may be nil, causing a
panic when later accessing p.MLflow.IntegrationMode; modify the assignment sites
(where p.MLflow = dsp.Spec.MLflow is set) to either (a) check for nil and set
p.MLflow to a default MLflow config struct when dsp.Spec.MLflow == nil, or (b)
guard subsequent accesses by checking if p.MLflow != nil before reading
IntegrationMode (e.g., use conditional logic in the code path that reads
p.MLflow.IntegrationMode). Apply the same fix at both locations referenced (the
assignment at p.MLflow = dsp.Spec.MLflow and the second occurrence) so no
dereference happens on a nil pointer.
- Around line 712-721: The MLflow plugin config is being built and patched too
early and via the wrong JSON key: move the BuildMLflowPluginConfigJson(...) +
config patching logic to after CustomServerConfig is defaulted, and replace the
util.UpdateConfigMapByName(...) call with a call to a helper that accepts
separate dataKey and jsonKey (rather than passing "config.json" for both) so the
MLflow entry is written into the correct JSON field; also handle and surface any
error from the update (remove the empty if err != nil {} and log/return the
error). Ensure you reference BuildMLflowPluginConfigJson,
p.APIServer.CustomServerConfig.Name, p.Namespace and util.UpdateConfigMapByName
(or the new helper) when making the change.
- Around line 704-715: The code continues reconciliation when
RetrieveMLflowEndpoint or BuildMLflowPluginConfigJson fail (leaving pluginCfg
empty); change the error handling inside the p.MLflow.IntegrationMode ==
dspa.AutoDetect branch so that on error you do not proceed: after calling
RetrieveMLflowEndpoint and after BuildMLflowPluginConfigJson check the returned
err and either return that error from the enclosing reconcile function or set a
failing DSPA status condition and return, ensuring you do not continue with an
empty mlflowEndpoint or pluginCfg; target the calls to RetrieveMLflowEndpoint,
BuildMLflowPluginConfigJson, and the pluginCfg/mlflowEndpoint variables for the
fix.

In `@scripts/release/params.py`:
- Line 50: The MLFLOW default value is malformed ("{\"}\"") and should be a
valid JSON payload; update the MLFLOW entry in params.py (the "MLFLOW" mapping)
to a proper JSON string representing an empty object — e.g., replace "{\"}\""
with "{}" so consumers that JSON-parse the value will receive an empty object.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c0b1ce06-3eb8-40f1-b296-3fcdb30abe51

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb2028 and 35558d9.

📒 Files selected for processing (6)
  • api/v1/dspipeline_types.go
  • config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml
  • controllers/dspipeline_params.go
  • controllers/util/util.go
  • main.go
  • scripts/release/params.py
✅ Files skipped from review due to trivial changes (1)
  • main.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml
  • controllers/util/util.go

@alyssacgoins alyssacgoins force-pushed the mlflow-plugin-configs branch from 35558d9 to 51b88cb Compare March 31, 2026 21:13
@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

♻️ Duplicate comments (1)
scripts/release/params.py (1)

50-50: ⚠️ Potential issue | 🔴 Critical

Fix malformed JSON default for MLFLOW.

Line 50 uses "{\"}\"", which is invalid JSON and will break any JSON parsing path for this option.

Proposed fix
-    "MLFLOW": "{\"}\"",
+    "MLFLOW": "{}",

As per coding guidelines, prioritize bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/release/params.py` at line 50, The MLFLOW default value is a
malformed JSON string ("{\"}\"") and will break JSON parsing; update the
"MLFLOW" entry to contain valid JSON (for example an empty object "{}" or
another appropriate JSON payload) and ensure the string is properly escaped to
match the surrounding file format so JSON parsing succeeds; locate the "MLFLOW"
key in scripts/release/params.py and replace the invalid value accordingly.
🧹 Nitpick comments (1)
controllers/dspipeline_params.go (1)

147-168: Unused mapstructure tags.

These struct fields have both json and mapstructure tags, but only json.Marshal is used in BuildMLflowPluginConfigJson. Remove unused tags to reduce noise:

 type PluginConfig struct {
-	Endpoint string               `json:"endpoint,omitempty" mapstructure:"endpoint"`
-	Timeout  string               `json:"timeout,omitempty" mapstructure:"timeout"`
-	TLS      TLSConfig            `json:"tls,omitempty" mapstructure:"tls"`
-	Settings MLflowPluginSettings `json:"settings,omitempty" mapstructure:"settings"`
+	Endpoint string               `json:"endpoint,omitempty"`
+	Timeout  string               `json:"timeout,omitempty"`
+	TLS      TLSConfig            `json:"tls,omitempty"`
+	Settings MLflowPluginSettings `json:"settings,omitempty"`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/dspipeline_params.go` around lines 147 - 168, The struct tags
include unused `mapstructure` tags (in PluginConfig, TLSConfig,
MLflowPluginSettings) while only json.Marshal is used in
BuildMLflowPluginConfigJson; remove the `mapstructure:"..."` tags from the
fields (Endpoint, Timeout, TLS, Settings, InsecureSkipVerify, CABundlePath,
WorkspacesEnabled, ExperimentDescription, DefaultExperimentName, KFPBaseURL,
InjectUserEnvVars) so only `json` tags remain to reduce noise and keep
serialization consistent with BuildMLflowPluginConfigJson.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1/dspipeline_types.go`:
- Around line 58-59: The struct field MLflow *MLflowConfig currently has an
incorrect JSON tag `json:"mlflowConfig,omitEmpty"`; change it to
`json:"mlflow,omitempty"` (use lowercase "omitempty" and the CRD-aligned field
name "mlflow") so the tag matches the CRD and uses the correct omitempty option;
update the tag on the MLflow field (MLflow *MLflowConfig) accordingly.
- Around line 471-481: The MLflowConfig struct and its IntegrationMode type lack
JSON tags and use an int iota enum which breaks controller-gen; change
IntegrationMode from int to a string-based type (e.g., type IntegrationMode
string) with explicit constant values like "AUTODETECT" and "DISABLED" and add
JSON tags to MLflowConfig fields (e.g., `json:"integrationMode,omitempty"` and
`json:"injectUserEnvVars,omitempty"` referencing the IntegrationMode and
InjectUserEnvVars fields) and, if desired for CRD validation, add kubebuilder
validation markers on IntegrationMode (e.g., `//
+kubebuilder:validation:Enum=AUTODETECT;DISABLED`) so the CRD serializes as
strings and controller-gen no longer errors.

In
`@config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml`:
- Around line 917-931: The CRD's mlflow schema is missing type declarations and
uses PascalCase field names; update the mlflow block to include "type: object",
rename the properties "IntegrationMode" -> "integrationMode" and
"InjectUserEnvVars" -> "injectUserEnvVars" to follow Kubernetes/Go JSON tag
camelCase conventions, and add "type: string" to the integrationMode property
(keeping its enum values) while ensuring injectUserEnvVars retains "type:
boolean"; adjust any downstream schema references to these property names as
needed.

In `@config/internal/apiserver/default/rolebinding_pipeline-runner.yaml.tmpl`:
- Around line 17-33: The RoleBinding for mlflow-integration is created
unconditionally; wrap the RoleBinding block (the resource named
mlflow-integration that references ClusterRole mlflow-integration and
ServiceAccount pipeline-runner-{{.Name}}) in a template conditional that checks
MLflow is configured and not disabled (use the template variables .MLflow and
.MLflow.IntegrationMode and test that IntegrationMode != "DISABLED", e.g. an if
that requires .MLflow to be present and its IntegrationMode not equal to
"DISABLED"), so the RoleBinding is only rendered when MLflow integration is
enabled.

In `@controllers/dspipeline_params.go`:
- Around line 794-813: The MLflow plugin setup block can panic because
p.APIServer.CustomServerConfig may be nil when accessed; update the logic around
RetrieveMLflowEndpoint / BuildMLflowPluginConfigJson /
util.UpdateConfigMapByName so it only runs after p.APIServer.CustomServerConfig
is initialized (or add an explicit nil check on p.APIServer.CustomServerConfig
before using .Name). Concretely, either move the entire MLflow config block to
execute after the CustomServerConfig defaulting/initialization code, or guard
the existing block with if p.APIServer != nil && p.APIServer.CustomServerConfig
!= nil before calling util.UpdateConfigMapByName and building the plugin config.
- Around line 751-766: The MLflow handling will never match the Disabled
constant because the spec field's Go type/tags in api/v1/dspipeline_types.go are
incorrect (embedded field missing JSON tag and type mismatch between
IntegrationMode definitions); fix the API type by giving the embedded MLflow
spec a proper json tag and ensure IntegrationMode in the CRD type uses the same
underlying type as dspa.IntegrationMode (e.g., string if dspa.IntegrationMode is
a string) so deserialization works, then in controllers/dspipeline_params.go
read dsp.Spec.MLflow.IntegrationMode directly and compare to dspa.Disabled; also
simplify the boolean check by assigning mlflowCfg.InjectUserEnvVars =
dsp.Spec.MLflow.InjectUserEnvVars (no == true check).
- Around line 246-263: RetrieveMLflowEndpoint uses client.Get to read MLflow CRs
(mlflow.kubeflow.org) via the Kubernetes API, but the cluster Role in
config/rbac/role.yaml lacks permissions; update the Role to grant the
mlflow.kubeflow.org apiGroup access to the mlflows resource with verbs get,
list, and watch so client.Get in RetrieveMLflowEndpoint (and any calls to mlflow
resources) won't be denied by RBAC.

In `@controllers/util/util.go`:
- Around line 119-122: The check using cmField := cm.Data[field] conflates a
missing key with an empty value; change to use the presence form (cmField, ok :=
cm.Data[field]) and if !ok return fmt.Errorf("configmap field %s does not
exist", field), else if cmField == "" return fmt.Errorf("configmap field %s is
empty", field) so the error messages accurately reflect whether the key is
missing or present-but-empty (update the code around cmField, cm.Data and the
existing error returns).
- Around line 114-133: UpdateConfigMapByName currently passes a JSON string
returned by BuildMLflowPluginConfigJson into updateJSON causing double-encoding
(updateJSON does obj[field] = value then json.Marshal). Fix by unmarshaling the
JSON string into a Go value before calling updateJSON (or change updateJSON to
accept an interface{} and unmarshal there): parse value (the JSON text) into an
interface{} (e.g., var v interface{}; json.Unmarshal([]byte(value), &v)) and
pass that unmarshaled v to updateJSON so updateJSON assigns a structured object
to obj[field] instead of a string, avoiding the escaped JSON literal issue with
obj[field] = value.

In `@scripts/release/params.py`:
- Line 50: The release params file adds a "MLFLOW" entry that isn't being used
by kustomize; either wire it into kustomize vars or remove it: update
scripts/release/params.py to remove the "MLFLOW" key if you don't need it, or
add a matching var in config/base/kustomization.yaml (declare a varRef/data
entry for data.MLFLOW and include it in vars/name) so that data.MLFLOW is
available for manifest substitution; target the "MLFLOW" symbol in params.py and
the vars block in config/base/kustomization.yaml (referencing data.MLFLOW) when
making the change.

---

Duplicate comments:
In `@scripts/release/params.py`:
- Line 50: The MLFLOW default value is a malformed JSON string ("{\"}\"") and
will break JSON parsing; update the "MLFLOW" entry to contain valid JSON (for
example an empty object "{}" or another appropriate JSON payload) and ensure the
string is properly escaped to match the surrounding file format so JSON parsing
succeeds; locate the "MLFLOW" key in scripts/release/params.py and replace the
invalid value accordingly.

---

Nitpick comments:
In `@controllers/dspipeline_params.go`:
- Around line 147-168: The struct tags include unused `mapstructure` tags (in
PluginConfig, TLSConfig, MLflowPluginSettings) while only json.Marshal is used
in BuildMLflowPluginConfigJson; remove the `mapstructure:"..."` tags from the
fields (Endpoint, Timeout, TLS, Settings, InsecureSkipVerify, CABundlePath,
WorkspacesEnabled, ExperimentDescription, DefaultExperimentName, KFPBaseURL,
InjectUserEnvVars) so only `json` tags remain to reduce noise and keep
serialization consistent with BuildMLflowPluginConfigJson.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7962928e-9fac-471d-ad34-bd737826f12f

📥 Commits

Reviewing files that changed from the base of the PR and between 35558d9 and 51b88cb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum, !**/*.sum
📒 Files selected for processing (9)
  • api/v1/dspipeline_types.go
  • config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml
  • config/internal/apiserver/default/rolebinding_ds-pipeline.yaml.tmpl
  • config/internal/apiserver/default/rolebinding_pipeline-runner.yaml.tmpl
  • controllers/dspipeline_params.go
  • controllers/util/util.go
  • go.mod
  • main.go
  • scripts/release/params.py
✅ Files skipped from review due to trivial changes (3)
  • main.go
  • go.mod
  • config/internal/apiserver/default/rolebinding_ds-pipeline.yaml.tmpl

Comment on lines +246 to +263
func (p *DSPAParams) RetrieveMLflowEndpoint(ctx context.Context, client client.Client, log logr.Logger) (string, error) {
	mlflowName := "mlflow"
	mlflow := &mlflowv1.MLflow{}
	namespacedName := types.NamespacedName{
		Name:      mlflowName,
		Namespace: p.Namespace,
	}
	err := client.Get(ctx, namespacedName, mlflow)
	if err != nil {
		log.V(1).Info(fmt.Sprintf("Unable to retrieve mlflow resource [%s].", mlflowName))
		return "", err
	}
	status := mlflow.Status
	if status.Address != nil && status.Address.URL != "" {
		return mlflow.Status.Address.URL, nil
	}
	return "", errors.New("MLflow resource missing Status.Address.URL field. Unable to resolve endpoint")
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify RBAC rules for MLflow resources
echo "Checking for MLflow RBAC rules in role.yaml..."
rg -i "mlflow" config/rbac/role.yaml || echo "No MLflow rules found in role.yaml"

echo ""
echo "Checking MLflow operator API group..."
rg -n "GroupVersion" --type go -A2 | grep -i mlflow || echo "Could not determine MLflow API group"

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 247


🏁 Script executed:

#!/bin/bash
# Find MLflow type imports in the controller file
echo "=== Searching for MLflow imports in dspipeline_params.go ==="
rg "mlflow" controllers/dspipeline_params.go -i -B2 -A2

echo ""
echo "=== Looking for MLflow type definitions ==="
fd -e go -type f | xargs rg "type MLflow struct" -l

echo ""
echo "=== Inspecting role.yaml structure ==="
head -50 config/rbac/role.yaml

echo ""
echo "=== Checking full MLflow API references ==="
rg "mlflowv1\.|mlflow\..*io" --type go -l | head -5

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 4654


🏁 Script executed:

#!/bin/bash
# Search for MLflow CRD or operator information
echo "=== Looking for MLflow GroupVersion or API group ==="
rg "GroupVersion|group.*mlflow" --type go -i | head -20

echo ""
echo "=== Check if MLflow is optional or required ==="
grep -n "RetrieveMLflowEndpoint" controllers/dspipeline_params.go -A5 -B5 | head -30

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 1827


🏁 Script executed:

#!/bin/bash
# Search for any existing RBAC configurations for other CRs
echo "=== Current RBAC rules structure in role.yaml ==="
rg -A3 "^- apiGroups:" config/rbac/role.yaml | head -50

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 778


🏁 Script executed:

#!/bin/bash
# Check how argoproj RBAC is configured (as reference for external CRs)
echo "=== Argoproj RBAC rules in role.yaml ==="
rg -A5 "argoproj" config/rbac/role.yaml

echo ""
echo "=== Full list of apiGroups in role.yaml ==="
rg "^  - apiGroups:" -A1 config/rbac/role.yaml | grep -E "^  - apiGroups:|^    - " | head -30

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 380


🏁 Script executed:

#!/bin/bash
# Search for any references to mlflow operator API group in the codebase
echo "=== Looking for mlflow-operator imports and schema ==="
rg "github.com/opendatahub-io/mlflow-operator" --type go

echo ""
echo "=== Checking if there's documentation about MLflow API group ==="
fd -e md -type f | xargs rg "mlflow.*group|api.*group" -i 2>/dev/null | head -10

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 961


🏁 Script executed:

#!/bin/bash
# Check if there's a go.mod or go.sum that references mlflow-operator version
echo "=== MLflow operator dependency in go.mod ==="
rg "mlflow-operator" go.mod -A1 -B1

echo ""
echo "=== Check main.go for MLflow operator setup ==="
grep -n "mlflowv1" main.go -A3 -B3

echo ""
echo "=== Verify the exact import path and look for API group ==="
head -100 main.go | tail -50

Repository: opendatahub-io/data-science-pipelines-operator

Length of output: 2874


🌐 Web query:

mlflow-operator opendatahub api group schema

💡 Result:

The mlflow-operator from OpenDataHub uses the API group mlflow.kubeflow.org with version v1. The primary Custom Resource is MLflowConfig in the package github.com/opendatahub-io/mlflow-operator/api/mlflowconfig/v1. MLflowConfig is a namespaced singleton resource (metadata.name must be "mlflow") used to override artifact storage settings per namespace. It references spec.artifactRootSecret named "mlflow-artifact-connection". Example usage from documentation:

apiVersion: mlflow.kubeflow.org/v1
kind: MLflowConfig
metadata:
  name: mlflow
spec:
  artifactRootSecret: mlflow-artifact-connection

The full Go API schema documentation is available at pkg.go.dev/github.com/opendatahub-io/mlflow-operator/api/mlflowconfig/v1 (published 2026-02-18). Sample CRs are in the repository at config/samples/, including mlflow_v1_mlflow.yaml for OpenShift deployment with local storage. The operator deploys MLflow using internal Helm charts, supporting OpenDataHub (default) and RHOAI modes, with built-in Kubernetes auth and storage configuration for artifacts (local, PVC, S3-compatible).


Add RBAC rules for MLflow CR access (CWE-862).

client.Get at line 253 retrieves the MLflow CR from the mlflow.kubeflow.org API group, but config/rbac/role.yaml contains no rules granting access. This will fail with 403 Forbidden when MLflow integration is enabled.

Add to config/rbac/role.yaml:

- apiGroups:
  - mlflow.kubeflow.org
  resources:
  - mlflows
  verbs:
  - get
  - list
  - watch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/dspipeline_params.go` around lines 246 - 263,
RetrieveMLflowEndpoint uses client.Get to read MLflow CRs (mlflow.kubeflow.org)
via the Kubernetes API, but the cluster Role in config/rbac/role.yaml lacks
permissions; update the Role to grant the mlflow.kubeflow.org apiGroup access to
the mlflows resource with verbs get, list, and watch so client.Get in
RetrieveMLflowEndpoint (and any calls to mlflow resources) won't be denied by
RBAC.

Comment on lines +114 to +133
func UpdateConfigMapByName(ctx context.Context, cfgName, ns string, client client.Client, field, value string) error {
	cm, err := GetConfigMap(ctx, cfgName, ns, client)
	if err != nil {
		return fmt.Errorf("failed to get configmap %s/%s: %w", ns, cfgName, err)
	}
	cmField := cm.Data[field]
	if cmField == "" {
		return fmt.Errorf("configmap field %s does not exist", field)
	}
	updatedField, err := updateJSON(cmField, field, value)
	if err != nil {
		return fmt.Errorf("failed to update configmap field %s/%s: %w", ns, field, err)
	}
	cm.Data[field] = updatedField
	err = UpdateConfigMap(ctx, cm, client)
	if err != nil {
		return fmt.Errorf("failed to update configmap %s/%s: %w", ns, cfgName, err)
	}
	return nil
}

⚠️ Potential issue | 🔴 Critical

Double-encoding bug: JSON string value will be escaped.

At line 123, updateJSON(cmField, field, value) is called where value is already a JSON string (from BuildMLflowPluginConfigJson). Inside updateJSON, line 321 does obj[field] = value, then line 322 marshals the entire object. This double-encodes value: the JSON string becomes an escaped string literal.

Expected result: {"config.json": {"endpoint": "..."}}
Actual result: {"config.json": "{\"endpoint\": \"...\"}"}

Fix by unmarshaling value before assignment:

-func updateJSON(jsonStr string, field string, value any) (string, error) {
+func updateJSON(jsonStr string, field string, value string) (string, error) {
 	var obj map[string]any
 	if err := json.Unmarshal([]byte(jsonStr), &obj); err != nil {
 		return "", fmt.Errorf("invalid JSON input: %w", err)
 	}
-	obj[field] = value
+	var parsed any
+	if err := json.Unmarshal([]byte(value), &parsed); err != nil {
+		return "", fmt.Errorf("invalid JSON value: %w", err)
+	}
+	obj[field] = parsed
 	out, err := json.Marshal(obj)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/util/util.go` around lines 114 - 133, UpdateConfigMapByName
currently passes a JSON string returned by BuildMLflowPluginConfigJson into
updateJSON causing double-encoding (updateJSON does obj[field] = value then
json.Marshal). Fix by unmarshaling the JSON string into a Go value before
calling updateJSON (or change updateJSON to accept an interface{} and unmarshal
there): parse value (the JSON text) into an interface{} (e.g., var v
interface{}; json.Unmarshal([]byte(value), &v)) and pass that unmarshaled v to
updateJSON so updateJSON assigns a structured object to obj[field] instead of a
string, avoiding the escaped JSON literal issue with obj[field] = value.

Comment on lines +119 to +122
	cmField := cm.Data[field]
	if cmField == "" {
		return fmt.Errorf("configmap field %s does not exist", field)
	}

⚠️ Potential issue | 🟡 Minor

Empty string check doesn't distinguish missing key from empty value.

cm.Data[field] returns empty string for both missing keys and keys with empty values. The error message claims the field "does not exist" but that's not necessarily true.

-	cmField := cm.Data[field]
-	if cmField == "" {
-		return fmt.Errorf("configmap field %s does not exist", field)
+	cmField, exists := cm.Data[field]
+	if !exists {
+		return fmt.Errorf("configmap field %s does not exist", field)
+	}
+	if cmField == "" {
+		return fmt.Errorf("configmap field %s is empty", field)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	cmField := cm.Data[field]
-	if cmField == "" {
-		return fmt.Errorf("configmap field %s does not exist", field)
-	}
+	cmField, exists := cm.Data[field]
+	if !exists {
+		return fmt.Errorf("configmap field %s does not exist", field)
+	}
+	if cmField == "" {
+		return fmt.Errorf("configmap field %s is empty", field)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/util/util.go` around lines 119 - 122, The check using cmField :=
cm.Data[field] conflates a missing key with an empty value; change to use the
presence form (cmField, ok := cm.Data[field]) and if !ok return
fmt.Errorf("configmap field %s does not exist", field), else if cmField == ""
return fmt.Errorf("configmap field %s is empty", field) so the error messages
accurately reflect whether the key is missing or present-but-empty (update the
code around cmField, cm.Data and the existing error returns).
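
The comma-ok form behaves as the suggestion describes; a standalone check:

```go
package main

import "fmt"

func main() {
	data := map[string]string{"config.json": ""}

	// Comma-ok distinguishes a key that is present but empty...
	v, ok := data["config.json"]
	fmt.Println(ok, v == "") // true true

	// ...from a key that is absent entirely.
	_, ok = data["missing"]
	fmt.Println(ok) // false
}
```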

 "FIPSENABLED": "false",
-"WEBHOOK_ANNOTATIONS": ""
+"WEBHOOK_ANNOTATIONS": "",
+"MLFLOW": "{\"}\"",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wire MLFLOW through kustomize vars or remove it from release params.

Line 50 adds MLFLOW, but config/base/kustomization.yaml var definitions do not include data.MLFLOW, so this setting is currently inert for manifest substitution.

Proposed wiring in config/base/kustomization.yaml
 vars:
+  - name: MLFLOW
+    objref:
+      kind: ConfigMap
+      name: dspo-parameters
+      apiVersion: v1
+    fieldref:
+      fieldpath: data.MLFLOW

As per coding guidelines, prioritize architectural issues and anti-patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/release/params.py` at line 50, The release params file adds a
"MLFLOW" entry that isn't being used by kustomize; either wire it into kustomize
vars or remove it: update scripts/release/params.py to remove the "MLFLOW" key
if you don't need it, or add a matching var in config/base/kustomization.yaml
(declare a varRef/data entry for data.MLFLOW and include it in vars/name) so
that data.MLFLOW is available for manifest substitution; target the "MLFLOW"
symbol in params.py and the vars block in config/base/kustomization.yaml
(referencing data.MLFLOW) when making the change.

@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml (1)

613-627: Previous schema issues resolved; add a description for the mlflow object.

The past review findings (missing type: object, missing type: string on integrationMode, PascalCase field names) have all been addressed. The schema is now structurally correct.

One remaining gap: the mlflow property itself lacks a description field. Other peer properties like mlmd (line 628) also omit this, but for a new user-facing feature, a brief description improves discoverability.

Suggested description addition
             mlflow:
+              description: |-
+                MLflow can be optionally enabled as an API server plugin for tracking
+                pipeline runs in MLflow experiments.
               properties:
                 injectUserEnvVars:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml`
around lines 613-627: the mlflow property in the CRD schema is missing a
user-facing description. Update the mlflow object (the block that defines
properties injectUserEnvVars and integrationMode) to include a concise
description explaining what the mlflow section configures (e.g., MLflow
integration settings for KFP, toggling plugin integration and whether to inject
user env vars), so consumers see its purpose and usage when viewing the CRD.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 86a704a2-06d1-4d23-a666-b484e471d9ba

📥 Commits

Reviewing files that changed from the base of the PR and between 51b88cb and 3ac3fed.

📒 Files selected for processing (4)
  • api/v1/dspipeline_types.go
  • api/v1/zz_generated.deepcopy.go
  • config/crd/bases/datasciencepipelinesapplications.opendatahub.io_datasciencepipelinesapplications.yaml
  • config/internal/apiserver/default/rolebinding_pipeline-runner.yaml.tmpl
🚧 Files skipped from review as they are similar to previous changes (2)
  • config/internal/apiserver/default/rolebinding_pipeline-runner.yaml.tmpl
  • api/v1/dspipeline_types.go

@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

4 similar comments

// Indicates whether to enable MLflow plugin on API server if plugin installed on cluster,
// or disable entirely.
// +kubebuilder:validation:Optional
IntegrationMode IntegrationMode `json:"integrationMode,omitempty"`

can you set default to autodetect

// +kubebuilder:default=AUTODETECT

Contributor Author

Updated, added // +kubebuilder:default=false for InjectUserEnvVars as well.
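Taken together, the requested markers would make the API type look roughly like this — an illustrative reconstruction of the PR's types, not the exact DSPO source:

```go
package main

import "fmt"

// IntegrationMode mirrors the CRD enum discussed above (AUTODETECT, DISABLED).
// +kubebuilder:validation:Enum=AUTODETECT;DISABLED
type IntegrationMode string

// MLflow sketches the spec.apiServer.mlflow API type with the requested
// defaults; field names come from the PR, markers from the review thread.
type MLflow struct {
	// +kubebuilder:validation:Optional
	// +kubebuilder:default=AUTODETECT
	IntegrationMode IntegrationMode `json:"integrationMode,omitempty"`

	// +kubebuilder:validation:Optional
	// +kubebuilder:default=false
	InjectUserEnvVars bool `json:"injectUserEnvVars,omitempty"`
}

func main() {
	// The kubebuilder defaults are applied by the API server at admission
	// time, not by Go zero values; this just shows the shape of the type.
	fmt.Printf("%+v\n", MLflow{IntegrationMode: "AUTODETECT"})
}
```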

@openshift-ci

openshift-ci bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from humairak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

@alyssacgoins alyssacgoins force-pushed the mlflow-plugin-configs branch from 5f36953 to 448ce67 Compare April 2, 2026 19:29
@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

Signed-off-by: Alyssa Goins <agoins@redhat.com>
@alyssacgoins alyssacgoins force-pushed the mlflow-plugin-configs branch from 448ce67 to f5c9ad7 Compare April 2, 2026 19:34
@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

Signed-off-by: Alyssa Goins <agoins@redhat.com>
@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

Signed-off-by: Alyssa Goins <agoins@redhat.com>
Signed-off-by: Alyssa Goins <agoins@redhat.com>
@dsp-developers

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-1000

Comment on lines +28 to +29
"Plugins": {
"mlflow": {{ if .APIServerPluginsJson }}

@HumairAK HumairAK Apr 3, 2026


Is this resulting in valid JSON? It looks like it's missing a closing `}`. I think you fixed this already. The following still applies though:

Can we add a unit test for the rendering of server-config.yaml.tmpl with the new MLflow config? The declarative test framework already supports this, it should just be a matter of adding an expected ConfigMap to one of the existing test cases under controllers/testdata/declarative/, or adding a focused assertion in apiserver_test.go. Minimal effort given the existing infrastructure.
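As a starting point, a focused rendering check could look like the sketch below. The reduced template string and the `APIServerPluginsJson` field are stand-ins mirroring the snippet above, not the real server-config.yaml.tmpl:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// pluginsTmpl is a hypothetical reduced form of the Plugins section of
// server-config.yaml.tmpl; the real template lives in the DSPO repo.
const pluginsTmpl = `{
  "Plugins": {
    "mlflow": {{ if .APIServerPluginsJson }}{{ .APIServerPluginsJson }}{{ else }}{}{{ end }}
  }
}`

// renderPlugins executes the template and reports whether the output is
// valid JSON, which is the property the requested unit test should assert.
func renderPlugins(pluginsJSON string) (string, bool) {
	t := template.Must(template.New("cfg").Parse(pluginsTmpl))
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ APIServerPluginsJson string }{pluginsJSON}); err != nil {
		return "", false
	}
	return buf.String(), json.Valid(buf.Bytes())
}

func main() {
	for _, in := range []string{``, `{"endpoint":"http://mlflow:5000"}`} {
		out, ok := renderPlugins(in)
		fmt.Printf("valid=%v out=%s\n", ok, out)
	}
}
```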

Comment on lines +279 to +282
if isCRDAvailable(restCfg, mlflowv1.GroupVersion.String()) {
mlflowSyncPeriod := 5 * time.Minute
mgrOpts.Cache.ByObject[&mlflowv1.MLflow{}] = cache.ByObject{SyncPeriod: &mlflowSyncPeriod}
setupLog.Info("MLflow CRD detected, caching MLflow resources", "syncPeriod", mlflowSyncPeriod)

(AI Curated response)

The isCRDAvailable check in main.go only runs at operator startup. If the MLflow CRD is installed after DSPO is already running, it won't be detected and the cache won't be configured — requiring an operator restart.

Instead of relying on the manager cache for the MLflow CR, consider fetching the MLflow endpoint directly via an uncached API call in RetrieveMLflowEndpoint and caching the result in memory with a TTL (e.g. a simple struct with a sync.Mutex, the cached value, and an expiry timestamp). This way:

  1. No startup-time CRD detection needed — the lookup happens lazily during reconciliation.
  2. The MLflow CRD/CR can be installed at any time without restarting DSPO.
  3. The TTL controls how often you hit the API server, giving you the same performance benefit as the cached approach.

The mlflowv1 scheme registration can stay since it's needed to deserialize the MLflow CR, but the manager cache configuration and isCRDAvailable check in main.go could be removed.

@openshift-ci

openshift-ci bot commented Apr 7, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

4 participants