Commit 31f5258
[OPIK-5759] [BE/FE] feat: multi-provider LLM-as-judge support for eval suite assertions (#6167)
* [OPIK-5759] [BE/FE] feat: multi-provider LLM-as-judge support for eval suite assertions
Support OpenAI, Anthropic, and Gemini as LLM-as-judge providers for eval suite
assertions. Previously, OpenAI was the only supported provider, and only implicitly. The model is resolved
from connected providers using a priority order (OpenAI > Anthropic > Gemini).
Backend:
- Add SupportedJudgeProvider enum with provider-to-model mapping
- Move provider resolution from config into EvalSuiteAssertionSampler
- Keep EvalSuiteEvaluatorMapper as a pure data transformer
- Remove unused defaultModelName from EvalSuiteConfig
- Fix: don't send snakeCased config keys to backend (camelCase expected)
Frontend:
- Add provider validation: disable run button when no supported provider is connected
- Pass pre-computed boolean to RunOnDatasetDialog instead of raw provider keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
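The priority-based resolution described above (OpenAI > Anthropic > Gemini) could be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the enum name JudgeProvider, the resolveModel signature, and the placeholder model names are all assumptions. It relies on the enum's declaration order to encode the priority.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for SupportedJudgeProvider: declaration order doubles as priority.
enum JudgeProvider {
    OPENAI("openai-judge-model"),        // placeholder model names, not the real defaults
    ANTHROPIC("anthropic-judge-model"),
    GEMINI("gemini-judge-model");

    private final String judgeModel;

    JudgeProvider(String judgeModel) {
        this.judgeModel = judgeModel;
    }

    String judgeModel() {
        return judgeModel;
    }

    // Returns the judge model of the highest-priority connected provider, if any.
    static Optional<String> resolveModel(Set<JudgeProvider> connected) {
        for (JudgeProvider provider : values()) {
            if (connected.contains(provider)) {
                return Optional.of(provider.judgeModel());
            }
        }
        return Optional.empty();
    }
}

class JudgeProviderDemo {
    public static void main(String[] args) {
        // Anthropic outranks Gemini, so its model is selected.
        Set<JudgeProvider> connected = EnumSet.of(JudgeProvider.GEMINI, JudgeProvider.ANTHROPIC);
        System.out.println(JudgeProvider.resolveModel(connected));
    }
}
```

Keeping the mapping in the enum means that adding a provider is a one-line change, and the priority stays visible in the declaration order.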
* refactor(eval-suite): improve evaluator mapper clarity and add unit tests
- Merge deserialization and model assignment into deserializeScoringCode()
- Remove separate setModel/withModel method
- Add parameterized unit tests for resolveModel() provider priority
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: remove FE provider blocking, add backend model fallback from trace metadata
- Remove frontend logic that blocked eval suite runs without a supported
LLM provider (revert llm.ts, PlaygroundHeader, RunOnDatasetDialog)
- Resolve model once per batch on backend: try connected providers
(OpenAI > Anthropic > Gemini > Vertex AI), fall back to completion
task model from trace metadata
- Add SupportedJudgeProvider enum referencing model name enums for
compile-time safety
- Store eval_suite_model in trace metadata for fallback resolution
- Add Vertex AI as supported eval suite judge provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
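A minimal sketch of the per-batch resolution with the trace-metadata fallback might look like the following. Only the metadata key eval_suite_model and the provider order come from the commit message; the class names, the method signature, and the model names are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical provider enum; declaration order is the priority order from the commit message.
enum BatchJudgeProvider {
    OPENAI("openai-judge-model"),          // placeholder model names
    ANTHROPIC("anthropic-judge-model"),
    GEMINI("gemini-judge-model"),
    VERTEX_AI("vertex-judge-model");

    private final String judgeModel;

    BatchJudgeProvider(String judgeModel) {
        this.judgeModel = judgeModel;
    }

    String judgeModel() {
        return judgeModel;
    }
}

final class BatchModelResolver {
    static final String METADATA_KEY = "eval_suite_model";

    private BatchModelResolver() {
    }

    // Tries connected providers in priority order, then falls back to the
    // completion task model stored in the trace metadata (assumed to be a flat map here).
    static Optional<String> resolve(Set<BatchJudgeProvider> connected, Map<String, ?> traceMetadata) {
        return Arrays.stream(BatchJudgeProvider.values())
                .filter(connected::contains)
                .map(BatchJudgeProvider::judgeModel)
                .findFirst()
                .or(() -> Optional.ofNullable(traceMetadata.get(METADATA_KEY)).map(Object::toString));
    }
}

class BatchModelResolverDemo {
    public static void main(String[] args) {
        // No provider connected: the model recorded on the trace is used instead.
        System.out.println(BatchModelResolver.resolve(Set.of(), Map.of("eval_suite_model", "task-model")));
    }
}
```

Calling resolve once and reusing the result for every trace in the batch would avoid repeating the provider lookup per assertion, which appears to be the intent of "resolve model once per batch".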
* fix: address PR review comments for eval suite model resolution
- Extract SupportedJudgeProvider to its own file
- Use streams instead of for-loop in resolveModel
- Preserve model parameters (temperature, seed, customParameters) when overriding name
- Guard against null modelName with early return and warning log
- Replace hardcoded model strings in tests with enum constants
- Use imported Context instead of qualified reactor.util.context.Context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
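The two behavioural fixes in this list, preserving model parameters when overriding the name and guarding against a null modelName, can be illustrated with a small sketch. The record JudgeModelConfig and the helper JudgeModelOverride are hypothetical names; the real types in the backend may be shaped differently, and java.util.logging stands in for whatever logger the project uses.

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical model configuration carrying the parameters named in the commit message.
record JudgeModelConfig(String name, Double temperature, Integer seed, Map<String, String> customParameters) {

    // Copies every parameter and replaces only the model name.
    JudgeModelConfig withName(String newName) {
        return new JudgeModelConfig(newName, temperature, seed, customParameters);
    }
}

final class JudgeModelOverride {
    private static final Logger LOG = Logger.getLogger(JudgeModelOverride.class.getName());

    private JudgeModelOverride() {
    }

    // Returns empty (after logging a warning) when no model name could be resolved,
    // so the caller can return early instead of scoring with a null model.
    static Optional<JudgeModelConfig> apply(JudgeModelConfig original, String resolvedModelName) {
        if (resolvedModelName == null) {
            LOG.warning("No judge model resolved; skipping assertion scoring");
            return Optional.empty();
        }
        return Optional.of(original.withName(resolvedModelName));
    }
}

class JudgeModelOverrideDemo {
    public static void main(String[] args) {
        JudgeModelConfig original = new JudgeModelConfig("old-model", 0.2, 42, Map.of("top_p", "0.9"));
        System.out.println(JudgeModelOverride.apply(original, "new-model"));
    }
}
```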
* fix: add proper import for reactor.util.context.Context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update EvalSuiteAssertionSamplerTest for null model guard
- Mock connected OpenAI provider in setUp so existing tests pass
- Add test for eval_suite_model metadata fallback when no provider connected
- Add test for early return when neither provider nor metadata model available
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 95ed74c commit 31f5258
File tree
10 files changed: +335 -39 lines changed
- apps
  - opik-backend
    - src
      - main/java/com/comet/opik
        - api/resources/v1/events
        - domain
        - infrastructure
      - test/java/com/comet/opik
        - api/resources/v1/events
        - domain
  - opik-frontend/src/api/playground
Lines changed: 42 additions & 6 deletions
Lines changed: 14 additions & 22 deletions
apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/events/SupportedJudgeProvider.java
Lines changed: 41 additions & 0 deletions
Lines changed: 1 addition & 0 deletions
Lines changed: 0 additions & 4 deletions