Releases: lablup/backend.ai
26.4.4rc2
Features
- Migrate image project membership check from `association_groups_users` to `association_scopes_entities` (ASE). (#11357)
Fixes
- Auto-load `.env` from the working directory in the v2 CLI, restoring the v1 behavior where `BACKEND_*` variables are picked up implicitly. (#11327)
- Apply domain and keypair-resource-policy `allowed_vfolder_hosts` when creating a project-owned vfolder. (#11347)
- Expose missing image, cluster, resource, and execution fields on the GraphQL `CreateDeploymentRevisionPresetInput` and `UpdateDeploymentRevisionPresetInput`, and fix the response type mismatch on `PresetExecutionSpec.environ` so that querying environment variables on a deployment revision preset no longer fails. (#11354)
- Reject negative resource slot quantities at `ResourceSlot._normalize_value` so that requests like `cpu: -1` fail fast with HTTP 400 (InvalidResourceSlotQuantity) at the API boundary instead of being deferred to the scheduler. (#11365)
- Replace the misleading `desired_replica_count` / `desired_replicas` input field with `replica_count` on deployment create/update across REST v1, GraphQL v2, and CLI. The previous name accepted an autoscaling-internal target that was overwritten on the next reconcile tick; clients now set the user-controllable `replica_count` directly, matching the data layer and the `endpoints.replicas` column. Response DTOs continue to expose `desired_replica_count` as a read-only view of the current scheduling state. (#11367)
- Fix superadmin session creation in `model-store` after the RBAC scope-binding migration by backfilling `association_scopes_entities` rows in seed fixtures. (#11368)
- Wipe a deployment's access tokens (`endpoint_tokens`) in the same transaction that flips the endpoint to `DESTROYING`, so destroyed deployments no longer leave behind never-expiring token rows that were still resolvable through direct lookups. (#11369)
- Evict orphan-revision routes (active routes whose `revision_id` is neither the endpoint's `current_revision` nor its `deploying_revision`) through `RouteEvictionHandler` alongside the existing scaling-group health-policy eviction, so a preempted rollout no longer leaves behind PROVISIONING / RUNNING routes pointing at a stale revision. (#11370)
- Allow re-activating a deployment revision while another rollout is still in progress: `set_deploying_revision` now overrides any in-flight `deploying_revision` and `activate_revision` no longer raises `DeploymentAlreadyInProgress`. Routes belonging to the preempted rollout are cleaned up on the next route tick by `RouteEvictionHandler`'s orphan-revision branch. (#11371)
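The orphan-revision rule from #11370 can be illustrated with a minimal sketch. Only the predicate itself (an active route whose `revision_id` matches neither `current_revision` nor `deploying_revision`) comes from the note above; the `Endpoint` and `Route` dataclasses, the `ACTIVE_STATUSES` set, and `find_orphan_routes` are hypothetical stand-ins and not the actual `RouteEvictionHandler` code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: which statuses count as "active" is illustrative only.
ACTIVE_STATUSES = frozenset({"PROVISIONING", "RUNNING"})


@dataclass(frozen=True)
class Endpoint:
    current_revision: Optional[str]
    deploying_revision: Optional[str]


@dataclass(frozen=True)
class Route:
    route_id: str
    revision_id: str
    status: str


def find_orphan_routes(endpoint: Endpoint, routes: list[Route]) -> list[Route]:
    """Return active routes that point at neither live revision of the endpoint."""
    live_revisions = {endpoint.current_revision, endpoint.deploying_revision} - {None}
    return [
        route
        for route in routes
        if route.status in ACTIVE_STATUSES and route.revision_id not in live_revisions
    ]


# rev-2 was being rolled out, then preempted by rev-3.
endpoint = Endpoint(current_revision="rev-1", deploying_revision="rev-3")
routes = [
    Route("r1", "rev-1", "RUNNING"),
    Route("r2", "rev-2", "PROVISIONING"),  # left over from the preempted rollout
    Route("r3", "rev-3", "PROVISIONING"),
    Route("r4", "rev-2", "TERMINATED"),    # not active, so not an eviction target
]
orphans = find_orphan_routes(endpoint, routes)
print([route.route_id for route in orphans])
```

In this sketch only `r2` is selected: it is still active but belongs to a revision the endpoint no longer references.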
Miscellaneous
- Drop Intel macOS (x86_64) builds from CI; only Apple Silicon (arm64) installers are produced going forward. (#11361)
- Always restart the Apollo Router halfstack container in `scripts/refresh-graphql-gateway.sh` instead of asking for an interactive `y/N` confirmation, so the script can be run unattended and never silently skips the restart. (#11363)
Full Changelog
Check out the full changelog until this release (26.4.4rc2).
Full Commit Logs
Check out the full commit logs between release (26.4.4rc1) and (26.4.4rc2).
26.4.3
Features
- Add `scope` resolver on `EntityRefGQL` so RBAC role scope connections can resolve the scope target (e.g., project, domain) directly in GraphQL. (#11107)
- Add `session_v2(id)` GraphQL single-node query with RBAC-enforced single-session reads across GraphQL, REST v2, SDK, and CLI. (#11124)
- Add image_id FK to kernels and image_ids to sessions for UUID-based image references (#11125)
- Add sglang runtime variant presets (#11129)
- Add admin API to refresh revisions for all active deployments, rebuilding each revision through `DeploymentController` so preset, deployment-config, and model_definition are re-resolved. Partial success is reported per deployment. (#11134)
- Add missing filter and order fields to deployment and revision search APIs, and fix `DeploymentOrders.updated_at` crash. (#11154)
- Expose `deploying_revision` on the `ModelDeployment` GraphQL node so clients can observe the revision currently being rolled out alongside `current_revision`. (#11156)
- Support partial `model_definition` input from preset, vfolder `model-definition.yaml`, and request override via the new `ModelDefinitionDraft` type; the merged draft is resolved into the strict `ModelDefinition` only at the persistence boundary. (#11167)
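The draft-then-resolve idea described in #11167 can be sketched as follows. This is an illustration under stated assumptions, not the real `ModelDefinitionDraft` API: the `DraftSketch` and `StrictSketch` classes, their three fields, and the precedence order (preset, then vfolder YAML, then request) are all hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class DraftSketch:
    """Partial input: every field may be missing (hypothetical fields)."""
    name: Optional[str] = None
    model_path: Optional[str] = None
    port: Optional[int] = None


@dataclass(frozen=True)
class StrictSketch:
    """Fully resolved form: every field is required."""
    name: str
    model_path: str
    port: int


def merge_drafts(*layers: DraftSketch) -> DraftSketch:
    """Merge layers left to right; a later layer wins where it sets a field."""
    merged = DraftSketch()
    for layer in layers:
        overrides = {
            f.name: getattr(layer, f.name)
            for f in fields(layer)
            if getattr(layer, f.name) is not None
        }
        merged = replace(merged, **overrides)
    return merged


def resolve(draft: DraftSketch) -> StrictSketch:
    """Validate completeness only once, at the (assumed) persistence boundary."""
    missing = [f.name for f in fields(draft) if getattr(draft, f.name) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return StrictSketch(**{f.name: getattr(draft, f.name) for f in fields(draft)})


preset = DraftSketch(name="llm", port=8000)
vfolder_yaml = DraftSketch(model_path="/models/llm")
request = DraftSketch(port=9000)

final = resolve(merge_drafts(preset, vfolder_yaml, request))
print(final)
```

Keeping the intermediate type fully optional lets each source contribute only what it knows, while the strict type guarantees that nothing incomplete is persisted.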
Improvements
- Add lazy resolver fields for FK references across GQL Node types and deprecate legacy Graphene stub resolvers with v2 dataloader-based alternatives. (#11120)
- Make DeploymentController the single authority for revision creation and activation, ensuring consistent preset application, RBAC, deployment strategy, and concurrency guards across all API paths (v2 and legacy). (#11126)
- Unify legacy and v2 deployment creation through `DeploymentController`, removing the `current_revision`-direct-assignment bug that bypassed the DEPLOYING strategy lifecycle on initial deploy and dropping the now-redundant `CHECK_PENDING` lifecycle stage. (#11167)
Fixes
- Add retry logic to `etcd_put_json` in TUI installer to handle etcd not being ready after halfstack startup (#10905)
- Add RBAC validation to v2 vfolder GET endpoint (#11062)
- Grant `project_admin_page:read` / `domain_admin_page:read` permissions to the auto-generated admin role when a new project or domain is created. (#11074)
- Add `--wait` flag to halfstack `docker compose up` to ensure etcd passes its healthcheck before the installer proceeds to configuration, preventing a gRPC race condition during `configure_manager()` (#11081)
- Fix prometheus query preset fixture failing with "Unconsumed column names: category" by using category_name alias with FixtureReferenceSpec (#11086)
- Fix missing enum filter handling across deployment, session, kernel, vfolder, audit-log, login-session, and login-history domains, and standardize all enum filters to support equals/in/not_equals/not_in operators consistently. (#11092)
- Fix auto-scaling rule `last_triggered_at` returning fake timestamps instead of null, and add `NullableDateTimeFilter` for filtering nullable datetime columns (#11102)
- Add independent `cookie_secure` config under `[security]` to set the Secure flag on session cookies, decoupled from `ssl_enabled` for reverse proxy SSL termination environments. (#11105)
- Increase default health check initial delay from 5 minutes to 30 minutes for all runtime variant generators to prevent premature failures during large model loading. (#11108)
- Automatically sync RBAC project-member role bindings when users are added to or removed from a project via `modifyGroup` or `modifyUser`. (#11116)
- Fix project creation to also create the member system role and backfill member roles for existing projects that were missing one. (#11118)
- Fix GQL serialization error for AgentStatus enum in agentsV2 query. (#11127)
- Correct category_id type to UUID in QueryDefinitionGQL and add GQL query/mutation support for prometheus query preset categories. (#11130)
- Fix alembic merge migration declaring non-head ancestors as parents, which caused `alembic upgrade head` to fail. (#11131)
- Fix inflated `total_count` in the `admin_roles` GraphQL query caused by an unused LEFT JOIN on `ObjectPermissionRow`. (#11132)
- Allow auto scaling rule updates to clear nullable fields (`min_threshold`, `max_threshold`, `min_replicas`, `max_replicas`, `prometheus_query_preset_id`) by sending an explicit `null`, while keeping omitted fields unchanged. (#11137)
- Validate that scope_id is a valid UUID for USER/PROJECT scope types in RBAC adapter, preventing email addresses from being stored as scope_id (#11138)
- Add RBAC validation to v2 session GET endpoint using SingleEntityActionProcessor (#11143)
- Fix incorrect vLLM default values in runtime variant preset fixture to match upstream defaults (#11144)
- Populate `deployment_revisions.model_definition` on both legacy endpoint creation (`POST /func/services`) and modify flows by running them through the unified revision merge pipeline so all sources (`deployment-config.yaml`, revision preset, `model-definition.yaml`, and request) flow through `RevisionDraft`. On modify, the current revision is used as the lowest-priority base so untouched fields are preserved while yaml/preset refreshes remain authoritative. (#11145)
- Fix `./bai admin deployment revision refresh` failing with a `TypeError` on every deployment after the revision-merge pipeline refactor. (#11148)
- Replace the non-existent `name` filter on deployment revisions with a `revision_number` filter and ordering across the DTO, GraphQL, REST, and CLI layers (`./bai deployment revision search --name-contains` is replaced by `--revision-number`). (#11150)
- Rename the `ModelDeployment.createdUser` / `createdUserV2` GraphQL fields to a single `creator` field. (#11152)
- Backfill missing role-to-scope mappings in `association_scopes_entities` for migration-created SYSTEM roles so that GraphQL scope resolution no longer returns null (#11159)
- Fix Prometheus range query 502 errors by accepting timezone-aware datetimes or Unix timestamps in CLI execute inputs (#11163)
- Populate revision-level fields on the legacy GQL endpoint response during the initial DEPLOYING phase by falling back to `deploying_revision` when `current_revision` is unset, expose `resource_slots` on the v2 revision response, and stop hard-coding cluster mode / size / runtime variant in the model-card and vfolder `deploy` adapters so the revision preset's values are no longer silently overridden. (#11167)
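The "explicit `null` clears, omitted keeps" behavior from #11137 is commonly built on a sentinel value. The sketch below shows that general pattern only; `UNSET`, `apply_update`, and the plain-dict representation of a rule are assumptions for illustration and not the project's actual DTO code.

```python
import json
from typing import Any


class _Unset:
    """Marker type for 'the client did not send this field at all'."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()

# Field names taken from the changelog entry above.
CLEARABLE_FIELDS = (
    "min_threshold",
    "max_threshold",
    "min_replicas",
    "max_replicas",
    "prometheus_query_preset_id",
)


def apply_update(current: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `current` with the payload applied field by field."""
    updated = dict(current)
    for key in CLEARABLE_FIELDS:
        value = payload.get(key, UNSET)
        if value is UNSET:
            continue          # omitted: keep the stored value
        updated[key] = value  # present: overwrite, and None clears the field
    return updated


rule = {"min_threshold": 0.2, "max_threshold": 0.8, "max_replicas": 10}
payload = json.loads('{"max_threshold": null}')  # explicit null for one field only

updated_rule = apply_update(rule, payload)
print(updated_rule)
```

Here `max_threshold` becomes `None` while `min_threshold` and `max_replicas` keep their stored values, because the payload never mentioned them.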
Documentation Updates
- Add data migration testing guideline to alembic CLAUDE.md. (#10936)
Miscellaneous
- Add a convenience script (`scripts/refresh-graphql-gateway.sh`) to regenerate the GraphQL schema, copy it to the project root, and optionally restart the Apollo Router gateway in one step. (#11091)
Full Changelog
Check out the full changelog until this release (26.4.3).
Full Commit Logs
Check out the full commit logs between release (26.4.2) and (26.4.3).
26.4.2
Features
- Add optional `activate` flag to `add_model_revision` API (#10468)
- Add TCP appproxy worker installation support in dev installer (#10650)
- Add the `login_client_types` table, model, data dataclass, and repository so administrators can register and manage login client types at runtime. (#10822)
- Add `owner_id` (delegated user UUID) to `EnqueueSessionInput` for delegated session ownership when enqueuing v2 sessions. (#10845)
- Expose the `login_client_types` entity via the Strawberry GraphQL schema: `loginClientType(id)` single query, `loginClientTypes` Connection query with filter/order/pagination, and `createLoginClientType` / `updateLoginClientType` / `deleteLoginClientType` mutations. (#10876)
- Add `./bai` CLI v2 commands for `login_client_types`: `./bai login-client-type list/get` (any authenticated user) and `./bai admin login-client-type create/update/delete` (super admin only). (#10878)
- Add `--otel-endpoint` and `--metric-access-cidr` options to TUI installer, configure announce-addr for manager/agent/storage-proxy, and add `[otel]` blocks to app-proxy halfstack configs (#10880)
- Add vLLM runtime variant preset fixtures with automatic runtime_variant_name FK resolution in fixture populate (#10889)
- Add the `login_client_type` service layer, v2 DTOs, and an admin-only search path (`LoginClientTypeAdminRepository` / `LoginClientTypeAdminService` / `LoginClientTypeAdminProcessors`) with filtering, ordering, and pagination support via `BatchQuerier`. (#10923)
- Add REST v2 CRUD endpoints for the `login_client_types` entity at `/v2/login-client-types/`, including a `/v2/login-client-types/search` endpoint with filtering, ordering, and pagination support. (#10924)
- Replace the hard-coded `LoginClientType` enum with a foreign-key reference to the `login_client_types` table in login sessions, allowing administrators to manage client types dynamically. (#10925)
- Add Client SDK v2 domain client and CLI v2 commands for the `login_client_types` entity: `./bai login-client-type get`, `./bai admin login-client-type search/create/update/delete`. (#10942)
- Add PROMETHEUS auto-scaling metric source that queries Prometheus directly via query presets, with bidirectional scaling support (scale-out/in thresholds in a single rule). (#10993)
- Add `user_id` filter to login session admin search and `admin_unblock_user` API to clear failed-login rate limit blocks (#11011)
- Add `creator_id` column to vfolders and wire VFolder ownership GQL resolvers (user, project, creator) to DataLoaders for proper entity resolution. (#11018)
- Add deployment-scoped Prometheus query presets with category system, description, rank, and vLLM example fixtures (#11072)
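As a rough illustration of what "bidirectional scaling in a single rule" (#10993) could look like, the sketch below evaluates one metric value against both thresholds. The comparison semantics, the fixed `step`, and the function name `decide_replicas` are assumptions; only the idea of holding scale-out and scale-in thresholds in one rule comes from the entry above.

```python
from typing import Optional


def decide_replicas(
    metric_value: float,
    current_replicas: int,
    *,
    min_threshold: Optional[float],
    max_threshold: Optional[float],
    min_replicas: Optional[int],
    max_replicas: Optional[int],
    step: int = 1,
) -> int:
    """Return the replica count one rule evaluation would ask for.

    Assumed semantics: above max_threshold scales out, below min_threshold
    scales in, anything in between leaves the count unchanged.
    """
    target = current_replicas
    if max_threshold is not None and metric_value > max_threshold:
        target = current_replicas + step
    elif min_threshold is not None and metric_value < min_threshold:
        target = current_replicas - step
    # Clamp to the configured bounds when they are set.
    if max_replicas is not None:
        target = min(target, max_replicas)
    if min_replicas is not None:
        target = max(target, min_replicas)
    return target


rule = dict(min_threshold=0.2, max_threshold=0.8, min_replicas=1, max_replicas=4)

scale_out = decide_replicas(0.9, 2, **rule)   # metric above the upper threshold
scale_in = decide_replicas(0.1, 2, **rule)    # metric below the lower threshold
steady = decide_replicas(0.5, 2, **rule)      # metric inside the band
capped = decide_replicas(0.9, 4, **rule)      # already at max_replicas
print(scale_out, scale_in, steady, capped)
```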
Improvements
- Delete login session rows on termination and record full session lifecycle events in login history (#11013)
- Add explicit LabelMatcher to Prometheus query presets to support regex matching operators (#11025)
Fixes
- Rename `TooManyConcurrentLoginSessions` error type from `too-many-concurrent-logins` to `active-login-session-exists` to match actual error semantics (#5691)
- Fix imagify API handler that incorrectly parsed POST body as query parameters by switching from QueryParam to BodyParam (#5694)
- Return HTTP 409 (Conflict) instead of 429 (Too Many Requests) for `TooManyConcurrentLoginSessions` error (#10992)
- Re-read model definition from vfolder when legacy `modify_endpoint` creates a new revision, so on-disk file changes are reflected. Also trigger `CHECK_REPLICA` lifecycle on revision-level field changes to notify the deployment controller. (#10994)
- Fix OIDC AUTHORIZE hook to read sToken from hook params before falling back to cookies, enabling token-login flow via JSON body. (#11002)
- Fix `GET /stream/session/{name}/execute` 500 error by sharing a single `PrivateContext` between the stream handler and its lifecycle hook, so `stream_execute_handlers` is initialized on the instance the handler reads at request time. (#11003)
- Fix per-container CUDA metric collection failing due to missing `container.show()` call in `gather_container_measures` (#11006)
- Fix double /func/ prefix in session-mode GQL path causing HTTP 404 (#11007)
- Fix 500 Internal Server Error when creating a session with an invalid or non-member project group by replacing plain `ValueError` with proper `BackendAIError` subclasses in `query_userinfo()`. (#11012)
- Fix RBAC action validators silently bypassing permission denials; legacy processor paths now observe denials via log and metric instead of raising. (#11014)
- Fix TERMINATED transition hook blocking session termination when model-definition.yaml is missing from storage for custom-runtime inference sessions. (#11019)
- Fix endpoint destroy failing with `UniqueViolationError` on `ix_endpoints_unique_name_when_not_destroyed` by narrowing the partial unique index predicate to exclude DESTROYING/DESTROYED states. (#11020)
- Make `client_type_id` optional in `AuthorizeRequest` so clients that do not specify a login client type (e.g., WebUI) can still authenticate, and add the missing migration for the `login_client_type_id` column on the `login_sessions` table. (#11022)
- Fix GQL user adapter to handle `not_equals` and `not_in` operations in status and role filter conversion, which were previously silently ignored. (#11024)
- Fix route health initial_delay calculation to use running_at instead of route creation time, preventing premature session termination for custom runtime variants with long model loading times. (#11029)
- Add missing `server_default` to `images.last_used_at` column so that new image rows without an explicit `last_used_at` value no longer violate the NOT NULL constraint. (#11031)
- Fix endpoint status to reflect route health check results instead of only lifecycle status (#11033)
- Set Secure flag on session cookie when SSL is enabled. (#11035)
- Fix Pydantic validation error when using `orderBy` in deployment-related GraphQL queries (`autoScalingRules`, `deployments`, `replicas`, `accessTokens`) (#11037)
- Fix Prometheus metrics silently missing on Linux by separating the multiprocess setup module to prevent import-time `ValueClass` misfire. (#11038)
- Fix orphan `login_sessions` rows after WebUI logout when authenticated via the keypair (sToken) login flow. (#11042)
- Fix `TypeError` in TOTP hook during stoken login by using attribute access on the user Row object. (#11064)
- Handle null `HostConfig.DeviceRequests` from Docker API in CUDA container measures to prevent `TypeError`. (#11070)
- Bypass RBAC permission checks for superadmin users in all action validators so superadmin operations (e.g. project creation) no longer fail with `NotEnoughPermission`. (#11071)
- Add RBAC validation to deployment get/update/destroy, fix keypair resource policy lookup by wrong column, and move resource-group CLI commands to admin scope (#11076)
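The effect of the initial-delay fix in #11029 can be seen in a small timing sketch. The helper `within_initial_delay` and the concrete timestamps are hypothetical; the point is only that measuring the grace period from `running_at` rather than from route creation keeps slow-starting routes from being judged too early.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def within_initial_delay(
    now: datetime,
    reference: Optional[datetime],
    initial_delay: timedelta,
) -> bool:
    """True while health-check failures should still be tolerated.

    Assumption: a route with no reference time yet is treated as still
    inside its grace period.
    """
    if reference is None:
        return True
    return now < reference + initial_delay


created_at = datetime(2026, 1, 1, 10, 0, tzinfo=timezone.utc)
running_at = created_at + timedelta(minutes=20)  # provisioning took 20 minutes
now = created_at + timedelta(minutes=35)
initial_delay = timedelta(minutes=30)

measured_from_creation = within_initial_delay(now, created_at, initial_delay)
measured_from_running = within_initial_delay(now, running_at, initial_delay)
print(measured_from_creation, measured_from_running)
```

Measured from creation, the grace period has already expired at the 35-minute mark even though the container has only been running for 15 minutes; measured from `running_at`, the route is still within its initial delay.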
Test Updates
- Add component test verifying that exceeding `max_concurrent_logins` returns HTTP 409 Conflict (#10997)
Full Changelog
Check out the full changelog until this release (26.4.2).
Full Commit Logs
Check out the full commit logs between release (26.4.1) and (26.4.2).
26.4.1
Fixes
- Fix Pydantic validation error when creating ModelCard with null framework, label, or accessLevel fields via GraphQL (#10921)
- Fix model service creation failing with Pydantic validation error when using fractional `cuda.shares` resource values (e.g., 2.5) (#10929)
- Fix backfill migration referencing dropped `permission_groups` table; use denormalized permissions schema instead. (#10933)
- Fix migration failure on BinarySize-suffixed resource_slots values (e.g. "32g", "4m"). (#10934)
- Fix superadmin unable to see other users' vfolders via vfolder_nodes GQL query due to empty ADMIN_PERMISSIONS (#10939)
- Skip event deserialization in event dispatcher when no consumer or subscriber is registered, preventing `ModuleNotFoundError` in appproxy coordinator (#10941)
- Fix legacy GQL endpoint resolvers crashing when routings is empty by using `is not None` check instead of truthiness check, and add missing `load_routes` in `load_all`. (#10948)
- Fix CLI v2 `RuntimeError: no running event loop` crash on aiohttp >= 3.13 by deferring `CookieJar` creation to an async context (#10954)
- Fix IndexError in health check handlers caused by incompatible `web.Request` annotation in `_wrap_api_handler`; now use `RequestCtx` parameter type. (#10958)
- Fix `ModelDefinition.merge()` corrupting `start_command` via index-based list merging by replacing `deep_merge()` with Pydantic-aware field-by-field merge functions (#10959)
- Normalize `None` routings to empty list in endpoint `to_data()` and `from_dto()` to fix NoneType iteration crashes (#10965)
- Fix GQL `my_client_ip` returning the hive-gateway proxy IP by forwarding the `X-Forwarded-For` header from hive-gateway to manager subgraph requests (note: `allowed_client_ip` configurations that whitelisted the hive-gateway IP as a workaround should be reviewed, as the manager will now see the real client IP via GQL) (#10966)
- Expose `AND` / `OR` / `NOT` composition on the `ModelCardV2Filter` GraphQL input so composed filter queries no longer fail with `Field "AND" is not defined` (#10970)
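To see why index-based list merging can corrupt a command line, as described in #10959, consider the sketch below. Both functions are hypothetical re-creations written for illustration; they are not the project's `deep_merge()` or its Pydantic-aware replacement, and the command values are made up.

```python
from typing import Any


def index_based_merge(base: Any, override: Any) -> Any:
    """Generic deep merge that merges lists position by position."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = index_based_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        merged_list = list(base)
        for i, item in enumerate(override):
            if i < len(merged_list):
                merged_list[i] = index_based_merge(merged_list[i], item)
            else:
                merged_list.append(item)
        return merged_list
    return override


def field_aware_merge(base: dict, override: dict) -> dict:
    """Merge nested mappings, but replace list-valued fields as a whole."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = field_aware_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"start_command": ["python", "-m", "server.main", "--port", "8000"]}
override = {"start_command": ["my-server", "--fast"]}

corrupted = index_based_merge(base, override)["start_command"]
intended = field_aware_merge(base, override)["start_command"]
print(corrupted)
print(intended)
```

The index-based variant leaves the tail of the old command attached to the new one, while the field-aware variant treats the command as a single value and replaces it.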
Full Changelog
Check out the full changelog until this release (26.4.1).
Full Commit Logs
Check out the full commit logs between release (26.4.0) and (26.4.1).
26.4.0
Features
v2 API, SDK & CLI
Delivered REST v2 endpoints for all 26 API domains, migrated GraphQL to Strawberry-backed Pydantic types with PydanticNodeMixin and domain Adapters, and added the v2 client SDK and CLI with entity-command structure covering admin CRUD, user self-service, and raw GraphQL operations.
- Add DataLoader for batched role assignment queries by user ID and `my_roles` field on UserV2 to prevent N+1 queries. (#9552)
- Migrate AuditLog GraphQL API to Strawberry with cursor-based pagination and filtering support (#10065)
- Add Strawberry GraphQL node type for ContainerRegistry to support RBAC entity resolution (#10093)
- Add `activeResourceOverview` GraphQL field to `Domain` and `Project` types, exposing currently occupied resource slots and active session count. (#10095)
- Add AND, OR, NOT logical operators to GraphQL filter types for complex boolean filter expressions. (#10250)
- Migrate GraphQL layer to Pydantic-backed types by introducing PydanticNodeMixin, domain Adapters, and @strawberry.experimental.pydantic.input across all GQL domains. (#10299)
- Add `update_deployment_policy` GQL mutation (#10300)
- Add `execute_bulk_purger_partial()` function to support partial failure handling for bulk delete operations with savepoint-based transaction isolation (#10332)
- Add UUID-based single-entity User CRUD (create/update/delete/purge) to the GraphQL v2 API, resolving six previously stubbed mutations. (#10403)
- Add `my_keypairs` GraphQL query to list the current user's keypairs with filter, orderBy, and cursor/offset pagination support. (#10404)
- Add `options` field to `PurgeUserV2Input` to control purge behavior (migrate shared vfolders, delegate endpoint ownership). (#10498)
- Add REST v2 API endpoints for all 26 domains under the `/v2/` prefix, reusing existing v2 DTO adapters shared with GraphQL. (#10499)
- Add v2 client SDK and CLI with `[admin] {entity} [{sub-entity}] {operation}` command structure, `~/.backend.ai/` config system, and `./bai` shortcut for all 26 domains. (#10504)
- Add admin CRUD mutations to v2 API for Domain, Project, ContainerRegistry, and Image entities with full stack coverage (Adapter, REST v2, SDK v2, CLI v2, GQL) (#10516)
- Add `./bai gql` CLI command and SDK client for sending raw GraphQL queries, supporting both legacy and Strawberry schemas. (#10539)
- Add VFolder adapter with admin search implementation including filter, order, and pagination support (#10569)
- Add v2 session REST API with enqueue, search (admin/my/project-scoped), get, terminate (batch), start/shutdown-service, logs, and update endpoints (#10599)
- Define VFolder Strawberry GQL node and nested field group types for the Graphene-to-Strawberry migration. (#10603)
- Add VFolder filter and order-by Strawberry GQL types for v2 queries with AND/OR/NOT logical operators (#10604)
- Add missing update, execute commands and admin CLI module for v2 prometheus-query-preset (#10606)
- Add v2 export REST API, client SDK, and CLI commands for CSV export operations (#10609)
- Add REST v2 and GraphQL endpoints for unassigning users from a project, with failure information for non-existent or unassigned user IDs. (#10632)
- Add REST v2 endpoint for assigning users to projects with RBAC enforcement (#10633)
- Add resource policy v2 API with Strawberry GQL, REST v2, SDK, and CLI for keypair/user/project resource policies, replacing JSON fields with typed structures. (#10634)
- Add Strawberry GraphQL resolvers for container registry v2 with search, create, update, delete operations and full filter/orderBy/pagination support. (#10635)
- Add resource group allow/disallow API for bidirectional domain and project association management with atomic add/remove in a single request (#10636)
- Add resource preset v2 CRUD API with shared BinarySizeInput/BinarySizeInfo types for byte-size fields (#10637)
- Add scope-based resource allocation v2 APIs with effective assignable computation and preset availability check (#10638)
- Add keypair admin CRUD v2 API (search, get, create, update, delete) across GQL, REST, SDK, and CLI (#10640)
- Add search_vfolders operation with repository, service, and processor layers (#10641)
- Add search_user_vfolders operation with repository, service, and processor layers (#10642)
- Add cloneable filter to VFolder my_search query pipeline (#10674)
- Wire `myVfolders` GraphQL query resolver and register `VFolderAdapter` for end-to-end user vfolder search (#10677)
- Add required `role_id` parameter to the assign-users-to-project API so that users receive a project role upon assignment. (#10688)
- Add WebSocket transport (`graphql-transport-ws` protocol) for Strawberry GraphQL subscriptions, enabling the Hive Gateway to forward subscriptions to the manager. (#10739)
- Add user.id (UUID) filter to ProjectUserFilter for /v2/projects/search (#10793)
- Add project-scoped role search API (GQL, REST v2, SDK, CLI) to discover roles available within a project. (#10794)
- Add `my_storage_host_permissions` query, `deployVFolder` mutation, admin SSH keypair management, `storage_host` filter for model card search, and `in` / `not_in` / `i_in` / `i_not_in` operators for StringFilter. (#10887)
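The AND / OR / NOT composition (#10250) and the string operators added in #10887 can be pictured with a small in-memory evaluator. Everything about the shape below is an assumption made for illustration: filters are plain dicts, `AND` and `OR` take lists, `NOT` takes a single filter, and the `i_` prefix is read as "case-insensitive". The real GraphQL input types and their SQL translation are not shown here.

```python
from typing import Any

# Hypothetical operator table; names follow the changelog, semantics are assumed.
STRING_OPS = {
    "equals": lambda value, operand: value == operand,
    "in": lambda value, operand: value in operand,
    "not_in": lambda value, operand: value not in operand,
    "i_in": lambda value, operand: value.lower() in {o.lower() for o in operand},
    "i_not_in": lambda value, operand: value.lower() not in {o.lower() for o in operand},
}


def match_string(value: str, condition: dict[str, Any]) -> bool:
    """All operators given for one field must hold."""
    return all(STRING_OPS[op](value, operand) for op, operand in condition.items())


def matches(record: dict[str, str], flt: dict[str, Any]) -> bool:
    """Evaluate a nested filter; sibling keys are combined with AND."""
    for key, condition in flt.items():
        if key == "AND":
            ok = all(matches(record, sub) for sub in condition)
        elif key == "OR":
            ok = any(matches(record, sub) for sub in condition)
        elif key == "NOT":
            ok = not matches(record, condition)
        else:
            ok = match_string(record[key], condition)
        if not ok:
            return False
    return True


records = [
    {"name": "llama", "storage_host": "nfs"},
    {"name": "tmp", "storage_host": "local"},
    {"name": "bert", "storage_host": "local"},
]
flt = {
    "OR": [
        {"name": {"i_in": ["LLAMA"]}},
        {"AND": [
            {"storage_host": {"in": ["local"]}},
            {"NOT": {"name": {"equals": "tmp"}}},
        ]},
    ]
}
selected = [r["name"] for r in records if matches(r, flt)]
print(selected)
```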
Pydantic DTO v2 Models
Defined comprehensive Pydantic v2 DTO types across all 26 API domains, establishing the typed Input/Node/Payload naming convention with SENTINEL pattern for nullable-clearable update fields and full unit test coverage.
- Add Pydantic DTO v2 model structure for RBAC Role domain with Input, Node, and Payload types, establishing conventions for future domain DTOs. (#10253)
- Add Pydantic DTO v2 models for `auth` and `acl` domains under `common/dto/manager/v2/`.
  The `auth` module includes 9 Input models and 10 Payload models with nested sub-models (AuthCredentialInfo, TwoFactorInfo, RoleInfo, SSHKeypairInfo, PasswordChangeInfo).
  The `acl` module includes GetPermissionsPayload and VFolderHostPermission re-export. (#10254)
- Add Pydantic DTO v2 models for manager API domains: config (Dotfile, BootstrapScript), etcd (ConfigKey, ResourceMetadata), system (SystemVersion), infra (ScalingGroup, ResourcePreset, Usage, Watcher, ContainerRegistry), and operations (ErrorLog, ManagerStatus, Announcement, SchedulerOps, SessionEvents). (#10255)
- Add Pydantic DTO v2 models for `scaling_group`, `resource_group`, `resource_slot`, and `resource_policy` domains under `common/dto/manager/v2/`, including request (Input), response (Node/Payload) models with SENTINEL pattern, and comprehensive unit tests for all four domains. (#10256)
- Add Pydantic DTO v2 models for `event_stream`, `streaming`, and `export` domains under `src/ai/backend/common/dto/manager/v2/`, with comprehensive unit tests for all three packages. (#10257)
- Add Pydantic DTO v2 models for `session`, `compute_session`, and `agent` domains under `src/ai/backend/common/dto/manager/v2/`, following the Input/Node/Payload naming convention with nested sub-models for semantic field grouping. (#10258)
- Add Pydantic DTO v2 models for user, domain, and group entities under `common/dto/manager/v2/`, with nested sub-models, SENTINEL pattern for nullable-clearable update fields, and comprehensive unit tests. (#10259)
- Add Pydantic DTO v2 models for `image`, `scheduling_history`, and `auto_scaling_rule` domains with full unit test coverage. (#10260)
- Add Pydantic DTO v2 models for vfolder, object_storage, quota_scope, and storage domains under `ai.backend.common.dto.manager.v2`, with comprehensive unit tests for each domain. ([#10261](https://github.com/lablu...
26.4.0rc1
Features
- Add shell auto-completion support for Backend.AI CLI (#7021)
- Add DataLoader for batched role assignment queries by user ID and `my_roles` field on UserV2 to prevent N+1 queries. (#9552)
- Add RBAC validator infrastructure to Session actions following BEP-1048 patterns (#9624)
- Migrate Session entities to RBAC database with entity-type permissions and AUTO scope associations (#9636)
- Add CLI commands for prometheus query definition admin CRUD and execution (#9641)
- Support cloning vfolders to a different quota scope by adding `target_quota_scope_id` parameter to the clone API. (#9741)
- Add per-container metric collection support for CUDA devices (#9787)
- Update ATOM plugin definition to be conformant of rebellions CDI architecture (#9788)
- Implement Rolling Update deployment strategy (#9997)
- Apply RBAC Creator pattern to ArtifactRevision for consistent entity creation and access control (#10021)
- Apply RBAC validator for App config actions (#10028)
- Apply RBAC validators to project (group) action processors for proper permission enforcement (#10029)
- Apply RBAC validator for Model Artifact Registry actions (#10032)
- Apply RBAC permission validators to model deployment service actions (#10033)
- Apply RBAC validator for Keypair actions to enforce permission checks on create, get, update, delete, and purge operations (#10051)
- Apply RBAC validator for User actions following the established pattern from Group, VFolder, and Session services (#10055)
- Apply RBAC validators to Image service actions for proper authorization checks (#10059)
- Migrate AuditLog GraphQL API to Strawberry with cursor-based pagination and filtering support (#10065)
- Add self-service keypair issue/revoke/switch GraphQL mutations (#10066)
- Add self-service IP allowlist mutation with lockout prevention (#10067)
- Seed built-in container utilization metric query presets (gauge, rate, diff) previously hardcoded in `ContainerUtilizationMetricService` as configurable DB fixtures and Alembic data migration (#10090)
- Add Strawberry GraphQL node type for ContainerRegistry to support RBAC entity resolution (#10093)
- Add `activeResourceOverview` GraphQL field to `Domain` and `Project` types, exposing currently occupied resource slots and active session count. (#10095)
- Add AND, OR, NOT logical operators to GraphQL filter types for complex boolean filter expressions. (#10250)
- Add Pydantic DTO v2 model structure for RBAC Role domain with Input, Node, and Payload types, establishing conventions for future domain DTOs. (#10253)
- Add Pydantic DTO v2 models for `auth` and `acl` domains under `common/dto/manager/v2/`.
  The `auth` module includes 9 Input models and 10 Payload models with nested sub-models (AuthCredentialInfo, TwoFactorInfo, RoleInfo, SSHKeypairInfo, PasswordChangeInfo).
  The `acl` module includes GetPermissionsPayload and VFolderHostPermission re-export. (#10254)
- Add Pydantic DTO v2 models for manager API domains: config (Dotfile, BootstrapScript), etcd (ConfigKey, ResourceMetadata), system (SystemVersion), infra (ScalingGroup, ResourcePreset, Usage, Watcher, ContainerRegistry), and operations (ErrorLog, ManagerStatus, Announcement, SchedulerOps, SessionEvents). (#10255)
- Add Pydantic DTO v2 models for `scaling_group`, `resource_group`, `resource_slot`, and `resource_policy` domains under `common/dto/manager/v2/`, including request (Input), response (Node/Payload) models with SENTINEL pattern, and comprehensive unit tests for all four domains. (#10256)
- Add Pydantic DTO v2 models for `event_stream`, `streaming`, and `export` domains under `src/ai/backend/common/dto/manager/v2/`, with comprehensive unit tests for all three packages. (#10257)
- Add Pydantic DTO v2 models for `session`, `compute_session`, and `agent` domains under `src/ai/backend/common/dto/manager/v2/`, following the Input/Node/Payload naming convention with nested sub-models for semantic field grouping. (#10258)
- Add Pydantic DTO v2 models for user, domain, and group entities under `common/dto/manager/v2/`, with nested sub-models, SENTINEL pattern for nullable-clearable update fields, and comprehensive unit tests. (#10259)
- Add Pydantic DTO v2 models for `image`, `scheduling_history`, and `auto_scaling_rule` domains with full unit test coverage. (#10260)
- Add Pydantic DTO v2 models for vfolder, object_storage, quota_scope, and storage domains under `ai.backend.common.dto.manager.v2`, with comprehensive unit tests for each domain. (#10261)
- Add Pydantic DTO v2 models for `artifact`, `artifact_registry`, and `container_registry` domains under `common/dto/manager/v2/`, including typed Input, Node, and Payload models with full unit test coverage. (#10262)
- Add Pydantic DTO v2 models (`types.py`, `request.py`, `response.py`, `__init__.py`) for `deployment`, `model_serving`, and `service_catalog` manager API domains, with comprehensive unit tests. (#10263)
- Add Pydantic DTO v2 models for `notification`, `error_log`, `fair_share`, and `prometheus_query_preset` domains under `src/ai/backend/common/dto/manager/v2/`, with comprehensive unit tests for each domain. (#10264)
- Use deploying-revision image for new route session creation (#10271)
-
Integrate pyinfra deployment framework from backend.ai-installer into the unified install package, enabling production deployment via PyInfra alongside existing Docker-based development setup.
Key additions:
- PyInfra framework (runner, configs, os_packages) with enterprise config schemas (enabled=False in OSS)
- OSS deploy scripts (os, halfstack, cores, monitor) - 318 files, 82K+ lines
- TUI PACKAGE mode now offers choice: Release Package (existing) or Production Deployment (PyInfra)
- Horizontal card layout with keyboard navigation for deployment type selection (#10275)
- Consolidate deploying handlers and remove unused sub-steps (#10276)
- Migrate GraphQL layer to Pydantic-backed types by introducing PydanticNodeMixin, domain Adapters, and @strawberry.experimental.pydantic.input across all GQL domains. (#10299)
- Add `update_deployment_policy` GQL mutation (#10300)
- Add internal health endpoint (`/health`) to the manager's internal app, and simplify the public health handler to a plain liveness probe. (#10308)
- Add `update_my_keypair` GQL mutation to allow users to toggle their keypair's active state (is_active) (#10309)
- Support resolving session entities in RBAC entity and permission scope queries (#10320)
- Add `execute_bulk_purger_partial()` function to support partial failure handling for bulk delete operations with savepoint-based transaction isolation (#10332)
- Add `PROJECT_ADMIN_PAGE` and `DOMAIN_ADMIN_PAGE` guarded RBAC entities for admin page access control. (#10334)
- Add repository-layer support for filtering role assignments by permissions via PermissionConditions and exists_permission_combined (#10397)
- Add UUID-based single-entity User CRUD (create/update/delete/purge) to the GraphQL v2 API, resolving six previously stubbed mutations. (#10403)
- Add `my_keypairs` GraphQL query to list the current user's keypairs with filter, orderBy, and cursor/offset pagination support. (#10404)
- Remove `rollback_on_failure` from DB schema, API, and related code (#10410)
- Add `last_used_at` real column to images table, replacing the computed subquery with a direct DB column updated o...
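
The savepoint-based partial failure handling mentioned for `execute_bulk_purger_partial()` (#10332) can be illustrated with a minimal sketch. This is not the Backend.AI implementation: it uses the standard-library `sqlite3` module, and the `parent`/`child` tables and the `purge_partial()` helper are hypothetical names chosen for the example. Each delete runs inside its own savepoint, so one failing row is rolled back without discarding the rows that were already purged.

```python
import sqlite3


def purge_partial(conn: sqlite3.Connection, ids: list[int]) -> tuple[list[int], list[int]]:
    """Delete each id inside its own savepoint; return (purged, failed)."""
    purged: list[int] = []
    failed: list[int] = []
    conn.execute("BEGIN")
    for item_id in ids:
        conn.execute("SAVEPOINT purge_one")
        try:
            conn.execute("DELETE FROM parent WHERE id = ?", (item_id,))
        except sqlite3.IntegrityError:
            # Undo only this row's work; earlier deletes stay intact.
            conn.execute("ROLLBACK TO SAVEPOINT purge_one")
            failed.append(item_id)
        else:
            purged.append(item_id)
        finally:
            conn.execute("RELEASE SAVEPOINT purge_one")
    conn.execute("COMMIT")
    return purged, failed


# Hypothetical schema: row 2 is still referenced by a child row, so its delete fails.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, "
    "parent_id INTEGER NOT NULL REFERENCES parent(id))"
)
conn.executemany("INSERT INTO parent VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO child VALUES (10, 2)")

purged, failed = purge_partial(conn, [1, 2, 3])
remaining = [row[0] for row in conn.execute("SELECT id FROM parent ORDER BY id")]
```

In this sketch the caller receives both lists and can report the failed ids while the successful deletes are committed together.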
26.3.3
Fixes
- Fix ON CONFLICT column mismatch in vfolder invitation RBAC remigration causing InvalidColumnReferenceError during alembic upgrade. (#10471)
Full Changelog
Check out the full changelog until this release (26.3.3).
Full Commit Logs
Check out the full commit logs between release (26.3.2) and (26.3.3).
26.3.2
Fixes
- Add safe Prometheus metric wrappers to prevent mmap error propagation into business logic (#10395)
- Remove duplicate `debug` field in the webserver's `config.toml.j2` template (#10423)
- Add missing OpenTelemetrySpec initialization in the manager, enabling trace and log export to the OTEL Collector. (#10439)
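
As an illustration of the "safe metric wrapper" idea in #10395, the sketch below wraps a metric-like object so that a failure while recording is logged and swallowed instead of reaching the caller. It is a simplified sketch under assumptions, not the code shipped in the release: `SafeMetric`, `BrokenCounter`, and `handle_request()` are hypothetical names, and `BrokenCounter` merely simulates a metric whose backing mmap file cannot be written.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


class SafeMetric:
    """Delegates to a metric-like object and swallows errors raised while recording."""

    def __init__(self, metric: Any) -> None:
        self._metric = metric
        self.suppressed = 0  # number of recording errors that were swallowed

    def _call(self, name: str, *args: Any) -> None:
        try:
            getattr(self._metric, name)(*args)
        except Exception:
            self.suppressed += 1
            log.warning("metric %s() failed; ignoring", name, exc_info=True)

    def inc(self, amount: float = 1.0) -> None:
        self._call("inc", amount)

    def observe(self, value: float) -> None:
        self._call("observe", value)


class BrokenCounter:
    """Hypothetical stand-in for a counter whose mmap-backed storage is unusable."""

    def inc(self, amount: float = 1.0) -> None:
        raise OSError("mmap write failed")


def handle_request(counter: SafeMetric) -> str:
    counter.inc()  # must never break the request path
    return "ok"


safe_counter = SafeMetric(BrokenCounter())
result = handle_request(safe_counter)
```

The business logic still returns its result; the only trace of the metric failure is the warning log and the `suppressed` count.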
Full Changelog
Check out the full changelog until this release (26.3.2).
Full Commit Logs
Check out the full commit logs between release (26.3.1) and (26.3.2).
26.3.1
Features
- Add AND, OR, NOT logical operators to GraphQL filter types for complex boolean filter expressions. (#10250)
- Add internal health endpoint (`/health`) to the manager's internal app, and simplify the public health handler to a plain liveness probe. (#10308)
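
To show how AND, OR, and NOT operators compose into a boolean filter expression (#10250), here is a small evaluator over plain dictionaries. It is only a sketch: the filter shape, the field names (`status`, `type`), and the `matches()` helper are assumptions made for illustration and do not reproduce the actual GraphQL filter types.

```python
from typing import Any, Mapping

Filter = Mapping[str, Any]


def matches(record: Mapping[str, Any], flt: Filter) -> bool:
    """Evaluate a nested filter; sibling keys in one filter are combined with AND."""
    for key, value in flt.items():
        if key == "AND":
            ok = all(matches(record, sub) for sub in value)
        elif key == "OR":
            ok = any(matches(record, sub) for sub in value)
        elif key == "NOT":
            ok = not matches(record, value)
        else:
            ok = record.get(key) == value  # plain field equality
        if not ok:
            return False
    return True


# Hypothetical records and filter: terminated sessions, or running non-batch sessions.
sessions = [
    {"name": "a", "status": "RUNNING", "type": "interactive"},
    {"name": "b", "status": "TERMINATED", "type": "batch"},
    {"name": "c", "status": "RUNNING", "type": "batch"},
]
flt = {
    "OR": [
        {"status": "TERMINATED"},
        {"AND": [{"status": "RUNNING"}, {"NOT": {"type": "batch"}}]},
    ]
}
selected = [s["name"] for s in sessions if matches(s, flt)]
```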
Improvements
- Add `TimeoutSeconds` annotated type to centralize and simplify session timeout validation in request DTOs. (#10267)
Fixes
- Fix global container registry RBAC migration to map to project scopes instead of domain scopes (#10082)
- Fix resource preset check returning incorrect occupancy when scaling groups have no active sessions (#10268)
- Fix session dependency GraphQL dataloaders returning empty results due to incorrect key mapping and missing eager loading (#10280)
- Restore `db` and `config_provider` access for webapp plugins (OpenID, TOTP) after DI refactoring by injecting them into the root app context (#10292)
- Add otp field to AuthorizeRequest and AuthorizeAction for TOTP two-factor authentication compatibility. (#10305)
- Exclude unmeasurable metrics from utilization idle check instead of treating stat collection failures as 0% usage (#10316)
- Restore `etcd` and `valkey_stat` access for webapp plugins (Cloud) after DI refactoring by injecting them into the root app context (#10318)
- Route authenticated TOTP endpoints through web_handler instead of anonymous handler (#10345)
Full Changelog
Check out the full changelog until this release (26.3.1).
Full Commit Logs
Check out the full commit logs between release (26.3.0) and (26.3.1).
25.11.4
Features
- Execute Resource Usage Recalculation periodically (#5646)
Improvements
- Add per-plugin timeout (120s) to `gather_container_measures` calls so a single hung plugin does not block stat collection from all other plugins (#9781)
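
The per-plugin timeout described above can be sketched with `asyncio.wait_for`. This is an illustrative sketch, not the agent's code: `gather_with_timeout()`, `fast_plugin()`, and `hung_plugin()` are hypothetical, and the timeout is shortened from the 120s in the release note so the example finishes quickly.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

PLUGIN_TIMEOUT = 0.05  # the release note uses 120s; shortened for the demo


async def gather_with_timeout(
    plugins: dict[str, Callable[[], Awaitable[Any]]],
    timeout: float,
) -> dict[str, Optional[Any]]:
    """Run every plugin concurrently; a plugin that exceeds the timeout yields None."""

    async def run_one(name: str, fn: Callable[[], Awaitable[Any]]) -> tuple[str, Optional[Any]]:
        try:
            return name, await asyncio.wait_for(fn(), timeout)
        except asyncio.TimeoutError:
            return name, None  # hung plugin is cancelled; others are unaffected

    pairs = await asyncio.gather(*(run_one(n, f) for n, f in plugins.items()))
    return dict(pairs)


async def fast_plugin() -> dict[str, float]:
    return {"cpu_util": 12.5}


async def hung_plugin() -> dict[str, float]:
    await asyncio.sleep(3600)  # simulates a plugin that never returns in time
    return {}


results = asyncio.run(
    gather_with_timeout({"cpu": fast_plugin, "hung": hung_plugin}, PLUGIN_TIMEOUT)
)
```

Because each plugin is wrapped individually, the overall collection time is bounded by the timeout rather than by the slowest plugin.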
Fixes
- Fix wrong value type of Valkey client address (#5649)
- Remove the unnecessary `asyncio.Lock` from `StatContext` as self-concurrency is already prevented by `TimerDelayPolicy.CANCEL` and each collect method operates on independent data structures (#9256)
- Fix container net_rx/net_tx stats reading host namespace counters due to unchecked setns() return value (#9681)
- Pre-validate namespace path before `netstat_ns()` to prevent thread pool exhaustion from hung threads on stale network namespaces (#9782)
- Exclude unmeasurable metrics from utilization idle check instead of treating stat collection failures as 0% usage (#10316)
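
The idle-check fix in #10316 can be pictured as follows. This is a hedged sketch rather than the actual checker: `is_idle()`, the metric names, and the thresholds are hypothetical, `None` stands for a metric whose collection failed, and the choice to report "not idle" when nothing is measurable is an assumption of this example.

```python
from typing import Mapping, Optional


def is_idle(
    utilization: Mapping[str, Optional[float]],
    thresholds: Mapping[str, float],
) -> bool:
    """Return True only if every measurable metric is below its threshold."""
    # Skip metrics that could not be measured instead of counting them as 0%.
    measured = {
        name: utilization[name]
        for name in thresholds
        if utilization.get(name) is not None
    }
    if not measured:
        return False  # assumption: with no usable data, do not declare the session idle
    return all(value < thresholds[name] for name, value in measured.items())


thresholds = {"cpu_util": 10.0, "mem": 20.0}
idle_all_low = is_idle({"cpu_util": 2.0, "mem": 5.0}, thresholds)
idle_partial = is_idle({"cpu_util": None, "mem": 50.0}, thresholds)
idle_unmeasured = is_idle({"cpu_util": None, "mem": None}, thresholds)
```

Treating the two failed readings in the last call as 0% would have made the session look idle; excluding them avoids that false positive.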
Full Changelog
Check out the full changelog until this release (25.11.4).
Full Commit Logs
Check out the full commit logs between release (25.11.3) and (25.11.4).