Releases: nvidia-cosmos/cosmos-curate
Release v1.4.0
Added
- SAM3-based video object tracking, per-event VLM captioning, serialized SAM3 outputs, an example event pipeline, and a demo tool.
- Sensor library support for GPS, IMU, camera intrinsics, and camera extrinsics data.
- MP4 header validation utilities for video-index checks.
- Qwen3.5-27B support for image captioning.
- External OpenAI/Gemini endpoint support for image semantic filtering and classification stages.
- Async OpenAI/Gemini request handling with `batch_size`-controlled concurrency for image/video captioning and external filter/classifier stages.
- `exclusive_end_ns` support in `make_ts_grid` for half-open clip spans.
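A minimal sketch of the half-open span semantics referenced above; the `make_ts_grid` and `exclusive_end_ns` names come from these notes, but the signature and implementation are assumptions, not the library's code.

```python
# Hypothetical sketch of half-open clip spans: with exclusive-end semantics
# each span covers [start, end), so adjacent spans never share a timestamp.
def make_ts_grid(start_ns: int, end_ns: int, stride_ns: int) -> list[tuple[int, int]]:
    spans = []
    t = start_ns
    while t < end_ns:
        # The exclusive end is clamped so the last span never overruns end_ns.
        spans.append((t, min(t + stride_ns, end_ns)))
        t += stride_ns
    return spans

spans = make_ts_grid(0, 10_000, 4_000)
# Each span's exclusive end equals the next span's start: no overlap, no gap.
assert spans == [(0, 4_000), (4_000, 8_000), (8_000, 10_000)]
```

The half-open convention avoids the classic off-by-one where a frame at a shared boundary is counted in two clips.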
Fixed
- Prevent Qwen from falling back to native-resolution inputs during resize.
- Isolate vLLM async per-window payload handling.
- Preserve model-variant-specific image filter errors.
Changed
- Upgrade the `cosmos-xenna` Python package and submodule to v0.4.0.
- Add a dedicated `sam3` pixi environment for Segment Anything 3 dependencies.
- Include runtime prompt and config data files in built wheels.
Documentation
- Reorganize curator documentation into design, guide, and reference sections.
- Add the interactive Slurm guide.
- Add GPS and IMU sensor-library design documentation.
- Update image pipeline documentation, including Qwen3.5 coverage.
Release v1.3.0
Added
- Image curation pipeline with semantic filtering
- Image embedding stages (Cosmos-Embed1, InternVideo2-MM, OpenAI-compatible) and image annotate pipeline
- OpenAI- and Gemini-compatible endpoints for image captioning, filtering, and classification
- Artificial-text detection stage for the video filtering pipeline (PaddleOCR-based)
- Sensor library (camera-only) with `SensorGroup`, mcap-based ingestion, and timestamp validation
- SeedVR-based upscaling stage
- Pipeline config files with NVCF-compatible JSON and YAML loading (`--config` for split/shard/dedup)
- Centralized pipeline argument validation via `common_pipeline_settings` and `shard_pipeline_settings`
- vLLM async captioning stage for higher captioning throughput (experimental; correctness issues are still being worked through, so it is not recommended for production use)
- OpenTelemetry instrumentation for vLLM captioning
- Token-counting instrumentation to measure captioning throughput
- Caption status fields normalized across caption backends, with status-gated metadata writing
- Stage-replay validation that compares re-run output against the original recording
- S3 support for `stage-save` and `stage-replay`
- Ray Data hello-world pipeline and splitting pipeline MVP as an alternative engine alongside Xenna
- `--*-cpus-per-worker` knobs documented for CPU-constrained hosts
- Run local-launched container as the host user (including AD/SSSD/NIS UIDs) to avoid root-owned outputs
- Slim Docker image built alongside the full image, with auto-warmup honoring `--envs`
- Local Xenna build path in CI and per-pipeline Xenna overrides
- Fixed-stride coverage in the NVCF split benchmark matrix
- Real-inference smoke test for vLLM captioning health
- Upgrade to CUDA 13.0
- Upgrade vLLM to 0.19.0
- Upgrade Ray to 2.55.0 (with the `serve` extra)
- Upgrade cosmos-xenna to 0.2.3
- Bump `av` to `>=17,<18` and add the `mcap` dependency for the sensor library
Fixed
- `SamplingGrid` produced incorrect windows for irregular grids
- `--execution-mode` CLI flag is now honored end-to-end
- Cosmos-Embed1 writes per-variant embedding directories
- Symlink the host pixi path so shebangs resolve inside the local-launched container
- Sensor library uses read-only views to avoid accidental buffer mutation
- Add Qwen3 preprocessing logic for filtering stages
- Use pre-built images for benchmark runs to avoid redundant builds
- Remove external storage dependency from `ImageSensor`
- Semantic filter updates and dedup pipeline input path cleanup
- Loosen Cosmos-Reason1 caption similarity threshold to reduce flakiness
Changed
- Replace `CurationPhase`/`PipelineBuilder` with factory functions (`*_builders.py`); the `phase_interface` module and per-pipeline `phases.py` files are removed
- Add a `config: VllmConfig` parameter to `VllmPlugin.make_llm_input` for image vs. video modality selection; subclasses must update their signatures
- Switch CI Slurm and k8s GPU jobs to the slim image with in-container `pixi install` and `pixi run --as-is`
- Change CI NVCF backend
- Normalize the `SamplingGrid` API and make sampling windows explicit (no sentinel boundaries)
- Update semantic filter stages to use `VllmCaptioning`
- Add a CPU-only Paddle option for the `unified` env
- Pixi lockfile refreshed for CVE coverage
- Add notice and disclaimer to README and Docker image
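The `make_llm_input` signature change above can be sketched as follows; the class shapes and field names here are illustrative stand-ins, not cosmos-curate's actual API.

```python
# Hedged sketch of the v1.3.0 breaking change: subclasses of VllmPlugin must
# now accept a config argument in make_llm_input so one plugin can branch on
# image vs. video modality. All names below are simplified stand-ins.
from dataclasses import dataclass


@dataclass
class VllmConfig:
    modality: str  # assumed field: "image" or "video"


class VllmPlugin:
    def make_llm_input(self, sample, config: VllmConfig):
        raise NotImplementedError


class MyCaptioningPlugin(VllmPlugin):
    # Before v1.3.0 the override took only (self, sample); the config
    # parameter now selects the modality-specific prompt and payload shape.
    def make_llm_input(self, sample, config: VllmConfig):
        if config.modality == "image":
            return {"prompt": "Describe the image.", "media": [sample]}
        return {"prompt": "Describe the video.", "media": sample}


plugin = MyCaptioningPlugin()
assert plugin.make_llm_input("frame.png", VllmConfig("image"))["prompt"] == "Describe the image."
```

Any subclass still using the old two-argument override would fail at call time once the caller starts passing the config, which is why the signature update is mandatory.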
Documentation
- Speed-of-light design doc for captioning throughput, with refined SOL baseline methodology using `vllm bench` as the reference
- Refined Ray Data runner design with the first implementation slice
- Document `--*-cpus-per-worker` tuning knobs
- Add `--squash-before-merge` to MR guidelines
Release v1.2.2
Added
- `--slim` flag and `--pixi-path` for lightweight image builds
- `--transcode-max-output-frames` to limit clip frame count
- OpenAI-compatible endpoint for video embedding
- Pre-populate timestamps on `Video` during download
- Multistage Docker builds to reduce container image size
- Docker buildx cache for faster image builds
- Set vLLM `performance_mode` for improved inference throughput
- Upgrade vLLM to 0.17.1
- Upgrade Ray to 2.54.0
- Upgrade cuML to 26.0.2
- Upgrade cosmos-xenna to 0.2.1
- Upgrade Python to 3.12.13
Fixed
- Remove pycuda dependency, use PyNvVideoCodec built-in context
- Purge `filtered_clips` in `MetadataWriterStage` cleanup
- `Video.get_major_size()` should include filtered clips
- Type mismatch for `qwen-gpus-per-worker`
- Remove syntax warnings in stages
- Pin `importlib-metadata` to avoid version conflicts
- Output built Docker images to the local Docker image store
- Remove vllm from local extra to eliminate litellm from lock file
- Update type annotations for AV pipeline
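A hedged illustration of why the `Video.get_major_size()` fix above matters: a size accessor that skips a field under-reports memory usage to the scheduler. All names below are simplified stand-ins for the real data model.

```python
# Simplified stand-in for the fix: get_major_size() must count filtered
# clips too, since they still occupy memory until cleanup runs.
from dataclasses import dataclass, field


@dataclass
class Clip:
    data: bytes


@dataclass
class Video:
    clips: list = field(default_factory=list)
    filtered_clips: list = field(default_factory=list)

    def get_major_size(self) -> int:
        # Fixed behavior: sum over surviving AND filtered clips.
        return sum(len(c.data) for c in self.clips + self.filtered_clips)


v = Video(clips=[Clip(b"abc")], filtered_clips=[Clip(b"defg")])
assert v.get_major_size() == 7  # 3 + 4 bytes, filtered clips included
```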
Removed
- `ffmpeg_gpu` decode mode
Changed
- Replace FFmpeg source build with conda-forge package
- Remove support for Phi-4 captioning to keep a security floor of `pillow>=12.1.1` (GHSA-cfh3-3jmp-rvhc)
Documentation
- Add batch processing guide for large video sets
- Add slim image design doc
Release v1.2.1
Added
- Separate OpenAI endpoints for caption and enhance stages
- Build CPU-only ffmpeg by default for LGPL-compatible images
- Allow the `QwenVideoClassifier` stage to be configurable
Fixed
- Fix tracing flush lifecycle and embed profiling inside pipeline functions
- Always use `docker buildx build` to avoid legacy builder errors
- Defer `flush_tracing()` until after the traced span exits to prevent a closed-file ValueError
Changed
- Fold `RemuxStage` into `VideoDownloader`
Release v1.2.0
Added
- Composable pipeline API via `CurationPhase` and `PipelineBuilder` for declarative pipeline construction
- OpenAI-compatible API captioning stage for using external LLM endpoints
- LazyData for zero-copy split-field pipeline transport, reducing memory overhead
- Automatic CPU and memory profiling for all pipeline stages
- Stage replay for re-running individual stages without full pipeline re-execution
- Unified write abstraction for local and remote storage
- Multi-camera splitting pipeline (data model, task creation, download/remux, frame extraction, clip transcoding, clip writer, and summary writer)
- ARM64 CLI and container build support
- GB200 support for loading Qwen3-VL-235B
- Optional Ray token authentication
- Upgrade vLLM to 0.15.1
- Upgrade cosmos-xenna to 0.2.0
- Upgrade ffmpeg to 8.0.1
- `QwenVideoClassifier` stage for video classification using Qwen VL
- Remove flash-attn dependency in favor of PyTorch SDPA
Fixed
- Critical: fix caption ordering bug in inflight batching. When inflight batching was enabled (the default), captions could be assigned to the wrong videos. The bug was introduced in v1.1.5, was dormant in v1.1.6 (inflight batching temporarily removed), and has been active in v1.1.7–v1.1.11. If you used VLM captioning with any of those releases, captions may be mismatched. Upgrade to v1.2.0 and re-run affected captioning jobs.
- Enforce exact `--limit` semantics for storage listings and add a `num_input_videos_selected` metric
- Reset `LazyData.nbytes` on drop and eliminate a `tobytes` copy in the upload path
- Update conda environment name from `vllm` to `unified` in Qwen filter stages
- Harden NVCF split benchmark retries and count validation
- Resolve Docker build failures from NVIDIA wheel timeouts and file permissions
- Check for remote mounts in `curator_submit`
- Handle clips with no stream
- Pin `setuptools<81` to preserve `pkg_resources` for ngcsdk
- Add minimum version constraints for the typer dependency
- Ensure `split_video_into_windows` returns equal-length lists
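A minimal sketch, not cosmos-curate's code, of the bug class behind the caption ordering fix above: when batched requests complete out of order, pairing results to inputs by position mismatches captions, while an explicit request ID keeps the pairing correct.

```python
# Illustration of the inflight-batching ordering hazard (hypothetical code).
videos = ["a.mp4", "b.mp4", "c.mp4"]

# Simulate out-of-order completion: each result carries its request ID.
results = [(i, f"caption for {v}") for i, v in enumerate(videos)]
results.reverse()  # completion order differs from submission order

# Buggy pairing: zip by arrival position -> captions land on wrong videos.
buggy = {v: cap for v, (_, cap) in zip(videos, results)}
assert buggy["a.mp4"] == "caption for c.mp4"  # mismatched!

# Correct pairing: map each result back to its source via the request ID.
correct = {videos[i]: cap for i, cap in results}
assert all(correct[v] == f"caption for {v}" for v in videos)
```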
Documentation
- Add Ray Data runner design document
- Update end user guide
Release v1.1.11
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Add support for Cosmos-Reason2-8B as an alternative VLM captioning model
- Conform shard pipeline output folder name to include duration
- Add configurable sharding parameters to the video shard pipeline
- Add a Ray Data-based hello world pipeline example
Release v1.1.10
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Improve sharding pipeline input gathering time
- Release new helm chart 2.2.1 that improves robustness of metrics collection
- Add support for Lance outputs for clips and embeddings
- Upgrade python from 3.10 to 3.12
- Add FP8 variant of Qwen3-VL-235B which can run on 4x H100s
- Add FP8 variant of Qwen3-VL-30B which can run on a single 48GB GPU
- Upgrade cosmos-xenna to 0.1.8 with support for online-serving mode
Fixed
- Local launch CLI when specifying GPU list
- Parquet output format for Cosmos Dataset Search (CDS)
- Race condition with `--copy-weights-to` by passing it only to the model captioning stage, not the prepare stage
- Upgrade vllm in develop environment to match what is used inside container
- Remove async engine code in qwen_vl
- Fix Qwen3-VL model pre-processing
Release v1.1.9
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Add support for Qwen/Qwen3-VL-235B-A22B-Instruct
- Save `model_input` tensors as PNGs
- Wire vLLM sampling params into the splitting CLI
- Switch enhance captions to OpenAI V1 Responses API
- Expose `setup_on_node` in `stage_interface`
Fixed
- Fixed Nemotron-Nano VL as the captioning algorithm.
- Upgrade vllm to 0.11.2 and add metadata field to fix nemotron-nano-v2-vl
- Replace softprops/action-gh-release with gh release command
- Nemotron: change `VideoMetadata` to a dict and set `model_does_preprocess=True`
- Fix a bug in windowing which made us always lose 1 frame
- Bump ray version, unset vars not used in CI
- Dimensions aligned to i4
- Race condition in --copy-weights-to
Release v1.1.8
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Nemotron-Nano-12B-v2-VL as an alternative VLM captioning model
- Gemini API as an option for video captioning
- Improved helm chart to simplify vanilla k8s deployment
- Upgraded cosmos-xenna to 0.1.7 for better scalability
- Significantly improved test coverage
Fixed
- Fixed a bug in the clip windowing utils that produced wrong captions for later windows within a clip
- Allow underscore in S3 bucket name
- Set cudagraph mode to piecewise for Qwen-based VL models to mitigate failure with illegal memory access
- Improved exception handling in vllm-captioning stage setup and process
Documentation
- Added documentation for vllm_interface which simplifies the integration of new vLLM-powered VLMs for captioning.
Release v1.1.7
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Azure OpenAI API as an option to enhance captions
- Increased test coverage for vllm_interface to 100%
- Azure Blob Storage support for Slurm deployments
- Support multipart result zips
- Update python version to 3.10.19
- Retry vllm captioning on engine failure
Fixed
- Switch the torch package to PyPI in the `unified` env
- Resolve hello_world pipeline execution with transformers
- vLLM stage 2 captioning bug