Releases: nvidia-cosmos/cosmos-curate
Release v1.4.0
Added
- SAM3-based video object tracking, per-event VLM captioning, serialized SAM3 outputs, an example event pipeline, and a demo tool.
- Sensor library support for GPS, IMU, camera intrinsics, and camera extrinsics data.
- MP4 header validation utilities for video-index checks.
- Qwen3.5-27B support for image captioning.
- External OpenAI/Gemini endpoint support for image semantic filtering and classification stages.
- Async OpenAI/Gemini request handling with `batch_size`-controlled concurrency for image/video captioning and external filter/classifier stages.
- `exclusive_end_ns` support in `make_ts_grid` for half-open clip spans.
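A minimal sketch of the half-open span semantics referenced above; the `make_ts_grid` and `exclusive_end_ns` names come from these notes, but the signature and implementation are assumptions, not the library's code.

```python
# Hypothetical sketch of half-open clip spans: with exclusive-end semantics
# each span covers [start, end), so adjacent spans never share a timestamp.
def make_ts_grid(start_ns: int, end_ns: int, stride_ns: int) -> list[tuple[int, int]]:
    spans = []
    t = start_ns
    while t < end_ns:
        # The exclusive end is clamped so the last span never overruns end_ns.
        spans.append((t, min(t + stride_ns, end_ns)))
        t += stride_ns
    return spans

spans = make_ts_grid(0, 10_000, 4_000)
# Each span's exclusive end equals the next span's start: no overlap, no gap.
assert spans == [(0, 4_000), (4_000, 8_000), (8_000, 10_000)]
```

The half-open convention avoids the classic off-by-one where a frame at a shared boundary is counted in two clips.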
Fixed
- Prevent Qwen from falling back to native-resolution inputs during resize.
- Isolate vLLM async per-window payload handling.
- Preserve model-variant-specific image filter errors.
Changed
- Upgrade the `cosmos-xenna` Python package and submodule to v0.4.0.
- Add a dedicated `sam3` pixi environment for Segment Anything 3 dependencies.
- Include runtime prompt and config data files in built wheels.
Documentation
- Reorganize curator documentation into design, guide, and reference sections.
- Add the interactive Slurm guide.
- Add GPS and IMU sensor-library design documentation.
- Update image pipeline documentation, including Qwen3.5 coverage.
Release v1.3.0
Added
- Image curation pipeline with semantic filtering
- Image embedding stages (Cosmos-Embed1, InternVideo2-MM, OpenAI-compatible) and image annotate pipeline
- OpenAI- and Gemini-compatible endpoints for image captioning, filtering, and classification
- Artificial-text detection stage for the video filtering pipeline (PaddleOCR-based)
- Sensor library (camera-only) with `SensorGroup`, mcap-based ingestion, and timestamp validation
- SeedVR-based upscaling stage
- Pipeline config files with NVCF-compatible JSON and YAML loading (`--config` for split/shard/dedup)
- Centralized pipeline argument validation via `common_pipeline_settings` and `shard_pipeline_settings`
- vLLM async captioning stage for higher captioning throughput (experimental; correctness issues are still being worked through, so it is not recommended for production use)
- OpenTelemetry instrumentation for vLLM captioning
- Token-counting instrumentation to measure captioning throughput
- Caption status fields normalized across caption backends, with status-gated metadata writing
- Stage-replay validation that compares re-run output against the original recording
- S3 support for `stage-save` and `stage-replay`
- Ray Data hello-world pipeline and splitting pipeline MVP as an alternative engine alongside Xenna
- `--*-cpus-per-worker` knobs documented for CPU-constrained hosts
- Run local-launched container as the host user (including AD/SSSD/NIS UIDs) to avoid root-owned outputs
- Slim Docker image built alongside the full image, with auto-warmup honoring `--envs`
- Local Xenna build path in CI and per-pipeline Xenna overrides
- Fixed-stride coverage in the NVCF split benchmark matrix
- Real-inference smoke test for vLLM captioning health
- Upgrade to CUDA 13.0
- Upgrade vLLM to 0.19.0
- Upgrade Ray to 2.55.0 (with the `serve` extra)
- Upgrade cosmos-xenna to 0.2.3
- Bump `av` to `>=17,<18` and add the `mcap` dependency for the sensor library
Fixed
- `SamplingGrid` produced incorrect windows for irregular grids
- `--execution-mode` CLI flag is now honored end-to-end
- Cosmos-Embed1 writes per-variant embedding directories
- Symlink the host pixi path so shebangs resolve inside the local-launched container
- Sensor library uses read-only views to avoid accidental buffer mutation
- Add Qwen3 preprocessing logic for filtering stages
- Use pre-built images for benchmark runs to avoid redundant builds
- Remove external storage dependency from `ImageSensor`
- Semantic filter updates and dedup pipeline input path cleanup
- Loosen Cosmos-Reason1 caption similarity threshold to reduce flakiness
Changed
- Replace `CurationPhase`/`PipelineBuilder` with factory functions (`*_builders.py`); the `phase_interface` module and per-pipeline `phases.py` files are removed
- Add a `config: VllmConfig` parameter to `VllmPlugin.make_llm_input` for image vs. video modality selection; subclasses must update their signatures
- Switch CI Slurm and k8s GPU jobs to the slim image with in-container `pixi install` and `pixi run --as-is`
- Change CI NVCF backend
- Normalize the `SamplingGrid` API and make sampling windows explicit (no sentinel boundaries)
- Update semantic filter stages to use `VllmCaptioning`
- Add a CPU-only Paddle option for the `unified` env
- Pixi lockfile refreshed for CVE coverage
- Add notice and disclaimer to README and Docker image
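The `make_llm_input` signature change above can be sketched as follows; the class shapes and field names here are illustrative stand-ins, not cosmos-curate's actual API.

```python
# Hedged sketch of the v1.3.0 breaking change: subclasses of VllmPlugin must
# now accept a config argument in make_llm_input so one plugin can branch on
# image vs. video modality. All names below are simplified stand-ins.
from dataclasses import dataclass


@dataclass
class VllmConfig:
    modality: str  # assumed field: "image" or "video"


class VllmPlugin:
    def make_llm_input(self, sample, config: VllmConfig):
        raise NotImplementedError


class MyCaptioningPlugin(VllmPlugin):
    # Before v1.3.0 the override took only (self, sample); the config
    # parameter now selects the modality-specific prompt and payload shape.
    def make_llm_input(self, sample, config: VllmConfig):
        if config.modality == "image":
            return {"prompt": "Describe the image.", "media": [sample]}
        return {"prompt": "Describe the video.", "media": sample}


plugin = MyCaptioningPlugin()
assert plugin.make_llm_input("frame.png", VllmConfig("image"))["prompt"] == "Describe the image."
```

Any subclass still using the old two-argument override would fail at call time once the caller starts passing the config, which is why the signature update is mandatory.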
Documentation
- Speed-of-light design doc for captioning throughput, with refined SOL baseline methodology using `vllm bench` as the reference
- Refined Ray Data runner design with the first implementation slice
- Document `--*-cpus-per-worker` tuning knobs
- Add `--squash-before-merge` to MR guidelines
Release v1.2.2
Added
- `--slim` flag and `--pixi-path` for lightweight image builds
- `--transcode-max-output-frames` to limit clip frame count
- OpenAI-compatible endpoint for video embedding
- Pre-populate timestamps on `Video` during download
- Multistage Docker builds to reduce container image size
- Docker buildx cache for faster image builds
- Set vLLM `performance_mode` for improved inference throughput
- Upgrade vLLM to 0.17.1
- Upgrade Ray to 2.54.0
- Upgrade cuML to 26.0.2
- Upgrade cosmos-xenna to 0.2.1
- Upgrade Python to 3.12.13
Fixed
- Remove pycuda dependency, use PyNvVideoCodec built-in context
- Purge `filtered_clips` in `MetadataWriterStage` cleanup
- `Video.get_major_size()` should include filtered clips
- Type mismatch for `qwen-gpus-per-worker`
- Remove syntax warnings in stages
- Pin `importlib-metadata` to avoid version conflicts
- Output built Docker images to the local Docker image store
- Remove vllm from local extra to eliminate litellm from lock file
- Update type annotations for AV pipeline
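A hedged illustration of why the `Video.get_major_size()` fix above matters: a size accessor that skips a field under-reports memory usage to the scheduler. All names below are simplified stand-ins for the real data model.

```python
# Simplified stand-in for the fix: get_major_size() must count filtered
# clips too, since they still occupy memory until cleanup runs.
from dataclasses import dataclass, field


@dataclass
class Clip:
    data: bytes


@dataclass
class Video:
    clips: list = field(default_factory=list)
    filtered_clips: list = field(default_factory=list)

    def get_major_size(self) -> int:
        # Fixed behavior: sum over surviving AND filtered clips.
        return sum(len(c.data) for c in self.clips + self.filtered_clips)


v = Video(clips=[Clip(b"abc")], filtered_clips=[Clip(b"defg")])
assert v.get_major_size() == 7  # 3 + 4 bytes, filtered clips included
```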
Removed
- `ffmpeg_gpu` decode mode
Changed
- Replace FFmpeg source build with conda-forge package
- Remove support for Phi-4 captioning to keep a security floor of `pillow>=12.1.1` (GHSA-cfh3-3jmp-rvhc)
Documentation
- Add batch processing guide for large video sets
- Add slim image design doc
Release v1.2.1
Added
- Separate OpenAI endpoints for caption and enhance stages
- Build CPU-only ffmpeg by default for LGPL-compatible images
- Allow the `QwenVideoClassifier` stage to be configurable
Fixed
- Fix tracing flush lifecycle and embed profiling inside pipeline functions
- Always use `docker buildx build` to avoid legacy builder errors
- Defer `flush_tracing()` until after the traced span exits to prevent a closed-file ValueError
Changed
- Fold `RemuxStage` into `VideoDownloader`
Release v1.2.0
Added
- Composable pipeline API via `CurationPhase` and `PipelineBuilder` for declarative pipeline construction
- OpenAI-compatible API captioning stage for using external LLM endpoints
- LazyData for zero-copy split-field pipeline transport, reducing memory overhead
- Automatic CPU and memory profiling for all pipeline stages
- Stage replay for re-running individual stages without full pipeline re-execution
- Unified write abstraction for local and remote storage
- Multi-camera splitting pipeline (data model, task creation, download/remux, frame extraction, clip transcoding, clip writer, and summary writer)
- ARM64 CLI and container build support
- GB200 support for loading Qwen3-VL-235B
- Optional Ray token authentication
- Upgrade vLLM to 0.15.1
- Upgrade cosmos-xenna to 0.2.0
- Upgrade ffmpeg to 8.0.1
- `QwenVideoClassifier` stage for video classification using Qwen VL
- Remove flash-attn dependency in favor of PyTorch SDPA
Fixed
- Critical: fix caption ordering bug in inflight batching. When inflight batching was enabled (the default), captions could be assigned to the wrong videos. The bug was introduced in v1.1.5, was dormant in v1.1.6 (inflight batching temporarily removed), and has been active in v1.1.7–v1.1.11. If you used VLM captioning with any of those releases, captions may be mismatched. Upgrade to v1.2.0 and re-run affected captioning jobs.
- Enforce exact `--limit` semantics for storage listings and add a `num_input_videos_selected` metric
- Reset `LazyData.nbytes` on drop and eliminate a `tobytes` copy in the upload path
- Update conda environment name from `vllm` to `unified` in Qwen filter stages
- Harden NVCF split benchmark retries and count validation
- Resolve Docker build failures from NVIDIA wheel timeouts and file permissions
- Check for remote mounts in `curator_submit`
- Handle clips with no stream
- Pin `setuptools<81` to preserve `pkg_resources` for ngcsdk
- Add minimum version constraints for the typer dependency
- Ensure `split_video_into_windows` returns equal-length lists
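A minimal sketch, not cosmos-curate's code, of the bug class behind the caption ordering fix above: when batched requests complete out of order, pairing results to inputs by position mismatches captions, while an explicit request ID keeps the pairing correct.

```python
# Illustration of the inflight-batching ordering hazard (hypothetical code).
videos = ["a.mp4", "b.mp4", "c.mp4"]

# Simulate out-of-order completion: each result carries its request ID.
results = [(i, f"caption for {v}") for i, v in enumerate(videos)]
results.reverse()  # completion order differs from submission order

# Buggy pairing: zip by arrival position -> captions land on wrong videos.
buggy = {v: cap for v, (_, cap) in zip(videos, results)}
assert buggy["a.mp4"] == "caption for c.mp4"  # mismatched!

# Correct pairing: map each result back to its source via the request ID.
correct = {videos[i]: cap for i, cap in results}
assert all(correct[v] == f"caption for {v}" for v in videos)
```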
Documentation
- Add Ray Data runner design document
- Update end user guide
Release v1.1.11
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Add support for Cosmos-Reason2-8B as an alternative VLM captioning model
- Conform shard pipeline output folder name to include duration
- Add configurable sharding parameters to the video shard pipeline
- Add a Ray Data-based hello world pipeline example
Release v1.1.10
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Improve sharding pipeline input gathering time
- Release new helm chart 2.2.1 that improves robustness of metrics collection
- Add support for Lance outputs for clips and embeddings
- Upgrade python from 3.10 to 3.12
- Add FP8 variant of Qwen3-VL-235B which can run on 4x H100s
- Add FP8 variant of Qwen3-VL-30B which can run on a single 48GB GPU
- Upgrade cosmos-xenna to 0.1.8 with support for online-serving mode
Fixed
- Local launch CLI when specifying GPU list
- Parquet output format for Cosmos Dataset Search (CDS)
- Race condition with `--copy-weights-to` by passing it only to the model captioning stage, not the prepare stage
- Upgrade vllm in develop environment to match what is used inside container
- Remove async engine code in qwen_vl
- Fix Qwen3-VL model pre-processing
Release v1.1.9
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Add support for Qwen/Qwen3-VL-235B-A22B-Instruct
- Save `model_input` tensors as PNGs
- Wire vLLM sampling params into the splitting CLI
- Switch enhance captions to OpenAI V1 Responses API
- Expose `setup_on_node` in `stage_interface`
Fixed
- Fixed Nemotron-Nano VL as the captioning algorithm.
- Upgrade vllm to 0.11.2 and add metadata field to fix nemotron-nano-v2-vl
- Replace softprops/action-gh-release with gh release command
- Nemotron: change `VideoMetadata` to a dict and set `model_does_preprocess=True`
- Fix a bug in windowing which made us always lose 1 frame
- Bump ray version, unset vars not used in CI
- Dimensions aligned to i4
- Race condition in --copy-weights-to
Release v1.1.8
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Nemotron-Nano-12B-v2-VL as an alternative VLM captioning model
- Gemini API as an option for video captioning
- Improved helm chart to simplify vanilla k8s deployment
- Upgraded cosmos-xenna to 0.1.7 for better scalability
- Significantly improved test coverage
Fixed
- Fixed a bug in the clip windowing utils that produced wrong captions for later windows within a clip
- Allow underscore in S3 bucket name
- Set cudagraph mode to piecewise for Qwen-based VL models to mitigate failure with illegal memory access
- Improved exception handling in vllm-captioning stage setup and process
Documentation
- Added documentation for vllm_interface which simplifies the integration of new vLLM-powered VLMs for captioning.
Release v1.1.7
Known Issues
- Caption ordering bug: Inflight batching (enabled by default) can assign captions to the wrong videos. Fixed in
v1.2.0.
Added
- Azure OpenAI API as an option to enhance captions
- Increased test coverage for vllm_interface to 100%
- Azure Blob Storage support for Slurm deployments
- Support multipart result zips
- Update python version to 3.10.19
- Retry vllm captioning on engine failure
Fixed
- Switch the torch package to PyPI in the `unified` env
- Resolve hello_world pipeline execution with transformers
- vLLM stage 2 captioning bug