Skip to content

[proposal] remote storage refactor proposal#3750

Open
andaaron wants to merge 2 commits intoproject-zot:mainfrom
andaaron:storage3
Open

[proposal] remote storage refactor proposal#3750
andaaron wants to merge 2 commits intoproject-zot:mainfrom
andaaron:storage3

Conversation

@andaaron
Copy link
Copy Markdown
Contributor

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.46%. Comparing base (1ff6f60) to head (fc5fefd).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3750   +/-   ##
=======================================
  Coverage   91.46%   91.46%           
=======================================
  Files         193      193           
  Lines       27456    27456           
=======================================
  Hits        25112    25112           
  Misses       1520     1520           
  Partials      824      824           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@andaaron andaaron marked this pull request as ready for review January 31, 2026 09:46

### S3 Blob Path Structure
```
{rootDir}/storage/blobs/{algorithm}/{digest} # Shared across all repos
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why we are using hard links for local storage, but are fine without any kind of reference counting links in S3. i.e. if the implementation logic for blobs without links is possible, then would local storage also work fine without the hardlinks?

I'm thinking of whether there any drawbacks of not using hard links in local storage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hardlinks saves space if you have multiple instances of the same blobs in different paths on the same drive. If we remove hardlinks disk usage would go up significantly.

For example: you push a base image and a derived image, and, because they have different repositories, the base image layers would take twice as much disk space for the 2 different instances, once for the base image, and once for the derived image repos.

Sure, we could implement a single instance of {rootDir}/storage/blobs/{algorithm}/{digest} also in case of local storage, but that would break tools reading directly from the disk and expecting each repository to be valid OCI layout of its own. One such cases is the product we developed zot for in the first place.

Comment on lines +50 to +53
if is.storageDriver != nil {
return path.Join(is.rootDir, "storage", ispec.ImageBlobsDir,
digest.Algorithm().String(), digest.Encoded())
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In newer code, we should probably not assume that a non-empty storage driver == S3. Zot may in future support perhaps a new storage driver. Since we have the interfaces in place, we should check if it is really S3 and then return the appropriate path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There's also the GCS PR we should be merging soon.

}
```

**Key Point:** No configuration flag needed. S3 detection is automatic via `storeDriver != nil`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not onboard with this - if zot supports something new in future, all this code would need to be updated, rather we should check if the storage is really S3 before proceeding.

Copy link
Copy Markdown
Contributor Author

@andaaron andaaron Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads S3 because that's the only option besides local disk right now.
I would keep the existing dedupe logic only for local.
So it makes little difference what value storeDriver has as long as it is not local driver, for which we need the dedupe.
S3 and any other possible value in the future would use {rootDir}/storage/blobs/{algorithm}/{digest}.

3. Review results
4. Run migration: `zot migrate-s3-storage --config config.json`
5. **Restart zot** (automatically uses shared blob storage)
6. Verify and monitor
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a destructive operation (migrating files to the new path), will the migration logic also keep track of what was moved so that it can be rolled back if required?

Out of curiosity, what would a rollback cost on S3?

Copy link
Copy Markdown
Contributor Author

@andaaron andaaron Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are both good questions.

I think we can keep track of 1, but unless we keep it in the "cache DB", keeping it up to date after using start using the service would be practically impossible. Even if we track it in the cache DB, It complicates maintaining it.

Maybe we should have an entry there for the original blob location same as we do now, and simply use that instead of {rootDir}/storage/blobs/{algorithm}/{digest} in case of old blobs from pre-transition. I think this could potentially help cut the cost of copying stuff back in case of a rollback.

andaaron added 2 commits March 1, 2026 09:56
Signed-off-by: Andrei Aaron <andreifdaaron@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants