
MAST: Enable cloud dataset by default #3534

Open
snbianco wants to merge 6 commits into astropy:main from snbianco:enable-cloud-dataset

Conversation

Contributor

@snbianco snbianco commented Feb 16, 2026

Current Behavior

  • Access to cloud dataset is not enabled by default.
  • Users have to explicitly run the enable_cloud_dataset function before they use any cloud-related functions.

Proposed Behavior

  • The cloud dataset is enabled by default, so users do not have to explicitly call enable_cloud_dataset.
  • To disable the cloud dataset, users can alter a config value or call the disable_cloud_dataset function.

Advantages

  • Faster downloads (in some cases)
  • Removes a hidden step that users may not even know is available to them.
  • More in line with the direction that MAST is heading re. Roman and other large datasets
  • More convenient for users on cloud platforms
  • Fewer download failures
  • Reduces operational load on MAST servers

What Changes for Users?

  • No longer have to call enable_cloud_dataset explicitly
  • Cloud downloads will be preferred automatically.
  • Files in the cloud will be pulled from there instead of MAST servers.
  • This is just default behavior and can be changed.
  • Cloud dataset will NOT be enabled if the user does not have the prerequisite packages (boto3, botocore)
  • Users can change whether the cloud dataset is automatically enabled with a configuration variable
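
The default-enable rule in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `cloud_enabled_by_default`, its `config_enabled` parameter, and the `has_module` hook are hypothetical names, since the actual configuration item is not named in this description.

```python
import importlib.util


def _has_module(name):
    """Return True if `name` looks importable, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def cloud_enabled_by_default(config_enabled=True,
                             required=("boto3", "botocore"),
                             has_module=_has_module):
    """Hypothetical helper: should cloud access be switched on by default?

    Cloud access stays off if the (assumed) config flag is False or if
    any prerequisite package is missing.
    """
    if not config_enabled:
        return False
    return all(has_module(name) for name in required)
```

With a check of this kind, a user without boto3 or botocore installed simply keeps the on-prem behavior, and a user who sets the config flag to False opts out regardless of what is installed.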

Other Things to Note

  • Cloud access is instantiated lazily only when a relevant method is called (download_file, download_products, get_cloud_uri(s))
  • Fixes a bug introduced in #3488 (Get cloud missions dynamically and fix cloud download workflow), where botocore, an optional package, was imported without a guard. Both boto3 and botocore are now imported inside try blocks, and ImportError is handled.
  • Observations internally tracks whether the cloud dataset was explicitly enabled or disabled. Unless the user enabled it explicitly by calling enable_cloud_dataset, no warning is logged when a product cannot be found in the cloud and the download falls back to on-prem. This avoids clogging up the console and introducing warning messages where there were none before.
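
The lazy instantiation and the warning bookkeeping described in these notes could look something like the sketch below. It is an assumption-laden illustration, not the implementation in this PR: `CloudState` is a hypothetical stand-in for the state kept by Observations, and the cloud client is modeled as a plain mapping from product names to URIs.

```python
import warnings


class CloudState:
    """Hypothetical stand-in for the cloud bookkeeping described above."""

    def __init__(self, connect):
        self._connect = connect          # callable that builds the cloud client
        self._client = None              # created on first use only
        self.explicitly_enabled = False  # True only after an explicit enable

    def enable(self):
        """Mimics a user explicitly calling enable_cloud_dataset."""
        self.explicitly_enabled = True

    @property
    def client(self):
        # Lazy: nothing is connected until a download-style method needs it.
        if self._client is None:
            self._client = self._connect()
        return self._client

    def download(self, name):
        """Try the cloud first, then fall back to an on-prem download."""
        uri = self.client.get(name)
        if uri is not None:
            return ("cloud", uri)
        # Only warn when the user opted in explicitly, so that the new
        # default does not add console noise for existing workflows.
        if self.explicitly_enabled:
            warnings.warn(f"{name} not found in the cloud; falling back to on-prem.")
        return ("on-prem", name)
```

The point of the design is that the connection cost is paid only when a relevant method runs, and that users who never asked for cloud access see the same console output as before.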

Close #3546

@codecov

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 85.07463% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.13%. Comparing base (6a3c231) to head (1193295).
⚠️ Report is 1 commits behind head on main.

Files with missing lines           Patch %   Lines
astroquery/mast/cloud.py           69.23%    4 Missing ⚠️
astroquery/exceptions.py           0.00%     3 Missing ⚠️
astroquery/mast/observations.py    94.00%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3534      +/-   ##
==========================================
+ Coverage   73.09%   73.13%   +0.04%     
==========================================
  Files         219      219              
  Lines       20592    20640      +48     
==========================================
+ Hits        15052    15096      +44     
- Misses       5540     5544       +4     


@snbianco snbianco marked this pull request as ready for review February 16, 2026 23:25
@snbianco snbianco requested a review from bsipocz February 16, 2026 23:25
@bsipocz bsipocz added this to the 0.4.12 milestone Feb 23, 2026
Member

@bsipocz bsipocz left a comment

Overall this looks good, but it still needs a rebase to resolve the conflicts.

Also, I have one minor comment. It is not a blocker, but maybe we can clean up a little bit based on the answer.

Comment on lines +27 to +30
try:
    from botocore.exceptions import ClientError
except ImportError:
    ClientError = BotoCoreError = ()

Member

Is this version dependent? If yes, do you know when the changes were added?

Contributor Author

I'm not sure what you mean. I do see that I left BotoCoreError in here by mistake, so I'll push a fix for that.

Member

I mean, why do we need this? We use the mock that already has the importorskip, so I think the try/except should be enough to make the dependency optional (and maybe leave a comment that we use the try/except because this is an optional dependency).

Also, in the code itself I think we should not override these errors but patch around the missing boto packages better; e.g. raise a warning sooner or maybe even an exception if the user wants to use cloud but doesn't have the optional dependencies installed.

Contributor Author

Ah, I see what you're saying. I went back into the code and refactored things a bit. The functions that require cloud access error out if the initialization fails and now give a more descriptive error message. Functions that don't require cloud access to work (downloads) will warn if cloud access can't be established, but fall back to an on-prem download by default.
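
For readers following the thread, the split described here (cloud-only functions raise, downloads warn and fall back) might be sketched as below. This is an illustration under stated assumptions rather than the refactored code: `CloudAccessError`, `init_cloud`, `cloud_only_operation`, and `download_with_fallback` are hypothetical names, and the return values are placeholders.

```python
import warnings

try:
    import boto3  # optional dependency, guarded so that import never crashes
except ImportError:
    boto3 = None


class CloudAccessError(RuntimeError):
    """Hypothetical error for operations that cannot work without cloud access."""


def init_cloud(sdk=boto3):
    """Return the SDK handle, or raise a descriptive error if it is missing."""
    if sdk is None:
        raise CloudAccessError(
            "Cloud access requires the optional packages boto3 and botocore."
        )
    return sdk


def cloud_only_operation(name, sdk=boto3):
    """Needs cloud access, so an initialization failure reaches the caller."""
    init_cloud(sdk)
    return f"cloud:{name}"  # placeholder; a real lookup would query the provider


def download_with_fallback(name, sdk=boto3):
    """Works without cloud access: warn, then fall back to on-prem."""
    try:
        init_cloud(sdk)
    except CloudAccessError as err:
        warnings.warn(f"{err} Falling back to on-prem download.")
        return "on-prem"
    return "cloud"
```

Compared with assigning empty tuples to the exception names, this keeps the missing-dependency case visible at the point where the user actually asks for cloud access.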

@snbianco snbianco force-pushed the enable-cloud-dataset branch 2 times, most recently from ff73a75 to 0570107 Compare March 25, 2026 14:34
@snbianco snbianco force-pushed the enable-cloud-dataset branch from 0570107 to aad5436 Compare March 26, 2026 15:57

Development

Successfully merging this pull request may close these issues.

mast import crashes when botocore optional dependency is not installed

2 participants