Releases: boettiger-lab/mcp-data-server
v0.5.1 — retry-rescued parents enqueue their sub-children
Patch fix for v0.5.0. If a parent collection with sub-children (e.g. `us-census`) fails the first-pass fetch and is rescued by the retry pass, its sub-children are now also fetched in the retry pool — previously they'd silently go missing from the catalog.
Observed on the 2026-04-18 dev rollout: v0.5.0 loaded 75 of the expected 81 collections because `us-census` retried successfully but its 6 census-year sub-collections were never enqueued.
Test added: `test_retry_rescued_parent_fetches_its_subchildren`. 96/96 tests passing.
v0.5.0 — STAC retry + exit on root failure
Retry pass for transient child failures
STAC catalog children that time out on the first attempt now get one automatic retry in a fresh pool with a longer per-child timeout.
- New env var: `STAC_CHILD_RETRY_TIMEOUT` (default 8s).
- A retry that succeeds clears the original error from `STAC_LOAD_ERRORS`; a retry that also fails leaves the error in place.
- Rescues the tail-latency case observed on v0.4.0's first real-world deploy (2 of 81 collections timed out at the 5s ceiling — both were recoverable with a slightly longer timeout).
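The two-pass shape described above can be sketched as follows. This is an illustrative sketch, not the server's actual code: `fetch_with_retry`, `fetch_one`, and the fixed 16-worker pool are hypothetical; only the env-var names and the error-clearing behavior come from the release notes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(child_ids, fetch_one):
    """First pass with the short per-child timeout, then one retry pass
    in a fresh pool with the longer STAC_CHILD_RETRY_TIMEOUT."""
    first_timeout = float(os.environ.get("STAC_CHILD_TIMEOUT", 5))
    retry_timeout = float(os.environ.get("STAC_CHILD_RETRY_TIMEOUT", 8))
    errors = {}

    def run_pass(ids, timeout):
        ok, failed = {}, []
        with ThreadPoolExecutor(max_workers=16) as pool:
            futures = {cid: pool.submit(fetch_one, cid, timeout) for cid in ids}
            for cid, fut in futures.items():
                try:
                    ok[cid] = fut.result()
                except Exception as exc:
                    failed.append(cid)
                    errors[cid] = str(exc)
        return ok, failed

    results, failed = run_pass(child_ids, first_timeout)
    if failed:
        rescued, _ = run_pass(failed, retry_timeout)
        results.update(rescued)
        for cid in rescued:
            errors.pop(cid, None)  # a successful retry clears the original error
    return results, errors
```

A child that fails both passes keeps its entry in the returned error dict, matching the `STAC_LOAD_ERRORS` semantics above.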
Exit on root-catalog failure
When the root STAC catalog JSON is unreachable at startup, the server now calls `sys.exit(1)` rather than starting uvicorn with an empty catalog. Kubernetes restarts the pod, and the next attempt retries against fresh S3 conditions.
Partial catalogs (some children failed) still serve — that's unchanged from v0.4.0. Only total failure triggers exit.
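The fail-fast / fail-partial split can be sketched in a few lines. The function and callback names here are hypothetical; the `sys.exit(1)`-on-root-failure and serve-partial-catalog behaviors are the documented ones.

```python
import sys


def load_catalog_or_exit(fetch_root, fetch_children):
    """Exit the process when the root catalog is unreachable;
    tolerate partial child failures and keep serving."""
    try:
        root = fetch_root()
    except Exception as exc:
        print(f"FATAL: root STAC catalog unreachable: {exc}", file=sys.stderr)
        sys.exit(1)  # let Kubernetes restart the pod
    datasets, errors = fetch_children(root)
    return datasets, errors  # partial catalogs still serve
```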
Default concurrency bumped
`STAC_FETCH_CONCURRENCY` default raised from 8 to 16 so the main pool plus the retry pass both fit within the readiness-probe budget. The full-load cold-start worst case (all 63 fetches succeed on retry) drops from ~40s to ~28s.
Upgrade notes
All changes are backwards-compatible for existing callers. New env var `STAC_CHILD_RETRY_TIMEOUT` is optional. Behavior change worth knowing about: pods that previously stayed up serving an empty catalog when the root fetch failed will now exit at startup; that's intentional (k8s retries faster than a broken-but-serving pod).
v0.4.0 — STAC catalog resilience
Resilient STAC catalog loader (Fixes #65)
`fetch_stac_catalog()` now survives slow / partially-failing S3:
- Split timeouts: `STAC_ROOT_TIMEOUT` (default 15s, hard prerequisite) and `STAC_CHILD_TIMEOUT` (default 5s, individually skippable). Back-compat: `STAC_TIMEOUT` alone still works as a single knob.
- Bounded parallelism via `ThreadPoolExecutor` with dynamic enqueue — parent and sub-child fetches share an 8-worker pool by default; tune with `STAC_FETCH_CONCURRENCY`. Sub-children are submitted as soon as their parent's JSON arrives, keeping the pool saturated without static wave boundaries.
- Partial-result fallback: per-child failures are recorded in a new module-level `STAC_LOAD_ERRORS` dict instead of aborting the whole load. `list_datasets()` appends a ⚠️ footer when errors exist so agents see "N collections could not be loaded" rather than treating a transient-failure collection as nonexistent.
- Cache preservation: cache-miss refetches no longer wipe previously-loaded state if the new load returns zero datasets — the old snapshot is kept until a subsequent load succeeds.
Worst-case startup wall-clock is bounded to ~40s (within the readiness probe budget) even under pathological S3 tail latency.
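The dynamic-enqueue pattern — submitting sub-children the moment a parent's JSON lands, rather than fetching in static waves — can be sketched with `concurrent.futures.wait`. A hedged sketch, not the loader's actual code: `walk_catalog` and the `fetch(cid) -> (payload, child_ids)` contract are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def walk_catalog(root_children, fetch, max_workers=8):
    """Walk a catalog tree with one shared pool. `fetch(cid)` returns
    (payload, child_ids); sub-children are enqueued as soon as their
    parent completes, so the pool stays saturated."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch, cid): cid for cid in root_children}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                cid = pending.pop(fut)
                try:
                    payload, sub_ids = fut.result()
                    results[cid] = payload
                    for sub in sub_ids:  # enqueue immediately, no wave boundary
                        pending[pool.submit(fetch, sub)] = sub
                except Exception as exc:
                    errors[cid] = str(exc)  # record, don't abort the load
            # loop continues until nothing is in flight
    return results, errors
```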
Startup performance (#66)
- Dropped a duplicate catalog walk at module import — halves the startup S3 load.
- Removed the dead `fetch_stac_collections()` / `DATA_CATALOG` machinery that only tests consumed.
Internal / ops
- Dev deployment right-sizing proposal (PR #67; merged to main, cluster spec still pending a successful `kubectl apply`).
- Prod scaled from 4 to 2 replicas (#68).
Upgrade notes
No code changes required on the client side. New env vars (`STAC_ROOT_TIMEOUT`, `STAC_CHILD_TIMEOUT`, `STAC_FETCH_CONCURRENCY`) are all optional. The `STAC_TIMEOUT` env var still works as a back-compat single knob.
v0.3.0 — Dynamic hex tile endpoint, get_collection tool, programmatic access docs
What's New
Dynamic MVT tile endpoint (`register_hex_tiles`)
New MCP tool that materializes H3 hex data into a resolution pyramid on S3 and returns a MapLibre-compatible vector tile URL. For datasets too large to return as a table (~100k+ cells), agents can now generate interactive hex map layers on the fly.
- `register_hex_tiles(sql, finest_res, ...)` → writes partitioned parquet pyramid, returns `tile_url_template`
- `/tiles/hex/{hash}/{z}/{x}/{y}.pbf` endpoint serves MVT tiles via DuckDB `ST_AsMVT`
- Automatic resolution switching: coarser hexes at low zoom, finest at high zoom
- Content-addressed hashing: identical queries dedupe naturally
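The content-addressed dedup can be illustrated with a small sketch. The function name, key length, and canonicalization choices here are hypothetical — the release notes only establish that identical queries hash to the same pyramid.

```python
import hashlib
import json


def pyramid_key(sql: str, finest_res: int) -> str:
    """Derive a content-addressed S3 prefix for a tile pyramid.
    Identical (sql, finest_res) requests produce the same key,
    so repeat registrations dedupe naturally."""
    payload = json.dumps(
        {"sql": sql.strip(), "finest_res": finest_res},
        sort_keys=True,  # canonical ordering so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```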
get_collection tool
Returns structured STAC collection metadata as JSON for programmatic use. Unlike `get_stac_details` (markdown for LLM consumption), this returns the raw collection dict with all assets, per-asset STAC extension fields, and child collection IDs. Intended for app code that builds map layers and system prompts programmatically.
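A sketch of the kind of programmatic consumer `get_collection` targets — turning the raw collection dict into map-layer specs. `layer_specs` is a hypothetical helper; the field names follow the general STAC Collection shape, not the server's exact output.

```python
def layer_specs(collection: dict) -> list:
    """Build minimal map-layer specs from a raw STAC collection dict,
    one per asset — the sort of app code get_collection exists for."""
    layers = []
    for name, asset in collection.get("assets", {}).items():
        layers.append({
            "id": f"{collection['id']}/{name}",
            "source": asset.get("href"),   # where the data lives
            "type": asset.get("type"),     # media type, e.g. parquet
        })
    return layers
```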
Programmatic access docs & examples
- R examples using `ellmer` (agent) and `httr2` (direct query)
- Python examples using `langchain-mcp-adapters` (agent) and `httpx` (direct query)
- VitePress docs page at `/guide/programmatic-access`
Bug fix
- Tile connection S3 secret: `build_tile_connection()` was missing the Ceph S3 secret configuration, causing `register_hex_tiles` to fail with `NoSuchBucket`. Now matches the query tool's S3 setup.
Test summary
- 105 unit/integration tests passing
- Live-tested on dev server: pyramid materialization, tile serving at z2–z12, no regressions on existing tools
v0.2.0 — Auth, query reliability, H3 spatial guidance
What's new since v0.1.0
Security & Auth
- Optional Bearer token auth via `MCP_AUTH_TOKEN` env var (#43) — set it to require authentication on all endpoints
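The opt-in semantics can be sketched as a small check. `check_auth` is a hypothetical helper, not the server's implementation; only the env-var name and the bearer-header requirement come from the release notes.

```python
import os


def check_auth(headers: dict) -> bool:
    """If MCP_AUTH_TOKEN is set, every request must carry a matching
    Authorization header; if unset, auth stays disabled."""
    token = os.environ.get("MCP_AUTH_TOKEN")
    if not token:
        return True  # auth is opt-in: no token, no check
    return headers.get("Authorization") == f"Bearer {token}"
```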
Query reliability
- Fixed GEOMETRY crash on GeoParquet queries — geometry columns are now silently dropped from tabular output (#48)
- Removed the `s3_allow_recursive_globbing=false` workaround — fixed upstream in DuckDB 1.5.1 (#37)
Performance & infrastructure
- Least-conn load balancing and 10-minute query timeout on the HAProxy ingress (#46)
H3 spatial guidance (agent prompt improvements)
- Prohibit DuckDB spatial ops; require H3 hash joins for all geographic filtering (#49)
- Strengthen cross-resolution join guidance to prevent `h3_cell_to_children` misuse (#40)
- Add resolution direction table and pre-computed parent-column join pattern (#45)
- Area column warning, multi-resolution table, cross-resolution joins (#35)
STAC catalog improvements
- `get_stac_details`: directory listing for parent collections, suppress parent columns (#36)
- Replace placeholder S3 paths to prevent path hallucination (#38)
Query optimization hints
- Case-insensitive text search and apostrophe escaping guidance
- S3 credentials: scoped per-request isolation with `s3_scope` support
Docs
- Fix Claude Code install instructions in README (#51)
Upgrade notes
No breaking changes. The server is fully backwards-compatible with v0.1.0 clients.
If you set `MCP_AUTH_TOKEN`, all requests must include `Authorization: Bearer <token>`.
v0.1.0
What's included
- MCP server with three tools: `browse_stac_catalog`, `get_stac_details`, and `query`
- DuckDB-powered SQL against S3 Parquet files with H3 spatial indexing
- STAC catalog integration for dynamic dataset discovery
- Client-supplied credentials for private STAC catalogs and S3 buckets
- `s3_scope` parameter for queries mixing private and public S3 endpoints
- Hosted endpoint at `https://duckdb-mcp.nrp-nautilus.io/mcp` on NRP Nautilus k8s
- MCP resources (`catalog://list`, `catalog://{dataset_id}`) and `geospatial-analyst` prompt
- VitePress documentation site
Datasets served
GLWD, Vulnerable Carbon, NCP, WDPA, Ramsar Sites, HydroBASINS, Countries & Regions, iNaturalist, Corruption Index 2024