
Releases: boettiger-lab/mcp-data-server

v0.5.1 — retry-rescued parents enqueue their sub-children

18 Apr 19:26
8b8be9c

Patch release for v0.5.0. If a parent collection with sub-children (e.g. us-census) fails the first-pass fetch and is rescued by the retry pass, its sub-children are now also fetched in the retry pool. Previously they silently went missing from the catalog.

Observed on the 2026-04-18 dev rollout: v0.5.0 loaded 75 of the expected 81 collections because us-census retried successfully but its 6 census-year sub-collections were never enqueued.

Test added: test_retry_rescued_parent_fetches_its_subchildren. 96/96 tests passing.
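A minimal sketch of the fixed retry pass follows. It is not the server's actual code. It assumes a hypothetical `fetch(collection_id)` that returns the collection JSON plus its child IDs, and the name `retry_pass` is illustrative. The key point is that a rescued parent's children go into the same pool, which is why the loop waits with `FIRST_COMPLETED` instead of iterating a fixed list of futures.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def retry_pass(failed_ids, fetch, max_workers=4):
    """Retry each failed collection once; if a retried parent succeeds,
    submit its children to the same pool instead of dropping them.

    `fetch(collection_id)` is a hypothetical callable returning
    (collection_json, child_ids) or raising on failure.
    """
    loaded, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch, cid): cid for cid in failed_ids}
        while pending:
            # Wake up as soon as any fetch finishes, so new work can be enqueued.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                cid = pending.pop(fut)
                try:
                    collection, child_ids = fut.result()
                except Exception as exc:
                    errors[cid] = repr(exc)
                    continue
                loaded[cid] = collection
                # The fix: a rescued parent's children are enqueued too.
                for child in child_ids:
                    if child not in loaded and child not in pending.values():
                        pending[pool.submit(fetch, child)] = child
    return loaded, errors
```

Without the inner `for child in child_ids` loop, `us-census` would load while its census-year children never get fetched. That matches the 75-of-81 symptom described above.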

v0.5.0 — STAC retry + exit on root failure

18 Apr 19:15
3756774

Retry pass for transient child failures

STAC catalog children that time out on the first attempt now get one automatic retry in a fresh pool with a longer per-child timeout.

  • New env var: `STAC_CHILD_RETRY_TIMEOUT` (default 8s).
  • A retry that succeeds clears the original error from `STAC_LOAD_ERRORS`; a retry that also fails leaves the error in place.
  • Addresses the tail-latency case observed on v0.4.0's first real-world deploy: 2 of 81 collections timed out at the 5s ceiling, and both were recoverable with a slightly longer timeout.
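The error-clearing rule can be sketched as follows. The sketch assumes a hypothetical `fetch_child(url, timeout)` that raises on failure, and `retry_failed_children` is an illustrative name. It retries sequentially for brevity, whereas the release runs retries in a fresh thread pool. Only `STAC_LOAD_ERRORS` and `STAC_CHILD_RETRY_TIMEOUT` (default 8s) come from the notes above.

```python
import os

# Module-level error registry, mirroring the STAC_LOAD_ERRORS dict named above.
STAC_LOAD_ERRORS: dict[str, str] = {}


def retry_timeout() -> float:
    """Read STAC_CHILD_RETRY_TIMEOUT, falling back to the documented 8s default."""
    return float(os.environ.get("STAC_CHILD_RETRY_TIMEOUT", "8"))


def retry_failed_children(fetch_child, loaded: dict) -> None:
    """Give every child recorded in STAC_LOAD_ERRORS one more attempt.

    `fetch_child(url, timeout)` is a hypothetical fetcher that raises on failure.
    """
    timeout = retry_timeout()
    for url in list(STAC_LOAD_ERRORS):  # snapshot keys: entries are deleted below
        try:
            result = fetch_child(url, timeout=timeout)
        except Exception:
            continue  # retry also failed: keep the original error entry
        loaded[url] = result
        del STAC_LOAD_ERRORS[url]  # retry succeeded: clear the first-pass error
```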

Exit on root-catalog failure

When the root STAC catalog JSON is unreachable at startup, the server now calls `sys.exit(1)` instead of starting uvicorn with an empty catalog. Kubernetes restarts the pod, and the next attempt runs against fresh S3 conditions.

Partial catalogs (where some children failed) are still served, unchanged from v0.4.0. Only a total failure (root unreachable) triggers the exit.
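The split between "exit" and "serve partial" might look like this startup guard. It is a sketch only. `fetch_root` and `fetch_children` are hypothetical stand-ins for the real loader, and `load_catalog_or_exit` is an illustrative name.

```python
import sys


def load_catalog_or_exit(fetch_root, fetch_children):
    """Startup guard: exit if the root catalog is unreachable, serve partial results otherwise.

    `fetch_root()` and `fetch_children(root)` are hypothetical stand-ins for the
    server's loader; `fetch_children` returns (datasets, errors).
    """
    try:
        root = fetch_root()
    except Exception as exc:
        print(f"root STAC catalog unreachable: {exc!r}", file=sys.stderr)
        sys.exit(1)  # let Kubernetes restart the pod instead of serving nothing
    datasets, errors = fetch_children(root)
    # Some children failing is fine: keep what loaded and report the rest.
    return datasets, errors
```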

Default concurrency bumped

`STAC_FETCH_CONCURRENCY` default raised from 8 to 16 so that the main pool and the retry pass together fit within the readiness-probe budget. The full-load cold-start worst case (all 63 fetches succeed on retry) drops from ~40s to ~28s.

Upgrade notes

All changes are backwards-compatible for existing callers, and the new `STAC_CHILD_RETRY_TIMEOUT` env var is optional. One behavior change worth knowing about: pods that previously stayed up with an empty catalog when the root fetch failed will now exit and be restarted. That's intentional, because a Kubernetes restart recovers faster than a broken-but-serving pod.

v0.4.0 — STAC catalog resilience

18 Apr 15:57
31d8233

Resilient STAC catalog loader (Fixes #65)

fetch_stac_catalog() now survives slow or partially failing S3:

  • Split timeouts: STAC_ROOT_TIMEOUT (default 15s, hard prerequisite) and STAC_CHILD_TIMEOUT (default 5s, individually skippable). Back-compat: STAC_TIMEOUT alone still works as a single knob.
  • Bounded parallelism via ThreadPoolExecutor with dynamic enqueue — parent and sub-child fetches share an 8-worker pool by default; tune with STAC_FETCH_CONCURRENCY. Sub-children are submitted as soon as their parent's JSON arrives, keeping the pool saturated without static wave boundaries.
  • Partial-result fallback: per-child failures are recorded in a new module-level STAC_LOAD_ERRORS dict instead of aborting the whole load. list_datasets() appends a ⚠️ footer when errors exist so agents see "N collections could not be loaded" rather than treating a transient-failure collection as nonexistent.
  • Cache preservation: cache-miss refetches no longer wipe previously-loaded state if the new load returns zero datasets — the old snapshot is kept until a subsequent load succeeds.
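The last two bullets can be sketched together. In the sketch below, `refresh`, `_snapshot`, and `errors_footer` are hypothetical names, and the footer wording only approximates the "N collections could not be loaded" message quoted above. It shows the snapshot being kept when a reload comes back empty, and the footer appearing only when errors exist.

```python
# Last successfully loaded datasets (hypothetical module-level cache).
_snapshot: dict[str, dict] = {}


def refresh(load):
    """Replace the cached datasets only if the new load returned something.

    `load()` is a hypothetical loader returning {dataset_id: metadata}.
    """
    global _snapshot
    fresh = load()
    if fresh:
        _snapshot = fresh
    # An empty result keeps the previous snapshot until a later load succeeds.
    return _snapshot


def errors_footer(errors: dict) -> str:
    """Footer appended to list_datasets() output when some collections failed."""
    if not errors:
        return ""
    return f"\n\n⚠️ {len(errors)} collections could not be loaded"
```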

Worst-case startup wall-clock time is bounded at ~40s, within the readiness-probe budget, even under pathological S3 tail latency.

Startup performance (#66)

  • Dropped a duplicate catalog walk at module import — halves the startup S3 load.
  • Removed the dead fetch_stac_collections() / DATA_CATALOG machinery that only tests consumed.

Internal / ops

  • Proposed right-sizing of the dev deployment (PR #67): merged to main, but the updated cluster spec still awaits a successful kubectl apply.
  • Prod scaled from 4 to 2 replicas (#68).

Upgrade notes

No code changes required on the client side. New env vars (STAC_ROOT_TIMEOUT, STAC_CHILD_TIMEOUT, STAC_FETCH_CONCURRENCY) are all optional. The STAC_TIMEOUT env var still works as a back-compat single knob.

v0.3.0 — Dynamic hex tile endpoint, get_collection tool, programmatic access docs

16 Apr 16:14
ff9b615

What's New

Dynamic MVT tile endpoint (register_hex_tiles)

New MCP tool that materializes H3 hex data into a resolution pyramid on S3 and returns a MapLibre-compatible vector tile URL. For datasets too large to return as a table (~100k+ cells), agents can now generate interactive hex map layers on the fly.

  • register_hex_tiles(sql, finest_res, ...) → writes partitioned parquet pyramid, returns tile_url_template
  • /tiles/hex/{hash}/{z}/{x}/{y}.pbf endpoint serves MVT tiles via DuckDB ST_AsMVT
  • Automatic resolution switching: coarser hexes at low zoom, finest at high zoom
  • Content-addressed hashing: identical queries dedupe naturally
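Content-addressed hashing can be illustrated with the sketch below. The exact hash inputs (SQL plus `finest_res`), the SHA-256 digest truncated to 16 hex characters, and the `https://example.org` base URL are all assumptions, not the server's scheme. Only the `/tiles/hex/{hash}/{z}/{x}/{y}.pbf` path shape comes from the notes above.

```python
import hashlib
import json


def tile_url_template(base_url: str, sql: str, finest_res: int) -> str:
    """Build a content-addressed MVT URL template for a hex pyramid.

    Hashing a canonical JSON encoding of the inputs means identical requests map
    to the same {hash}, so an already-materialized pyramid can be reused.
    """
    key = json.dumps({"sql": sql.strip(), "finest_res": finest_res}, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    # Keep {z}/{x}/{y} as literal placeholders for MapLibre to fill in.
    return f"{base_url.rstrip('/')}/tiles/hex/{digest}/{{z}}/{{x}}/{{y}}.pbf"
```

Two agents issuing the same query get the same template, so the second request can skip materialization entirely.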

get_collection tool

Returns structured STAC collection metadata as JSON for programmatic use. Unlike get_stac_details (markdown for LLM consumption), this returns the raw collection dict with all assets, per-asset STAC extension fields, and child collection IDs. Intended for app code that builds map layers and system prompts programmatically.
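As a hedged example of the app-code side, the helper below pulls Parquet asset hrefs out of a collection dict. It relies only on the standard STAC `assets` object (`href`, `type`). The helper name and the sample collection are hypothetical, and the media-type check is a heuristic, because catalogs label Parquet inconsistently.

```python
def parquet_assets(collection: dict) -> dict[str, str]:
    """Map asset key -> href for Parquet assets in a STAC collection dict.

    Assumes the standard STAC `assets` object; matching on media type or file
    extension is a heuristic, not a guarantee.
    """
    out = {}
    for key, asset in collection.get("assets", {}).items():
        media = asset.get("type", "")
        href = asset.get("href", "")
        if "parquet" in media or href.endswith(".parquet"):
            out[key] = href
    return out
```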

Programmatic access docs & examples

  • R examples using ellmer (agent) and httr2 (direct query)
  • Python examples using langchain-mcp-adapters (agent) and httpx (direct query)
  • VitePress docs page at /guide/programmatic-access

Bug fix

  • Tile connection S3 secret: build_tile_connection() was missing the Ceph S3 secret configuration, causing register_hex_tiles to fail with NoSuchBucket. Now matches the query tool's S3 setup.

Test summary

  • 105 unit/integration tests passing
  • Live-tested on dev server: pyramid materialization, tile serving at z2–z12, no regressions on existing tools

v0.2.0 — Auth, query reliability, H3 spatial guidance

16 Apr 05:27
f31d233

What's new since v0.1.0

Security & Auth

  • Optional Bearer token auth via MCP_AUTH_TOKEN env var (#43) — set it to require authentication on all endpoints
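A minimal sketch of the token check follows, assuming (from "optional") that an unset `MCP_AUTH_TOKEN` disables auth. `is_authorized` is a hypothetical name, not the server's middleware. Using `hmac.compare_digest` for a constant-time comparison is our choice here.

```python
import hmac
import os
from typing import Optional


def is_authorized(auth_header: Optional[str]) -> bool:
    """Return True if a request may proceed under MCP_AUTH_TOKEN auth.

    When MCP_AUTH_TOKEN is unset or empty, auth is disabled and every request passes.
    """
    expected = os.environ.get("MCP_AUTH_TOKEN")
    if not expected:
        return True
    scheme, _, token = (auth_header or "").partition(" ")
    if scheme.lower() != "bearer":  # auth scheme names are case-insensitive
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```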

Query reliability

  • Fixed GEOMETRY crash on GeoParquet queries — geometry columns are now silently dropped from tabular output (#48)
  • Removed s3_allow_recursive_globbing=false workaround — fixed upstream in DuckDB 1.5.1 (#37)
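The geometry fix can be pictured as a column filter applied before results are formatted. In this sketch, `columns` is a simplified list of `(name, type_name)` pairs standing in for a DuckDB result description, and `drop_geometry_columns` is a hypothetical name.

```python
def drop_geometry_columns(columns, rows):
    """Remove GEOMETRY-typed columns from a tabular result.

    `columns` is a list of (name, type_name) pairs and `rows` a list of tuples,
    a simplified stand-in for a DuckDB result description.
    """
    keep = [i for i, (_, type_name) in enumerate(columns) if type_name.upper() != "GEOMETRY"]
    return [columns[i] for i in keep], [tuple(row[i] for i in keep) for row in rows]
```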

Performance & infrastructure

  • Least-conn load balancing and 10-minute query timeout on the HAProxy ingress (#46)

H3 spatial guidance (agent prompt improvements)

  • Prohibit DuckDB spatial ops; require H3 hash joins for all geographic filtering (#49)
  • Strengthen cross-resolution join guidance to prevent h3_cell_to_children misuse (#40)
  • Add resolution direction table and pre-computed parent column join pattern (#45)
  • Area column warning, multi-resolution table, cross-resolution joins (#35)
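The pre-computed parent column pattern reduces a cross-resolution join to plain equality on a shared column. The sketch below uses stdlib `sqlite3` only so that it runs anywhere; the server itself runs DuckDB. The table names, the columns `h10`, `h8`, `species`, and `carbon`, and the placeholder cell IDs are hypothetical.

```python
import sqlite3

# In-memory tables standing in for two H3 datasets at different resolutions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fine   (h10 TEXT, h8 TEXT, species INTEGER);  -- h8 = pre-computed parent
    CREATE TABLE coarse (h8 TEXT, carbon REAL);
    INSERT INTO fine   VALUES ('a1', 'p1', 3), ('a2', 'p1', 5), ('a3', 'p2', 1);
    INSERT INTO coarse VALUES ('p1', 10.0), ('p2', 2.5);
""")

# Join on equality of the shared-resolution column: a plain hash join,
# with no spatial predicates and no per-row cell_to_children expansion.
rows = con.execute("""
    SELECT c.h8, SUM(f.species) AS species, c.carbon
    FROM fine AS f JOIN coarse AS c ON f.h8 = c.h8
    GROUP BY c.h8, c.carbon
    ORDER BY c.h8
""").fetchall()
```

Aggregating the fine table up to the coarse resolution keeps the join one-to-many in the cheap direction. This is the misuse the `h3_cell_to_children` guidance is meant to prevent.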

STAC catalog improvements

  • get_stac_details: directory listing for parent collections, suppress parent columns (#36)
  • Replace placeholder S3 paths to prevent path hallucination (#38)

Query optimization hints

  • Case-insensitive text search and apostrophe escaping guidance
  • S3 credentials: scoped per-request isolation with s3_scope support
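The text-search hint pairs case-insensitive `ILIKE` with the SQL rule that a single quote inside a literal is written as two. In the sketch below, `name_filter` is a hypothetical helper, and the column name is assumed to be trusted. `%` and `_` in the search text still act as wildcards.

```python
def name_filter(column: str, text: str) -> str:
    """Build a case-insensitive substring filter for a DuckDB query.

    Single quotes are escaped by doubling them, the standard SQL rule, so a
    name like O'Brien does not terminate the string literal early.
    """
    escaped = text.replace("'", "''")
    return f"{column} ILIKE '%{escaped}%'"
```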

Docs

  • Fix Claude Code install instructions in README (#51)

Upgrade notes

No breaking changes. The server is fully backwards-compatible with v0.1.0 clients.
If you set MCP_AUTH_TOKEN, all requests must include Authorization: Bearer <token>.

v0.1.0

31 Mar 23:59

What's included

  • MCP server with three tools: browse_stac_catalog, get_stac_details, and query
  • DuckDB-powered SQL against S3 Parquet files with H3 spatial indexing
  • STAC catalog integration for dynamic dataset discovery
  • Client-supplied credentials for private STAC catalogs and S3 buckets
  • s3_scope parameter for queries mixing private and public S3 endpoints
  • Hosted endpoint at https://duckdb-mcp.nrp-nautilus.io/mcp on NRP Nautilus k8s
  • MCP resources (catalog://list, catalog://{dataset_id}) and geospatial-analyst prompt
  • VitePress documentation site

Datasets served

GLWD, Vulnerable Carbon, NCP, WDPA, Ramsar Sites, HydroBASINS, Countries & Regions, iNaturalist, Corruption Index 2024