
feat: add custom Lance metrics to trace read-path scan performance #460

Open

summaryzb wants to merge 1 commit into lance-format:main from summaryzb:feat/read-path-custom-metrics

Conversation

@summaryzb
Contributor

Summary

Adds custom metrics to the Lance Spark read path using Spark's DataSource V2 CustomMetric API, enabling per-task timing and counter instrumentation that surfaces on the Spark UI Scan node. Six metrics are tracked: fragments scanned, batches read, dataset open time, scanner creation time, batch load time, and a derived total scan time.

Motivation

Implements #459.

Before this PR (Spark UI Scan node, no custom metrics):

[screenshot]

After this PR (Spark UI Scan node showing the Lance custom metrics):

[screenshot]

Approach

The implementation uses Spark's CustomMetric / CustomTaskMetric API, which is the standard DataSource V2 mechanism for surfacing connector-specific metrics in the Spark UI.

Metric definitions (LanceCustomMetrics): Six CustomSumMetric inner classes define the metrics. Each has a public no-arg constructor (required by Spark's reflection-based instantiation). A static allMetrics() method returns all definitions for LanceScan.supportedCustomMetrics(). The CustomSumMetric base class handles aggregation across tasks automatically.
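
As a rough illustration of this pattern (class, constant, and description strings here are illustrative rather than copied from the PR), one definition plus the allMetrics() aggregator might look roughly like:

    import org.apache.spark.sql.connector.metric.CustomMetric;
    import org.apache.spark.sql.connector.metric.CustomSumMetric;

    public final class LanceCustomMetrics {
      public static final String FRAGMENTS_SCANNED = "fragmentsScanned";

      // Public no-arg constructor (implicit here) so Spark can instantiate it reflectively.
      public static class FragmentsScannedMetric extends CustomSumMetric {
        @Override public String name() { return FRAGMENTS_SCANNED; }
        @Override public String description() { return "number of Lance fragments scanned"; }
      }

      // Returned from LanceScan.supportedCustomMetrics(); the real class lists all six.
      public static CustomMetric[] allMetrics() {
        return new CustomMetric[] {new FragmentsScannedMetric() /* , ...five more */};
      }
    }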

Executor-side tracking (LanceReadMetricsTracker): A thread-confined accumulator that lives inside each PartitionReader. It collects per-phase nanosecond timings and counters via simple add*() methods. The currentMetricsValues() method returns a snapshot array of CustomTaskMetric instances -- Spark calls this after each next() invocation. The derived scanTimeNs metric is computed as datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs.
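
A minimal sketch of that shape, assuming the method names described above (the real class may differ in detail):

    import org.apache.spark.sql.connector.metric.CustomTaskMetric;

    // Hypothetical sketch of the executor-side accumulator; thread-confined,
    // so plain long fields are enough.
    final class LanceReadMetricsTracker {
      private long fragmentsScanned;
      private long datasetOpenTimeNs;
      private long scannerCreateTimeNs;
      private long batchLoadTimeNs;

      void addFragmentsScanned(long n) { fragmentsScanned += n; }
      void addDatasetOpenTimeNs(long ns) { datasetOpenTimeNs += ns; }
      void addScannerCreateTimeNs(long ns) { scannerCreateTimeNs += ns; }
      void addBatchLoadTimeNs(long ns) { batchLoadTimeNs += ns; }

      // Called from PartitionReader.currentMetricsValues() after each next().
      CustomTaskMetric[] currentMetricsValues() {
        long scanTimeNs = datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs;
        return new CustomTaskMetric[] {
            taskMetric(LanceCustomMetrics.FRAGMENTS_SCANNED, fragmentsScanned),
            taskMetric("scanTimeNs", scanTimeNs)
            // ...the remaining counters and timings
        };
      }

      private static CustomTaskMetric taskMetric(String name, long value) {
        return new CustomTaskMetric() {
          @Override public String name() { return name; }
          @Override public long value() { return value; }
        };
      }
    }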

Instrumentation points: Timing is captured at three boundaries in the scan lifecycle:

  1. LanceFragmentScanner.create() wraps Dataset.open() and fragment.newScan() with System.nanoTime() measurements, storing the durations as instance fields.
  2. LanceFragmentColumnarBatchScanner.loadNextBatch() measures each ArrowReader.loadNextBatch() call.
  3. LanceColumnarPartitionReader reads these timings from the scanner and feeds them into its LanceReadMetricsTracker.

The same pattern is applied to LanceCountStarPartitionReader (pushed-down COUNT(*)) and LanceRowPartitionReader (delegates to the columnar reader). All three reader types override currentMetricsValues() to report metrics to Spark.
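
To illustrate the shape of each instrumentation point, here is a purely hypothetical timing helper; the PR itself inlines System.nanoTime() around the calls and stores the deltas as scanner fields rather than introducing a helper like this:

    import java.util.function.Supplier;

    // Hypothetical helper: run a unit of work and capture its wall-clock duration.
    final class Timed<T> {
      final T value;
      final long elapsedNs;

      private Timed(T value, long elapsedNs) {
        this.value = value;
        this.elapsedNs = elapsedNs;
      }

      static <T> Timed<T> of(Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        return new Timed<>(result, System.nanoTime() - start);
      }
    }

Each of the three boundaries (dataset open, scanner creation, batch load) is one such measurement, and the resulting elapsedNs is what the partition reader feeds into the tracker's add*() methods.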

Test Coverage

  • Metric count: allMetrics() returns exactly 6 metrics.
  • Name uniqueness: all metric names are distinct.
  • Name correctness: metric names match the string constants defined in LanceCustomMetrics.
  • Description presence and uniqueness: every metric has a non-empty, unique description.
  • Sum aggregation: CustomSumMetric.aggregateTaskMetrics() correctly sums values (including empty array).
  • Reflection instantiation: all six metric inner classes can be constructed via no-arg reflection (as Spark does at runtime).
  • Tracker initial state: all counters start at zero.
  • Tracker accumulation: repeated add*() calls accumulate correctly; derived scanTimeNs equals sum of three sub-timings.
  • Tracker bulk add: addFragmentsScanned() supports multi-fragment increments.
  • Tracker-definition consistency: task metric names match definition names (required for Spark aggregation).
  • Integration -- columnar read: a SELECT x, y query produces non-zero values for all six metrics, and scanTimeNs == datasetOpenTimeNs + scannerCreateTimeNs + batchLoadTimeNs after Spark aggregation.
  • Integration -- COUNT(*) with filter: the LanceCountStarPartitionReader path also produces non-zero values for all metrics.
  • Integration -- basic read/count completions: verifies that metrics instrumentation does not break normal query execution.
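
As a rough sketch of the definition-level checks (metric count, name uniqueness, description presence), assuming JUnit 5 and the class layout described above; the actual tests in the PR may be organized differently:

    import static org.junit.jupiter.api.Assertions.*;

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.spark.sql.connector.metric.CustomMetric;
    import org.junit.jupiter.api.Test;

    class LanceCustomMetricsTest {
      @Test
      void allMetricsAreDistinctAndComplete() {
        CustomMetric[] metrics = LanceCustomMetrics.allMetrics();
        assertEquals(6, metrics.length);

        Set<String> names = new HashSet<>();
        for (CustomMetric metric : metrics) {
          assertTrue(names.add(metric.name()), "duplicate metric name: " + metric.name());
          assertFalse(metric.description().isEmpty());
        }
      }
    }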

Change-Id: I63dd17d7e8469c27a73251d7eca3ac373d279d7f
github-actions bot added the enhancement label on Apr 20, 2026
@hamersaw
Collaborator

Some review notes on the metrics approach, split into perf concerns, naming/correctness fixes, and gaps in coverage.

Performance concerns

The overall approach is sound — thread-confined long counters, no locks, and nanoTime() overhead per batch (~50ns total) is negligible against the JNI + Arrow IPC cost it's measuring. Two things to address before merge:

  1. Per-call allocation in currentMetricsValues(). LanceReadMetricsTracker.currentMetricsValues() allocates a fresh CustomTaskMetric[6] plus 6 anonymous inner-class instances every call. Spark invokes this once per PartitionReader.next(). That's per-batch for columnar (fine), but LanceRowPartitionReader.currentMetricsValues() delegates straight through, so for the row-oriented read path it's 7 allocations per row. Cache the array and metric instances once at construction and mutate the long fields in place — Spark snapshots the value on read, so reuse is safe. A sketch of this caching approach follows the list below.

  2. Redundant field shuttle for batch load time. LanceFragmentColumnarBatchScanner.lastBatchLoadTimeNs exists only so the partition reader can pull it back out via a getter. Pass the tracker into the scanner and have it call tracker.addBatchLoadTimeNs(...) directly. Removes a field, a getter, and a per-batch coupling point.
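
A sketch of the caching approach from point 1, using a hypothetical mutable CustomTaskMetric created once per counter so currentMetricsValues() can hand back the same array every call:

    import org.apache.spark.sql.connector.metric.CustomTaskMetric;

    // Hypothetical mutable task metric: Spark reads value() when it snapshots,
    // so the reader can keep mutating the same instance between calls.
    final class MutableTaskMetric implements CustomTaskMetric {
      private final String name;
      private long value;

      MutableTaskMetric(String name) { this.name = name; }

      void add(long delta) { value += delta; }

      @Override public String name() { return name; }
      @Override public long value() { return value; }
    }

The tracker would then build one of these per metric plus a single CustomTaskMetric[] at construction, mutate the longs via add(), and return the cached array from currentMetricsValues().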

One unrelated correctness win bundled in is worth noting: LanceCountStarPartitionReader.get() previously ran computeCount() on every call — now cached behind if (currentBatch == null). Spark only calls get() once per next()==true so it likely never bit anyone, but worth keeping.

Naming / correctness fixes

  1. Format time metrics for the UI. CustomSumMetric returns the raw long, so the UI will show "47382910283" instead of "47.4 s". Subclass once with a ns-formatting aggregateTaskMetrics and use it for every time metric:

    public abstract class CustomNsTimeMetric extends CustomSumMetric {
      @Override public String aggregateTaskMetrics(long[] taskMetrics) {
        long total = 0; for (long v : taskMetrics) total += v;
        return formatDuration(total); // "1.2 s", "350 ms", "47 us"
      }
    }

    SQLMetrics.stringValue is private[sql], so reimplement the formatter — it's ~10 lines. A sketch follows this list.

  2. Drop or rename scanTimeNs. It's open + create + load summed across fragments and batches in a partition — not a wall-clock scan duration. The name invites misinterpretation. Either delete it (the components are already exposed) or rename to lanceJniTotalTimeNs. Removing it also retires the "must be updated if new phases are added" comment in currentMetricsValues() — that comment is admitting a maintenance trap that doesn't need to exist.

  3. Match Spark naming conventions. Built-in metrics use numX for counts:

    • fragmentsScanned → numFragmentsScanned
    • batchesRead → numBatchesLoaded (verb consistency with loadNextBatch / batchLoadTimeNs)
  4. Verify the JNI surface for native stats. Before adding the row/byte counters in the next section, check whether lance-jni's LanceScanner exposes a getStats()-style call. If yes, pull from there — it's authoritative. If no, file an upstream ask.
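
Circling back to the formatter note under fix 1, a rough reimplementation of the nanosecond formatter could be (thresholds and precision are a judgment call):

    static String formatDuration(long ns) {
      if (ns < 1_000L) return ns + " ns";
      if (ns < 1_000_000L) return String.format("%.1f us", ns / 1_000.0);
      if (ns < 1_000_000_000L) return String.format("%.1f ms", ns / 1_000_000.0);
      if (ns < 60_000_000_000L) return String.format("%.1f s", ns / 1_000_000_000.0);
      return String.format("%.1f min", ns / 60_000_000_000.0);
    }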

Gaps in coverage

The current set tells you "how long did Lance take" but not "where did the time go" or "did pushdown work." Proposed final metric set:

Counts (CustomSumMetric):

  • numFragmentsScanned: Lance fragments actually opened by this task.
  • numFragmentsPruned: fragments eliminated by stats / zonemap before scan. Critical for verifying pushdown — silent regressions otherwise.
  • numBatchesLoaded: Arrow batches returned from the JNI scanner.
  • numRowsScanned: rows read from storage before filter evaluation. Pair with Spark's built-in numOutputRows to compute filter selectivity.
  • numIndexLookups: times a Lance scalar/vector index was consulted. Tells you whether a query stayed on the index path.
  • bytesRead: bytes pulled from object store / disk by this task. The single most important diagnostic metric — answers IO-bound vs CPU-bound. Format with a size formatter (KB/MB/GB).

Times (using the CustomNsTimeMetric formatter from the naming fixes above):

  • datasetOpenTimeNs: total time in Utils.openDatasetBuilder(...).build(), summed across fragments in the partition.
  • scannerCreateTimeNs: total time in fragment.newScan(...), summed across fragments.
  • indexLookupTimeNs: time spent in index probes. Sourced from native stats if available.
  • ioWaitTimeNs: time blocked on storage IO inside loadNextBatch. Requires a native split — file an ask if lance-jni doesn't surface it.
  • decodeTimeNs: time spent decompressing / decoding inside loadNextBatch (the non-IO portion). Same native-source dependency as above.
  • batchLoadTimeNs: total loadNextBatch wall time. Keep even if IO/decode are split — useful as a sanity check (ioWait + decode ≈ batchLoad).

Drop: scanTimeNs (derived and misleading — see the rename/drop note in the naming fixes above).

If lance-jni doesn't yet expose bytesRead, numFragmentsPruned, numRowsScanned, numIndexLookups, ioWaitTimeNs, or decodeTimeNs, ship the Java-measurable ones now (fragment/batch counts, the three timing phases, plus numRowsScanned from VectorSchemaRoot.getRowCount()) and gate the rest on an upstream change. Don't fake the missing ones with Java-side approximations — that's worse than not having them.
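
For the Java-measurable numRowsScanned mentioned above, a fragment sketch of the batch-load hook (not a complete class; the tracker field and addRowsScanned(...) are assumed names):

    // Fragment: times the Arrow batch load and counts rows before filter evaluation.
    boolean loadNextBatch() throws IOException {
      long start = System.nanoTime();
      boolean loaded = arrowReader.loadNextBatch();
      tracker.addBatchLoadTimeNs(System.nanoTime() - start);
      if (loaded) {
        // Row count of the batch just loaded, straight from the Arrow root.
        tracker.addRowsScanned(arrowReader.getVectorSchemaRoot().getRowCount());
      }
      return loaded;
    }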

The two I'd treat as blocking are bytesRead and numFragmentsPruned. The rest is nice-to-have.

@hamersaw
Collaborator

This is awesome! Appreciate the effort here, it will be super useful. I left a comment ^^^ kind of just dumping my Claude session during review (sorry for the AI comment). My takeaways:
(1) I like the idea of formatting the nanosecond times into seconds for readability.
(2) I think we need to add a little more information to the metrics and solidify a naming scheme (ex. num* for counters, *TimeNs for durations, etc.). The actual metric values were from me iterating on this a bit.
(3) Let's update the performance docs with a breakdown of these values + descriptions. I think this could help debug a lot of performance issues.
