
feat: support Spark 4.1 TimeType (time-of-day) #457

Open

LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:feat/timetype-support

Conversation

@LuciferYang Contributor

Summary

Spark 4.1 introduces TimeType, representing time-of-day (00:00:00.000000 to 23:59:59.999999) stored internally as nanoseconds since midnight. Lance already supports Arrow Time32/Time64 types at the core level; this PR adds read, write, and filter pushdown support in the lance-spark connector layer.

Since TimeType only exists in Spark 4.1+, and the base module must compile against older Spark versions, all TimeType references use reflection. This follows the same cross-version compatibility pattern already established in the project — e.g., Float2Vector (Arrow 18+) uses class name checks rather than direct imports to avoid compile-time dependencies.
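The class-name-check pattern described above can be sketched as follows. This is an illustrative helper, not the connector's actual code; the class and method names (`TimeTypeSupport`, `isTimeTypeAvailable`, `isTimeType`) are hypothetical. The key point is that the base module never imports `TimeType`, so it still compiles against Spark versions that lack the class.

```java
// Illustrative sketch of the reflection-based cross-version pattern.
// The base module compares class names instead of importing
// org.apache.spark.sql.types.TimeType directly.
public final class TimeTypeSupport {
    private static final String TIME_TYPE_CLASS =
        "org.apache.spark.sql.types.TimeType";

    private TimeTypeSupport() {}

    /** True when the running Spark version ships TimeType (4.1+). */
    public static boolean isTimeTypeAvailable() {
        try {
            Class.forName(TIME_TYPE_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * True when the given object is a TimeType instance, decided by
     * class name so no compile-time dependency on Spark 4.1 is needed.
     */
    public static boolean isTimeType(Object dataType) {
        return dataType != null
            && TIME_TYPE_CLASS.equals(dataType.getClass().getName());
    }
}
```

The same shape is what the PR attributes to the existing `Float2Vector` handling: the check degrades gracefully to `false` on older classpaths instead of failing to link.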

Changes

New files

  • TimeUnitAccessor.java — read-path accessor for Arrow Time vectors
  • TimeTypeRoundtripTest.java — E2E roundtrip test in the Spark 4.1 module

Write path

  • LanceArrowUtils.toArrowType maps TimeType to ArrowType.Time(NANOSECOND, 64) via class name check to avoid compile-time dependency
  • LanceArrowWriter adds TimeNanoWriter — Spark stores nanos internally, Arrow Time(NANOSECOND) expects nanos, so the value is passed through directly with no conversion
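The no-conversion claim in the write path can be made concrete with a small sketch (class and method names here are illustrative, not the connector's): Spark's internal TimeType value is nanoseconds since midnight, which is exactly what an Arrow `Time(NANOSECOND, 64)` vector stores, so the writer's transfer is the identity.

```java
import java.time.LocalTime;

// Sketch of why TimeNanoWriter needs no unit conversion: both sides
// of the transfer use nanos-since-midnight in a 64-bit value.
public final class TimeNanoPassthrough {
    private TimeNanoPassthrough() {}

    /** Spark-internal TimeType representation: nanos since midnight. */
    public static long toSparkInternal(LocalTime t) {
        return t.toNanoOfDay();
    }

    /** What a nano-unit time writer hands to Arrow: the same long. */
    public static long toArrowTimeNano(long sparkNanos) {
        return sparkNanos;
    }
}
```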

Read path

  • LanceArrowUtils.fromArrowField adds an ArrowType.Time branch that resolves the Spark type via TimeUtils.resolveSparkTimeType() — returns TimeType on Spark 4.1+, falls back to LongType on older versions
  • New TimeUnitAccessor handles all four Arrow Time vectors (TimeSec / TimeMilli / TimeMicro / TimeNano) and normalizes their values to nanoseconds for Spark. Because Time32 vectors return int while Time64 vectors return long, and the two share no common base class, a TypedGetter functional interface provides typed access without per-row boxing overhead
  • LanceArrowColumnVector dispatches the four Time vector types to TimeUnitAccessor
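The normalization idea behind the read path can be sketched like this (the names `TimeNanosNormalizer`, `TypedGetter`, and `normalize` are illustrative, not the PR's exact API): each Arrow time unit maps to a nanos-per-unit factor, and a small functional interface abstracts over the int-returning Time32 getters and long-returning Time64 getters without boxing per row.

```java
// Sketch of per-unit normalization to nanoseconds with a typed getter.
// Time32 (SEC, MILLI) getters return int, Time64 (MICRO, NANO) return
// long; a long-returning functional interface covers both, since int
// widens to long for free in the lambda body.
public final class TimeNanosNormalizer {
    /** Typed access: rowId -> raw vector value as a long. */
    @FunctionalInterface
    public interface TypedGetter {
        long get(int rowId);
    }

    // Nanos-per-unit factors for the four Arrow Time units.
    public static final long NANOS_PER_SEC   = 1_000_000_000L;
    public static final long NANOS_PER_MILLI = 1_000_000L;
    public static final long NANOS_PER_MICRO = 1_000L;
    public static final long NANOS_PER_NANO  = 1L;

    private TimeNanosNormalizer() {}

    /** Reads one row through the getter and scales it to nanoseconds. */
    public static long normalize(TypedGetter getter, long nanosPerUnit, int rowId) {
        return getter.get(rowId) * nanosPerUnit;
    }
}
```

A caller backed by a Time32-style int source would write, e.g., `normalize(row -> secs[row], NANOS_PER_SEC, 0)`, and a Time64-style long source works with the same interface.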

Filter pushdown

  • FilterPushDown.isFilterSupported now checks filter value types and rejects LocalTime values. The Lance planner layer does not yet support the TIME type, so time predicates are evaluated as post-scan filters on the Spark side
  • compileValue includes a defensive LocalTime branch — currently unreachable, kept for forward compatibility when Lance upstream adds TIME support
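The rejection rule above amounts to a type check on the filter's comparison value. A minimal sketch (class and method names are hypothetical; the real logic lives in `FilterPushDown.isFilterSupported`):

```java
import java.time.LocalTime;

// Sketch of the pushdown gate: filters comparing against a LocalTime
// stay on the Spark side because the Lance planner does not yet
// understand the TIME type.
public final class TimeFilterGate {
    private TimeFilterGate() {}

    /** False when the filter value is a LocalTime: evaluate post-scan. */
    public static boolean isValueSupported(Object value) {
        return !(value instanceof LocalTime);
    }
}
```

With this shape, composite filters (In, Not/Or/And) reject as soon as any leaf value is a LocalTime, matching the bucketing behavior the tests below exercise.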

Tests

  • FilterPushDownTest: LocalTime filter rejection covering comparison operators, In, Not/Or/And composites, and processFilters bucketing
  • LanceArrowUtilsSuite: ArrowType.Time ↔ Spark type bidirectional mapping
  • LanceArrowColumnVectorSuite: read-path assertions for all four Time vector types, verifying normalization to nanos
  • TimeTypeRoundtripTest (4.1 module): end-to-end write-read roundtrip

Commit message

Add end-to-end support for Spark 4.1's TimeType (time-of-day stored as
nanoseconds since midnight). Uses reflection for cross-version compat
so the base module still compiles against older Spark versions.

Write path:
- LanceArrowUtils maps TimeType -> ArrowType.Time(NANOSECOND, 64)
- TimeNanoWriter passes nanos through directly (no conversion needed)

Read path:
- LanceArrowUtils maps ArrowType.Time -> TimeType (or LongType on <4.1)
- TimeUnitAccessor normalizes all 4 Arrow Time units to nanoseconds
- LanceArrowColumnVector dispatches Time vectors to TimeUnitAccessor

Filter pushdown:
- Reject LocalTime filters since Lance planner lacks TIME support
- Defensive compileValue(LocalTime) for future upstream compatibility

Tests:
- FilterPushDownTest: LocalTime rejection (comparison, In, composite)
- LanceArrowUtilsSuite: ArrowType.Time <-> Spark type mapping
- LanceArrowColumnVectorSuite: all 4 Time vector read-path assertions
- TimeTypeRoundtripTest: E2E roundtrip test for Spark 4.1 module
@github-actions bot added the enhancement (New feature or request) label on Apr 19, 2026
