feat: support Spark 4.1 TimeType (time-of-day) #457
Open
LuciferYang wants to merge 1 commit into lance-format:main from
Add end-to-end support for Spark 4.1's TimeType (time-of-day stored as nanoseconds since midnight). Uses reflection for cross-version compat so the base module still compiles against older Spark versions.

Write path:
- LanceArrowUtils maps TimeType -> ArrowType.Time(NANOSECOND, 64)
- TimeNanoWriter passes nanos through directly (no conversion needed)

Read path:
- LanceArrowUtils maps ArrowType.Time -> TimeType (or LongType on <4.1)
- TimeUnitAccessor normalizes all 4 Arrow Time units to nanoseconds
- LanceArrowColumnVector dispatches Time vectors to TimeUnitAccessor

Filter pushdown:
- Reject LocalTime filters since Lance planner lacks TIME support
- Defensive compileValue(LocalTime) for future upstream compatibility

Tests:
- FilterPushDownTest: LocalTime rejection (comparison, In, composite)
- LanceArrowUtilsSuite: ArrowType.Time <-> Spark type mapping
- LanceArrowColumnVectorSuite: all 4 Time vector read-path assertions
- TimeTypeRoundtripTest: E2E roundtrip test for Spark 4.1 module
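The reflection approach can be illustrated in plain Java. This is a minimal sketch under stated assumptions, not the connector's code: `TimeTypeCompat` and its methods are hypothetical, the strings "TimeType" and "LongType" stand in for real Spark `DataType` instances, and the fully qualified name `org.apache.spark.sql.types.TimeType` is assumed. Checking class names, and loading by name with `Class.forName`, keeps the base module free of any compile-time reference to the type.

```java
/**
 * Hypothetical sketch of a cross-version TimeType check; not the connector's TimeUtils.
 * It never references Spark types at compile time, only their class names.
 */
final class TimeTypeCompat {
    // Assumed fully qualified name of Spark 4.1's TimeType; absent on older Spark versions.
    static final String SPARK_TIME_TYPE = "org.apache.spark.sql.types.TimeType";

    private TimeTypeCompat() {}

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, TimeTypeCompat.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Class-name check on a data type instance, so no import of TimeType is needed. */
    static boolean isTimeType(Object sparkDataType) {
        return sparkDataType != null
                && SPARK_TIME_TYPE.equals(sparkDataType.getClass().getName());
    }

    /** Mirrors the documented fallback: "TimeType" on Spark 4.1+, "LongType" otherwise. */
    static String resolveSparkTimeTypeName() {
        return isClassPresent(SPARK_TIME_TYPE) ? "TimeType" : "LongType";
    }

    public static void main(String[] args) {
        // Without Spark 4.1 on the classpath, this resolves to the LongType fallback.
        System.out.println(resolveSparkTimeTypeName());
    }
}
```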
Summary
Spark 4.1 introduces TimeType, representing time-of-day (00:00:00.000000 to 23:59:59.999999) stored internally as nanoseconds since midnight. Lance already supports Arrow Time32/Time64 types at the core level; this PR adds read, write, and filter pushdown support in the lance-spark connector layer.

Since TimeType only exists in Spark 4.1+, and the base module must compile against older Spark versions, all TimeType references use reflection. This follows the cross-version compatibility pattern already established in the project: for example, Float2Vector (Arrow 18+) is detected with class name checks rather than direct imports to avoid a compile-time dependency.

Changes
New files
- TimeUnitAccessor.java: read-path accessor for Arrow Time vectors
- TimeTypeRoundtripTest.java: E2E roundtrip test in the Spark 4.1 module

Write path
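A minimal sketch of why the write path needs no conversion, using only the JDK. `TimeNanoWriterSketch` is hypothetical: its nested `Unit` enum stands in for Arrow's `TimeUnit` (Arrow stores SECOND and MILLISECOND times in 32 bits and MICROSECOND and NANOSECOND times in 64 bits), and a `long[]` stands in for a `TimeNanoVector` data buffer. The value Spark holds for a TimeType column is already nanoseconds since midnight, the same quantity `LocalTime.toNanoOfDay()` returns, so it is copied as-is.

```java
import java.time.LocalTime;

/** Hypothetical sketch of the TimeType write path; not the connector's TimeNanoWriter. */
final class TimeNanoWriterSketch {
    /** Stand-in for Arrow's TimeUnit, paired with the bit width Arrow uses for each unit. */
    enum Unit {
        SECOND(32), MILLISECOND(32), MICROSECOND(64), NANOSECOND(64);

        final int bitWidth;

        Unit(int bitWidth) {
            this.bitWidth = bitWidth;
        }
    }

    // Target Arrow type for Spark TimeType per this PR: Time(NANOSECOND, 64).
    static final Unit TARGET_UNIT = Unit.NANOSECOND;

    private final long[] buffer; // stands in for the TimeNanoVector data buffer (fixed capacity)
    private int count;

    TimeNanoWriterSketch(int capacity) {
        this.buffer = new long[capacity];
    }

    /** Spark already holds nanoseconds since midnight, so the value is copied unchanged. */
    void write(long sparkNanosOfDay) {
        buffer[count++] = sparkNanosOfDay;
    }

    long get(int index) {
        return buffer[index];
    }

    public static void main(String[] args) {
        TimeNanoWriterSketch writer = new TimeNanoWriterSketch(1);
        writer.write(LocalTime.of(13, 45, 30, 123_456_000).toNanoOfDay());
        // Reading the stored value back yields the original time-of-day.
        System.out.println(LocalTime.ofNanoOfDay(writer.get(0)));
    }
}
```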
- LanceArrowUtils.toArrowType maps TimeType to ArrowType.Time(NANOSECOND, 64) via a class name check, avoiding a compile-time dependency
- LanceArrowWriter adds TimeNanoWriter: Spark stores nanoseconds internally and Arrow Time(NANOSECOND) expects nanoseconds, so the value is passed through directly with no conversion

Read path
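The normalization TimeUnitAccessor performs can be sketched without Arrow on the classpath. This is an illustrative sketch, not the connector's implementation: `int[]` and `long[]` arrays stand in for Time32 and Time64 vectors, null handling (Arrow validity bitmaps) is omitted, and `TypedGetter` here is a hypothetical single-method interface modeled on the one described in this section. Each getter returns a primitive `long`, so per-row reads involve no boxing; Time32 `int` values simply widen to `long` inside the lambda.

```java
/** Hypothetical sketch of Time-unit normalization; not the connector's TimeUnitAccessor. */
final class TimeUnitAccessorSketch {
    /** Reads one row as a primitive long, so no per-row boxing occurs. */
    @FunctionalInterface
    interface TypedGetter {
        long get(int rowId);
    }

    /** Stand-in for Arrow's four time units, with the factor needed to reach nanoseconds. */
    enum Unit {
        SECOND(1_000_000_000L), MILLISECOND(1_000_000L), MICROSECOND(1_000L), NANOSECOND(1L);

        final long nanosPerUnit;

        Unit(long nanosPerUnit) {
            this.nanosPerUnit = nanosPerUnit;
        }
    }

    private final TypedGetter getter;
    private final long nanosPerUnit;

    private TimeUnitAccessorSketch(TypedGetter getter, Unit unit) {
        this.getter = getter;
        this.nanosPerUnit = unit.nanosPerUnit;
    }

    /** Time32 vectors (SECOND, MILLISECOND) expose int values. */
    static TimeUnitAccessorSketch ofTime32(int[] values, Unit unit) {
        if (unit != Unit.SECOND && unit != Unit.MILLISECOND) {
            throw new IllegalArgumentException("Time32 supports SECOND or MILLISECOND, got " + unit);
        }
        return new TimeUnitAccessorSketch(rowId -> values[rowId], unit);
    }

    /** Time64 vectors (MICROSECOND, NANOSECOND) expose long values. */
    static TimeUnitAccessorSketch ofTime64(long[] values, Unit unit) {
        if (unit != Unit.MICROSECOND && unit != Unit.NANOSECOND) {
            throw new IllegalArgumentException("Time64 supports MICROSECOND or NANOSECOND, got " + unit);
        }
        return new TimeUnitAccessorSketch(rowId -> values[rowId], unit);
    }

    /** Nanoseconds since midnight, the value Spark's TimeType expects. */
    long getNanos(int rowId) {
        return getter.get(rowId) * nanosPerUnit;
    }

    public static void main(String[] args) {
        TimeUnitAccessorSketch millis = ofTime32(new int[] {3_723_000}, Unit.MILLISECOND);
        // 01:02:03 stored in milliseconds, printed as nanoseconds since midnight.
        System.out.println(millis.getNanos(0));
    }
}
```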
- LanceArrowUtils.fromArrowField adds an ArrowType.Time branch that resolves the Spark type via TimeUtils.resolveSparkTimeType(), which returns TimeType on Spark 4.1+ and falls back to LongType on older versions
- TimeUnitAccessor handles all four Arrow Time vectors (TimeSec / TimeMilli / TimeMicro / TimeNano) and normalizes them to nanoseconds for Spark. Time32 vectors return int and Time64 vectors return long, with no common base class, so a TypedGetter functional interface provides typed access without per-row boxing overhead
- LanceArrowColumnVector dispatches the four Time vector types to TimeUnitAccessor

Filter pushdown
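A simplified sketch of the value-type check, assuming Java 17 (records and `instanceof` patterns). The nested records are hypothetical stand-ins for Spark's source filter classes, covering one comparison plus `In`, `Not`, `And`, and `Or`; `isFilterSupported` and `partition` are illustrative and not the connector's FilterPushDown API. The check walks composites recursively, so a single `LocalTime` literal anywhere in the tree keeps the whole filter on the Spark side.

```java
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of LocalTime filter rejection; not the connector's FilterPushDown. */
final class TimeFilterCheck {
    // Simplified stand-ins for Spark's source filters.
    interface Filter {}
    record GreaterThan(String attribute, Object value) implements Filter {}
    record In(String attribute, Object[] values) implements Filter {}
    record Not(Filter child) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record Or(Filter left, Filter right) implements Filter {}

    /** The Lance planner has no TIME support yet, so LocalTime literals cannot be pushed. */
    static boolean isSupportedValue(Object value) {
        return !(value instanceof LocalTime);
    }

    /** Returns false if any literal in the (possibly nested) filter is a LocalTime. */
    static boolean isFilterSupported(Filter filter) {
        if (filter instanceof GreaterThan gt) {
            return isSupportedValue(gt.value());
        }
        if (filter instanceof In in) {
            return Arrays.stream(in.values()).allMatch(TimeFilterCheck::isSupportedValue);
        }
        if (filter instanceof Not not) {
            return isFilterSupported(not.child());
        }
        if (filter instanceof And and) {
            return isFilterSupported(and.left()) && isFilterSupported(and.right());
        }
        if (filter instanceof Or or) {
            return isFilterSupported(or.left()) && isFilterSupported(or.right());
        }
        return false; // unknown filter kinds stay on the Spark side
    }

    /** Buckets filters: true = pushed to Lance, false = evaluated after the scan. */
    static Map<Boolean, List<Filter>> partition(List<Filter> filters) {
        return filters.stream().collect(Collectors.partitioningBy(TimeFilterCheck::isFilterSupported));
    }

    public static void main(String[] args) {
        List<Filter> filters = List.of(
                new GreaterThan("id", 10),
                new Or(new GreaterThan("id", 5), new GreaterThan("t", LocalTime.NOON)));
        Map<Boolean, List<Filter>> buckets = partition(filters);
        System.out.println("pushed: " + buckets.get(true).size()
                + ", post-scan: " + buckets.get(false).size());
    }
}
```

For `Or` and `Not`, pushing only the supported half would change which rows match, so rejecting the entire composite is the safe default in this sketch.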
- FilterPushDown.isFilterSupported now checks filter value types and rejects LocalTime values. The Lance planner layer does not yet support the TIME type, so time predicates are evaluated as post-scan filters on the Spark side
- compileValue includes a defensive LocalTime branch; it is currently unreachable and kept for forward compatibility, for when Lance upstream adds TIME support

Tests
- FilterPushDownTest: LocalTime filter rejection covering comparison operators, In, Not/Or/And composites, and processFilters bucketing
- LanceArrowUtilsSuite: bidirectional ArrowType.Time ↔ Spark type mapping
- LanceArrowColumnVectorSuite: read-path assertions for all four Time vector types, verifying normalization to nanoseconds
- TimeTypeRoundtripTest (4.1 module): end-to-end write-read roundtrip
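The idea behind the per-unit read-path assertions can be illustrated with a plain-JDK check. This is a sketch of the approach, not code from LanceArrowColumnVectorSuite: it encodes one time-of-day in each of the four Arrow units and confirms that converting each encoding to nanoseconds, here using `java.util.concurrent.TimeUnit` as an independent reference, matches `LocalTime.toNanoOfDay()`. The class name and the sample time are illustrative.

```java
import java.time.LocalTime;
import java.util.concurrent.TimeUnit;

/** Hypothetical normalization check; not taken from LanceArrowColumnVectorSuite. */
final class TimeNormalizationCheck {
    /** Converts a raw value in the given unit to nanoseconds since midnight. */
    static long toNanos(long raw, TimeUnit unit) {
        return unit.toNanos(raw);
    }

    /** Throws if the two nanosecond values differ. */
    static void expectEqual(String label, long actual, long expected) {
        if (actual != expected) {
            throw new AssertionError(label + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // 01:02:03 is 3,723 seconds after midnight.
        long expected = LocalTime.of(1, 2, 3).toNanoOfDay();

        // The same time-of-day as TimeSec/TimeMilli (int) and TimeMicro/TimeNano (long) store it.
        int asSeconds = 3_723;
        int asMillis = 3_723_000;
        long asMicros = 3_723_000_000L;
        long asNanos = 3_723_000_000_000L;

        expectEqual("seconds", toNanos(asSeconds, TimeUnit.SECONDS), expected);
        expectEqual("millis", toNanos(asMillis, TimeUnit.MILLISECONDS), expected);
        expectEqual("micros", toNanos(asMicros, TimeUnit.MICROSECONDS), expected);
        expectEqual("nanos", toNanos(asNanos, TimeUnit.NANOSECONDS), expected);
        System.out.println("all four units normalize to " + expected + " ns");
    }
}
```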