Support nested columns in tblproperties field metadata #434

@hamersaw

Description

Currently, our encoding scheme puts the column name first, which can lead to a few issues. First, it may introduce ambiguity if a column has the same name as a keyword. Second, it is unclear how to set these fields on nested datatypes (e.g. struct fields). I'm wondering whether, rather than prefixing the key with the column name, we should postfix it. So in this case, rather than:

'payload.lance.compression'       = 'zstd'
'payload.lance.compression-level' = '3'
'ts.lance.structural-encoding'    = 'miniblock'
'ts.lance.rle-threshold'          = '0.5'
'ts.lance.bss'                    = 'auto'

we would do something like:

'lance.compression.column.payload'       = 'zstd'
'lance.compression-level.column.payload' = '3'
'lance.structural-encoding.column.ts'    = 'miniblock'
'lance.rle-threshold.column.ts'          = '0.5'
'lance.bss.column.ts'                    = 'auto'

This would remove the potential ambiguity and allow us to set options on specific nested fields by using multiple identifiers after the column marker.
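
To make the nested-field idea concrete, here is a minimal Python sketch of how a postfix key could be split into an option name and a field path. The function name `parse_postfix_key`, the `ColumnOption` type, and the nested key `lance.compression.column.payload.header.id` are all hypothetical; the issue only proposes the `lance.<option>.column.<column>` shape. The sketch also assumes field names contain no dots.

```python
from typing import NamedTuple, Optional, Tuple

PREFIX = "lance."      # every key in the proposed format starts with this
MARKER = ".column."    # separates the option name from the column path


class ColumnOption(NamedTuple):
    option: str              # e.g. "compression"
    path: Tuple[str, ...]    # e.g. ("payload", "header", "id")


def parse_postfix_key(key: str) -> Optional[ColumnOption]:
    """Parse 'lance.<option>.column.<col>[.<field>...]' (hypothetical helper).

    Returns None when the key is not in the proposed postfix format.
    Assumes column and field names contain no '.' characters.
    """
    if not key.startswith(PREFIX):
        return None
    option, sep, path = key[len(PREFIX):].partition(MARKER)
    if not sep or not option or not path:
        return None
    parts = tuple(path.split("."))
    if any(not part for part in parts):
        return None  # reject empty path segments such as 'a..b'
    return ColumnOption(option, parts)


top_level = parse_postfix_key("lance.compression.column.payload")
nested = parse_postfix_key("lance.compression.column.payload.header.id")
```

Because the option name always precedes the marker, everything after it can be read as a path into the schema, which is what the prefix format cannot express unambiguously.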

If this is the direction we decide to go, we can make the change backward compatible. For example, the existing options would respect both formats, but going forward we would only support the new format.
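
As a rough illustration of the backward-compatible behavior, the Python sketch below reads both key formats from a properties map. The function name `collect_column_options` is hypothetical, and letting the new format win when both are present is an assumption, not something decided in this issue.

```python
from typing import Dict, Tuple

LEGACY_MARKER = ".lance."   # legacy: '<column>.lance.<option>'
NEW_PREFIX = "lance."       # new:    'lance.<option>.column.<column>'
NEW_MARKER = ".column."


def collect_column_options(props: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Return {(column, option): value}, accepting both key formats.

    Assumption: when both formats set the same option on the same column,
    the new postfix format takes precedence.
    """
    resolved: Dict[Tuple[str, str], str] = {}

    # Pass 1: legacy prefix keys. Keys starting with 'lance.' are skipped
    # here, so a column literally named 'lance' cannot use the legacy form;
    # this is the kind of ambiguity the new format avoids.
    for key, value in props.items():
        if key.startswith(NEW_PREFIX):
            continue
        column, sep, option = key.partition(LEGACY_MARKER)
        if sep and column and option:
            resolved[(column, option)] = value

    # Pass 2: new postfix keys overwrite any legacy value.
    for key, value in props.items():
        if not key.startswith(NEW_PREFIX):
            continue
        option, sep, column = key[len(NEW_PREFIX):].partition(NEW_MARKER)
        if sep and option and column:
            resolved[(column, option)] = value

    return resolved


options = collect_column_options({
    "payload.lance.compression": "lz4",          # legacy key
    "lance.compression.column.payload": "zstd",  # new key, takes precedence
    "ts.lance.bss": "auto",                      # legacy key only
})
```

Processing legacy keys first keeps the precedence rule in one place, so dropping legacy support later would only mean deleting the first loop.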
