SparseSerialization should preserve whether the column is numpy #499

**Describe the bug**
When `use_numpy=True` and `columnar=True`, query results for numeric columns are expected to be `numpy.ndarray` rather than a Python list/tuple. However, when a column is sparse (i.e., the majority of its values are the same, such as 0, -1 or NaN), the result has been the latter since sparse serialization support was introduced. The implementation does not take into account whether the column is a numpy column.

See commit 4de5b2b21e4c345b4a30cb2a2e2766ef20c291f3

https://github.com/mymarilyn/clickhouse-driver/blame/49afa09cede2e904090d46b44c1a059bec14c598/clickhouse_driver/columns/base.py#L49

```python

    def apply_sparse(self, items):
        default = self.column.null_value
        if self.column.after_read_items:
            default = self.column.after_read_items([default])[0]

        rv = [default] * (self.items_total - 1)
        for item_number, i in enumerate(self.sparse_indexes):
            rv[i - 1] = items[item_number]

        return rv
```
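For illustration, the list-building logic above can be pulled out into a standalone function. `apply_sparse_list` is a hypothetical helper, not part of clickhouse-driver; it follows the excerpt's indexing convention (1-based `sparse_indexes`, an output of `items_total - 1` rows). Even when the non-default values arrive as a numpy array, the result is a plain `list`:

```python
import numpy as np


def apply_sparse_list(items, sparse_indexes, items_total, default):
    """Standalone copy of the current list-based apply_sparse logic (hypothetical helper)."""
    # Start with every row set to the default value...
    rv = [default] * (items_total - 1)
    # ...then write each non-default value at its 1-based position.
    for item_number, i in enumerate(sparse_indexes):
        rv[i - 1] = items[item_number]
    return rv


# The decoded non-default values are a numpy array...
items = np.array([1.5, 2.5], dtype=np.float64)
result = apply_sparse_list(items, sparse_indexes=[2, 5], items_total=6, default=0.0)
# ...but the reconstructed column is a Python list.
print(type(result).__name__)  # list
```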

**To Reproduce**
Read any sparse column with `use_numpy=True` and `columnar=True`.

**Expected behavior**
A numpy array is returned, as for regular (non-sparse) columns.

**Versions**
After commit 4de5b2b21e4c345b4a30cb2a2e2766ef20c291f3

**Suggested implementation**
Add a separate `NumpyColumnSparseSerialization` that:
1. stores the sparse indexes in a numpy integer array;
2. in `apply_sparse`, simply creates a buffer with `np.full` and assigns `buf[index] = items`;
3. ideally implements `read_sparse` in compiled code.
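A minimal sketch of steps 1 and 2, written as a free function rather than a serialization class. `apply_sparse_numpy` is hypothetical and assumes the same 1-based `sparse_indexes` / `items_total - 1` convention as the current code. The non-default values are scattered into an `np.full` buffer with one fancy-indexing assignment, so the result keeps the dtype of `items`:

```python
import numpy as np


def apply_sparse_numpy(items, sparse_indexes, items_total, default):
    """Vectorized sparse reconstruction (hypothetical helper, not the driver's API).

    Assumes sparse_indexes are 1-based and the output has items_total - 1 rows,
    matching the existing list-based apply_sparse.
    """
    items = np.asarray(items)
    # Step 1: keep the sparse positions as a numpy integer array (0-based).
    positions = np.asarray(sparse_indexes, dtype=np.int64) - 1
    # Step 2: fill a buffer with the default value, then scatter the
    # non-default items into place with a single assignment.
    rv = np.full(items_total - 1, default, dtype=items.dtype)
    rv[positions] = items
    return rv


col = apply_sparse_numpy(np.array([1.5, 2.5]), [2, 5], 6, 0.0)
print(type(col).__name__, col.dtype)  # ndarray float64
```

Because the scatter is a single vectorized assignment, the per-item Python loop of the list version disappears. Step 3 (compiling `read_sparse`) is not covered by this sketch.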

Alternatively, introduce a simple ad hoc fix:
```python
    def apply_sparse(self, items):
        default = self.column.null_value

        # Fast path for non-nullable numpy columns: build the result as an
        # ndarray so that use_numpy/columnar results keep their type.
        if hasattr(self.column, "dtype") and not self.column.nullable:
            import numpy as np
            rv = np.full((self.items_total - 1,), dtype=items.dtype, fill_value=default)
            # sparse_indexes are 1-based positions of the non-default values.
            rv[np.array(self.sparse_indexes) - 1] = items
            return rv

        # Fallback: the original list-based reconstruction.
        if self.column.after_read_items:
            default = self.column.after_read_items([default])[0]

        rv = [default] * (self.items_total - 1)
        for item_number, i in enumerate(self.sparse_indexes):
            rv[i - 1] = items[item_number]

        return rv
```

