
SY-3272: Add Boolean Data Type Channel Support #2229

Open
emilbon99 wants to merge 49 commits into rc from sy-3272-add-boolean-data-type-channel


@emilbon99 emilbon99 commented Apr 16, 2026

Issue Pull Request

Linear Issue

SY-3272

Description

Adds a first-class BoolT data type across the full stack per RFC 0036.

Three distinct representations per RFC 0036 §3.0:

  • In memory: byte-packed, canonical {0x00, 0x01}. The telem.Series density invariant is preserved; iterators, writers, Cesium readers, and client TypedArray views treat a bool sample identically to a uint8 sample.
  • On the wire: bit-packed, LSB-first. The Freighter frame codec packs 8 samples per byte on send and unpacks back to a byte-packed Series on receive, reducing digital traffic by up to 8x before any further compression.
  • On disk: byte-packed today, identical to the in-memory form. A future Cesium storage codec can compress independently.
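
To make the size difference between the byte-packed and bit-packed forms concrete, here is a minimal sketch of the arithmetic. The helper names `memory_size` and `wire_size` are hypothetical and are not taken from the Synnax codebase.

```python
def memory_size(sample_count: int) -> int:
    """Byte-packed form: one byte per boolean sample."""
    return sample_count


def wire_size(sample_count: int) -> int:
    """Bit-packed form: ceil(sample_count / 8) bytes, using integer math."""
    return (sample_count + 7) // 8


# Sample counts around partial-byte boundaries, plus one larger series.
for n in (1, 7, 8, 9, 17, 1000):
    print(f"{n} samples: {memory_size(n)} bytes in memory, {wire_size(n)} on the wire")
```

The full 8x reduction is only reached when the sample count is a multiple of 8; a 9-sample series still needs 2 bytes on the wire.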

Write paths normalize any nonzero source byte to 0x01 at the client boundary. Cesium storage required zero changes: BoolT falls through the fixed-density path.
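
The normalization rule above can be sketched in a few lines. This is an illustration of the rule only, assuming the source is a raw byte buffer; `normalize_bool_bytes` is a hypothetical name, not a function from any of the clients.

```python
def normalize_bool_bytes(src: bytes) -> bytes:
    """Map every nonzero byte to 0x01 and leave 0x00 untouched."""
    return bytes(1 if b else 0 for b in src)


raw = bytes([0x00, 0x01, 0x02, 0xFF])
print(normalize_bool_bytes(raw).hex())  # 00010101
```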

What landed per language

| Language | Scope |
| --- | --- |
| Go | BoolT in x/go/telem/data_type.go, extended types.Sized + FixedSample, NewSeries[bool]/UnmarshalSeries[bool]/NewSeriesFromAny with bool normalization, bit-packed frame codec in core/pkg/distribution/framer/codec/codec.go |
| TypeScript | DataType.BOOLEAN with Uint8Array backing, atBoolean accessor, convertDataType handles BOOLEAN, bit-packed codec in client/ts/src/framer/codec.ts |
| Python | DataType.BOOL mapped to np.bool_, list[bool] and np.bool_ array inference, _FROM_NUMPY[np.bool_] flipped from UINT8, bit-packed codec in client/py/synnax/framer/codec.py |
| C++ | BOOL_T in details + public namespace, bool → BOOL_T via TYPE_INDEXES, cast normalization via std::visit, bit-packed codec in client/cpp/framer/codec.cpp |

Wire format

A series of N samples occupies ceil(N / 8) bytes, packed LSB-first within each byte; the existing per-series sample_count header tells the decoder how many bits to recover. The reference vector [1,0,1,1,0,0,0,1,1] encodes to [0x8D, 0x01] and is pinned in Go (codec_internal_test.go) and Python (test_frame_codec.py).
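
A minimal sketch of LSB-first packing, checked against the reference vector above. It assumes the input is already byte-packed with one sample per byte; `pack_bool_bits` here is a standalone illustration and not the code in client/py/synnax/framer/codec.py.

```python
def pack_bool_bits(samples: bytes) -> bytes:
    """Pack one-byte-per-sample booleans into bits, LSB-first within each byte."""
    out = bytearray((len(samples) + 7) // 8)  # ceil(N / 8) bytes
    for i, sample in enumerate(samples):
        if sample:
            # Sample i lands in byte i // 8, at bit position i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


reference = bytes([1, 0, 1, 1, 0, 0, 0, 1, 1])
print(pack_bool_bits(reference).hex())  # 8d01
```

The first eight samples set bits 0, 2, 3, and 7 of the first byte (1 + 4 + 8 + 128 = 0x8D), and the ninth sample sets bit 0 of the second byte.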

Tests

  • Unit: per-language data type tests, series construction tests, cast tests, codec round-trip tests with sample counts on partial-byte boundaries (1, 7, 8, 9, 17, ...).
  • Cross-language: reference vector test pinned in Go and Python.
  • End-to-end: Python (test_channel.py, test_frame_writer.py) and TypeScript (channel.spec.ts, writer.spec.ts, streamer.spec.ts) verify create + write + read + stream through a live server.

Out of scope (per RFC §5)

  • Driver migration (Modbus coils, LabJack DIO, NI digital lines currently Uint8T)
  • Arc type system integration
  • Console/Schematic rendering
  • Generalized per-type wire codecs (gorilla, delta, RLE)

Each warrants its own RFC.

Basic Readiness

  • I have performed a self-review of my code.
  • I have added relevant, automated tests to cover the changes.
  • I have updated documentation to reflect the changes.

Greptile Summary

This PR adds first-class bool/BOOL_T/DataType.BOOLEAN support across Go, TypeScript, Python, and C++. The codec design is well thought out: byte-packed in memory (0x00/0x01), bit-packed LSB-first on the wire (ceil(N/8) bytes), and byte-packed on disk, with the equalLens compression flag correctly using sample count throughout all four codec implementations.

The Go, TypeScript, and Python layers look correct. The C++ telem.h / series.h layer is incomplete: Series::at(const int&) returning SampleValue, operator<<, and write_casted(void*, size_t, DataType) all throw at runtime for BOOL_T, and avg<T>() also throws instead of summing uint8 values.

Confidence Score: 3/5

Safe to merge if C++ is not yet used in production paths that call Series::at(SampleValue), operator<<, or write_casted on boolean series; the three P1 gaps in series.h will throw at runtime when hit.

The Go server codec, TypeScript client, and Python client implementations are all correct and consistent. The C++ series.h polymorphic API has three runtime-throwing gaps for BOOL_T (at(), operator<<, write_casted) that must be fixed before boolean channels are used through any C++ code path that touches those methods.

x/cpp/telem/series.h — the polymorphic SampleValue dispatch methods (at, operator<<, write_casted, avg) all need BOOL_T branches added.

Important Files Changed

| Filename | Overview |
| --- | --- |
| x/cpp/telem/series.h | BOOL_T added to constructors and DataType::cast(), but at(SampleValue), operator<<, write_casted, and avg() all throw at runtime for BOOL_T; these are P1/P2 gaps that will crash callers using the polymorphic API. |
| core/pkg/distribution/framer/codec/codec.go | Adds wireSize, packBoolBits, and unpackBoolBits helpers; encoder/decoder correctly use sample count for BoolT in all three compression paths. |
| client/ts/src/framer/codec.ts | TypeScript codec correctly handles BOOLEAN: encodes sample count as length field, bit-packed wire bytes, and unpacks on decode. Pack/unpack helpers match Go reference vector. |
| client/py/synnax/framer/codec.py | Python codec correctly implements BOOL wire protocol; all three compression modes handle BOOL consistently. |
| client/cpp/framer/codec.cpp | C++ codec correctly computes wire byte length, packs/unpacks bits, and calls s.write with size_=0, which is safe because the series is freshly constructed. |
| x/go/telem/data_type.go | Adds BoolT with density Bit8; InferDataType[bool] correct; IsVariable correctly excludes BoolT. |
| x/go/telem/series_factory.go | Extends FixedSample to include bool; marshalFixed[bool] uses unsafe.CastSlice which is safe; castToBool covers all numeric variants. |
| x/ts/src/telem/telem.ts | Adds DataType.BOOLEAN with Uint8Array and Density.BIT8; isNumeric, canSafelyCastTo, and convertDataType handle BOOLEAN correctly. |
| core/pkg/distribution/framer/codec/codec_internal_test.go | Tests pack/unpack round-trips for key sample counts and pins the reference vector [1,0,1,1,0,0,0,1,1] → [0x8D, 0x01]. |

Sequence Diagram

sequenceDiagram
    participant W as Writer (any language)
    participant ENC as Encoder
    participant WIRE as Network Wire
    participant DEC as Decoder
    participant R as Reader (any language)

    W->>ENC: Series{BoolT, data: [0x01,0x00,0x01,0x01,...]}
    Note over ENC: byte-packed memory (1 byte/sample)
    ENC->>ENC: packBoolBits(src) → ceil(N/8) bytes (LSB-first)
    ENC->>ENC: length field = sampleCount (not byte count)
    ENC->>WIRE: flags | seqNum | sampleCount | packed_bits

    WIRE->>DEC: flags | seqNum | sampleCount | packed_bits
    DEC->>DEC: wireBytes = ceil(sampleCount/8)
    DEC->>DEC: unpackBoolBits(src, sampleCount) → [0x01,0x00,...]
    Note over DEC: byte-packed memory restored
    DEC->>R: Series{BoolT, data: [0x01,0x00,0x01,0x01,...]}
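
The decoder side of the diagram can be sketched as follows, to show why the length field carries the sample count rather than the byte count. This is an assumption-laden illustration: `unpack_bool_bits` is a standalone Python function written for this example, and the surrounding frame header (flags, seqNum) is not modeled.

```python
def unpack_bool_bits(packed: bytes, sample_count: int) -> bytes:
    """Expand LSB-first packed bits back to one byte (0x00 or 0x01) per sample."""
    needed = (sample_count + 7) // 8  # wireBytes = ceil(sampleCount / 8)
    if len(packed) < needed:
        raise ValueError(f"need {needed} bytes for {sample_count} samples, got {len(packed)}")
    return bytes((packed[i // 8] >> (i % 8)) & 1 for i in range(sample_count))


wire = bytes([0x8D, 0x01])
print(list(unpack_bool_bits(wire, 9)))   # [1, 0, 1, 1, 0, 0, 0, 1, 1]
print(len(unpack_bool_bits(wire, 16)))   # 16
```

The same two wire bytes are a valid encoding of anywhere from 9 to 16 samples, so a byte count alone could not tell the decoder where the real data ends and the padding bits begin.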

Comments Outside Diff (5)

  1. x/cpp/telem/series.h, line 1040-1055 (link)

    P1 BOOL_T missing from polymorphic at() dispatch

    SampleValue at(const int&) has no branch for BOOL_T, so any code that calls the type-erased accessor on a boolean Series — e.g., the Arc evaluator, control-flow tasks, or anything that iterates samples as SampleValue — will throw "unsupported data type for at: bool" at runtime.

    The fix is to add a uint8_t branch immediately before the throw:

  2. x/cpp/telem/series.h, line 1068-1096 (link)

    P1 operator<< silently emits "unknown data type" for BOOL_T

    BOOL_T is not covered in the chain of else if (dt == ...) branches inside operator<<. Printing any boolean Series — common in logging and debugging — produces Series(type: bool, size: N, cap: N, data: [unknown data type]) instead of the actual values. Add a branch before the final else:

  3. x/cpp/telem/series.h, line 1154-1179 (link)

    P1 write_casted throws on BOOL_T source type

    write_casted(const void*, size_t, const DataType&) has no branch for BOOL_T, so casting from a boolean source array throws "Unsupported data type for casting: bool". This is reachable whenever the Arc evaluator or any driver pipeline casts heterogeneous series containing boolean channels. Add a branch before the throw:

  4. x/cpp/telem/series.h, line 1637-1646 (link)

    P2 avg<T>() throws on BOOL_T

    avg() has no branch for BOOL_T and falls through to throw std::runtime_error("Unsupported data type for average: bool"). Computing the mean of a boolean array (fraction of true values) is a valid and common operation. Consider adding it alongside the UINT8_T branch:
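
As a worked illustration of the arithmetic only (this is Python, not the suggested C++ change to series.h): for byte-packed booleans in canonical {0x00, 0x01} form, summing the bytes counts the true samples, so the mean is the fraction of true values. `bool_mean` is a hypothetical name.

```python
def bool_mean(samples: bytes) -> float:
    """Mean of canonical 0x00/0x01 samples, i.e. the fraction that are true."""
    if not samples:
        raise ValueError("cannot average an empty series")
    return sum(samples) / len(samples)


print(bool_mean(bytes([1, 0, 1, 1])))  # 0.75
```

This only holds because write paths normalize nonzero bytes to 0x01; an unnormalized 0xFF would skew the sum.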

  5. x/cpp/telem/series.h, line 928-939 (link)

    P2 write(const NumericType*, size_t) writes to buffer start instead of appending

    memcpy(this->data_.get(), d, …) always copies to offset 0 rather than to data_.get() + size_ * density. This is a pre-existing bug in the non-bool path, and this PR's codec decode path relies on it staying masked: s.write(unpacked.data(), local_data_len_or_byte_cap) is only called while size_=0, so writing at offset 0 happens to be correct. Any subsequent write call on the same freshly decoded series would silently overwrite the first batch. The current codec code does not trigger this, but it could bite future callers.
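
The difference between the two behaviors can be modeled with a toy buffer. Everything here is hypothetical: `ToySeries` is not telem::Series, and it assumes the buggy write still advances the size counter, which the review comment does not state.

```python
class ToySeries:
    """Toy fixed-density buffer used only to illustrate the offset bug."""

    def __init__(self, capacity: int, density: int = 1) -> None:
        self.data = bytearray(capacity * density)
        self.density = density
        self.size = 0  # number of samples written so far

    def _check(self, offset: int, samples: bytes) -> None:
        if offset + len(samples) > len(self.data):
            raise ValueError("write exceeds capacity")

    def write_at_start(self, samples: bytes) -> None:
        """Mimics the reported behavior: always copy to offset 0."""
        self._check(0, samples)
        self.data[0:len(samples)] = samples
        self.size += len(samples) // self.density

    def write_append(self, samples: bytes) -> None:
        """Copy to offset size * density so earlier samples are preserved."""
        offset = self.size * self.density
        self._check(offset, samples)
        self.data[offset:offset + len(samples)] = samples
        self.size += len(samples) // self.density


buggy, fixed = ToySeries(4), ToySeries(4)
for batch in (b"\x01\x01", b"\x00\x00"):
    buggy.write_at_start(batch)
    fixed.write_append(batch)
print(bytes(buggy.data).hex())  # 00000000 (first batch overwritten)
print(bytes(fixed.data).hex())  # 01010000
```

With a single write on an empty buffer the two methods produce identical results, which is why the decode path does not expose the problem.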

Reviews (1): Last reviewed commit: "fix ts codec type errors and simplify bo..." | Re-trigger Greptile

…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
emilbon99 and others added 17 commits April 13, 2026 15:11
Widens CrudeSeries type alias to accept list[int] and list[TimeStamp]
(runtime already handles these) and adjusts tests to satisfy strict
mypy without type: ignores or Any annotations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eries' of https://github.com/synnaxlabs/synnax into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
variable.Writer.Write returned a post-write alignment while
fixed.Writer.Write returned pre-write, so writer_stream.go's
sampleCount = SampleIndex + series.Len() produced new+delta instead of
new for the variable branch. This corrupted resolveCommitEnd's Stamp
call whenever a writer committed a variable channel without the index
in its frame.

Align variable.Writer.Write with fixed.Writer.Write by deferring
scanOffsets until after the pre-write alignment is captured, and add a
regression test covering commits on an index-less variable writer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…0-support-for-persisted-variable-length-data-types-in-cesium

# Conflicts:
#	cesium/writer_stream.go
#	x/py/tests/test_series.py

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 80.83832% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.37%. Comparing base (1d488f3) to head (44a7836).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| x/go/telem/series_factory.go | 42.85% | 24 Missing ⚠️ |
| core/pkg/distribution/framer/codec/codec.go | 87.09% | 2 Missing and 2 partials ⚠️ |
| x/go/telem/series.go | 0.00% | 2 Missing ⚠️ |
| x/ts/src/telem/series.ts | 85.71% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@                                          Coverage Diff                                           @@
##           sy-4060-support-for-persisted-variable-length-data-types-in-cesium    #2229      +/-   ##
======================================================================================================
+ Coverage                                                               62.95%   63.37%   +0.41%     
======================================================================================================
  Files                                                                    2100     2106       +6     
  Lines                                                                  105157   106361    +1204     
  Branches                                                                 8242     8320      +78     
======================================================================================================
+ Hits                                                                    66201    67402    +1201     
+ Misses                                                                  33075    33055      -20     
- Partials                                                                 5881     5904      +23     
Flag Coverage Δ
alamos-go 55.25% <ø> (ø)
alamos-ts 48.87% <ø> (ø)
arc-go 76.00% <ø> (+0.33%) ⬆️
aspen 67.54% <ø> (-1.18%) ⬇️
cesium 82.51% <ø> (+0.02%) ⬆️
client-py 85.96% <100.00%> (+0.04%) ⬆️
client-ts 90.28% <100.00%> (+0.10%) ⬆️
console 20.02% <ø> (ø)
core 69.56% <87.09%> (+0.03%) ⬆️
drift 39.05% <ø> (ø)
freighter-go 63.00% <ø> (+0.16%) ⬆️
freighter-integration 1.51% <ø> (ø)
freighter-py 79.96% <ø> (ø)
freighter-ts 73.87% <ø> (ø)
oracle 59.53% <ø> (-0.01%) ⬇️
pluto 55.16% <ø> (+0.05%) ⬆️
x-go 81.53% <44.68%> (+2.19%) ⬆️
x-ts 88.88% <88.88%> (+0.04%) ⬆️


Base automatically changed from sy-4060-support-for-persisted-variable-length-data-types-in-cesium to rc April 17, 2026 15:32