SY-3272: Add Boolean Data Type Channel Support#2229
Open
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
Widens CrudeSeries type alias to accept list[int] and list[TimeStamp] (runtime already handles these) and adjusts tests to satisfy strict mypy without type: ignores or Any annotations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eries' of https://github.com/synnaxlabs/synnax into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
variable.Writer.Write returned a post-write alignment while fixed.Writer.Write returned pre-write, so writer_stream.go's sampleCount = SampleIndex + series.Len() produced new+delta instead of new for the variable branch. This corrupted resolveCommitEnd's Stamp call whenever a writer committed a variable channel without the index in its frame.
Align variable.Writer.Write with fixed.Writer.Write by deferring scanOffsets until after the pre-write alignment is captured, and add a regression test covering commits on an index-less variable writer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…0-support-for-persisted-variable-length-data-types-in-cesium # Conflicts: # cesium/writer_stream.go # x/py/tests/test_series.py
Codecov Report
❌ Patch coverage is …
Coverage Diff

| | sy-4060-support-for-persisted-variable-length-data-types-in-cesium | #2229 | +/- |
| --- | --- | --- | --- |
| Coverage | 62.95% | 63.37% | +0.41% |
| Files | 2100 | 2106 | +6 |
| Lines | 105157 | 106361 | +1204 |
| Branches | 8242 | 8320 | +78 |
| Hits | 66201 | 67402 | +1201 |
| Misses | 33075 | 33055 | -20 |
| Partials | 5881 | 5904 | +23 |
Base automatically changed from sy-4060-support-for-persisted-variable-length-data-types-in-cesium to rc April 17, 2026 15:32
Issue Pull Request
Linear Issue
SY-3272
Description
Adds a first-class `BoolT` data type across the full stack per RFC 0036.

Three distinct representations per RFC 0036 §3.0:

- In memory: byte-packed, one byte per sample with values in `{0x00, 0x01}`. The `telem.Series` density invariant is preserved; iterators, writers, Cesium readers, and client `TypedArray` views treat a bool sample identically to a uint8 sample.
- On the wire: bit-packed, then unpacked back into a byte-packed `Series` on receive, dropping digital traffic 8x before any further compression.
- On disk: byte-packed.

Write paths normalize any nonzero source byte to `0x01` at the client boundary. Cesium storage required zero changes: `BoolT` falls through the fixed-density path.

What landed per language
| Language | Changes |
| --- | --- |
| Go | `BoolT` in `x/go/telem/data_type.go`, extended `types.Sized` + `FixedSample`, `NewSeries[bool]` / `UnmarshalSeries[bool]` / `NewSeriesFromAny` with bool normalization, bit-packed frame codec in `core/pkg/distribution/framer/codec/codec.go` |
| TypeScript | `DataType.BOOLEAN` with `Uint8Array` backing, `atBoolean` accessor, `convertDataType` handles BOOLEAN, bit-packed codec in `client/ts/src/framer/codec.ts` |
| Python | `DataType.BOOL` mapped to `np.bool_`, `list[bool]` and `np.bool_` array inference, `_FROM_NUMPY[np.bool_]` flipped from `UINT8`, bit-packed codec in `client/py/synnax/framer/codec.py` |
| C++ | `BOOL_T` in `details` + public namespace, `bool → BOOL_T` via `TYPE_INDEXES`, cast normalization via `std::visit`, bit-packed codec in `client/cpp/framer/codec.cpp` |

Wire format
A bool series of N samples is packed into `ceil(N / 8)` bytes, LSB-first within each byte, with the existing per-series `sample_count` header telling the decoder how many bits to recover. Reference vector `[1,0,1,1,0,0,0,1,1]` encodes to `[0x8D, 0x01]`, pinned in Go (`codec_internal_test.go`) and Python (`test_frame_codec.py`).

Tests
Tests in Python (`test_channel.py`, `test_frame_writer.py`) and TypeScript (`channel.spec.ts`, `writer.spec.ts`, `streamer.spec.ts`) verify create + write + read + stream through a live server.

Out of scope (per RFC §5)

- … `Uint8T`)

Each warrants its own RFC.
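The normalization rule and the wire format described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the PR's codec: the function names `normalize_bools`, `pack_bool_bits`, and `unpack_bool_bits` are hypothetical stand-ins, and the only facts taken from the description are the nonzero-to-`0x01` normalization, the `ceil(N / 8)` LSB-first packing, and the reference vector.

```python
import math


def normalize_bools(src: bytes) -> bytes:
    """Map every nonzero source byte to 0x01 so memory holds only {0x00, 0x01}."""
    return bytes(1 if b else 0 for b in src)


def pack_bool_bits(src: bytes) -> bytes:
    """Pack one-byte-per-sample bools into ceil(N / 8) bytes, LSB-first."""
    out = bytearray(math.ceil(len(src) / 8))
    for i, b in enumerate(src):
        if b:
            # Sample i lands in byte i // 8 at bit position i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bool_bits(packed: bytes, sample_count: int) -> bytes:
    """Recover sample_count byte-packed bools from LSB-first packed bits."""
    return bytes((packed[i // 8] >> (i % 8)) & 1 for i in range(sample_count))


# Nonzero inputs such as 7 and 255 normalize to 1, giving the reference vector.
samples = normalize_bools(bytes([1, 0, 7, 1, 0, 0, 0, 255, 1]))
packed = pack_bool_bits(samples)
print(packed.hex())  # 8d01
```

The sample count has to travel alongside the packed bytes: nine samples occupy two bytes, and without the count a decoder could not tell them apart from sixteen.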
Basic Readiness
Greptile Summary
This PR adds first-class `bool` / `BOOL_T` / `DataType.BOOLEAN` support across Go, TypeScript, Python, and C++. The codec design is well thought out: byte-packed in memory (`0x00`/`0x01`), bit-packed LSB-first on the wire (`ceil(N/8)` bytes), and byte-packed on disk, with the `equalLens` compression flag correctly using sample count throughout all four codec implementations.

The Go, TypeScript, and Python layers look correct. The C++ `telem.h`/`series.h` layer is incomplete: `Series::at(const int&)` returning `SampleValue`, `operator<<`, and `write_casted(void*, size_t, DataType)` all throw at runtime for `BOOL_T`, and `avg<T>()` also throws instead of summing uint8 values.

Confidence Score: 3/5

Safe to merge if C++ is not yet used in production paths that call Series::at(SampleValue), operator<<, or write_casted on boolean series; the three P1 gaps in series.h will throw at runtime when hit.

The Go server codec, TypeScript client, and Python client implementations are all correct and consistent. The C++ series.h polymorphic API has three runtime-throwing gaps for BOOL_T (at(), operator<<, write_casted) that must be fixed before boolean channels are used through any C++ code path that touches those methods.

x/cpp/telem/series.h: the polymorphic SampleValue dispatch methods (at, operator<<, write_casted, avg) all need BOOL_T branches added.
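To make the "length field is the sample count, not the byte count" point concrete, here is a minimal Python sketch of a framed bool series. The header layout is an assumption made purely for illustration (a lone 4-byte little-endian sample count); the real frame codec also carries flags and a sequence number, and the names `wire_len`, `encode_bool_series`, and `decode_bool_series` are hypothetical.

```python
import struct


def wire_len(sample_count: int) -> int:
    """Number of packed bytes needed for sample_count bools: ceil(n / 8)."""
    return (sample_count + 7) // 8


def encode_bool_series(samples: bytes) -> bytes:
    """Prefix the bit-packed payload with the sample count (assumed header)."""
    packed = bytearray(wire_len(len(samples)))
    for i, b in enumerate(samples):
        if b:
            packed[i // 8] |= 1 << (i % 8)
    # The length field records samples, not bytes, so the decoder knows how
    # many bits of the final byte are meaningful.
    return struct.pack("<I", len(samples)) + bytes(packed)


def decode_bool_series(buf: bytes) -> bytes:
    """Read the sample count, derive the wire byte count, and unpack."""
    (sample_count,) = struct.unpack_from("<I", buf, 0)
    body = buf[4 : 4 + wire_len(sample_count)]
    if len(body) != wire_len(sample_count):
        raise ValueError("truncated bool payload")
    return bytes((body[i // 8] >> (i % 8)) & 1 for i in range(sample_count))


original = bytes([1, 0, 1, 1, 0, 0, 0, 1, 1])
frame = encode_bool_series(original)
restored = decode_bool_series(frame)
```

If the header stored the byte count instead, the nine-sample series here would decode as sixteen samples, which is why every implementation has to agree on the unit of the length field.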
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant W as Writer (any language)
    participant ENC as Encoder
    participant WIRE as Network Wire
    participant DEC as Decoder
    participant R as Reader (any language)
    W->>ENC: Series{BoolT, data: [0x01,0x00,0x01,0x01,...]}
    Note over ENC: byte-packed memory (1 byte/sample)
    ENC->>ENC: packBoolBits(src) → ceil(N/8) bytes (LSB-first)
    ENC->>ENC: length field = sampleCount (not byte count)
    ENC->>WIRE: flags | seqNum | sampleCount | packed_bits
    WIRE->>DEC: flags | seqNum | sampleCount | packed_bits
    DEC->>DEC: wireBytes = ceil(sampleCount/8)
    DEC->>DEC: unpackBoolBits(src, sampleCount) → [0x01,0x00,...]
    Note over DEC: byte-packed memory restored
    DEC->>R: Series{BoolT, data: [0x01,0x00,0x01,0x01,...]}

Comments Outside Diff (5)
x/cpp/telem/series.h, line 1040-1055
`BOOL_T` missing from polymorphic `at()` dispatch

`SampleValue at(const int&)` has no branch for `BOOL_T`, so any code that calls the type-erased accessor on a boolean `Series` (e.g., the Arc evaluator, control-flow tasks, or anything that iterates samples as `SampleValue`) will throw `"unsupported data type for at: bool"` at runtime. The fix is to add a `uint8_t` branch immediately before the throw.

x/cpp/telem/series.h, line 1068-1096
`operator<<` silently emits "unknown data type" for `BOOL_T`

`BOOL_T` is not covered in the chain of `else if (dt == ...)` branches inside `operator<<`. Printing any boolean `Series`, which is common in logging and debugging, produces `Series(type: bool, size: N, cap: N, data: [unknown data type])` instead of the actual values. Add a branch before the final `else`.

x/cpp/telem/series.h, line 1154-1179
`write_casted` throws on `BOOL_T` source type

`write_casted(const void*, size_t, const DataType&)` has no branch for `BOOL_T`, so casting from a boolean source array throws `"Unsupported data type for casting: bool"`. This is reachable whenever the Arc evaluator or any driver pipeline casts heterogeneous series containing boolean channels. Add a branch before the throw.

x/cpp/telem/series.h, line 1637-1646
`avg<T>()` throws on `BOOL_T`

`avg()` has no branch for `BOOL_T` and falls through to `throw std::runtime_error("Unsupported data type for average: bool")`. Computing the mean of a boolean array (fraction of `true` values) is a valid and common operation. Consider adding it alongside the `UINT8_T` branch.

x/cpp/telem/series.h, line 928-939
`write(const NumericType*, size_t)` writes to buffer start instead of appending

`memcpy(this->data_.get(), d, …)` always copies to offset 0 rather than `data_.get() + size_ * density`. This is a pre-existing bug in the non-bool path, but it becomes observable in this PR's codec decode path because `s.write(unpacked.data(), local_data_len_or_byte_cap)` happens to be called when `size_=0`, masking the bug. Any subsequent `write` call on the same freshly-decoded series would silently overwrite the first batch. This isn't triggered by current codec code, but it's worth noting as it could bite future callers.

Reviews (1): Last reviewed commit: "fix ts codec type errors and simplify bo..."
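The last review comment (a write that copies to offset 0 instead of appending) is easier to see in a small model. The sketch below is Python rather than C++ and does not reproduce `series.h`; `FixedBuffer` and both method names are hypothetical, written only to contrast the two behaviors the comment describes.

```python
class FixedBuffer:
    """Hypothetical stand-in for a fixed-capacity series buffer."""

    def __init__(self, capacity: int, density: int = 1) -> None:
        self.density = density  # bytes per sample
        self.data = bytearray(capacity * density)
        self.size = 0  # samples written so far

    def _fit(self, src: bytes) -> int:
        """Number of samples from src that still fit in the buffer."""
        capacity = len(self.data) // self.density
        return min(len(src) // self.density, capacity - self.size)

    def write_overwriting(self, src: bytes) -> int:
        """Models the reported bug: always copies to offset 0."""
        n = self._fit(src)
        self.data[0 : n * self.density] = src[: n * self.density]
        self.size += n
        return n

    def write_appending(self, src: bytes) -> int:
        """Models the suggested behavior: copies to size * density."""
        n = self._fit(src)
        start = self.size * self.density
        self.data[start : start + n * self.density] = src[: n * self.density]
        self.size += n
        return n


buggy = FixedBuffer(4)
buggy.write_overwriting(b"\x01\x01")
buggy.write_overwriting(b"\x00\x00")  # clobbers the first batch

fixed = FixedBuffer(4)
fixed.write_appending(b"\x01\x01")
fixed.write_appending(b"\x00\x00")  # lands after the first batch
```

Both variants behave identically on the first write, when `size` is 0, which matches the comment's observation that a single write into a freshly decoded series hides the problem.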