In DataFusion 53, the `CoalesceBatchesExec` node was deprecated. The alternative is to just let each node batch things at will.
We still rely on this node for `batch_coalescing_below_network_boundaries`, which adds some coalescing right below network boundaries so that we send bigger batches over the wire.
Removing this should show some performance improvements, as batch coalescing relies on data copies for creating the concatenated batches.
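To make the copy cost concrete, here is a minimal sketch of what coalescing implies. It is not DataFusion's implementation: `Batch` and `coalesce` are hypothetical names, and a plain `Vec<i64>` stands in for an Arrow record batch. The point is only that building a larger contiguous batch means every input row is written into a new buffer.

```rust
/// Stand-in for a record batch: a single column of values.
/// (Hypothetical type for illustration; real batches are Arrow `RecordBatch`es.)
type Batch = Vec<i64>;

/// Coalesces small batches into batches of at least `target_rows` rows.
/// Returns the coalesced batches and the number of rows that were copied.
fn coalesce(input: Vec<Batch>, target_rows: usize) -> (Vec<Batch>, usize) {
    let mut output = Vec::new();
    let mut buffer: Batch = Vec::with_capacity(target_rows);
    let mut rows_copied = 0;

    for batch in input {
        // Each incoming row is copied into the larger buffer.
        rows_copied += batch.len();
        buffer.extend_from_slice(&batch);

        // Emit the buffer once it reaches the target size.
        if buffer.len() >= target_rows {
            output.push(std::mem::replace(
                &mut buffer,
                Vec::with_capacity(target_rows),
            ));
        }
    }

    // Flush whatever is left once the input is exhausted.
    if !buffer.is_empty() {
        output.push(buffer);
    }

    (output, rows_copied)
}
```

In this sketch the number of copied rows always equals the number of input rows, which is the overhead that goes away if nodes produce appropriately sized batches themselves.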
I'm not sure what the alternative to our `batch_coalescing_below_network_boundaries` should be, though. I'm not sure we can just rely on people increasing the vanilla `datafusion.execution.batch_size` setting, as queries that don't end up getting distributed might suffer some degradation if that's set widely.
Reference: `datafusion-distributed/src/distributed_planner/batch_coalescing_below_network_boundaries.rs`, line 13 in 5020a4b.