0.8.2 Nov 14, 2016: New HLL-based UniqueCountMap sketch; Bug Fixes
New HLL-based UniqueCountMap sketch
This is a new sketch in the HLL package that addresses real-time unique counting of identifiers associated with millions of keys. Please refer to the javadocs for the UniqueCountMap sketch class in the sketches/hll package.
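To make the problem concrete, here is a minimal exact baseline for per-key unique counting. It is not the UniqueCountMap implementation or its API: the class `ExactUniqueCountMap` and its methods are hypothetical names used only for illustration, and it stores every identifier in a `HashSet`, which is exactly the memory cost that an HLL-based sketch is meant to avoid when there are millions of keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical exact baseline; NOT the library's UniqueCountMap.
final class ExactUniqueCountMap {
    // One set of seen identifiers per key. Memory grows with every new identifier.
    private final Map<String, Set<String>> idsByKey = new HashMap<>();

    // Records that `identifier` was seen with `key` and returns
    // the number of distinct identifiers seen so far for that key.
    long update(String key, String identifier) {
        Set<String> ids = idsByKey.computeIfAbsent(key, k -> new HashSet<>());
        ids.add(identifier);
        return ids.size();
    }

    // Returns the number of distinct identifiers for `key`, or 0 if the key is unknown.
    long getCount(String key) {
        Set<String> ids = idsByKey.get(key);
        return ids == null ? 0 : ids.size();
    }

    public static void main(String[] args) {
        ExactUniqueCountMap map = new ExactUniqueCountMap();
        map.update("10.0.0.1", "alice");
        map.update("10.0.0.1", "bob");
        map.update("10.0.0.1", "alice"); // duplicate, does not change the count
        map.update("10.0.0.2", "carol");
        System.out.println(map.getCount("10.0.0.1"));
        System.out.println(map.getCount("10.0.0.2"));
    }
}
```

A sketch-based map trades the exact sets above for compact approximate summaries, so the count per key becomes an estimate rather than an exact value.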
Fixed SerDe Compatibility With Shaded, Relocated sketches-core.jar for Pig and Hive
The previous scheme of creating a hash ID from the SerDe class names to prevent accidental deserialization with the wrong SerDe class was fragile and failed when the core classes were shaded and relocated for the Pig and Hive jars. In our attempt to protect users from themselves, we had inadvertently created a worse problem.
This has now been fixed, but doing so required abandoning the SerDe ID concept entirely. The fix is backward compatible with all earlier releases of the SerDe classes.
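The fragility of a name-derived ID can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `SerDeIdDemo`, the package names, and the use of `String.hashCode()` as the hash are hypothetical, since these notes do not state how the original ID was computed. The point is only that any ID derived from a fully qualified class name changes when shading moves the class to a different package.

```java
// Hypothetical illustration; the actual SerDe ID computation is not described in these notes.
final class SerDeIdDemo {
    // Assumed scheme: derive an ID from the fully qualified class name.
    static int idFromClassName(String fullyQualifiedName) {
        return fullyQualifiedName.hashCode();
    }

    public static void main(String[] args) {
        // Hypothetical class name as written by the unshaded core jar.
        String original = "com.example.sketches.MySerDe";
        // The same class after a shading step moved it under a new package prefix.
        String relocated = "shaded.com.example.sketches.MySerDe";

        int writtenId = idFromClassName(original);
        int readerId = idFromClassName(relocated);

        // A reader that compares IDs would reject data written by the very same SerDe logic.
        System.out.println("IDs match: " + (writtenId == readerId));
    }
}
```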
Upgraded Reservoir Sampling to allow for full integer precision values of K
This allows the sketch size specification, K, for the Reservoir sketches to have full integer precision. This is also backward compatible with the earlier specification.
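For readers unfamiliar with the role of K, the following is a minimal textbook reservoir sampler (Algorithm R) in which the reservoir size is a plain `int`. It is a sketch under stated assumptions, not the library's Reservoir implementation: the class `SimpleReservoir`, its constructor, and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical textbook reservoir sampler (Algorithm R); NOT the library's implementation.
final class SimpleReservoir {
    private final int k;              // reservoir size, any positive int
    private final List<Long> samples; // the current uniform sample
    private final Random random;
    private long n;                   // number of items seen so far

    SimpleReservoir(int k, long seed) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
        this.samples = new ArrayList<>();
        this.random = new Random(seed);
    }

    void update(long item) {
        n++;
        if (samples.size() < k) {
            samples.add(item); // fill phase: keep everything until the reservoir is full
            return;
        }
        // Pick a slot uniformly in [0, n); the item is kept only if the slot falls
        // inside the reservoir, which happens with probability k / n.
        long slot = (long) (random.nextDouble() * n);
        if (slot < k) {
            samples.set((int) slot, item);
        }
    }

    int size() {
        return samples.size();
    }

    long itemsSeen() {
        return n;
    }

    public static void main(String[] args) {
        SimpleReservoir reservoir = new SimpleReservoir(1000, 42L);
        for (long i = 0; i < 10_000; i++) {
            reservoir.update(i);
        }
        System.out.println(reservoir.size() + " of " + reservoir.itemsSeen());
    }
}
```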