Distributed Distinct Elements

Emin Karayel 📧

April 3, 2023

This is a development version of this entry. It might change over time and is not stable. Please refer to release versions for citations.

Abstract

This entry formalizes a randomized cardinality estimation data structure with asymptotically optimal space usage. It is inspired by the streaming algorithm presented by Błasiok in 2018. His work closed the gap between the best-known lower bound and upper bound after a long line of research started by Flajolet and Martin in 1984 and was to first to apply expander graphs (in addition to hash families) to the problem. The formalized algorithm has two improvements compared to the algorithm by Błasiok. It supports operation in parallel mode, and it relies on a simpler pseudo-random construction avoiding the use of code based extractors.

License

BSD License

Topics

Related publications

  • Karayel, E. (2023). An Embarrassingly Parallel Optimal-Space Cardinality Estimation Algorithm. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.APPROX/RANDOM.2023.35
  • Karayel, E. (2023). An embarrassingly parallel optimal-space cardinality estimation algorithm (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2307.00985

Session Distributed_Distinct_Elements