TY - GEN

T1 - On the optimal time/space tradeoff for hash tables

AU - Bender, Michael A.

AU - Farach-Colton, Martín

AU - Kuszmaul, John

AU - Kuszmaul, William

AU - Liu, Mingmou

N1 - Funding Information: This research was supported in part by NSF grants CSR-1938180, CCF-2106999, CCF-2118620, CCF-2118832, CCF-2106827, CCF-1725543, CSR-1763680, CCF-1716252 and CNS-1938709. Kuszmaul is funded by an NSF GRFP fellowship and a Fannie and John Hertz Fellowship. Mingmou is currently supported by Singapore Ministry of Education (AcRF) Tier 1 grant RG75/21, and was previously supported by Singapore Ministry of Education (AcRF) Tier 2 grant MOE2018-T2-1-013. Funding Information: This research was also partially sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. Publisher Copyright: © 2022 ACM.

PY - 2022/9/6

Y1 - 2022/9/6

N2 - For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: If keys/values are (logn) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(loglogn) bits of space per key when compared to the information-theoretic optimum-this bound has been proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(loglogn) wasted bits per key is not the end of the line for hashing. In fact, for any k [log∗ n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and the k-th iterated logarithm O(log(k) n) wasted bits per key (all with high probability in n), while also supporting dynamic resizing as the size of the table changes. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our result holds for arbitrarily large keys/values, and in the case where keys/values are very small, we can tighten our bounds to o(1) wasted bits per key. Building on this, we obtain a constant-time dynamic filter that uses n glog"-1 g‰+ n loge + o(n) bits of space for a wide choice of false-positive rates ", resolving a long-standing open problem for the design of dynamic filters.

AB - For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: If keys/values are (logn) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(loglogn) bits of space per key when compared to the information-theoretic optimum-this bound has been proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(loglogn) wasted bits per key is not the end of the line for hashing. In fact, for any k [log∗ n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and the k-th iterated logarithm O(log(k) n) wasted bits per key (all with high probability in n), while also supporting dynamic resizing as the size of the table changes. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our result holds for arbitrarily large keys/values, and in the case where keys/values are very small, we can tighten our bounds to o(1) wasted bits per key. Building on this, we obtain a constant-time dynamic filter that uses n glog"-1 g‰+ n loge + o(n) bits of space for a wide choice of false-positive rates ", resolving a long-standing open problem for the design of dynamic filters.

KW - balls and bins

KW - data structures

KW - hash tables

KW - succinct

UR - http://www.scopus.com/inward/record.url?scp=85130832160&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85130832160&partnerID=8YFLogxK

U2 - https://doi.org/10.1145/3519935.3519969

DO - https://doi.org/10.1145/3519935.3519969

M3 - Conference contribution

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 1284

EP - 1297

BT - STOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing

A2 - Leonardi, Stefano

A2 - Gupta, Anupam

PB - Association for Computing Machinery

T2 - 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022

Y2 - 20 June 2022 through 24 June 2022

ER -