Fingerprinting-based minimal perfect hashing revisited

ACM Journal of Experimental Algorithmics, May 2023, doi: 10.1145/3596453

In the paper we study a fingerprint-based minimal perfect hash function (FMPH for short). We propose an effective method (called FMPHGO) that reduces the size of FMPH, as well as a number of implementation improvements.

Abstrakt

In the paper we study a fingerprint-based minimal perfect hash function (FMPH for short). While FMPH is not as space-efficient as some other minimal perfect hash functions (for example RecSplit, CHD, or PTHash), it has a number of practical advantages that make it worthy of consideration. FMPH is simple and quite fast to evaluate. Its construction requires very little auxiliary memory, takes a short time and, in addition, can be parallelized or carried out without holding keys in memory.

In this paper, we propose an effective method (called FMPHGO) that reduces the size of FMPH, as well as a number of implementation improvements. In addition, we experimentally study FMPHGO performance and find the best values for its parameters. Our benchmarks show that with our method and an efficient structure to support the rank queries on a bit vector, the FMPH size can be reduced to about 2.1 bits/key, which is close to the size achieved by state-of-the-art methods and noticeably larger only compared to RecSplit. FMPHGO preserves most of the FMPH advantages mentioned above, but significantly reduces its construction speed. However, FMPHGO’s construction speed is still competitive with methods of similar space efficiency (like CHD or PTHash), and seems to be good enough for practical applications.

Bibtex

@article{10.1145/3596453,
author = {Beling, Piotr},
title = {Fingerprinting-Based Minimal Perfect Hashing Revisited},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1084-6654},
url = {https://doi.org/10.1145/3596453},
doi = {10.1145/3596453},
abstract = {In the paper we study a fingerprint-based minimal perfect hash function (FMPH for short). While FMPH is not as space-efficient as some other minimal perfect hash functions (for example RecSplit, CHD, or PTHash), it has a number of practical advantages that make it worthy of consideration. FMPH is simple and quite fast to evaluate. Its construction requires very little auxiliary memory, takes a short time and, in addition, can be parallelized or carried out without holding keys in memory. In this paper, we propose an effective method (called FMPHGO) that reduces the size of FMPH, as well as a number of implementation improvements. In addition, we experimentally study FMPHGO performance and find the best values for its parameters. Our benchmarks show that with our method and an efficient structure to support the rank queries on a bit vector, the FMPH size can be reduced to about 2.1 bits/key, which is close to the size achieved by state-of-the-art methods and noticeably larger only compared to RecSplit. FMPHGO preserves most of the FMPH advantages mentioned above, but significantly reduces its construction speed. However, FMPHGO’s construction speed is still competitive with methods of similar space efficiency (like CHD or PTHash), and seems to be good enough for practical applications.},
note = {Just Accepted},
journal = {ACM Journal of Experimental Algorithmics},
month = {may},
keywords = {minimal perfect hashing, data structures, algorithms}
}