本文为摘录,原文为: attachments/pdf/a/p1632-vaidya.pdf
1 ABSTRACT
SNARF: Sparse Numerical Array-Based Range Filters 基于稀疏数组的范围过滤器
用于数值类型的范围过滤
2 INTRODUCTION
- Filters
are space efficient but appropriate
Answer membership queries on a set S
Point Filters
- eg BloomFilter
- Support point queries: “Is x in the set S ?”
Range Filters (可用作范围过滤 where + 单表)
- for range query:
“Is there a key in the set S in between values
p
&q
?”
- for range query:
“Is there a key in the set S in between values
允许误报
2.1 Range Filters
- range filters can significantly improve the performance of systems for synthetic and real-world workloads
- e.g RocksDB, SQL Server
3 SNARF: A LEARNED FILTER
3.1 SNARF Description
3.1.1 SNARF Construction:
给定一组 keys \[S=\{x_1, x_2, …,x_n\}\] , 构建一个可以进行范围查询的过滤器
SNARF 通过单调函数 \[f\] 将 keys 映射进一个 bit array B:
- B 中有 \[|B| = K \times n\] 个比特
初始化时候,数组中所有元素均为 0
将 B 中的某些位置置位: \[f(x_i) = 1\] , 其中: \[x_i \in S\]
隐射函数为: \[f(x) = \lfloor MCDF(x) \times nK\rfloor \]
MCDF
为 S 中 key 的 CDF 经验值的单调估计
SNARF 使用压缩:提升空间效率: