coldr3ality/ICEPack

GitHub: coldr3ality/ICEPack

ICEPack 是一种基于反转循环 RLE 编码的压缩真值向量数据结构,融合散列与数组的访问优势,为大规模分布式环境下的无冲突会话 ID 生成与高效稀疏键空间操作提供轻量级解决方案。

Stars: 0 | Forks: 0

ICEPack 是一个可实例化的对象类,用于一种我将其定义为“压缩真值向量”的数据结构。在我着手开发这种数据结构时,我试图实现的启发式概念是:将散列和数组的访问模式结合起来,同时避免数据库的复杂性或开销,从而达到甚至超越现代计算效率的标准。元素可以通过散列中的稀疏键进行寻址,也可以像数组那样通过有序索引进行寻址。 该架构具有三个抽象层:首先是自定义的二进制编码,我称之为“反转循环 RLE”,它是用于交替布尔值的 RLE 的一个最小化版本。第二个抽象层是在这些 IC-RLE 段的有序数组之上实现的一个相当基础的二分搜索;第三个抽象层是对第二层数据数组进行分层/层状回归量化,将命名空间内容逐步降级为越来越量化的缩减形式,并将各自的模值存储在每个组合单元键释放出的分配空间中。当结构发生变化时,这些模值可以由 setter 进行原子更新,并且可以通过 getter 高效地求和,以便按需计算稀疏键的排序顺序。我想将其称为动态枚举。 在某种程度上,它允许将已定义和未定义的命名空间视为同一个常规序列的两个维度。它启用了一种类似于 Perl 范围运算符的访问模式(即通过指定起始值和结束值来确定一个序列),但现在这些值可以是稀疏键,并且它们可以高效地从已定义或未定义的稀疏键命名空间中进行选择。这使得大规模随机洗牌变得切实可行。实际上,其预期应用是大规模分布式会话 ID 随机化,在这种场景中,通过真正的命名空间守恒来防止冲突(而不仅仅是依赖天文数字般的极低概率),并且不需要引入对核心网络在边缘服务器之间保持同步的特殊需求或要求。 我最初在 2020 年使用 Perl 开发了一个完整的概念验证。从那以后,我开始学习 C 语言,并对规范和架构进行了妥善的处理,以实现一个企业级的实现版本。 ``` ICE encoding is for inside-out UUID tables. It has O(1) access time to lowest / highest / nearest existing / nonexisting keys. ICEPack is a Perl/XS implementation of ICE, providing hash-like access and wire-ready compression on hyperbolic time scales. ICEPack::REG implements a regressive exponent gradient, providing dynamic range enumeration over logarighmic time scales. So, to reiterate: > Trivial access to lowest / highest / nearest sparse index in O(1) time— an obvious strength for dynamic ID tables > Hash-like sparsity with array-like sorting effectively works like a range operator for key spaces > Basically redefines the Perl idiom "Everything Is A Number" ICE is a QWORD-sized compressed truth vector which uses an original variant of RLE encoding— Inversion Cycle RLE. IC-RLE compresses repeating values into run lengths (like RLE), but stores no explicit values— only implicit boolean truth. Since RLE stores only the first occurrence of a repeating value, and boolean values can only be one of two, value is implicit— so essentially, IC-RLE representation is an explicit series of run length pairs which implicitly store alternating true-false values. To promote the integrity of the encoding across mutations, ICE compresses pairs of true-false run lengths in single entries, and mediates computational complexity to access and mutate these entries with opportunistic [de]fragmentation. Encoded data is stored as a series of semi-regular chunks (16 to 144 bytes in length) which are sorted into a searchable AV* array. > Computational complexity plots as a roughly hyperbolic asymptote given the worst case highly entropic data. > Batch processing affects significant improvement in mutation time complexity when leveraged by the application. > Variety of accessor methods enable manipulation by range, mask, sorted list and scalar arguments, as well as recombination. > In-memory data blocks are an easy packet payload to stream over TCP with no fragmentation and minimal layer-4 overhead. Any sparse array compression technique which omits nulls makes the obvious but unfortunate tradeoff of gaining space while sacrificing the implicit identity of each element by its index— often the single most characteristically useful property of arrays. This is where ICEPack::RELIC comes in— to implement efficient non-sparse sort order computation. For example: let's say you wish to implement a random number generator that is non-deterministic, yet also non-repeating, and you wish to use this to exzate Session IDs in a massively distributed cloud server application. You would have your choice of entropy sources as usual, but instead of piping this directly into a Session ID generator, you use it to choose the "nth" free ID in an ICEPack::RELiC instance, which trivially guards against colissions; in order to make replication across a server farm more efficient, you can allow servers to preexzate large random sets of IDs, periodically throwing them back into the pool and drawing a new set. In this way, edge servers can still set service-wide Session ID assignments on an event-driven basis, with no core negotiation needed, but IDs are still guaranteed collission-free. Not only does this free us to rate the appropriate namespace depth precisely, but it also frees us to implement Perfect Forward Secrecy— to renew the Session-ID upon each and every response. As a security enthusiast, I must bore you with words of caution. While this data structure is designed to be directly amenable to wire synchronization across edge servers, this concept is not widely used. ```
标签:ICEPack, Perl, 二分查找, 会话ID随机化, 内存优化, 分布式系统, 动态数组, 动态枚举, 压缩算法, 原子更新, 去中心化, 命名空间管理, 哈希表, 响应大小分析, 客户端加密, 布尔值编码, 开源库, 搜索引擎爬虫, 数据序列化, 数据结构, 无状态架构, 游程编码(RLE), 稀疏键值, 算法设计, 边缘计算, 量化索引, 键值存储, 防碰撞, 高性能计算