ARTICLEantithesis.com8 min read

From Skiplists to Skiptree: Solving Data Challenges with Innovative Structures

By Will Wilson CEO

From Skiplists to Skiptree: Solving Data Challenges with Innovative Structures

AI Summary

I initially dismissed skiplists as a niche data structure with limited relevance to my work, until a problem at Antithesis revealed their potential. Skiplists are randomized structures that serve as alternatives to binary search trees, offering simpler concurrent implementations and faster search times through hierarchical linked lists. Each node in a skiplist has a 50% chance of being promoted to a higher level, allowing for O(log n) search time compared to O(n) in regular linked lists.

At Antithesis, we faced challenges with our software testing process, which involved running multiple simulations to identify bugs. This created a branching tree of timelines, each representing a sequence of fuzzer decisions. We needed efficient ways to query these timelines, but our analytic database, Google BigQuery, was slow at point lookups, which were necessary for tracing event histories.

The solution was a novel structure we called a 'skiptree', combining the hierarchical nature of skiplists with tree structures. A skiptree consists of multiple levels of trees, each with about half the nodes of the level below, forming a series of interconnected skiplists. This allowed us to store data in SQL tables, using columns to track ancestor nodes and intermediate nodes, enabling efficient ancestor retrieval with a fixed number of JOINs.

Despite the complexity of the SQL queries, which we automated with a JavaScript compiler, this approach was cost-effective under BigQuery's pricing model. It allowed us to perform tree-shaped queries efficiently until we developed our own analytic database.

Interestingly, skiptree is similar to skip graphs, a distributed data structure based on skiplists, highlighting that innovative ideas often have precedents. While traditional trees outperform skiplists, the latter's simplicity and adequate performance make them valuable, especially in SQL implementations.

Key Concepts

Skiplists

Skiplists are randomized data structures that serve as alternatives to binary search trees. They consist of multiple levels of linked lists, where each node has a probabilistic chance of being promoted to a higher level, allowing for efficient search operations.

Skiptree

A skiptree is an innovative data structure that combines the hierarchical nature of skiplists with tree structures. It involves multiple levels of trees, each with fewer nodes than the level below, forming interconnected skiplists.

Category

Programming
M

Summarized by Mente

Save any article, video, or tweet. AI summarizes it, finds connections, and creates your to-do list.

Start free, no credit card