White Paper

In-depth technical papers exploring the design, implementation, and evaluation of TabbyDB’s engine-level enhancements to Apache Spark.

Each paper focuses on a specific class of performance or scalability challenges observed in complex analytical workloads.

Constraint Propagation Optimization

Eliminating Constraints Explosion

This paper introduces a redesigned constraint propagation algorithm that addresses the permutational constraints explosion observed in stock Apache Spark. By tracking aliases, canonicalizing expressions, and avoiding redundant inference, the proposed approach significantly reduces query compilation time and memory usage, while guaranteeing an optimized plan that is identical or better.
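
To give a feel for why aliases can cause a permutational blow-up, the following is a toy Python sketch, not the algorithm from the paper and not Spark's code. It assumes a constraint is represented only by the attributes it references, and that every attribute may be replaced by any of its aliases; the function names and the alias table are hypothetical.

```python
from itertools import product


def naive_alias_expansion(constraint_attrs, aliases):
    """Enumerate every rewrite of a constraint obtained by replacing each
    attribute with itself or one of its aliases (toy model of the blow-up)."""
    choices = [[attr] + aliases.get(attr, []) for attr in constraint_attrs]
    return [tuple(combo) for combo in product(*choices)]


def canonical_form(constraint_attrs, aliases):
    """Keep a single canonical constraint plus an alias-to-canonical lookup."""
    alias_to_canonical = {
        alias: attr for attr, names in aliases.items() for alias in names
    }
    return tuple(constraint_attrs), alias_to_canonical


def resolve(attr, alias_to_canonical):
    """Map an alias back to its canonical attribute (identity if not an alias)."""
    return alias_to_canonical.get(attr, attr)


# Hypothetical projection: each of a, b, c is exposed under two extra aliases.
aliases = {"a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c1", "c2"]}
constraint = ("a", "b", "c")  # stands in for a predicate such as a + b + c > 10

expanded = naive_alias_expansion(constraint, aliases)
canonical, lookup = canonical_form(constraint, aliases)

print(len(expanded))  # 27 rewritten constraints (3 choices for each of 3 attributes)
print(len(lookup))    # 6 alias entries alongside 1 canonical constraint
```

In this toy model the naive expansion grows as the product of the alias counts, whereas the canonical form grows only with the number of aliases. A predicate written against aliases, such as one over `a2`, `b`, and `c1`, can be checked by resolving each name through the lookup first.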

Capping Query Plan Size

Preventing Query Plan Explosion During Analysis

This paper describes an approach to collapsing projection nodes during the analysis phase, preventing unbounded query plan growth in workloads built using iterative DataFrame APIs or deeply layered views. The technique preserves correctness and cache compatibility, improves cache lookup efficiency, and dramatically reduces compilation overhead.
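
The general idea of collapsing stacked projections can be illustrated with a small, self-contained Python sketch. It is a toy model under stated assumptions, not the paper's implementation: expressions are plain tuples, a projection is a dict from output column to expression, and `with_column`, `substitute`, and `collapse` are hypothetical helpers that only mimic the shape of an iterative DataFrame workload.

```python
def substitute(expr, bindings):
    """Replace column references in expr with the expressions that define them."""
    kind = expr[0]
    if kind == "col":
        return bindings.get(expr[1], expr)
    if kind == "lit":
        return expr
    # Any other node is an operator whose remaining items are child expressions.
    return (kind,) + tuple(substitute(child, bindings) for child in expr[1:])


def collapse(projections):
    """Merge a bottom-up stack of projections into one equivalent projection."""
    merged = projections[0]
    for upper in projections[1:]:
        merged = {name: substitute(expr, merged) for name, expr in upper.items()}
    return merged


def with_column(stack, name, expr):
    """Mimic an iterative API: every call adds one more projection layer."""
    layer = {col: ("col", col) for col in stack[-1]}  # pass existing columns through
    layer[name] = expr
    return stack + [layer]


# Build c1 = x + 1, c2 = c1 + 1, c3 = c2 + 1, one layer per step.
stack = [{"x": ("col", "x")}]
for i in range(1, 4):
    prev = "x" if i == 1 else f"c{i - 1}"
    stack = with_column(stack, f"c{i}", ("add", ("col", prev), ("lit", 1)))

flat = collapse(stack)
print(len(stack))  # 4 projection layers before collapsing
print(flat["c3"])  # c3 expressed directly over the base column x
```

The sketch inlines every definition unconditionally. A real engine would additionally have to decide when inlining is safe and worthwhile, for example for non-deterministic or expensive expressions, which this toy model ignores.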

Runtime File Pruning Using Broadcasted Keys

Improving Join Performance on Non-Partitioned Columns

This paper explores a runtime performance enhancement that leverages broadcast hash join keys as dynamic runtime filters. The approach enables more effective file-level and row-group pruning for joins on non-partitioned columns, extending the benefits of dynamic pruning beyond traditional partition-based strategies.
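
As a rough illustration of file-level pruning with join keys, the Python sketch below keeps only those files whose min/max range on the join column could contain at least one key from the broadcast side. It is a simplified model under assumptions: the `FileStats` record, the file names, and the `prune_files` helper are hypothetical, and real file formats expose such statistics through their own metadata rather than through this structure.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str     # hypothetical file name
    min_key: int  # smallest join-column value stored in the file
    max_key: int  # largest join-column value stored in the file


def prune_files(files, broadcast_keys):
    """Keep files whose [min_key, max_key] range contains at least one key."""
    keys = sorted(set(broadcast_keys))
    kept = []
    for f in files:
        i = bisect_left(keys, f.min_key)  # index of the first key >= min_key
        if i < len(keys) and keys[i] <= f.max_key:
            kept.append(f)
    return kept


files = [
    FileStats("part-0", 0, 99),
    FileStats("part-1", 100, 199),
    FileStats("part-2", 200, 299),
]
kept = prune_files(files, [42, 250, 251])
print([f.path for f in kept])  # ['part-0', 'part-2']
```

The same range test could in principle be applied to row-group statistics inside a file. Files that survive pruning may still contain no matching rows, so the join itself remains responsible for correctness.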

TPC-DS Benchmark Evaluation

Methodology, Configuration, and Observations

This paper provides a transparent breakdown of TPC-DS benchmark results, including execution timelines, configuration details, and experimental setup. It also discusses the limitations of TPC-DS in representing compile-time behavior and complex real-world Spark workloads.
