The TabbyDB Story

Re-engineering Apache Spark for complex, production-scale workloads.

The Impact!

TabbyDB is a performance-focused fork of Apache Spark designed to address real-world challenges in query compilation, optimizer behavior, and large-scale analytical execution. It is built for teams running complex SQL and DataFrame workloads where planning time, memory behavior, and correctness are critical.

Why TabbyDB Was Created

Story of Efficient Performance

TabbyDB was born from repeated exposure to a specific class of Spark challenges: workloads where query compilation, not execution, became the dominant bottleneck. In production environments involving programmatically generated SQL, deeply layered views, and iterative DataFrame transformations, query plans can grow rapidly in size and complexity. In these scenarios, Spark may spend extended periods in query planning before execution even begins.
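To make the plan-growth point concrete, here is a minimal toy model in plain Python. It is a sketch under stated assumptions, not Spark's or TabbyDB's actual plan representation: `Expr`, `plan_size`, and `build_iterative_plan` are hypothetical names invented for illustration. It assumes each iterative step derives a new column that references the previous result twice, and that those references are expanded inline rather than shared.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression-tree node (illustration only, not a Spark class)."""
    op: str
    children: Tuple["Expr", ...] = ()


def plan_size(expr: Expr) -> int:
    """Count nodes in the fully expanded tree, as a tree-walking rule would visit them."""
    return 1 + sum(plan_size(child) for child in expr.children)


def build_iterative_plan(steps: int) -> Expr:
    """Simulate a loop where each step builds on the previous column twice."""
    col = Expr("col")
    for _ in range(steps):
        # Assumption: both references to the previous column are inlined.
        col = Expr("add", (col, col))
    return col


if __name__ == "__main__":
    for steps in (0, 5, 10, 15):
        print(steps, plan_size(build_iterative_plan(steps)))
```

In this toy model the node count roughly doubles with every step, so a loop of only a few dozen transformations can produce a tree that is expensive to traverse, even though each individual step looks trivial. Real plans and real optimizer rules behave differently in detail; the sketch only illustrates why planning cost can outpace the apparent size of the code that generated the query.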


Rather than treating these issues as configuration problems, TabbyDB approaches them as engine-level challenges. The goal was not to tune around limitations, but to address root causes in the optimizer and analysis phases while preserving Spark’s API compatibility and execution model.
TabbyDB represents a focused effort to improve Spark behavior in complex analytical environments without requiring disruptive architectural changes.

Building Efficiency One Query At A Time

Engineering Philosophy

Fix Root Causes, Not Symptoms

Performance issues in complex Spark workloads are often addressed through configuration tuning, rule disabling, or infrastructure scaling. TabbyDB focuses instead on identifying underlying algorithmic inefficiencies in planning and optimization, and improving them at the source.

Measure, Don’t Market

All enhancements are evaluated using reproducible workloads, standardized benchmarks, and real-world analytic patterns. Improvements are characterized transparently, distinguishing between benchmark results and observed behavior in production-scale queries.

Preserve Compatibility

TabbyDB maintains compatibility with Apache Spark APIs and execution semantics. Existing SQL, DataFrame code, and tooling continue to operate without modification. Enhancements are implemented within the engine while preserving expected behavior for users and applications.

Vision & Contact

Complex analytical workloads continue to grow in depth and scale. As data pipelines become increasingly programmatic and query plans more dynamic, engine-level efficiency becomes essential.
TabbyDB’s long-term vision is to advance Spark’s ability to handle deeply nested, large-scale workloads predictably and efficiently while remaining aligned with the broader Spark ecosystem.

Connect with us

If you are exploring performance improvements, encountering optimizer-related bottlenecks, or evaluating Spark behavior under complex workloads, we welcome the conversation. Contact the TabbyDB team at asif.shahid@kwikquery.com.
