Dave Meier | DIPr Lab at PSU

https://diprlab.github.io/author/dave-meier/

Winter 2026 Week 5 (Fri, 06 Feb 2026)
https://diprlab.github.io/dbrg/events/2026/winter/05/
<table> <tr> <td>Title</td> <td> I Can’t Believe It’s Not Yannakakis: Pragmatic Bitmap Filters in Microsoft SQL Server </td> </tr> <tr> <td>Authors</td> <td> Hangdong Zhao et al. </td> </tr> <tr> <td>Abstract</td> <td> The quest for optimal join processing has reignited interest in the Yannakakis algorithm, as researchers seek to realize its theoretical ideal in practice via bitmap filters instead of expensive semijoins. While this academic pursuit may seem distant from industrial practice, our investigation into production databases led to a startling discovery: over the last decade, Microsoft SQL Server has built an infrastructure for bitmap pre-filtering that subsumes the very spirit of Yannakakis! This is not a story of academia leading industry; but rather of industry practice, guided by pragmatic optimization, outpacing academic endeavors. This paper dissects this discovery. As a crucial contribution, we prove how SQL Server’s bitmap filters, pull-based execution, and Cascades optimizer conspire to not only consider, but often generate, instance-optimal plans, when it truly minimizes the estimated cost! Moreover, its rich plan search space reveals novel, largely overlooked pre-filtering opportunities on intermediate results, which approach strong semi-robust runtime for arbitrary join graphs. Instead of a verdict, this paper is an invitation: by exposing a system design that is long-hidden, we point our community towards a challenging yet promising research terrain. 
</td> </tr> </table>

Fall 2025 Week 7 (Wed, 12 Nov 2025)
https://diprlab.github.io/dbrg/events/2025/fall/07/
<table> <tr> <td> Title </td> <td> Scribe: How Meta transports terabytes per second in real time </td> </tr> <tr> <td> Authors </td> <td> Manos Karpathiotakis, et al. </td> </tr> <tr> <td> Abstract </td> <td> Millions of web servers and a multitude of applications are producing ever-increasing amounts of data in real time at Meta. Regardless of how data is generated and how it is processed, there is a need for infrastructure that can accommodate the transport of arbitrarily large data streams from their generation location to their processing location with low latency. <br /> <br /> This paper presents Scribe, a multi-tenant message queue service that natively supports the requirements of Meta’s data-intensive applications, ingesting > 15 TB/s and serving > 110 TB/s to its consumers. Scribe relies on a multi-hop write path and opportunistic data placement to maximise write availability, whereas its read path adapts replica placement and representation based on the incoming workload as a means to minimise resource consumption for both Scribe and its downstreams. The wide range of Scribe use cases can pick from a range of offered guarantees, based on the trade-offs favourable for each one. 
</td> </tr> </table>

Summer 2025 Week 1 (Wed, 09 Jul 2025)
https://diprlab.github.io/dbrg/events/2025/summer/01/
<table> <tr> <td> Title </td> <td> Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables </td> </tr> <tr> <td> Authors </td> <td> Daniel Sotolongo, Daniel Mills, Tyler Akidau, Anirudh Santhiar, Attila-Péter Tóth, Botong Huang, Boyuan Zhang, Igor Belianski, Ling Geng, Matt Uhlar, Nikhil Shah, Olivia Zhou, Saras Nowak, Sasha Lionheart, Vlad Lifliand, Wendy Grus, Yiwen Zhu, Ankur Sharma, Dzmitry Pauliukevich, Enrico Sartorello, Ilaria Battiston, Ivan Kalev, Lawrence Benson, Leon Papke, Niklas Semmler, Till Merker, Yi Huang </td> </tr> <tr> <td> Abstract </td> <td> Streaming data pipelines remain challenging and expensive to build and maintain, despite significant advancements in stronger consistency, event time semantics, and SQL support over the last decade. Persistent obstacles continue to hinder usability, such as the need for manual incrementalization, semantic discrepancies across SQL implementations, and the lack of enterprise-grade operational features (e.g. granular access control, disaster recovery). While the rise of incremental view maintenance (IVM) as a way to integrate streaming with databases has been a huge step forward, transaction isolation in the presence of IVM remains underspecified, which leaves the maintenance of application-level invariants as a painful exercise for the user. Meanwhile, most streaming systems optimize for latencies of 100 milliseconds to 3 seconds, whereas many practical use cases are well-served by latencies ranging from seconds to tens of minutes. 
<p>In this paper, we present delayed view semantics (DVS), a conceptual foundation that bridges the semantic gap between streaming and databases, and introduce Dynamic Tables, Snowflake’s declarative streaming transformation primitive designed to democratize analytical stream processing. DVS formalizes the intuition that stream processing is primarily a technique to eagerly compute derived results asynchronously, while also addressing the need to reason about the resulting system end to end. Dynamic Tables then offer two key advantages: ease of use through DVS, enterprise-grade features, and simplicity; as well as scalable cost efficiency via IVM with an architecture designed for diverse latency requirements. We first develop extensions to transaction isolation that permit the preservation of invariants in streaming applications. We then detail the implementation challenges of Dynamic Tables and our experience operating it at scale. Finally, we share insights into user adoption and discuss our vision for the future of stream processing.</p> </td> </tr> </table>

Spring 2025 Week 4 (Fri, 25 Apr 2025)
https://diprlab.github.io/dbrg/events/2025/spring/04/
<table> <tr> <td> Title </td> <td> How good are query optimizers, really? </td> </tr> <tr> <td> Authors </td> <td> Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann </td> </tr> <tr> <td> Abstract </td> <td> Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. 
We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality estimates. </td> </tr> </table>
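The last abstract compares exhaustive dynamic programming with heuristic plan enumeration. The sketch below illustrates what exhaustive enumeration means in practice, under stated assumptions; it is not the paper's code or any particular system's optimizer. It builds the cheapest bushy join tree bottom-up over subsets of relations and skips cross products. The cost model is assumed to be C_out, the sum of estimated intermediate-result sizes. Cardinalities come from a simple independence assumption, where base sizes are multiplied by every join-edge selectivity. The function name `dp_join_order` and its inputs are hypothetical.

```python
from itertools import combinations


def dp_join_order(cards, selectivities):
    """Hypothetical sketch: exhaustive DP join enumeration over relation subsets.

    cards: dict mapping relation name -> estimated base cardinality.
    selectivities: dict mapping a join edge (rel_a, rel_b) -> selectivity.
    Returns (cost, estimated_cardinality, plan) for the cheapest bushy plan
    without cross products, or None if the join graph is disconnected.
    Assumed cost model: C_out, the sum of estimated intermediate sizes.
    """
    rels = sorted(cards)

    def estimate(subset):
        # Independence assumption: multiply base cardinalities by the
        # selectivity of every join edge whose endpoints are both in subset.
        card = 1.0
        for r in subset:
            card *= cards[r]
        for (a, b), sel in selectivities.items():
            if a in subset and b in subset:
                card *= sel
        return card

    def connected(left, right):
        # True if at least one join edge links the two sides.
        return any((a in left and b in right) or (a in right and b in left)
                   for (a, b) in selectivities)

    # best[subset] = (cost, cardinality, plan); base relations cost nothing.
    best = {frozenset([r]): (0.0, float(cards[r]), r) for r in rels}

    # Bottom-up: each subset is assembled from the already-optimal plans
    # of every split into two non-empty, disjoint halves.
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            subset = frozenset(combo)
            card = estimate(subset)
            for k in range(1, size):
                for left_combo in combinations(combo, k):
                    left = frozenset(left_combo)
                    right = subset - left
                    if left not in best or right not in best:
                        continue  # one half has no cross-product-free plan
                    if not connected(left, right):
                        continue  # joining these halves would be a cross product
                    cost = best[left][0] + best[right][0] + card
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, card, (best[left][2], best[right][2]))

    return best.get(frozenset(rels))


if __name__ == "__main__":
    # Chain query A - B - C with made-up sizes and selectivities.
    print(dp_join_order({"A": 1000, "B": 100, "C": 10},
                        {("A", "B"): 0.01, ("B", "C"): 0.1}))
```

For the made-up chain query above, the sketch prints `(1100.0, 1000.0, ('A', ('B', 'C')))`. Joining the small pair B and C first produces an intermediate of about 100 rows, while starting with A and B produces about 1000. Because the search is exhaustive, the chosen plan is optimal only with respect to the estimates. If `estimate` is badly off, as the abstract reports real estimators routinely are, the "optimal" plan can still be poor.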