QuerySurge Automated Data Warehouse Testing
Depends on MySQL for Pinpoint Accuracy

"MySQL's reliability and robustness is critical to QuerySurge. It provides an absolutely solid foundation."

Dr. Jeff Bocarsly,
Architect, QuerySurge

RTTS and QuerySurge Overview

RTTS is a leading provider of software quality and testing solutions for critical business applications. Since 1996, it has served over four hundred Fortune 500 and mid-sized companies worldwide, including Adobe, Pfizer, Ticketmaster, and ConEdison. A few years ago, many RTTS customers requested help with a common problem: they were relying heavily on data warehouse-driven analyses to make their most critical business decisions, but they didn't know if the underlying data was valid. This was in large part due to the data warehouse ETL process: data must first be Extracted from an array of data sources (the Internet, CRM systems, acquired companies to name a few), then Transformed into a common format, and Loaded into a data warehouse, and then into a data mart. Each ETL step creates opportunity for error, and once introduced, any defect is propagated downstream, making detection and resolution extremely difficult.

This was compounded by a lack of data warehouse-specific testing tools, which left most organizations to test their data manually. Most commonly, this requires anywhere from 2 to 10 members of a development team writing SQL Queries and manually checking data at different points in the ETL process over a period of five weeks. This is expensive and worse, it's ineffective: a typical organization tests less than 1% of their data1. This is cause for concern when, according to Gartner, poor data quality costs the average organization $8.2 million annually.

To address this need, RTTS created QuerySurge, the only testing tool built specifically to automate the testing of data warehouses and the ETL process. With the help of MySQL, QuerySurge ensures that the data extracted from sources remains intact by analyzing and pinpointing any differences in the data (down to a single character) between each leg of the ETL process. QuerySurge automates data warehouse test design, management, scheduling, execution, analysis and reporting. QuerySurge testing is up to 100 times faster than manual testing, and vastly more comprehensive, with the ability to test up to 100% of the data. In addition, QuerySurge is able to do all of this without any effect on data warehouse performance beyond running the test queries.

QuerySurge Architecture

Figure 1: QuerySurge Architecture (click image to enlarge)

The Database Challenge and The MySQL Solution

QuerySurge is able to perform its data comparisons so quickly and without impacting data warehouse performance by pulling all of the data - from the source files, databases, and the target data warehouse - back to QuerySurge's own database for processing. RTTS chose to rely on MySQL for this critical task from the very start.

"All of the things that we wanted in a database, MySQL does. MySQL is necessary for us to have a solution in this market." Dr. Jeff Bocarsly, QuerySurge Architect

The following characteristics were critical to the QuerySurge product team when determining which database to use and remain key MySQL benefits.


Database performance is key for QuerySurge. Jeff and his team were impressed with the MySQL published benchmarks, and their continued increases with each release. Their own performance testing confirmed that MySQL could meet their needs. Now MySQL regularly processes QuerySurge test result sets that can range from thousands of rows to literally tens of millions of rows, and from one to hundreds of columns.

Reliable and Robust

QuerySurge needs to find any mismatches in the data, even down to a single character. To do that, its database needs to be highly reliable regardless of task and or circumstance, and it must be ACID-compliant. As a data quality testing tool QuerySurge can never introduce defects through lost or corrupted data. Using MySQL, QuerySurge is able to regularly and reliably handle very large queries. And, using MySQL, their customers' data is kept intact. "MySQL's reliability and robustness is critical to QuerySurge. It provides an absolutely solid foundation," said Jeff.

Open Source

RTTS wanted to use an open source database to help control their and their customers' costs. They found that the other most commonly used open source databases were not built to handle the size data loads QuerySurge generates, and / or they were intended for different purposes. This alone made MySQL the right choice from the start.

Strong Company Backing

RTTS wanted to use an open source database, and they wanted the assurance strong company backing. MySQL's backing by Sun and then by Oracle gave them confidence in their choice. They especially valued Oracle's depth of database expertise.

Storage Engines Saves Disk and Money

RTTS has come to value MySQL's pluggable storage engine architecture and the efficiencies they've gained by using different storage engines for different tasks. QuerySurge deals with two broad classes of data: the first are small table sets that hold "volatile" data that is queried and updated often. The second are larger table sets that are not queried often but do need to be stored. QuerySurge uses InnoDB for the first type and the Archive storage engine for the second. Using Archive, they are able to store data in a compact array and achieve about 90% compression over InnoDB, which is important because QuerySurge generates very large queries that quickly fill disk space. Using the Archive engine reduces hardware requirements for QuerySurge customers and allows them to hold historical data much longer.

Summing Up

RTTS President Bill Hayduk summarized their experience with MySQL as follows: "We're sold on MySQL and we have no plans to go anywhere else. We've been happy with the MySQL Technology and how easy it's been to do business with Oracle MySQL. Our experience has been top notch thus far and I expect it to continue."

1 http://www.querysurge.com/resource-center/white-papers/enterprise-business-intelligence-data-warehousing-the-data-quality-conundrum