boo-box Serves 1 Billion Advertisements per Month
with MySQL and Hadoop

"MySQL is a core part of our big data strategy. Simple integration with Hadoop enables us to improve our digital advertising service and grow our business with maximum speed and agility."

Josafá Santos,
IT Manager, boo-box


boo-box relies on MySQL and Hadoop to display 1 billion advertisements to 60 million people across 430,000 web sites and social network profiles every month.

boo-box Overview

boo-box is one of the largest advertising networks in South America, with a focus on the Brazilian social media market. boo-box connects content publishers offering space on their web sites, blogs and social media properties with advertisers who want to target marketing campaigns to specific audiences, in order to reach and engage new customers.

To successfully monetize publishers' content and maximize advertisers' campaign ROI (Return on Investment), boo-box's MySQL and Hadoop technology stack must enable the precise segmentation of audiences based on interest, demographics and behavior, serving relevant advertisements in the right format, to the right user, on the right device, at the right time.

The Early Days

Since their founding in 2007, boo-box has deployed MySQL to log user activity, including page views and click-through rates, and then run analytics against the data to enable campaign reporting and targeting.

MySQL was selected due to its simplicity and flexibility. Expecting to grow quickly, boo-box demanded scalability, high availability and low cost from their database. The developers at boo-box were familiar with MySQL, enabling them to rapidly build and expand their business without trying to tame new and unproven technologies.

As traffic volumes grew, boo-box deployed MySQL replication to scale performance, dedicating the MySQL source servers to event logging and offloading analytics to the MySQL replicas.
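
The split between logging on the source and analytics on the replicas can be illustrated with a minimal routing sketch. This is not boo-box's actual code: the class name, the server names and the "anything that is not a SELECT is a write" rule are assumptions made purely for illustration.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the source and spread reads across replicas (round-robin)."""

    def __init__(self, source, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.source = source
        self._replicas = itertools.cycle(replicas)

    def server_for(self, statement):
        # Assumption: any statement that does not start with SELECT is a write.
        words = statement.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._replicas)
        return self.source


# Hypothetical server names; event logging goes to the source,
# analytics queries rotate across the replicas.
router = ReplicatedRouter("source-1", ["replica-1", "replica-2"])
write_target = router.server_for("INSERT INTO events (ad_id, kind) VALUES (7, 'click')")
read_targets = [router.server_for("SELECT COUNT(*) FROM events") for _ in range(3)]
```

Because replication is asynchronous by default in MySQL, a design like this suits analytics that tolerate slightly stale data, which matches the reporting workload described above.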

The Rise of Big Data

As the business grew alongside the explosion in Internet connectivity and users in South America, boo-box augmented their data management infrastructure with Hadoop in June 2011.

As the diagram below shows, user activity is logged to MySQL, then extracted as .csv files and uploaded by bash scripts to Amazon Web Services (AWS) S3. Apache Pig then coordinates MapReduce jobs running on AWS EMR (Elastic MapReduce), with result sets loaded back into the MySQL BI and Statistics databases and into Google's BigQuery service.


The result sets can then be queried to place precisely targeted advertisements, in real time, on publishers' web sites and social media properties.
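
The extract, map and reduce stages of the pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python stands in for the bash, Pig and EMR components, and the column names (ad_id, event) and sample events are hypothetical, not boo-box's schema.

```python
import csv
import io
from collections import Counter


def export_csv(rows):
    """Extract step: serialize logged events as CSV text (stand-in for the .csv export)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["ad_id", "event"])  # hypothetical column names
    writer.writerows(rows)
    return buf.getvalue()


def map_events(csv_text):
    """Map step: emit one ((ad_id, event), 1) pair per logged event."""
    for record in csv.DictReader(io.StringIO(csv_text)):
        yield (record["ad_id"], record["event"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Hypothetical sample of logged activity: (ad_id, event).
events = [(1, "view"), (1, "view"), (1, "click"), (2, "view")]
totals = reduce_counts(map_events(export_csv(events)))

# Click-through rate for ad 1: clicks divided by views.
ctr_ad_1 = totals[("1", "click")] / totals[("1", "view")]
```

In the real pipeline the aggregated result set would be loaded back into a reporting database rather than kept in memory; the sketch only shows the shape of the computation.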

Performance Metrics

boo-box's behavioral targeting system uses complex algorithms to count, select and display the most relevant advertisements, based on a visitor's profile. The low-latency query performance of MySQL is critical: boo-box designed their platform to deliver targeted advertisements in less than 250 ms (milliseconds), a response time that includes network round-trips and processing.

boo-box is using MySQL with the nginx web server and Lua to manage the advertisement placement and "link shortener" service, supporting 20,000 transactions per second using only one server. Ruby and Python are also used with MySQL for different services within boo-box's advertising platform.

2 TB of raw web logs are captured per month, with 22 billion rows processed by MySQL.

The BI database powered by MySQL currently stores 8 TB of data, growing by 5 GB per day, and the MySQL-hosted Statistics database manages 1 TB of data.
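
To put these volumes in perspective, the figures above can be converted into average rates. The calculation below is a rough back-of-the-envelope sketch: it assumes a 30-day month, decimal units (1 TB = 1,000 GB) and evenly spread load, none of which is stated in the source.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

# 22 billion rows processed per month, expressed as an average per second.
rows_per_month = 22_000_000_000
avg_rows_per_second = rows_per_month / SECONDS_PER_MONTH

# 5 GB of daily growth in the BI database, expressed per year.
bi_growth_gb_per_day = 5
bi_growth_tb_per_year = bi_growth_gb_per_day * 365 / 1000  # assumption: 1 TB = 1,000 GB

print(f"average rows per second: {avg_rows_per_second:,.0f}")
print(f"BI database growth per year: {bi_growth_tb_per_year:.2f} TB")
```

Under these assumptions the averages come to roughly 8,500 rows per second and about 1.8 TB of BI growth per year; peak rates would be higher than the average.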

The Future

boo-box has built their data management infrastructure on MySQL and is now exploring the latest developments to further scale their business.

The new MySQL Applier for Hadoop will enable boo-box to load data natively from MySQL to HDFS, in real time, as events happen.

MySQL 5.6 and the new NoSQL Memcached API for InnoDB will improve performance, especially for high volume data ingest.

MySQL Cluster is being evaluated for those workloads that demand the highest scalability, lowest latency and most demanding uptime requirements.