Facebook, Data Durability, and hacking HBase

I found myself catching up on what’s been happening on the other side of the fence in the HPC and distributed computing world, and in particular the Hadoop stack. Boots-on-the-ground implementations of distributed computing are where theory meets the harsh reality of customer demands, network latency, and commodity hardware.

That’s why I found a recent Ars Technica article about Facebook’s choice of HBase over MySQL fascinating.

The debate around using Hadoop, and in particular the distributed file system HDFS and HBase, versus existing file systems and SQL databases has been heating up in the past year. This is an interesting fight. Microsoft says SQL Server Datacenter is just fine, and Dryad uses it along with its distributed file system, Cosmos. Oracle says they’ve been doing parallel dataset analysis and storage since 2001 using SQL with no problem. Meanwhile, NoSQL is gaining traction and people are wondering whether you need object-relational mapping solutions at all.

SQL-based HPC solutions give you reliability but fall down when it comes to high throughput and distributed processing, especially in long-running queries. Non-SQL database solutions are fast and highly distributed, but the cost of that distributed nature is data loss. Very smart people have struggled with how best to reduce that data loss in massively distributed systems, but most proposals still carry potential data loss at levels most cloud companies would find too dangerous. This was a big topic of conversation at the IEEE International Symposium on High Performance Distributed Computing, and Abdullah Gharaibeh and Matei Ripeanu’s paper on the subject, Exploring data reliability tradeoffs in replicated storage systems, covers a lot of ground.
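To get a feel for the tradeoff that paper explores, here is a back-of-the-envelope sketch of my own (not a calculation from the paper): under the simplifying assumption of independent node failures, the chance of losing a block entirely is roughly p^r, where p is the per-node failure probability and r the replication factor. More replicas buy you durability at the cost of storage and write amplification.

```python
def block_loss_probability(p: float, r: int) -> float:
    """Probability that all r replicas of a block are lost,
    assuming independent node failures with probability p each.
    (A toy model: real failures are correlated across racks.)"""
    return p ** r

# Each extra replica cuts the loss probability by a factor of p.
for r in (1, 2, 3):
    print(f"r={r}: {block_loss_probability(0.01, r):.0e}")
```

Note the "independent failures" assumption does a lot of work here; correlated failures (a rack losing power, a bad kernel rollout) are exactly why the real-world numbers stay uncomfortably high.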

Facebook’s own distributed platform, Cassandra, has this problem as well, according to the article:

For many, it’s surprising that Facebook didn’t put the messaging system on Cassandra, which was originally developed at the company and has since become hugely popular across the interwebs. But Ranganathan and others felt that it was too difficult to reconcile Cassandra’s “eventual consistency model” with the messaging system setup.

“If you’re going to front the data with some sort of a cache…then it becomes very difficult for us to program a cache where the database heals from underneath,” Ranganathan says.

This is an odd statement, since all massively distributed database solutions have this problem, including HBase, which Facebook picked over Cassandra.

Then I read the following statement:

But HBase did require modification. Facebook engineer Nicolas Spiegelberg and another company developer spent a year adding commits to the platform, mainly in an effort to minimize data loss. “The goal was zero data loss,” says Spiegelberg. Committers updated HDFS Sync support, added some ACID properties, and even redesigned the HBase master.

What exactly is Facebook doing here? Usually the best way to scale a distributed database in a massive way is to minimize data loss, not prevent it. It’ll be interesting to see how this new platform works out for them.
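The “HDFS sync support” Spiegelberg mentions boils down to a durability guarantee on append: a write-ahead log entry only protects you from a crash once the bytes have actually been pushed to stable storage before the write is acknowledged. Here is a hedged, local-filesystem sketch of that idea (names and record format are mine, not HBase’s):

```python
import os
import tempfile

def append_durably(log, record: bytes):
    """Append a record and only return once it should survive a crash."""
    log.write(record + b"\n")
    log.flush()              # push from user-space buffers to the OS
    os.fsync(log.fileno())   # ask the OS to persist to disk before ack

# Minimal usage: write one WAL-style record and read it back.
with tempfile.NamedTemporaryFile(delete=False) as log:
    append_durably(log, b"put row1 cf:col=value")
    path = log.name

with open(path, "rb") as f:
    print(f.read())
os.remove(path)
```

The hard part Facebook’s committers faced is that early HDFS did not expose a reliable equivalent of that fsync step to appenders, so a regionserver crash could lose acknowledged edits, hence the year of patches.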