22 Jul 2011 · Mike Benner · No Comments

OSCON Data 2011

I am going to be up in Portland for the OSCON Data tracks this upcoming week. Are you going?



22 Mar 2011 · Mike Benner · 2 Comments

Zlib and Mongo Binary

Last Wednesday night I whipped together a prototype of an application to test some architectural changes and make delivering large amounts of data to a client easier for both of us.  The end result was to be a centralized datastore for all of our apps that could be accessed via a very simple API.

Testing went well, and then the floodgates opened. Several hundred thousand requests in the opening hour had generated almost 60GB of data in Mongo. While every aspect of the system functioned better than expected (especially for a late-night prototype), the amount of data being generated so quickly was alarming.

When I tried to implement the zlib compression the night before, Mongo threw fits, and I did not have time to deal with it. But now was the time. Tracking down the answer to the errors I was getting was tough. The app itself uses the Mongoid ORM for Rails, and the background workers use the Mongo driver for Ruby; the deflate happens in the background workers and the inflate in the Rails app.

Let’s start with the easy change, Mongoid:

field :large_data, :type => Binary

That simply tells Mongoid to expect a binary object and to insert it as such.
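In context, the model might look something like this (the class name is a placeholder of mine, not the real schema, and later Mongoid releases spell the type BSON::Binary):

class CachedPayload
  include Mongoid::Document

  # Compressed payload stored as raw BSON binary data
  field :large_data, :type => Binary
end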

The more difficult issue I ran into was actually doing the insert with the Mongo driver in the Ruby worker scripts. I simply wasn’t searching for the right thing. What I needed to do was wrap my zlib-deflated data in a new BSON::Binary to be stored in Mongo.

BSON::Binary.new(Zlib::Deflate.deflate(raw_data))

Then simply call the standard Mongo insert command.
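Put together, the worker side might look roughly like this with the legacy 1.x Ruby driver; the connection details, database, collection, and field names are made up for the example:

require 'mongo'
require 'zlib'

# Connect using the 1.x Ruby driver API
connection = Mongo::Connection.new('localhost', 27017)
collection = connection.db('central_store')['payloads']

raw_data = File.read('payload.json')  # stand-in for whatever the worker fetched

collection.insert(
  'key'        => 'some-cache-key',
  'large_data' => BSON::Binary.new(Zlib::Deflate.deflate(raw_data))
)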

The last piece was back in the Rails app: I needed to inflate and return the data. The caveat I found was needing to turn the BSON::Binary into a string before trying to inflate it.

Zlib::Inflate.inflate(large_data.to_s)
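Wrapped up in a controller action it might look something like this, assuming the stored payload is the JSON case from above (the controller, model, and parameter names are only placeholders):

require 'zlib'

class PayloadsController < ApplicationController
  def show
    payload = CachedPayload.find(params[:id])

    # large_data comes back as a BSON::Binary, so coerce it to a String before inflating
    render :json => Zlib::Inflate.inflate(payload.large_data.to_s)
  end
end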

The end result was a system that was still keeping up with demand but was now putting away far less data: a 50K JSON string shrank to 12.5K, and a 300K HTML file shrank to 72K.

Hope this helps anyone looking to squeeze a little more out of their storage solution.