Monday, June 15, 2009

Hadoop Summit 2009

Last week was the second Hadoop Summit. Being local and really interested in the technologies, I decided to attend. I have mixed thoughts about the whole event, which I will try to lay out in this post.
First of all, I was pleasantly surprised by the participants and the problems they are trying to tackle using MapReduce-related technologies. It's amazing to hear how companies of every size, from the smallest ones to the big ones (Yahoo / Amazon), are using these technologies to process data and extract analytics.
The talks were really informative for someone who deals with this stuff on a day-to-day basis. I got to learn how Twitter / LinkedIn / Facebook are trying to extract more data from their users and what their plans are for future expansion. I was amazed that many of these companies don't have the huge datasets or big analytics groups that one would think they have. I think it was LinkedIn claiming to have 9 TB (that's just 9,000 GB) of data in their Hadoop cluster. That's less than any one of my typical job submits... Other companies had a handful of engineers working on problems of great breadth. This was a great surprise for me, as I thought that every company dealing with MapReduce problems had clusters of multiple thousands of nodes. Apparently, there are companies dealing with MapReduce problems that can be solved with hundreds or even tens of nodes.
An unexpected surprise was Amazon's talk. These people are really amazing in terms of engineering, and they are addressing the long tail of the MapReduce problem really effectively. Companies that need a couple of tens of nodes on a regular or ad hoc basis are way better off renting infrastructure from Amazon than having their own infrastructure, ops people and all the costs associated with that. I was really interested in how they build their infrastructure, especially given all the security implications that sharing a node with other users has... Maybe at the next Hadoop Summit I will ask them more about their security plan ;)
All in all, my impression of the summit was really positive. I learned a whole ton of stuff, and I feel that I better understand how different organizations use MapReduce frameworks for fun and profit.
I will definitely follow up with a post about the scale camp held the night before the summit, which was really interesting as well...
