14 Oct 2008 nconway   » (Master)

Public Talk on Facebook Hive

The Berkeley DB group is hosting a talk about Facebook Hive this Thursday, at Soda Hall in UC Berkeley. Details and the abstract are below -- it should be an interesting talk! I'd encourage anyone in the area to attend -- if you need directions / parking suggestions / etc., just drop me a line.

Thursday, October 16th, 2008
606 Soda Hall, UC Berkeley
10-11am

Title: Hive: Data Warehousing using Hadoop

Abstract: Hive is an open-source data warehousing infrastructure built on top of Hadoop that allows SQL like queries along with abilities to add custom transformation scripts in different stages of data processing. It includes language constructs to import data from various sources, support for object oriented data types and a metadata repository that structures hadoop directories into relational tables and partitions with typed columns. Facebook uses this system for variety of tasks - classic log aggregation, graph mining, text analysis and indexing.

In this talk we will give an overview of the Hive system, the data model, query language compilation and execution and the metadata store. We will also discuss our near term roadmap and avenues for significant contributions in terms of query optimization, execution speed and data compression amongst others. We will also present some statistics on usage within Facebook and outline some of the challenges in operating Hive/Hadoop in a utility computing model in fast growing environment.

Bio: Joydeep Sensarma has been working in the Facebook Data Team for the last 1+ year where he's taken turns coding up Hive, keeping Hadoop running, eating and sleeping in that order. He's really glad he no longer works on closed source file and database systems like he did for the last ten years.

Zheng Shao has worked in Facebook Data Team on Hadoop and Hive for about 6 months. Before that he worked in the Yahoo web search team which heavily uses Hadoop.

Namit Jain has been working in the Facebook Data team with Hive for about 6 months. Before that he was in the database and application server groups at Oracle for about 10 years.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!