Apache Hadoop is a Java-based framework for large-scale distributed batch processing on commodity hardware. Its biggest advantage is the ability to scale to hundreds or thousands of machines. Hadoop is designed to distribute large amounts of work efficiently across a cluster.
This talk will introduce Hadoop along with MapReduce and HDFS. It will discuss the scenarios where Hadoop fits as a robust solution and will include a case study from a real project where Hadoop is used for bulk inserts and large-scale data analytics.
- What is Hadoop?
- Why Hadoop?
- What is MapReduce?
- HDFS Architecture Overview
- Demo with a use case from a real project.
- Who is using Hadoop?
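As background for the MapReduce item above, the programming model can be sketched in a few lines of plain Python. This is a hypothetical toy word count, not Hadoop code: real Hadoop jobs implement Mapper and Reducer classes in Java and run distributed across a cluster, with HDFS providing the storage layer.

```python
from collections import defaultdict

# Toy, single-machine sketch of the MapReduce model (illustrative only).

def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all intermediate values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the list of values for one key.
    return (key, sum(values))

def word_count(records):
    intermediate = [pair for r in records for pair in map_phase(r)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["Hadoop scales out", "Hadoop runs on commodity hardware"])
print(counts["hadoop"])  # each input record contributes one occurrence
```

Because the map and reduce steps are independent per record and per key, the framework can run them in parallel on many machines, which is what lets Hadoop scale out on commodity hardware.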
This demo-driven presentation will help the audience see the power of Hadoop when it comes to processing terabytes of data on commodity hardware.