Big Data Analytics

Course Objective

To provide a strong foundation in big data analytics for handling large and complex databases.

 Learning Outcomes

Upon completion of this course, students will be able to:

Work on big data platforms like Hadoop

Use parallel processing techniques to analyse large and complex databases

Use advanced routines in R and other platforms for big data analysis

 Detailed Syllabus

1. Introduction: Features of big data, Challenges with big data, Examples of big data, Opportunities and advantages of big data analytics

2. Analytical Platforms for Big Data: Introduction to Hadoop and its sub-projects: HDFS, MapReduce, Hive, Pig, HBase, and Avro. MapReduce-style implementations: Disco, Misco, Phoenix, Cloud MapReduce, bashreduce, Qizmt, HTTPMR, Galago’s TupleFlow, Skynet, Sphere, Riak, Starfish, Octopy, MPI-MR, Filemap, Plasma MapReduce, Mapredus, Mincemeat, MapReduceTitan, GPMR, Elastic Phoenix, Peregrine, R3

3. Big Data Tools in R: Programming with Big Data in R (pbdR), Discussion of MPI, ScaLAPACK, NetCDF4, and others

 4. More on Hadoop Environment: Understanding Hadoop features, Learning the HDFS and MapReduce architecture, Writing Hadoop MapReduce Programs

 5. Integrating R and Hadoop: Discussions on RHIPE, RHadoop, pbdR

6. Importing and Exporting Data from Various Databases: Discussion of MySQL, Excel, MongoDB, SQLite, PostgreSQL, Hive, HBase

7. Data Analytics with R and Hadoop: Understanding the data analytics project life cycle, Understanding data analytics problems, Big data analysis with machine learning

8. Hadoop Streaming with R: Basics of Hadoop streaming, Running Hadoop streaming with R, Exploring the Hadoop streaming R packages