Advanced Big Data Processing
This course discusses advanced approaches and tools for big data processing. It begins by describing popular big data frameworks, focusing on Hadoop (including HDFS, YARN, and MapReduce) and Spark. It then presents the use of Pig, Hive, and Impala to work with data stored in HDFS. Next, it examines data ingestion with Sqoop and Flume, real-time parallel processing using functional programming in Spark, and advanced optimization strategies. The course also discusses security issues in big data and the management of big data streams.