Real-Time Processing with Spark Streaming


Module M6

The “Real-Time Processing with Spark Streaming” course prepares developers and architects to process data in real time with Spark Streaming.

This course is aimed at developers who have a solid knowledge of Java and are comfortable with Java development tools such as Eclipse or IntelliJ, Maven, etc.


Real-Time Processing with Spark Streaming

  • Quick reminders about the Hadoop ecosystem
  • Quick reminders about Scala and Spark
  • Spark Streaming concepts: StreamingContext, DStreams
  • Input DStreams, with hands-on lab
  • Transformations on DStreams, with hands-on lab
  • Output operations on DStreams, with hands-on lab
  • Accumulators and broadcast variables
  • DataFrames and SQL queries on DataFrames, with hands-on lab
  • MLlib operations on DStreams, with hands-on lab
  • Checkpointing
  • Deploying, monitoring, and optimizing a Spark Streaming application

Prerequisites: Module M1 and strong knowledge of Java and associated development environments