Big Data Fundamentals


Module M1

The “Big Data Fundamentals” course trains architects, developers from IS departments and BI teams, project managers and technical managers in the Big Data approaches and technologies available today. It introduces trainees to the world of open-source technologies and gives a clear, credible definition of Big Data.

It is intended for people with a technical background (computer scientists, mathematicians, physicists, economists or any other field) who have at least some development experience in a programming language.

With very few prerequisites, it is an ideal way to get started with Big Data and discover the power of these technologies.


Program

  • What is Big Data?
  • The two fundamental components of a Big Data foundation
  • Hadoop cluster topology, choice of distribution and hardware
  • Big Data tools in the Hadoop ecosystem and beyond
  • The concept of a distributed file system
  • Task parallelization models (MPP, MPI, MapReduce)
  • Technologies implementing these models in Hadoop (YARN, MapReduce 2, Storm, Spark)
  • Data lakes and the Lambda architecture
  • Data ingestion and discovery (Flume, NiFi, Sqoop, Hive / Spark SQL, Elasticsearch / Solr + Kibana, etc.)
  • NoSQL databases (Redis, HBase, Cassandra, MongoDB, Couchbase, Neo4j, etc.)
  • Data science: the most widely used algorithms with the Spark MLlib library
  • Visualization of large volumes of data