Course code Title Language Price # Unit Startdate Hour Enddate Location Signup
BD003 Big Data track: Big Data and data science on Big Data English €850.00 2 Day(s) 06-03-2018 09u00 07-03-2018 Kontich Subscribe
BD003 Big Data track: Big Data and data science on Big Data Dutch €850.00 2 Day(s) 26-11-2018 09u00 27-11-2018 Kontich Subscribe
BD003 Big Data track: Big Data and data science on Big Data on your request on your request Contact Us

Big Data track: Big Data and data science on Big Data

Overview

Course code: BD003
Duration: 2 Day(s)

This track consists of two courses, which are also offered separately.
It provides the necessary insight into Apache Hadoop, hands-on knowledge to get started with Apache Hive and Pig, and use cases of data science in Big Data.
The theoretical background on Big Data and Hadoop, as well as the data science process, is explained from the ground up and requires no prior knowledge of these topics. The math behind creating models is left out as much as possible (if you are interested, see the course “The math behind data science”).

Since Hadoop is a complete ecosystem of interrelated tools, this course gives clear insight into the most important ones and the situations in which they tend to fit.

Practically, this course gives a head start on development with the most important Hadoop tools:

- MapReduce: the low-level layer (Java API) for parallel processing in Apache Hadoop
- Hive: the SQL-like querying infrastructure for semi-structured data
- Pig: the tool and the Pig Latin scripting language for data processing
- Sqoop: the SQL-to-Hadoop tool for migrating data to and from a Hadoop environment
- Mahout: a machine learning library on top of Hadoop
- Spark MLlib: a machine learning library on top of Spark

Furthermore, this course explains the entire data science process and what it can do for you.

Learning objectives
- Explain the nature of Big Data systems and the difference with typical IT systems
- Understand the concepts behind Big Data, semi-structured data and Hadoop
- Understand the MapReduce pattern and how it can be applied to solve problems in a parallelized processing setup
- Grasp the role of each tool in the Apache Hadoop ecosystem, and the situations in which they best fit
- Process data with Apache Hive and Pig
- Use Apache Sqoop to migrate data back and forth between relational (SQL) databases and HDFS
- Build data processing pipelines with Hive, Pig and MapReduce
- Explain the data science process and the difference with typical IT development cycles
- Get an idea of what machine learning can do for you
- Understand the difference between data science and machine learning, both in general and in Big Data
- Use Apache Mahout to create recommendations or prediction models
- Use Spark MLlib to create recommendations or prediction models
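
To give an idea of the MapReduce pattern named in the objectives above, here is a minimal single-machine sketch in plain Python. It is not the Hadoop Java API and not course material; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (illustrative, not distributed)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle be spread over many machines, which is the property Hadoop relies on.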

Topics

For a more complete topic overview, see the topic pages of the individual courses: “Big Data: introduction to Hadoop & hands on with Hive and Pig” and “Data Science and machine learning in Big Data”.

Day 1:

CHAPTER 1: Introduction to Big Data
CHAPTER 2: Introduction to Hadoop
CHAPTER 3: HDFS – The Hadoop Distributed File System
CHAPTER 4: Hive
CHAPTER 5: Pig
CHAPTER 6: MapReduce
CHAPTER 7: Sqoop: SQL-to-Hadoop

Day 2:

CHAPTER 1: Introduction to data science
CHAPTER 2: Introduction to machine learning
CHAPTER 3: Introduction to Big Data
CHAPTER 4: The data science process: pre-modeling
CHAPTER 5: The data science process: prediction models
CHAPTER 6: The data science process: advice
CHAPTER 7: The data science process: the entire cycle

Prerequisites

- Java programming skills (JDK, Eclipse, Maven) are useful for getting hands-on with Hadoop; the rest of the course does not require any knowledge of Java
- Some experience working with the Linux command line
- Some knowledge of SQL syntax is very useful, but not strictly required
- Some basic understanding of prediction models is also useful, but not strictly required

Audience

This course is aimed at developers seeking insight into Big Data concepts and, more generally, the entire data science process.
It is also aimed at data scientists and BI personnel who want to understand Big Data and build their prediction models with Big Data tools.