Why Is Apache Spark a Must-Have Skill for Data Engineering?

Overview

Understanding Spark Architecture

Resilient Distributed Datasets (RDD)

Directed Acyclic Graph (DAG)

Spark Driver

Cluster Manager

Spark Executor

Spark Task

Spark Modules

SQL Module

Streaming

MLlib

GraphX

So Why Apache Spark?

Speed

Ease of Use

Generality

Runs Everywhere

Final Thoughts

Software Engineer with 15+ years of experience (interested in Cloud Computing, Kubernetes, Docker, Serverless Computing, and Blockchain Technologies)
