Data Analysis at Scale in the Cloud
Course taught at Duke MIDS, Spring 2020-2022 by Noah Gift.
- This is the course syllabus.
- These are the projects in the course
- This is the week-by-week calendar
- This is the rubric for grading assignments
- This is the grading for the course
- This is the FAQ
- A complete online book with screencast videos is available here.
- The Coursera specialization, Building Cloud Computing Solutions at Scale, can be found here: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Prequel Material
These resources could be helpful before starting this course.
Duke/Coursera: Foundations of Data Engineering Course (Launching early 2022)
Course 1: Python and Pandas for Data Engineering
Course 2: Linux and Bash for Data Engineering
GitHub Repos for Projects in the Course
Week 1: Using Linux
Week 2: Using Bash
- Lesson 1: Create and Use .bashrc
- Lesson 2: Sourcing shell variables from a script
- Lesson 3: Using stdout and stdin
Week 3: Building Bash Scripts
- Lesson 1: Build a for loop in Bash
- Lesson 2: Truncate large files with Bash
- Lesson 3: Building a command-line tool for data processing
- Lesson 4: Build Bash CLI with options
Week 4: Composing File and Data Management Solutions with Linux
- Lesson 1: Understand the search commands
- Lesson 2: Setting permissions
- Lesson 3: Using regex to process text from a file
- Lesson 4: Search the filesystem with find
Course 3: Python and SQL for Data Engineering
Course 4: Building Data Engineering Solutions with Python for Web Applications, Command-Line Tools and Notebooks
Sequel Material
These resources could be helpful after starting this course.
Duke/Coursera: Applied Data Engineering Course (Launching late 2022)
GitHub Repos Referenced in the Duke Coursera Course
Course 1: Cloud Computing Foundations
- Practice Markdown
- Github Actions-Pytest
- Google App Engine Continuous Delivery
- Hello World Flask
- Hugo Continuous Delivery on AWS
Course 2: Cloud Computing Building Blocks
- Lint Dockerfile
- Flask Change Microservice
Lecture Topics:
Getting Started: [Week 1]
Cloud Computing Foundations: [Week 2]
Virtualization and Containers: [Week 3 & Week 4]
Challenges and Opportunities in Distributed Computing: [Week 5 & Week 6]
Cloud Storage: [Week 7 & Week 8]
Serverless: [Week 9 & Week 10]
MLOps, Big Data and Edge Computer Vision: [Week 11 & Week 12 & Week 13]
General
Student Example Projects
A practical guide to Data Science, Machine Learning Engineering and Data Engineering
Read Cloud Computing for Data Book
Free book Developing-on-AWS-with-CSharp
Next Steps: Take Coursera MLOps Course
- Take the Specialization
- Cloud Computing Foundations
- Cloud Virtualization, Containers and APIs
- Cloud Data Engineering
- Cloud Machine Learning Engineering and MLOps
Text and Code License
The text and code content of notebooks and documents is released under the CC-BY-NC-ND license.