{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lesson4-AWSML-Modeling.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [
"c_Id55m6Jsbu",
"bq4VmHjPpMOR",
"SDAq-AnzLNHW",
"seXc9BlVLZpz",
"mXxBpaedOfvv",
"HWCc2hTbOf2p",
"WMmgxYymPCP3",
"UzOX1NXvVajB",
"j_gRca1vByuL",
"jhSFhnKtDawl",
"g2DKkOCHDaz4",
"1hrzi0nhDa3V",
"fCPtd7bADa6X",
"OMNcsGAwyyM7",
"R0to0Hr3yyQE",
"3XzlNBqRyyWj",
"kD9MppHKy5Ma",
"JFK7Q6MBy9Yj",
"s2HmQ3Q0zAW4",
"VdKJTV38igkg",
"jsvkrxtAjdoT",
"HB0gINZRkPTP",
"pxmnqaDlz5O6",
"4LyfP2iVMSHz",
"STZg1LvYzIy-"
],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
""
]
},
{
"metadata": {
"id": "iwV-bBLUtTQu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Lesson 4 Machine Learning Modeling on AWS\n",
"\n",
"[Watch Lesson 4: Machine Learning Modeling on AWS Video](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597/9780135556597-ACML_01_04_00)"
]
},
{
"metadata": {
"id": "c_Id55m6Jsbu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Pragmatic AI Labs\n",
"\n"
]
},
{
"metadata": {
"id": "e5p96AqpSDZa",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"\n",
"This notebook was produced by [Pragmatic AI Labs](https://paiml.com/). You can continue learning about these topics by:\n",
"\n",
"* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863863) from Informit.\n",
"* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.amazon.com/Pragmatic-AI-Introduction-Cloud-Based-Learning/dp/0134863860) from Amazon\n",
"* Reading an online copy of [Pragmatic AI:Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)\n",
"* Watching video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.\n",
"* Watching video [AWS Certified Machine Learning-Speciality](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)\n",
"* Purchasing video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)\n",
"* Viewing more content at [noahgift.com](https://noahgift.com/)\n"
]
},
{
"metadata": {
"id": "bq4VmHjPpMOR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Load AWS API Keys"
]
},
{
"metadata": {
"id": "aWrzIk7WpRoh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Put keys in local or remote GDrive: \n",
"\n",
"`cp ~/.aws/credentials /Users/myname/Google\\ Drive/awsml/`"
]
},
{
"metadata": {
"id": "hPWO_zyRopXN",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Mount GDrive\n"
]
},
{
"metadata": {
"id": "XI73HZNLobp4",
"colab_type": "code",
"outputId": "66f597a9-6c94-4ffd-b323-21869a4eb988",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"from google.colab import drive\n",
"drive.mount('/content/gdrive', force_remount=True)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Mounted at /content/gdrive\n"
],
"name": "stdout"
}
]
},
{
"metadata": {
"id": "UNyzZwgmoxwm",
"colab_type": "code",
"outputId": "80a98691-3926-4666-82c7-f27f606fbfaf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"import os;os.listdir(\"/content/gdrive/My Drive/awsml\")"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['kaggle.json', 'credentials', 'config']"
]
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"metadata": {
"id": "fYu0ekUlqPk6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Install Boto"
]
},
{
"metadata": {
"id": "dJDDrUkWrYRY",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!pip -q install boto3\n"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "FpJhrpSQsK5E",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create API Config"
]
},
{
"metadata": {
"id": "QxRwGOZtsN0-",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!mkdir -p ~/.aws &&\\\n",
" cp /content/gdrive/My\\ Drive/awsml/credentials ~/.aws/credentials "
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Kj977UW3rph_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Test Comprehend API Call"
]
},
{
"metadata": {
"id": "P-A8Cia-raT0",
"colab_type": "code",
"outputId": "4290a9a8-667f-4b06-edf3-aa4c2ae121f2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 238
}
},
"cell_type": "code",
"source": [
"import boto3\n",
"comprehend = boto3.client(service_name='comprehend', region_name=\"us-east-1\")\n",
"text = \"There is smoke in San Francisco\"\n",
"comprehend.detect_sentiment(Text=text, LanguageCode='en')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',\n",
" 'content-length': '160',\n",
" 'content-type': 'application/x-amz-json-1.1',\n",
" 'date': 'Thu, 22 Nov 2018 00:21:54 GMT',\n",
" 'x-amzn-requestid': '9d69a0a9-edec-11e8-8560-532dc7aa62ea'},\n",
" 'HTTPStatusCode': 200,\n",
" 'RequestId': '9d69a0a9-edec-11e8-8560-532dc7aa62ea',\n",
" 'RetryAttempts': 0},\n",
" 'Sentiment': 'NEUTRAL',\n",
" 'SentimentScore': {'Mixed': 0.008628507144749165,\n",
" 'Negative': 0.1037612184882164,\n",
" 'Neutral': 0.8582549691200256,\n",
" 'Positive': 0.0293553676456213}}"
]
},
"metadata": {
"tags": []
},
"execution_count": 50
}
]
},
{
"metadata": {
"id": "EBv9LzI9xnrF",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 4.1 AWS ML Systems Overview"
]
},
{
"metadata": {
"id": "Qu_UeD92AEPB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Machine Learning at AWS"
]
},
{
"metadata": {
"id": "xCsczQCqAJBR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"* [AWS ML Services Video](https://www.aws.training/learningobject/video?id=16207&trk=gs_card)\n",
"* [AWS Machine Learning Overview Video](https://www.aws.training/learningobject/video?id=16363&trk=gs_card)\n",
"\n",
"* Thousands of Engineers working on Machine Learning at AWS\n"
]
},
{
"metadata": {
"id": "SDAq-AnzLNHW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### What is Machine Learning?"
]
},
{
"metadata": {
"id": "6t1mMbl8LQsz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"\n",
"* Subset of AI\n",
"* Deep Learning is subset of Machine Learning\n",
"\n"
]
},
{
"metadata": {
"id": "seXc9BlVLZpz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### What can Machine Learning Do?\n",
"\n"
]
},
{
"metadata": {
"id": "iVsOZlTDOeAM",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Make predictions\n",
"* Optimize utility functions\n",
"* Extract hidden data structures\n",
"* Classify data"
]
},
{
"metadata": {
"id": "mXxBpaedOfvv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Machine Learning Use Cases"
]
},
{
"metadata": {
"id": "HWCc2hTbOf2p",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"##### Fraud Detection\n",
"\n",
" "
]
},
{
"metadata": {
"id": "yI4kPKU1PpkG",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
" - Mine data\n",
" - Identify hidden patterns and create labels (unsupervised learning)\n",
" - Train model\n",
" - Flag transaction as fraudulent\n",
" "
]
},
{
"metadata": {
"id": "WMmgxYymPCP3",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"##### Content Personalization\n"
]
},
{
"metadata": {
"id": "1QQntewZPoIT",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
" - Use predictive analytics to recommend items (recommendation systems)"
]
},
{
"metadata": {
"id": "L27fzNNqPU78",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"##### Target Marketing"
]
},
{
"metadata": {
"id": "nR0j74UtPYcR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Use custom activity to choose revelant email campaigns\n",
"* cross-selling\n",
"* upselling"
]
},
{
"metadata": {
"id": "cmOGo30EPzyI",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"##### Categorization"
]
},
{
"metadata": {
"id": "1tc01wYZP10m",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Matching hiring managers and resumes\n",
"* Unstructured content --> Machine Learning Model --> Categorized documents"
]
},
{
"metadata": {
"colab_type": "text",
"id": "ZcEl2FAfQMUv"
},
"cell_type": "markdown",
"source": [
"##### Customer Service"
]
},
{
"metadata": {
"colab_type": "text",
"id": "esZyYtFCQMUx"
},
"cell_type": "markdown",
"source": [
"* Analyze social media traffic to route customers to customer support\n",
"* Predictive routing of customer emails"
]
},
{
"metadata": {
"colab_type": "text",
"id": "L81LNGGpQth4"
},
"cell_type": "markdown",
"source": [
"#### Machine Learning Concepts"
]
},
{
"metadata": {
"id": "Cl6ejyv-lA-Z",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
""
]
},
{
"metadata": {
"id": "UDx9LcAERBGV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**Combination of Methods and Systems**"
]
},
{
"metadata": {
"colab_type": "text",
"id": "cRWNOJ_JQth6"
},
"cell_type": "markdown",
"source": [
"* **Predict**\n",
" - Predict new data based on observed data\n",
"* **Extract**\n",
" - Extract hidden structure from data\n",
"* **Summarize**\n",
" - Summarize data into executive brief\n",
"* **Optimize**\n",
" - Optimize an action, given a cost function and observed data\n",
"* **Adapt**\n",
" - Adapt based on observed data"
]
},
{
"metadata": {
"id": "UzOX1NXvVajB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Machine Learning Types"
]
},
{
"metadata": {
"id": "egpHjCJbVa4f",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
""
]
},
{
"metadata": {
"id": "kEvX7IQEVkX_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Supervised Learning\n",
" - Labels are known for inputs and outputs\n",
" - Model \"learns\" to generalize from data (records)\n",
" - Human needs to tell model which outputs are correct (to train model)\n",
" - Input --> Model --> Output/Prediction\n",
" - Classification (Category like \"red\", \"blue\") and Regression (Numerical i.e. Housing price)\n",
" \n",
"* Unsupervised Learning\n",
" - Labels are not known\n",
" - Lables \"hidden structures\" are discovered\n",
" - Self-organization\n",
" - Clustering\n",
" \n",
"* Reinforcement Learning\n",
" - Model learns by interacting with environment\n",
" - Learns to take action to maximize total reward\n",
" - Inspired by behavioral biology\n",
" \n",
" \n",
" "
]
},
{
"metadata": {
"id": "j_gRca1vByuL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Frameworks & Infrastructure"
]
},
{
"metadata": {
"id": "WiYj4WIOD0dr",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* All Major frameworks\n",
" - Apache MXNet\n",
" - Caffe and Caffe 2\n",
" - Tensorflow"
]
},
{
"metadata": {
"id": "jhSFhnKtDawl",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Machine learning platforms"
]
},
{
"metadata": {
"id": "TMgKoKi7o32f",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Fully managed platform for building models with own data\n",
" - Apache Spark on EMR\n",
" - SparkML (25 PetaByte models, Zillow and Netflix) \n",
" - Sagemaker\n",
" - Amazon Machine Learning"
]
},
{
"metadata": {
"id": "g2DKkOCHDaz4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### API-driven services"
]
},
{
"metadata": {
"id": "1IPzubP6o4hQ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Examples include:\n",
"\n",
"\n",
"\n",
"* Comprehend (NLP)\n",
"* Rekognition (Computer Vision)\n",
"\n"
]
},
{
"metadata": {
"id": "1hrzi0nhDa3V",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Models"
]
},
{
"metadata": {
"id": "p3QgVrxPo5Hl",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Can be stored in S3\n",
"* Can be A/B tested"
]
},
{
"metadata": {
"id": "fCPtd7bADa6X",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Optimized Instances & Machine Images"
]
},
{
"metadata": {
"id": "mS5BmQ0JDa9-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* Deep Learning AMIs\n",
"* Containers"
]
},
{
"metadata": {
"id": "7kXTB2KgyuVD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 4.2 Feature Engineering\n"
]
},
{
"metadata": {
"id": "itASk_VSuAaC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Feature Engineering Overview"
]
},
{
"metadata": {
"id": "ZkLYBaBUuFVY",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"\n",
"* Feature is an attribute used in a prediction model\n",
"* Feature engineering is the creation and curation of these attributes\n",
"\n"
]
},
{
"metadata": {
"id": "0gpFkMsDo-nC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Explore Features"
]
},
{
"metadata": {
"id": "5zSCcP27pGtm",
"colab_type": "code",
"outputId": "e2cb8cd1-ba21-43ca-b78e-4898d2ed91c0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 253
}
},
"cell_type": "code",
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"df = pd.read_csv(\"https://raw.githubusercontent.com/noahgift/aws-ml-guide/master/data/banking.csv\")\n",
"df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n", " | age | \n", "job | \n", "marital | \n", "education | \n", "default | \n", "housing | \n", "loan | \n", "contact | \n", "month | \n", "day_of_week | \n", "... | \n", "campaign | \n", "pdays | \n", "previous | \n", "poutcome | \n", "emp_var_rate | \n", "cons_price_idx | \n", "cons_conf_idx | \n", "euribor3m | \n", "nr_employed | \n", "y | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "44 | \n", "blue-collar | \n", "married | \n", "basic.4y | \n", "unknown | \n", "yes | \n", "no | \n", "cellular | \n", "aug | \n", "thu | \n", "... | \n", "1 | \n", "999 | \n", "0 | \n", "nonexistent | \n", "1.4 | \n", "93.444 | \n", "-36.1 | \n", "4.963 | \n", "5228.1 | \n", "0 | \n", "
1 | \n", "53 | \n", "technician | \n", "married | \n", "unknown | \n", "no | \n", "no | \n", "no | \n", "cellular | \n", "nov | \n", "fri | \n", "... | \n", "1 | \n", "999 | \n", "0 | \n", "nonexistent | \n", "-0.1 | \n", "93.200 | \n", "-42.0 | \n", "4.021 | \n", "5195.8 | \n", "0 | \n", "
2 | \n", "28 | \n", "management | \n", "single | \n", "university.degree | \n", "no | \n", "yes | \n", "no | \n", "cellular | \n", "jun | \n", "thu | \n", "... | \n", "3 | \n", "6 | \n", "2 | \n", "success | \n", "-1.7 | \n", "94.055 | \n", "-39.8 | \n", "0.729 | \n", "4991.6 | \n", "1 | \n", "
3 | \n", "39 | \n", "services | \n", "married | \n", "high.school | \n", "no | \n", "no | \n", "no | \n", "cellular | \n", "apr | \n", "fri | \n", "... | \n", "2 | \n", "999 | \n", "0 | \n", "nonexistent | \n", "-1.8 | \n", "93.075 | \n", "-47.1 | \n", "1.405 | \n", "5099.1 | \n", "0 | \n", "
4 | \n", "55 | \n", "retired | \n", "married | \n", "basic.4y | \n", "no | \n", "yes | \n", "no | \n", "cellular | \n", "aug | \n", "fri | \n", "... | \n", "1 | \n", "3 | \n", "1 | \n", "success | \n", "-2.9 | \n", "92.201 | \n", "-31.4 | \n", "0.869 | \n", "5076.2 | \n", "1 | \n", "
5 rows × 21 columns
\n", "