Lesson 7: Case Studies

Watch Lesson 7: Case Studies Video

Pragmatic AI Labs

This notebook was produced by Pragmatic AI Labs.

Load AWS API Keys

Put keys in local or remote GDrive:

cp ~/.aws/credentials /Users/myname/Google\ Drive/awsml/

Mount GDrive

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
Mounted at /content/gdrive

import os; os.listdir("/content/gdrive/My Drive/awsml")
['kaggle.json', 'credentials', 'config']

Install Boto3

!pip -q install boto3

Create API Config

!mkdir -p ~/.aws &&\
  cp /content/gdrive/My\ Drive/awsml/credentials ~/.aws/credentials 

Test Comprehend API Call

import boto3
comprehend = boto3.client(service_name='comprehend', region_name="us-east-1")
text = "There is smoke in San Francisco"
comprehend.detect_sentiment(Text=text, LanguageCode='en')
{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '160',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 14 Dec 2018 16:11:55 GMT',
   'x-amzn-requestid': 'fa00db36-ffba-11e8-882b-8bc33ca9084d'},
  'HTTPStatusCode': 200,
  'RequestId': 'fa00db36-ffba-11e8-882b-8bc33ca9084d',
  'RetryAttempts': 0},
 'Sentiment': 'NEUTRAL',
 'SentimentScore': {'Mixed': 0.008628507144749165,
  'Negative': 0.1037612184882164,
  'Neutral': 0.8582549691200256,
  'Positive': 0.0293553676456213}}

7.1 SageMaker Features

Manage Machine Learning Experiments with Search

  • Find training jobs (see the boto3 Search sketch after this list)
  • Rank training jobs
  • Trace the lineage of a model
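
SageMaker Search is exposed through the boto3 sagemaker client. A minimal sketch, assuming a hypothetical job-name prefix and validation metric:

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
# Find training jobs whose name contains a prefix ("my-experiment" is
# hypothetical) and rank them by a validation metric
results = sm.search(
    Resource="TrainingJob",
    SearchExpression={
        "Filters": [{
            "Name": "TrainingJobName",
            "Operator": "Contains",
            "Value": "my-experiment"
        }]
    },
    SortBy="Metrics.validation:accuracy",  # assumes the job emits this metric
    SortOrder="Descending")
for result in results["Results"]:
    print(result["TrainingJob"]["TrainingJobName"])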

Ground Truth

  • Set up and manage labeling jobs (a boto3 listing sketch follows this list)
  • Uses active learning and human labeling
  • First 500 objects labeled per month are free
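
Labeling jobs can also be listed programmatically; a minimal boto3 sketch:

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
# Enumerate labeling jobs along with their current status
for job in sm.list_labeling_jobs()["LabelingJobSummaryList"]:
    print(job["LabelingJobName"], job["LabelingJobStatus"])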

[Demo] Labeling Job

Notebook

[Demo] SageMaker Notebooks

  • Create and run Jupyter Notebooks
      • Using Jupyter
      • Using JupyterLab
      • Using the terminal
  • Lifecycle configurations (sketched after this list)
  • Git Repositories
      • Public repositories can be cloned on Notebook launch
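
Lifecycle configurations are shell scripts that run when a notebook instance is created or started. A minimal boto3 sketch; the config name and script body are hypothetical:

import base64
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
# OnStart scripts run each time the notebook instance starts;
# the API expects the script base64-encoded
script = base64.b64encode(b"#!/bin/bash\npip install pandas seaborn\n").decode()
sm.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="install-common-libraries",
    OnStart=[{"Content": script}])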

Training

[Demo] SageMaker Training

  • Algorithms
      • Create an algorithm
      • Subscribe via the AWS Marketplace

  • Training Jobs

  • Hyperparameter Tuning Jobs (a boto3 listing sketch follows this list)
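
Training and tuning jobs can also be inspected from boto3; a minimal listing sketch:

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
# Most recently created training jobs with their status
for job in sm.list_training_jobs(SortBy="CreationTime")["TrainingJobSummaries"]:
    print(job["TrainingJobName"], job["TrainingJobStatus"])
# Hyperparameter tuning jobs are listed the same way
for job in sm.list_hyper_parameter_tuning_jobs()["HyperParameterTuningJobSummaries"]:
    print(job["HyperParameterTuningJobName"], job["HyperParameterTuningJobStatus"])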

Inference

[Demo] SageMaker Inference

  • Compilation jobs

  • Model packages

  • Models

  • Endpoint configurations

  • Endpoints (an invocation sketch follows this list)

  • Batch transform jobs
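
Once an endpoint is live, it is called through the separate sagemaker-runtime client. A minimal sketch; the endpoint name and CSV payload are hypothetical:

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
# Send one CSV record to a deployed endpoint and read back the prediction
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",   # hypothetical endpoint
    ContentType="text/csv",
    Body=b"5.1,3.5,1.4,0.2")      # hypothetical feature row
print(response["Body"].read())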

Built-in SageMaker Algorithms

Table of algorithms provided by Amazon SageMaker

7.2 DeepLens Features

[Images: DeepLens technical specifications and object detection]

[Demo] DeepLens

7.3 Kinesis Features

Kinesis FAQ

  • Processes data in real time
  • Can process hundreds of terabytes an hour
  • Example inputs:
      • logs
      • financial transactions
      • streaming data

!pip install -q sensible
import boto3

import asyncio
import time
import datetime
import uuid
import boto3
import json
from sensible.loginit import logger

LOG = logger(__name__)

def firehose_client(region_name="us-east-1"):
    """Kinesis Firehose client"""

    firehose_conn = boto3.client("firehose", region_name=region_name)
    extra_msg = {"region_name": region_name, "aws_service": "firehose"}
    LOG.info("firehose connection initiated", extra=extra_msg)
    return firehose_conn

async def put_record(data,
            client,
            delivery_stream_name="aws-ml-cert"):
    """
    See this:
        http://boto3.readthedocs.io/en/latest/reference/services/
        firehose.html#Firehose.Client.put_record
    """
    extra_msg = {"aws_service": "firehose"}
    LOG.info(f"Pushing record to firehose: {data}", extra=extra_msg)
    response = client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={
            'Data': data
        }
    )
    return response


def gen_uuid_events():
    """Creates a time stamped UUID based event"""

    current_time = 'test-{date:%Y-%m-%d %H:%M:%S}'.format(date=datetime.datetime.now())
    event_id = str(uuid.uuid4())
    event = {event_id:current_time}
    return json.dumps(event)

def send_async_firehose_events(count=100):
    """Async sends events to firehose"""

    start = time.time() 
    client = firehose_client()
    extra_msg = {"aws_service": "firehose"}
    loop = asyncio.get_event_loop()
    tasks = []
    LOG.info(f"sending aysnc events TOTAL {count}",extra=extra_msg)
    num = 0
    for _ in range(count):
        tasks.append(asyncio.ensure_future(put_record(gen_uuid_events(), client)))
        LOG.info(f"sending aysnc events: COUNT {num}/{count}")
        num +=1
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    end = time.time()  
    LOG.info("Total time: {}".format(end - start))


send_async_firehose_events(10)

7.4 AWS Flavored Python

Boto3

  • The main interface for working with AWS from Python

  • Nearly every AWS service can be reached through Boto3
  • If AWS were a country, Boto3 would be its language

Communicate with S3

import boto3
resource = boto3.resource("s3")
resource.meta.client.download_file(
    'testntest',
    'nba_2017_endorsement_full_stats.csv',
    '/tmp/nba_2017_endorsement_full_stats.csv')
!ls -l /tmp
total 4
srw------- 1 root root    0 Dec 14 16:11 drivefs_ipc.0
srw------- 1 root root    0 Dec 14 16:11 drivefs_ipc.0_shell
-rw-r--r-- 1 root root 1447 Dec 14 19:06 nba_2017_endorsement_full_stats.csv

Pandas

The main data science library for Python, used throughout these AWS examples.

  • A working knowledge of Pandas is assumed
  • Many of the study videos show examples that use it

import pandas as pd

df = pd.read_csv("/tmp/nba_2017_endorsement_full_stats.csv")
df.head(2)
| | PLAYER | TEAM | SALARY_MILLIONS | ENDORSEMENT_MILLIONS | PCT_ATTENDANCE_STADIUM | ATTENDANCE_TOTAL_BY_10K | FRANCHISE_VALUE_100_MILLION | ELO_100X | CONF | POSITION | AGE | MP | GP | MPG | WINS_RPM | PLAYER_TEAM_WINS | WIKIPEDIA_PAGEVIEWS_10K | TWITTER_FAVORITE_COUNT_1K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LeBron James | Cleveland Cavaliers | 30.96 | 55.0 | 100.0 | 84.0 | 12.0 | 15.45 | East | SF | 32 | 37.8 | 74.0 | 37.8 | 20.43 | 51.0 | 14.70 | 5.53 |
| 1 | Kevin Durant | Golden State Warriors | 26.50 | 36.0 | 100.0 | 80.0 | 26.0 | 17.70 | West | SF | 28 | 33.4 | 62.0 | 33.4 | 12.24 | 51.0 | 6.29 | 1.43 |

Descriptive Statistics with Pandas

df.describe()
| | SALARY_MILLIONS | ENDORSEMENT_MILLIONS | PCT_ATTENDANCE_STADIUM | ATTENDANCE_TOTAL_BY_10K | FRANCHISE_VALUE_100_MILLION | ELO_100X | AGE | MP | GP | MPG | WINS_RPM | PLAYER_TEAM_WINS | WIKIPEDIA_PAGEVIEWS_10K | TWITTER_FAVORITE_COUNT_1K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 | 10.000000 |
| mean | 23.216000 | 21.700000 | 99.800000 | 80.200000 | 21.375000 | 15.678000 | 29.300000 | 33.890000 | 70.800000 | 33.890000 | 11.506000 | 44.100000 | 6.532000 | 2.764000 |
| std | 5.294438 | 15.362653 | 2.394438 | 5.202563 | 8.507554 | 1.361142 | 3.164034 | 2.303837 | 8.390471 | 2.303837 | 6.868487 | 12.591443 | 5.204233 | 3.646399 |
| min | 12.110000 | 8.000000 | 94.000000 | 70.000000 | 10.250000 | 13.740000 | 24.000000 | 29.900000 | 60.000000 | 29.900000 | 1.170000 | 26.000000 | 2.690000 | 0.350000 |
| 25% | 21.707500 | 13.000000 | 100.000000 | 78.500000 | 13.125000 | 15.250000 | 28.000000 | 32.725000 | 62.500000 | 32.725000 | 6.015000 | 32.500000 | 3.402500 | 0.865000 |
| 50% | 23.880000 | 14.500000 | 100.000000 | 80.500000 | 22.500000 | 15.450000 | 28.000000 | 33.850000 | 73.000000 | 33.850000 | 12.860000 | 46.500000 | 4.475000 | 1.485000 |
| 75% | 26.500000 | 31.250000 | 100.000000 | 83.250000 | 26.000000 | 16.275000 | 31.750000 | 34.975000 | 77.750000 | 34.975000 | 16.890000 | 51.000000 | 5.917500 | 2.062500 |
| max | 30.960000 | 55.000000 | 104.000000 | 89.000000 | 33.000000 | 17.700000 | 35.000000 | 37.800000 | 81.000000 | 37.800000 | 20.430000 | 65.000000 | 17.570000 | 12.280000 |

Plotting with Python

import warnings
import numpy as np
warnings.simplefilter('ignore', np.RankWarning)
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
fig.set_size_inches(11, 8.5)
sns.regplot(data=df, 
            x="SALARY_MILLIONS", y="ENDORSEMENT_MILLIONS", 
            order=2).set_title("NBA Salary & Endorsements")
Text(0.5,1,'NBA Salary & Endorsements')

[Plot: NBA Salary & Endorsements regression]

Putting it all together (Production Style)

!pip -q install python-json-logger
import logging
from pythonjsonlogger import jsonlogger

LOG = logging.getLogger()
LOG.setLevel(logging.DEBUG)
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter()
logHandler.setFormatter(formatter)
LOG.addHandler(logHandler)

import click
import boto3
import pandas as pd

TEST_DF = pd.DataFrame(
    {"SentimentRaw": ["I am very Angry",
                      "We are very Happy",
                      "It is raining in Seattle"]}
)

def create_sentiment(row):
    """Uses AWS Comprehend to Create Sentiments on a DataFrame"""

    LOG.info(f"Processing {row}")
    comprehend = boto3.client(service_name='comprehend', region_name="us-east-1")
    payload = comprehend.detect_sentiment(Text=row, LanguageCode='en')
    LOG.debug(f"Found Sentiment: {payload}")    
    sentiment = payload['Sentiment']
    return sentiment

def apply_sentiment(df, column="SentimentRaw"):
    """Uses Pandas Apply to Create Sentiment Analysis"""

    df['Sentiment'] = df[column].apply(create_sentiment)
    return df
df = apply_sentiment(TEST_DF)
df.head()
| | SentimentRaw | Sentiment |
|---|---|---|
| 0 | I am very Angry | NEGATIVE |
| 1 | We are very Happy | POSITIVE |
| 2 | It is raining in Seattle | NEUTRAL |

7.5 Cloud9

Cloud9

[Demo] Cloud9

  • Create a cloud9 environment
  • Install Python 3.6

sudo yum -y update
sudo yum -y install python36

Python Lambda Function

import json


def lambda_handler(event, context):

  print(event)
  # When invoked through API Gateway, the JSON payload arrives in the "body" key
  if 'body' in event:
    event = json.loads(event["body"])

  # Greedy change-making: take the largest coin first, then work down
  amount = float(event["amount"])
  res = []
  coins = [1, 5, 10, 25]
  coin_lookup = {25: "quarters", 10: "dimes", 5: "nickels", 1: "pennies"}
  coin = coins.pop()
  num, rem = divmod(int(amount * 100), coin)
  res.append({num:coin_lookup[coin]})
  while rem > 0:
    coin = coins.pop()
    num, rem = divmod(rem, coin)
    if num:
      if coin in coin_lookup:
        res.append({num:coin_lookup[coin]})

  response = {
    "statusCode": "200",
    "headers": { "Content-type": "application/json" },
    "body": json.dumps({"res": res})
  }

  return response
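
Because the handler is plain Python, it can be smoke-tested locally before deploying:

# Simulate an API Gateway invocation of the handler above
event = {"body": json.dumps({"amount": ".71"})}
print(lambda_handler(event, None))  # produces the response shown below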

payload

{"amount": ".71"}

response

Response
{
    "statusCode": "200",
    "headers": {
        "Content-type": "application/json"
    },
    "body": "{\"res\": [{\"2\": \"quarters\"}, {\"2\": \"dimes\"}, {\"1\": \"pennies\"}]}"
}

Function Logs
{'amount': '.71'}

Request ID
d7ec2cad-8da0-4394-957e-41f07bad23ae

7.6 Key Terminology

SageMaker Built-in Algorithms

BlazingText

  • Unsupervised learning algorithm for generating Word2Vec embeddings (a training sketch follows this list)
  • AWS blog post: BlazingText
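
A sketch of launching a BlazingText Word2Vec job with the SageMaker Python SDK (v2 names); the role ARN, S3 path, and hyperparameter choice are hypothetical:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
# Resolve the region-specific BlazingText container image
container = image_uris.retrieve("blazingtext", session.boto_region_name)
estimator = Estimator(
    container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.c5.xlarge",
    sagemaker_session=session)
estimator.set_hyperparameters(mode="skipgram")  # Word2Vec skip-gram mode
estimator.fit({"train": "s3://my-bucket/blazingtext/train"})  # hypothetical path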

DeepAR Forecasting

  • Supervised learning algorithm for forecasting scalar (that is, one-dimensional) time series using recurrent neural networks (RNNs); the input format is sketched below
  • DeepAR Documentation
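
DeepAR trains on JSON Lines input, one time series per line; a minimal sketch of the format (the values are hypothetical):

import json

# Each line carries a start timestamp and the target series;
# "cat" (categorical features) is optional
series = {
    "start": "2018-01-01 00:00:00",
    "target": [5.0, 7.2, 6.1, 8.3],
    "cat": [0]
}
with open("train.json", "w") as f:
    f.write(json.dumps(series) + "\n")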
