Categories
Developing & Application Integration

Next-Level S3 Notifications With EventBridge

In this post I will use AWS managed services to enhance my S3 user experience with custom EventBridge notifications that are low cost, quick to set up and perform well at scale.

Table of Contents

Introduction

I’ve been restoring some S3 Glacier Flexible Retrieval objects lately. I use bulk retrievals to reduce costs – these finish within 5–12 hours. However, on a couple of occasions I’ve totally forgotten about them and almost missed the download deadline!

Having recently set up some alerting, I decided to make a similar setup that will trigger emails at key points in the retrieval process, using the following AWS services:

  • S3 for holding the objects and managing the retrieval process
  • EventBridge for receiving events from S3 and looking for patterns
  • SNS for sending notifications to me

The end result will look like this:

Let’s start with SNS.

SNS: The Notifier

I went into detail about Amazon Simple Notification Service (SNS) in my last post about making some security alerts so feel free to read that if some SNS terms are unfamiliar.

Here I want SNS to send me emails, so I start by making a new standard topic called s3-object-restore. I then create a new subscription with an email endpoint and link it to my new topic.

This completes my SNS setup. Next I need to make some changes to one of my S3 buckets.

S3: The Storage

Amazon S3 stores objects in buckets. The properties of a bucket can be customised to complement its intended purpose. For example, the Default Encryption property forces encryption on buckets containing sensitive objects. The Bucket Versioning property protects objects from accidental changes and deletes.

Here I’m interested in the Event Notifications property. This property sends notifications when certain events occur in the bucket. Examples of S3 events include uploads, deletes and, importantly for this use case, restore requests.

S3 can send events to a number of AWS services including, helpfully, EventBridge! This isn’t on by default but is easily enabled in the bucket’s properties:

My bucket will now send events to EventBridge. But what is EventBridge?

EventBridge: The Go-Between

Full disclosure. At first I wasn’t entirely sure what EventBridge was. The AWS description did little to change that:

I tend to uncomplicate topics by abstracting them. Here I found it helpful to think of EventBridge as a bus:

  • Busses provide high-capacity transport between bus stops. The bus is EventBridge.
  • Passengers use the bus to get to where they need to go. The passengers are events.
  • Bus stops are where passengers join or depart the bus. The bus stops are event sources and targets.

In the same way that a bus picks up passengers at one bus stop and drops them off at another, EventBridge receives events from a source and directs them to a target.

Much has been written about EventBridge’s benefits. Rather than spending the next few paragraphs copy/pasting, I will instead suggest the following for further reading:

In this use case, EventBridge’s main advantage is that it is decoupled from S3. This allows one EventBridge Rule to serve many S3 buckets. S3 can send notifications to SNS without EventBridge, but each bucket needs configuring separately so this quickly causes headaches with multiple buckets.

Currently my S3 bucket is already sending events to EventBridge, so let’s create an EventBridge rule for them.

EventBridge Rule: Setting A Pattern & Choosing A Source

Rules allow EventBridge to route events from a source to a target. After naming my new rule s3-object-restore, I need to choose what kind of rule I want:

  • Event Pattern: the rule will be triggered by an event.
  • Schedule: the rule will be triggered by a schedule.

I select Event Pattern. EventBridge then poses further questions to establish what events to look for:

  • Event Matching Pattern: Do I want to use EventBridge presets or write my own pattern?
  • Service Provider: Are the events coming from an AWS service or a third party?
  • Service Name: What service will be the source of events?

EventBridge will only present options relevant to the previous choices. For example, choosing AWS as Service Provider means that no third party services are available in Service Name.

My choices so far tell EventBrdige that S3 is the event source:

Next up is Event Type. As EventBridge knows the events are coming from S3, the options here are very specific:

I choose Amazon S3 Event Notification.

EventBridge now knows enough to create a rule, and offers the following JSON as an Event Pattern:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Access Tier Changed", "Object ACL Updated", "Object Created", "Object Deleted", "Object Restore Completed", "Object Restore Expired", "Object Restore Initiated", "Object Storage Class Changed", "Object Tags Added", "Object Tags Deleted"]
}

I’m only interested in restores, so I open the Specific Event(s) list and choose the three Object Restore events:

EventBridge then amends the event pattern to:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Restore Completed", "Object Restore Initiated", "Object Restore Expired"]
}

That’s it for the source. Now EventBridge needs to know what to do when it finds something!

EventBridge Rule: Choosing A Target & Configuring Inputs

One of EventBridge’s big selling points is how it interacts with targets. There are already numerous targets, and EventBridge rules can have more than one.

I select SNS Topic as a target then choose my s3-object-restore SNS topic from the list:

This alone is enough for EventBridge to interact with SNS. When I save this EventBridge rule and trigger it by running an S3 object restore, I receive this email:

Although this is technically a success, some factors aren’t ideal:

  • The formatting of the email is hard to read.
  • There’s a lot of information here, most of which is irrelevant.
  • It’s not immediately clear what this email is telling me.

To address this I can use EventBridge’s Configure Input feature to change what is sent to the target. This feature offers four options:

  • Matched Events: EventBridge passes all of the event text to the target. This is the default.
  • Part Of The Matched Event: EventBridge only sends part of the event text to the target.
  • Constant (JSON text): None of the event text is sent to the target. EventBridge sends user-defined JSON instead.
  • Input Transformer: EventBridge assigns lines of event text as variables, then uses those variables in a template.

Let’s look at the input transformer.

The AWS EventBridge user guide goes into detail about the input transformer and includes a good tutorial. Having consulted these resources, I start by getting the desired JSON from the initial email:

{
"detail-type":"Object Restore Initiated",
"source":"aws.s3",
"time":"2022-02-21T12:51:21Z",
"detail":
{
"bucket":{"name":"redacted"},
"object":{"key":"redacted"}
}
}

Then I convert the JSON into an Input Path:

{
"bucket":"$.detail.bucket.name",
"detail-type":"$.detail-type",
"object":"$.detail.object.key",
"source":"$.source",
"time":"$.time"
}

And finally specify an Input Template:

"<source> <detail-type> at <time>. Bucket: <bucket>. Object: <object>"

EventBridge checks input templates before accepting them, and will throw an error if the input template is invalid:

I update my EventBridge rule with the new Input Transformer configuration. Time to test it out!

Testing

When I trigger an S3 object restore I receive this email moments later:

I then receive a second email when the object is ready for download:

"aws.s3 Object Restore Completed at 2022-03-04T00:15:33Z. Bucket: REDACTED. Object: REDACTED"

And a final one when the object expires:

"aws.s3 Object Restore Expired at 2022-03-05T10:12:04Z. Bucket: REDACTED. Object: REDACTED"

Success!

Before moving on, let me share the results of an earlier test. My very first input path (not included here) contained some mistakes. The input template was valid but it couldn’t read the S3 event properly, so I ended up with this:

Something to bear in mind for future rules!

Cost Analysis

Before I wrap up, let’s run through the expected costs with this setup:

  • SNS: the first thousand email notifications SNS every month are included in the AWS Always Free tier, and I’m nowhere near that!
  • S3: There is no change for S3 passing events to EventBridge. Charges for object storage and retrieval are out of scope for this post.
  • EventBridge: All events published by AWS services are free.

There is no expected cost rise for this setup based on my current use.

Summary

In this post I’ve used EventBridge and SNS to produce free bespoke notifications at key points in the S3 object retrieval process. This offers me the following benefits:

  • Reassurance: I can choose the longer S3 retrieval offerings knowing that AWS will keep me updated on progress.
  • Convenience: I will know the status of retrievals without accessing the AWS console or using the CLI.
  • Cost: I am less likely to forget to download retrieved objects before expiry, and therefore less likely to need to retrieve those objects again.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~

Categories
Developing & Application Integration

Re-Runnable Strava API Calls Using Python

In this post I make my existing Python code re-runnable by enabling it to replace expired access tokens when it sends requests to the Strava API.

A couple of posts ago I wrote about authenticating Strava API calls. I ended up successfully requesting data using this Python code:

import requests

activities_url = "https://www.strava.com/api/v3/athlete/activities" 

header = {'Authorization': 'Bearer ' + "access_token"}
param = {'per_page': 200, 'page': 1}

my_dataset = requests.get(activities_url, headers=header, params=param).json()

print(my_dataset)

Although successful, the uses for this code are limited as it stops working when the header’s access_token expires. Ideally the code should be able to function constantly once Strava grants initial authorisation, which is what I’m exploring here. Plus the last post was unclear in places so this one will hopefully tie up some loose ends.

Please note that I have altered or removed all sensitive codes and tokens in this post in the interests of security.

The Story So Far

First of all, some reminders. Strava uses OAuth 2.0 for authentication, and this is a typical OAuth 2.0 workflow:

OAuthAnOverview

I am sending GET requests to Strava via Get Activity. Strava’s documentation for this is as follows:

StravaDevelopers

Finally, during my initial setup I created an API Application on the Strava site. Strava provided these details upon completion:

StravaAPICredentials

Authorizing My App To View Data

Strava’s Getting Started page explains that they require authentication via OAuth 2.0 for data requests and gives the following link for that process:

http://www.strava.com/oauth/authorize?client_id=[REPLACE_WITH_YOUR_CLIENT_ID]&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=read

I must amend this URL as scope=read is insufficient for Get Activity requests. The end of the URL becomes scope=activity:read_all and the updated URL loads a Strava authorization screen:

StravaAuthorization

Selecting Authorize gives the following response:

http://localhost/exchange_token?state=&code=CODE9fbb&scope=read,activity:read_all

Where code=CODE9fbb is a single-use authorization code that I will use to create access tokens.

Getting Tokens For API Requests

Next I will use CODE9fbb to request access tokens which Get Activity will accept. This is done via the following cURL request:

curl -X POST https://www.strava.com/api/v3/oauth/token \
  -d client_id=APIAPP-CLIENTID \
  -d client_secret=APIAPP-SECRET \
  -d code=CODE9fbb \
  -d grant_type=authorization_code

Here, Client_ID and Client_Secret are from my API application, Code is the authorization code CODE9fbb and Grant_Type is what I’m asking for – Strava’s Authentication documentation states this must always be authorization_code for initial authentication.

Strava then responds to my cURL request with a refresh token, access token, and access token expiration date in Unix Epoch format:

"token_type": "Bearer",
  "expires_at": 1642370007,
  "expires_in": 21600,
  "refresh_token": "REFRESHc8c4",
  "access_token": "ACCESS22e5",

Why two tokens? Access tokens expire six hours after they are created and must be refreshed to maintain access to the desired data. Strava uses the refresh tokens as part of the access token refresh process.

Writing The API Request Code

With the tokens now available I can start assembling the Python code for my Strava API requests. I will again be using Visual Studio Code here. I make a new Python virtual environment called StravaAPI by running py -3 -m venv StravaAPI, activate it using StravaAPI\Scripts\activate and run pip install requests to install the module I need. Finally I create an empty StravaAPI.py file in the StravaAPI virtual environment folder for the Python code.

Onto the code. The first part imports the requests module, declares some variables and sets up a request to refresh an expired access code as detailed in the Strava Authentication documentation:

# Import modules
import requests


# Set Variables
apiapp_clientid = "APIAPP-CLIENTID"
apiapp_secret = 'APIAPP-SECRET'
token_refresh = 'REFRESHc8c4'

# Requesting Access Token
url_oauth = "https://www.strava.com/oauth/token"
payload_oauth = {
	'client_id': apiapp_clientid,
	'client_secret': apiapp_secret,
	'refresh_token': token_refresh,
	'grant_type': "refresh_token",
	'f': 'json'
}

Note this time that the Grant_Type is refresh_token instead of authorization_code. These variables can then be used by the requests module to send a request to Strava’s API:

print("Requesting Token...\n")
req_access_token = requests.post(url_oauth, data=payload_oauth, verify=False)
print(req_access_token.json())

This request is successful and returns existing tokens ACCESS22e5 and REFRESHc8c4 as they have not yet expired:

Requesting Token...

{'token_type': 'Bearer', 'access_token': 'ACCESS22e5', 'expires_at': 1642370008, 'expires_in': 20208, 'refresh_token': 'REFRESHc8c4'}

A warning is also presented here as my request is not secure:

InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.strava.com'. Adding certificate verification is strongly advised.

The warning includes a link to urllib3 documentation, which states:

Making unverified HTTPS requests is strongly discouraged, however, if you understand the risks and wish to disable these warnings, you can use disable_warnings()

As this code is currently in development, I import the urllib3 module and disable the warnings:

# Import modules
import requests
import urllib3

# Disable Insecure Request Warnings
urllib3.disable_warnings()

Next I extract the access token from Strava’s response into a new token_access variable and print that in the terminal as a process indicator:

print("Requesting Token...\n")
req_access_token = requests.post(url_oauth, data=payload_oauth, verify=False)

token_access = req_access_token.json()['access_token']
print("Access Token = {}\n".format(token_access))

So far the terminal’s output is:

Requesting Token...

Access Token = ACCESS22e5

Let’s get some data! I’m making a call to Get Activities now, so I declare three variables to compose the request and include the token_access variable from earlier :

# Requesting Athlete Activities
url_api_activities = "https://www.strava.com/api/v3/athlete/activities"
header_activities = {'Authorization': 'Bearer ' + token_access}
param_activities = {'per_page': 200, 'page' : 1}

Then I use the requests module to send the request to Strava’s API:

print("Requesting Athlete Activities...\n")
dataset_activities = requests.get(url_api_activities, headers=header_activities, params=param_activities).json()
print(dataset_activities)

And receive data about several recent activities as JSON in return. Success! The full Python code is as follows:

# Import modules
import requests
import urllib3

# Disable Insecure Request Warnings
urllib3.disable_warnings()

# Set Variables
apiapp_clientid = "APIAPP-CLIENTID"
apiapp_secret = 'APIAPP-SECRET'
token_refresh = 'REFRESHc8c4'

# Requesting Access Token
url_oauth = "https://www.strava.com/oauth/token"
payload_oauth = {
	'client_id': apiapp_clientid,
	'client_secret': apiapp_secret,
	'refresh_token': token_refresh,
	'grant_type': "refresh_token",
	'f': 'json'
}

print("Requesting Token...\n")
req_access_token = requests.post(url_oauth, data=payload_oauth, verify=False)

token_access = req_access_token.json()['access_token']
print("Access Token = {}\n".format(token_access))

# Requesting Athlete Activities
url_api_activities = "https://www.strava.com/api/v3/athlete/activities"
header_activities = {'Authorization': 'Bearer ' + token_access}
param_activities = {'per_page': 200, 'page' : 1}
print("Requesting Athlete Activities...\n")
dataset_activities = requests.get(url_api_activities, headers=header_activities, params=param_activities).json()
print(dataset_activities)

But Does It Work?

This only leaves the question of whether the code works when the access code expires. As a reminder this was Strava’s original response:

Requesting Token...

{'token_type': 'Bearer', 'access_token': 'ACCESS22e5', 'expires_at': 1642370008, 'expires_in': 20208, 'refresh_token': 'REFRESHc8c4'}

Expiry 1642370008 is Sunday, 16 January 2022 21:53:28. I run the code at 22:05 and:

Requesting Token...

{'token_type': 'Bearer', 'access_token': 'ACCESSe0e7', 'expires_at': 1642392321, 'expires_in': 21559, 'refresh_token': 'REFRESHc8c4'}

A new access token! The new expiry 1642392321 is Monday, 17 January 2022 04:05:21. And when I run the code at 09:39:

{'token_type': 'Bearer', 'access_token': 'ACCESS74dd', 'expires_at': 1642433966, 'expires_in': 21600, 'refresh_token': 'REFRESHc8c4'}

A second new access code. All working fine! As long as my refresh token remains valid I can continue to get valid access tokens when they expire.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~

Categories
Developing & Application Integration

Authenticating Strava API Calls Using OAuth 2.0 And Visual Studio Code

Picking up from last time, I continued on with the Strava API Getting Started page and got myself very confused over how things were supposed to work. The Strava API requires authentication via OAuth 2.0, and in fairness Strava includes a graph that shows how the process works which unfortunately went right over my head. By chance I found a YouTube video produced by InterSystems Learning Services that gave me a good entry-level understanding of the process, boiling down to this slide:

oauth 2.0 workflow

As it turned out I was misunderstanding the purpose of the access token Strava was giving me on the My API Application page. That access token wasn’t provided by OAuth and so couldn’t be used to get any data – instead I needed to use Strava’s OAuth API to request different credentials and define a scope for the access needed.

To that end, Strava provides this link on their Getting Started page:

http://www.strava.com/oauth/authorize?client_id=[REPLACE_WITH_YOUR_CLIENT_ID]&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=read

Which was used in conjunction with my app’s ClientID to get an authorization code in the form of:

http://localhost/exchange_token?state=&code=MYAUTHCODE123456789&scope=read

Next, a cURL request must be made to exchange the authorization code and scope for a refresh token, access token, and access token expiration date. The page suggests https://www.strava.com/oauth/token which turns out to be wrong – the URL used in the Authentication documentation for this is https://www.strava.com/api/v3/oauth/token. At this point I was having varying success with Postman, so instead installed REST Client on Visual Studio Code on the advice of Vu Long Tran’s blog which made things much simpler.  

Sending a request of:

curl -X POST https://www.strava.com/api/v3/oauth/token 

  -d client_id=ReplaceWithClientID \

  -d client_secret=ReplaceWithClientSecret \

  -d code=ReplaceWithCode \

  -d grant_type=authorization_code\

Produced a response containing my Strava profile data along with new tokens and details of their expiry:

"token_type": "Bearer"

"expires_at": 1640810729,

"expires_in": 4002,

"refresh_token": REFRESHTOKEN123456789

"access_token": "ACCESSTOKEN123456789"

Awesome! Now to put these to work…

Part of my confusion coming into this was that I had been given a Python script in my travels that I was expecting to work with the access token from the My API Application screen, which of course with the benefit of hindsight was never going to be successful. The Python script is as follows:

import requests

activities_url = "https://www.strava.com/api/v3/athlete/activities"

header = {'Authorization': 'Bearer ' + "access_token"}
param = {'per_page': 200, 'page': 1}

my_dataset = requests.get(activities_url, headers=header, params=param).json()

print(my_dataset)

Time to test it out with the new access token! Straight into a problem:

{'message': 'Authorization Error', 'errors': [{'resource': 'Athlete', 'field': 'access_token', 'code': 'invalid'}]}

The Strava API and SDK Reference page was quick to indicate the source of the problem:

strava api get activity

The previous requests to Strava’s OAuth API were with scope=read. I changed this to scope=activity:read_all and received a new authorization code with the updated scope. This authorization code was then exchanged for a new access token which, when pasted into the Python script, outputted so much JSON that in the interests of sanity I’ve screenshotted a small section for illustration:

python json strava api response

To be fair it was what I asked for. And it all makes a bit more sense to me now!

Future plans for this involve making use of the refresh tokens to automate access token generation (assuming I’ve understood that part correctly – stay tuned!) and getting some working code into a Lambda function so I can start turning some cogs in AWS.

Thanks for reading ~~^~~