Categories
Developing & Application Integration

Authenticating Strava API Calls Using OAuth 2.0 And Visual Studio Code

Picking up from last time, I continued on with the Strava API Getting Started page and got myself very confused over how things were supposed to work. The Strava API requires authentication via OAuth 2.0, and in fairness Strava includes a diagram showing how the process works, which unfortunately went right over my head. By chance I found a YouTube video produced by InterSystems Learning Services that gave me a good entry-level understanding of the process, boiling down to this slide:

oauth 2.0 workflow

As it turned out I was misunderstanding the purpose of the access token Strava was giving me on the My API Application page. That access token wasn’t provided by OAuth and so couldn’t be used to get any data – instead I needed to use Strava’s OAuth API to request different credentials and define a scope for the access needed.

To that end, Strava provides this link on their Getting Started page:

http://www.strava.com/oauth/authorize?client_id=[REPLACE_WITH_YOUR_CLIENT_ID]&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=read

Which was used in conjunction with my app’s ClientID to get an authorization code in the form of:

http://localhost/exchange_token?state=&code=MYAUTHCODE123456789&scope=read
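For anyone who would rather script this step, here is a minimal Python sketch of building the authorization link and pulling the code back out of the redirect. The function names are made up for illustration, and the parameter values are simply the ones from the link above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

AUTHORIZE_URL = "http://www.strava.com/oauth/authorize"


def build_authorize_url(client_id, scope="read",
                        redirect_uri="http://localhost/exchange_token"):
    """Return the link to open in a browser to approve the app."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "approval_prompt": "force",
        "scope": scope,
    })
    return f"{AUTHORIZE_URL}?{query}"


def parse_redirect(redirect_url):
    """Pull the authorization code and granted scope out of the redirect URL."""
    params = parse_qs(urlparse(redirect_url).query)
    return params["code"][0], params["scope"][0]
```

Pasting the address bar contents into parse_redirect after approving the app returns the code needed for the next step, along with the scope that was actually granted.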

Next, a cURL request must be made to exchange the authorization code and scope for a refresh token, access token, and access token expiration date. The page suggests https://www.strava.com/oauth/token which turns out to be wrong – the URL used in the Authentication documentation for this is https://www.strava.com/api/v3/oauth/token. At this point I was having varying success with Postman, so I instead installed REST Client on Visual Studio Code on the advice of Vu Long Tran’s blog, which made things much simpler.

Sending a request of:

curl -X POST https://www.strava.com/api/v3/oauth/token \
  -d client_id=ReplaceWithClientID \
  -d client_secret=ReplaceWithClientSecret \
  -d code=ReplaceWithCode \
  -d grant_type=authorization_code

Produced a response containing my Strava profile data along with new tokens and details of their expiry:

"token_type": "Bearer",
"expires_at": 1640810729,
"expires_in": 4002,
"refresh_token": "REFRESHTOKEN123456789",
"access_token": "ACCESSTOKEN123456789"

Awesome! Now to put these to work…
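For reference, the same token exchange can be sketched in Python instead of cURL. This is a rough equivalent rather than what I actually ran: the function names are hypothetical, and it sticks to the standard library so it works without installing anything (the requests library would do the job just as well).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://www.strava.com/api/v3/oauth/token"


def build_token_request(client_id, client_secret, code):
    """Build (without sending) the POST that swaps an auth code for tokens."""
    form = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }).encode("ascii")
    return Request(TOKEN_URL, data=form, method="POST")


def exchange_code(client_id, client_secret, code):
    """Send the request and return the parsed JSON response as a dict."""
    with urlopen(build_token_request(client_id, client_secret, code)) as resp:
        return json.load(resp)
```

Calling exchange_code with real credentials should return a dict holding the access_token, refresh_token and expires_at fields shown in the response above.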

Part of my confusion coming into this was that I had been given a Python script in my travels that I was expecting to work with the access token from the My API Application screen, which of course with the benefit of hindsight was never going to be successful. The Python script is as follows:

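import requests

activities_url = "https://www.strava.com/api/v3/athlete/activities"

# Paste in the access token returned by the OAuth token exchange
access_token = "ReplaceWithAccessToken"

header = {'Authorization': 'Bearer ' + access_token}
# Ask for the first page of results, up to 200 activities
param = {'per_page': 200, 'page': 1}

my_dataset = requests.get(activities_url, headers=header, params=param).json()

print(my_dataset)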

Time to test it out with the new access token! Straight into a problem:

{'message': 'Authorization Error', 'errors': [{'resource': 'Athlete', 'field': 'access_token', 'code': 'invalid'}]}

The Strava API and SDK Reference page was quick to indicate the source of the problem:

strava api get activity

The previous requests to Strava’s OAuth API were with scope=read. I changed this to scope=activity:read_all and received a new authorization code with the updated scope. This authorization code was then exchanged for a new access token which, when pasted into the Python script, outputted so much JSON that in the interests of sanity I’ve screenshotted a small section for illustration:

python json strava api response

To be fair it was what I asked for. And it all makes a bit more sense to me now!
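The scope mix-up above could be caught before any API call is made by checking the scope value that comes back in the redirect URL. A minimal sketch, assuming the granted scopes arrive as a comma-separated string (the function name is hypothetical, and the default is just the scope I ended up needing):

```python
def require_scope(granted, required="activity:read_all"):
    """Return the granted scopes as a list, or raise if `required` is missing."""
    scopes = [s.strip() for s in granted.split(",")]
    if required not in scopes:
        raise PermissionError(
            f"token scope {granted!r} lacks {required!r}; "
            f"re-authorize with scope={required}"
        )
    return scopes
```

With this in place, a token requested with scope=read fails straight away with a message pointing at the fix, instead of an Authorization Error from the API.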

Future plans for this involve making use of the refresh tokens to automate access token generation (assuming I’ve understood that part correctly – stay tuned!) and getting some working code into a Lambda function so I can start turning some cogs in AWS.
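As a first stab at the refresh idea, here is a minimal sketch that decides when a token needs replacing and builds the request for a new one. It assumes Strava follows the standard OAuth 2.0 refresh grant (grant_type=refresh_token sent to the same token URL), which is worth checking against the Authentication documentation; the function names and the 60 second safety margin are my own inventions.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://www.strava.com/api/v3/oauth/token"


def is_expired(expires_at, now=None, margin=60):
    """True if the token has expired or will do within `margin` seconds.

    `expires_at` is the epoch timestamp from the token response.
    """
    now = time.time() if now is None else now
    return now >= expires_at - margin


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (without sending) the POST that trades a refresh token for new tokens."""
    form = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",  # assumed: standard OAuth 2.0 refresh grant
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(TOKEN_URL, data=form, method="POST")
```

The intended flow is to call is_expired with the stored expires_at before each API call, and only send the refresh request when it returns True.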

Thanks for reading ~~^~~

Categories
Developing & Application Integration

Sending GET Requests To The Strava API With Postman: Getting Started

Today I went out for my first run post-Christmas. It was about as much fun as you’d expect but that’s not the point.

strava run summary

When I finish a run, my watch uploads the data to Garmin Connect, which in turn syncs the data with FetchEveryone (for their great analytics) and Strava (for their API). The Strava API is generally more accessible than Garmin’s and I already use it for integrations with Google Calendar, so it dawned on me that my Strava account has years’ worth of data that I can tap into for various personal projects. Then I had to change course to avoid three horses coming down the centre of the trail.

To get the ball rolling post-run I logged onto my Strava account and accessed my profile, only to find no mention of the API anywhere. Some Googling established that the URL needed updating from https://www.strava.com/settings/profile to https://www.strava.com/settings/api to show the My API Application options, which helpfully includes a link to the Strava Developer documentation:

strava api create app

I completed the fields using the recommendations in the documentation and in response Strava provided a set of API credentials:

strava api application credentials

The next step was to make a cURL request against the Strava API for my profile data, for which the Strava docs suggested Postman – a platform for building and using APIs. I made an account there, created a GET request for https://www.strava.com/api/v3/athlete and set an Authorization key-value pair using my Strava Access Token:

postman console

I then received the below response in JSON:

{
    "id": 18701823,
    "username": null,
    "resource_state": 2,
    "firstname": "Damien",
    "lastname": "Jones",
    "bio": "",
    "city": "[REDACTED]",
    "state": "England",
    "country": "United Kingdom",
    "sex": "M",
    "premium": false,
    "summit": false,
    "created_at": "2016-12-02T00:37:23Z",
    "updated_at": "2021-12-27T00:05:26Z",
    "badge_type_id": 0,
    "weight": 0.0,
    "profile_medium": "[REDACTED]",
    "profile": "[REDACTED]",
    "friend": null,
    "follower": null
}

Success! 
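The Postman request above boils down to a GET with a single Authorization header, so it can be sketched in a few lines of Python too. This is an illustrative equivalent using only the standard library, with hypothetical function names, and it assumes the header takes the usual "Bearer" form followed by the token.

```python
import json
from urllib.request import Request, urlopen

ATHLETE_URL = "https://www.strava.com/api/v3/athlete"


def build_athlete_request(access_token):
    """Build (without sending) a GET for the profile, with the bearer token header."""
    return Request(ATHLETE_URL,
                   headers={"Authorization": f"Bearer {access_token}"})


def get_athlete(access_token):
    """Send the request and return the profile JSON as a dict."""
    with urlopen(build_athlete_request(access_token)) as resp:
        return json.load(resp)
```

Calling get_athlete with a valid token should return the same fields as the JSON response above.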

That’s as far as I’m going to take this today but in the coming days I want to take this forward and try out some of the Python examples in the Strava docs.

Thanks for reading! ~~^~~

Categories
Architecture & Resilience

S3 Glacier Instant Retrieval: First Impressions

On 30/11/2021, AWS introduced S3 Glacier Instant Retrieval – a new archive storage class for S3 that operates alongside S3 Glacier (now renamed S3 Glacier Flexible Retrieval) and S3 Glacier Deep Archive. Their announcements can be seen here and here and a summary of all Glacier classes is available on the S3 Glacier product page.

I already use most of the S3 storage classes in my AWS accounts. Earlier in the year I got tired of my laptop backups needing to run overnight and made an S3 cross-account replication setup in which whatever I upload to the AtRest bucket in my main account gets replicated to the AtRest bucket in my backup account and gets set as S3 Glacier Deep Archive. This way I have two versions of the object in different regions in different accounts, and although there are data transfer costs they are offset by the reduced cost I get from using S3 Glacier Deep Archive for the backup objects.

Objects in my main account use different classes depending on their purpose. Before I upload any objects there I consider whether the object is in motion or at rest and what my access pattern for the object is likely to be, then choose a storage class accordingly. This is the current storage class distribution for all buckets in my main account according to S3 Storage Lens:

The arrival of S3 Glacier Instant Retrieval is of interest to me as it might offer cost savings and accessibility improvements over my current setup. So far my decisions over S3 storage classes have usually boiled down to trade-offs. For example:

  • For Object X I could use S3 Intelligent Tiering or S3 Infrequent Access. S3 Infrequent Access has a minimum storage duration of 30 days and has retrieval costs, but S3 Intelligent Tiering has a handling fee per 1000 objects and each object will spend the first 30 days in, and be charged as, S3 Standard. So if I know I’m not going to touch this object for at least a month, which class is most suitable?
  • For Object Y I could use S3 Glacier or S3 Glacier Deep Archive. Deep Archive will cost less for storage but the retrieval fees are higher than Glacier and Deep Archive’s minimum storage duration is 180 days where Glacier’s is only 90 days. Plus I can get objects out of Glacier far quicker as its standard retrieval time is 3 to 5 hours compared to Deep Archive’s standard of 12 hours. So could I afford to wait half a day for this object if I needed it? And how long do I see this object being around for?

Comparisons With Other S3 Storage Classes

So how does S3 Glacier Instant Retrieval compare to S3 Infrequent Access and S3 Glacier Flexible Retrieval? I loaded the S3 pricing site and had a look at various costs in eu-west-1 for S3 Infrequent Access (IFA), S3 Glacier Instant Retrieval (GIR) and S3 Glacier Flexible Retrieval (GFR), then used the S3 calculator to get some estimates based on my current S3 Storage Lens statistics and November 2021 bill.

Storage:

  • IFA $0.0125 per GB
  • GIR $0.004 per GB
  • GFR $0.0036 per GB

PUT, COPY, POST, LIST requests (per 1,000 requests):

  • IFA $0.01
  • GIR $0.02
  • GFR $0.33

GET, SELECT, and all other requests (per 1,000 requests):

  • IFA $0.001
  • GIR $0.01
  • GFR $0.0004

Data Retrieval requests (per 1,000 requests):

  • IFA N/A
  • GIR N/A
  • GFR $0.055 (Standard)

Data retrievals (per GB):

  • IFA $0.01
  • GIR $0.03
  • GFR $0.01 (Standard)

Estimated cost for storing 200GB per month (with average size of 4.4MB for Glacier Flexible Retrieval), 24,265 PUT, COPY, POST, LIST requests, 10,402 GET, SELECT, and all other requests, and retrieval of 50GB per month (using 1 Standard request for Glacier Flexible Retrieval):

  • IFA $3.25
  • GIR $2.89
  • GFR $2.38
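The IFA and GIR estimates can be reproduced from the prices listed above with some simple arithmetic. The sketch below is my own reconstruction rather than how the S3 calculator works internally, and the names in it are made up. GFR is left out because its estimate depends on per-object details (hence the 4.4MB average size) that the listed prices alone don't capture.

```python
# Prices copied from the lists above (eu-west-1, USD)
PRICES = {
    "IFA": {"storage": 0.0125, "put": 0.01, "get": 0.001, "retrieval": 0.01},
    "GIR": {"storage": 0.004, "put": 0.02, "get": 0.01, "retrieval": 0.03},
}


def monthly_cost(price, stored_gb=200, put_requests=24265,
                 get_requests=10402, retrieved_gb=50):
    """Add up storage, request and retrieval charges for one month."""
    return (stored_gb * price["storage"]
            + put_requests / 1000 * price["put"]    # priced per 1,000 requests
            + get_requests / 1000 * price["get"]    # priced per 1,000 requests
            + retrieved_gb * price["retrieval"])


for name, price in PRICES.items():
    print(f"{name} ${monthly_cost(price):.2f}")
# prints:
# IFA $3.25
# GIR $2.89
```

For GIR, most of the bill is the $1.50 retrieval charge for the 50GB, which is worth bearing in mind if the access pattern is heavier than expected.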

A couple of other items of note:

  • S3 Glacier Instant Retrieval has a minimum billable object size of 128 KB, which it shares with S3 Standard Infrequent Access
  • S3 Glacier Instant Retrieval offers instant retrieval in milliseconds, which it also shares with S3 Standard Infrequent Access
  • S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, which it shares with S3 Glacier Flexible Retrieval

What’s interesting in the cost estimates for me is how close S3 Glacier Instant Retrieval is to S3 Standard Infrequent Access. The major difference between the two classes that I can see is that, while S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, the same period for S3 Standard Infrequent Access is only 30 days. If you delete an object before the end of a minimum storage duration period, you are charged for the full period specified. Depending on the size and amount of the objects, this could get expensive if mismanaged. That said, AWS are offering S3 Glacier Instant Retrieval as being “For long-lived archive data accessed once a quarter with instant retrieval in milliseconds” so there are no smoke and mirrors here.
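To put a number on the minimum storage duration point, here is a rough sketch of the charge for deleting an object early. It assumes the charge is the storage price pro-rated over the unused days and that a month is 30 days; both are simplifications of mine, and the function name is made up.

```python
def early_deletion_charge(size_gb, price_per_gb_month, days_stored,
                          min_days, days_per_month=30):
    """Pro-rated storage charge for the unused part of the minimum duration."""
    remaining_days = max(0, min_days - days_stored)
    return size_gb * price_per_gb_month * remaining_days / days_per_month


# 100GB deleted after 30 days in each class
print(f"GIR: ${early_deletion_charge(100, 0.004, 30, 90):.2f}")   # GIR: $0.80
print(f"IFA: ${early_deletion_charge(100, 0.0125, 30, 30):.2f}")  # IFA: $0.00
```

So an object removed from GIR after a month still pays for the two months it never used, while the same object in IFA has already cleared its minimum.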

Conclusions

Would I use S3 Glacier Instant Retrieval over S3 Glacier Flexible Retrieval or S3 Standard Infrequent Access? Definitely in my AtRest bucket. The S3 Storage Lens stats for that bucket show many objects in S3 Standard Infrequent Access, including all the old TV shows from Internet Archive because let’s face it – if you want to watch old TV you want to watch it now not in 3 hours’ time </Glacier>. In this scenario S3 Glacier Instant Retrieval keeps the millisecond access and, although the retrieval cost is higher (GIR $0.03 vs IFA $0.01), the cost of data storage is lower (GIR $0.004 per GB vs IFA $0.0125 per GB). So S3 Glacier Instant Retrieval looks like a winner there.
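The GIR versus IFA trade-off has a tidy break-even point: the storage saving per GB is cancelled out once enough of that data is retrieved each month. A minimal sketch of the sum, using only the storage and per-GB retrieval prices quoted above and ignoring request charges (the names are mine):

```python
# Per-GB prices quoted above (eu-west-1, USD)
GIR = {"storage": 0.004, "retrieval": 0.03}
IFA = {"storage": 0.0125, "retrieval": 0.01}


def break_even_fraction(cheap_storage, cheap_retrieval):
    """Fraction of stored data retrieved per month at which both classes cost the same."""
    storage_saving = cheap_retrieval["storage"] - cheap_storage["storage"]
    retrieval_penalty = cheap_storage["retrieval"] - cheap_retrieval["retrieval"]
    return storage_saving / retrieval_penalty


fraction = break_even_fraction(GIR, IFA)
print(f"GIR stays cheaper while under {fraction:.1%} of the data is retrieved per month")
```

That works out at roughly 42.5%, so for an archive of old TV shows that gets dipped into occasionally, GIR should come out ahead.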

My InMotion bucket is a different story though. The objects here aren’t being retained permanently and most of them are in S3 so they don’t bring my laptop’s hard drive to its knees. If I’m looking at uploading objects here it’s usually with a question of “When will I deal with this?”, the answer to which will usually be:

  • The next few weeks, in which case I’ll keep the object in OneDrive instead (What a TWIST)
  • Next month, in which case I’d put the object in S3 Standard Infrequent Access because of its 30-day minimum storage duration
  • “I don’t know”, in which case I’d put the object in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive because their storage costs are less than S3 Glacier Instant Retrieval

As a side note, most of the objects in my InMotion bucket are S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive already, so it looks like my estimates from the start of the year were half decent!

Thanks for reading! ~~^~~