Today I went out for my first run post-Christmas. It was about as much fun as you’d expect but that’s not the point.
When I finish a run, my watch uploads the data to Garmin Connect, which in turn syncs the data with FetchEveryone (for their great analytics) and Strava (for their API). The Strava API is generally more accessible than Garmin’s and I already use it for integrations with Google Calendar, so it dawned on me that my Strava account has years’ worth of data that I can tap into for various personal projects. Then I had to change course to avoid three horses coming down the centre of the trail.
I completed the fields using the recommendations in the documentation and in response Strava provided a set of API credentials:
The next step was to make a cURL request against the Strava API for my profile data, for which the Strava docs suggested Postman – a platform for building and using APIs. I made an account there, created a GET request for https://www.strava.com/api/v3/athlete and set an Authorization key-value pair using my Strava Access Token:
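For anyone who would rather script this than click through Postman, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not something from the Strava docs: the function names are my own, and it assumes the access token is sent as a Bearer token in the Authorization header.

```python
import json
import urllib.request

ATHLETE_URL = "https://www.strava.com/api/v3/athlete"


def build_athlete_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the authenticated athlete."""
    return urllib.request.Request(
        ATHLETE_URL,
        # Assumption: the token is passed as a Bearer token.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_athlete(access_token: str) -> dict:
    """Send the request and decode the JSON profile returned by the API."""
    request = build_athlete_request(access_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_athlete("<my access token>")` should return the same profile data that Postman shows, provided the token is still valid.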
That’s as far as I’m going to take this today but in the coming days I want to take this forward and try out some of the Python examples in the Strava docs.
On 30/11/2021, AWS introduced S3 Glacier Instant Retrieval – a new archive storage class for S3 that operates alongside S3 Glacier (now renamed S3 Glacier Flexible Retrieval) and S3 Glacier Deep Archive. Their announcements can be seen here and here and a summary of all Glacier classes is available on the S3 Glacier product page.
I already use most of the S3 storage classes in my AWS accounts. Earlier in the year I got tired of my laptop backups needing to run overnight and made an S3 cross-account replication setup in which whatever I upload to the AtRest bucket in my main account gets replicated to the AtRest bucket in my backup account and gets set as S3 Glacier Deep Archive. This way I have two versions of the object in different regions in different accounts, and although there are data transfer costs they are offset by the reduced cost I get from using S3 Glacier Deep Archive for the backup objects.
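As a rough illustration of that setup, a replication rule that stores the replicas as S3 Glacier Deep Archive might look something like the sketch below. It is not a copy of my real configuration: the role ARN, bucket ARN, account ID and rule ID are all hypothetical placeholders.

```python
import json


def build_replication_config(role_arn, dest_bucket_arn, dest_account_id):
    """Return a replication configuration that stores replicas as Deep Archive."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-to-backup-account",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "Account": dest_account_id,
                    "StorageClass": "DEEP_ARCHIVE",
                    # Hand ownership of the replica to the backup account
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    }


# All identifiers below are placeholders, not real resources.
config = build_replication_config(
    role_arn="arn:aws:iam::111111111111:role/example-replication-role",
    dest_bucket_arn="arn:aws:s3:::example-atrest-backup",
    dest_account_id="222222222222",
)
print(json.dumps(config, indent=2))
```

A dictionary shaped like this can be passed to boto3's `put_bucket_replication` as the `ReplicationConfiguration` argument. Both buckets need versioning enabled before replication will work.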
Objects in my main account use different classes depending on their purpose. Before I upload any objects there I consider whether the object is in motion or at rest and what my access pattern for the object is likely to be, then choose a storage class accordingly. This is the current storage class distribution for all buckets in my main account according to S3 Storage Lens:
The arrival of S3 Glacier Instant Retrieval is of interest to me as it might offer cost savings and accessibility improvements over my current setup. So far my decisions over S3 storage classes have usually boiled down to trade-offs. For example:
For Object X I could use S3 Intelligent Tiering or S3 Infrequent Access. S3 Infrequent Access has a minimum storage duration of 30 days and has retrieval costs, but S3 Intelligent Tiering has a handling fee per 1,000 objects and each object will spend the first 30 days in, and be charged as, S3 Standard. So if I know I’m not going to touch this object for at least a month, which class is most suitable?
For Object Y I could use S3 Glacier or S3 Glacier Deep Archive. Deep Archive will cost less for storage but its retrieval fees are higher than Glacier’s, and Deep Archive’s minimum storage duration is 180 days whereas Glacier’s is only 90 days. Plus I can get objects out of Glacier far quicker, as its standard retrieval time is 3 to 5 hours compared to Deep Archive’s standard of 12 hours. So could I afford to wait half a day for this object if I needed it? And how long do I see this object being around for?
Comparisons With Other S3 Storage Classes
So how does S3 Glacier Instant Retrieval compare to S3 Infrequent Access and S3 Glacier Flexible Retrieval? I loaded the S3 pricing site and had a look at various costs in eu-west-1 for S3 Infrequent Access (IFA), S3 Glacier Instant Retrieval (GIR) and S3 Glacier Flexible Retrieval (GFR), then used the S3 calculator to get some estimates based on my current S3 Storage Lens statistics and November 2021 bill.
Storage:
IFA $0.0125 per GB
GIR $0.004 per GB
GFR $0.0036 per GB
PUT, COPY, POST, LIST requests (per 1,000 requests):
IFA $0.01
GIR $0.02
GFR $0.33
GET, SELECT, and all other requests (per 1,000 requests):
IFA $0.001
GIR $0.01
GFR $0.0004
Data Retrieval requests (per 1,000 requests):
IFA N/A
GIR N/A
GFR $0.055 (Standard)
Data retrievals (per GB):
IFA $0.01
GIR $0.03
GFR $0.01 (Standard)
Estimated cost for storing 200GB per month (with average size of 4.4MB for Glacier Flexible Retrieval), 24265 PUT, COPY, POST, LIST requests, 10402 GET, SELECT, and all other requests and retrieval of 50GB per month (using 1 Standard request for Glacier Flexible Retrieval):
IFA $3.25
GIR $2.89
GFR $2.38
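The IFA and GIR estimates can be reproduced from the per-unit prices listed above. The sketch below is my own arithmetic rather than the S3 calculator's method, and the dictionary keys and function name are made up for illustration. GFR is left out because its estimate also depends on the average object size, which these flat rates don't capture.

```python
# Per-unit prices for eu-west-1 as listed above (USD).
PRICES = {
    "IFA": {"storage_gb": 0.0125, "put_per_1k": 0.01, "get_per_1k": 0.001, "retrieval_gb": 0.01},
    "GIR": {"storage_gb": 0.004, "put_per_1k": 0.02, "get_per_1k": 0.01, "retrieval_gb": 0.03},
}


def monthly_cost(storage_class, stored_gb=200, puts=24265, gets=10402, retrieved_gb=50):
    """Estimate one month's cost for the workload described above."""
    price = PRICES[storage_class]
    return (
        stored_gb * price["storage_gb"]          # storage
        + puts / 1000 * price["put_per_1k"]      # PUT, COPY, POST, LIST requests
        + gets / 1000 * price["get_per_1k"]      # GET, SELECT and other requests
        + retrieved_gb * price["retrieval_gb"]   # data retrieval
    )


for storage_class in PRICES:
    print(storage_class, round(monthly_cost(storage_class), 2))
```

This prints 3.25 for IFA and 2.89 for GIR, matching the estimates above. Most of the GIR saving comes from storage ($0.80 against $2.50), which the higher retrieval charge ($1.50 against $0.50) partly cancels out.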
A couple of other items of note:
S3 Glacier Instant Retrieval has a minimum billable object size of 128 KB, which it shares with S3 Standard Infrequent Access
S3 Glacier Instant Retrieval offers instant retrieval in milliseconds, which it also shares with S3 Standard Infrequent Access
S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, which it shares with S3 Glacier Flexible Retrieval
What’s interesting in the cost estimates for me is how close S3 Glacier Instant Retrieval is to S3 Standard Infrequent Access. The major difference between the two classes that I can see is that, while S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, the same period for S3 Standard Infrequent Access is only 30 days. If you delete an object before the end of a minimum storage duration period, you are charged for the full period specified. Depending on the size and number of the objects, this could get expensive if mismanaged. That said, AWS are offering S3 Glacier Instant Retrieval as being “For long-lived archive data accessed once a quarter with instant retrieval in milliseconds” so there are no smoke and mirrors here.
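To put a number on the minimum storage duration, here is a small sketch of how the storage charge works out when an object is deleted early. The function name and the 30-day billing month are my own simplifications, and request and retrieval costs are ignored.

```python
def storage_charge(size_gb, price_per_gb_month, days_stored, min_days, days_per_month=30):
    """Storage charge for an object, billed for at least the minimum duration."""
    billed_days = max(days_stored, min_days)  # early deletes are billed to the minimum
    return size_gb * price_per_gb_month * billed_days / days_per_month


# 100 GB deleted after 30 days, using the storage prices listed earlier.
gir = storage_charge(100, 0.004, days_stored=30, min_days=90)
ifa = storage_charge(100, 0.0125, days_stored=30, min_days=30)
print(f"GIR ${gir:.2f} vs IFA ${ifa:.2f}")
```

Under these assumptions the 100 GB costs about $1.20 in GIR against $1.25 in IFA. GIR's storage price advantage nearly disappears if the object only lives for a month.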
Conclusions
Would I use S3 Glacier Instant Retrieval over S3 Glacier Flexible Retrieval or S3 Standard Infrequent Access? Definitely in my AtRest bucket. The S3 Storage Lens stats for that bucket show many objects in S3 Standard Infrequent Access, including all the old TV shows from Internet Archive because let’s face it – if you want to watch old TV you want to watch it now, not in 3 hours’ time </Glacier>. In this scenario S3 Glacier Instant Retrieval keeps the millisecond access and, although the retrieval cost is higher (GIR $0.03 per GB vs IFA $0.01 per GB), the cost of data storage is lower (GIR $0.004 per GB vs IFA $0.0125 per GB). So S3 Glacier Instant Retrieval looks like a winner there.
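The storage-versus-retrieval trade-off can be reduced to a break-even point: the share of stored data that would need to be retrieved each month before GIR stops being cheaper than IFA. This is a rough sketch using only the per-GB storage and retrieval prices above. It ignores request charges and minimum storage durations, and the function name is my own.

```python
def breakeven_retrieval_fraction(storage_a, retrieval_a, storage_b, retrieval_b):
    """Fraction of stored data retrieved per month at which A and B cost the same.

    Class A has cheaper storage but dearer retrieval than class B.
    """
    storage_saving = storage_b - storage_a        # saved per GB stored
    retrieval_penalty = retrieval_a - retrieval_b  # extra cost per GB retrieved
    return storage_saving / retrieval_penalty


# A = GIR, B = IFA, prices in USD per GB.
fraction = breakeven_retrieval_fraction(0.004, 0.03, 0.0125, 0.01)
print(f"Break-even at {fraction:.1%} of stored data retrieved per month")
```

With these prices the break-even is 42.5% of the stored data per month. The earlier estimate retrieves 50GB of 200GB, or 25%, which is why GIR comes out ahead there.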
My InMotion bucket is a different story though. The objects here aren’t being retained permanently and most of them are in S3 so they don’t bring my laptop’s hard drive to its knees. If I’m looking at uploading objects here it’s usually with a question of “When will I deal with this?”, the answer to which will usually be:
The next few weeks, in which case I’ll keep the object in OneDrive instead (What a TWIST)
Next month, in which case I’d put the object in S3 Standard Infrequent Access because of its 30-day minimum storage duration
“I don’t know”, in which case I’d put the object in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive because their storage costs are less than S3 Glacier Instant Retrieval
As a side note, most of the objects in my InMotion bucket are S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive already, so it looks like my estimates from the start of the year were half decent!