
Uploading Music Files To Amazon S3 (PowerShell Mix)

In this post, I will upload lossless music files from my laptop to one of my Amazon S3 buckets using PowerShell.

Introduction

For several months I’ve been going through some music from an old hard drive. These music files are currently on my laptop, and exist mainly as lossless .flac files.

For each file I’m doing the following:

  • Creating an .mp3 copy of each lossless file.
  • Storing the .mp3 file on my laptop.
  • Uploading a copy of the lossless file to S3 Glacier.
  • Transferring the original lossless file from my laptop to my desktop PC.

I usually do the uploads using the S3 console, and have been meaning to automate the process for some time. So I decided to write some code to upload files to S3 for me, in this case using PowerShell.

Prerequisites

Before starting to write my PowerShell script, I have done the following on my laptop:

Version 0: Functionality

Version 0 gets the basic functionality in place. No bells and whistles here – I just want to upload a file to an S3 bucket prefix, stored using the Glacier Flexible Retrieval storage class.

V0: Writing To S3

I am using the PowerShell Write-S3Object cmdlet to upload my files to S3. This cmdlet needs four parameters to do what’s required:

  • -BucketName: The S3 bucket receiving the files.
  • -Folder: The folder on my laptop containing the files.
  • -KeyPrefix: The S3 bucket key prefix to assign to the uploaded objects.
  • -StorageClass: The S3 storage class to assign to the uploaded objects.

I create a variable for each of these so that my script is easier to read as I continue its development. I couldn’t find the inputs that the -StorageClass parameter uses in the Write-S3Object documentation. In the end, I found them in the S3 PutObject API Reference.

Valid inputs are as follows:

STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR
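
A mistyped storage class would otherwise only show up when the upload runs, so a small guard can catch it earlier. The following is a minimal sketch rather than part of the original script: the names $ValidS3StorageClasses and Test-S3StorageClassName are hypothetical, the list is copied from the values above, and the exact-case comparison is an assumption.

```powershell
# Storage class names copied from the list above
$ValidS3StorageClasses = @(
    "STANDARD", "REDUCED_REDUNDANCY", "STANDARD_IA", "ONEZONE_IA",
    "INTELLIGENT_TIERING", "GLACIER", "DEEP_ARCHIVE", "OUTPOSTS", "GLACIER_IR"
)

# Hypothetical helper: returns $true when the name is in the list (exact case)
function Test-S3StorageClassName {
    param([string]$Name)
    return ($ValidS3StorageClasses -ccontains $Name)
}

# Example check before any upload is attempted
$S3StorageClass = "GLACIER"
if (-not (Test-S3StorageClassName -Name $S3StorageClass)) {
    Write-Output "Unknown storage class: $S3StorageClass"
}
```

The list would need updating by hand if the PutObject API Reference gains new values.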

V0: Code

V0BasicRedacted.ps1

#Set Variables
$LocalSource = "C:\Users\Files\"
$S3BucketName = "my-s3-bucket"
$S3KeyPrefix = "Folder\SubFolder\"
$S3StorageClass = "GLACIER"


#Upload File To S3
Write-S3Object -BucketName $S3BucketName -Folder $LocalSource -KeyPrefix $S3KeyPrefix -StorageClass $S3StorageClass

V0: Evaluation

Version 0 offers me the following benefits:

  • I don’t have to log onto the S3 console for uploads anymore.
  • Forgetting to specify Glacier Flexible Retrieval as the S3 storage class is no longer a problem. The script does this for me.
  • Starting an upload to S3 is now as simple as right-clicking the script and selecting Run With PowerShell from the Windows Context Menu.

Version 0 works great, but I’ll give away one of my S3 bucket names if I start sharing a non-redacted version. This has been known to cause security issues in severe cases. Ideally, I’d like to separate the variables from the PowerShell commands, so let’s work on that next.

Version 1: Security

Version 1 enhances the security of my script by separating my variables from my PowerShell commands. To make this work without breaking things, I’m using two features: dot sourcing in PowerShell and a .gitignore file in Git.

To take advantage of these features, I’ve made two new files in my repo:

  • Variables.ps1 for my variables.
  • V1Security.ps1 for my Write-S3Object command.

So let’s now talk about how this all works.

V1: Isolating Variables With Dot Sourcing

At the moment, my script is broken. Running Variables.ps1 will create the variables but do nothing with them. Running V1Security.ps1 will fail as the variables aren’t in that script anymore.

This is where Dot Sourcing comes in. Dot sourcing runs another script in the current scope, so the variables it creates stay available to the script that called it. Here, when I run V1Security.ps1 I want PowerShell to load the variables from Variables.ps1.

To dot source a script, type a dot (.) and a space before the script path. As both of my files are in the same folder, PowerShell doesn’t even need the full path:

. .\EDMTracksLosslessS3Upload-Variables.ps1
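
One caveat with a relative path like this: .\ is resolved against PowerShell's current location, not the folder the script lives in. If the script might be started from a different location, the automatic $PSScriptRoot variable is one way to anchor the path to the script's own folder. The sketch below builds a throwaway pair of scripts to show the idea; the folder and file names (DotSourceDemo, Demo-Variables.ps1, Demo-Main.ps1) are hypothetical and only exist for the demo.

```powershell
# Throwaway demo folder; all names here are hypothetical
$DemoFolder = Join-Path ([System.IO.Path]::GetTempPath()) "DotSourceDemo"
New-Item -ItemType Directory -Path $DemoFolder -Force | Out-Null

# A variables file, like Variables.ps1 in the post
Set-Content -Path (Join-Path $DemoFolder "Demo-Variables.ps1") -Value '$S3BucketName = "my-s3-bucket"'

# A main script that dot sources the variables file relative to its own folder
Set-Content -Path (Join-Path $DemoFolder "Demo-Main.ps1") -Value @'
. (Join-Path $PSScriptRoot "Demo-Variables.ps1")
Write-Output "Bucket is $S3BucketName"
'@

# Run the main script by full path, regardless of the current location
$DemoOutput = & (Join-Path $DemoFolder "Demo-Main.ps1")
Write-Output $DemoOutput
```
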

Now my script works again! But I still have the same problem – if Variables.ps1 is committed to GitHub at any point then my variables are still visible. How can I stop that?

This time it’s Git to the rescue. I need a .gitignore file.

V1: Selective Tracking With .gitignore

.gitignore is a way of telling Git what not to include in commits. Entering a file, folder or pattern into a repo’s .gitignore file tells Git not to track it.
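
Git can also confirm whether a given path is covered by an ignore rule, which is handy before a first commit. The sketch below uses git check-ignore in a throwaway repo; the folder and file names (GitIgnoreDemo, Demo-Variables.ps1) are hypothetical, and it assumes Git is installed and on the path.

```powershell
# Throwaway repo for the demo; folder and file names are hypothetical
$DemoRepo = Join-Path ([System.IO.Path]::GetTempPath()) "GitIgnoreDemo"
New-Item -ItemType Directory -Path $DemoRepo -Force | Out-Null
Push-Location $DemoRepo

git init --quiet

# Ignore the variables file, then create it
Set-Content -Path ".gitignore" -Value "Demo-Variables.ps1"
Set-Content -Path "Demo-Variables.ps1" -Value '$S3BucketName = "my-s3-bucket"'

# git check-ignore exits with 0 when the path matches an ignore rule
git check-ignore --quiet "Demo-Variables.ps1"
$IsIgnored = ($LASTEXITCODE -eq 0)
Write-Output "Demo-Variables.ps1 ignored: $IsIgnored"

Pop-Location
```

Note that .gitignore only affects untracked files. A file that was committed before being added to .gitignore stays tracked until it is removed from the index, for example with git rm --cached followed by a commit.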

When Visual Studio Code finds a .gitignore file, it helps out by making visual changes in response to the file’s contents. When I create a .gitignore file and add the following lines to it:

#Ignore PowerShell Files Containing Variables

EDMTracksLosslessS3Upload-V0Basic.ps1
EDMTracksLosslessS3Upload-Variables.ps1

Visual Studio Code’s Explorer tab shows those files in grey, and they aren’t visible at all in the Source Control tab.

And finally, when committed to GitHub the ignored files are not present.

Before moving on, I found the Steve Griffith .gitignore tutorial helpful in introducing the basics, and the DevOps Journey tutorial helps show how .gitignore behaves within Visual Studio Code.

V1: Code

gitignore Version 1

#Ignore PowerShell Files Containing Variables

EDMTracksLosslessS3Upload-V0Basic.ps1
EDMTracksLosslessS3Upload-Variables.ps1

V1Security.ps1

#Load Variables
. .\EDMTracksLosslessS3Upload-Variables.ps1


#Upload File To S3
Write-S3Object -BucketName $S3BucketName -Folder $LocalSource -KeyPrefix $S3KeyPrefix -StorageClass $S3StorageClass

VariablesBlank.ps1 Version 1

#Set Variables


#The local file path for objects to upload to S3
#E.g. "C:\Users\Files\"
$LocalSource =

#The S3 bucket to upload the objects to
#E.g. "my-s3-bucket"
$S3BucketName =

#The S3 bucket prefix / folder to upload the objects to (if applicable)
#E.g. "Folder\SubFolder\"
$S3KeyPrefix =

#The S3 Storage Class to upload to
#E.g. "GLACIER"
$S3StorageClass =

V1: Evaluation

Version 1 now gives me the benefits of Version 0 with the following additions:

  • My variables and commands have now been separated.
  • I can now call Variables.ps1 from other scripts in the same folder, knowing the variables will be the same each time for each script.
  • I can use .gitignore to make sure Variables.ps1 is never uploaded to my GitHub repo.

The next problem is one of visibility. I have no way to know if my uploads have been successful. Or if they were duplicated. Nor do I have any auditing.

The S3 console gives me a summary at the end of each upload.

It would be great to have something similar with my script! In addition, some error handling and quality control checks would increase my confidence levels.

Let’s get to work!

Version 2: Visibility

Version 2 enhances the visibility of my script. The length of the script grows a lot here, so let’s run through the changes and I’ll explain what’s going on.

As a starting point, I copied V1Security.ps1 and renamed it to V2Visibility.ps1.

V2: Variables.ps1 And .gitignore Changes

Additions are being made to these files as a result of the Version 2 changes. I’ll mention them as they come up, but it makes sense to cover a few things up-front:

  • I added External to all variable names in Variables.ps1 to keep track of them in the script. For example, $S3BucketName is now $ExternalS3BucketName.
  • There are some additional local file paths in Variables.ps1 that I’m using for transcripts and some post-upload checks.
  • .gitignore now includes a log file (more on that shortly) and the Visual Studio Code debugging folder.

V2: Transcripts

The first change is perhaps the simplest. PowerShell has built-in cmdlets for creating transcripts:

  • Start-Transcript creates a record of all or part of a PowerShell session in a separate file.
  • Stop-Transcript stops a transcript that was started by the Start-Transcript cmdlet.

These go at the start and end of V2Visibility.ps1, along with a local file path for the EDMTracksLosslessS3Upload.log file I’m using to record everything.

Start-Transcript -Path $ExternalTranscriptPath -IncludeInvocationHeader

This new path is stored in Variables.ps1. In addition, EDMTracksLosslessS3Upload.log has been added to .gitignore.
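
If the script hits an unexpected terminating error between Start-Transcript and Stop-Transcript, the transcript can be left open. One possible guard, sketched here as an assumption rather than something the script in this post does, is to wrap the work in try / finally. The log path below ($DemoTranscriptPath) is a hypothetical stand-in for $ExternalTranscriptPath.

```powershell
# Hypothetical log path for the demo, standing in for $ExternalTranscriptPath
$DemoTranscriptPath = Join-Path ([System.IO.Path]::GetTempPath()) "TranscriptDemo.log"

Start-Transcript -Path $DemoTranscriptPath | Out-Null

try {
    # The checks and uploads would go here
    Write-Output "Work happens here."
}
finally {
    # Runs whether or not the try block raised a terminating error
    Stop-Transcript | Out-Null
}
```
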

V2: Check If There Are Any Files

Now the error handling begins. I want the script to fail gracefully, and I start by checking that there are files in the correct folder. First I count the files using Get-ChildItem and Measure-Object:

$LocalSourceCount = (Get-ChildItem -Path $ExternalLocalSource | Measure-Object).Count

And then stop the script running if no files are found:

If ($LocalSourceCount -lt 1) 
{
Write-Output "No Local Files Found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

There are a couple of cmdlets here that make several appearances in Version 2:

  • Start-Sleep suspends PowerShell activity for the time stated. This gives me time to read the output when I’m running the script using the context menu.
  • Exit causes PowerShell to completely stop everything it’s doing. In this case, there’s no point continuing as there’s nothing in the folder.

If files are found, PowerShell displays the count and carries on:

Else 
{
Write-Output "$LocalSourceCount Local Files Found"          
}
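
One thing to be aware of with this count: Get-ChildItem returns subfolders as well as files, so a folder containing only an empty subfolder would pass the check. The -File switch limits the results to files. The sketch below shows the difference using a throwaway folder; FileCountDemo and its contents are hypothetical names for the demo.

```powershell
# Throwaway demo folder containing one file and one subfolder
$DemoFolder = Join-Path ([System.IO.Path]::GetTempPath()) "FileCountDemo"
New-Item -ItemType Directory -Path (Join-Path $DemoFolder "SubFolder") -Force | Out-Null
Set-Content -Path (Join-Path $DemoFolder "Track.flac") -Value "demo"

# Counts files and folders together
$AllItemsCount = (Get-ChildItem -Path $DemoFolder | Measure-Object).Count

# Counts files only
$FilesOnlyCount = (Get-ChildItem -Path $DemoFolder -File | Measure-Object).Count

Write-Output "All items: $AllItemsCount"
Write-Output "Files only: $FilesOnlyCount"
```
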

V2: Check If The Files Are Lossless

Next, I want to stop any file uploads that don’t belong in the S3 bucket. The bucket should only contain lossless music – anything else should be rejected.

To arrange this, I first capture the extensions for each file using Get-ChildItem and [System.IO.Path]::GetExtension:

$LocalSourceObjectFileExtensions = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetExtension($_) }

Then I check each extension using a ForEach loop. If an extension isn’t in the list, PowerShell will report this and exit the script:

ForEach ($LocalSourceObjectFileExtension In $LocalSourceObjectFileExtensions) 

{
If ($LocalSourceObjectFileExtension -NotIn ".flac", ".wav", ".aif", ".aiff") 
{
Write-Output "Unacceptable $LocalSourceObjectFileExtension file found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

If the extension is in the list, PowerShell records this and checks the next one:

Else 
{
Write-Output "Acceptable $LocalSourceObjectFileExtension file."
}

So now, if I attempt to upload an unacceptable .log file, the transcript will say:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Unacceptable .log file found.  Exiting.
**********************

Whereas an acceptable .flac file will produce:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Acceptable .flac file.
**********************

And when uploading multiple files:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Acceptable .flac file.
Acceptable .wav file.
Acceptable .flac file.
**********************
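
The loop above stops at the first unacceptable extension. If it would be more useful to see every offending file in one run, a filter with Where-Object is one alternative. This is a sketch, not the approach used in this post: it reads the Extension property of each file instead of calling GetExtension, and the demo folder and file names are hypothetical.

```powershell
# Same extension list as the check above
$AllowedExtensions = ".flac", ".wav", ".aif", ".aiff"

# Throwaway demo folder with one acceptable and one unacceptable file
$DemoFolder = Join-Path ([System.IO.Path]::GetTempPath()) "ExtensionCheckDemo"
New-Item -ItemType Directory -Path $DemoFolder -Force | Out-Null
Set-Content -Path (Join-Path $DemoFolder "Track.flac") -Value "demo"
Set-Content -Path (Join-Path $DemoFolder "Notes.log") -Value "demo"

# Collect every file whose extension is not in the list
$RejectedFiles = @(Get-ChildItem -Path $DemoFolder -File |
    Where-Object { $_.Extension -notin $AllowedExtensions })

# Report each rejected file by extension and name
ForEach ($RejectedFile In $RejectedFiles) {
    Write-Output "Unacceptable $($RejectedFile.Extension) file found: $($RejectedFile.Name)"
}
```

The script could then exit once if $RejectedFiles contains anything, after listing all of the problems.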

V2: Check If The Files Are Already In S3

The next step checks if the files are already in S3. This might not seem like a problem, as S3 usually overwrites an object if it already exists.

Thing is, this bucket is replicated. This means it’s also versioned. As a result, S3 will keep both copies in this scenario. In the world of Glacier this doesn’t cost much, but it will distort the bucket’s S3 Inventory. This could lead to confusion when I query it with Athena. And if I can stop this situation with some automation then I might as well.

I’m going to use the Get-S3Object cmdlet to query my bucket for each file. For this to work, I need two things:

  • -BucketName: This is in Variables.ps1.
  • -Key

-Key is the object’s S3 file path. For example, Folder\SubFolder\Music.flac. As the files shouldn’t be in S3 yet, these keys shouldn’t exist. So I’ll have to make them using PowerShell.

I start by getting all the filenames I want to check using Get-ChildItem and [System.IO.Path]::GetFileName:

$LocalSourceObjectFileNames = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetFileName($_) }

Now I start another ForEach loop. I make an S3 key for each filename by combining it with $ExternalS3KeyPrefix in Variables.ps1:

ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

Then I query S3 using Get-S3Object and my constructed S3 key, and capture the result in a variable:

$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

Get-S3Object should return null as the object shouldn’t exist.

If this doesn’t happen then the object is already in the bucket. In this situation, PowerShell identifies the file causing the problem and then exits the script:

If ($null -ne $LocalSourceObjectFileNameS3Check) 
{
Write-Output "File already exists in S3 bucket: $LocalSourceObjectFileName.  Please review.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

If the file isn’t found then PowerShell continues to run:

Else 
{
Write-Output "$LocalSourceObjectFileName does not currently exist in S3 bucket."
}

Assuming no files are found at this point, the log will read as follows:

Checking if local files already exist in S3 bucket.
Checking S3 bucket for Artist-Track-ExtendedMix.flac
Artist-Track-ExtendedMix.flac does not currently exist in S3 bucket.
Checking S3 bucket for Artist-Track-OriginalMix.flac
Artist-Track-OriginalMix.flac does not currently exist in S3 bucket.
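
The same Get-S3Object call is needed again later, after each upload, so it could be wrapped in a small function. The sketch below is an assumption about how that might look, not code from this post: Test-S3ObjectExists is a hypothetical name, and it relies on the behaviour described above, where Get-S3Object returns null when no object has the given key.

```powershell
# Hypothetical helper wrapping the Get-S3Object check used in this post
function Test-S3ObjectExists {
    param(
        [string]$BucketName,
        [string]$Key
    )

    # Assumes Get-S3Object returns nothing when no object has this key
    $S3Check = Get-S3Object -BucketName $BucketName -Key $Key

    return ($null -ne $S3Check)
}
```

With this in place, both the pre-upload and post-upload checks could be written as If (Test-S3ObjectExists -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key), which keeps the two checks from drifting apart.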

V2: Uploading Files Instead Of Folders

Now to start uploading to S3!

In Version 2 I’ve altered how this is done. Previously my script’s purpose was to upload a folder to S3 using the PowerShell cmdlet Write-S3Object.

Version 2 now uploads individual files instead. There is a reason for this that I’ll go into shortly.

This means I have to change things around as Write-S3Object now needs different parameters:

  • Instead of telling the -Folder parameter where the local folder is, I now need to tell the -File parameter where each file is located.
  • Instead of telling the -KeyPrefix parameter where to store the uploaded objects in S3, I now need to tell the -Key parameter the full S3 path for each object.

I’ll do -Key first. I start by opening another ForEach loop, and create an S3 key for each file in the same way I did earlier:

$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

Next is -File. I make the local file path for each file using variables I’ve already created:

$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName
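
String concatenation works here, though it produces a doubled separator when the folder path already ends with one, as the example value "C:\Users\Files\" does. Join-Path is a built-in alternative that handles the separator either way. A minimal sketch, using the temp folder and a hypothetical filename purely for illustration:

```powershell
# Demo values; the temp folder path usually ends with a separator already
$DemoFolder = [System.IO.Path]::GetTempPath()
$DemoFileName = "Music.flac"

# Join-Path inserts a separator only where one is needed
$DemoFilepath = Join-Path -Path $DemoFolder -ChildPath $DemoFileName

Write-Output $DemoFilepath
```
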

Then I begin uploads for each file using Write-S3Object with the new -File and -Key parameters instead of -Folder and -KeyPrefix:

Write-Output "Starting S3 Upload Of $LocalSourceObjectFileName"

Write-S3Object -BucketName $ExternalS3BucketName -File $LocalSourceObjectFilepath -Key $LocalSourceObjectFileNameS3Key -StorageClass $ExternalS3StorageClass

The main benefit of this approach is that, if something goes wrong mid-upload, the transcript will tell me which uploads were successful. Version 1’s script would only tell me that uploads had started, so in the event of failure I’d need to check the S3 bucket’s contents.
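
A further option for mid-upload problems is to catch errors from the upload itself, so the transcript records the reason as well as the file. The sketch below is an assumption rather than part of the script in this post: Send-S3FileWithCatch is a hypothetical name, and it relies on the common -ErrorAction Stop parameter turning any error from Write-S3Object into one that try / catch can handle.

```powershell
# Hypothetical wrapper: returns $true on success, $false if the upload raised an error
function Send-S3FileWithCatch {
    param(
        [string]$BucketName,
        [string]$File,
        [string]$Key,
        [string]$StorageClass
    )

    try {
        # -ErrorAction Stop makes errors terminating so the catch block sees them
        Write-S3Object -BucketName $BucketName -File $File -Key $Key -StorageClass $StorageClass -ErrorAction Stop | Out-Null
        return $true
    }
    catch {
        # Warning stream keeps the function's return value clean
        Write-Warning "S3 upload error for ${File}: $($_.Exception.Message)"
        return $false
    }
}
```

The warning goes to the warning stream rather than the output stream, so the function still returns a single true or false value.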

Speaking of failure, wouldn’t it be good to check that the uploads worked?

V2: Were The Uploads Successful?

For this, I’m still working in the ForEach loop I started for the uploads. After an upload finishes, PowerShell checks if the object is in S3 using the Get-S3Object command I wrote earlier:

Write-Output "Starting S3 Upload Check Of $LocalSourceObjectFileName"
      
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

This time I want the object to be found, so null is a bad result.

Next, I get PowerShell to do some heavy lifting for me. I’ve created a pair of new local folders called S3WriteSuccess and S3WriteFail. The paths for these are stored in Variables.ps1.

If my S3 upload check doesn’t find anything and returns null, PowerShell moves the file from the source folder to S3WriteFail using Move-Item:

If ($null -eq $LocalSourceObjectFileNameS3Check) 

{
Write-Output "S3 Upload Check FAIL: $LocalSourceObjectFileName.  Moving to local Fail folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationFail
}

If the object is found, PowerShell moves the file to S3WriteSuccess:

Else 

{
Write-Output "S3 Upload Check Success: $LocalSourceObjectFileName.  Moving to local Success folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationSuccess           
} 

The ForEach loop then repeats with the next file until all are processed.

So now, a failed upload produces the following log:

**********************
Beginning S3 Upload Checks On Following Objects: StephenJKroos-Micrsh-OriginalMix
S3 Upload Check: StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check FAIL: StephenJKroos-Micrsh-OriginalMix.  Moving to local Fail folder
**********************
Windows PowerShell transcript end
**********************

While a successful S3 upload produces this one:

**********************
Beginning S3 Upload Checks On Following Objects: StephenJKroos-Micrsh-OriginalMix
S3 Upload Check: StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check Success: StephenJKroos-Micrsh-OriginalMix.  Moving to local Success folder
**********************
Windows PowerShell transcript end
**********************

PowerShell then shows a final message before ending the transcript:

Write-Output "All files processed.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript

V2: Code

gitignore Version 2

###################
###### FILES ######
###################

#PowerShell Transcript log
EDMTracksLosslessS3Upload.log

#PowerShell Files Containing Variables
EDMTracksLosslessS3Upload-V0Basic.ps1

#PowerShell Files Containing Variables
EDMTracksLosslessS3Upload-Variables.ps1


#####################
###### FOLDERS ######
#####################

#VSCode Debugging
.vscode/

V2Visibility.ps1

##################################
####### EXTERNAL VARIABLES #######
##################################


#Load External Variables Via Dot Sourcing
. .\EDMTracksLosslessS3Upload-Variables.ps1

#Start Transcript
Start-Transcript -Path $ExternalTranscriptPath -IncludeInvocationHeader


###############################
####### LOCAL VARIABLES #######
###############################


#Get count of items in $ExternalLocalSource
$LocalSourceCount = (Get-ChildItem -Path $ExternalLocalSource | Measure-Object).Count

#Get list of extensions in $ExternalLocalSource
$LocalSourceObjectFileExtensions = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetExtension($_) }

#Get list of filenames in $ExternalLocalSource
$LocalSourceObjectFileNames = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetFileName($_) }


##########################
####### OPERATIONS #######
##########################


#Check there are files in local folder.
Write-Output "Counting files in local folder."

#If local folder contains no files, output this and stop the script.
If ($LocalSourceCount -lt 1) 

{
Write-Output "No Local Files Found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If files are found, output the count and continue.
Else 

{
Write-Output "$LocalSourceCount Local Files Found"          
}


#Check extensions are valid for each file.
Write-Output " "
Write-Output "Checking extensions are valid for each local file."

ForEach ($LocalSourceObjectFileExtension In $LocalSourceObjectFileExtensions) 

{
#If any extension is unacceptable, output this and stop the script. 
If ($LocalSourceObjectFileExtension -NotIn ".flac", ".wav", ".aif", ".aiff") 

{
Write-Output "Unacceptable $LocalSourceObjectFileExtension file found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If extension is fine, output the extension for each file and continue.
Else 
{
Write-Output "Acceptable $LocalSourceObjectFileExtension file."
}
}


#Check if local files already exist in S3 bucket.
Write-Output " "
Write-Output "Checking if local files already exist in S3 bucket."

#Do following actions for each file in local folder
ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
#Create S3 object key using $ExternalS3KeyPrefix and current object's filename
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

#Create local filepath for each object for the file move
$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName

#Output that S3 upload check is starting
Write-Output "Checking S3 bucket for $LocalSourceObjectFileName"
      
#Attempt to get S3 object data using $LocalSourceObjectFileNameS3Key
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

#If local file found in S3, output this and stop the script.
If ($null -ne $LocalSourceObjectFileNameS3Check) 

{
Write-Output "File already exists in S3 bucket: $LocalSourceObjectFileName.  Please review.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If local file not found in S3, report this and continue.
Else 
{
Write-Output "$LocalSourceObjectFileName does not currently exist in S3 bucket."
}
}


#Output that S3 uploads are starting - count and file names
Write-Output " "
Write-Output "Starting S3 Upload Of $LocalSourceCount Local Files."
Write-Output "These files are as follows: $LocalSourceObjectFileNames"
Write-Output " "


#Do following actions for each file in local folder
ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
#Create S3 object key using $ExternalS3KeyPrefix and current object's filename
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

#Create local filepath for each object for the file move
$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName

#Output that S3 upload is starting
Write-Output "Starting S3 Upload Of $LocalSourceObjectFileName"

#Write object to S3 bucket
Write-S3Object -BucketName $ExternalS3BucketName -File $LocalSourceObjectFilepath -Key $LocalSourceObjectFileNameS3Key -StorageClass $ExternalS3StorageClass

#Output that S3 upload check is starting
Write-Output "Starting S3 Upload Check Of $LocalSourceObjectFileName"
      
#Attempt to get S3 object data using $LocalSourceObjectFileNameS3Key
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

#If $LocalSourceObjectFileNameS3Key doesn't exist in S3, move to local Fail folder.
If ($null -eq $LocalSourceObjectFileNameS3Check) 

{
Write-Output "S3 Upload Check FAIL: $LocalSourceObjectFileName.  Moving to local Fail folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationFail
}

#If $LocalSourceObjectFileNameS3Key does exist in S3, move to local Success folder.
Else 
{
Write-Output "S3 Upload Check Success: $LocalSourceObjectFileName.  Moving to local Success folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationSuccess           
}
}


#Stop Transcript
Write-Output " "
Write-Output "All files processed.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript

VariablesBlank.ps1 Version 2

##################################
####### EXTERNAL VARIABLES #######
##################################

#The local file path for the transcript file
#E.g. "C:\Users\Files\"
$ExternalTranscriptPath =

#The local file path for objects to upload to S3
#E.g. "C:\Users\Files\"
$ExternalLocalSource =

#The S3 bucket to upload objects to
#E.g. "my-s3-bucket"
$ExternalS3BucketName =

#The S3 bucket prefix / folder to upload  objects to (if applicable)
#E.g. "Folder\SubFolder\"
$ExternalS3KeyPrefix =

#The S3 Storage Class to upload to
#E.g. "GLACIER"
$ExternalS3StorageClass =

#The local file path for moving successful S3 uploads to
#E.g. "C:\Users\Files\"
$ExternalLocalDestinationSuccess =

#The local file path for moving failed S3 uploads to
#E.g. "C:\Users\Files\"
$ExternalLocalDestinationFail =

V2: Evaluation

Overall I’m very happy with how this all turned out! Version 2 took a script that worked with some supervision, and turned it into something I can set and forget.

The various checks now have my back if I select the wrong files or if my connection breaks. And, while the Get-S3Object checks mean that I’m making more S3 API calls, the increase won’t cause any bill spikes.

The following is a typical transcript that my script produces following a successful upload of two .flac files:

**********************
Transcript started, output file is C:\Users\Files\EDMTracksLosslessS3Upload.log
Counting files in local folder.
2 Local Files Found

Checking extensions are valid for each local file.
Acceptable .flac file.
Acceptable .flac file.

Checking if local files already exist in S3 bucket.
Checking S3 bucket for MarkOtten-Tranquility-OriginalMix.flac
MarkOtten-Tranquility-OriginalMix.flac does not currently exist in S3 bucket.
Checking S3 bucket for StephenJKroos-Micrsh-OriginalMix.flac
StephenJKroos-Micrsh-OriginalMix.flac does not currently exist in S3 bucket.

Starting S3 Upload Of 2 Local Files.
These files are as follows: MarkOtten-Tranquility-OriginalMix StephenJKroos-Micrsh-OriginalMix.flac

Starting S3 Upload Of MarkOtten-Tranquility-OriginalMix.flac
Starting S3 Upload Check Of MarkOtten-Tranquility-OriginalMix.flac
S3 Upload Check Success: MarkOtten-Tranquility-OriginalMix.flac.  Moving to local Success folder
Starting S3 Upload Of StephenJKroos-Micrsh-OriginalMix.flac
Starting S3 Upload Check Of StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check Success: StephenJKroos-Micrsh-OriginalMix.flac.  Moving to local Success folder

All files processed.  Exiting.
**********************
Windows PowerShell transcript end
End time: 20220617153926
**********************

GitHub ReadMe

To round everything off, I’ve written a ReadMe for the repo. This is written in Markdown using the template at makeareadme.com, and the finished article is available here.

Summary

In this post, I created a script to upload lossless music files from my laptop to one of my Amazon S3 buckets using PowerShell.

I introduced automation to perform checks before and after each upload, and logged the outputs to a transcript. I then produced a repo for the scripts, accompanied by a ReadMe document.

Thanks for reading ~~^~~


AWS Summit London 2022 Takeaways

In this post, I will talk about my main takeaways from my visit to the AWS Summit London 2022 event.

Introduction

Anyone following my Instagram will have seen that I attended the AWS Summit London 2022 event in April. This was my first AWS event, and I had a great time watching the presentations, taking in the atmosphere and finding things that a magnetic shark could stick to.

Besides stickers and badges, I left the event with pages of notes and photos of slides that fell roughly into two lists:

  • Consider for work
  • Consider for me

I’ve done the work list, so it’s time for mine! This post has two halves. Firstly, I’ll talk about some of the AWS services I want to try out on the blog over the next few months.

Then, in the second half, I’ll talk about some of the third party presentations that introduced me to interesting things that I hadn’t heard about before.

Let’s get started!

AWS Presentations

In this section, I’ll talk about some of the services mentioned in the AWS Summit London 2022 sessions that I want to try out over the next few months.

Amazon CloudWatch SDK For Python

I had seen the CloudWatch SDK in passing while studying for my Certified Developer Associate certification, and one of the sessions included a demo of it.

I was impressed with how quick and simple the SDK is to use, and have a few ideas for it as part of some Python ETLs and IoT functions I want to try. In addition, I can create and then re-use common monitoring modules to save myself some time in future.

Amazon Timestream

From the Amazon Timestream website:

Amazon Timestream is a fast, scalable, and serverless time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day up to 1,000 times faster and at as little as 1/10th the cost of relational databases.

Some time soon I’m hoping to try out a Raspberry Pi project that uses a temperature sensor. Timestream looks like a good fit for this! It’s built with IoT in mind, is serverless and offers built-in analytics. In addition, it offers integrations with Amazon Kinesis and Grafana, so it sounds simple to get off the ground.

AWS Data Exchange

From the AWS Data Exchange website:

AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud.

After you’ve subscribed to a data product, you can use the AWS Data Exchange API to load data directly into Amazon Simple Storage Service (S3) and use a range of AWS analytics and machine learning (ML) services to analyze it.

One of the challenges of trying out services aimed at big data is a lack of big data.

Sample databases like Northwind, AdventureWorks and WideWorldImporters have been around for a while, helping generations of people learn their craft. However, Northwind was intended for SQL Server 2000. And although WideWorldImporters is more recent, it’s a bit limited by modern standards.

AWS Data Exchange offers a variety of modern Data Products via the AWS Marketplace. Currently, there are over 3500 Data Products and almost half of them cost nothing to access. So lots to use for potential EMR, Glue and SageMaker projects!

AWS DataOps Development Kit (DDK)

From the AWS DataOps Development Kit repo:

The AWS DataOps Development Kit is an open source development framework for customers that build data workflows and modern data architecture on AWS. Based on the AWS CDK, it offers high-level abstractions allowing you to build pipelines that manage data flows on AWS, driven by DevOps best practices.

The DDK joins the CDK as something I want to try out. I’ve not done anything with infrastructure as code on the blog yet. However, the CDK sounds like a good place to start, and the DDK could quickly spin me up some infrastructure to use with some Data Exchange data.

AWS One Observability Workshop

From the One Observability Workshop Studio:

You will learn about AWS observability functionalities on Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana and AWS Distro for OpenTelemetry (ADOT). The workshop will deploy a micro-service application and help you learn monitoring.

I’ve already made some bespoke monitoring for my main AWS account. I’m interested in trying this workshop out to see what else I can learn. I’m also keen on getting some first-hand experience with X-Ray, Prometheus and Grafana.

Third-Party Presentations

In this section, I’ll talk about some of the third party presentations that introduced me to interesting things that I hadn’t heard about before.

Cazoo’s Serverless Architecture

Cazoo’s Engineering Coach Bob Gregory spoke about their use of AWS serverless technologies including Lambda, DynamoDB and Athena. As a result, Cazoo was the first to market and could scale quickly in response to rapid customer demand.

This was my first time hearing about Cazoo, and Bob turned a very business-oriented presentation into a chat with some mates at the pub. He has a great speaking style, an example of which is here:

Amazon published a press release about Cazoo on the day of the Summit. It details Cazoo’s current and future relationship with AWS and includes Cazoo’s plans to integrate various AWS machine learning tools. Examples include Textract for paperwork processing and invoice management, and Rekognition for inventory handling and rapid image and video analytics.

And speaking of analytics…

EMIS Group’s Data Architecture

EMIS Group’s CTO Richard Jarvis spoke about how they use various AWS services to ingest, analyse and present health care data. During the 2020 pandemic, they were able to quickly analyse national COVID-19 data and provide clinical research about topics including transmission, treatment and vaccination.

EMIS Group’s data security includes a Data Mesh architecture, which separates data producers from data consumers. Meanwhile, AWS IAM handles the security of their applications by controlling how users access them and how they interact with each other.

As a result, EMIS Group can ensure that the right applications are accessible by the right people, and that sensitive and personal data is stored appropriately and in line with GDPR.

Ocado’s Fulfilment Robots

Ocado’s Chief Technology Officer James Donkin and Chief of Advanced Technology Alex Harvey spoke about the use of AWS at their fulfilment centres. Ocado has made a name for itself in the field of robotics and has used this technology to drive efficiency and innovation.

That video is from 2018 and a lot has changed since then. This year Ocado have begun upgrading to their new 600 Series fulfilment robot, pictured here:

Wait. That’s a Borg Cube. Hold on.

YOU WILL BE REFRIGERATED

Alex and James talked about the challenges of operating thousands of robots, and how AWS help them innovate and scale while maintaining low latency and cost. Ocado deploys microservices and web applications to AWS, which the robots rely on for communication and navigation.

Further information is available in an Ocado case study on the AWS website.

Summary

In this post, I discussed the main takeaways from my recent visit to the AWS Summit London 2022 event. I talked about some of the services I want to try out on the blog over the next few months, as well as some of the third-party presentations that introduced me to interesting things that I hadn’t heard about before.

In conclusion, I had a great time at the summit! I came away with a lot of good ideas and had some great conversations. Hopefully, I’ll be able to go back next year!

Thanks for reading ~~^~~

Categories
Security & Monitoring

Unexpected CloudWatch In The Billing Area

In this post I will investigate an unexpected CloudWatch charge on my April 2022 AWS bill, and explain how to interpret the bill and find the resources responsible.

Introduction

My April 2022 AWS bill has arrived. The total wasn’t unusual – £4.16 is a pretty standard charge for me at the moment, most of which is S3. Then I took a closer look at the services and found an unexpected cost for CloudWatch, which is usually zero.

But not this month:

While $0.30 isn’t bank-breaking, it is unexpected and worth investigating. More importantly, nothing should be running in EU London! And there were no CloudWatch charges at all on my March 2022 bill. So what’s going on here?

Let’s start with the bill itself.

The April 2022 Bill

Looking at the bill, the rows with unexpected CloudWatch charges all mention alarms. Since nothing else has generated any charges, let’s take a closer look at all of the rows referring to alarms.

$0.00 Per Alarm Metric Month – First 10 Alarm Metrics – 10.000 Alarms

The AWS Always Free Tier includes ten CloudWatch alarms.

$0.10 Per Alarm Metric Month (Standard Resolution) – EU (Ireland) – 2.000002 Alarms

In EU Ireland, each standard resolution alarm after the first ten costs $0.10. The bill says there are twelve alarms in EU Ireland – ten of these are free and the other two cost $0.10 each – $0.20 in total.

$0.10 Per Alarm Metric Month (Standard Resolution) – EU (London) – 1.000001 Alarms

CloudWatch standard resolution alarms also cost $0.10 in EU London. As all my free alarms are seemingly in EU Ireland, the one in EU London costs a further $0.10.

So the bill is saying I have thirteen alarms – twelve in EU Ireland and one in EU London. Let’s open CloudWatch and see what’s going on there.
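The arithmetic behind those three bill rows can be sketched in a few lines of Python. This is a minimal sketch rather than AWS's billing logic: the function name is hypothetical, and applying the free allowance to EU Ireland first is an assumption based on how the bill reads.

```python
from decimal import Decimal

FREE_ALARMS = 10                    # Always Free Tier allowance shown on the bill
PRICE_PER_ALARM = Decimal("0.10")   # standard resolution alarm, per month


def alarm_charges(alarms_by_region):
    """Spend the free allowance in the order given, then price the rest.

    The order in which AWS applies the free alarms across regions is an
    assumption here; the bill only shows the end result.
    """
    remaining_free = FREE_ALARMS
    charges = {}
    for region, alarm_count in alarms_by_region:
        free_used = min(alarm_count, remaining_free)
        remaining_free -= free_used
        charges[region] = (alarm_count - free_used) * PRICE_PER_ALARM
    return charges


charges = alarm_charges([("EU (Ireland)", 12), ("EU (London)", 1)])
total = sum(charges.values(), Decimal("0"))

for region, charge in charges.items():
    print(f"{region}: ${charge}")
print(f"Total: ${total}")
```

With twelve alarms in EU Ireland and one in EU London, this gives $0.20 and $0.10 respectively, matching the $0.30 on the bill.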

CloudWatch Alarm Dashboard

It seems I have thirteen CloudWatch alarms. Interesting, because I could only remember the four security alarms I set up in February.

CloudWatch says otherwise. This is my current EU Ireland CloudWatch dashboard:

Closer inspection finds eight alarms with names like:

  • TargetTracking-table/Rides-ProvisionedCapacityHigh-a53f2f67-9477-45a6-8197-788d2c7462b3
  • TargetTracking-table/Rides-ProvisionedCapacityLow-a36cf02f-7b3c-4fb0-844e-cf3d03fa80a9
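Alarm names in this shape can be split into their parts to see which table they belong to. The following is a rough sketch that assumes every name follows the `TargetTracking-table/<table>-<kind>-<uuid>` pattern of the two examples above. The regular expression and function name are illustrative only, not an official naming specification.

```python
import re

# Pattern inferred from the two alarm names above (an assumption, not an AWS spec):
# TargetTracking-table/<table name>-<alarm kind>-<uuid>
ALARM_NAME = re.compile(
    r"^TargetTracking-table/(?P<table>.+)-(?P<kind>[A-Za-z]+)-"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def parse_alarm_name(name):
    """Return the table name and alarm kind, or None if the name doesn't match."""
    match = ALARM_NAME.match(name)
    if match is None:
        return None
    return {"table": match.group("table"), "kind": match.group("kind")}


names = [
    "TargetTracking-table/Rides-ProvisionedCapacityHigh-a53f2f67-9477-45a6-8197-788d2c7462b3",
    "TargetTracking-table/Rides-ProvisionedCapacityLow-a36cf02f-7b3c-4fb0-844e-cf3d03fa80a9",
]
for name in names:
    print(parse_alarm_name(name))
```

Both example names resolve to a table called Rides, which is a strong hint about which service created them.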

Two of these are constantly In Alarm, and all have Last State Update values on 2022-03-17. The alarm names led me to suspect that DynamoDB was involved, and this was confirmed by viewing the Namespace and Metric Name values in the details of one of the alarms:

At this point I had an idea of what was going on. To be completely certain, I wanted to check my account history for 2022-03-17. That means a trip to CloudTrail!

CloudTrail Event History

CloudTrail’s Event History shows the last 90 days of management events. I entered a date range of 2022-03-17 00:00 > 2022-03-18 00:01 into the search filter, and it didn’t take long to start seeing some familiar-looking Resource Names:

Alongside the TargetTracking-table resource names linked to monitoring.amazonaws.com, there are also rows on the same day for other Event Sources including:

  • dynamodb.amazonaws.com
  • apigateway.amazonaws.com
  • lambda.amazonaws.com
  • cognito-idp.amazonaws.com

I now know with absolute certainty where the unexpected CloudWatch alarms came from. Let me explain.

Charge Explanations

So far I’ve reviewed my bills, found the CloudWatch alarms and established what was happening in my account when they were added. Now I’ll explain how this all led to charges on my bill.

The $0.20 EU Ireland Charge

When I was recently studying for the Developer Associate certification, I followed an AWS tutorial on how to Build a Serverless Web Application with AWS Lambda, Amazon API Gateway, AWS Amplify, Amazon DynamoDB, and Amazon Cognito. This was to top up my serverless knowledge before the exam.

The third module involves creating a DynamoDB table for the application. A table that I provisioned with auto-scaling for read and write capacity:

These auto-scaling policies rely on CloudWatch alarms to function, as demonstrated by some of the alarm conditions:

The DynamoDB auto-scaling created eight CloudWatch alarms. Four for Read Capacity Units:

  • ConsumedReadCapacityUnits > 42 for 2 datapoints within 2 minutes
  • ConsumedReadCapacityUnits < 30 for 15 datapoints within 15 minutes
  • ProvisionedReadCapacityUnits > 1 for 3 datapoints within 15 minutes
  • ProvisionedReadCapacityUnits < 1 for 3 datapoints within 15 minutes

And four for Write Capacity Units:

  • ConsumedWriteCapacityUnits > 42 for 2 datapoints within 2 minutes
  • ConsumedWriteCapacityUnits < 30 for 15 datapoints within 15 minutes
  • ProvisionedWriteCapacityUnits > 1 for 3 datapoints within 15 minutes
  • ProvisionedWriteCapacityUnits < 1 for 3 datapoints within 15 minutes

These eight alarms joined the existing four. The first ten were free, leaving two accruing charges.

This also explains why two alarms are always In Alarm – the criteria for scaling in are being met but the DynamoDB table can’t scale in any further.
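Conditions like "< 30 for 15 datapoints within 15 minutes" can be illustrated with a simplified evaluator. This sketch is not CloudWatch's actual implementation. It ignores missing-data handling, the function name is hypothetical, and it assumes an idle table reports zero consumed units every minute.

```python
import operator


def evaluate_alarm(datapoints, compare, threshold, datapoints_to_alarm, evaluation_periods):
    """Simplified 'M datapoints within N periods' check (no missing-data rules)."""
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if compare(value, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Assumption: an idle table shows 0 consumed capacity units each minute.
idle_minutes = [0] * 15

# ConsumedReadCapacityUnits < 30 for 15 datapoints within 15 minutes
scale_in = evaluate_alarm(idle_minutes, operator.lt, 30, 15, 15)

# ConsumedReadCapacityUnits > 42 for 2 datapoints within 2 minutes
scale_out = evaluate_alarm(idle_minutes, operator.gt, 42, 2, 2)

print(scale_in, scale_out)
```

Under that assumption the scale-in condition is met every minute while the scale-out condition never is, which fits a table that is already at its minimum capacity.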

I could have avoided this situation by destroying the resources after finishing the tutorial. The final module of the tutorial covers this. Instead I decided to keep everything around so I could take a proper look at everything under the hood.

No resources accrued any charges in March, so I left everything in place during April. I’ll go into why there was nothing on the March bill shortly, but first…

The $0.10 EU London Charge

Remember when I said that I shouldn’t be running anything in EU London? Turns out I was!

I found a very old CloudWatch alarm from 2020. It’s been there ever since. Never alerting so I didn’t know it was there. Included in the Always Free tier, so never costing me anything or triggering an AWS Budget alert. Appearing on my bill, but always as a free entry so never drawing attention.

When I exceeded my ten free CloudWatch alarms, the one in EU London became chargeable for the first time. A swift delete later and that particular problem is no more.

No CloudWatch Charge On The March 2022 Bill

That only leaves the question of why there were no CloudWatch charges on my March 2022 bill, despite there being thirteen alarms on my account for almost half of that month:

I wanted to understand what was going on, so I reached out to AWS Support.

In what must have been a first for them, I asked why no money had been billed for CloudWatch in March:

On my April 2022 bill I was charged $0.30 for CloudWatch. $0.20 in Ireland and $0.10 in London. I understand why.

What I want to understand is why I didn’t see a charge for them on my March 2022 bill. The alerts were added to the account on March 17th, so from that moment on I had thirteen alerts which is three over the free tier.

Can I get confirmation on why they don’t appear on March but do on April please?

I soon received a reply from AWS Support that explained the events in full:

…although you enabled all 13 Alarms in March, the system only calculated a pro-rated usage value, since the Alarms were only enabled on 17th March. The pro-rated Alarm usage values only amounted to 7.673 Alarms in the EU (Ireland) region, and 1.000003 Alarms in the EU (London) region.

The total pro-rated Alarm usage calculated for March (8.673003 Alarms) is thus within the 10 Alarm Free Tier threshold and thus incurred no charges, whereas in April the full 13 Alarm usage came into play for the entire month…

To summarise, I hadn’t been charged for the alarms in March because the eight new ones had only been on my account for almost half a month, so my pro-rated usage stayed within the ten-alarm free tier. Thanks for the help folks!
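The pro-rating that AWS Support describes can be approximated as follows. This is a rough sketch under stated assumptions. The exact creation time of the eight DynamoDB alarms isn't given, so 18:00 UTC on 17 March is a guess, and the function name is hypothetical.

```python
from datetime import datetime, timezone

UTC = timezone.utc
MARCH_START = datetime(2022, 3, 1, tzinfo=UTC)
MARCH_END = datetime(2022, 4, 1, tzinfo=UTC)


def prorated_alarms(alarms, month_start, month_end):
    """Sum alarm usage as fractions of the month each alarm existed for.

    alarms is a list of (alarm_count, active_from) tuples.
    """
    month_seconds = (month_end - month_start).total_seconds()
    usage = 0.0
    for alarm_count, active_from in alarms:
        start = max(active_from, month_start)
        active_seconds = max((month_end - start).total_seconds(), 0.0)
        usage += alarm_count * active_seconds / month_seconds
    return usage


# Assumption: the eight DynamoDB alarms appeared at 18:00 UTC on 17 March.
assumed_creation = datetime(2022, 3, 17, 18, 0, tzinfo=UTC)

ireland = prorated_alarms(
    [(4, MARCH_START), (8, assumed_creation)], MARCH_START, MARCH_END
)
london = prorated_alarms([(1, MARCH_START)], MARCH_START, MARCH_END)

print(round(ireland, 3), round(london, 3), ireland + london < 10)
```

With that assumed creation time, EU Ireland comes out at roughly 7.68 alarms, close to the 7.673 that AWS Support quoted, and the combined total stays under the ten-alarm free tier.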

Summary

In this post I investigated an unexpected CloudWatch charge on my April 2022 AWS bill. I showed what the bill looked like, demonstrated how to find the resources generating the charges and explained how those resources came to be on my AWS account.

Thanks for reading ~~^~~