Categories
Developing & Application Integration

Uploading Music Files To Amazon S3 (PowerShell Mix)

In this post, I will upload lossless music files from my laptop to one of my Amazon S3 buckets using PowerShell.

Introduction

For several months I’ve been going through some music from an old hard drive. These music files are currently on my laptop, and exist mainly as lossless .flac files.

For each file I’m doing the following:

  • Creating an .mp3 copy of each lossless file.
  • Storing the .mp3 file on my laptop.
  • Uploading a copy of the lossless file to S3 Glacier.
  • Transferring the original lossless file from my laptop to my desktop PC.

I usually do the uploads using the S3 console, and have been meaning to automate the process for some time. So I decided to write some code to upload files to S3 for me, in this case using PowerShell.

Prerequisites

Before starting to write my PowerShell script, I have done the following on my laptop:

Version 0: Functionality

Version 0 gets the basic functionality in place. No bells and whistles here – I just want to upload a file to an S3 bucket prefix, stored using the Glacier Flexible Retrieval storage class.

V0: Writing To S3

I am using the PowerShell Write-S3Object cmdlet to upload my files to S3. This cmdlet needs four parameters to do what’s required:

  • -BucketName: The S3 bucket receiving the files.
  • -Folder: The folder on my laptop containing the files.
  • -KeyPrefix: The S3 bucket key prefix to assign to the uploaded objects.
  • -StorageClass: The S3 storage class to assign to the uploaded objects.

I create a variable for each of these so that my script is easier to read as I continue its development. I couldn’t find the inputs that the -StorageClass parameter uses in the Write-S3Object documentation. In the end, I found them in the S3 PutObject API Reference.

Valid inputs are as follows:

STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR
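
A mistyped storage class would only surface once the upload runs, so one option is to check the value up front. The following is a minimal sketch rather than part of the script: the function name Test-S3StorageClassName is hypothetical, and the list is simply the one quoted above from the PutObject API Reference, so it may not match what the cmdlet currently accepts.

```powershell
# Hypothetical helper: checks a value against the storage classes listed above.
function Test-S3StorageClassName {
    param (
        [Parameter(Mandatory = $true)]
        [string]$Name
    )

    # List copied from the PutObject API Reference values quoted in this post.
    $ValidStorageClasses = @(
        'STANDARD', 'REDUCED_REDUNDANCY', 'STANDARD_IA', 'ONEZONE_IA',
        'INTELLIGENT_TIERING', 'GLACIER', 'DEEP_ARCHIVE', 'OUTPOSTS', 'GLACIER_IR'
    )

    # -in returns $true when the value appears in the list.
    return ($Name -in $ValidStorageClasses)
}

Test-S3StorageClassName -Name 'GLACIER'
Test-S3StorageClassName -Name 'GLACIAL'
```

The first call returns True and the second returns False, so a typo can be caught before any upload is attempted.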

V0: Code

V0BasicRedacted.ps1

#Set Variables
$LocalSource = "C:\Users\Files\"
$S3BucketName = "my-s3-bucket"
$S3KeyPrefix = "Folder\SubFolder\"
$S3StorageClass = "GLACIER"


#Upload File To S3
Write-S3Object -BucketName $S3BucketName -Folder $LocalSource -KeyPrefix $S3KeyPrefix -StorageClass $S3StorageClass
V0BasicRedacted.ps1 On GitHub

V0: Evaluation

Version 0 offers me the following benefits:

  • I don’t have to log onto the S3 console for uploads anymore.
  • Forgetting to specify Glacier Flexible Retrieval as the S3 storage class is no longer a problem. The script does this for me.
  • Starting an upload to S3 is now as simple as right-clicking the script and selecting Run With PowerShell from the Windows Context Menu.

Version 0 works great, but I’ll give away one of my S3 bucket names if I start sharing a non-redacted version. This has been known to cause security issues in severe cases. Ideally, I’d like to separate the variables from the PowerShell commands, so let’s work on that next.

Version 1: Security

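Version 1 enhances the security of my script by separating my variables from my PowerShell commands. To make this work without breaking things, I’m using the following features:

  • Dot sourcing, to load variables from a separate PowerShell script.
  • .gitignore, to keep the file containing my variables out of my commits.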

To take advantage of these features, I’ve made two new files in my repo:

  • Variables.ps1 for my variables.
  • V1Security.ps1 for my Write-S3Object command.

So let’s now talk about how this all works.

V1: Isolating Variables With Dot Sourcing

At the moment, my script is broken. Running Variables.ps1 will create the variables but do nothing with them. Running V1Security.ps1 will fail as the variables aren’t in that script anymore.

This is where Dot Sourcing comes in. Dot sourcing runs another script in the current scope, so any variables that script creates stay available afterwards. Here, when I run V1Security.ps1 I want PowerShell to load the variables from Variables.ps1.

To dot source a script, type a dot (.) and a space before the script path. As both of my files are in the same folder, PowerShell doesn’t even need the full path:

. .\EDMTracksLosslessS3Upload-Variables.ps1
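
To see the effect in isolation, here is a small self-contained sketch. The demo folder, the Demo-Variables.ps1 file and the bucket name are all hypothetical and exist only for this illustration.

```powershell
# Hypothetical demo folder and file, created only for this illustration.
$DemoFolder = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath "DotSourceDemo-$([guid]::NewGuid())"
New-Item -Path $DemoFolder -ItemType Directory | Out-Null
$DemoVariablesPath = Join-Path -Path $DemoFolder -ChildPath 'Demo-Variables.ps1'
Set-Content -Path $DemoVariablesPath -Value '$S3BucketName = "my-s3-bucket"'

# Dot sourcing runs the file in the current scope, so $S3BucketName exists afterwards.
. $DemoVariablesPath
Write-Output "Loaded bucket name: $S3BucketName"

# Tidy up the demo folder.
Remove-Item -Path $DemoFolder -Recurse
```

A relative path such as .\ is resolved against the current directory, not against the script's own location. Inside a script file, the automatic variable $PSScriptRoot holds the script's folder, so building the path with Join-Path and $PSScriptRoot is one way to remove that dependency.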

Now my script works again! But I still have the same problem – if Variables.ps1 is committed to GitHub at any point then my variables are still visible. How can I stop that?

This time it’s Git to the rescue. I need a .gitignore file.

V1: Selective Tracking With .gitignore

.gitignore is a way of telling Git what not to include in commits. Entering a file, folder or pattern into a repo’s .gitignore file tells Git not to track it. Files that Git is already tracking are not affected until they are removed from the index.

When Visual Studio Code finds a .gitignore file, it helps out by making visual changes in response to the file’s contents. When I create a .gitignore file and add the following lines to it:

#Ignore PowerShell Files Containing Variables

EDMTracksLosslessS3Upload-V0Basic.ps1
EDMTracksLosslessS3Upload-Variables.ps1

Visual Studio Code’s Explorer tab will show those files as grey:

They won’t be visible at all in the Source Control tab:

And finally, when committed to GitHub the ignored files are not present:

Before moving on, I found this Steve Griffith .gitignore tutorial helpful in introducing the basics:

And this DevOps Journey tutorial helps show how .gitignore behaves within Visual Studio Code:

V1: Code

gitignore Version 1

#Ignore PowerShell Files Containing Variables

EDMTracksLosslessS3Upload-V0Basic.ps1
EDMTracksLosslessS3Upload-Variables.ps1

V1Security.ps1

#Load Variables
. .\EDMTracksLosslessS3Upload-Variables.ps1


#Upload File To S3
Write-S3Object -BucketName $S3BucketName -Folder $LocalSource -KeyPrefix $S3KeyPrefix -StorageClass $S3StorageClass
V1Security.ps1 On GitHub

VariablesBlank.ps1 Version 1

#Set Variables


#The local file path for objects to upload to S3
#E.g. "C:\Users\Files\"
$LocalSource =

#The S3 bucket to upload the objects to
#E.g. "my-s3-bucket"
$S3BucketName =

#The S3 bucket prefix / folder to upload the objects to (if applicable)
#E.g. "Folder\SubFolder\"
$S3KeyPrefix =

#The S3 Storage Class to upload to
#E.g. "GLACIER"
$S3StorageClass =
Version 1 VariablesBlank.ps1 On GitHub

V1: Evaluation

Version 1 now gives me the benefits of Version 0 with the following additions:

  • My variables and commands have now been separated.
  • I can now call Variables.ps1 from other scripts in the same folder, knowing the variables will be the same each time for each script.
  • I can use .gitignore to make sure Variables.ps1 is never uploaded to my GitHub repo.

The next problem is one of visibility. I have no way to know if my uploads have been successful. Or if they were duplicated. Nor do I have any auditing.

The S3 console gives me a summary at the end of each upload:

It would be great to have something similar with my script! In addition, some error handling and quality control checks would increase my confidence levels.

Let’s get to work!

Version 2: Visibility

Version 2 enhances the visibility of my script. The length of the script grows a lot here, so let’s run through the changes and I’ll explain what’s going on.

As a starting point, I copied V1Security.ps1 and renamed it to V2Visibility.ps1.

V2: Variables.ps1 And .gitignore Changes

Additions are being made to these files as a result of the Version 2 changes. I’ll mention them as they come up, but it makes sense to cover a few things up-front:

  • I added External to all variable names in Variables.ps1 to keep track of them in the script. For example, $S3BucketName is now $ExternalS3BucketName.
  • There are some additional local file paths in Variables.ps1 that I’m using for transcripts and some post-upload checks.
  • .gitignore now includes a log file (more on that shortly) and the Visual Studio Code debugging folder.

V2: Transcripts

The first change is perhaps the simplest. PowerShell has built-in cmdlets for creating transcripts:

  • Start-Transcript creates a record of all or part of a PowerShell session in a separate file.
  • Stop-Transcript stops a transcript that was started by the Start-Transcript cmdlet.

These go at the start and end of V2Visibility.ps1, along with a local file path for the EDMTracksLosslessS3Upload.log file I’m using to record everything.

Start-Transcript -Path $ExternalTranscriptPath -IncludeInvocationHeader

This new path is stored in Variables.ps1. In addition, EDMTracksLosslessS3Upload.log has been added to .gitignore.

V2: Check If There Are Any Files

Now the error handling begins. I want the script to fail gracefully, and I start by checking that there are files in the correct folder. First I count the files using Get-ChildItem and Measure-Object:

$LocalSourceCount = (Get-ChildItem -Path $ExternalLocalSource | Measure-Object).Count

And then stop the script running if no files are found:

If ($LocalSourceCount -lt 1) 
{
Write-Output "No Local Files Found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

There are a couple of cmdlets here that make several appearances in Version 2:

  • Start-Sleep suspends PowerShell activity for the time stated. This gives me time to read the output when I’m running the script using the context menu.
  • Exit causes PowerShell to completely stop everything it’s doing. In this case, there’s no point continuing as there’s nothing in the folder.

If files are found, PowerShell displays the count and carries on:

Else 
{
Write-Output "$LocalSourceCount Local Files Found"          
}
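
One detail worth knowing about this count: Get-ChildItem returns subfolders as well as files unless told otherwise. The sketch below uses a hypothetical temporary folder to show the difference the -File switch makes. It is an illustration only, not a change to the script above.

```powershell
# Hypothetical demo folder containing two files and one subfolder.
$DemoFolder = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath "CountDemo-$([guid]::NewGuid())"
New-Item -Path $DemoFolder -ItemType Directory | Out-Null
New-Item -Path (Join-Path -Path $DemoFolder -ChildPath 'TrackA.flac') -ItemType File | Out-Null
New-Item -Path (Join-Path -Path $DemoFolder -ChildPath 'TrackB.wav') -ItemType File | Out-Null
New-Item -Path (Join-Path -Path $DemoFolder -ChildPath 'Artwork') -ItemType Directory | Out-Null

# Without -File the subfolder is counted as well.
$AllItemsCount = (Get-ChildItem -Path $DemoFolder | Measure-Object).Count

# With -File only the files are counted.
$FilesOnlyCount = (Get-ChildItem -Path $DemoFolder -File | Measure-Object).Count

Write-Output "All items: $AllItemsCount"
Write-Output "Files only: $FilesOnlyCount"

# Tidy up the demo folder.
Remove-Item -Path $DemoFolder -Recurse
```

Here the first count is 3 and the second is 2. In the script as written, a subfolder would still be caught by the extension check that follows, because a folder name without an extension is not in the accepted list.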

V2: Check If The Files Are Lossless

Next, I want to stop any file uploads that don’t belong in the S3 bucket. The bucket should only contain lossless music – anything else should be rejected.

To arrange this, I first capture the extensions for each file using Get-ChildItem and [System.IO.Path]::GetExtension:

$LocalSourceObjectFileExtensions = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetExtension($_) }

Then I check each extension using a ForEach loop. If an extension isn’t in the list, PowerShell will report this and exit the script:

ForEach ($LocalSourceObjectFileExtension In $LocalSourceObjectFileExtensions) 

{
If ($LocalSourceObjectFileExtension -NotIn ".flac", ".wav", ".aif", ".aiff") 
{
Write-Output "Unacceptable $LocalSourceObjectFileExtension file found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

If the extension is in the list, PowerShell records this and checks the next one:

Else 
{
Write-Output "Acceptable $LocalSourceObjectFileExtension file."
}

So now, if I attempt to upload an unacceptable .log file, the transcript will say:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Unacceptable .log file found.  Exiting.
**********************

Whereas an acceptable .flac file will produce:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Acceptable .flac file.
**********************

And when uploading multiple files:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Acceptable .flac file.
Acceptable .wav file.
Acceptable .flac file.
**********************
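
The same check can be tried without touching any real files. The sketch below wraps it in a function so it can be run against a few sample names. Test-LosslessExtension is a hypothetical name, and the sample filenames are made up for the example.

```powershell
# Hypothetical helper: returns $true when a file name has one of the accepted extensions.
function Test-LosslessExtension {
    param (
        [Parameter(Mandatory = $true)]
        [string]$FileName
    )

    # The same extensions the script accepts.
    $AcceptedExtensions = '.flac', '.wav', '.aif', '.aiff'
    $Extension = [System.IO.Path]::GetExtension($FileName)

    # -in is case-insensitive, so ".WAV" is accepted as well as ".wav".
    return ($Extension -in $AcceptedExtensions)
}

# Try the check against three sample file names.
'Artist-Track-ExtendedMix.flac', 'Artist-Track-OriginalMix.WAV', 'Upload.log' | ForEach-Object {
    "{0}: {1}" -f $_, (Test-LosslessExtension -FileName $_)
}
```

The two music files report True and the .log file reports False, which matches the Acceptable and Unacceptable messages in the transcripts above.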

V2: Check If The Files Are Already In S3

The next step checks if the files are already in S3. This might not seem like a problem, as S3 usually overwrites an object if it already exists.

Thing is, this bucket is replicated. This means it’s also versioned. As a result, S3 will keep both copies in this scenario. In the world of Glacier this doesn’t cost much, but it will distort the bucket’s S3 Inventory. This could lead to confusion when I query the inventory with Athena. And if I can stop this situation with some automation then I might as well.

I’m going to use the Get-S3Object cmdlet to query my bucket for each file. For this to work, I need two things:

  • -BucketName: This is in Variables.ps1.
  • -Key

-Key is the object’s S3 file path. For example, Folder\SubFolder\Music.flac. As the files shouldn’t be in S3 yet, these keys shouldn’t exist. So I’ll have to make them using PowerShell.

I start by getting all the filenames I want to check using Get-ChildItem and [System.IO.Path]::GetFileName:

$LocalSourceObjectFileNames = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetFileName($_) }

Now I start another ForEach loop. I make an S3 key for each filename by combining it with $ExternalS3KeyPrefix in Variables.ps1:

ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

Then I query S3 using Get-S3Object and my constructed S3 key, and capture the result in a variable:

$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

Get-S3Object should return null as the object shouldn’t exist.

If this doesn’t happen then the object is already in the bucket. In this situation, PowerShell identifies the file causing the problem and then exits the script:

If ($null -ne $LocalSourceObjectFileNameS3Check) 
{
Write-Output "File already exists in S3 bucket: $LocalSourceObjectFileName.  Please review.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

If the file isn’t found then PowerShell continues to run:

Else 
{
Write-Output "$LocalSourceObjectFileName does not currently exist in S3 bucket."
}

Assuming no files are found at this point, the log will read as follows:

Checking if local files already exist in S3 bucket.
Checking S3 bucket for Artist-Track-ExtendedMix.flac
Artist-Track-ExtendedMix.flac does not currently exist in S3 bucket.
Checking S3 bucket for Artist-Track-OriginalMix.flac
Artist-Track-OriginalMix.flac does not currently exist in S3 bucket.

V2: Uploading Files Instead Of Folders

Now to start uploading to S3!

In Version 2 I’ve altered how this is done. Previously my script’s purpose was to upload a folder to S3 using the PowerShell cmdlet Write-S3Object.

Version 2 now uploads individual files instead. There is a reason for this that I’ll go into shortly.

This means I have to change things around as Write-S3Object now needs different parameters:

  • Instead of telling the -Folder parameter where the local folder is, I now need to tell the -File parameter where each file is located.
  • Instead of telling the -KeyPrefix parameter where to store the uploaded objects in S3, I now need to tell the -Key parameter the full S3 path for each object.

I’ll do -Key first. I start by opening another ForEach loop, and create an S3 key for each file in the same way I did earlier:

$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

Next is -File. I make the local file path for each file using variables I’ve already created:

$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName
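
The concatenation above always adds a backslash, so a source folder that already ends in one (such as the "C:\Users\Files\" example) produces a doubled separator. Windows generally tolerates that, but Join-Path is a common alternative that adds a separator only when needed. This is a sketch with hypothetical demo values standing in for $ExternalLocalSource and the filename.

```powershell
# Hypothetical values standing in for $ExternalLocalSource and a file name.
$DemoLocalSource = [System.IO.Path]::GetTempPath()   # ends with a path separator
$DemoFileName = 'Artist-Track-OriginalMix.flac'

# Join-Path inserts a separator only when one is needed.
$FilepathFromTrailing = Join-Path -Path $DemoLocalSource -ChildPath $DemoFileName

# The same folder with its trailing separator removed gives the same result.
$Separator = [System.IO.Path]::DirectorySeparatorChar
$FilepathFromTrimmed = Join-Path -Path $DemoLocalSource.TrimEnd($Separator) -ChildPath $DemoFileName

Write-Output $FilepathFromTrailing
Write-Output $FilepathFromTrimmed
```

Both lines print the same path, whether or not the folder variable was written with a trailing separator.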

Then I begin uploads for each file using Write-S3Object with the new -File and -Key parameters instead of -Folder and -KeyPrefix:

Write-Output "Starting S3 Upload Of $LocalSourceObjectFileName"

Write-S3Object -BucketName $ExternalS3BucketName -File $LocalSourceObjectFilepath -Key $LocalSourceObjectFileNameS3Key -StorageClass $ExternalS3StorageClass

The main benefit of this approach is that, if something goes wrong mid-upload, the transcript will tell me which uploads were successful. Version 1’s script would only tell me that uploads had started, so in the event of failure I’d need to check the S3 bucket’s contents.

Speaking of failure, wouldn’t it be good to check that the uploads worked?

V2: Were The Uploads Successful?

For this, I’m still working in the ForEach loop I started for the uploads. After an upload finishes, PowerShell checks if the object is in S3 using the Get-S3Object command I wrote earlier:

Write-Output "Starting S3 Upload Check Of $LocalSourceObjectFileName"
      
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

This time I want the object to be found, so null is a bad result.

Next, I get PowerShell to do some heavy lifting for me. I’ve created a pair of new local folders called S3WriteSuccess and S3WriteFail. The paths for these are stored in Variables.ps1.

If my S3 upload check doesn’t find anything and returns null, PowerShell moves the file from the source folder to S3WriteFail using Move-Item:

If ($null -eq $LocalSourceObjectFileNameS3Check) 

{
Write-Output "S3 Upload Check FAIL: $LocalSourceObjectFileName.  Moving to local Fail folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationFail
}

If the object is found, PowerShell moves the file to S3WriteSuccess:

Else 

{
Write-Output "S3 Upload Check Success: $LocalSourceObjectFileName.  Moving to local Success folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationSuccess           
} 

The ForEach loop then repeats with the next file until all are processed.

So now, a failed upload produces the following log:

**********************
Beginning S3 Upload Checks On Following Objects: StephenJKroos-Micrsh-OriginalMix
S3 Upload Check: StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check FAIL: StephenJKroos-Micrsh-OriginalMix.  Moving to local Fail folder
**********************
Windows PowerShell transcript end
**********************

While a successful S3 upload produces this one:

**********************
Beginning S3 Upload Checks On Following Objects: StephenJKroos-Micrsh-OriginalMix
S3 Upload Check: StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check Success: StephenJKroos-Micrsh-OriginalMix.  Moving to local Success folder
**********************
Windows PowerShell transcript end
**********************

PowerShell then shows a final message before ending the transcript:

Write-Output "All files processed.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript

V2: Code

gitignore Version 2

###################
###### FILES ######
###################

#Powershell Transcript log
EDMTracksLosslessS3Upload.log

#PowerShell Files Containing Variables
EDMTracksLosslessS3Upload-V0Basic.ps1

#PowerShell Files Containing Variables
EDMTracksLosslessS3Upload-Variables.ps1


#####################
###### FOLDERS ######
#####################

#VSCode Debugging
.vscode/
Version 2.gitignore On GitHub

V2Visibility.ps1

##################################
####### EXTERNAL VARIABLES #######
##################################


#Load External Variables Via Dot Sourcing
. .\EDMTracksLosslessS3Upload-Variables.ps1

#Start Transcript
Start-Transcript -Path $ExternalTranscriptPath -IncludeInvocationHeader


###############################
####### LOCAL VARIABLES #######
###############################


#Get count of items in $ExternalLocalSource
$LocalSourceCount = (Get-ChildItem -Path $ExternalLocalSource | Measure-Object).Count

#Get list of extensions in $ExternalLocalSource
$LocalSourceObjectFileExtensions = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetExtension($_) }

#Get list of filenames in $ExternalLocalSource
$LocalSourceObjectFileNames = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetFileName($_) }


##########################
####### OPERATIONS #######
##########################


#Check there are files in local folder.
Write-Output "Counting files in local folder."

#If the local file count is less than 1, output this and stop the script.
If ($LocalSourceCount -lt 1) 

{
Write-Output "No Local Files Found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If files are found, output the count and continue.
Else 

{
Write-Output "$LocalSourceCount Local Files Found"          
}


#Check extensions are valid for each file.
Write-Output " "
Write-Output "Checking extensions are valid for each local file."

ForEach ($LocalSourceObjectFileExtension In $LocalSourceObjectFileExtensions) 

{
#If any extension is unacceptable, output this and stop the script. 
If ($LocalSourceObjectFileExtension -NotIn ".flac", ".wav", ".aif", ".aiff") 

{
Write-Output "Unacceptable $LocalSourceObjectFileExtension file found.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If extension is fine, output the extension for each file and continue.
Else 
{
Write-Output "Acceptable $LocalSourceObjectFileExtension file."
}
}


#Check if local files already exist in S3 bucket.
Write-Output " "
Write-Output "Checking if local files already exist in S3 bucket."

#Do following actions for each file in local folder
ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
#Create S3 object key using $ExternalS3KeyPrefix and current object's filename
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

#Create local filepath for each object for the file move
$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName

#Output that S3 upload check is starting
Write-Output "Checking S3 bucket for $LocalSourceObjectFileName"
      
#Attempt to get S3 object data using $LocalSourceObjectFileNameS3Key
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

#If local file found in S3, output this and stop the script.
If ($null -ne $LocalSourceObjectFileNameS3Check) 

{
Write-Output "File already exists in S3 bucket: $LocalSourceObjectFileName.  Please review.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
Exit
}

#If local file not found in S3, report this and continue.
Else 
{
Write-Output "$LocalSourceObjectFileName does not currently exist in S3 bucket."
}
}


#Output that S3 uploads are starting - count and file names
Write-Output " "
Write-Output "Starting S3 Upload Of $LocalSourceCount Local Files."
Write-Output "These files are as follows: $LocalSourceObjectFileNames"
Write-Output " "


#Do following actions for each file in local folder
ForEach ($LocalSourceObjectFileName In $LocalSourceObjectFileNames) 

{
#Create S3 object key using $ExternalS3KeyPrefix and current object's filename
$LocalSourceObjectFileNameS3Key = $ExternalS3KeyPrefix + $LocalSourceObjectFileName 

#Create local filepath for each object for the file move
$LocalSourceObjectFilepath = $ExternalLocalSource + "\" + $LocalSourceObjectFileName

#Output that S3 upload is starting
Write-Output "Starting S3 Upload Of $LocalSourceObjectFileName"

#Write object to S3 bucket
Write-S3Object -BucketName $ExternalS3BucketName -File $LocalSourceObjectFilepath -Key $LocalSourceObjectFileNameS3Key -StorageClass $ExternalS3StorageClass

#Output that S3 upload check is starting
Write-Output "Starting S3 Upload Check Of $LocalSourceObjectFileName"
      
#Attempt to get S3 object data using $LocalSourceObjectFileNameS3Key
$LocalSourceObjectFileNameS3Check = Get-S3Object -BucketName $ExternalS3BucketName -Key $LocalSourceObjectFileNameS3Key

#If $LocalSourceObjectFileNameS3Key doesn't exist in S3, move to local Fail folder.
If ($null -eq $LocalSourceObjectFileNameS3Check) 

{
Write-Output "S3 Upload Check FAIL: $LocalSourceObjectFileName.  Moving to local Fail folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationFail
}

#If $LocalSourceObjectFileNameS3Key does exist in S3, move to local Success folder.
Else 
{
Write-Output "S3 Upload Check Success: $LocalSourceObjectFileName.  Moving to local Success folder"
Move-Item -Path $LocalSourceObjectFilepath -Destination $ExternalLocalDestinationSuccess           
}
}


#Stop Transcript
Write-Output " "
Write-Output "All files processed.  Exiting."
Start-Sleep -Seconds 10
Stop-Transcript
V2Visibility.ps1 On GitHub

VariablesBlank.ps1 Version 2

##################################
####### EXTERNAL VARIABLES #######
##################################

#The local file path for the transcript file
#E.g. "C:\Users\Files\EDMTracksLosslessS3Upload.log"
$ExternalTranscriptPath =

#The local file path for objects to upload to S3
#E.g. "C:\Users\Files\"
$ExternalLocalSource =

#The S3 bucket to upload objects to
#E.g. "my-s3-bucket"
$ExternalS3BucketName =

#The S3 bucket prefix / folder to upload objects to (if applicable)
#E.g. "Folder\SubFolder\"
$ExternalS3KeyPrefix =

#The S3 Storage Class to upload to
#E.g. "GLACIER"
$ExternalS3StorageClass =

#The local file path for moving successful S3 uploads to
#E.g. "C:\Users\Files\"
$ExternalLocalDestinationSuccess =

#The local file path for moving failed S3 uploads to
#E.g. "C:\Users\Files\"
$ExternalLocalDestinationFail =
Version 2 VariablesBlank.ps1 On GitHub

V2: Evaluation

Overall I’m very happy with how this all turned out! Version 2 took a script that worked with some supervision, and turned it into something I can set and forget.

The various checks now have my back if I select the wrong files or if my connection breaks. And, while the Get-S3Object checks mean that I’m making more S3 API calls, the increase won’t cause any bill spikes.

The following is a typical transcript that my script produces following a successful upload of two .flac files:

**********************
Transcript started, output file is C:\Users\Files\EDMTracksLosslessS3Upload.log
Counting files in local folder.
2 Local Files Found

Checking extensions are valid for each local file.
Acceptable .flac file.
Acceptable .flac file.

Checking if local files already exist in S3 bucket.
Checking S3 bucket for MarkOtten-Tranquility-OriginalMix.flac
MarkOtten-Tranquility-OriginalMix.flac does not currently exist in S3 bucket.
Checking S3 bucket for StephenJKroos-Micrsh-OriginalMix.flac
StephenJKroos-Micrsh-OriginalMix.flac does not currently exist in S3 bucket.

Starting S3 Upload Of 2 Local Files.
These files are as follows: MarkOtten-Tranquility-OriginalMix.flac StephenJKroos-Micrsh-OriginalMix.flac

Starting S3 Upload Of MarkOtten-Tranquility-OriginalMix.flac
Starting S3 Upload Check Of MarkOtten-Tranquility-OriginalMix.flac
S3 Upload Check Success: MarkOtten-Tranquility-OriginalMix.flac.  Moving to local Success folder
Starting S3 Upload Of StephenJKroos-Micrsh-OriginalMix.flac
Starting S3 Upload Check Of StephenJKroos-Micrsh-OriginalMix.flac
S3 Upload Check Success: StephenJKroos-Micrsh-OriginalMix.flac.  Moving to local Success folder

All files processed.  Exiting.
**********************
Windows PowerShell transcript end
End time: 20220617153926
**********************

GitHub ReadMe

To round everything off, I’ve written a ReadMe for the repo. This is written in Markdown using the template at makeareadme.com, and the finished article is available here.

Summary

In this post, I created a script to upload lossless music files from my laptop to one of my Amazon S3 buckets using PowerShell.

I introduced automation to perform checks before and after each upload, and logged the outputs to a transcript. I then produced a repo for the scripts, accompanied by a ReadMe document.

Thanks for reading ~~^~~

Categories
Developing & Application Integration

Next-Level S3 Notifications With EventBridge

In this post I will use AWS managed services to enhance my S3 user experience with custom EventBridge notifications that are low cost, quick to set up and perform well at scale.

Introduction

I’ve been restoring some S3 Glacier Flexible Retrieval objects lately. I use bulk retrievals to reduce costs – these finish within 5–12 hours. However, on a couple of occasions I’ve totally forgotten about them and almost missed the download deadline!

Having recently set up some alerting, I decided to make a similar setup that will trigger emails at key points in the retrieval process, using the following AWS services:

  • S3 for holding the objects and managing the retrieval process
  • EventBridge for receiving events from S3 and looking for patterns
  • SNS for sending notifications to me

The end result will look like this:

Let’s start with SNS.

SNS: The Notifier

I went into detail about Amazon Simple Notification Service (SNS) in my last post about making some security alerts so feel free to read that if some SNS terms are unfamiliar.

Here I want SNS to send me emails, so I start by making a new standard topic called s3-object-restore. I then create a new subscription with an email endpoint and link it to my new topic.

This completes my SNS setup. Next I need to make some changes to one of my S3 buckets.

S3: The Storage

Amazon S3 stores objects in buckets. The properties of a bucket can be customised to complement its intended purpose. For example, the Default Encryption property forces encryption on buckets containing sensitive objects. The Bucket Versioning property protects objects from accidental changes and deletes.

Here I’m interested in the Event Notifications property. This property sends notifications when certain events occur in the bucket. Examples of S3 events include uploads, deletes and, importantly for this use case, restore requests.

S3 can send events to a number of AWS services including, helpfully, EventBridge! This isn’t on by default but is easily enabled in the bucket’s properties:

My bucket will now send events to EventBridge. But what is EventBridge?

EventBridge: The Go-Between

Full disclosure. At first I wasn’t entirely sure what EventBridge was. The AWS description did little to change that:

I tend to uncomplicate topics by abstracting them. Here I found it helpful to think of EventBridge as a bus:

  • Buses provide high-capacity transport between bus stops. The bus is EventBridge.
  • Passengers use the bus to get to where they need to go. The passengers are events.
  • Bus stops are where passengers join or depart the bus. The bus stops are event sources and targets.

In the same way that a bus picks up passengers at one bus stop and drops them off at another, EventBridge receives events from a source and directs them to a target.

Much has been written about EventBridge’s benefits. Rather than spending the next few paragraphs copy/pasting, I will instead suggest the following for further reading:

In this use case, EventBridge’s main advantage is that it is decoupled from S3. This allows one EventBridge Rule to serve many S3 buckets. S3 can send notifications to SNS without EventBridge, but each bucket needs configuring separately so this quickly causes headaches with multiple buckets.

My S3 bucket is now sending events to EventBridge, so let’s create an EventBridge rule for them.

EventBridge Rule: Setting A Pattern & Choosing A Source

Rules allow EventBridge to route events from a source to a target. After naming my new rule s3-object-restore, I need to choose what kind of rule I want:

  • Event Pattern: the rule will be triggered by an event.
  • Schedule: the rule will be triggered by a schedule.

I select Event Pattern. EventBridge then poses further questions to establish what events to look for:

  • Event Matching Pattern: Do I want to use EventBridge presets or write my own pattern?
  • Service Provider: Are the events coming from an AWS service or a third party?
  • Service Name: What service will be the source of events?

EventBridge will only present options relevant to the previous choices. For example, choosing AWS as Service Provider means that no third party services are available in Service Name.

My choices so far tell EventBridge that S3 is the event source:

Next up is Event Type. As EventBridge knows the events are coming from S3, the options here are very specific:

I choose Amazon S3 Event Notification.

EventBridge now knows enough to create a rule, and offers the following JSON as an Event Pattern:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Access Tier Changed", "Object ACL Updated", "Object Created", "Object Deleted", "Object Restore Completed", "Object Restore Expired", "Object Restore Initiated", "Object Storage Class Changed", "Object Tags Added", "Object Tags Deleted"]
}

I’m only interested in restores, so I open the Specific Event(s) list and choose the three Object Restore events:

EventBridge then amends the event pattern to:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Restore Completed", "Object Restore Initiated", "Object Restore Expired"]
}
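To make the effect of the narrower pattern concrete, here is a minimal Python sketch of how such a pattern selects events. It is a deliberately simplified matcher that I wrote for illustration, not EventBridge's actual implementation: it only handles top-level keys with lists of exact values. The two sample events are hypothetical and heavily trimmed.

```python
import json

# Event pattern copied from the rule above.
PATTERN = json.loads("""
{
  "source": ["aws.s3"],
  "detail-type": ["Object Restore Completed", "Object Restore Initiated", "Object Restore Expired"]
}
""")


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present in the event,
    and the event's value must be one of the values listed in the pattern.
    Real EventBridge patterns support much more (nested keys, prefixes,
    numeric ranges and so on); none of that is handled here."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Hypothetical, trimmed events for illustration only.
restore_event = {"source": "aws.s3", "detail-type": "Object Restore Initiated"}
upload_event = {"source": "aws.s3", "detail-type": "Object Created"}

print(matches(PATTERN, restore_event))  # True
print(matches(PATTERN, upload_event))   # False
```

The upload event is rejected because its detail-type is no longer in the list, which is the behaviour I want from the amended pattern.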

That’s it for the source. Now EventBridge needs to know what to do when it finds something!

EventBridge Rule: Choosing A Target & Configuring Inputs

One of EventBridge’s big selling points is how it interacts with targets. There are already numerous targets, and EventBridge rules can have more than one.

I select SNS Topic as a target then choose my s3-object-restore SNS topic from the list:

This alone is enough for EventBridge to interact with SNS. When I save this EventBridge rule and trigger it by running an S3 object restore, I receive this email:

Although this is technically a success, some factors aren’t ideal:

  • The formatting of the email is hard to read.
  • There’s a lot of information here, most of which is irrelevant.
  • It’s not immediately clear what this email is telling me.

To address this I can use EventBridge’s Configure Input feature to change what is sent to the target. This feature offers four options:

  • Matched Events: EventBridge passes all of the event text to the target. This is the default.
  • Part Of The Matched Event: EventBridge only sends part of the event text to the target.
  • Constant (JSON text): None of the event text is sent to the target. EventBridge sends user-defined JSON instead.
  • Input Transformer: EventBridge assigns lines of event text as variables, then uses those variables in a template.

Let’s look at the input transformer.

The AWS EventBridge user guide goes into detail about the input transformer and includes a good tutorial. Having consulted these resources, I start by getting the desired JSON from the initial email:

{
  "detail-type": "Object Restore Initiated",
  "source": "aws.s3",
  "time": "2022-02-21T12:51:21Z",
  "detail": {
    "bucket": {"name": "redacted"},
    "object": {"key": "redacted"}
  }
}

Then I convert the JSON into an Input Path:

{
  "bucket": "$.detail.bucket.name",
  "detail-type": "$.detail-type",
  "object": "$.detail.object.key",
  "source": "$.source",
  "time": "$.time"
}

And finally specify an Input Template:

"<source> <detail-type> at <time>. Bucket: <bucket>. Object: <object>"

EventBridge checks input templates before accepting them, and will throw an error if the input template is invalid:

I update my EventBridge rule with the new Input Transformer configuration. Time to test it out!
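For anyone who wants to see the mechanics, the input path and input template can be approximated in a few lines of Python. This is only a sketch of the idea, assuming simple dotted paths. The resolve and transform helpers are my own hypothetical names, and rendering an unresolved path as an empty string is an assumption of the sketch rather than documented EventBridge behaviour.

```python
import re

# Trimmed event taken from the first notification email.
event = {
    "detail-type": "Object Restore Initiated",
    "source": "aws.s3",
    "time": "2022-02-21T12:51:21Z",
    "detail": {
        "bucket": {"name": "redacted"},
        "object": {"key": "redacted"},
    },
}

# Input path and input template as configured on the rule.
input_paths = {
    "bucket": "$.detail.bucket.name",
    "detail-type": "$.detail-type",
    "object": "$.detail.object.key",
    "source": "$.source",
    "time": "$.time",
}
input_template = '"<source> <detail-type> at <time>. Bucket: <bucket>. Object: <object>"'


def resolve(path: str, data: dict):
    """Follow a simple dotted path such as $.detail.bucket.name.
    Returns None when any part of the path is missing."""
    if path.startswith("$."):
        path = path[2:]
    current = data
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def transform(paths: dict, template: str, data: dict) -> str:
    """Fill each <name> placeholder with the value found at its path.
    Unresolved values become empty strings (an assumption of this sketch)."""
    values = {name: resolve(path, data) for name, path in paths.items()}
    return re.sub(r"<([^>]+)>", lambda m: str(values.get(m.group(1)) or ""), template)


print(transform(input_paths, input_template, event))
```

Running the sketch against the trimmed event prints a single line in the same shape as the notification emails in the Testing section.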

Testing

When I trigger an S3 object restore I receive this email moments later:

I then receive a second email when the object is ready for download:

"aws.s3 Object Restore Completed at 2022-03-04T00:15:33Z. Bucket: REDACTED. Object: REDACTED"

And a final one when the object expires:

"aws.s3 Object Restore Expired at 2022-03-05T10:12:04Z. Bucket: REDACTED. Object: REDACTED"

Success!

Before moving on, let me share the results of an earlier test. My very first input path (not included here) contained some mistakes. The input template was valid but it couldn’t read the S3 event properly, so I ended up with this:

Something to bear in mind for future rules!

Cost Analysis

Before I wrap up, let’s run through the expected costs with this setup:

  • SNS: the first thousand SNS email notifications every month are included in the AWS Always Free tier, and I’m nowhere near that!
  • S3: There is no charge for S3 passing events to EventBridge. Charges for object storage and retrieval are out of scope for this post.
  • EventBridge: All events published by AWS services are free.

There is no expected cost rise for this setup based on my current use.

Summary

In this post I’ve used EventBridge and SNS to produce free bespoke notifications at key points in the S3 object retrieval process. This offers me the following benefits:

  • Reassurance: I can choose the longer S3 retrieval offerings knowing that AWS will keep me updated on progress.
  • Convenience: I will know the status of retrievals without accessing the AWS console or using the CLI.
  • Cost: I am less likely to forget to download retrieved objects before expiry, and therefore less likely to need to retrieve those objects again.

Thanks for reading ~~^~~

Categories
Security & Monitoring

Creating Security Alerts For AWS Console Access

In this post I will use AWS managed services to produce security alerts when attempts are made to access my AWS account’s console.

Table of Contents

Introduction

I am currently studying towards the AWS Certified Developer – Associate certification using Stéphane Maarek’s video course and the Tutorials Dojo practice exams. As part of my studies I want to better understand the various AWS monitoring services, and setting up some security alerts is a great way to get some real-world experience of them.

My current security posture is already in line with AWS best practices. For example:

  • I use a password manager and autogenerate passwords so they’re not reused.
  • MFA is enabled on everything offering it.
  • I have created IAM users and roles for all my AWS requirements and never use my root account.

To strengthen this, I will create some security alerts that will notify me when attempts are made to access my AWS console whether they succeed or fail.

I will be using the following AWS services:

  • IAM for handling authentication requests.
  • CloudTrail for creating log events.
  • CloudWatch for analysing log events and triggering alarms.
  • SNS for sending notifications when alarms are triggered.

The end result will look like this:

Let’s start with IAM.

IAM

AWS Identity and Access Management (IAM) is the AWS tool for managing permissions and access policies. I don’t need to do anything with IAM here, but I include it as IAM is the source of the events that my security alerts will use.

Next I’ll create the setup that AWS will use to send the security alerts.

SNS

Amazon Simple Notification Service (SNS) focuses on delivering notifications from sources to subscribers. SNS offers hundreds of potential combinations, and here I’m using it to send notifications to me in response to certain AWS events.

To do this, SNS uses Topics for what the notifications are and Subscriptions for where to send them.

SNS Topics

SNS Topics can be heavily customised but here I only need a simple setup. First I choose my topic’s type:

Standard is fine here as I don’t need to worry about message ordering or duplication. I also need a name for the topic. I will use the following naming pattern for the topics and for the security alerts themselves:

  • Action – this will always be signin.
  • Signin Method – this will always be console.
  • Outcome – this will be either failure or success.
  • User Type – this will be either iam or root.

Thus my first topic is named signin-console-failure-iam.
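As a quick illustration, the naming pattern can be expanded mechanically. The Python sketch below only builds the name strings and does not touch AWS at all. The variable names are mine.

```python
from itertools import product

ACTION = "signin"                   # always signin
METHOD = "console"                  # always console
OUTCOMES = ("failure", "success")
USER_TYPES = ("iam", "root")

# One name per outcome / user type combination.
topic_names = [
    f"{ACTION}-{METHOD}-{outcome}-{user_type}"
    for outcome, user_type in product(OUTCOMES, USER_TYPES)
]

for name in topic_names:
    print(name)
```

This yields four names, starting with signin-console-failure-iam and ending with signin-console-success-root.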

I am creating four security alerts and want each to have a separate SNS topic generating different notifications. A short time later I have created all four successfully:

SNS Subscriptions

Now I need some SNS Subscriptions. These tell SNS where to send notifications.

An SNS Subscription needs the following details:

  • Topic ARN – the Amazon Resource Name of the desired SNS Topic.
  • Protocol – there are several choices including email and SMS.
  • Endpoint – this will depend on the choice of protocol. Here I have selected the Email protocol so SNS requests an email address.

Once an SNS Subscription is created it must be confirmed. Here my endpoint is an email address, so SNS sends this email:

When the owner of the email confirms the subscription, a new window opens displaying the Subscription ID:

I then create further subscriptions for each SNS Topic. The SNS Subscription dashboard updates to show the list of endpoints and the confirmation status of each:

Note that signin-console-success-root has two subscriptions – one email and one SMS. This is because I never use my root account and want the heightened awareness of an SMS!

In terms of cost, the first thousand SNS email notifications every month are included in the AWS Always Free tier. Any costs will be from the infrequent SMS notifications.

With the alerts created, let’s start on the events they’ll alert against.

CloudTrail

AWS CloudTrail records user activity and API interactions. These include resource creation, service configuration and, crucially, sign-in activity.

By default, CloudTrail holds 90 days of events that can be viewed, searched and downloaded from the CloudTrail dashboard. A CloudTrail Trail is needed to store events for longer periods or to export them to other AWS services.

CloudTrail Trails

Here my CloudTrail Trail will be delivering events from CloudTrail to CloudWatch for analysis and storing them in an S3 bucket. I use the AWS CloudTrail documentation to create an events-management trail based in eu-west-1:

This trail is included in the AWS Always Free tier as it is my only one and it only records management events. There will be S3 charges for the objects in the bucket, but these usually total around 30 KB a day so the cost here is trivial.

CloudTrail Logs

While I’m talking about CloudTrail, let’s look at the log events themselves. The CloudTrail user guide has some examples so let’s examine one.

This example log event shows that the IAM user Alice used the AWS CLI to create a new user named Bob.

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "userName": "Alice"
    },
    "eventTime": "2014-03-24T21:11:59Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "127.0.0.1",
    "userAgent": "aws-cli/1.3.2 Python/2.7.5 Windows/7",
    "requestParameters": {"userName": "Bob"},
    "responseElements": {"user": {
        "createDate": "Mar 24, 2014 9:11:59 PM",
        "userName": "Bob",
        "arn": "arn:aws:iam::123456789012:user/Bob",
        "path": "/",
        "userId": "EXAMPLEUSERID"
    }}
}]}

Let’s break this down. The first section of the log tells us about Alice:

    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "userName": "Alice"

Her userName is Alice, she is an IAMUser and her accountId is 123456789012. The next part tells us what happened:

    "eventTime": "2014-03-24T21:11:59Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "127.0.0.1",
    "userAgent": "aws-cli/1.3.2 Python/2.7.5 Windows/7"

At 21:11 on 24/03/2014 Alice used the AWS CLI to call iam.amazonaws.com’s CreateUser action. Alice’s IP address was 127.0.0.1 and she was using us-east-2.

Finally the log shows the parameters Alice supplied:

    "requestParameters": {"userName": "Bob"},
    "responseElements": {"user": {
        "createDate": "Mar 24, 2014 9:11:59 PM",
        "userName": "Bob",
        "arn": "arn:aws:iam::123456789012:user/Bob"

Alice provided a userName of Bob, and AWS responded with a createDate and arn for the new user.
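The same breakdown can be done programmatically. The Python sketch below uses an abbreviated copy of the example record and a hypothetical summarise helper of my own to pull out the who, what, where and when. It assumes the record has a userName field, which will not be true of every identity type.

```python
import json

# Abbreviated copy of the example CloudTrail record above.
log_file = json.loads("""
{"Records": [{
    "userIdentity": {"type": "IAMUser", "accountId": "123456789012", "userName": "Alice"},
    "eventTime": "2014-03-24T21:11:59Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "127.0.0.1",
    "requestParameters": {"userName": "Bob"}
}]}
""")


def summarise(record: dict) -> str:
    """Build a one-line summary from the fields discussed above."""
    identity = record["userIdentity"]
    return (
        f'{record["eventTime"]}: {identity["type"]} {identity["userName"]} '
        f'called {record["eventSource"]} {record["eventName"]} '
        f'in {record["awsRegion"]} from {record["sourceIPAddress"]}'
    )


for record in log_file["Records"]:
    print(summarise(record))
```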

Armed with the knowledge of what an event looks like, let’s start analysing some of them!

CloudWatch

Amazon CloudWatch is a service for monitoring, analysing and observing events. CloudWatch has several features for various situations – I’ll be using the following features for my security alerts:

  • Log Groups – collections of streams of events coming from the same source.
  • Metric Filters – filter expressions that are applied to events to create data points for CloudWatch metrics.
  • Alarms – rules that watch CloudWatch metrics and perform actions based on their values.

Let’s start with a log group.

CloudWatch Log Group

As this is a new Log Group, I can configure it from the CloudTrail console by editing the existing events-management trail.

Nothing too complex here. The Log Group needs a name – aws-cloudtrail-logs-signin-console. It also needs an IAM role. This is essential, as otherwise CloudTrail can’t send the events to CloudWatch. In these situations the console usually offers to create a new role, which I call CloudTrail-CloudWatchLogs-Signin-Console.

That’s it! Now that CloudWatch is receiving the logs it needs to know what to look for.

CloudWatch Metric Filters

The Metric Filters are going to look at events in the Log Group and use them to create data points for my alarms. There are two steps to creating a metric filter:

  • A pattern must be defined for the filter.
  • A metric must be assigned to the filter.

Defining A Pattern

CloudWatch needs to know the terms and/or patterns to look for in the Log Group. This is done by specifying a filter pattern, and the AWS user guide on filter and pattern syntax covers the available options.

For the signin-console-failure-iam alert, the filter will be:

{ $.eventSource = "signin.amazonaws.com" && $.eventName = "ConsoleLogin" && $.responseElements.ConsoleLogin = "Failure" && $.userIdentity.type = "IAMUser" }

To break this down:

For limiting events to sign-ins I use $.eventSource = "signin.amazonaws.com". This will be the same for all filters.

I only want to know about AWS console logins so I add $.eventName = "ConsoleLogin". This will also be the same for all filters.

This filter only cares about failed events, so I need to add $.responseElements.ConsoleLogin = "Failure". This will change to "Success" for other filters.

Finally I want to limit this filter to IAM users, and so add $.userIdentity.type = "IAMUser". This will change to "Root" for other filters. Filter patterns are case sensitive, and CloudTrail records the root user’s type with a capital R.

To aid understanding, this will be the filter for the signin-console-success-root alert:

{ $.eventSource = "signin.amazonaws.com" && $.eventName = "ConsoleLogin" && $.responseElements.ConsoleLogin = "Success" && $.userIdentity.type = "Root" }
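To show what these filters do in practice, here is a small Python sketch that applies the signin-console-failure-iam conditions to two sign-in events. It is a simplified stand-in for CloudWatch's filter engine that only supports equality checks joined with &&. The helper names and the trimmed sample events are hypothetical.

```python
def get_path(event: dict, path: str):
    """Look up a dotted path such as responseElements.ConsoleLogin.
    Returns None if any part of the path is missing."""
    current = event
    for part in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(part)
    return current


# The four conditions from the signin-console-failure-iam filter.
FAILURE_IAM = {
    "eventSource": "signin.amazonaws.com",
    "eventName": "ConsoleLogin",
    "responseElements.ConsoleLogin": "Failure",
    "userIdentity.type": "IAMUser",
}


def matches_filter(event: dict, conditions: dict) -> bool:
    """True only when every condition holds (the && in the filter).
    Comparisons are exact, so they are case sensitive."""
    return all(get_path(event, path) == value for path, value in conditions.items())


# Hypothetical, trimmed sign-in events for illustration only.
failed_iam_signin = {
    "eventSource": "signin.amazonaws.com",
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Failure"},
    "userIdentity": {"type": "IAMUser"},
}
successful_iam_signin = {
    **failed_iam_signin,
    "responseElements": {"ConsoleLogin": "Success"},
}

print(matches_filter(failed_iam_signin, FAILURE_IAM))      # True
print(matches_filter(successful_iam_signin, FAILURE_IAM))  # False
```

Swapping the last two condition values gives the equivalent checks for the other three alerts.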

Assigning A Metric

CloudWatch now needs to know about the metrics to create. First it needs a name for the new filter, one for the metric itself and one for the namespace that will contain the metric.

Then I need to make some decisions about the metric values, which I set as follows:

Nothing elaborate here – the metric will be one every time a matching event is found and zero otherwise. I originally left the default value blank but this led to some undesirable alarm behaviour, which I will go into later.

Speaking of alarms…

CloudWatch Alarms

At this point CloudWatch understands what I’m looking for but has no way to tell me if it finds anything. The alarms will provide that missing link!

Creating an alarm starts by selecting one of the new metrics and choosing how often CloudWatch will check it. Then I set the conditions the alarm will use:

Here I want the alarm to trigger when the metric is greater than zero.

CloudWatch now needs to know what actions it must perform when an alarm triggers. CloudWatch alarms have three states:

  • OK (Green) – The metric or expression is within the defined threshold.
  • In Alarm (Red) – The metric or expression is outside of the defined threshold.
  • Insufficient Data (Grey) – The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.

Here I tell CloudWatch to notify the signin-console-failure-iam SNS Topic when the corresponding metric is outside of the defined threshold of zero and enters the In Alarm state.

This is why I created the SNS resources first. It makes CloudWatch alarm creation a lot smoother.

Three additional alarms later I’m all done! As AWS provide ten custom metrics and ten alarms in the Always Free tier, my new CloudWatch setup will be free.

Insufficient Data (Grey) Alarms

Before moving on, let’s talk about grey alarms for a moment.

While grey alarms generally aren’t a problem in CloudWatch, they don’t look great from a human perspective. While green suggests everything’s fine and red suggests problems, grey is more vague. Everything could be ok. There might be problems. Not ideal.

This is why I set my default metric values to zero in the Metric Filters section. When no default value was set, CloudWatch considered the data to be missing and set the alarm state to Insufficient Data unless it was alerting. While this isn’t a problem, the DBA in me will always prefer green states to grey ones!

During an alarm’s configuration, it is possible to change the treatment of missing data:

I did try this out and it did work as expected – the grey alarm turned green when told to treat missing data as good. But this would mean that any missing data would be treated as good. That setting did not fill me with reassurance for this use case!
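The trade-off is easier to see in code. This Python sketch models a single evaluation of one of my alarms: the threshold is zero, a missing datapoint is represented by None, and the missing_as_good flag stands in for the console's option to treat missing data as good. It is a simplification under those assumptions, as real alarms evaluate datapoints over configurable periods.

```python
from typing import Optional


def alarm_state(
    datapoint: Optional[float],
    threshold: float = 0,
    missing_as_good: bool = False,
) -> str:
    """Return the alarm state for one datapoint.
    None represents a period where the metric reported no data."""
    if datapoint is None:
        # No data: grey unless missing data is treated as good.
        return "OK" if missing_as_good else "INSUFFICIENT_DATA"
    # The alarm triggers when the metric is greater than the threshold.
    return "ALARM" if datapoint > threshold else "OK"


print(alarm_state(None))                        # INSUFFICIENT_DATA
print(alarm_state(0))                           # OK
print(alarm_state(1))                           # ALARM
print(alarm_state(None, missing_as_good=True))  # OK
```

With a default metric value of zero the alarm always has a datapoint to evaluate, so it sits at OK without needing to treat genuinely missing data as good.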

Does This All Work?

Let’s find out! I have four CloudWatch alarms, each partnered with a different SNS topic. This should mean I get a different notification for each type of event when the alarms trigger.

Here is my CloudWatch Alarms dashboard in its base state with no alarms triggered.

This is the same dashboard with all alarms triggered.

Failing to sign in as an IAM user triggered this email:

While a successful IAM sign-in triggered this one:

Failing to sign in as the root user triggered this email:

And a successful root sign in triggered this email:

And this SMS:

Cost Analysis

The eagle-eyed will have noticed that some of the dates in these screenshots are from early February. I was going to publish this post around that time too, but I wanted to finish my T-SQL Tuesday post first and then had a busy week.

This means I can demonstrate the actual costs of my security alerts though!

These AWS Billing screenshots are from 2022-02-13, so a week after the earlier screenshots. Be aware that, as IAM is always free, it has no entry on the bill.

First of all, CloudTrail and CloudWatch:

As expected, well within the AWS Always Free tier limits and no charges. Next is SNS:

The cost here is for SMS notifications. I’ve triggered two in testing so that’s averaging 0.04 USD each. This cost is acceptable considering the peace of mind it gives – these are the notifications for the successful root account sign-ins!

Summary

In this post I’ve demonstrated how a number of AWS managed services work together to turn a collection of events into meaningful security alerts that give me peace of mind when I’m signed out of my AWS account. I’ve also analysed the costs of my setup and have used various AWS Always Free tier offerings to minimise the impact on my monthly AWS bill.

Thanks for reading ~~^~~