
Stress-Free AWS Invoice PDF Administration With Power Automate

In this post, I create a stress-free AWS invoice PDF administration workflow with Microsoft Power Automate.


Introduction

Each month, AWS sends me VAT invoices for all of my AWS accounts. Their filenames aren’t descriptive, with names like EUINGB21-2079861, EUINGB22-3276756 and EUINGB23-2216483. As a result, I rename them to make the contents clearer.

Now while I have the best intentions in the world, life happens and the PDFs usually end up in a to-do folder. As a result, I now have a folder like this:

[Screenshot: a to-do folder full of cryptically named AWS invoice PDFs]

I don’t want to sort through that. No one wants to sort through that. There must be another way! After successfully automating my application management with Winget, I’m keen to see what options I have here.

Firstly, let’s examine what currently happens with my new AWS invoice PDFs.

Manual Solution

This section examines my manual AWS invoice PDF workflow.

Here are several sample AWS invoices for reference. Although they’re from 2018, the 2023 bills are mostly the same. Let’s focus on a document header:

[Screenshot: the header of a sample AWS VAT invoice]

I copy the account number and VAT invoice date from a bill and combine them with some extra strings to produce a filename schema of:

AWS-{VAT Invoice Date}-{Account Number}-VATInvoice

In this example, that would produce:

AWS-2018-08-03-292122068034-VATInvoice

There are several pain points here:

  • The PDF must be opened to view the data.
  • The invoice date needs converting to the ISO 8601 format.
  • An open file can’t be renamed, so the new filename must be created in Notepad and then copy/pasted over.
  • Everything is manual and needs my full attention.

Wouldn’t it be nice if all this could be automated?

Microsoft Power Automate

This section examines Microsoft Power Automate and how it works.

What Is Power Automate?

Microsoft Power Automate is an automation tool. It uses triggers and actions to create workflows that complete manual, repetitive, and time-consuming tasks without human involvement. Power Automate is part of the Microsoft Power Platform.

Benefits of Power Automate include:

  • A no-code GUI with drag-and-drop functionality.
  • An extensive library of pre-defined templates.
  • Onboard testing and deployment features.
  • A free plan for work or school accounts.
  • Integration with the rest of the Power Platform, Office 365, and other Microsoft and third-party services.

Power Automate Versions

Power Automate has two versions:

  • Power Automate (cloud) focuses on cloud flows that connect online services like Office 365 and third-party APIs through triggers and connectors.
  • Power Automate Desktop focuses on desktop flows like file & folder manipulation, user input and script triggering. I’ll be using Power Automate Desktop in this post.

It’s worth pointing out that there are several visual differences between the two versions. This confused me when researching this post, as most screenshots I found were from the cloud service!

Power Automate Flows

Central to Power Automate is the concept of Flows. A Power Automate flow is a sequence of actions that are triggered by an event, performing a series of operations to achieve the desired results.

Flows are created through a visual interface that defines triggers, actions, and conditions using a drag-and-drop approach. Code can be written, but is optional. Completed flows can then be saved, published and run as needed.

Creating A Power Automate Flow

In this section, I create my AWS invoice Power Automate flow.

Getting All Files In The Folder

Firstly, I capture the AWS invoice PDFs using the Get Files In Folder action:

[Screenshot: the Get Files In Folder action settings]

Here, I specify the folder containing the invoices. I tell Power Automate to only get PDF files, to stop any other file types causing errors.

Power Automate stores a list of these files in a %Files% variable. I want to work on each file separately, so I pass the %Files% variable to a For Each action:

[Screenshot: the For Each action settings]

This creates a loop that lets Power Automate capture each file in the %Files% variable. Each time the For Each action triggers, Power Automate stores a file from %Files% in a new %CurrentItem% variable.

Each subsequent action within the loop will then be applied to the %CurrentItem% file. At the end of the loop, Power Automate captures the next file from %Files% and overwrites the %CurrentItem% variable with it. This continues until all %Files% files are processed and the loop ends.
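
For readers who think in code, a rough PowerShell parallel of these two actions would look something like this. It isn’t part of the flow, and the folder path is just an example:

#Get all PDF files in the source folder (the %Files% variable)
$Files = Get-ChildItem -Path 'C:\Invoices\ToDo' -Filter '*.pdf' -File

#Loop over each file (the %CurrentItem% variable)
ForEach ($CurrentItem In $Files)
{
    #Each action inside the loop works on $CurrentItem
    Write-Output $CurrentItem.FullName
}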

Getting The PDF Text

Secondly, I need to get the text from each %CurrentItem% PDF. There’s a great Extract Text From PDF action that can do this:

[Screenshot: the Extract Text From PDF action settings]

The PDF text is stored in an %ExtractedPDFText% variable that is going to be very useful for the rest of this flow!

So far, the flow looks like this:

[Screenshot: the flow so far, with the Get Files In Folder, For Each and Extract Text From PDF actions]

With the PDF text captured, the rest of the automation can begin!

Getting The Account Number

Next, let’s examine the PDF text more closely. This is part of %ExtractedPDFText% for an AWS invoice:

VAT Invoice
Email or talk to us about your AWS account or bill, visit console.aws.amazon.com/support
More information regarding your service charges is available by accessing your Billing Management Console
Account number:
[REDACTED]
Address:
[REDACTED]
Invoice Summary
VAT Invoice Number: EUINGB22-3617620
VAT Invoice Date: September 2, 2022
TOTAL AMOUNT GBP 0.34

On AWS invoices, the account number is always between ‘Account number:’ and ‘Address:’. Knowing this, I can use Crop Text to get the text between these strings:

[Screenshot: the Crop Text action settings for the account number]

This action produces two new variables:

  • %CroppedText%
  • %IsFlagFound%

Usually, these names would be fine. This isn’t the only text I’m going to be cropping though, so I change these to %AWSAccNoCroppedText% and %AWSAccNoIsFlagFound% respectively.

Before continuing, I’d like to check that %AWSAccNoCroppedText% contains the expected value. A good way to test this is to use the Display Message action:

[Screenshot: the Display Message action output showing the cropped account number]

(The account number is obfuscated here, but it was correct!)
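
As an aside, the same ‘crop text between two flags’ idea can be sketched in PowerShell with a regular expression. This isn’t how Power Automate does it internally, and the sample values are made up – it just illustrates the logic:

#Sample extracted PDF text
$ExtractedPDFText = @"
Account number:
123456781234
Address:
[REDACTED]
"@

#Capture everything between the two flags, including any surrounding line breaks
If ($ExtractedPDFText -match '(?s)Account number:(.*?)Address:')
{
    $AWSAccNoCroppedText = $Matches[1]
}

Write-Output $AWSAccNoCroppedText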

Getting The Invoice Date

Now to get the invoice date! This is similar to getting the account number, using Crop Text to capture the date between ‘VAT Invoice Date:’ and ‘TOTAL AMOUNT’:

[Screenshot: the Crop Text action settings for the invoice date]

This produces variables %CroppedText% and %IsFlagFound% again, which I then change to %AWSBillDateCroppedText% and %AWSBillDateIsFlagFound%.

This isn’t enough this time though. Currently, the %AWSBillDateCroppedText% value is a date formatted as August 3, 2018. This won’t produce the filename I want, as I need the date to be in the ISO format of 2018-08-03.

To do this, I use the Convert Text To Datetime action. This converts a text representation of a date value to a datetime value:

[Screenshot: the Convert Text To Datetime action settings]

I supply %AWSBillDateCroppedText% to convert. A datetime value of 03/08/2018 00:00:00 is returned, which I then assign to a %AWSBillDateTextAsDateTime% variable.

This is still unacceptable though! The date format doesn’t match the requirement and the slashes and colons are not legal filename characters.

To get the required date format, I use the Convert Datetime to Text action to convert 03/08/2018 00:00:00 to the custom format yyyy-MM-dd, in which:

  • yyyy is the year as a four-digit number.
  • MM is the month as a two-digit number, from 01 through 12.
  • dd is the day of the month as a two-digit number, from 01 through 31.
  • - is…a hyphen.
[Screenshot: the Convert Datetime To Text action settings]

There is a table describing the custom date and time format specifiers in the Microsoft documentation.

This action produces a %AWSBillDateFormattedDateTime% variable with a value of 2018-08-03. Success!
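
The same two conversions can be sketched in PowerShell using the .NET datetime methods. Again, this is just an illustration of the logic rather than part of the flow:

#The cropped invoice date text
$AWSBillDateCroppedText = 'August 3, 2018'

#Convert Text To Datetime
$AWSBillDateTextAsDateTime = [datetime]::ParseExact($AWSBillDateCroppedText, 'MMMM d, yyyy', [System.Globalization.CultureInfo]::GetCultureInfo('en-US'))

#Convert Datetime To Text using the yyyy-MM-dd custom format
$AWSBillDateFormattedDateTime = $AWSBillDateTextAsDateTime.ToString('yyyy-MM-dd')

#Returns 2018-08-03
Write-Output $AWSBillDateFormattedDateTime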

Creating The New Filename

Now I can start thinking about the filename! To recap, I currently have:

  • A %AWSAccNoCroppedText% variable with a value of (for example) 123456781234.
  • A %AWSBillDateFormattedDateTime% variable with a value of 2018-08-03.

To create the filename, I need to combine my variables with some additional strings. For this, I use the Set Variable action to create a %AWSFilename% variable with the following value:

AWS-%AWSBillDateFormattedDateTime%-%AWSAccNoCroppedText%-VATInvoice

This breaks down to:

  • AWS- string
  • The %AWSBillDateFormattedDateTime% variable
  • A hyphen: -
  • The %AWSAccNoCroppedText% variable
  • -VATInvoice string

This should produce a string like AWS-2018-08-03-123456781234-VATInvoice. However, this was %AWSFilename%’s initial output:

AWS-2018-08-03-
123456781234
-VATInvoice

This is unacceptable as a filename, but why is it happening?

It turns out that the Crop Text action capturing the account number is also capturing the line breaks on either side of it. I remove these with the Trim Text action:

[Screenshot: the Trim Text action settings]

This creates a new %AWSAccNoTrimmedAndCroppedText% variable. Replacing %AWSAccNoCroppedText% with this new variable when setting the %AWSFilename% variable produces the required output:

AWS-2018-08-03-123456781234-VATInvoice
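
In PowerShell terms, the trim and filename steps look something like this (the sample values are made up):

#Sample values
$AWSAccNoCroppedText = "`r`n123456781234`r`n"
$AWSBillDateFormattedDateTime = '2018-08-03'

#Trim Text removes the stray line breaks
$AWSAccNoTrimmedAndCroppedText = $AWSAccNoCroppedText.Trim()

#Set Variable builds the filename
$AWSFilename = "AWS-$AWSBillDateFormattedDateTime-$AWSAccNoTrimmedAndCroppedText-VATInvoice"

#Returns AWS-2018-08-03-123456781234-VATInvoice
Write-Output $AWSFilename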

The %AWSFilename% variable can then be used by the Rename File(s) action to rename the file assigned to %CurrentItem% by the earlier For Each action:

[Screenshot: the Rename File(s) action settings]

The updated file is assigned to a %AWSRenamedFiles% variable, which I use in the final part of the loop.

Moving The File

I usually move renamed invoices to a different path manually. I might as well add this process to my Power Automate flow since it will save me even more time!

Here, I move the file assigned to the %AWSRenamedFiles% variable to my destination folder using the Move File(s) action:

[Screenshot: the Move File(s) action settings]
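
As a rough PowerShell parallel, the rename and move steps together look something like this (the paths and names are made up for illustration):

#The current file and new filename (sample values)
$CurrentItem = Get-Item 'C:\Invoices\ToDo\EUINGB21-2079861.pdf'
$AWSFilename = 'AWS-2018-08-03-123456781234-VATInvoice'

#Rename File(s)
$AWSRenamedFiles = Rename-Item -Path $CurrentItem.FullName -NewName "$AWSFilename.pdf" -PassThru

#Move File(s) to the destination folder
Move-Item -Path $AWSRenamedFiles.FullName -Destination 'C:\Invoices\Renamed'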

This is the last action in the loop. Or at least it was, until I reviewed the flow and identified a few improvements.

Enhancements

Originally, this Power Automate flow was just for AWS invoices. However, with some additional Power Automate actions the flow could easily extend to other kinds of invoices. This would allow one flow to handle multiple invoice types instead of each type needing its own flow.

To begin, I limit the scope of the flow’s actions by adding an If action inside the For Each loop. It will now only apply the AWS invoice flow actions if %ExtractedPDFText% contains the string ‘AWS account or bill’:

[Screenshot: the If action settings]

Non-AWS invoices will not meet this condition, and will not be mislabeled by the flow or cause the flow to fail.

So what will happen to non-AWS invoices then? Currently, they’ll just be left in the source folder. Ideally, something should happen to them too, so let’s add a dead letter queue to the flow.

I add two new actions to the very end of the flow, positioned after the For Each loop has processed each file in the source folder:

[Screenshot: the two dead letter queue actions positioned after the For Each loop]

Both actions have already been used in this flow, but this time their focus has changed. By the time Power Automate reaches these actions, all files in the source folder will have gone through the For Each loop.

At this point, one of two sequences has happened:

  • The file met the AWS invoice condition and has been renamed and moved to the destination folder.
  • The file didn’t meet the AWS invoice condition and is still in the source folder.

So the only files now in the source folder are not AWS invoices.

The Get Files In Folder action will get these files and store them in a %DLQFiles% variable. This is then passed to the Move File(s) action, which moves the unprocessed files to a PowerAutomateDLQ folder.
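
A rough PowerShell parallel of those two actions, with made-up paths:

#Get everything still in the source folder
$DLQFiles = Get-ChildItem -Path 'C:\Invoices\ToDo' -File

#Move the unprocessed files to the dead letter queue folder
Move-Item -Path $DLQFiles.FullName -Destination 'C:\Invoices\PowerAutomateDLQ'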

The end result is that:

  • The source folder is empty.
  • AWS invoices are renamed and then moved to the destination folder.
  • Any other files are moved to the PowerAutomateDLQ folder.

So, does it all work?

Testing

This section tests my AWS invoice Power Automate flow.

I put three files into the source folder and then ran the Power Automate flow:

  • A DALL-E PNG.
  • An AWS invoice PDF.
  • A utility bill PDF.
[Screenshot: the three test files in the source folder]

The AWS invoice is renamed and then moved to the destination folder:

[Screenshot: the renamed AWS invoice in the destination folder]

The remaining files are unaltered and moved to the DLQ folder.

The flow also works far faster than my manual workflow. The old process – from opening the PDF to moving the renamed file – took about two minutes per invoice on average. In contrast, the flow processed 12 files in 2 seconds – doing that manually would have taken me around 24 minutes!

Summary

In this post, I created a stress-free AWS invoice PDF administration workflow with Microsoft Power Automate.

I’m very impressed by Power Automate! It can certainly save me a ton of time with tasks I currently do, and with other actions like the ability to run PowerShell and Python scripts, execute SQL statements and emulate the terminal, I feel like I haven’t even scratched the surface yet!

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~


Production Code Qualities

In this post, I respond to November 2022’s T-SQL Tuesday #156 Invitation and give my thoughts on some production code qualities.

[Image: T-SQL Tuesday logo]


Introduction

This month, Tomáš Zíka’s T-SQL Tuesday invitation was as follows:

Which quality makes code production grade?

Please be as specific as possible with your examples and include your reasoning.

Good question!

In each section, I’ll use a different language. Firstly I’ll create a script, and then show a problem the script could encounter in production. Finally, I’ll show how a different approach can prevent that problem from occurring.

I’m limiting myself to three production code qualities to keep the post at a reasonable length, and so I can show some good examples.

Precision

In this section, I use T-SQL to show how precise code in production can save a data pipeline from unintended failure.

Setting The Scene

Consider the following SQL table:

USE [amazonwebshark]
GO

CREATE TABLE [2022].[sharkspecies](
	[shark_id] [int] IDENTITY(1,1) NOT NULL,
	[name_english] [varchar](100) NOT NULL,
	[name_scientific] [varchar](100) NOT NULL,
	[length_max_cm] [int] NULL,
	[url_source] [varchar](1000) NULL
)
GO

This table contains a list of sharks, courtesy of the Shark Foundation.

Now, let’s say that I have a data pipeline that uses data in amazonwebshark.2022.sharkspecies for transformations further down the pipeline.

No problem – I create a #tempsharks temp table and insert everything from amazonwebshark.2022.sharkspecies using SELECT *:
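
A minimal sketch of that kind of script (not the exact original) looks something like this:

--Mirror the table definition, then insert everything with SELECT *
CREATE TABLE #tempsharks(
	[shark_id] [int] NOT NULL,
	[name_english] [varchar](100) NOT NULL,
	[name_scientific] [varchar](100) NOT NULL,
	[length_max_cm] [int] NULL,
	[url_source] [varchar](1000) NULL
)

INSERT INTO #tempsharks
SELECT *
FROM [amazonwebshark].[2022].[sharkspecies]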

When this script runs in production, I get two tables with the same data:

[Screenshot: query results showing both tables returning the same data]

What’s The Problem?

One day a new last_evaluated column is needed in the amazonwebshark.2022.sharkspecies table. I add the new column and backfill it with 2019:

ALTER TABLE [2022].sharkspecies
ADD last_evaluated INT DEFAULT 2019 WITH VALUES
GO

However, my script now fails when trying to insert data into #tempsharks:

(1 row affected)

(4 rows affected)

Msg 213, Level 16, State 1, Line 17
Column name or number of supplied values does not match table definition.

Completion time: 2022-11-02T18:00:43.5997476+00:00

#tempsharks has five columns but amazonwebshark.2022.sharkspecies now has six. My script is now trying to insert all six sharkspecies columns into the temp table, causing the msg 213 error.

Doing Things Differently

The solution here is to replace the script’s SELECT * with the precise columns to insert from amazonwebshark.2022.sharkspecies:
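
One way to write the fixed insert, naming each column explicitly:

INSERT INTO #tempsharks
	(shark_id, name_english, name_scientific, length_max_cm, url_source)
SELECT
	shark_id, name_english, name_scientific, length_max_cm, url_source
FROM [amazonwebshark].[2022].[sharkspecies]

The temp table and the INSERT now agree on exactly five columns, so later additions to sharkspecies won’t break the load.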

While amazonwebshark.2022.sharkspecies now has six columns, my script is only inserting five of them into the temp table:

[Screenshot: query results showing the five named columns inserted into #tempsharks]

I can add the last_evaluated column into #tempsharks in future, but its absence in the temp table isn’t causing any immediate problems.

Works The Same In Other Environments

In this section, I use Python to show the value of production code that works the same in non-production.

Setting The Scene

Here I have a Python script that reads data from an Amazon S3 bucket using a boto3 session. I pass my AWS_ACCESSKEY and AWS_SECRET credentials in from a secrets manager, and create an s3bucket variable for the S3 bucket path:
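
A stripped-down sketch of that setup (the bucket path and variable names are illustrative, and in the real script the credentials come from the secrets manager):

import boto3

# Credentials retrieved from the secrets manager in the real script
AWS_ACCESSKEY = "..."
AWS_SECRET = "..."

session = boto3.session.Session(
    aws_access_key_id=AWS_ACCESSKEY,
    aws_secret_access_key=AWS_SECRET,
)
s3 = session.client("s3")

# Hard-coded bucket path - this is where the trouble starts
s3bucket = "s3://dev-bucket"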

When I deploy this script to my dev environment it works fine.

What’s The Problem?

When I deploy this script to production, s3bucket will still be s3://dev-bucket. The potential impact of this depends on the AWS environment setup:

Different AWS account for each environment:

  • dev-bucket doesn’t exist in Production. The script fails.

Same AWS account for all environments:

  • Production IAM roles might not have any permissions for dev-bucket. The script fails.
  • Production processes might start using a dev resource. The script succeeds but now data has unintentionally crossed environment boundaries.

Doing Things Differently

A solution here is to dynamically set the s3bucket variable based on the ID of the AWS account the script is running in.

I can get the AccountID using AWS STS. I’m already using boto3, so I can use it to initiate an STS client with my AWS credentials.

STS then has a GetCallerIdentity action that returns the AWS AccountID linked to the AWS credentials. I capture this AccountID in an account_id variable, then use that to set s3bucket’s value:
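
Here’s a rough sketch rather than the original script; the account IDs and bucket names are made up:

import boto3

# The real script builds the client with the secrets manager credentials;
# a default client keeps this sketch short
sts = boto3.client("sts")

# GetCallerIdentity returns the account linked to the credentials in use
account_id = sts.get_caller_identity()["Account"]

if account_id == "111111111111":    # dev account
    s3bucket = "s3://dev-bucket"
elif account_id == "222222222222":  # production account
    s3bucket = "s3://prod-bucket"
else:
    # Stop if the account isn't recognised
    raise ValueError(f"Unexpected AWS account {account_id}. Exiting.")

Mapping account IDs to buckets keeps the environment decision in one place, and the final else stops the script before it can touch the wrong bucket.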

More details about get_caller_identity can be found in the AWS Boto3 documentation.

For bonus points, I can terminate the script if the AWS AccountID isn’t defined. This prevents undesirable states if the script is run in an unexpected account.

Speaking of which…

Prevents Undesirable States

In this section, I use PowerShell to demonstrate how to stop production code from doing unintended things.

Setting The Scene

In June I started writing a PowerShell script to upload lossless music files from my laptop to one of my S3 buckets.

I worked on it in stages. This made it easier to script and test the features I wanted. By the end of Version 1, I had a script that dot-sourced its variables and wrote everything in my local folder $ExternalLocalSource to my S3 bucket $ExternalS3BucketName:

#Load Variables Via Dot Sourcing
. .\EDMTracksLosslessS3Upload-Variables.ps1


#Upload File To S3
Write-S3Object -BucketName $ExternalS3BucketName -Folder $ExternalLocalSource -KeyPrefix $ExternalS3KeyPrefix -StorageClass $ExternalS3StorageClass

What’s The Problem?

NOTE: There were several problems with Version 1, all of which were fixed in Version 2. In the interests of simplicity, I’ll focus on a single one here.

In this script, Write-S3Object will upload everything in the local folder $ExternalLocalSource to the S3 bucket $ExternalS3BucketName.

Problem is, the $ExternalS3BucketName S3 bucket isn’t for everything! It should only contain lossless music files!

At best, Write-S3Object will upload everything in the local folder to S3 whether it’s music or not.

At worst, if the script is pointing at a different folder it will start uploading everything there instead! PowerShell commonly defaults to C:\Windows, so this could cause all kinds of problems.

Doing Things Differently

I decided to limit the extensions that the PowerShell script could upload.

Firstly, the script captures the extensions for each file in the local folder $ExternalLocalSource using Get-ChildItem and [System.IO.Path]::GetExtension:

$LocalSourceObjectFileExtensions = Get-ChildItem -Path $ExternalLocalSource | ForEach-Object -Process { [System.IO.Path]::GetExtension($_) }

Then it checks each extension using a ForEach loop. If an extension isn’t in the list, PowerShell reports this and terminates the script:

ForEach ($LocalSourceObjectFileExtension In $LocalSourceObjectFileExtensions)
{
    If ($LocalSourceObjectFileExtension -NotIn ".flac", ".wav", ".aif", ".aiff")
    {
        Write-Output "Unacceptable $LocalSourceObjectFileExtension file found.  Exiting."
        Start-Sleep -Seconds 10
        Exit
    }
    Else
    {
        Write-Output "Acceptable $LocalSourceObjectFileExtension file."
    }
}

So now, if I attempt to upload an unacceptable .log file, PowerShell raises an exception and terminates the script:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Unacceptable .log file found.  Exiting.
**********************

While an acceptable .flac file will produce this message:

**********************
Transcript started, output file is C:\Files\EDMTracksLosslessS3Upload.log

Checking extensions are valid for each local file.
Acceptable .flac file.
**********************

To see the code in full, as well as the other problems I solved, please check out my post from June.

Summary

In this post, I responded to November 2022’s T-SQL Tuesday #156 Invitation and gave my thoughts on some production code qualities. I gave examples of each quality and showed how they could save time and prevent unintended problems in a production environment.

Thanks to Tomáš for this month’s topic! My previous T-SQL Tuesday posts are here.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~


Open Source Documentation With Read The Docs

In this post, I try out the open source documentation tool Read The Docs to see how it can help me out with future projects.


Introduction

Read the Docs is an open source tool for creating documentation. It uses the Sphinx documentation generator and is free for public repos. It offers the following features:

  • Free hosting for all documentation.
  • Documentation available in online and offline formats.
  • Automatic builds in response to Git commits.
  • Document versioning in response to Git branches and version control settings.

Having recently uploaded my first GitHub repo, I’m keen to see how Read The Docs can help me with my documentation. Read The Docs has a tutorial on their site that I will be using here.

Tutorial Time

I begin by accessing the Read The Docs tutorial GitHub template and use this to create a repo on my GitHub account. I then sign up for a Read The Docs account and authorise it to interact with my GitHub account:

This allows Read The Docs to view the public repos in my GitHub account. They are then displayed in my Read The Docs console:

I select my ReadTheDocs-Tutorial repo and Read The Docs immediately starts building the documentation for it. Builds usually take around 30 to 40 seconds and Read The Docs gives updates throughout the process:

The end result is a site like the one below:

So far everything has been going well. What will happen when I try it out with the GitHub repository I made last time?

Going Solo

I start by importing my EDMTracksLosslessS3Upload-PowerShell GitHub repository into Read The Docs:

As before, the build takes around 30 seconds and gives me a link to my documentation. This time the site shows an autogenerated template instead:

This is because there is an important difference between the repos. The ReadMe in my repo is an .md (Markdown) file, whereas the Read The Docs tutorial documentation uses .rst (reStructuredText) files.

As .rst is the default plaintext markup language used by Sphinx, neither it nor Read The Docs can do much with .md files out of the box.

I’m currently getting to know Read The Docs and .rst, so I’ll use my tutorial repo for the remainder of this post and let my experiences guide my next steps.

Discovering .rst

Now that I’m more clued up on how Read The Docs works behind the scenes, let’s examine what .rst files look like and how they can be changed.

Included within the Read The Docs tutorials repo is a docs folder, which contains a source folder with four files:

  • api.rst
  • conf.py
  • index.rst
  • usage.rst

These files mirror the site generated by Read The Docs. For example, index.rst:

Welcome to Lumache's documentation!
===================================

**Lumache** (/lu'make/) is a Python library for cooks and food lovers
that creates recipes mixing random ingredients.
It pulls data from the `Open Food Facts database <https://world.openfoodfacts.org/>`_
and offers a *simple* and *intuitive* API.

Check out the :doc:`usage` section for further information, including
how to :ref:`installation` the project.

.. note::

   This project is under active development.
   
   Lumache has its documentation hosted on Read the Docs.

Contents
--------

.. toctree::

   usage
   api

This mirrors the page at readthedocs.io/en/latest/index.html:

Let’s make some changes. I update index.rst to include new code on lines 18, 20 and 29:

Welcome to Lumache's documentation!
===================================

**Lumache** (/lu'make/) is a Python library for cooks and food lovers
that creates recipes mixing random ingredients.
It pulls data from the `Open Food Facts database <https://world.openfoodfacts.org/>`_
and offers a *simple* and *intuitive* API.

Check out the :doc:`usage` section for further information, including
how to :ref:`installation` the project.

.. note::

   This project is under active development.
   
   Lumache has its documentation hosted on Read the Docs.
   
.. note::

   This page also now holds test content for `EDMTracksLosslessS3Upload-PowerShell <https://github.com/MrDamienJones/EDMTracksLosslessS3Upload-PowerShell>`_.

Contents
--------

.. toctree::

   usage
   api
   instructions

I then create and commit a new instructions.rst file with text from my EDMTracksLosslessS3Upload-PowerShell ReadMe:

Instructions
============

.. _instructions:


Installation
------------

EDMTracksLosslessS3Upload is a PowerShell script for uploading local lossless music files to Amazon S3. The script includes:

- Recording outputs using the ``Start-Transcript`` cmdlet.
- Checking there are files in the local folder.

**(Some text removed to avoid unnecessary scrolling)**

Please use the most recent version. Previous versions are included for completeness.

.. _usage:

Usage
------------

When everything is in place, run the PowerShell script. PowerShell will then move through the script, producing outputs as work is completed. A typical example of a successful transcript is as follows:

.. code-block:: console

              **********************
              Transcript started, output file is C:\Users\Files\EDMTracksLosslessS3Upload.log

**(Some text removed to avoid unnecessary scrolling)**

              All files processed.  Exiting.
              **********************
              Windows PowerShell transcript end
              End time: 20220617153926
              **********************
instructions.rst on GitHub

The GitHub commit triggers a new Read The Docs build:

The new build updates the Index page with a new note and additional links in the Contents menu:

A new Instructions page is also created:

Thoughts

On paper, the reStructuredText format is compelling. It avoids having a single ReadMe file that can easily get large and unwelcoming. The documentation produced by .rst is comparable to a wiki, and GitHub supports it in preview and display modes.

That said, Markdown has embedded itself in more places and has found more buy-in as a result. Applications like Trello, Azure DevOps and, crucially, Visual Studio Code support it out of the box. This gives more opportunities to practise and use Markdown, essentially making it the de facto winner of this unofficial format war.

However, while Markdown is designed for writing for the web, .rst is designed specifically for technical documentation. Support is out there – Sphinx has an .rst tutorial and some .rst previewers exist. The versatility of .rst and its ability to auto-generate documentation and navigation is also of interest.

I’m likely to give it a go when I have some beefier documentation to write and see how it works out. There are still parts of the tutorial I haven’t touched on, and the documentation is, perhaps unsurprisingly, very good. So it looks like Read The Docs would be a good tool to use for the right project.

Summary

In this post, I tried out the open source documentation tool Read The Docs. I made some sample documentation and experienced the reStructuredText format for the first time. Then I committed some changes to work with the .rst format and get a feel for how it works.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~