Categories
Me

Video Thrilled The Dataflow Shark

In this post, I debut both the amazonwebshark YouTube channel and my first demonstration video and shark shorts.

Table of Contents

Introduction

So, this is amazonwebshark’s fiftieth post! It’s also my first post as a consultant! I joined the Steamhaus team this month and am looking forward to what the future brings!

In my last YearCompass review, I committed to building my personal brand. So far 2024 has seen this shark presenting at AWS Summit London and AWS User Group Liverpool, and making video content seems like the logical next step.

Before this, the amazonwebshark YouTube channel held playlists for videos I’ve referenced in previous posts. And as of July 2024 I’ve joined the video content creator ranks with some humble contributions of my very own!

So what’s this post about? Firstly, I’ll examine my motivation for making video content. Then I’ll link my current uploads, and finally I’ll set initial expectations for the channel.

Why?

So why am I doing this? This section explains my motivation for recording videos and what I hope to achieve.

Practice & Improve Speaking

Generally, doing something more often reveals improvements, efficiencies and optimisations. And I want to improve my speaking. So I need to do more of it!

I got some great advice from Laurie Kirk a while ago on this topic. She has a habit of filming herself daily and reviewing the footage for improvements. By her own admission, this has improved her confidence and quality.

Besides this, speaking practice draws parallels with training runs. Becoming an optimal runner involves various types of training. Want to build endurance? Run slower over distance. Want to improve speed? Focus on faster, shorter bursts.

And it’s the same with speaking. Want to practise lightning talks? Make short videos. Want to improve sessions? Film a demo. My work so far made me more confident at AWS UG Liverpool earlier this month, so I hope to see more improvements in the coming months by practising different types of speaking.

Audience Diversification

In addition to improving my current abilities, I want to upskill my ability to communicate with people outside my field of expertise.

It’s well known that technical people love speaking with technical people. Getting into the weeds about operating systems, functions, architectures and paradigms regularly see hours fly by at meetups.

This is, however, inherently limiting to those less knowledgeable in those areas. I was originally going to label this as ‘non-tech’, but on reflection this extends in all directions. For example, I won’t understand an architectural discussion that turns to von Neumann architectures. I then have equal potential to confuse if I start talking about Lakehouse architectures. This can even happen with people sharing a speciality: I had no idea what instantiate meant when I first heard it from another Data Engineer.

Amongst the best sessions and videos I’ve seen are ones where topics are made accessible and inclusive to a diverse audience with ranging skills and experience. Some viewers will have years of experience in the field and are looking for the latest insights. Others will be hearing about the topic for the very first time. Appealing to both ends of the spectrum is the ideal scenario.

This is the skill level I’m aiming towards. Creating sessions and videos that appeal to a diverse range of viewers will make me a more inclusive and effective communicator. And it’s not just about audience diversification…

Content Diversification

Next, producing videos will let me make different kinds of content.

Applying the Diátaxis framework, my blog posts lean more towards Tutorial than the others. This is intentional, as I’ve always preferred practice over theory and like sharing cool stuff that enables people.

That’s not to say I don’t get curious about the other Diátaxis ‘needs’ of How-To Guide, Explanation and Reference. While past exploration of these with text hasn’t worked out, videos offer new opportunities here such as:

  • Trying out new AWS services and features.
  • Running through concepts and architectures.
  • Exploring unfamiliar and less common settings and parameters.

In short, content ideas that lend themselves better to video than text. And speaking of ideas…

Failing Fast

Videos may save a post idea that has promise but isn’t working out.

I am never stuck for blog post ideas. There’s always something to write about – from services and architectures to current events. I probably have more potential post topics than I can ever write.

This isn’t to say that every post I start is completed though. Some ideas begin well but start to unravel. They might meander, lack cohesion or simply become uninteresting. And while I’m getting better at seeing the early signs of this, occasionally some still slip through.

I’m a big believer in avoiding the sunken cost fallacy. So in those situations, I admit defeat and defer to the Cult of Done Manifesto’s fifth principle:

“If you wait more than a week to get an idea done, abandon it.”

The Cult of Done Manifesto – Bre Pettis

There are several interpretations of this principle. NoBoilerplate‘s advice is that:

“Ideas in your brain are like a pipe full of random stuff. Some of it will be good; some not so good. If you’re not feeling it, don’t try to make a bad idea better – try the next idea.”

The Cult of Done: How To Get *Started* – No Boilerplate

I agree. But it’s still disheartening sometimes to delete something that still feels like it has legs – just not for the body you’re trying to stitch them to. More recently, I found Jason Fladlien‘s interpretation which has a different take:

“The longer you go not getting something done the more baggage you create around getting it done. “Abandoning” an idea simply means throwing this version of it in the trash. You can start it fresh later.”

The Cult of Done – The Drive-Contentment Connection

This applies very well to situations where an idea loses traction as a blog post but is still worth pursuing. Instead of deleting everything, post material could be repurposed into video material.

Additionally, I’m likely to have session abstracts that either don’t work out or aren’t accepted. If I consider the idea to be sound, videos are ideal solutions to these situations too.

Chasing Internet Stardom

Yeah ok not really.

Current Uploads

So what have I produced so far on the video and shark short front?

Firstly, I’ve filmed a pair of data-themed YouTube shorts. The first examines one of the functions of an AWS Glue Crawler:

The other considers one of the differences between Parquet files and CSV files:

Future shorts will be uploaded to YouTube, Instagram and TikTok. I’ll see how this goes over the coming months.

I’ve also uploaded an extended demo for my Building And Automating Serverless Auto-Scaling Data Pipelines In AWS session:

The demo I use in this session begins with some existing AWS resources. This keeps me within the session’s time limit, but at the cost of an incomplete picture of what the Step Function workflow is doing. This extended version starts with a blank workflow and shows the Glue and Athena setup behind the scenes.

I’m not holding these up as works of art! They are rough around the edges, and I’m sure I’ll improve over time. In the meantime, a different Cult of Done Manifesto principle applies:

“Pretending you know what you’re doing is almost the same as knowing what you are doing, so just accept that you know what you’re doing even if you don’t and do it.”

The Cult of Done Manifesto – Bre Pettis

Expectations

So what are my expectations for the video and shark shorts?

This is all very much early days. There are no grand plans or ambitions, and I don’t have an upload schedule planned. I already have lots going on personally and professionally and don’t want to burn myself out. Much like this blog, it’s something for me to experiment and upskill with.

That said, I recently bought some streaming gear and a posh microphone in the Prime Day sales. So let’s see where this goes!

Summary

In this post, I debuted both the amazonwebshark YouTube channel and my first demonstration video and shark shorts.

As I said, there’s no grand vision for any of this and I’m totally winging it. It’s a bit of fun and I’m interested to see where it goes. In the meantime, the button below has links for contact, socials, projects and sessions:

SharkLinkButton 1

Thanks for reading ~~^~~

Categories
Data & Analytics

Discovering Data With AWS Glue

In this post, I use the data discovering features of AWS Glue to crawl and catalogue my WordPress API pipeline data.

Table of Contents

Introduction

By the end of my WordPress Bronze Data Orchestration post, I had created an AWS Step Function workflow that invokes two AWS Lambda functions:

stepfunctions graph

The data_wordpressapi_raw function gets data from the WordPress API and stores it as CSV objects in Amazon S3. The data_wordpressapi_bronze function transforms these objects to Parquet and stores them in a separate bucket. If either function fails, AWS SNS publishes an alert.

While this process works fine, the extracted data is not currently utilized. To derive value from this data, I need to consider transforming it. Several options are available, such as:

  • Creating new Lambda functions.
  • Importing the data into a database.
  • Third-party solutions like Databricks.

Here, I’ve chosen to use AWS Glue. As a fully managed ETL service, Glue automates various data processes in a low-code environment. I’ve not written much about it, so it’s time that changed!

Firstly, I’ll examine AWS Glue and some of its concepts. Next, I’ll create some Glue resources that interact with my WordPress S3 objects. Finally, I’ll integrate those resources into my existing Step Function workflow and examine their costs.

Let’s begin with some information about AWS Glue.

AWS Glue Concepts

This section explores AWS Glue and some of its data discovering features.

AWS Glue

From the AWS Glue User Guide:

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, machine learning, and application development. It also includes additional productivity and data ops tooling for authoring, running jobs, and implementing business workflows.

https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html

Glue can be accessed using the AWS Glue console web interface and the AWS Glue Studio graphical interface. It can also be accessed programmatically via the AWS Glue CLI and the AWS Glue SDK.

Benefits of AWS Glue include:

  • Data-specific features like data cataloguing, schema discovery, and automatic ETL code generation.
  • Infrastructure optimised for ETL tasks and data processes.
  • Built-in scheduling capabilities and job execution.
  • Integration with other AWS services like Athena and Redshift.

AWS Glue’s features fall into Discover, Prepare, Integrate and Transform categories. The Glue features used in this post come from the data discovering category.

Glue Data Catalog

An AWS Glue Data Catalog is a managed repository serving as a central hub for storing metadata about data assets. It includes table and job definitions, and other control information for managing an AWS Glue environment. Each AWS account has a dedicated AWS Glue Data Catalog for each region.

The Data Catalog stores information as metadata tables, with each table representing a specific data store and its schema. Glue tables can serve as sources or targets in job definitions. Tables are organized into databases, which are logically grouped collections of related table definitions.

Each table contains column names, data type definitions, partition information, and other metadata about a base dataset. Data Catalog tables can be populated either manually or using Glue Crawlers.

Glue Crawler

A Glue Crawler connects to a data store, analyzes and determines its schema, and then creates metadata tables in the AWS Glue Data Catalog. They can run on-demand, be automated by services like Amazon EventBridge Scheduler and AWS Step Functions, and be started by AWS Glue Triggers.

Crawlers can crawl several data stores including:

  • Amazon S3 buckets via native client.
  • Amazon RDS databases via JDBC.
  • Amazon DocumentDB via MongoDB client.

An activated Glue Crawler performs the following processes on the chosen data store:

  • Firstly, data within the store is classified to determine its format, schema and properties.
  • Secondly, data is grouped into tables or partitions.
  • Finally, the Glue Data Catalog is updated. Glue creates, updates and deletes tables and partitions, and then writes the metadata to the Data Catalog accordingly.

Now let’s create a Glue Crawler!

Creating A Glue Crawler

In this section, I use the AWS Glue console to create and run a Glue Crawler for discovering my WordPress data.

Crawler Properties & Sources

There are four steps to creating a Glue Crawler. Step One involves setting the crawler’s properties. Each crawler needs a name and can have optional descriptions and tags. This crawler’s name is wordpress_bronze.

Step Two sets the crawler’s data sources, which is greatly influenced by whether the data is already mapped in Glue. If it is then the desired Glue Data Catalog tables must be selected. Since my WordPress data isn’t mapped yet, I need to add the data sources instead.

My Step Function workflow puts the data in S3, so I select S3 as my data source and supply the path of my bronze S3 bucket’s wordpress_api folder. The crawler will process all folders and files contained in this S3 path.

Finally, I need to configure the crawler’s behaviour for subsequent runs. I keep the default setting, which re-crawls all folders with each run. Other options include crawling only folders added since the last crawl or using S3 Events to control which folders to crawl.

Classifiers are also set here but are out of scope for this post.

Crawler Security & Targets

Step Three configures security settings. While most of these are optional, the crawler needs an IAM role to interact with other AWS services. This role consists of two IAM policies:

  • An AWSGlueServiceRole AWS managed policy which allows access to related services including EC2, S3, and Cloudwatch Logs.
  • A customer-managed policy with s3:GetObject and s3:PutObject actions allowed on the S3 path given in Step Two.

This role can be chosen from existing roles or created with the crawler.

Step Four begins with setting the crawler’s output. The Crawler creates new tables, requiring the selection of a target database for these tables. This database can be pre-existing or created with the crawler.

An optional table name prefix can also be set, which enables easy table identification. I create a wordpress_api database in the Glue Data Catalog, and set a bronze- prefix for the new tables.

The Crawler’s schedule is also set here. The default is On Demand, which I keep as my Step Function workflow will start this crawler. Besides this, there are choices for Hourly, Daily, Weekly, Monthly or Custom cron expressions.

Advanced options including how the crawler should handle detected schema changes and deleted objects in the data store are also available in Step Four, although I’m not using those here.

And with that, my crawler is ready to try out!

Running The Crawler

My crawler can be tested by accessing it in the Glue console and selecting Run Crawler:

2024 05 23 RunCrawler

The crawler’s properties include run history. Each row corresponds to a crawler execution, recording data including:

  • Start time, end time and duration.
  • Execution status.
  • DPU hours for billing.
  • Changes to tables and partitions.
2024 05 23 GlueCrawlerRuns

AWS stores the logs in an aws-glue/crawlers CloudWatch Log Group, in which each crawler has a dedicated log stream. Logs include messages like the crawler’s configuration settings at execution:

Crawler configured with Configuration 
{
    "Version": 1,
    "CreatePartitionIndex": true
}
 and SchemaChangePolicy 
{
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "DEPRECATE_IN_DATABASE"
}

And details of what was changed and where:

Table bronze-statistics_pages in database wordpress_api has been updated with new schema

Checking The Data Catalog

So what impact has this had on the Data Catalog? Accessing it and selecting the wordpress_api database now shows five tables, each matching S3 objects created by the Step Functions workflow:

2024 05 23 GlueDataCatalogTables

Data can be viewed by selecting Table Data on the desired row. This action executes an Athena query, triggering a message about the cost implications:

You will be taken to Athena to preview data, and you will be charged separately for Athena queries.

If accepted, Athena generates and executes a SQL query in a new tab. In this example, the first ten rows have been selected from the wordpress_api database’s bronze-posts table:

SQL
SELECT * 
FROM "AwsDataCatalog"."wordpress_api"."bronze-posts" 
LIMIT 10;

When this query is executed, Athena checks the Glue Data Catalog for the bronze-posts table in the wordpress_api database. The Data Catalog provides the S3 location for the data, which Athena reads and displays successfully:

2024 05 23 AthenaBronzeQueryResults

Now that the crawler works, I’ll integrate it into my Step Function workflow.

Crawler Integration & Costs

In this section, I integrate my Glue Crawler into my existing Step Function workflow and examine its costs.

Architectural Diagrams

Let’s start with some diagrams. This is how the crawler will behave:

While updating the crawler’s wordpress_bronze CloudWatch Log Stream throughout:

  1. The wordpress_bronze Glue Crawler crawls the bronze S3 bucket’s wordpress-api folder.
  2. The crawler updates the Glue Data Catalog’s wordpress-api database.

This is how the Crawler will fit into my existing Step Functions workflow:

wordpress api stepfunction rawbronze

While updating the workflow’s CloudWatch Log Group throughout:

  1. An EventBridge Schedule executes the Step Functions workflow.
  2. Raw Lambda function is invoked.
    • Invocation Fails: Publish SNS message. Workflow ends.
    • Invocation Succeeds: Invoke Bronze Lambda function.
  3. Bronze Lambda function is invoked.
    • Invocation Fails: Publish SNS message. Workflow ends.
    • Invocation Succeeds: Run Glue Crawler.
  4. Glue Crawler runs.
    • Run Fails: Publish SNS message. Workflow ends.
    • Run Succeeds: Update Glue Data Catalog. Workflow ends.

An SNS message is published if the Step Functions workflow fails.

Step Function Integration

Time to build! Let’s begin with the crawler’s requirements:

  • The crawler must only run after both Lambda functions.
  • It must also only run if both functions invoke successfully first.
  • If the crawler fails it must alert via the existing PublishFailure SNS topic.

This requires adding an AWS Glue: StartCrawler action to the workflow after the second AWS Lambda: Invoke action:

2024 05 23 StepFunctionStartCrawler

This action differs from the ones I’ve used so far. The existing actions all use optimized integrations that provide special Step Functions workflow functionality.

Conversely, StartCrawler uses an SDK service integration. These integrations behave like a standard AWS SDK API call, enabling more fine-grained control and flexibility than optimised integrations at the cost of needing more configuration and management.

Here, the Step Functions StartCrawler action calls the Glue API StartCrawler action. After adding it to my workflow, I update the action’s API parameters with the desired crawler’s name:

JSON
{
  "Name": "wordpress_bronze"
}

Next, I update the action’s error handling to catch all errors and pass them to the PublishFailure task. These actions produce these additions to the workflow’s ASL code:

JSON
  "Start Bronze Crawler": {
      "Type": "Task",
      "End": true,
      "Parameters": {
        "Name": "wordpress_bronze"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "PublishFailure"
        }
      ]
    },

And result in an updated workflow graph:

stepfunctions graph gluecrawler

Additionally, the fully updated Step Functions workflow ASL script can be viewed on my GitHub.

Finally, I need to update the Step Function workflow IAM role’s policy so that it can start the crawler. This involves allowing the glue:StartCrawler action on the crawler’s ARN:

JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBronzeGlueCrawler",
            "Effect": "Allow",
            "Action": [
                "glue:StartCrawler"
            ],
            "Resource": [
                "arn:aws:glue:eu-west-1:[REDACTED]:crawler/wordpress_bronze"
            ]
        }

My Step Functions workflow is now orchestrating the Glue Crawler, which will only run once both Lambda functions are successfully invoked. If either function fails, the SNS topic is published and the crawler does not run. If the crawler fails, the SNS topic is published. Otherwise, if everything runs successfully, the crawler updates the Data Catalog as needed.

So how much does discovering data with AWS Glue cost?

Glue Costs

This is from AWS Glue’s pricing page for crawlers:

There is an hourly rate for AWS Glue crawler runtime to discover data and populate the AWS Glue Data Catalog. You are charged an hourly rate based on the number of Data Processing Units (or DPUs) used to run your crawler. A single DPU provides 4 vCPU and 16 GB of memory. You are billed in increments of 1 second, rounded up to the nearest second, with a 10-minute minimum duration for each crawl.

$0.44 per DPU-Hour, billed per second, with a 10-minute minimum per crawler run

https://aws.amazon.com/glue/pricing/

And for the Data Catalog:

With the AWS Glue Data Catalog, you can store up to a million objects for free. If you store more than a million objects, you will be charged $1.00 per 100,000 objects over a million, per month. An object in the Data Catalog is a table, table version, partition, partition indexes, statistics or database.

The first million access requests to the Data Catalog per month are free. If you exceed a million requests in a month, you will be charged $1.00 per million requests over the first million. Some of the common requests are CreateTable, CreatePartition, GetTable , GetPartitions, and GetColumnStatisticsForTable.

https://aws.amazon.com/glue/pricing/

So how does this relate to my workflow? The below Cost Explorer chart shows my AWS Glue API costs from 01 May to 28 May. Only the CrawlerRun API operation has generated charges, with a daily average of $0.02:

2024 05 28 GlueAPICostMay28

My May 2024 AWS bill shows further details on the requests and storage items. The Glue Data Catalog’s free tier covers my usage:

2024 05 28 GlueCostsMay28

Finally, let’s review the entire pipeline’s costs for April and May. Besides Glue, my only other cost remains S3:

2024 05 28 CostExplorerAprMay

Summary

In this post, I used the data discovering features of AWS Glue to crawl and catalogue my WordPress API pipeline data.

Glue’s native features and integration with other AWS services make it a great fit for my WordPress pipeline’s pending processes. I’ll be using additional Glue features in future posts, and wanted to spotlight the Data Catalog early on as it’ll become increasingly helpful as my use of AWS Glue increases.

If this post has been useful then the button below has links for contact, socials, projects and sessions:

SharkLinkButton 1

Thanks for reading ~~^~~

Categories
Training & Community

Shark’s Summit Session

In this post, I discuss my recent Building And Automating Data Pipelines session presented at 2024’s AWS Summit London.

Table of Contents

Introduction

One of my YearCompass 2024 goals was to build a personal brand and focus on my soft skills and visibility. After participating in 2023’s New Stars Of Data 6 event, I wrote some new session abstracts and considered my next move. Then in February I saw Matheus Guimaraes‘ LinkedIn invite to submit sessions for 2024’s AWS Summit London event:

2024 04 09 LinkedInMatheus

I mulled it over, deciding to submit an abstract using my recent WordPress Data Pipeline project. At the very least, it’d be practice for both writing abstracts and pushing myself to submit them.

And that’s where I expected it to end. Until…

2024 04 09 GmailAccept

Just like that, I was heading to the capital again! And this time as a speaker!

My AWS Summit London (ASL) experience was going to be different from my New Stars Of Data (NSOD) one in several ways:

  • While NSOD was virtual, ASL would be in person.
  • I had four months to prepare for NSOD, and five weeks for ASL.
  • My NSOD session was 60 minutes, while ASL would be 30.

So I dusted off my NSOD notes, put a plan together and got to work!

Preparation

This section examines the preparation of the slides and demo for my summit session.

Slides

Firstly, I brainstormed what the session should include from my recent posts. Next, I storyboarded what the session’s sections would be. These boiled down to:

  • Defining the problem. I wanted to use an existing framework for this, ultimately choosing the 4Vs Of Big Data. These have been around since the early 2000’s, and are equally valid today for EDA and IOT events, API requests, logging metrics and many other modern technologies.
  • Examining the AWS services comprising the data pipeline, and highlighting features of each service that relate to the 4Vs.
  • Demonstrating the AWS services in a real pipeline and showing further use cases.

This yielded a rough schedule for the session:

  • 00:00-05:00 Introduction
  • 05:00-10:00 Problem Definition
  • 10:00-15:00 Solution Architecture
  • 15:00-20:00 Demo
  • 20:00-25:00 Summary
  • 25:00-30:00 Questions

Creating and editing the slide deck was much simpler with this in place. Each slide now needed to conform with and add value to its section. It became easier to remove bloat and streamline the wordier slides.

Several slides then received visual elements. This made them more audience-friendly and gave me landmarks to orient myself within the deck. I used AWS architecture icons on the solution slides and sharks on the problem slides. Lots of sharks.

Here’s the finished deck. I regret nothing.

As I was rounding off the slides, the summit agenda was published with the Community Lounge sessions. It was real now!

2024 04 24 AWSSummitSession

Demo

I love demos and was keen to include one for my summit session.

Originally I wanted a live demo, but this needed a good internet connection. It was pointed out to me that an event with thousands of people might not have the best WiFi reception, leading to slow page loads at best and 404s at worst!

So I recorded a screen demo instead. From a technical standpoint, this protected the demo from platform outages, network failures and zero-day bugs. And from a delivery standpoint, a pre-recorded demo let me focus on communicating my message to the audience instead of potentially losing my place, mistyping words and overrunning the allocated demo time.

The demo used this workflow, executed by an EventBridge Schedule:

stepfunctions graph

The demo’s first versions involved building the workflow and schedule from scratch. This overran the time allocation and felt unfocused. Later versions began with a partly constructed workflow. This built on the slides and improved the demo’s flow. I was far happier with this version, which was ultimately the one I recorded.

I recorded the demo with OBS Studio – a free open-source video recording and live-streaming app. There’s a lot to OBS, and I found this GuideRealmVideos video helpful in setting up my recording environment:

Delivery

This section covers the rehearsal and delivery of my AWS summit session.

Rehearsal

With everything in place, it was time to practise!

I had less time to practise this compared to NSOD, so I used various strategies to maximise my time. I started by practising sections separately while refining my notes. This gave all sections equal attention and highlighted areas needing work.

Next, after my success with it last time, I did several full run-throughs using PowerPoint’s Speaker Coach. This went well and gave me confidence in the content and slide count.

2024 04 27 RehersalReportClip

The slide visuals worked so well that I could practise the opening ten minutes without the slides in front of me! This led to run-throughs while shopping, on public transport and even in the queue for the AWS Summit passes.

Probably got some weird looks for that. I still regret nothing.

Practising the demo was more challenging! While the slides were fine as long as I hit certain checkpoints, the demo’s pace was entirely pre-determined. I knew what the demo would do, but keeping in sync with my past self was tricky to master. I was fine as long as I could see my notes and the demo in real-time.

Finally, on the night before I did some last-minute practice runs with the hotel room’s TV:

PXL 20240423 1801344612

On The Day

My day started with being unable to find the ExCel’s entrance! Great. But still better than a 05:15 Manchester train! I got my event pass and hunted for the Community Lounge, only to find my name in lights!

PXL 20240424 1202226372

The lounge itself was well situated, away from potentially distracting stands and walkways but still feeling like an important part of the summit.

PXL 20240424 0850558352

I spent some time adjusting to the space and battling my brewing anxiety. Then Matheus and Rebekah Kulidzan appeared and gave me some great and much-appreciated advice and encouragement! Next, I went for a wander with my randomised feel-good playlist that threw out some welcome bangers:

I watched Yan’s session, paying attention to his delivery and mannerisms alongside his session’s content. After he finished I powered on, signed in and miked up. The lounge setup was professional but not intimidating, and the AWS staff were helpful and attentive. Finally, at noon I went live!

IMG 3907
Photo by Thembile Ndlovu

The session went great! I had a good audience, kept my momentum and hit my section timings. I had a demo issue when my attempt to duplicate displays failed. Disaster was averted by playing the demo on the main screens only!

Finally, my half-hour ended and I stepped off the stage to applause, questions and an unexpected hug!

Looking Back

So what’s next?

I was happy with the amount of practice I did, and will continue putting time into this in the coming months. I’ve submitted my summit session to other events, and the more rehearsals I complete the higher my overall standard should get.

I also want to find a more reliable way of showing demos without altering Windows display settings. Changing these settings mid-presentation isn’t the robust solution I thought it was, so I want to find a feature or setting that’ll take care of that.

Finally, I plan to act on advice from Laurie Kirk. She suggested speaking about a day’s events on camera and then watching it back the following day. This highlights development areas and will get me used to speaking under observation.

Summary

In this post, I discussed my recent Building And Automating Data Pipelines session presented at 2024’s AWS Summit London.

When writing my post about the 2022 AWS Summit London event, I could never have known I’d find myself on the lineup a few years later! Tech communities do great jobs of driving people forward, and while this is usually seen through a technical lens the same is true for personal skills.

The AWS Community took this apprehensive, socially anxious shark and gave him time, a platform and an audience. These were fantastic gifts that I’m hugely grateful for and will always remember.

PXL 20240424 1606465872

If this post has been useful then the button below has links for contact, socials, projects and sessions:

SharkLinkButton 1

Thanks for reading ~~^~~