Categories
Training & Community

New Stars Of Data 6 Final Preparations

In this post, I talk about my final preparations for the upcoming New Stars Of Data 6 event in October 2023.

Introduction

In July, I shared the news that I’m speaking at the next New Stars Of Data event in October:

2023 08 11 NewStarsOfDataSchedule

Last month I talked about how the slides were coming along, and about getting my presentation setup ready. So what have I been up to since?

Presentation Slides

In this section, I talk about the presentation slides for my session.

Content

The presentation has been mostly finished since the start of October. My mentor Olivier Van Steenlandt suggested that I commit to this deadline early in the process, and now I see why – it makes practising far easier! I’m still making minor tweaks based on delivery observations and feedback, but the slides now have the required content in the desired order.

I’ve also included some Unsplash images and personal photos in the deck. These images simplify and enhance the message of the slides they’re on, and inject some variety into the session.

Style

After some thought, I decided to add a slide theme to the deck. It was fine without one, but I felt the right theme would add some extra polish. So I duplicated the presentation and experimented with PowerPoint’s default themes.

I eventually decided on the Facet theme with the Office colour palette. It was easily the best fit of the default themes, with good colour and white space balance. I could have reviewed others online or made my own, but as the theme is basically an optional extra I didn’t want to put more time into the decision than was necessary.

So my presentation has gone from this:

2023 10 18 OpenSlideBasic

To this:

2023 10 18 OpenSlideTheme

I’m really happy with how it turned out!

Demo Material

In this section, I’ll talk about my session’s demos.

Demos form a big part of my session. I have two: a Data Wrangler demo showing several data transformations, and a Power BI demo showing visuals and insights generated from the wrangled data.

Data Wrangler Demo

The Data Wrangler is a great tool, and I want my demo to show it both in the best possible light, and in the context of an actual use case. So I spent time with the wrangler’s documentation and sample content to find the best transformations for my session.

Next, I considered how I’d transform the data in my day job and what I wanted to report against in Power BI. This quickly established an order of operations, governed by complexity (some transformations are simpler than others) and dependence (some transformations rely on others).
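The dependence side of that ordering can be sketched as a small dependency graph. This is only an illustration using `graphlib` from Python's standard library (3.9 or later); the transformation names are hypothetical stand-ins rather than the actual steps in the demo.

```python
from graphlib import TopologicalSorter

# Hypothetical transformations, each mapped to the steps it relies on.
# The names are illustrative only.
dependencies = {
    "drop_empty_rows": set(),
    "rename_columns": set(),
    "convert_chip_time_to_seconds": {"rename_columns"},
    "calculate_pace": {"convert_chip_time_to_seconds"},
}


def order_of_operations(deps):
    """Return an order in which every step comes after the steps it relies on."""
    return list(TopologicalSorter(deps).static_order())


order = order_of_operations(dependencies)
print(order)
```

Steps with no dependencies can appear in any position relative to each other, which leaves room to put the simpler ones first.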

Finally, I drew up a rough end-to-end process and began to practise. The limited selection of transformations keeps the demo focused and streamlined, and knowing the order of operations helps my fluency and delivery.

Power BI Demo

For the Power BI demo, I considered what insights a race director would be most interested in that they couldn’t get from the CSV data alone. This led me towards visuals that would analyse the entire Sizzler series, like the Key Influencers and Decomposition Tree visuals.

Next, I built some visuals and reviewed them in terms of how helpful they were, and how complex they were to explain. Complex visuals run the risk of alienating some viewers, which I would prefer to avoid!

Having selected my visuals, I tuned their filters and data fields to add value. For example, knowing that a 70-year-old male is faster than an under-18-year-old male isn’t valuable, but comparing the fastest speeds of all 70-year-old males across the series is!
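The same filter-then-compare idea can be sketched outside Power BI. This is a minimal illustration in plain Python; the field names (`race`, `category`, `chip_seconds`), the `M70` category label and the sample rows are all made up for the example and are not taken from the real Sizzler results.

```python
def fastest_by_race(results, category):
    """Return the fastest row per race for a single category."""
    fastest = {}
    for row in results:
        if row["category"] != category:
            continue  # filter: ignore runners outside the chosen category
        current = fastest.get(row["race"])
        if current is None or row["chip_seconds"] < current["chip_seconds"]:
            fastest[row["race"]] = row
    return fastest


# Made-up sample rows for illustration only.
sample = [
    {"race": 1, "category": "M70", "name": "Runner A", "chip_seconds": 1500},
    {"race": 1, "category": "M70", "name": "Runner B", "chip_seconds": 1450},
    {"race": 1, "category": "MU18", "name": "Runner C", "chip_seconds": 1100},
    {"race": 2, "category": "M70", "name": "Runner A", "chip_seconds": 1480},
]

for race, row in sorted(fastest_by_race(sample, "M70").items()):
    print(race, row["name"], row["chip_seconds"])
```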

Finally, I looked for links between the visuals and wrote practice notes based on them. For example, discussing the fastest times for each race leads into the key influencers for decreased chip times. In practice, this helps me tell a story with the data and provides a clear narrative for the demo.

Demo Resilience

There are always risks with live demos. Maybe an update will change the way a process runs. Perhaps a breaking change will stop a feature from working entirely. The program might not even load at all! So what’s the best way of managing those risks when presenting to a live audience?

I have a few aces up my sleeve just in case. Some were suggested by Olivier; others I came up with off my own bat:

  • Each Sizzler event has a pre-wrangled CSV. This protects against wrangler bugs and avoids repetition in the session.
  • I recorded a silent demo of the wrangler in case the entire extension won’t load.
  • All related files are stored on OneDrive in case my laptop dies.
  • A laptop change freeze will be applied on the week of the event. Updates for Windows, Ubuntu, VS Code and Power BI will not be applied.
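The pre-wrangled CSV fallback can be sketched as "use the first file that exists". This is a rough illustration with Python's standard library only; the helper names are hypothetical, and so are any file names passed in.

```python
import csv
from pathlib import Path


def first_available(paths):
    """Return the first path that exists as a file, or None if none do."""
    for path in paths:
        candidate = Path(path)
        if candidate.is_file():
            return candidate
    return None


def load_results(paths):
    """Load race results from the first available CSV as a list of dicts."""
    source = first_available(paths)
    if source is None:
        raise FileNotFoundError("No results file found in: " + ", ".join(map(str, paths)))
    with source.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_results(["live_wrangled.csv", "pre_wrangled.csv"])` (both names invented here) would prefer the file produced during the live demo and quietly fall back to the prepared one.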

Session Practice

In this section, I talk about perhaps the most important part of my final preparations for New Stars Of Data – practising my session!

Pre Flight

To get my brain used to the tension of waiting for 15:45 on October 27, I use my phone’s timer to count me into a rehearsal. Olivier made a point of practising this with me, and it definitely helps to create the environment I’ll be in on the day!

I’m also the proud owner of a cheap ring light, which has indirectly become my clapperboard. When my face is lit, the camera is rolling! While my Logitech StreamCam has a live light, it’s tiny and can’t compete with my Hollywood lights and director:

Ringlight500

Rehearsal

I’m using PowerPoint’s Speaker Coach. It monitors aspects like filler words and slide repetition, and measures pacing and cadence. Speaker Coach offers feedback both during and after a practice session, generating a report with insights and recommendations:

2023 10 18 PresenterCoachSummary

Microsoft has documented Speaker Coach’s suggestions and the research used to determine them, such as:

Based on field study and past academic research, Speaker Coach recommends that presenters speak at a rate of 100 to 165 words per minute; this is the rate at which most audiences we’ve tested find it easiest to process the information they hear.
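The arithmetic behind that recommendation is simple enough to check by hand. Here is a small sketch; the function names and the rehearsal figures are hypothetical, and only the 100 to 165 range comes from the quote above.

```python
def words_per_minute(word_count, duration_seconds):
    """Average speaking rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60 / duration_seconds


def within_recommended_range(wpm, low=100, high=165):
    """True if the rate falls inside the quoted 100 to 165 words per minute."""
    return low <= wpm <= high


# Hypothetical rehearsal: 6,000 words spoken over 45 minutes.
rate = words_per_minute(6000, 45 * 60)
print(round(rate, 1), within_recommended_range(rate))  # 133.3 True
```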

I’ve also had regular meetings with Olivier to run through the presentation in person. Not all of these went well! But this was the idea. Make mistakes. Loads of mistakes! Because then there’s less chance of them happening on the day!

I was also fortunate enough to get some advice from Redgate Product Advocate Grant Fritchey at October’s Data Relay event. The changes I’ve made to my notes based on his suggestions have been very helpful. Thanks, Grant!

Summary

In this post, I talked about my final preparations for the upcoming New Stars Of Data 6 event in October 2023.

This will be my final post before the event! I’m presenting my Racing Towards Insights session online on October 27 at 15:45. The track links are currently on the New Stars Of Data schedule!

Thanks for reading ~~^~~

Categories
Training & Community

New Stars Of Data 6 Preparations: July

In this post, I talk about my July preparations for the upcoming New Stars Of Data 6 event in October 2023. Yes – it’s now the middle of August. I’ve been very busy!

Introduction

I’m speaking at the next New Stars Of Data event in October! I have a Sessionize profile now and everything!

2023 08 11 NewStarsOfDataSchedule

Back in December, I said that I wanted to try improving my presentation skills in 2023. I found out about New Stars Of Data when they advertised their May 2023 event, and when they opened the NSOD6 Call For Speakers I decided the time was right to have a go myself!

But wait. What’s New Stars Of Data?

Nom Sharks On What?

New Stars Of Data is an event focused on the tuition and promotion of new speakers in the Microsoft space. It is run by Ben Weissman and William Durkin, and is supported by a team of experienced speakers. There have been five events at the time of writing, with the sixth scheduled for October 27 2023.

NSOD open their Call For Speakers roughly every six months and announce it on their Twitter feed. There are four criteria that all applicants must meet:

  1. You have never spoken at a large, public event before (User Groups/Meetups do not rule you out!).
  2. Your presentation is on a topic in the Microsoft Data Platform world.
  3. Your presentation is in English.
  4. Your presentation will fill the allotted time (60 minutes).

Successful applicants have an experienced speaker assigned to them as a mentor. The mentor supports the newcomer through the process, coaching them in all aspects of creating and delivering their session.

Not sure if this is the case for every event, but I also received a cool New Stars Of Data t-shirt for taking part!

Creating My Session

In this section, I cover how I came up with my session’s topic and how I wrote my abstract.

Choosing The Topic

Once I decided to submit a session, I needed to decide what it would be about! While I specialise in AWS (amazonwebshark being a bit of a giveaway) I use many Microsoft products in a typical working week. Two of these are Power BI and Visual Studio Code. By chance, I was starting a new post when the Call For Speakers was announced. This post was about a recent VS Code extension called Data Wrangler.

Data Wrangler is a low-code data preparation and cleaning tool. It uses Python and the Pandas and Regex libraries to provide on-demand data operations, and uses Excel and Power Query technology to enable data profiling, data quality checks and the visualisation of data distributions.

Changes are presented in real-time, with a Git-like interface showing the original and updated data. This lends itself very well to demos! Data Wrangler also offers several export methods for the transformed data, one of which is CSV. This is perfect for Power BI visuals! The session was taking shape!

Finally, I needed a data source. This decision was easy, as I’ve already been using the race results from the 2023 Sale Sizzlers events for another project. The Sale Sizzlers are a series of four 5k running events that take place over the summer, the results of which are freely available as CSVs on the Nifty platform.

Having chosen a topic for the session, I needed to write an abstract for it. So what’s that?

Writing The Abstract

A session abstract is a brief summary of a session or talk. It typically provides an overview of the session topic, the services used and the key takeaways for attendees.

When writing mine, I took advice from a Brent Ozar post and a Johan Ludvig Brattås video. The result was an abstract that was technically accurate but felt a bit flat. So I turned to the tool du jour, ChatGPT.

I supplied my abstract and asked ChatGPT for improvements. The results were…mixed, ranging from cliché city to word salad. Some highlights:

  • “…effortlessly assimilating the race results…”
  • “Reveling in the effortless efficiency…”
  • “…fervent sports aficionados…”
  • “Seize this opportunity to join forces in this compelling expedition of knowledge!”

None of which I could say with a straight face, so I turned those down.

I did like some of ChatGPT’s suggestions though. Ultimately, an abstract is as much an advert for a session as it is a description of one, so there needs to be some marketing and persuasiveness in there somewhere.

My finished abstract is on Sessionize and is included below:

In this session, I explore the capabilities of the Visual Studio Code Data Wrangler extension and Microsoft Power BI using real results from the Sale Sizzler 5k race series. I’ll uncover valuable insights through engaging visualisations and user-friendly and low-code data transformations.

This session will cover the following key steps:

– Getting started by setting up a Visual Studio Code environment and seamlessly importing the race results.

– Discovering the convenience of the Visual Studio Code Data Wrangler extension for effortlessly transforming and cleaning the race results

– Taking a closer look at the Python code generated by Data Wrangler to understand what’s happening behind the scenes.

– Loading the transformed race results into Power BI to generate informative visualisations and analyse trends.

Join me if you’re a data professional, a budding analyst or a sports enthusiast!

So ChatGPT wasn’t marking its own homework, I asked Google Bard what it thought of this abstract. It responded:

I think your abstract is very well-written and informative. It clearly states the topic of your session, the speaker’s qualifications, and the key takeaways for attendees.

Well, it made the robots happy. It also pleased Ben and William, as they accepted my session!

So who’s mentoring me?

My Mentor

My mentor is Olivier Van Steenlandt. Upon discovering this, I thought his name looked familiar. It turned out he’d written an Azure DevOps Pipelines post I’d seen the week prior! Small world!

Olivier is a BI professional specialising in Microsoft. He has substantial experience with SQL Server, Power BI and Azure, and is currently a BI Team Lead.

Since his first session at DataGrillen 2022, Olivier has presented at several events internationally and is currently delivering a session about migrating from SSRS to Power BI Paginated Reports. I enjoy our conversations and he’s given me lots to think about!

Progress Update

In this section, I cover the specifics of my July preparations for my New Stars Of Data session.

I had my first meeting with Olivier in mid-July and got my first jobs! Jobs like “Start getting familiar with ZoomIt” and “Please close some of your fifty million Chrome tabs”. I also agreed to start working on the start and end of the session and to decide on its general flow.

To begin, I consulted the New Stars Of Data Speaker Improvement Library and watched Steve Jones’s “Creating a Slide Deck from an Idea” video. I then reached out to Steve, who happily supplied his example deck for me to review. Thanks for all your help Steve!

Next, I watched Rob Sewell’s “How do you do that? Remote Presentations.” video. While I expected a video like Steve’s, Rob focused more on his equipment. I hadn’t even thought about this! In August I plan to test my laptop’s microphone and webcam to see what the output is like.

In my August meeting with Olivier, I demoed my starting and summary slides and showed my new-found ZoomIt skills. I committed to some extra tasks besides equipment testing:

  • Finalise the start and end of the session.
  • Decide on and start producing the main section and demos.
  • Practise the session opening with a view to presenting it to Olivier in September.

Practising will work well with the equipment testing, as I’m going to have lots of disposable footage over the next few weeks…

Summary

In this post, I talked about my July preparations for the upcoming New Stars Of Data 6 event in October 2023. I’ll be posting further updates in the run-up to October, so watch this space!

Thanks for reading ~~^~~

Categories
Data & Analytics

Connecting Athena To Power BI With Simba Athena

In this post, I use Simba Athena to create a secure connection between my iTunes data in Amazon Athena and Microsoft Power BI.

Introduction

In my recent posts, I’ve been transforming an iTunes Export CSV file using Python and AWS.

Firstly, in July I built a Python ETL that extracts data from my iTunes CSV into a Pandas DataFrame and transforms some columns.

Next, I updated my ETL script at the start of August. It now uploads the changed data to S3 as a Parquet file. Then I made my data available in an Athena table so I could use some of Athena’s benefits:

  • My data now has high availability at low cost.
  • My data can be queried faster from Athena than from the CSV.
  • I can limit what data is accessed, as opposed to all-or-nothing.

Now I want to start analysing my data. There are many business intelligence (BI) tools available to help me with this. I will be using the latest version of Power BI on my Windows 10 laptop.

But wait. If Power BI is on my laptop and my data is in Athena, how can Power BI access my data? Do I need to make my AWS resources publicly accessible? Do I need to download the data to my laptop?

Fortunately not! Welcome to the world of data connectors. Meet Simba Athena.

Simba Athena

In this section, I will look at how Simba Athena bridges the gap between my locally-installed BI tool and my data in AWS.

What Is Simba Athena?

Simba Athena is an Open Database Connectivity (ODBC) driver built for Athena. The history of Simba dates back to 1992 when Simba Technologies co-developed the first standards-based ODBC driver with Microsoft. Magnitude acquired Simba Technologies in 2016.

Simba offers numerous data connectors that all work in roughly the same way:

Relating this diagram to Athena and Power BI:

  • The user sends a query to Power BI.
  • Power BI passes the query to Simba Athena via the ODBC Driver Manager.
  • Simba Athena queries Athena and gets the results.
  • Simba Athena passes the results to Power BI via the ODBC Driver Manager.
  • Power BI shows the results to the user.
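Any ODBC client follows the same path, not just Power BI. As a rough sketch, here is the flow with Python's third-party `pyodbc` package standing in for Power BI. The helper names are hypothetical, the DSN name is an assumption about what the data source was called on setup, and running a query needs the driver installed and valid AWS credentials.

```python
def build_connection_string(dsn_name):
    """Build an ODBC connection string that points at a configured DSN."""
    return f"DSN={dsn_name}"


def run_query(dsn_name, sql):
    """Send a query through the ODBC driver manager and return all rows."""
    # Third-party package, imported here so the helper above works without it.
    import pyodbc

    # autocommit=True is an assumption: Athena has no transactions to manage.
    connection = pyodbc.connect(build_connection_string(dsn_name), autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()
```

For example, `run_query("Simba Athena", "SELECT 1")` would travel from the client to the driver manager, on to Simba Athena and Athena, and back again.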

Features Of Simba Athena

Simba Athena has several features that make it a great partner for Athena:

  • Simba Athena works with Windows, macOS and Linux, so it isn’t tied to a single operating system.
  • Numerous applications support Simba Athena including Excel, Tableau and Power BI.

Speaking of Power BI…

Microsoft Power BI

In this section, I will examine Power BI and explain why I chose to use it.

What Is Power BI?

Microsoft Power BI is a data visualization solution with a primary focus on BI. At the time of writing, Power BI’s main components are:

  • Power BI Desktop: a free locally-installed application designed for connecting to, transforming, and visualizing data.
  • Power BI Service: a cloud-based SaaS supporting the deployment and sharing of dashboards.
  • Power BI Mobile: a mobile app platform for Windows, iOS, and Android devices.

So what makes Power BI a good choice here?

Choosing Power BI

My decision to use Power BI came down to three factors:

  • Prior Experience: I’ve used Power BI many times over the years, and have become very familiar with it. This will let me deliver results quickly.
  • Support: Both Microsoft and AWS have rich documentation for Simba Athena. This gives me confidence in setting it up and reduces the chance of any blockers.
2022 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms

So now I’ve talked about Simba Athena and Power BI, let’s get them working together.

Setting Up Simba Athena

In this section, I will install and configure Simba Athena on my laptop. I will then attempt to extract data from Athena using Power BI.

The remainder of this post will focus on the Windows version of Simba Athena. AWS offers download links for Windows, Linux and macOS, and provides installation instructions in the Simba Athena Documentation.

Downloading Simba Athena

The first step is to download the Simba Athena ODBC driver provided by AWS. The options vary depending on platform and bitness.

The installation process mainly focuses on the end-user license agreement and destination folder selection. Once Simba Athena is installed, it can be configured.

Configuring Simba Athena

Simba Athena’s configuration settings are available via the Windows ODBC Data Source Administrator. This can be found in the Start Menu’s Windows Administrative Tools folder, or by running a Windows search for ODBC.

Accessing this and selecting the System DSN tab shows Simba Athena as a System Data Source:

2022-08-28-SimbaSystemDSN

From here, selecting Configure shows a setup screen with a few familiar fields:

Of these, Catalog, Schema and Workgroup are pre-populated with Athena defaults and Metadata Retrieval Method is set to Auto.

That leaves the Data Source Name and Description to identify the data source, and the AWS Region containing the Athena data.

In Output Options, I can state my S3 Output Location and Encryption Options. The output location is Athena’s Query Result Location, and the encryption options should mirror the S3 bucket’s encryption settings.

If the S3 Output Location is left blank, this will cause an error when Power BI tries to connect to Athena:

Details: "ODBC: ERROR [HY000] [Simba][Athena] (1040) An error has been thrown from the AWS Athena client. Athena Error No: 130, HTTP Response Code: 400, Exception Name: InvalidRequestException, Error Message: outputLocation is null or empty 
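A quick pre-flight check can catch a blank output location before Power BI ever tries to connect. This is a hypothetical helper, not part of the driver. The `S3OutputLocation` and `AwsRegion` key names are my understanding of the driver's connection-string keys and should be checked against the Simba Athena Documentation; the bucket name is made up.

```python
def validate_athena_settings(settings):
    """Return a list of problems found in a dict of Simba Athena style settings."""
    problems = []
    location = (settings.get("S3OutputLocation") or "").strip()
    if not location:
        problems.append("S3OutputLocation is blank")
    elif not location.startswith("s3://"):
        problems.append("S3OutputLocation should start with s3://")
    if not (settings.get("AwsRegion") or "").strip():
        problems.append("AwsRegion is blank")
    return problems


# Made-up values for illustration only.
print(validate_athena_settings({"AwsRegion": "eu-west-1", "S3OutputLocation": ""}))
print(validate_athena_settings({"AwsRegion": "eu-west-1", "S3OutputLocation": "s3://example-results-bucket/"}))
```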

Simba Athena’s remaining settings are out of scope for this post, although there’s one I definitely need to mention – Authentication Options:

This is how Simba Athena authenticates its requests to AWS. There are several options here. Depending on the authentication type selected, Simba Athena can store Access Keys, Session Tokens, TenantIDs and any other required credentials.

That’s all the Simba Athena configuration I’m going to do here. For full details on all of Simba Athena’s features, please refer to the Simba Athena Documentation.

Now let’s use Simba Athena to get Athena and Power BI talking to each other!

Using Simba Athena

The Athena documentation has a great section about using the Athena Power BI connector. After launching Power BI and selecting Amazon Athena as a data source, Power BI will need to know which DSN to use.

This is the Simba Athena DSN in the System DSN tab:

The Navigator screen then shows my Athena data catalog, my blog_amazonwebshark database, and my basic_itunes_python_etl table with a sample of the data it contains:

That’s everything! My basic_itunes_python_etl Athena table is now available in Power BI.

Summary

In this post, I used Simba Athena to create a secure connection between my iTunes data in Amazon Athena and Microsoft Power BI.

This post was originally part of a larger post that is still being written. But after I’d finished my Simba Athena section it made sense to have a separate post for it!

Finally, in other news, this post’s featured image is a DALL·E 2 creation. This was by far the best image it gave me for pixel art baby lion and shark – I’m sure it’ll improve soon!

Thanks for reading ~~^~~