Categories
Data & Analytics

WordPress MySQL Database Tables Deep Dive

In this post, I do a deep dive into some of the amazonwebshark WordPress MySQL database tables following the journey of a recent post.


Introduction

In January I used Python and Matplotlib to create some visualisations using the WordPress amazonwebshark MySQL database.

Since then I’ve been doing a lot with Power BI at work, so I’ve created a Power BI connection to the amazonwebshark database to reacquaint myself with some features and experiment with a familiar dataset.

I talked about doing a views analysis in January’s post. While some of the 2022 data is missing, I can still accurately analyse 2023 data. I plan to measure:

  • Total views for each post.
  • Total views for each category.

I’ll use this post to examine some of the MySQL tables, and link back to it in future analysis posts.

Let’s begin with a brief WordPress database overview.

WordPress Database 101

In this section, I take a high-level view of a typical WordPress database and identify the tables I’ll need.

There’s plenty of great documentation online about typical WordPress installations. I’m particularly keen on The Ultimate Developer’s Guide to the WordPress Database by DeliciousBrains, which includes an in-depth tour of the various tables.

As for table relationships, this WordPress ERD shows object names, primary keys and relationship types:

I’ll be concentrating on these WordPress tables: wp_posts, wp_term_relationships, wp_term_taxonomy and wp_terms.

And the wp_statistics_pages table used by WPStatistics.

I’ll examine each table in the context of a recent post: DBeaver OpenAI ChatGPT Integration.

wp_posts

In this section of my WordPress database deep dive, I examine the most important WordPress database table: wp_posts.

Table Purpose

WordPress uses wp_posts to manage a site’s content. Each row in the table is an event relating to a piece of content, like a post, page or attachment. Examples of these events in the context of a blog post are:

  • Creating A New Draft: A new row is created with a post_status of draft. This row is the parent of all future activity for the blog post.
  • Updating A Draft: A new row is created with details of the update. The new row’s post_parent is set to the initial post’s ID.
  • Publishing A Draft: The initial row’s post_status is changed to publish, and the post_date is changed to the publication date. WordPress finds revisions to the post by filtering rows with a post_parent matching the initial row’s ID.

Post Journey

Let’s start by finding DBeaver OpenAI ChatGPT Integration’s parent row, which is its earliest record. The following query finds rows where the post_title is DBeaver OpenAI ChatGPT Integration, then orders by ID and returns the first result.

SELECT 
  id, 
  post_date, 
  post_title, 
  post_status, 
  post_name, 
  post_parent, 
  post_type 
FROM 
  `wp_posts` 
WHERE 
  post_title = 'DBeaver OpenAI ChatGPT Integration' 
ORDER BY 
  id 
LIMIT 
  1

Note that I order by ID, not post_date. The publication process changes the parent post’s post_date, so I must use ID to find the earliest post.

This record is returned:

Name Value
ID 1902
post_date 2023-02-19 20:28:22
post_title DBeaver OpenAI ChatGPT Integration
post_status publish
post_name dbeaver-openai-chatgpt-integration
post_parent 0
post_type post

So the DBeaver OpenAI ChatGPT Integration parent row is ID 1902. I can use this to count the number of changes to this post by searching for wp_posts rows with a post_parent of 1902:

SELECT 
  COUNT(*) 
FROM 
  `wp_posts`
WHERE 
  post_parent = 1902

The count shows 81 rows relating to this post:

Name    |Value|
--------+-----+
COUNT(*)|81   |

Now let’s examine these results more closely.

In the following query, I get all rows relating to DBeaver OpenAI ChatGPT Integration and then group the results by:

  • Date the post was made (using the MySQL DATE function to remove the time values for more meaningful aggregation).
  • Status of the post.
  • Post’s parent post.
  • Type of post.

I also count the rows that match each group and order the results by ID to preserve the event order:

SELECT 
  COUNT(*) AS ID_count, 
  DATE(post_date) AS post_date, 
  post_status, 
  post_parent, 
  post_type 
FROM 
  `wp_posts`
WHERE 
  ID = 1902 
  OR post_parent = 1902 
GROUP BY 
  DATE(post_date), 
  post_status, 
  post_parent, 
  post_type 
ORDER BY 
  ID

The query results are below. A few things to note:

  • The first two rows show what happens when a post is published. Row 1 is ID 1902 as it has no post_parent, and it has a post_status of publish and a post_date of 2023-02-19.
  • Row 2 is the first revision of ID 1902, and it has a post_status of inherit and a post_date of 2023-02-15. This is why I order by ID instead of post_date – ordering by post_date would show the revisions before the parent post in the results.
  • There are various post_type values – revisions are text updates and attachments are image updates.

ID_count post_date post_status post_parent post_type
1 2023-02-19 publish 0 post
1 2023-02-15 inherit 1902 revision
19 2023-02-16 inherit 1902 revision
7 2023-02-16 inherit 1902 attachment
24 2023-02-17 inherit 1902 revision
1 2023-02-17 inherit 1902 attachment
7 2023-02-18 inherit 1902 revision
21 2023-02-19 inherit 1902 revision
1 2023-02-26 inherit 1902 revision

Spotlighting some of these results for context:

  • On 2023-02-16 there were 19 text revisions and 7 images attached. I save a lot!
  • On 2023-02-19 there were 21 text revisions and then the post was published.
  • There was a further text revision on 2023-02-26 in response to a DBeaver software update.

That’s enough about wp_posts for now. Next, let’s start examining how WordPress groups content.

wp_term_relationships

In this section, I examine the first of the WordPress taxonomy tables: wp_term_relationships.

Table Purpose

wp_term_relationships stores information about the relationship between posts and their associated taxonomy terms (More on taxonomies in the next section). WordPress uses it as a bridge table between wp_posts and the various taxonomy tables.

Post Journey

In this query, I join wp_term_relationships to wp_posts on object_id (this is ID in wp_posts), then find the rows where either wp_posts.id or wp_posts.post_parent is 1902:

SELECT 
  yjp.ID, 
  DATE(yjp.post_date) AS post_date, 
  yjp.post_type, 
  yjp.post_status,
  yjtr.object_id, 
  yjtr.term_taxonomy_id 
FROM 
  `wp_posts` AS yjp 
  INNER JOIN `wp_term_relationships` AS yjtr 
    ON yjtr.object_id = yjp.ID 
WHERE 
  yjp.ID = 1902 
  OR yjp.post_parent = 1902

wp_term_relationships only contains published posts, so the only rows returned concern the parent ID 1902:

ID post_date post_type post_status object_id term_taxonomy_id
1902 2023-02-19 post publish 1902 2
1902 2023-02-19 post publish 1902 69
1902 2023-02-19 post publish 1902 71
1902 2023-02-19 post publish 1902 74
1902 2023-02-19 post publish 1902 76
1902 2023-02-19 post publish 1902 77

The query returned six distinct wp_term_relationships.term_taxonomy_id values. My next step is to establish what these IDs relate to.

wp_term_taxonomy

In this section, I examine the table that groups term_taxonomy_id values into taxonomy types: wp_term_taxonomy.

Table Purpose

WordPress uses the wp_term_taxonomy table to store the taxonomy data for terms. Taxonomies in WordPress are used to group posts and custom post types together. Examples of WordPress taxonomies are category, post_tag and nav_menu.

Post Journey

In this query, I add a new join to the previous query, joining wp_term_taxonomy to wp_term_relationships on term_taxonomy_id. Some of the wp_posts columns have been removed from the query to save space.

SELECT 
  yjp.ID,  
  yjtr.term_taxonomy_id, 
  yjtt.taxonomy
FROM 
  `wp_posts` AS yjp 
  INNER JOIN `wp_term_relationships` AS yjtr 
    ON yjtr.object_id = yjp.ID 
  INNER JOIN `wp_term_taxonomy` AS yjtt 
    ON yjtr.term_taxonomy_id = yjtt.term_taxonomy_id 
WHERE 
  yjp.ID = 1902 
  OR yjp.post_parent = 1902

These results give some context to the previous results. I can now see that wp_posts.id 1902 has one category and five tags.

ID term_taxonomy_id taxonomy
1902 2 category
1902 69 post_tag
1902 71 post_tag
1902 74 post_tag
1902 76 post_tag
1902 77 post_tag

To get the names of the categories and tags, I must bring one more table into play…

wp_terms

In this section of my WordPress database deep dive, I examine the table that holds the names and details of the taxonomy terms used on amazonwebshark: wp_terms.

Table Purpose

The wp_terms table stores all of the terms that are used across all taxonomies on a WordPress site. Each row represents a single term, and the columns in the table contain information about that term, including name and ID.

Post Journey

In this query, I add another join to the previous query, joining wp_terms to wp_term_taxonomy on term_id.

SELECT 
  yjp.ID, 
  yjtr.term_taxonomy_id, 
  yjtt.taxonomy,
  yjt.name 
FROM 
  `wp_posts` AS yjp 
  INNER JOIN `wp_term_relationships` AS yjtr 
    ON yjtr.object_id = yjp.ID 
  INNER JOIN `wp_term_taxonomy` AS yjtt 
    ON yjtr.term_taxonomy_id = yjtt.term_taxonomy_id 
  INNER JOIN `wp_terms` AS yjt 
    ON yjtt.term_id = yjt.term_id 
WHERE 
  yjp.ID = 1902 
  OR yjp.post_parent = 1902

The results now identify the category and each of the five tags by name:

ID term_taxonomy_id taxonomy name
1902 2 category AI & Machine Learning
1902 69 post_tag WordPress
1902 71 post_tag DBeaver
1902 74 post_tag MySQL
1902 76 post_tag OpenAI
1902 77 post_tag ChatGPT

This is a perfect match for the post’s taxonomy in the WordPress portal:

2023 03 10 WordPressPanelChatGPT

So that’s the categories. What about the views?

wp_statistics_pages

In this final section, I examine the WPStatistics table that holds view counts: wp_statistics_pages.

Table Purpose

WPStatistics uses wp_statistics_pages to store data about page views. Each row shows a URI’s total views on the date specified.

WPStatistics documentation isn’t as in-depth as WordPress’s, so here are the table’s DDL and column descriptions:

CREATE TABLE `1yJ_statistics_pages` (
  `page_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `uri` varchar(190) NOT NULL,
  `type` varchar(180) NOT NULL,
  `date` date NOT NULL,
  `count` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  PRIMARY KEY (`page_id`),
  UNIQUE KEY `date_2` (`date`,`uri`),
  KEY `url` (`uri`),
  KEY `date` (`date`),
  KEY `id` (`id`),
  KEY `uri` (`uri`,`count`,`id`)
)

Column | Description
page_id | Primary key. Unique identifier for the table.
uri | Uniform Resource Identifier used to access a page.
type | uri type: home / page / post.
date | Date the uri was viewed.
count | uri total views on the specified date.
id | uri ID, matching wp_posts.ID.

Post Journey

As wp_statistics_pages.id is the same as wp_posts.id, I can use id 1902 in a query knowing it will still refer to DBeaver OpenAI ChatGPT Integration.

For example, this query counts the number of rows in wp_statistics_pages relating to id 1902:

SELECT 
  COUNT(*) 
FROM 
  `wp_statistics_pages` 
WHERE 
  id = 1902

COUNT(*)|
--------+
      14|

I can also calculate how many visits DBeaver OpenAI ChatGPT Integration has received by using SUM on all wp_statistics_pages.count values for id 1902:

SELECT 
  SUM(yjsp.count) 
FROM 
  `wp_statistics_pages` AS yjsp
WHERE 
  yjsp.id = 1902

SUM(count)|
----------+
        40|

So the page currently has 40 views. I can see how these views are made up by selecting and ordering by wp_statistics_pages.date:

SELECT 
  yjsp.date, 
  yjsp.count 
FROM 
  `wp_statistics_pages` AS yjsp
WHERE 
  yjsp.id = 1902 
ORDER BY 
  yjsp.date

date count
2023-02-19 1
2023-02-20 5
2023-02-21 1
2023-02-22 4
2023-03-07 6
2023-03-08 3
2023-03-09 2
2023-03-10 1

I can also join wp_posts to wp_statistics_pages on their id columns, bridging the gap between the WPStatistics table and the standard WordPress tables:

SELECT 
  yjsp.date, 
  yjsp.count, 
  yjp.post_title 
FROM 
  `wp_statistics_pages` AS yjsp 
  INNER JOIN `wp_posts` AS yjp 
    ON yjsp.id = yjp.id 
WHERE 
  yjsp.id = 1902 
ORDER BY 
  yjsp.date

date count post_title
2023-02-19 1 DBeaver OpenAI ChatGPT Integration
2023-02-20 5 DBeaver OpenAI ChatGPT Integration
2023-02-21 1 DBeaver OpenAI ChatGPT Integration
2023-02-22 4 DBeaver OpenAI ChatGPT Integration
2023-03-07 6 DBeaver OpenAI ChatGPT Integration
2023-03-08 3 DBeaver OpenAI ChatGPT Integration
2023-03-09 2 DBeaver OpenAI ChatGPT Integration
2023-03-10 1 DBeaver OpenAI ChatGPT Integration
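
These joins also give me everything I need for the analysis planned in the introduction. As a rough sketch (I haven’t validated the results yet), total views per category could be calculated by joining wp_statistics_pages through wp_posts to the taxonomy tables:

SELECT 
  yjt.name AS category, 
  SUM(yjsp.count) AS total_views 
FROM 
  `wp_statistics_pages` AS yjsp 
  INNER JOIN `wp_posts` AS yjp 
    ON yjsp.id = yjp.ID 
  INNER JOIN `wp_term_relationships` AS yjtr 
    ON yjtr.object_id = yjp.ID 
  INNER JOIN `wp_term_taxonomy` AS yjtt 
    ON yjtr.term_taxonomy_id = yjtt.term_taxonomy_id 
  INNER JOIN `wp_terms` AS yjt 
    ON yjtt.term_id = yjt.term_id 
WHERE 
  yjtt.taxonomy = 'category' 
GROUP BY 
  yjt.name 
ORDER BY 
  total_views DESC

Swapping the grouping to yjp.post_title would give total views per post instead.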

Summary

In this post, I did a deep dive into some of the amazonwebshark WordPress MySQL database tables following the journey of a recent post.

I’ve used this post to present the journey a typical post goes through in the WordPress database. Future posts will use this knowledge and the WordPress database as a data source for various dashboards, scripting and processes. Watch this space!

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~

Categories
Architecture & Resilience

Automating Application Management With Winget

In this post, I try automating my laptop’s application management with the Windows Package Manager tool Winget.


Introduction

After much frustration with my laptop’s performance, I finally booked it in for upgrades to an SSD hard drive and 16GB RAM. It’s now very responsive and far faster!

The shop originally planned to clone my existing HDD drive onto the new SSD. Unfortunately, the clone kept failing due to some bad sectors. Fortunately, this didn’t present a risk of data loss – most of my files are in OneDrive, and everything else is either in Amazon S3 or on external drives.

The failing clone meant that none of my previously installed programs and packages were on the new drive. I wasn’t flying blind here though, as I regularly use the free Belarc Advisor tool to create a list of installed programs.

But this is a heavily manual process, and the Belarc Advisor files contain a lot of unnecessary data that isn’t easy to use. So I found myself looking for an alternative!

User Story

In this section, I outline the problem I want to solve.

I want to capture a list of all applications installed on a given Windows device so that I can audit my device and have a better disaster recovery strategy.

ACCEPTANCE CRITERIA:

The process must be fully automated. I don’t want another job to do – I want the device to own this process.

The process must be efficient. Belarc Advisor gets the job done, but it takes time to load and does a bunch of other stuff that I don’t need.

There is no budget. Belarc Advisor isn’t ideal, but it’s free. I don’t want to start spending money on this problem now.

Introducing Winget

This section explains what Winget is and examines some of the features and benefits it offers.

What Is Winget?

Winget is a Windows Package Manager that helps install, upgrade, configure and delete applications on Windows 10 and Windows 11.

Package Managers look through configured repositories like the Windows Package Manager Community Repository for applications. If the application is available, it will be downloaded from the repository and installed onto the device.

Microsoft has open-sourced Winget and hosts the project on GitHub. After installation, Winget is accessible via the Windows Terminal, PowerShell, and the Command Prompt.

Package Manager Benefits

Package Managers like Winget offer several benefits over traditional methods:

  • Applications are installed as CLI commands, so there is no need to navigate to different websites or go through multiple installation steps.
  • Their repositories enforce a strict submission policy and use standardized package formats, so applications are installed consistently and reliably.
  • They manage application dependencies. If a desired application needs another application to work, the package manager will automatically install that application as well.
  • They lend themselves well to CI/CD pipelines, IAC and disaster recovery, as package manager commands can be used in scripts and automated processes.
  • Community tools like winstall exist that can create batch-installation Winget commands and scripts using a web GUI.

Winget Commands

Winget regularly receives new commands, a list of which is maintained by Microsoft.

For this post, I will be focusing on the commands that report on what’s already installed: winget list and winget export.

winget list displays a list of installed applications. The list includes each application’s current version and package source, and has several filtering options.

The winget list syntax is:

winget list [[-q] <query>] [<options>]
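
For example, a query term narrows the list down to matching applications (dbeaver here is just an illustrative search term):

winget list -q dbeaver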

winget export creates and exports a JSON file of apps to a specified path.

This JSON file can be combined with the winget import command to batch-install applications and create build environments.
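
For reference, re-installing from one of these files looks something like the command below (the filename here is just an example):

winget import --import-file WingetExport.json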

winget export’s JSON files do not include applications that are unavailable in the Windows Package Manager Community Repository. In these cases, the export command will show a warning.

The winget export syntax is:

winget export [-o] <output> [<options>]

Winget Scripting With VSCode

In this section, I write a script that will run the Winget commands.

I’m writing the script using Visual Studio Code, as this allows me to write the Winget script in the same way as other PowerShell scripts I’ve written.

Unique Filename

Firstly, I want to give each file a unique filename to make sure nothing is overwritten. A good way to do that here is by capturing Get-Date’s output formatted as the ISO 8601 standard:

$RunDate = Get-Date -Format 'yyyy-MM-dd-HHmm'

This returns a string with an appropriate level of granularity, as I’m not going to be running this script multiple times a minute:

2023-04-26-1345

Winget Export Code

Next, I’ll script my export command.

I need to tell Winget where to create the file, and what to call it. I create a new folder for the exports and capture its path in a $ExportsFilePath variable.
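
That folder setup isn’t shown in the snippets below, but a minimal version might look like this (the path is a placeholder):

$ExportsFilePath = 'C:\{PATH}\WingetExport'

# Create the exports folder if it doesn't already exist
if (-not (Test-Path -Path $ExportsFilePath)) {
    New-Item -Path $ExportsFilePath -ItemType Directory | Out-Null
}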

Then I create a $ExportsFileName variable for the first part of the export file’s name. It uses a WingetExport string and the device’s name, which PowerShell can access using $env:computername:

$ExportsFileName = 'WingetExport' + '-' + $env:computername + '-'

Including the computer’s name means I can run this script on different devices and know which export files belong to which device:

WingetExport-LAPTOP-IFIJ32T-

My third variable, $ExportsOutput, joins everything together to produce an acceptable string for winget export’s output argument:

$ExportsOutput = $ExportsFilePath + '\' + $ExportsFileName  + $RunDate + '.json'

An example of which is:

C:\{PATH}\WingetExport-LAPTOP-IFIJ32T-2023-04-26-1345.json

Finally, I can script the full command. This command creates an export file at the desired location and includes application version numbers for accuracy and auditing:

winget export --output $ExportsOutput --include-versions

Here is a sample export:

{
  "$schema": "https://aka.ms/winget-packages.schema.2.0.json",
  "CreationDate": "2023-04-27T11:02:04.321-00:00",
  "Sources": [
    {
      "Packages": [
        {
          "PackageIdentifier": "Git.Git",
          "Version": "2.40.0"
        },
        {
          "PackageIdentifier": "Anki.Anki",
          "Version": "2.1.61"
        },
        {
          "PackageIdentifier": "Microsoft.PowerToys",
          "Version": "0.69.1"
        }
      ],
      "SourceDetails": {
        "Argument": "https://cdn.winget.microsoft.com/cache",
        "Identifier": "Microsoft.Winget.Source_8wekyb3d8bbwe",
        "Name": "winget",
        "Type": "Microsoft.PreIndexed.Package"
      }
    }
  ],
  "WinGetVersion": "1.4.10173"
}

As a reminder, these exports don’t include applications that are unavailable in Winget. This means winget export alone doesn’t meet the user story requirements, so there is still work to do!

Winget List Code

Finally, I’ll script my list command. This is mostly similar to the export command and I create the file path in the same way:

$ListsOutput = $ListsFilePath + '\' + $ListsFileName + $RunDate + '.txt'

The filename is changed for accuracy, and the suffix is now TXT as no JSON is produced:

WingetList-LAPTOP-IFIJ32T-2023-04-25-2230.txt

Now, while winget list shows all applications on the device, it has no argument to save this list anywhere. For that, I need to pipe the winget list output to a PowerShell command that does create files – Out-File:

winget list | Out-File -FilePath $ListsOutput

Out-File writes the list to the $ListsOutput path, producing rows like these:

Name                      | Id                | Version | Available | Source
Anki                      | Anki.Anki         | 2.1.61  |           | winget
Audacity 2.4.2            | Audacity.Audacity | 2.4.2   | 3.2.4     | winget
DBeaver 23.0.2            | dbeaver.dbeaver   | 23.0.2  |           | winget
S3 Browser version 10.8.1 | S3 Browser_is1    | 10.8.1  |           |

The entire script takes around 10 seconds to run in an open PowerShell session and produces no CPU spikes or memory load. The script is on my GitHub with redacted file paths.
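
For reference, here’s roughly how the fragments above fit together – a sketch with placeholder paths rather than a copy of that script:

# Unique, sortable timestamp for this run
$RunDate = Get-Date -Format 'yyyy-MM-dd-HHmm'

# Export: JSON of applications available in the Winget repository
$ExportsFilePath = 'C:\{PATH}\WingetExport'
$ExportsFileName = 'WingetExport' + '-' + $env:computername + '-'
$ExportsOutput = $ExportsFilePath + '\' + $ExportsFileName + $RunDate + '.json'
winget export --output $ExportsOutput --include-versions

# List: plain-text inventory of everything installed on the device
$ListsFilePath = 'C:\{PATH}\WingetList'
$ListsFileName = 'WingetList' + '-' + $env:computername + '-'
$ListsOutput = $ListsFilePath + '\' + $ListsFileName + $RunDate + '.txt'
winget list | Out-File -FilePath $ListsOutput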

Automation With Task Scheduler

In this section, I put Task Scheduler in charge of automating my application management Winget script.

What Is The Task Scheduler?

Task Scheduler began life on Windows 95 and is still used today by applications including Dropbox, Edge and OneDrive. Parts of it aren’t great. The Send Email and Display Message features are deprecated, and monitoring and error handling relies on creating additional tasks that are triggered by failure events.

However, it’s handy for running local scripts and has no dependencies as it’s built into Windows. It supports a variety of use cases which can be scripted or created in the GUI. Existing tasks are exportable as XML.

Creating A New Task

There is plentiful documentation for the Task Scheduler. The Microsoft Learn developer resources cover every inch of it, and these Windows Central and Windows Reports guides are great resources with extensive coverage.

In my case, I create a new ApplicationInventory task, set to trigger every time I log on to Windows:

2023 04 25 TaskSchedulerTrigger

The task starts powershell.exe, passing an argument of -file "C:\{PATH}\ApplicationInventory.ps1".

This works, but will force a PowerShell window to open every time the schedule runs. This can be stopped by configuring the task to Run whether user is logged on or not. Yup – it feels a bit hacky. But it works!
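
As an aside, the same task can be registered from PowerShell instead of the GUI. A minimal sketch using the built-in ScheduledTasks cmdlets (with a placeholder script path) would be:

# Run the inventory script at every logon, mirroring the GUI setup above
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-File "C:\{PATH}\ApplicationInventory.ps1"'
$trigger = New-ScheduledTaskTrigger -AtLogOn
Register-ScheduledTask -TaskName 'ApplicationInventory' -Action $action -Trigger $trigger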

I now have a new scheduled task:

2023 04 25 TaskSchedulerNewTask

Testing

An important part of automating my application management with Winget is making sure everything works! In this section, I check the script and automation processes are working as expected.

I’ll start with the task automation. Task Scheduler has a History tab, which filters events from Event Viewer. Upon checking this tab, I can see the chain of events marking a successful execution:

2023 04 25 TaskSchedulerHistory

When I check the WingetExport folder, it contains an export file created on 25/04/2023 at 22:30:

2023 04 25 AppInventoryExports

And there are similar findings in the WingetList folder:

2023 04 25 AppInventoryLists

Both files open successfully and contain the expected data. Success!

Summary

In this post, I try automating my laptop’s application management with the Windows Package Manager tool Winget.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~

Categories
AI & Machine Learning

DBeaver OpenAI ChatGPT Integration

In this post, I install the new DBeaver OpenAI ChatGPT integration, try it out with some prompts and examine my OpenAI usage costs.


Introduction

DBeaver is a free database tool and SQL client. It is multi-platform, open-source and supports a variety of databases including Microsoft SQL Server, Amazon Athena and MySQL.

In February 2023, the latest version of DBeaver’s Community Edition was released. Version 22.3.4’s release notes start with:

ChatGPT integration for smart completion and code generation (as optional extension).

ChatGPT has been in the news a lot since its launch in November 2022. It has earned widespread praise and scorn, has been embraced in some places and banned in others and has attracted billions of dollars of investment.

So how does ChatGPT fare in this setting? Let’s find out!

Preparation

In this section, I install and configure DBeaver’s OpenAI ChatGPT integration.

DBeaver’s OpenAI ChatGPT integration is not automatically installed with Version 22.3.4. This makes sense – it’s currently quite new and many users and organisations may not want to install it (for reasons I’ll go into later). This approach keeps DBeaver’s footprint as small as possible and lets people make up their own minds.

DBeaver’s GitHub wiki covers the installation process, which amounts to:

  • Open DBeaver’s integration installer.
  • Select the DBeaver AI (GPT) integration library.
  • Select and install DBeaver AI (GPT) Support.
2023 02 15 DBeaverPlugins

Following this, the DBeaver OpenAI ChatGPT integration must be configured. This process needs an active API token from the OpenAI platform.

OpenAI account holders can generate tokens in the API Keys section of their profile. Afterwards, this token must be pasted into the API Token field in DBeaver’s preferences:

2023 02 15 DBeaverGPTPrefs

The DBeaver OpenAI ChatGPT integration will now work. The other settings are out of scope for this post and are covered in DBeaver’s wiki.

Separately, I found I was getting SQL 08S01 timeout errors during my first attempts to send ChatGPT prompts. I ended up increasing my database connection’s connectionTimeout value to resolve this.

ChatGPT Data Sharing

There is an important fact to consider when using DBeaver’s OpenAI ChatGPT integration:

DBeaver needs to send database metadata to the OpenAI platform. 

To clarify, this means that DBeaver sends a list of the selected database’s table names and column names to OpenAI.

DBeaver doesn’t hide this fact away. Their wiki is very explicit, and the DBeaver OpenAI ChatGPT integration states this when used for the first time.

For my purposes, this isn’t a problem. My amazonwebshark MySQL database contains nothing confidential or sensitive, and the tables and columns of a standard WordPress database are well documented.

But I wouldn’t currently use this with databases I don’t own. Although the shared metadata is limited to table names and column names, these can still be sensitive and potentially misused. I’m sure OpenAI have data security measures in place, but as this technology is still quite new I currently prefer to be cautious.

Tests

In this section, I test the DBeaver OpenAI ChatGPT integration with prompts referring to my amazonwebshark MySQL database. Before starting, I should define the concepts of prompts and completions:

  • Prompts are the instructions given to ChatGPT.
  • Completions are ChatGPT’s response to prompts.

Test 1: Category Usage

For my first test, I wanted to answer a question similar to one from my last post:

How many amazonwebshark blog categories have been used more than twice?

I answered the question with this SQL query:

SELECT t.name, tt.count
FROM term_taxonomy AS tt
INNER JOIN terms AS t 
	ON tt.term_id = t.term_id
WHERE tt.taxonomy = 'category'
	AND tt.count > 2

Which returned these results:

name | count
Data & Analytics | 6
Me | 4
Developing & Application Integration | 8
Security & Monitoring | 3

Now it’s ChatGPT’s turn! My first prompt for ChatGPT was:

Find all categories used more than 2 times.

GPT’s completion was:

SELECT COUNT(*) AS count, category
FROM statistics_historical
GROUP BY category
HAVING count > 2;

Although this query looks fine, it doesn’t return any results! In fact the whole table has nothing in it:

SELECT COUNT(*) FROM statistics_historical

Name | Value
COUNT(*) | 0

ChatGPT interpreted my prompt literally and searched my database metadata for tables with a category column. This search only returned one table, which is used for archiving purged report data: statistics_historical.

ChatGPT didn’t know that the table was empty, and so had no way to know its query wasn’t appropriate. This was my fault – my prompt needed more work.

Test 2: Category Usage Take 2

For my second test, I tried to re-engineer my prompt to give ChatGPT more insight into my data. My question remained:

“How many amazonwebshark blog categories have been used more than twice?”

My second prompt was phrased to give ChatGPT some guidance on what to look for:

Get all names of terms with taxonomy of category and count greater than 2

This time, GPT’s completion was:

SELECT name, count 
FROM terms 
INNER JOIN term_taxonomy 
	ON terms.term_id = term_taxonomy.term_id 
WHERE term_taxonomy.taxonomy = 'category' 
	AND term_taxonomy.count > 2

This was almost identical to my own SQL query:

SELECT t.name, tt.count
FROM term_taxonomy AS tt
INNER JOIN terms AS t 
	ON tt.term_id = t.term_id
WHERE tt.taxonomy = 'category'
	AND tt.count > 2

And the results were the same in both cases:

name | count
Data & Analytics | 6
Me | 4
Developing & Application Integration | 8
Security & Monitoring | 3

Test 3: Page Views

For my final test I tried a different approach. Instead of writing a SQL query myself, I gave ChatGPT a prompt and then used a WPStatistics report to compare results.

My question is now:

Which amazonwebshark posts have had the highest views in the past week?

I know which table has the data needed for this, so I included it along with some vital columns in my ChatGPT prompt:

Get the uris for the past week from statistics_pages. Sum all the counts. Show the top 5 in descending order.

GPT’s completion was:

SELECT uri, SUM(count) AS total_count
FROM statistics_pages
WHERE date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
GROUP BY uri
ORDER BY total_count DESC
LIMIT 5;

This was impressive! ChatGPT used the table and columns I specified, filtered the dates correctly and limited the results as instructed.

The query results were:

uri | total_count
/connecting-athena-to-power-bi-with-simba-athena/ | 31
/writing-user-stories-for-an-itunes-dashboard/ | 24
/using-athena-to-query-s3-inventory-parquet-objects/ | 22
/s3-glacier-instant-retrieval-first-impressions/ | 10
/open-source-documentation-with-read-the-docs/ | 7

I compared these with my WPStatistics Pages report:

2023 02 16 WPStatisticsTopPagesChart

And the results were the same!

2023 02 16 WPStatisticsTopPagesTable

Pricing

In this section, I examine OpenAI’s usage and billing services.

Tokens

OpenAI uses the concept of tokens to calculate pricing. In their words:

The GPT family of models process text using tokens, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.

OpenAI Tokenizer site

And:

“You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.”

OpenAI Pricing site

OpenAI currently offers several tools for determining pricing, such as the Tokenizer and Pricing sites quoted above.

OpenAI also operates a free tier. New users get $18 worth of free tokens. These expire after 3 months, after which accounts can be upgraded to paid plans operating on a pay-as-you-go basis. At this point, accounts with no billing information can still be accessed but no further API requests will be granted.

Usage

Here is my OpenAI usage for 2023-02-15:

2023 02 16 OpenAIBilling20230215

And 2023-02-16:

2023 02 16 OpenAIBilling20230216

To understand this better, let’s examine the last row that relates to Test 3. The values stated are:

  • 3284 prompt tokens.
  • 61 completion tokens.

The Completion value is easy to verify. Entering ChatGPT’s Test 3 completion into the tokenizer site scores it at 61 tokens:

2023 02 16 OpenAITokenizer

Now let’s examine the Prompt value. When I enter my Test 3 prompt into the tokenizer site, it only scores it at 25 tokens. I wanted to see where the other 3259 tokens came from, so I needed to see what DBeaver sent to ChatGPT.

Currently, my OpenAI account doesn’t keep a record of what their APIs send and receive. However, I can log what DBeaver sends to OpenAI! This is done by enabling the Write GPT queries to debug log option in DBeaver’s ChatGPT settings.

It turns out that DBeaver includes DDL statements for all of the tables and columns in my amazonwebshark MySQL database in every ChatGPT prompt. Even in cases like Test 3 where I tell ChatGPT which table to use!

This isn’t unreasonable, as ChatGPT’s information retention is limited and it can’t access past conversations to inform its responses. But this explains the high prompt values per request. The OpenAI tokeniser says Test 3’s full prompt is 8293 characters. And remember – every 4 characters requires 1 token!

Addendum: DBeaver version 22.3.5 added metadata scope configuration to their ChatGPT smart completion.

In terms of size, my amazonwebshark MySQL database has around 20 tables with modest column counts. Larger databases with dozens of tables and potentially hundreds of columns could quickly rack up high usage fees, so use caution in these situations.

Billing

OpenAI has features to help with billing. Their site stores payment methods and shows invoices from the past 12 months. In addition, preferences like additional email addresses for billing notifications and business tax ID retention can be specified.

Accounts can also have usage limits, triggering notification emails when reached. These work alongside the usage tracking dashboard to help monitor spending.

So how much have I spent here?

By default, DBeaver uses the DaVinci model. This is currently listed at $0.0200 per 1000 tokens. If I take the 30,057 total tokens I’ve used and price them at $0.0200 per 1000 tokens, I get $0.60114.

In GBP that’s about £0.50 (or about 5p per query) which my free tier will cover.
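
As a sanity check, the same sum applied to a single request shows where that per-query figure comes from. Taking Test 3’s token counts as an example:

3284 prompt tokens + 61 completion tokens = 3345 tokens
3345 / 1000 × $0.02 ≈ $0.067, or a little over 5p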

Summary

In this post, I installed the new DBeaver OpenAI ChatGPT integration, tried it out with some prompts and examined the OpenAI usage costs.

Overall I was very impressed with what I found! ChatGPT works really well with a well-engineered prompt. It understood what I wanted and worked quickly. Would I pay for it? I think so, provided I kept an eye on the prompts being sent and the resulting usage costs.

In a future version, I’d like to be able to control what database metadata is included in a prompt. This would let me make sure that ChatGPT wasn’t getting sensitive or unnecessary metadata, and the smaller prompts would reduce my API usage costs.

Addendum: DBeaver version 22.3.5 added metadata scope configuration to their ChatGPT smart completion.

This is all very exciting though! I’m interested to discover how else the DBeaver OpenAI ChatGPT integration can help me out, and fascinated to see future developments!

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~