Can SQL Upgrades Be Avoided In The Cloud?

In this post I consider the February 2022 T-SQL Tuesday #147 Invitation “Upgrade Strategies” and look at the importance of upgrades in the cloud.

For this month’s T-SQL Tuesday, VoiceOfTheDBA’s invitation was as follows:

This month I want you to write about how you look at SQL Server upgrades. A few things you might think about:

Why we wait to upgrade?

Strategies for testing an upgrade

Smoke tests or other ways to verify the upgrade worked

Moving to the cloud to avoid upgrades

Using compatibility levels to upgrade an instance but not a database.

Checklists of things to use in planning

The time it takes to upgrade your environment

What you evaluate in making a decision to upgrade or not?

Anything else

Immediately I was drawn to “Moving to the cloud to avoid upgrades”. Some perceive the cloud as a ‘set it and forget it’ environment. The reality is that cloud services still require upgrades, and neglecting them can lead to security vulnerabilities and data issues.

What follows are some SQL-based observations from my experience to date. While this list is AWS-specific, it isn’t AWS-exclusive: Azure and GCP operate similar services with similar considerations.

EC2 Upgrades

When I create a new EC2 instance I can generally expect it to be running the latest build of my chosen OS. However, an instance that has been running for a while will soon find itself needing system updates like any other computer. Some updates offer performance improvements or new features and are essentially optional. Others fix security vulnerabilities and bugs and are non-negotiable.
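As a rough sketch of what staying patched can look like, the snippet below asks AWS Systems Manager to apply an instance’s patch baseline. The instance ID is a placeholder and it assumes the instance is already registered with Systems Manager; tooling on the instance itself works just as well.

```python
import boto3

ssm = boto3.client("ssm", region_name="eu-west-1")

# Ask Systems Manager to install the instance's outstanding patches.
# The instance ID is a placeholder; the instance must be SSM-managed.
response = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],
    DocumentName="AWS-RunPatchBaseline",
    Parameters={"Operation": ["Install"]},
    Comment="Apply outstanding OS and security patches",
)

print(response["Command"]["CommandId"])
```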

If that instance is running my relational database of choice, that too will need a range of updates from the desirable to the critical. AWS views this as a customer responsibility, with the AWS Shared Responsibility Model including the following:

Customers that deploy an Amazon EC2 instance are responsible for management of the guest operating system (including updates and security patches), any application software or utilities installed by the customer on the instances, and the configuration of the AWS-provided firewall (called a security group) on each instance.

However, the managed services are viewed differently:

For abstracted services, AWS operates the infrastructure layer, the operating system, and platforms, and customers access the endpoints to store and retrieve data. Customers are responsible for managing their data (including encryption options), classifying their assets, and using IAM tools to apply the appropriate permissions.

So what if I get AWS to run my database for me?

RDS Upgrades

Amazon Relational Database Service (RDS) offers managed relational databases including Microsoft SQL Server, MySQL and PostgreSQL. RDS still uses EC2 instances but here they are managed by AWS and are not accessible by the user. This means OS management is no longer a customer responsibility.

Upgrades to the database engine are still a factor though. AWS try to make this as painless as possible: upgrades can be done using the console, the AWS CLI or the RDS API. This is still largely a manual process, although some minor engine versions can be upgraded automatically.
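As an illustration of the API route, here’s a minimal boto3 sketch. The instance identifier and target engine version are placeholders, and deferring the change to the maintenance window is just one option.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Request an engine upgrade for the next maintenance window.
# The instance identifier and target version are placeholders.
rds.modify_db_instance(
    DBInstanceIdentifier="my-sql-server-instance",
    EngineVersion="15.00.4236.7.v1",   # placeholder target build
    ApplyImmediately=False,            # wait for the maintenance window
    AutoMinorVersionUpgrade=True,      # opt in to automatic minor upgrades
)
```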

However, even on rails it’s still possible for an update to go wrong. AWS have a nine-point checklist for testing an upgrade that wouldn’t be out of place on-premises. AWS also encourage database snapshots and non-production testing. While RDS removes infrastructure complexity, the data is still the customer’s responsibility and needs the same care as ever.
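The snapshot step is quick to script; the identifiers below are placeholders, and the waiter simply blocks until the snapshot is available before the upgrade starts.

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Take a manual snapshot before upgrading; both identifiers are placeholders.
rds.create_db_snapshot(
    DBInstanceIdentifier="my-sql-server-instance",
    DBSnapshotIdentifier="pre-upgrade-2022-02",
)

# Block until the snapshot is available, then the upgrade can start.
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="pre-upgrade-2022-02"
)
```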

Operational Upgrades

AWS constantly release new services intended to simplify workflows and reduce costs. Even when an organisation’s cloud setup is fully mature, it can still benefit from upgrading to these services.

When Athena debuted in 2016 it enabled the analysis of data in S3 using standard SQL. This removed the need for complex ETLs and data warehouses, and with Athena being serverless it was faster to set up and cheaper to operate than EC2, RDS or Redshift.
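To give a sense of how little setup that involves, the snippet below runs a standard SQL query against data already sitting in S3. The database, table and results bucket are invented for the example.

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

# Run a standard SQL query directly against data in S3.
# The database, table and results bucket are placeholders.
query = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

print(query["QueryExecutionId"])
```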

In 2020 Amazon announced new EBS GP3 volumes. GP3s have separate settings for performance and storage, and are recommended for applications like MySQL that need high performance at low costs. This meant organisations could save money by reducing their use of the more expensive IO1 volumes.
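Changing an existing volume over is a single API call that can be made while the volume is in use. The volume ID below is a placeholder and the IOPS and throughput figures are illustrative rather than recommendations.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Change an existing volume to gp3 and set performance independently
# of its size. The volume ID and figures are illustrative placeholders.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    VolumeType="gp3",
    Iops=6000,        # gp3 baseline is 3,000 IOPS
    Throughput=250,   # MiB/s; gp3 baseline is 125 MiB/s
)
```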

More recently, AWS announced a new S3 Glacier Instant Retrieval storage class in 2021. This made S3 less expensive for a range of use cases including SQL backup storage and data lake archival.
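One low-effort way to adopt a new storage class like this is a lifecycle rule; the sketch below moves objects under a placeholder backups/ prefix to Glacier Instant Retrieval after 30 days.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under a backups/ prefix to Glacier Instant Retrieval
# after 30 days. The bucket name and prefix are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-sql-backups",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "backups-to-glacier-ir",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
            }
        ]
    },
)
```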

Conclusion

The cloud offers numerous opportunities for individuals and organisations to develop, build and deploy more quickly and easily. But upgrades are a fact of life in technology regardless of platform. The cloud is still a collection of computers, and those computers still need to respond to changing requirements and threats.

A well maintained and fully upgraded cloud environment is reliable, scalable and secure. A poorly maintained one can, at best, be expensive, slow and unwieldy. At worst it can be unreliable, vulnerable and in breach of terms of service.

If you want to check the health of your AWS account, AWS offers their Trusted Advisor and Well-Architected Tool services. These give free architectural advice, security recommendations, cost optimisations and best practice guidance.


Thanks for reading ~~^~~

S3 Glacier Instant Retrieval: First Impressions

On 30/11/2021, AWS introduced S3 Glacier Instant Retrieval – a new archive storage class for S3 that operates alongside S3 Glacier (now renamed S3 Glacier Flexible Retrieval) and S3 Glacier Deep Archive. Their announcements can be seen here and here and a summary of all Glacier classes is available on the S3 Glacier product page.

I already use most of the S3 storage classes in my AWS accounts. Earlier in the year I got tired of my laptop backups needing to run overnight, so I built an S3 cross-account replication setup: whatever I upload to the AtRest bucket in my main account is replicated to the AtRest bucket in my backup account and stored as S3 Glacier Deep Archive. This way I have two versions of each object in different regions and different accounts, and although there are data transfer costs they are offset by the savings from using S3 Glacier Deep Archive for the backup objects.
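The configuration behind that kind of setup looks roughly like the sketch below. The bucket names, account ID and replication role are placeholders; both buckets need versioning enabled, and the backup account also needs a bucket policy that lets the role replicate into it.

```python
import boto3

s3 = boto3.client("s3")

# Replicate new objects from the main account's bucket to the backup
# account's bucket, storing the replicas as Glacier Deep Archive.
# Bucket names, account ID and role ARN are placeholders, and both
# buckets must have versioning enabled.
s3.put_bucket_replication(
    Bucket="atrest-main",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/s3-replication-role",
        "Rules": [
            {
                "ID": "atrest-to-backup-account",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::atrest-backup",
                    "Account": "222222222222",
                    "StorageClass": "DEEP_ARCHIVE",
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)
```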

Objects in my main account use different classes depending on their purpose. Before I upload any objects there I consider whether the object is in motion or at rest and what my access pattern for the object is likely to be, then choose a storage class accordingly. This is the current storage class distribution for all buckets in my main account according to S3 Storage Lens:

The arrival of S3 Glacier Instant Retrieval is of interest to me as it might offer cost savings and accessibility improvements over my current setup. So far my decisions over S3 storage classes have usually boiled down to trade-offs. For example:

  • For Object X I could use S3 Intelligent Tiering or S3 Infrequent Access. S3 Infrequent Access has a minimum storage duration of 30 days and has retrieval costs, but S3 Intelligent Tiering has a monitoring and automation fee per 1,000 objects and each object spends its first 30 days in, and is charged as, S3 Standard. So if I know I’m not going to touch this object for at least a month, which class is most suitable?
  • For Object Y I could use S3 Glacier or S3 Glacier Deep Archive. Deep Archive will cost less for storage, but its retrieval fees are higher than Glacier’s and its minimum storage duration is 180 days where Glacier’s is only 90 days. Plus I can get objects out of Glacier far quicker, as its standard retrieval time is 3 to 5 hours compared to Deep Archive’s standard of 12 hours. So could I afford to wait half a day for this object if I needed it? And how long do I see this object being around for?

Comparisons With Other S3 Storage Classes

So how does S3 Glacier Instant Retrieval compare to S3 Infrequent Access and S3 Glacier Flexible Retrieval? I loaded the S3 pricing site and had a look at various costs in eu-west-1 for S3 Infrequent Access (IFA), S3 Glacier Instant Retrieval (GIR) and S3 Glacier Flexible Retrieval (GFR), then used the S3 calculator to get some estimates based on my current S3 Storage Lens statistics and November 2021 bill.

Storage:

  • IFA $0.0125 per GB
  • GIR $0.004 per GB
  • GFR $0.0036 per GB

PUT, COPY, POST, LIST requests (per 1,000 requests):

  • IFA $0.01
  • GIR $0.02
  • GFR $0.33

GET, SELECT, and all other requests (per 1,000 requests):

  • IFA $0.001
  • GIR $0.01
  • GFR $0.0004

Data Retrieval requests (per 1,000 requests):

  • IFA N/A
  • GIR N/A
  • GFR $0.055 (Standard)

Data retrievals (per GB):

  • IFA $0.01
  • GIR $0.03
  • GFR $0.01 (Standard)

Estimated monthly cost for storing 200GB (with an average object size of 4.4MB for Glacier Flexible Retrieval), 24,265 PUT, COPY, POST, LIST requests, 10,402 GET, SELECT, and all other requests, and retrieval of 50GB (using 1 Standard request for Glacier Flexible Retrieval), with a quick arithmetic check after the list:

  • IFA $3.25
  • GIR $2.89
  • GFR $2.38
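As a rough sanity check, the first two figures can be rebuilt from the unit prices listed above with a few lines of Python. The Flexible Retrieval estimate is taken straight from the AWS calculator, whose handling of retrieval requests I haven’t tried to reproduce here.

```python
# Rebuild the IFA and GIR estimates from the eu-west-1 unit prices above.
def monthly_cost(storage_gb, puts, gets, retrieval_gb,
                 storage_price, put_price, get_price, retrieval_price):
    return (storage_gb * storage_price
            + puts / 1000 * put_price
            + gets / 1000 * get_price
            + retrieval_gb * retrieval_price)

ifa = monthly_cost(200, 24265, 10402, 50, 0.0125, 0.01, 0.001, 0.01)
gir = monthly_cost(200, 24265, 10402, 50, 0.004, 0.02, 0.01, 0.03)

print(f"IFA ${ifa:.2f}")  # $3.25
print(f"GIR ${gir:.2f}")  # $2.89
```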

A couple other items of note:

  • S3 Glacier Instant Retrieval has a minimum billable object size of 128 KB, which it shares with S3 Standard Infrequent Access
  • S3 Glacier Instant Retrieval offers instant retrieval in milliseconds, which it also shares with S3 Standard Infrequent Access
  • S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, which it shares with S3 Glacier Flexible Retrieval

What’s interesting for me in the cost estimates is how close S3 Glacier Instant Retrieval is to S3 Standard Infrequent Access. The major difference between the two classes that I can see is that, while S3 Glacier Instant Retrieval has a minimum storage duration of 90 days, the same period for S3 Standard Infrequent Access is only 30 days. If you delete an object before the end of its minimum storage duration period, you are charged for the full period. Depending on the size and number of objects, this could get expensive if mismanaged. That said, AWS are offering S3 Glacier Instant Retrieval as being “For long-lived archive data accessed once a quarter with instant retrieval in milliseconds”, so there are no smoke and mirrors here.

Conclusions

Would I use S3 Glacier Instant Retrieval over S3 Glacier Flexible Retrieval or S3 Standard Infrequent Access? Definitely in my AtRest bucket. The S3 Storage Lens stats for that bucket show many objects in S3 Standard Infrequent Access, including all the old TV shows from Internet Archive, because let’s face it: if you want to watch old TV you want to watch it now, not in 3 hours’ time </Glacier>. In this scenario S3 Glacier Instant Retrieval keeps the millisecond access and, although the retrieval cost is higher (GIR $0.03 vs IFA $0.01), the cost of data storage is lower (GIR $0.004 per GB vs IFA $0.0125 per GB). So S3 Glacier Instant Retrieval looks like a winner there.
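Switching an existing object over doesn’t need a re-upload: copying the object onto itself with a new storage class is enough (objects over 5GB need a multipart copy instead). A sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Move an existing object to Glacier Instant Retrieval by copying it
# over itself with a new storage class. Bucket and key are placeholders;
# objects over 5GB need a multipart copy instead.
s3.copy_object(
    Bucket="atrest-main",
    Key="tv/old-show-episode-01.mp4",
    CopySource={"Bucket": "atrest-main", "Key": "tv/old-show-episode-01.mp4"},
    StorageClass="GLACIER_IR",
    MetadataDirective="COPY",
)
```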

My InMotion bucket is a different story though. The objects here aren’t being retained permanently, and most of them are in S3 so they don’t bring my laptop’s hard drive to its knees. If I’m uploading objects here it’s usually with the question “When will I deal with this?”, and the answer will usually be one of:

  • The next few weeks, in which case I’ll keep the object in OneDrive instead (What a TWIST)
  • Next month, in which case I’d put the object in S3 Standard Infrequent Access because of its 30-day minimum storage duration
  • “I don’t know”, in which case I’d put the object in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive because their storage costs are less than S3 Glacier Instant Retrieval

As a side note, most of the objects in my InMotion bucket are S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive already, so it looks like my estimates from the start of the year were half decent!

Thanks for reading! ~~^~~