ChatGPT Series Part 3: Understanding SQL Server High Availability and Disaster Recovery Options

Part 3 in the ChatGPT series where ChatGPT gives me topics to blog about; High Availability and Disaster Recovery (HADR) options. There are several options SQL Server offers to keep your databases both highly available, and redundant in the event of a disaster. The three I’d like to chat about today are: Always on Availability Groups, Failover Cluster Instances, and Log Shipping. I’ll give you some overview on each, and some pros and cons to consider as well. I’m not going to talk about Database Mirroring since it is 1. Depreciated by Microsoft, and 2. Really what drives Availability Groups under the hood.

Covering Some Basics

Before we get too deep into the weeds with these different options here, let’s cover some high level basics. What does it mean to be highly available in SQL Server? Well, basically to be highly available means that you are able to maintain database uptime, so when there is an outage, or a database becomes unavailable, developers and more importantly, end users don’t experience any issues. You achieve high availability by having redundancy in your database environment, meaning, you have multiple “copies” if you will of your data all being synchronized at the same time, usually in the same datacenter so that in the event one of your copies goes down, you can quickly and seamlessly switch over to another copy. Alternatively, disaster recovery usually involves an asynchronous copy of your data, in some other geographical location. This serves as a last resort failsafe of maintaining database uptime and data. In the event a natural disaster occurs and destroys your primary data center, your disaster recovery strategy ensures you have backup copies, or replicas of your data in another location that you can use to carry out business.

Another key basic to cover is the concept of 9s. This concept refers to the percentage of uptime you have based on your HADR implementation. I don’t remember and am not smart enough to remember or know how to calculate the number of 9s so I’m not going to try. However, this number is quantified by the number of 9s your implementation provides. The more 9s the better. For example, if your uptime is 99% that’s good. If your uptime is 99.9% that’s incredible. If it’s 99.99% you’re a wizard. If it’s beyond that, please share your knowledge. The thing to remember with this is, the more 9s your company wants you to ensure, the more expensive it gets. This means more nodes, licensing, hardware, etc. It can add up quick. Okay, let’s get into it.

Always On Availability Groups: Overview

Always on Availability Groups (AAGs) are a great HADR solution that allows you to group databases together for failover purposes. This gives you improved availability and data redundancy. Failovers occur at the database level giving you more flexibility and granularity when addressing database outages. AAGs consist of primary and secondary replicas. You have one primary replica where databases live in a read/write state, and one or more secondary replicas where databases are read-only. Depending on your configuration of your primary and secondary nodes data is replicated either synchronously or asynchronously between the databases that are in the availability group. Synchronous replication means that as transactions are written to the primary node, SQL Server will not move on to the next transaction until it gets confirmation from the secondary replica that the transaction has been hardened to the transaction log. This ensures that data stays synchronized between the two replicas. This can also be referred to as no data loss. Conversely, asynchronous replication means that transactions can be written to the primary replica and not have to wait for confirmation from the secondary before moving on to the next transaction. This has positives and negatives. If you have an AAG in a busy environment, asynchronous commit mode may function better from a performance standpoint because transactions can flow through without waiting for confirmation. Its’ a negative, because now we have data loss.

AAGs are also great for their capabilities of automatic and manual failover. Automatic failover means that if the primary replica goes down, SQL will automatically failover to the secondary node. Manual failover means that a user with appropriate permissions has to manually tell SQL Server to failover to another node.

You typically see a configuration of a three node AAG. You’ll have node 1 and node 2 in the same datacenter, functioning in synchronous commit mode along with automatic failover to give you your HA solution, and then a tertiary node in a different data center running in asynchronous commit mode along with manual failover to give you your DR solution. The DR node in the AAG is also useful for Read-Only Routing which AAGs offer. Read-Only Routing is a function that allows read-only workloads, offloading read operations from the primary replica to another server.

Always On Availability Groups: Pros

Some pros of AAGs include but are not limited to:

High availability by minimizing downtime for database outages.
Disaster recovery by geographic distribution of replicas providing resilience against regional failures.
Mitigated maintenance downtimes for patching by allowing you to patch passive nodes, failing over, then patching the former primary node.
Scalability by allowing read-only workloads to be offloaded to secondary replicas.
Data protection by ensuring redundancy, mitigating data loss.

Always On Availability Groups: Cons

Some cons of AAGs include but are not limited to:

Complexity in the way they are needed to be set up and configured as well as maintained.
Not cost effective and requires expensive licensing.
Can be resource intensive especially for synchronous commit.
Network latency can be experienced especially with DR nodes.
Limited support in SQL Server Standard Edition by only allowing you to have access to basic features.
Instance level objects are not protected (logins, jobs, etc…)

Differences Between SQL Server Standard and Enterprise Editions

Standard Edition:

Basic Availability Groups where you are limited to one database per AG.
Two node limit restricts your AAGs to only having a maximum of two nodes (one primary and one secondary node).
Only supports asynchronous replication limiting data protection.
No read-only routing for offloading workloads.
Only supports manual failover.

Enterprise Edition:

Multiple databases within AAGs.
Multiple nodes in AAG.
Supports synchronous and asynchronous commit.
Supports read-only routing.
Automatic and manual failover is available.

Failover Cluster Instances: Overview

Failover Cluster Instances (FCIs) allow you to have high availability at the instance level by leveraging Windows Server Failover Clustering (WSFC). FCIs make sure that if one server (node) in a cluster fails, another node can take over with minimal downtime. One of the many key differences to note between FCIs and AAGs is that FCIs failover at the server level, meaning every database on that server will failover and the instance level objects (logins, jobs, etc) will persist. This is different from AAGs that allow you the granularity of failing over at the database level.

Another thing to know about FCIs is that they function utilizing shared storage. FCIs rely on shared storage like a SAN or file share accessible by all nodes in the cluster. Like AAGs, FCIs have automatic failover, so, if the active node fails, another node in the cluster automatically will take over, thus minimizing downtime. You also get node level redundancy which protects you against server or hardware failures.

Failover Cluster Instances: Pros

Some pros of FCIs include but are not limited to:

High availability through automatic failover at the server level to ensure minimal downtime in case of node failure.
Instance-level protection so that the entire SQL Server instance including SQL Server Agent jobs and other instance-level objects are maintained.
Transparent client connections allow clients to continue to connect to the same instance name, regardless of which instance owns the instance of SQL Server, simplifying application configuration.
Improved maintenance that allows for planned patching with minimal impact by failing over to another node.
Can be configured with multiple nodes for increased redundancy and load balancing.

Failover Cluster Instances: Cons

Some cons of FCIs include but are not limited to:

Just like AAGs, setup can be complex and troubleshooting issues can prove to be challenging.
Cost can be an issue if you are requiring multiple nodes in your FCI.
FCIs rely on shared storage which can become a single point of failure if not properly configured.
Geographically limited because FCIs are typically used within the same datacenter.
Can be resource intensive due to heightened hardware requirements and increased resources to support failover nodes.

Differences Between SQL Server Standard and Enterprise Editions

Standard Edition:

Only supports up to two nodes in a FCI.
Provides core FCI capabilities but lacks advanced features only in Enterprise Edition.
More cost effective but significant feature limitations.

Enterprise Edition:

Multiple nodes can be in the FCI for higher redundancy and load balancing.
Flexible storage options.
Better performance for more complex environments.
Higher availability with enhanced failover capabilities and support for larger clusters.

Log Shipping: Overview

Now, this last one here can be argued that it can be used for high availability, however, I see it more useful as a disaster recovery tool. Log shipping is another HADR solution that involves automating the process of backing up and restoring transaction logs from a primary database to a secondary database on another server to keep them in sync with each other. The catch here is that the secondary database is usually in one of two states: RESTORING or in STANDBY. The former allows users to read from the secondary database, but when transaction logs are restored (usually every 15 minutes, but you can control the cadence), you have to kick the users out of the database before the logs can be restored to the secondary database. The latter state allows users to read from the secondary database in a continuous state. When failovers happen, there are usually more scripts involved make sure that the former secondary databases get brought online properly, and set it up so that the former primary databases, can now act as the secondary databases and receive transaction logs.

Log Shipping: Pros

Some pros of log shipping include but are not limited to:

A good solution for DR by providing a robust approach to maintaining a copy of the database on a secondary server.
Lower in cost and does not require SQL Server Enterprise Edition.
Gives you flexible options to have multiple secondary servers allowing for multiple copies of the database.
Easy setup with relatively simple steps to configure and support compared to its other HADR counterparts.
Has a low impact on performance of the primary server as log backups are asynchronous.

Log Shipping: Cons

Some cons of log shipping include but are not limited to:

Requires manual intervention to failover to a secondary server in case of primary server failure which can lead to longer downtime.
There is a delay between the primary and secondary databases depending on the frequency of the log backups and restores.
No instance-level protection and only protects databases and not instance-level objects such as logins, jobs, linked servers, etc.
More complex management as failovers usually involve setting up log shipping again to sync databases to their former primary server.

Differences Between SQL Server Standard and Enterprise Editions

There is not much notable difference in log shipping between SQL Server Standard and Enterprise Editions which make it a cost-effective solution for DR.

Conclusion

So as you can see, each solution listed above has its own positives, and tradeoffs. AAGs are a great solution to get both HA and DR but can be costly to implement as well as complex to maintain and troubleshoot. FCIs are a great alternative and are a bit more cost friendly but lack in the granularity and control you get with managing failovers. Log shipping is a great tool that is cheap and easy to implement, regardless of what edition of SQL Server you have, but can create longer downtime situations in the event of a failover. Hopefully the notes above gave you some more insight into common HADR options, and you can use this to help influence your decision as to which solution fits your environment the best.

DBA Unfiltered

ChatGPT Series Part 3: Understanding SQL Server High Availability and Disaster Recovery Options

Covering Some Basics

Always On Availability Groups: Overview

Always On Availability Groups: Pros

Always On Availability Groups: Cons

Differences Between SQL Server Standard and Enterprise Editions

Failover Cluster Instances: Overview

Failover Cluster Instances: Pros

Failover Cluster Instances: Cons

Differences Between SQL Server Standard and Enterprise Editions

Log Shipping: Overview

Log Shipping: Pros

Log Shipping: Cons

Differences Between SQL Server Standard and Enterprise Editions

Conclusion

Leave a comment Cancel reply

ChatGPT Series Part 3: Understanding SQL Server High Availability and Disaster Recovery Options

Covering Some Basics

Always On Availability Groups: Overview

Always On Availability Groups: Pros

Always On Availability Groups: Cons

Differences Between SQL Server Standard and Enterprise Editions

Failover Cluster Instances: Overview

Failover Cluster Instances: Pros

Failover Cluster Instances: Cons

Differences Between SQL Server Standard and Enterprise Editions

Log Shipping: Overview

Log Shipping: Pros

Log Shipping: Cons

Differences Between SQL Server Standard and Enterprise Editions

Conclusion

Share this:

Leave a comment Cancel reply