This is the next installment of our ChatGPT series, where ChatGPT gives me topics to blog about; this time it’s everybody’s favorite topic – replication. In this post, we’ll cover an overview of replication, the different types, and how to implement each type. Hopefully this can be your one-stop shop for guidance on implementing replication.
Overview of Replication

Replication is a very robust tool in SQL Server that takes data from one database (the publisher), passes it to a centralized location (the distributor), and then distributes that data to other databases (the subscribers), either on the same server or on a different one. The most common use case for replication is reporting: the business would like to query data that lives on one of their OLTP servers, but nobody wants those reporting queries degrading the performance of the OLTP workload. So they come to you, the awesome and all-knowing DBA, to figure out a way to take the data that exists on that OLTP server and put it somewhere else, in a different database, so they can report against it. This isn’t the only use case for replication, but it’s definitely the most common one I see. Now, let’s talk about the different types of replication and how to implement them.
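To make the publisher / distributor / subscriber picture a little more concrete, here is a minimal T-SQL sketch of configuring a local distributor, which every replication topology needs before any publication can exist. This isn’t a full setup script; the server name, password, and snapshot share are placeholder values for illustration, so swap in your own.

```sql
-- Run on the instance that will act as the distributor.
-- 'MYSERVER', the password, and the snapshot folder are placeholders.
USE master;
GO

-- Register this instance as its own distributor.
EXEC sp_adddistributor
    @distributor = N'MYSERVER',
    @password    = N'StrongDistributorPassword1!';
GO

-- Create the distribution database that holds replication metadata.
EXEC sp_adddistributiondb
    @database      = N'distribution',
    @security_mode = 1;   -- Windows authentication
GO

-- Register the publisher (here, the same instance) with the distributor
-- and point it at a folder the Snapshot Agent can write to.
EXEC sp_adddistpublisher
    @publisher         = N'MYSERVER',
    @distribution_db   = N'distribution',
    @working_directory = N'\\MYSERVER\ReplData';
GO
```

In a real environment, the working directory should be a share that both the Snapshot Agent and the agents applying the snapshot can reach.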
Snapshot Replication

Backwards compatibility rules
So, each type of replication has its own backwards compatibility rules, which mainly have to do with SQL Server versioning. Basically, there are certain rules that have to be followed about which versions can replicate to which. Here are the backwards compatibility rules for snapshot replication (a quick version-check query follows the list):
REPLICATION BACKWARD COMPATIBILITY RULES
- A distributor can be any version as long as it is greater than or equal to the publisher version.
- A publisher can be any version as long as it is less than or equal to the distributor version.
- A subscriber's version depends on the type of replication:
* A subscriber to a snapshot publication can be any version within two versions of the publisher version.
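If you want to confirm where each instance falls in those rules before wiring anything together, a generic property query on the distributor, publisher, and each subscriber is enough; nothing here is replication-specific.

```sql
-- Run on each instance (distributor, publisher, subscribers) and compare
-- the results against the compatibility rules above.
SELECT
    SERVERPROPERTY('ProductVersion') AS product_version,  -- build number, e.g. 16.0.x for SQL Server 2022
    SERVERPROPERTY('ProductLevel')   AS product_level,    -- RTM or service pack level
    SERVERPROPERTY('Edition')        AS edition;
```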
Key points about snapshot replication
- Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. When synchronization occurs, the entire snapshot is generated and sent to Subscribers.
- Snapshot replication is a good candidate when:
- Data changes infrequently.
- It is acceptable to have copies of data that are out of date with respect to the publisher for a period of time.
- You are replicating small volumes of data.
- A large volume of changes occurs over a short period of time.
- In snapshot replication, shared locks are placed on all tables published as part of replication for the duration of snapshot generation. This can prevent updates from being made on the publishing tables.
How to implement snapshot replication
So, I’m not going to write out step-by-step guides for each type of replication in this post; otherwise, it would be way too long. I will, however, link you to my GitHub for snapshot replication so you can view the full instructions there. A condensed sketch of the core calls is below to give you a feel for what’s involved.
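Here is a minimal T-SQL sketch of what a snapshot publication boils down to, run on the publisher. The database, publication, table, and subscriber names (SalesDB, SnapshotPub, dbo.Customers, REPORTSERVER) are placeholders, and in a real setup you would also supply agent job credentials instead of relying on defaults.

```sql
-- Placeholder names throughout; adjust for your environment.
USE SalesDB;
GO

-- Enable the database for publishing.
EXEC sp_replicationdboption
    @dbname  = N'SalesDB',
    @optname = N'publish',
    @value   = N'true';
GO

-- Create the snapshot publication.
EXEC sp_addpublication
    @publication = N'SnapshotPub',
    @repl_freq   = N'snapshot',   -- a full snapshot is generated on each sync
    @status      = N'active';
GO

-- Create the Snapshot Agent job for the publication
-- (in practice, pass @job_login / @job_password as well).
EXEC sp_addpublication_snapshot
    @publication = N'SnapshotPub';
GO

-- Add a table (article) to the publication.
EXEC sp_addarticle
    @publication   = N'SnapshotPub',
    @article       = N'Customers',
    @source_owner  = N'dbo',
    @source_object = N'Customers';
GO

-- Create a push subscription that delivers the snapshot to the subscriber.
EXEC sp_addsubscription
    @publication       = N'SnapshotPub',
    @subscriber        = N'REPORTSERVER',
    @destination_db    = N'SalesReporting',
    @subscription_type = N'Push';
GO
```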
Transactional Replication

Backwards compatibility rules
Here are the backwards compatibility rules for transactional replication:
REPLICATION BACKWARD COMPATIBILITY RULES
- A distributor can be any version as long as it is greater than or equal to the publisher version.
- A publisher can be any version as long as it is less than or equal to the distributor version.
- A subscriber's version depends on the type of replication:
* A subscriber to a transactional publication can be any version within two versions of the publisher version.
Key points about transactional replication
- Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the publisher are usually delivered to the subscriber as they occur (in near real time).
- Transactional replication is a good candidate when:
- You want incremental changes to be propagated to subscribers as they occur.
- The application requires low latency between the time changes are made at the publisher and the changes arrive at the subscriber.
- The application requires access to intermediate data states. For example, if a row changes five times, transactional replication allows an application to respond to each change (such as firing a trigger), not simply the net data change to the row.
- The publisher has a very high volume of insert, update, and delete activity.
- The publisher or subscriber is a non-SQL Server database, such as Oracle.
- In transactional replication, concurrent snapshot processing (the default for transactional publications) does not hold shared locks in place during the entire snapshot generation, which allows users to continue working.
How to implement transactional replication
Just like with snapshot replication, I’ll link you to my GitHub for transactional replication. All of the steps and guidance can be found there, and a condensed sketch is below.
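As a rough sketch, a transactional publication looks a lot like the snapshot one, except the publication streams changes continuously and a Log Reader Agent is added on the publisher. SalesDB, TranPub, dbo.Orders, and REPORTSERVER are placeholder names, and agent credentials are omitted for brevity.

```sql
USE SalesDB;
GO

EXEC sp_replicationdboption
    @dbname  = N'SalesDB',
    @optname = N'publish',
    @value   = N'true';
GO

-- The Log Reader Agent moves committed transactions from the log to the distributor.
EXEC sp_addlogreader_agent
    @publisher_security_mode = 1;   -- Windows authentication
GO

EXEC sp_addpublication
    @publication = N'TranPub',
    @repl_freq   = N'continuous',   -- stream changes as they occur
    @sync_method = N'concurrent',   -- the concurrent snapshot processing mentioned above
    @status      = N'active';
GO

EXEC sp_addpublication_snapshot
    @publication = N'TranPub';
GO

EXEC sp_addarticle
    @publication   = N'TranPub',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders',
    @type          = N'logbased';   -- default article type for transactional replication
GO

EXEC sp_addsubscription
    @publication       = N'TranPub',
    @subscriber        = N'REPORTSERVER',
    @destination_db    = N'SalesReporting',
    @subscription_type = N'Push';
GO
```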
Merge Replication

Backwards compatibility rules
I couldn’t find a good AI-generated image for merge replication, so here is paint “merging” together. Anyways, here are the backwards compatibility rules for merge replication:
REPLICATION BACKWARD COMPATIBILITY RULES
- A distributor can be any version as long as it is greater than or equal to the publisher version.
- A publisher can be any version as long as it is less than or equal to the distributor version.
- A subscriber's version depends on the type of replication:
* A subscriber to a merge publication can be any version equal to or lower than the publisher version.
Key points about merge replication
- Merge replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher and Subscribers are tracked with triggers. The Subscriber synchronizes with the Publisher when connected to the network and exchanges all rows that have changed between the Publisher and Subscriber since the last time synchronization occurred.
- Merge replication is a good candidate when:
- Multiple Subscribers might update the same data at various times and propagate those changes to the Publisher and to other Subscribers.
- Subscribers need to receive data, make changes offline, and later synchronize changes with the Publisher and other Subscribers.
- Each Subscriber requires a different partition of data.
- Conflicts might occur and, when they do, you need the ability to detect and resolve them.
- The application requires net data change rather than access to intermediate data states. For example, if a row changes five times at a Subscriber before it synchronizes with a Publisher, the row will change only once at the Publisher to reflect the net data change (that is, the fifth value).
- In merge replication, the snapshot agent does not take any locks.
- Any schema changes need to be made on the publisher end.
- Data changes can be made on either publisher or subscriber side.
How to implement merge replication
Keeping with the trend, you can check out my GitHub for merge replication to learn more about implementation; a condensed sketch is below as well.
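For a rough idea of the shape of it, here is a minimal merge publication sketch run on the publisher. FieldDB, MergePub, dbo.Inventory, and LAPTOP01 are placeholder names; note the different database option and the merge-specific procedures.

```sql
USE FieldDB;
GO

-- Enable the database for merge publishing (note the different option name).
EXEC sp_replicationdboption
    @dbname  = N'FieldDB',
    @optname = N'merge publish',
    @value   = N'true';
GO

EXEC sp_addmergepublication
    @publication = N'MergePub',
    @description = N'Merge publication of inventory data';
GO

-- Snapshot Agent for the initial snapshot; after that, changes are tracked by triggers.
EXEC sp_addpublication_snapshot
    @publication = N'MergePub';
GO

-- Add a table; merge replication adds a rowguid column and change-tracking triggers to it.
EXEC sp_addmergearticle
    @publication   = N'MergePub',
    @article       = N'Inventory',
    @source_owner  = N'dbo',
    @source_object = N'Inventory';
GO

-- Push subscription for a subscriber that will also make changes locally.
EXEC sp_addmergesubscription
    @publication       = N'MergePub',
    @subscriber        = N'LAPTOP01',
    @subscriber_db     = N'FieldDBLocal',
    @subscription_type = N'Push';
GO
```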
Peer-to-Peer Replication

Backwards compatibility rules
The backwards compatibility rules follow those of transactional replication, so please reference that section of this post.
Key points about peer-to-peer replication
- Peer-to-peer replication is built on top of transactional replication in the sense that it propagates transactionally consistent changes in near real time. Peer-to-peer replication maintains data consistency between the nodes that participate. Similar to merge replication, you can make data changes on any of the participating nodes, and those changes will be propagated to the rest of the nodes.
- Peer-to-peer replication is a good candidate when:
- Catalog queries and other reads are spread across multiple nodes. This enables performance to remain consistent as reads increase.
- If one of the nodes in the system fails, an application layer can redirect the writes for that node to another node. This maintains availability.
- If a node requires maintenance or the whole system requires an upgrade, each node can be taken offline and added back to the system without affecting the availability of the application.
- In peer-to-peer replication, write performance for the topology is like that for a single node. This is because all inserts, updates, and deletes are propagated to each node in the topology.
- We strongly recommend that write operations for each row be performed at only one node, for the following reasons:
- If a row is modified at more than one node, this can cause a conflict or even a lost update when the row is propagated to other nodes.
- There is always some latency involved when changes are replicated. For applications that require the latest change to be seen immediately, dynamically load balancing the application across multiple nodes can be problematic.
- Peer-to-peer replication includes the option to enable conflict detection across a peer-to-peer topology. This option helps prevent the issues that are caused from undetected conflicts, including inconsistent application behavior and lost updates. By enabling this option, by default a conflicting change is treated as a critical error that causes the failure of the Distribution Agent. In the event of a conflict, the topology remains in an inconsistent state until the conflict is resolved manually and the data is made consistent across the topology.
- NOTE: To avoid potential data inconsistency, make sure that you avoid conflicts in a peer-to-peer topology, even with conflict detection enabled. To ensure that write operations for a particular row are performed at only one node (BEST PRACTICE TO HAPPEN AT THE PUBLISHER NODE), applications that access and change data must partition insert, update, and delete operations. This partitioning ensures that modifications to a given row originating at one node are synchronized with all other nodes in the topology before the row is modified by a different node. If an application requires sophisticated conflict detection and resolution capabilities, use merge replication.
How to implement peer-to-peer replication
Here’s my GitHub for peer-to-peer replication. Use it as a step-by-step guide for implementation; a condensed sketch is below as well.
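As a rough sketch, a peer-to-peer publication is a transactional publication with the peer-to-peer options switched on, repeated on each node. SalesDB, P2PPub, dbo.Orders, and NODE2 are placeholder names, and this assumes matching schema and data already exist on every node, which is why the subscription skips the initial snapshot.

```sql
-- Run on one node; each peer gets an equivalent publication with a unique originator ID.
USE SalesDB;
GO

EXEC sp_replicationdboption
    @dbname  = N'SalesDB',
    @optname = N'publish',
    @value   = N'true';
GO

EXEC sp_addlogreader_agent
    @publisher_security_mode = 1;
GO

-- Transactional publication with peer-to-peer and conflict detection enabled.
EXEC sp_addpublication
    @publication           = N'P2PPub',
    @repl_freq             = N'continuous',
    @independent_agent     = N'true',
    @enabled_for_p2p       = N'true',
    @p2p_conflictdetection = N'true',   -- surface conflicting changes across peers
    @p2p_originator_id     = 1,         -- must be unique per node in the topology
    @status                = N'active';
GO

EXEC sp_addarticle
    @publication   = N'P2PPub',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders';
GO

-- Each node subscribes to every other node's publication. Because the data
-- already exists everywhere, 'replication support only' skips the snapshot.
EXEC sp_addsubscription
    @publication       = N'P2PPub',
    @subscriber        = N'NODE2',
    @destination_db    = N'SalesDB',
    @subscription_type = N'Push',
    @sync_type         = N'replication support only';
GO
```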
Done and Done
Phew. Another long one. But hopefully you found this useful and a good place to start when digging into replication. By now you should have more of an understanding of replication in general, each type of replication, when to use each type, key points and considerations, and how to implement it. Happy replicating!
