Change Data Capture (CDC)
/1. What is Change Data Capture?
Change Data Capture (CDC) is the ability to identify and track changes made to data in a database. While there are a number of common patterns database administrators can employ to build custom CDC solutions, many database vendors provide embedded CDC features out of the box. This blog post will focus on Microsoft’s implementation of change data capture available across SQL Server and Azure SQL.
2. CDC in SQL Server and Azure SQL
Change Data Capture was initially introduced as a feature back in SQL Server 2008, and continues to be available today across supported versions of SQL Server, Azure SQL Managed Instance, and Azure SQL Database. Once enabled, the service monitors for DML changes (INSERT, UPDATE, and DELETE) on a CDC enabled table, then persists these modifications to a change table that mirrors the column structure of the tracked source table along with additional metadata columns, enabling downstream consumption by other users and systems (e.g. show me all rows that changed within a certain time period).
3. Process Flow
DML operations (DELETE, INSERT, or UPDATE) occur against a source table in the database. This activity is written to the transaction log.
Change data capture process monitors for changes against tracked source tables by periodically scanning the transaction log.
Change data capture process persists a copy of these changes to the associated change table.
Change data capture data is made available via the associated change data capture query functions.
4. Key Concepts
Capture Instance
When a table is enabled for change data capture, an associated capture instance is created. The capture instance consists of a change table (e.g. cdc.<capture instance>_CT) and up to two query functions (e.g. cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance>).
Capture Process
The capture process monitors the transaction log by scanning for changes, then subsequently writing identified changes to the associated change table.
Change Table
When a table is enabled for change data capture, an associated change table is created (e.g. cdc.<capture_instance>_CT). The change table consists of columns that mirror the columns from the source table (e.g. CustomerID, CustomerAddress, etc), as well as additional metadata columns that provide additional information that is relevant to the recorded change.
Query Functions
While change tables can be queried directly (e.g. SELECT * FROM cdc.dbo_Customer_CT), system functions are made available which can provide a filtered result set. It is recommended you do not query the system tables directly. Instead, leverage the built-in query functions.
5. Working Example
The following snippets of SQL code walks through:
How to enable CDC
Implications of INSERT, DELETE, and UPDATE operations on the underlying change tables
How to query for changes using the query functions
How to disable CDC
Alternatively, the code samples are available for download in this Python Notebook which can be opened in tools such as Azure Data Studio.
Enable CDC
The first step is to enable change data capture, first at the database level, then followed by enabling change data capture for a specific target table.
-- 01. Create Table
DROP TABLE IF EXISTS dbo.Customer;
CREATE TABLE Customer
(
CustomerID INT PRIMARY KEY,
CustomerAddress VARCHAR(255)
);
-- 02. Enable change data capture for the current database
sys.sp_cdc_enable_db
-- 03. Enable change data capture for the specified source table
sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Customer',
@role_name = NULL,
@supports_net_changes = 1;
-- 04. As we have yet to trigger a DELETE/INSERT/UPDATE, the change table is initially empty
SELECT * FROM cdc.dbo_Customer_CT
Track Changes
Now that we have CDC enabled on the database and monitoring for changes on a specific source table, by performing a variety of DML operations (INSERT, DELETE, and UPDATE), we can see the way these changes are persisted within the associated change table.
-- 05. Populate the Customer table (INSERT)
INSERT INTO dbo.Customer
(CustomerID, CustomerAddress)
VALUES
(1, '82 Margate Drive, Sheffield, S4 8FQ'),
(2, '135 High Barns, Ely, CB7 4RH'),
(3, '39 Queen Annes Drive, Bedale, DL8 2EL');
SELECT * FROM dbo.Customer;
-- 06. The change table returns three rows in total, one row for each INSERT.
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
-- NOTE: There will be some latency between a transaction log being commited on a source table and the last captured transaction being committed on the change table.
-- If the records are not immediately available, run cell again.
SELECT * FROM cdc.dbo_Customer_CT
-- 07. Remove a row from the Customer table (DELETE)
DELETE FROM dbo.Customer WHERE CustomerID = 2;
SELECT * FROM dbo.Customer;
-- 08. The change table returns four rows in total, one additional row for the DELETE.
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
SELECT * FROM cdc.dbo_Customer_CT;
-- 09. Update a row from the Customer table (UPDATE)
UPDATE dbo.Customer SET CustomerAddress = 'Guyzance Cottage, Guyzance NE65 9AF' WHERE CustomerID = 3;
SELECT * FROM dbo.Customer;
-- 10. The change table returns six rows in total, two additional rows for the UPDATE (one for the old, one for the new).
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
SELECT * FROM cdc.dbo_Customer_CT;
Query Changes
In addition to the change table, a change data capture instance includes query functions that makes it relatively easy to isolate specific changes. For example, show me all changes that occurred over the last day. It is recommended to use these query functions as opposed to querying the system change tables directly.
-- 11. It is recommended that you do not query the system tables (e.g. cdc._CT) directly.
-- Function: cdc.fn_cdc_get_all_changes_
-- Description: Returns one row for each change applied to the source table within the specified log sequence number (LSN) range.
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer (@from_lsn, @to_lsn, N'all');
-- 12. The following function will only be available if @supports_net_changes was activated when enabling the table for CDC (sys.sp_cdc_enable_table)
-- Function: cdc.fn_cdc_get_net_changes_
-- Description: Returns one net change row for each source row changed within the specified Log Sequence Numbers (LSN) range.
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);
SET @begin_time = DATEADD(day, -1, GETDATE());
SET @end_time = GETDATE();
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customer(@from_lsn, @to_lsn, 'all');
Disable CDC
Finally, the steps below illustrate how to turn CDC off, both at a table level, and a database level.
-- 13. Disables change data capture for the specified source table and capture instance in the current database
sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'Customer',
@capture_instance = N'dbo_Customer';
-- 14. Disables change data capture for the current database
sys.sp_cdc_disable_db;
-- 15. Drop Customer table if it exists
DROP TABLE IF EXISTS dbo.Customer;
6. Additional Resources
Microsoft Docs
Tech Community
Introducing Change Data Capture for Azure SQL Databases (Public Preview)
Using Change Data Capture and Azure Data Factory to Incrementally Load Change Data
Deep Dive into Change Data Capture (CDC) in Azure SQL Databases
Track and Record Data Changes with Change Data Capture (CDC) in Azure SQL | Data Exposed
7. Reference
System Dynamic Management Views
Name | Description |
---|---|
sys.dm_cdc_errors | Returns one row for each error encountered during the change data capture log scan session. |
sys.dm_cdc_log_scan_sessions | Returns one row for each log scan session in the current database. |
System Functions
Name | Description |
---|---|
cdc.fn_cdc_get_all_changes_<capture_instance> | Returns one row for each change applied to the source table within the specified LSN range. |
cdc.fn_cdc_get_net_changes_<capture_instance> | Returns one net change row for each source row changed within the specified LSN range. |
sys.fn_cdc_decrement_lsn | Returns the previous log sequence number (LSN) in the sequence based upon the specified LSN. |
sys.fn_cdc_get_column_ordinal | Returns the column ordinal as it appears in the change table. |
sys.fn_cdc_get_max_lsn | Returns the maximum LSN from the start_lsn column in the cdc.lsn_time_mapping system table. |
sys.fn_cdc_get_min_lsn | Returns the start_lsn column value for the specified capture instance. |
sys.fn_cdc_has_column_changed | Identifies whether the specified update mask indicates that the column has been updated. |
sys.fn_cdc_increment_lsn | Returns the next log sequence number (LSN) in the sequence based upon the specified LSN. |
sys.fn_cdc_is_bit_set | Indicates whether a captured column has been updated. |
sys.fn_cdc_map_lsn_to_time | Returns the date and time valuefor the specified log sequence number (LSN). |
sys.fn_cdc_map_time_to_lsn | Returns the log sequence number (LSN) value for the specified time. |
sys.fn_all_changes_<capture_instance> | Wrappers for the all changes query functions. |
sys.fn_net_changes_<capture_instance> | Wrappers for the net changes query functions. |
System Stored Procedures
Name | Description |
---|---|
sys.sp_cdc_add_job | Creates a change data capture cleanup or capture job in the current database. |
sys.sp_cdc_change_job | Modifies the configuration of a change data capture cleanup or capture job in the current database. |
sys.sp_cdc_cleanup_change_table | Removes rows from the change table in the current database based on the specified low_water_mark value. |
sys.sp_cdc_disable_db | Disables change data capture for the current database. |
sys.sp_cdc_disable_table | Disables change data capture for the specified source table and capture instance in the current database. |
sys.sp_cdc_drop_job | Removes a change data capture cleanup or capture job for the current database from msdb. |
sys.sp_cdc_enable_db | Enables change data capture for the current database. |
sys.sp_cdc_enable_table | Enables change data capture for the specified source table in the current database. |
sys.sp_cdc_generate_wrapper_function | Generates scripts to create wrapper functions for the CDC query functions that are available in SQL Server. |
sys.sp_cdc_get_captured_columns | Returns CDC metadata for the captured source columns tracked by the specified capture instance. |
sys.sp_cdc_get_ddl_history | Returns the DDL change history associated with the specified capture instance. |
sys.sp_cdc_help_change_data_capture | Returns the CDC configuration for each table enabled for change data capture in the current database. |
sys.sp_cdc_help_jobs | Reports information about all change data capture cleanup or capture jobs in the current database. |
sys.sp_cdc_scan | Executes the change data capture log scan operation. |
sys.sp_cdc_start_job | Starts a change data capture cleanup or capture job for the current database. |
sys.sp_cdc_stop_job | Stops a change data capture cleanup or capture job for the current database. |
System Tables
Name | Description |
---|---|
cdc.<capture_instance>_CT | Is the change table created when change data capture is enabled on a source table. |
cdc.captured_columns | Returns one row for each column tracked in a capture instance. |
cdc.change_tables | Returns one row for each change table in the database. |
cdc.ddl_history | Returns one row for each data definition language (DDL) change made to tables that are enabled for change data capture. |
cdc.lsn_time_mapping | Returns one row for each transaction having rows in a change table. |
cdc.index_columns | Returns one row for each index column associated with a change table. |
cdc.cdc_jobs | Stores the change data capture configuration parameters for capture and cleanup jobs. |