AlwaysOn Availability Groups and SQL Server Jobs, Part 17: Creating Jobs to Check on Synchronization
February 11, 2015
In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks.
Step 1: Start with Linked Servers
In order for the logic outlined in this post to work, you’ll need to set up Linked Server definitions between all of the servers hosting your AlwaysOn Availability Groups. In Post #15 I outlined some key concerns around Setting Up Linked Servers, which you’ll have to address before any of the code or logic below will work.
Step 2: Define UDF(s) to Determine Primary Replica Host
During checkups it will be important to know which server (or instance) is currently hosting the Primary Replica(s) for a given Availability Group.
As per post #11, A Fork in the Road, if you’re using the option/technique where you’ll be enabling/disabling jobs depending upon whether they’re on the same server as the Primary Replica(s), this logic will be indispensable. On the other hand, if you’re just using if/else logic within your job steps, you’ll still want to know which server is hosting the Primary Replica(s) in your target Availability Group when running synchronization checks so that you can simplify checks and run them only on/from the primary, as outlined below.
Technically speaking, there are two ways to check for "primacy"of your Availability Groups: by database name (for example, “Is the suchAndSuch database hosted on this server currently the Primary?”) or by the name of the Availability Group (“Is the blahBlah Availability Group currently hosting Primary Replicas on this server?”). Scripts for both techniques are listed below.
USE master;GOIF OBJECT_ID('dbo.fn_hadr_database_is_primary','FN') IS NOT NULLDROP FUNCTION dbo.fn_hadr_database_is_primary;GOCREATE FUNCTION dbo.fn_hadr_database_is_primary (@DBName sysname)RETURNS bitASBEGINDECLARE @description sysname;SELECT @description = hars.role_descFROM sys.databases dINNER JOIN sys.dm_hadr_availability_replica_states hars ON d.replica_id = hars.replica_idWHERE database_id = DB_ID(@DBName);IF @description = 'PRIMARY'RETURN 1;RETURN 0;END;GOUSE master;GOIF OBJECT_ID('dbo.fn_hadr_group_is_primary','FN') IS NOT NULLDROP FUNCTION dbo.fn_hadr_group_is_primary;GOCREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)RETURNS bit ASBEGIN DECLARE @PrimaryReplica sysname; SELECT @PrimaryReplica = hags.primary_replica FROM sys.dm_hadr_availability_group_states hagsINNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_idWHEREag.name = @AGName;IF UPPER(@PrimaryReplica) = UPPER(@@SERVERNAME)RETURN 1; -- primaryRETURN 0; -- not primaryEND; GO
The naming "convention" around both of these UDFs is patterned after that of sys.fn_hadr_backup_is_preferred_replica. As with that UDF, if you specify the name of the @DBName or @AGName parameters above incorrectly, you’ll get a FALSE result instead of an error (which is by design).
Step 3: Deploy a Sproc to Check Sever-Level Jobs
With Linked Servers in place, it’s now time to check up on Server-Level Jobs--or SQL Server Agent Jobs that operate at the Server Level or which target "server level" details (such as checking available disk space, cycling the error logs or truncating msdb history). This logic will also (likely) be used to checkup on backup routines.
The idea behind this code is pretty simple: It just requests details about SQL Server Agent Jobs on one server, grabs details about Jobs on another server, and then compares relevant bits of information to see if there are any differences. Or, at least, that’s what it’s doing at a high level.
USE master;GOIF OBJECT_ID('dbo.dba_SyncCheckupForInstanceLevelJobs','P') IS NOT NULLDROP PROC dbo.dba_SyncCheckupForInstanceLevelJobsGOCREATE PROC dbo.dba_SyncCheckupForInstanceLevelJobs@MailProfileNamesysname = NULL,-- e.g., 'General'@OperatorNamesysname = NULL,-- e.g., 'Alerts', @PrimaryAGAfinitysysname = NULL, -- see comments above. @IgnoredJobsnvarchar(MAX) = '',@ConsoleOnlybit= 0-- output only to console - don't email alerts (for debugging/manual execution, etc.)AS SET NOCOUNT ON;IF @PrimaryAGAfinity IS NOT NULL BEGIN IF (SELECT dbo.fn_hadr_group_is_primary(@PrimaryAGAfinity)) = 0 BEGIN PRINT 'Server is Not Primary.'RETURN 0;ENDEND-- if we're not manually running this, make sure the server is the primary:IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:IF ISNULL(@MailProfileName, '') = '' BEGINRAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);RETURN -2;ENDIF ISNULL(@OperatorName, '') = '' BEGINRAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);RETURN -3;ENDEND;CREATE TABLE #IgnoredJobs (job_name sysname);-- single-line / in-line 'split' function:DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');INSERT INTO #IgnoredJobs EXEC(@DeserializedJobs);DECLARE @LocalServerName sysname = @@SERVERNAME;DECLARE @RemoteServerName sysname; SET @RemoteServerName = (SELECT TOP 1 name FROM PARTNER.master.sys.servers WHERE server_id = 0);------------------------------------------------------------------------------ Start by checking all jobs that aren't 'mapped' to categories representing their HA name... DECLARE @AvailabilityGroupDatabases TABLE ( name sysname NOT NULL ); INSERT INTO @AvailabilityGroupDatabasesSELECT name FROM sys.availability_groupsUNION SELECT name FROM PARTNER.master.sys.availability_groups;CREATE TABLE #LocalJobs (job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85),notify_level_email int, notify_email_operator_id int);CREATE TABLE #RemoteJobs (job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85),notify_level_email int, notify_email_operator_id int);INSERT INTO #LocalJobsSELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id,sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_idFROM msdb.dbo.sysjobs sjINNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_idWHERE-- categoryName of the Job in question != an AG Group NameUPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);INSERT INTO #RemoteJobsSELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id,sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_idFROM PARTNER.msdb.dbo.sysjobs sjINNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_idWHEREUPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);CREATE TABLE #Divergence (name sysname, diff_type sysname);INSERT INTO #DivergenceSELECT name,'Job Exists on ' + @LocalServerName + ' Only.'FROM #LocalJobs WHEREname NOT IN (SELECT name FROM #RemoteJobs);INSERT INTO #DivergenceSELECT name, 'Job Exists on ' + @RemoteServerName + ' Only.'FROM #RemoteJobsWHEREname NOT IN (SELECT name FROM #LocalJobs);INSERT INTO #DivergenceSELECT lj.name, 'Job-Level Differences (enabled, start-step, notification, etc)'FROM #LocalJobs ljINNER JOIN #RemoteJobs rj ON rj.name = lj.nameWHERElj.[enabled] != rj.[enabled]OR lj.start_step_id != rj.start_step_idOR lj.notify_email_operator_id != rj.notify_email_operator_idOR lj.notify_level_email != rj.notify_level_email;CREATE TABLE #LocalJobSteps (step_id int, [checksum] int);CREATE TABLE #RemoteJobSteps (step_id int, [checksum] int);CREATE TABLE #LocalJobSchedules (schedule_name sysname, [checksum] int);CREATE TABLE #RemoteJobSchedules (schedule_name sysname, [checksum] int);-- Now, (matching job by job) go through and check schedules and job steps: DECLARE checker CURSOR FAST_FORWARD FORSELECT lj.job_id local_job_id, rj.job_id remote_job_id, lj.name FROM #LocalJobs ljINNER JOIN #RemoteJobs rj ON lj.name = rj.name;DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;DECLARE @LocalCount int, @RemoteCount int;OPEN checker;FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;WHILE @@FETCH_STATUS = 0 BEGIN -- check jobsteps first:DELETE FROM #LocalJobSteps;DELETE FROM #RemoteJobSteps;INSERT INTO #LocalJobStepsSELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]FROM msdb.dbo.sysjobstepsWHERE job_id = @LocalJobId;INSERT INTO #RemoteJobStepsSELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]FROM PARTNER.msdb.dbo.sysjobstepsWHERE job_id = @RemoteJobId;SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;IF @LocalCount != @RemoteCountINSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');ELSE BEGIN INSERT INTO #DivergenceSELECT @JobName, 'Different Job Step Details between Servers'FROM #LocalJobSteps ljs INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_idWHEREljs.[checksum] != rjs.[checksum];END;-- Now Check Schedules:DELETE FROM #LocalJobSchedules;DELETE FROM #RemoteJobSchedules;INSERT INTO #LocalJobSchedulesSELECT ss.name,BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]FROM msdb.dbo.sysjobschedules sjsINNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_idWHEREsjs.job_id = @LocalJobId;INSERT INTO #RemoteJobSchedulesSELECT ss.name,BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]FROM PARTNER.msdb.dbo.sysjobschedules sjsINNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_idWHEREsjs.job_id = @RemoteJobId;SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;IF @LocalCount != @RemoteCountINSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');ELSE BEGIN INSERT INTO #DivergenceSELECT@JobName, 'Different Schedule Details between Servers.'FROM #LocalJobSchedules ljsINNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_nameWHERE ljs.[checksum] != rjs.[checksum];END;FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;END;CLOSE checker;DEALLOCATE checker;IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';DECLARE @crlf char(2) = CHAR(13) + CHAR(10);DECLARE @tab char(1) = CHAR(9);DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '+ @crlf;SELECT @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlfFROM #DivergenceWHEREname NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')ORDER BY NAME;SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'+ @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'IF @ConsoleOnly = 1 BEGIN -- just Print out details:PRINT 'SUBJECT: ' + @subject;PRINT 'BODY: ' + @crlf + @message; ENDELSE BEGIN-- send a message:EXEC msdb..sp_notify_operator @profile_name = @MailProfileName, @name = @OperatorName, @subject = @subject,@body = @message;END;END;DROP TABLE #LocalJobs;DROP TABLE #RemoteJobs;DROP TABLE #Divergence;DROP TABLE #LocalJobSteps;DROP TABLE #RemoteJobSteps;DROP TABLE #LocalJobSchedules;DROP TABLE #RemoteJobSchedules;DROP TABLE #IgnoredJobs;RETURN 0;
Behind the scenes there’s a bit more going on. For example, there’s an @PrimaryAGAfinity parameter, which is used to specify "affinity" or correlation with/against a specified Availability Group being hosted on your servers. The idea is that if you specify this value (for example, “MyProductionDbs”) and SQL Server a.) detects that AG is defined on the host where this code is run and b.) that AG is currently not hosting Primary Replicas, then this checkup logic will exit because it will assume that we’ll be doing the checkups on the Server hosting the Primary Replica(s).
Likewise, there are parameters for a MailProfileName (or the name of the Mail Profile to use if there are synchronization issues detected), as well as the name of the Operator to notify.
Finally, there’s also an @IngoredJobs parameter, where you can comma-delimit a list of Job names that you wish to ignore (or not have checked for synchronization), and an @ConsoleOnly flag, which you can use for running/testing this sproc within SSMS--where it won’t bother sending an email alert as it would when scheduled.
Step 4: Deploy a Sproc to Check on AlwaysOn Availability Group Jobs
Another key thing that the sproc above does, though, is IGNORE any SQL Server Agent Job where the name of the Job Category assigned to that Job matches the name of an Availability Group on the host--in keeping with the convention outlined in Post #12, Issues with Enabling and Disabling Batch Jobs. As such, we’ll need additional, follow-up, logic that checks on AG-Level Jobs and their synchronization state.
Ultimately, this logic is pretty similar to the logic above, in that we’ll grab job details from one server and compare them against the details on another server. Only, in this case, we’ll only be grabbing details for jobs that "belong" to a specific SQL Server Agent Job Category (that is, to a given Availability Group), and we’ll, optionally, be looking to see if we need to "toggle" the Enabled/Disabled status for these jobs--based upon where they’re currently hosted and whether they’re on the same host as the Primary Replica.
Code is as follows:
USE master;GOIF OBJECT_ID('dbo.dba_SyncCheckupForAGLevelJobs','P') IS NOT NULLDROP PROC dbo.dba_SyncCheckupForAGLevelJobs;GOCREATE PROC dbo.dba_SyncCheckupForAGLevelJobs@AGNamesysname,-- Required@JobsTableDBNamesysname,-- Required (tells us which DB the list of enabled/disabled jobs is in).@MailProfileNamesysname = NULL,-- optional - only needed if @ConsoleOnly = 0@OperatorNamesysname = NULL,-- as per above@IgnoredJobsnvarchar(MAX) = NULL,-- jobs to ignore (in the category with name of MirroredDB)@ConsoleOnlybit = 0 -- output details to console (1), or send alerts/output via email (0). ASSETNOCOUNT ON;IF ISNULL(@AGName, '') = '' BEGINRAISERROR('Procedure expects parameter ''@AGName'' which was not provided.', 16, 4);RETURN -1;ENDIF ISNULL(@JobsTableDBName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@JobsTableDBName'' which was not provided.', 16, 4);RETURN -2;END IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:IF ISNULL(@MailProfileName, '') = '' BEGINRAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);RETURN -5;ENDIF ISNULL(@OperatorName, '') = '' BEGINRAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);RETURN -6;ENDEND;-- Start by running the Job that syncs/updates Job Enabled/Disabled Status first: EXEC master.dbo.dba_EnsureAGLevelJobEnabledStatuses @AGName, @JobsTableDBName;-- Remaining checks/reporting only needs to be run from/against the Primary - so bail if that's not this server. IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGINPRINT 'Server is Not Primary';RETURN 0; ENDCREATE TABLE #IgnoredJobs (job_name sysname);-- single-line / in-line 'split' function:DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');INSERT INTO #IgnoredJobs EXEC(@DeserializedJobs);CREATE TABLE #LocalJobs (job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85),notify_level_email int, notify_email_operator_id int);CREATE TABLE #RemoteJobs (job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85),notify_level_email int, notify_email_operator_id int);INSERT INTO #LocalJobsSELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id,sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_idFROM msdb.dbo.sysjobs sjINNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_idWHEREUPPER(sc.name) = UPPER(@AGName);INSERT INTO #RemoteJobsSELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id,sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_idFROM PARTNER.msdb.dbo.sysjobs sjINNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_idWHEREUPPER(sc.name) = UPPER(@AGName);-- Now start comparing differences: CREATE TABLE #Divergence (name sysname, diff_type nvarchar(300));INSERT INTO #DivergenceSELECT name,'Exists on Primary Server Only'FROM #LocalJobs WHEREname NOT IN (SELECT name FROM #RemoteJobs);INSERT INTO #DivergenceSELECT name, 'Exists on Secondary Server Only'FROM #RemoteJobsWHEREname NOT IN (SELECT name FROM #LocalJobs);-- check meta-data properties:INSERT INTO #DivergenceSELECT lj.name, 'Job-Level Differences (owner, start-step, notification, etc)'FROM #LocalJobs ljINNER JOIN #RemoteJobs rj ON rj.name = lj.nameWHERElj.owner_sid != rj.owner_sidOR lj.start_step_id != rj.start_step_idOR lj.notify_email_operator_id != rj.notify_email_operator_idOR lj.notify_level_email != rj.notify_level_email;-- For Job Enabled Status we have two checks: --A: No Jobs that have a Job.CategoryName = @AGName should be enabled on the SECONDARY server. --B: we can't infer that on the PRIMARY server. So, instead, if Job.CategoryName = @AGName and--enabled/disabled != Enabled bit-flag from the _JobEnabledStates table, then we need--to raise an error because a job that should be enabled is NOT. (Or vice versa.)-- A) Check for jobs on the remote server:INSERT INTO #DivergenceSELECT name, 'Job Enabled on SECONDARY SERVER (Jobs should only be enabled on PRIMARY).'FROM #RemoteJobsWHEREenabled = 1; -- B) Check for jobs on the local server: DECLARE @Jobs TABLE ( JobName sysname NOT NULL, [Enabled] bit NOT NULL )DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'INSERT INTO @JobsEXEC(@sql); -- Report on Jobs that are not documented: INSERT INTO #DivergenceSELECTlj.name, 'Job Enabled/Disabled Status is not currently defined. Please check job Enabled/Disabled status then run "EXEC master.dbo.dba_DocumentJobEnabledStatuses ''' + @AGName + ''';" to document Job Enabled/Disabled state.'FROM #LocalJobs ljWHERElj.name NOT IN (SELECT JobName FROM @Jobs);-- Report on Jobs that are Enabled but shouldn't be:INSERT INTO #DivergenceSELECT lj.name, 'Job is ENABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be DISABLED. (Update Table or Job Status to correct this issue.)'FROM #LocalJobs lj LEFT OUTER JOIN @Jobs j ON lj.name = j.JobNameWHERE ISNULL(j.[Enabled],0) = 0 AND lj.[enabled] = 1; -- job is listed/defaulted to disabled, but is currently enabled-- Report on Jobs that are Disabled but shouldn't be:INSERT INTO #DivergenceSELECT lj.name, 'Job is DISABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be _ENABLED_ (Running). (Update Table or Job Status to correct this issue.)'FROM #LocalJobs lj LEFT OUTER JOIN @Jobs j ON lj.name = j.JobNameWHERE ISNULL(j.[Enabled], 0) = 1 AND lj.[enabled] = 0; -- job is listed/defaulted to enabled, but is currently disabled-- Now, (matching job by job) go through and check schedules and job steps: CREATE TABLE #LocalJobSteps (step_id int, [checksum] int);CREATE TABLE #RemoteJobSteps (step_id int, [checksum] int);CREATE TABLE #LocalJobSchedules (schedule_name sysname, [checksum] int);CREATE TABLE #RemoteJobSchedules (schedule_name sysname, [checksum] int);DECLARE checker CURSOR FAST_FORWARD FORSELECT lj.job_id local_job_id, rj.job_id remote_job_id, lj.name FROM #LocalJobs ljINNER JOIN #RemoteJobs rj ON lj.name = rj.name;DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;DECLARE @LocalCount int, @RemoteCount int;OPEN checker;FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;WHILE @@FETCH_STATUS = 0 BEGIN -- check jobsteps first:DELETE FROM #LocalJobSteps;DELETE FROM #RemoteJobSteps;INSERT INTO #LocalJobStepsSELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]FROM msdb.dbo.sysjobstepsWHERE job_id = @LocalJobId;INSERT INTO #RemoteJobStepsSELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]FROM PARTNER.msdb.dbo.sysjobstepsWHERE job_id = @RemoteJobId;SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;IF @LocalCount != @RemoteCountINSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');ELSE BEGIN INSERT INTO #DivergenceSELECT @JobName, 'Different Job Step Details between Servers'FROM #LocalJobSteps ljs INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_idWHEREljs.[checksum] != rjs.[checksum];END;-- Now Check Schedules:DELETE FROM #LocalJobSchedules;DELETE FROM #RemoteJobSchedules;INSERT INTO #LocalJobSchedulesSELECT ss.name,BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]FROM msdb.dbo.sysjobschedules sjsINNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_idWHEREsjs.job_id = @LocalJobId;INSERT INTO #RemoteJobSchedulesSELECT ss.name,BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]FROM PARTNER.msdb.dbo.sysjobschedules sjsINNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_idWHEREsjs.job_id = @RemoteJobId;SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;IF @LocalCount != @RemoteCountINSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');ELSE BEGIN INSERT INTO #DivergenceSELECT@JobName, 'Different Schedule Details between servers.'FROM #LocalJobSchedules ljsINNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_nameWHERE ljs.[checksum] != rjs.[checksum];END;FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;END;CLOSE checker;DEALLOCATE checker;IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';DECLARE @crlf char(2) = CHAR(13) + CHAR(10);DECLARE @tab char(1) = CHAR(9);DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '+ @crlf;SELECT @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlfFROM #DivergenceWHEREname NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')ORDER BY NAME;SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'+ @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'IF @ConsoleOnly = 1 BEGIN -- just Print out details:PRINT 'SUBJECT: ' + @subject;PRINT 'BODY: ' + @crlf + @message; ENDELSE BEGIN-- send a message:EXEC msdb..sp_notify_operator @profile_name = @MailProfileName, @name = @OperatorName, @subject = @subject,@body = @message;END;END;DROP TABLE #LocalJobs;DROP TABLE #RemoteJobs;DROP TABLE #Divergence;DROP TABLE #LocalJobSteps;DROP TABLE #RemoteJobSteps;DROP TABLE #LocalJobSchedules;DROP TABLE #RemoteJobSchedules;DROP TABLE #IgnoredJobs;RETURN 0;GO
If you decided to implement the Enable/Disable approach to managing ‘Batch Jobs’ (as outlined in Post #12) the code above will work as-is and will try to Enable/Disable jobs on your servers as needed--by making a call to the following sproc (which makes use of the Jobs State Table discussed and outlined in Post #13):
USE master;GOIF OBJECT_ID('dbo.dba_EnsureAGLevelJobEnabledStatuses','P') IS NOT NULLDROP PROC dbo.dba_EnsureAGLevelJobEnabledStatusesGOCREATE PROC dbo.dba_EnsureAGLevelJobEnabledStatuses@AGNamesysname, -- name of the AG to failover. @JobsTableDBNamesysname -- name of the db with the jobs-enabled-details tableASSET NOCOUNT ON; IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN-- If we're not on the PRIMARY server, then just make sure all jobs with categoryName = @AGName are disabledDECLARE disabler CURSOR FAST_FORWARD FORSELECT j.job_id,j.name FROM msdb.dbo.sysjobs jINNER JOIN msdb.dbo.syscategories c ON c.category_id = j.category_idWHEREj.[enabled] = 1 ANDc.name = @AGName;DECLARE @JobId uniqueidentifier;DECLARE @JobName sysname; OPEN disabler; FETCH NEXT FROM disabler INTO @JobId, @JobName;WHILE @@FETCH_STATUS = 0 BEGIN EXEC msdb.dbo.sp_update_job@job_id = @JobId, @enabled = 0;PRINT 'Disabled Job: [' + @JobName + '] ON Non-Primary Server.';FETCH NEXT FROM disabler INTO @JobId, @JobName;END;CLOSE disabler; DEALLOCATE disabler; END ;ELSE BEGIN -- otherwise, if we ARE on the primary, then we should ENABLE jobs that need to be enabled. DECLARE @Error sysname;DECLARE @Jobs TABLE ( JobName sysname NOT NULL, [Enabled] bit NOT NULL );DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'INSERT INTO @JobsEXEC(@sql); DECLARE enabler CURSOR FAST_FORWARD FOR SELECT JobName FROM @Jobs WHERE [Enabled] = 1; OPEN enabler; FETCH NEXT FROM enabler INTO @JobName; WHILE @@FETCH_STATUS = 0 BEGIN SELECT @JobId = job_id FROM msdb.dbo.sysjobs WHERE name = @JobName;IF @JobId IS NULL BEGIN SET @Error = 'Unable to Locate (and set ENABLED) on Job: ' + @JobName + '.';THROW 50001, @Error, 1;RETURN -1; -- we're done. (should we send out an email alert? probably)ENDEXEC msdb.dbo.sp_update_job @job_id = @JobId, @enabled = 1; -- enable the job here... FETCH NEXT FROM enabler INTO @JobName; END; CLOSE enabler; DEALLOCATE enabler;END;RETURN 0; GO
However, if you choose to go with the IF/ELSE logic embedded in your Batch Jobs, then you’ll want to disable (that is, comment-out) the code in the ‘parent’ sproc that attempts these operations. Likewise, it should be called out that all of the code in this post is for a scenario where you’ve just got TWO servers in your Availability Group (hence the notion of calls to a ‘Partner’ Linked Server, as per post #15). If you’ve got more than two servers in your topology, the basic logic outlined in these posts will work--but you’ll need to modify it accordingly (which we’ll look at a tiny bit in a future post).
Step 5: Create a SQL Server Agent Job to Execute Regular Checkups
With code in place, you’re now ready to set up a Job that’ll execute regular synchronization checks. Basic logic for setting up this job is that you’ll want to set up a recurring schedule, then add a single job step to run ‘Server Level’ checks, and then add in an additional Job Step for each AlwaysOn Availability Group that you want to check up on. So, if you’ve got one Availability Group on your servers, you’d end up with a Job with two Job Steps; whereas if you have three AGs, you’ll end up with a Job with four Job Steps (one for Server-Level checkups and a distinct job step, or call to dba_SyncCheckupForAGLevelJobs for each of your Availability Groups).
In my next post, we’ll take a more concrete look at a sample/example Synchronization Job--and discuss some options and best practices for scheduling and review how these checks will "behave" in the "wild."
About the Author
You May Also Like