AlwaysOn Availability Groups and SQL Server Jobs, Part 28: Additional Options for Tackling Jobs Failover
July 6, 2015
Throughout this series of posts I’ve taken a somewhat pessimistic view of how SQL Server Agent jobs are managed within most organizations – meaning that most of the code and examples I’ve provided up until this point were based on assumptions about how CHANGE to jobs is managed. That pessimism, to date, has come in two forms:
Change Management. Based on past experience, I’ve assumed that if/when it comes time to make a change to an existing SQL Server Agent Job, that many organizations don’t have the policy and/or culture in place to PERFECTLY ensure that if/when a job needs to be changed, that they can 100% ensure that it gets perfectly changed on any/all servers hosting the current Availability Group. As such, I’ve provided a number of posts that showcase how to set up synchronization checks to notify admins of any discrepancies that might occur after someone might inadvertently go to change a detail about a job – and forget to change those details across other servers.
Enabling/Disabling Jobs. Likewise, I’ve also assumed that most orgs would prefer to simply disable a batch job (see Post #3) instead of deleting it if it wasn’t needed for a few days. So, for example, if there was some kind of processing job that needed to be disabled for, say, 3 days – I’ve assumed that admins would prefer to find the job in question and disable it – rather than delete it and then recreate it a few days later – when it was ready to spin up again. (Not only is disabling/enabling easier in my mind, but you also preserve execution history when you simply disable/enable.) The problem, of course, is that if you simply disable a job on ServerA, for example, then if/when you failover to ServerB (as the Primary) you’re going to have a job that should, “logically” be disabled, but which will most likely be enabled – and that could have disastrous consequences. As such, I outlined (in posts #12 and #13) a technique for using a ‘state table’ to help address this issue.
With the rationale for that skepticism re-iterated, it’s worth noting that if you DO have perfect policy and culture in place to ensure that you don’t need checks on job synchronization and/or if your organization would prefer to DELETE jobs (and recreate them if/when they need to be run again) instead of disabling/enabling them, then there are some techniques you can use to make overall Jobs management easier when Availability Groups are part of the equation.
Another Option for Tackling Failover and Jobs Activation
As an example of an easier or more streamlined (and less pessimistic) approach to managing Jobs, Tom Meyer emailed me a few months back with a different take or approach on Jobs Management for AG-enabled servers. He’s been following along with this series of posts and wanted to share a different approach or philosophy for managing jobs – and which I, in turn, wanted to share with everyone else that’s been following along.
Tom’s logic or approach is MUCH more streamlined than mine – and works for him primarily because he’s sure he doesn’t need regular ‘synchronization checks’ nor does he need to worry about disabled/enabled jobs because he handles both of these details much differently than I do. As such, he ends up with a much simpler approach to Jobs Management. Here’s how he outlined the process:
What I set up is this:
Create job category/categories for all jobs you want to sync
Put all the jobs in them. You may want to note that if you manually update the category on a bunch of jobs by manually updating the sysjobs table that you have to restart the agent otherwise it will throw an error if you try and reopen the job.
The stored proc loops through all of these jobs using the following logic:
If primary replica and disabled, enable it
If primary replica and enabled, do nothing
If secondary replica and enabled, disable it
If secondary replica and disabled, do nothing
Create agent job that runs the stored proc for specified categories
Create the alert for message 1480 (post #5 – Setting up Failover Alerts)
Set the response for this alert to execute the job.
With this approach, Tom doesn’t need a ‘sync check’ job running every N minutes, and he doesn’t need a state table. He also doesn’t need Linked Servers (in order to run synchronization checks) which further decreases complexity. In fact, here’s a sample of the code Tom sent me a few months ago that he’s using:
CREATE procedure [dbo].[uspDBAHADRAgentJobFailover] (@agname varchar(200))asbegin declare @is_primary_replicate bit declare @job_name sysname declare @job_enabled bit select @is_primary_replicate=DBAWork.dbo.fn_hadr_group_is_primary(@agname) declare job_cursor cursor for select s.name from msdb.dbo.sysjobs s inner join msdb.dbo.syscategories c on s.category_id = c.category_id where c.name = @agname order by name open job_cursor fetch next from job_cursor into @job_name while @@fetch_status = 0 begin select @job_enabled=enabled from msdb.dbo.sysjobs where name = @job_name if @is_primary_replicate = 1 begin if @job_enabled = 1 print @job_name+' enabled on primary. do nothing' else begin print @job_name+' disabled on primary. enable it !' exec msdb.dbo.sp_update_job @job_name = @job_name,@enabled = 1 end end else if (@is_primary_replicate = 0) begin if @job_enabled = 1 begin print @job_name+' enabled on secondary. disable it !' exec msdb.dbo.sp_update_job @job_name = @job_name,@enabled = 0 end else print @job_name+' disabled on secondary. do nothing' end fetch next from job_cursor into @job_name end close job_cursor deallocate job_cursorend GO Click and drag to move
Overall, it’s a simple, and elegant solution that makes tons of sense if you don’t need to worry about either of the issues I raised in the start of this post (i.e., related to change management and enabling/disabling jobs). Likewise, IF you didn’t have to worry about change management but did want to keep jobs enabled/disabled – it wouldn’t take too much effort to ‘tweak’ Tom’s approach and add in a state table (like I outlined previously in #12 and #13) OR to possibly create a new Jobs Category called ‘AGSuchAndSuch-Disabled’ where you could accomplish very similar needs without a state table.
In short, there are a number of different ways to tackle the issue of Jobs Management – and the best fit will really depend upon your own environment, culture, needs, and so on.
Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs
Part 13: Creating and Populating a Batch Job State Table
Part 14: Establishing Synchronization Checks
Part 15: Setting Up Linked Servers
Part 16: Job Synchronization Checkup Logic
Part 17: Creating Jobs to Check on Synchronization
Part 18: Health Checks for Availability Groups
Part 19: Availability Group Database Backups
Part 20: Avoiding Backup Fragmentation
Part 21: Assigning Backup Preferences
Part 22: Executing Backups
Part 23: Maintenance Plan Backups and SSIS Packages
Part 24: Synchronizing SSIS Packages
Part 25: Final Remarks on Backups
Part 26: Using Master Servers for Administration
Part 27: Options and Concerns for More Advanced Deployments
About the Author
You May Also Like