Or, Log-Shipping over 1200 databases automagically.

The Problem

Question: How many SQL Server databases can you put on one server?

Answer: It depends on how fast you need them restored if something terrible happens.

Because Dude, if your server crashes most heinously, and all your customers can’t get at their stuff, then, you know, that would be really bogus.

Okay, yeah. That’s a problem. What should I do about it?

The first thing I did was write a PowerShell script that restores a list of databases, pulling their backups from the directories where they’re written.

$controlfile = "C:\path\to\listofdatabases.txt"
$restoreserver = "127.0.0.1\Instance"
$fullbackpath = "F:\Full Backups\Instance"
$diffbackpath = "F:\Differential Backups\Instance"
$logbackpath = "F:\Log backups\prod1"
$mdfpath = "D:\Instance\Instance"
$ldfpath = "E:\Instance\Log"
$password = "password"

foreach ($dbname in Get-Content $controlfile)
{
    # Restore the full backup first, WITH NORECOVERY so we can layer on diffs and logs
    $dbsearch = $dbname + "*"
    $fullback = Get-ChildItem $fullbackpath | Where-Object {$_.Name -like $dbsearch}
    $query = "EXECUTE master..sqlbackup '-SQL ""RESTORE DATABASE [" + $dbname + "] FROM DISK = ''" + $fullback.FullName + "'' WITH PASSWORD = ''" + $password + "'', NORECOVERY, MOVE ''NamingConvention'' TO ''" + $mdfpath + "\" + $dbname + ".mdf'', MOVE ''NamingConvention_log'' TO ''" + $ldfpath + "\" + $dbname + "_log.ldf''""'"
    Write-Host $query
    sqlcmd -E -S $restoreserver -Q $query

    # Then the most recent differential
    $dbsearch = $dbname + "*"
    $diffback = Get-ChildItem $diffbackpath | Where-Object {$_.Name -like $dbsearch} | Sort-Object LastAccessTime -Descending | Select-Object -First 1
    $query = "EXECUTE master..sqlbackup '-SQL ""RESTORE DATABASE [" + $dbname + "] FROM DISK = ''" + $diffback.FullName + "'' WITH PASSWORD = ''" + $password + "'', NORECOVERY, REPLACE""'"
    sqlcmd -E -S $restoreserver -Q $query

    # Then every log backup newer than that differential, in order
    $dbsearch = $dbname + "_LOG*"
    $logbacks = Get-ChildItem $logbackpath | Where-Object {$_.Name -like $dbsearch} | Where-Object {$_.LastWriteTime -gt $diffback.LastWriteTime} | Sort-Object LastWriteTime
    foreach ($logback in $logbacks)
    {
        $query = "EXECUTE master..sqlbackup '-SQL ""RESTORE LOG [" + $dbname + "] FROM DISK = ''" + $logback.FullName + "'' WITH PASSWORD = ''" + $password + "'', NORECOVERY""'"
        sqlcmd -E -S $restoreserver -Q $query
    }

    # Finally, bring the database online
    $query = "RESTORE DATABASE [" + $dbname + "] WITH RECOVERY"
    sqlcmd -E -S $restoreserver -Q $query
}

Restoring would take a very long time. Some larger databases would take four or five hours. Other databases went faster, but there are a lot of databases on that instance. I estimate that running this script to restore everything would take days.

Let's Log Ship!

The script came in handy when it came time to populate the log ship instances, though! (I just edited out the line about restoring with recovery.) Then all I needed was a job with dynamic SQL to do the actual log-shipping. (I wanted it to work automagically after I added more databases, without my editing anything.)

I pulled a list of databases out of the main product configuration database and stole SQL from Red Gate's generated log ship job (the one I use for the aforementioned main product database). I'm not including the get-database function because, well, it's pretty company-specific, and I don't know that it would be useful to anyone else. You know how you're going to get a list of databases off another server, right?
DECLARE @SQL nvarchar(max), @dbname nvarchar(80), @CurrentRow int, @RowsToProcess int

CREATE TABLE #maintenancetemp
(
    RowID int NOT NULL PRIMARY KEY IDENTITY(1,1),
    dbname nvarchar(80)
)

INSERT INTO #maintenancetemp (dbname)
SELECT dbname
FROM [maintenance].[dbo].[getDBList]()
INNER JOIN sys.databases ON dbname = name

SET @RowsToProcess = @@ROWCOUNT
SET @CurrentRow = 0

WHILE @CurrentRow < @RowsToProcess
BEGIN
    SET @CurrentRow = @CurrentRow + 1
    SELECT @dbname = dbname FROM #maintenancetemp WHERE RowID = @CurrentRow

    SET @SQL = 'EXECUTE master..sqlbackup ''-SQL "RESTORE LOG [' + @dbname
        + '] FROM DISK = ''''F:\Log Backups from Instance\' + @dbname
        + '_LOG_*.sqb'''' WITH PASSWORD = ''''$password'''', NORECOVERY, MOVETO = ''''F:\Log Backups from Instance\processed''''"'''

    EXEC sp_executesql @SQL
END

That doesn’t sound that hard! Was it really that easy?

Alas. Here are some of the problems I ran into.

• The dynamic SQL didn’t fail if it didn’t find any log files to restore, so I had to write a job to check for log ship FAIL.

(That’s what the job is actually called, too: “Check for log ship FAIL.”)

with LogShip_CTE (destination_database_name, restore_date)
as
(
SELECT [destination_database_name], max([restore_date]) as restore_date
FROM [msdb].[dbo].[restorehistory]
where restore_type = 'L'
group by destination_database_name
)

select destination_database_name, restore_date
from LogShip_CTE
order by destination_database_name
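The query only tells you the most recent log restore per database; the job still has to decide which of those are stale enough to count as FAIL. Here's a minimal sketch of that check in Python — the one-hour threshold, the `find_stale_databases` name, and the sample data are my own illustration, not the actual job:

```python
from datetime import datetime, timedelta

# Hypothetical threshold: with logs restoring every 15 minutes, a database
# that hasn't restored one in an hour has probably stopped shipping.
STALE_AFTER = timedelta(hours=1)

def find_stale_databases(rows, now):
    """rows: (destination_database_name, restore_date) tuples, i.e. the
    output of the restorehistory query above. Returns stale names, sorted."""
    return sorted(name for name, last_restore in rows
                  if now - last_restore > STALE_AFTER)

now = datetime(2012, 6, 1, 12, 0)
rows = [
    ("CustomerA", datetime(2012, 6, 1, 11, 50)),  # restored 10 minutes ago: fine
    ("CustomerB", datetime(2012, 6, 1, 8, 0)),    # four hours stale: log ship FAIL
]
print(find_stale_databases(rows, now))
```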

• I had to write a job to check for unshipped databases, because I hadn’t automated setup yet.

Okay, maybe that’s not that heinous, but I’m a sysadmin and therefore lazy. So for about a month whenever a new customer signed up I’d get an email telling me to set them up. It’s a lot nicer if it sets itself up and I can go on vacation.
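The check itself is just a set difference: everything in the product configuration database, minus everything that already has restore history on the DR instance. A hedged Python sketch of that logic (the function name and sample lists are mine, not the actual job):

```python
def find_unshipped(configured_dbs, shipped_dbs):
    """Databases present in the product configuration but with no restore
    history on the DR instance -- i.e. nobody set up log shipping for them yet."""
    return sorted(set(configured_dbs) - set(shipped_dbs))

configured = ["CustomerA", "CustomerB", "CustomerC"]  # from the config database
shipped = ["CustomerA", "CustomerC"]                  # from msdb..restorehistory on DR
print(find_unshipped(configured, shipped))  # ['CustomerB']
```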

• After a period of time, the jobs slowed down. This took some effort to figure out.

There were a couple of things that slowed down restores over time, and I had to try several things to figure out what they were.

• The first thing I tried was splitting up the jobs and running them simultaneously. This actually made things worse because of MSDB contention.
• (Aside: MSDB is not super-efficient. I, um, added an index for my log ship query.)
• I tried increasing memory on the log ship instance, which helped slightly, but adding still more memory didn’t help any further.

Finally, I found the problem:

• Red Gate SQL Backup, while awesome, creates a single log file per restore by default. There are over 1200 databases being restored every 15 minutes. That’s a whole lot of files. BOGUS!
• Windows has a maximum number of files per directory.
• Basically, the jobs waited after each restore for Windows to tell them, “You cannot have more log files. You have a log file problem. I’m cutting you off.”
• I had to write a script to clean out the directory:

forfiles /p "C:\ProgramData\Red Gate\SQL Backup\Log\DR" /s /m *.* /D -7 /C "cmd /c del @path /q"

(Yes, I could write that in PowerShell, but I had batch lying around. I still have some kickin’ it old skool assets.)

FAST!

No more Log Ship FAIL!

But it’s still not Automagical…

I was still getting emails to tell me to set up log shipping on new databases. This took me a while to figure out, too.

First Step:

• Parameterize that restore script so it can be invoked with variables.
• The script backs up the database on the production server, copies it to the DR instance, and restores it.
• Invoking it looked like a bunch of lines of this: initialSetup.ps1 databasename servername

I tried to run it as a SQL Agent job, but that didn’t work so well because of permissions. (It may be happier in your environment than in ours, though.) PowerShell and Task Scheduler to the rescue!

Basically, it:

• Runs the “check for unshipped databases” query, but outputs the information like “powershell initialsetup.ps1 databasename servername”
• Runs SQLCMD with the -o flag to write the output to a temp file
• Looks for lines that start with the word “powershell” and puts them in a .cmd file
• Runs and then deletes the cmd file.
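The fiddly part of those steps is step three: sqlcmd's -o output has column headers, separator lines, and a row count wrapped around the lines you actually want. A minimal sketch of the filtering, in Python (the function name and sample output are my own illustration; the real script does this in PowerShell):

```python
def extract_setup_commands(sqlcmd_output):
    """Keep only the lines that are actual commands to run. The query formats
    each unshipped database as a 'powershell initialsetup.ps1 ...' line, so
    anything that doesn't start with 'powershell' is sqlcmd chrome."""
    return [line.strip() for line in sqlcmd_output.splitlines()
            if line.strip().startswith("powershell")]

output = """
cmd
-----------------------------------------------
powershell initialsetup.ps1 CustomerB prodserver1

(1 rows affected)
"""
print(extract_setup_commands(output))
```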

Okay, so it’s a PowerShell script that generates a .cmd file that runs a bunch of different PowerShell scripts with different parameters. But now I can go on vacation and everything, and log shipping proceeds apace. EXCELLENT!