Category Archives: monitoring

Katherine’s Excellent Log-Shipping Adventure

Or, Log-Shipping over 1200 databases automagically.


Filed under monitoring, powershell, sql

Nagios Event Handlers on Windows

Nagios event handlers are WHERE IT’S AT, BABY, YEAH!  There are some services that I can just automagically restart without any problems.  (WSUS, SQL Agent, etc.) This way, instead of notifying me, Nagios can just fix the problem for me and We Need Never Know.

These instructions assume I’m running NSClient++.

The script, which I keep in scripts\runcmd.bat, is:

@echo off
net start %1
@exit 0

(This is kept intentionally minimal so it’ll be reusable.)  I’m referring to this in nsclient.ini, under the “; A list of scripts available to run from the CheckExternalScripts module. Syntax is: <command>=<script> <arguments>” header.

restartwsus=scripts\runcmd.bat wsusservice

On the Nagios server, I’ve defined the check in commands.cfg as:

define command{
 command_name restartwsus
 command_line /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c restartwsus
}

and in the service definition as:

define service{
        use                     generic-service
        host_name               wsusserver
        service_description     WSUS
        contacts                me
        notification_options    w,c,r
        notification_period     24x7
        notification_interval   0
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l WsusService
        event_handler           restartwsus
        }

As far as I can tell, this is copy-and-paste-able.


Nagios Twitter Notifications – working again!

Yeah.

Twitter changed their authentication, and my old Twitter notifications (based on Ed Voncken’s work) seized up and failed.  I had to update the python tweepy library to get them to work.

pip install tweepy --upgrade

And they’re back!

I love Twitter notifications, BTW.  <3
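For the curious, the general shape of such a notifier is roughly this. It's a sketch with hypothetical names, not Ed Voncken's actual script; the tweepy calls are the standard OAuth ones, and the formatting is separated out so it works even without tweepy installed:

```python
# Sketch of a Nagios-to-Twitter notifier (hypothetical names, not
# Ed Voncken's actual script). Nagios passes host/service/state/output
# on the command line; the message has to fit Twitter's length limit.

def format_tweet(host, service, state, output, limit=140):
    """Build a status line and truncate it to Twitter's length limit."""
    msg = "%s/%s is %s: %s" % (host, service, state, output)
    return msg[:limit]

def send_tweet(msg, consumer_key, consumer_secret, access_token, access_secret):
    # Imported here so format_tweet() is usable without tweepy installed.
    import tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    tweepy.API(auth).update_status(msg)

if __name__ == "__main__":
    print(format_tweet("mail01", "SMTP", "CRITICAL", "Connection refused"))
```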


Jabber Nagios Notifications – Working Again!

I like using non-email notifications, especially when monitoring, you know, email.  So I have notifications going out via Twitter and Google Talk in addition to email.  Of the three, the order of speediness is:

  1. Google Talk
  2. Twitter
  3. Email

So I was really sad when the Google Talk notifications stopped working late last week.  It took a while for me to have time to fix them, though, and basically I just found a Google Groups post telling me what to do.  Namely, adding this at the top:

use IO::Socket::SSL;
{
    no warnings 'redefine';
    my $old_connect_SSL = \&IO::Socket::SSL::connect_SSL;
    *IO::Socket::SSL::connect_SSL = sub {
        my $sock = $_[0];
        ${*$sock}{_SSL_arguments}{SSL_cipher_list} = 'RC4-MD5';
        goto $old_connect_SSL;
    };
}

Many thanks to Cédric Bouvier for the fix!


Thanksgiving Gluttony

Yum, Nagios gluttony!

I’m donating Nagios monitoring to a couple of nonprofits, and this brings up how Nagios configurations grow.  In short, you learn over time what you need to keep an eye on.

For example:  At one nonprofit, someone forgot to renew the domain (oops!).  It just so happens that there’s a plugin for that.  GoDaddy outage hoses DNS?  Add a check for that.  SSL cert expires (oops!)?  Add a check for that.  The web site returns 200 OK (thereby showing up as okay in Nagios) but no content appears?  Add a check for that.

And then apply all those checks to your other hosts.  So the same thing doesn’t happen to them.

And this is how you end up with so many checks.

define command {
command_name check_content
command_line $USER1$/check_http -r "</body>" -H $HOSTADDRESS$
}

define command {
command_name DNS_resolving
command_line $USER1$/check_dns -H $HOSTADDRESS$
}

define command {
command_name check_domain
command_line $USER1$/check_domain -d $HOSTADDRESS$
}

define command {
command_name check_cert
command_line $USER1$/check_http -ssl -C 14 -H '$HOSTADDRESS$'

Yes, I added checks on Thanksgiving.  *facepalm*


True Confessions

I run Nagios at home.  It texts me when my machines need patches.

I told this to a charming gentleman who was my dinner companion for the evening and he gave me a look that implied that I was not all there.  (He’s a Nagios admin, too, but not willingly.)  I found myself spluttering defensively, “I’m testing things for work!”

It’s absolutely true that I’m testing things for work.  In fact, I just set it up to check and make sure work’s email and web were up today, because they lost connectivity earlier this weekend.  But it’s also true that it’s fun.

My home Nagios server also Twitters.  I don’t remember if I told the charming gentleman that or not.  I suspect that I did.

Maybe I shouldn’t tell him about the webcam that tweets whenever someone is in my driveway.


Clearly, you’re doing it wrong.

So, I have this friend.  (No, really, it’s my friend, it’s not me, I set up my own Nagios server.)  She’s a DBA with no responsibility for anything outside of a bunch of SQL Servers. Nagios wakes her up in the middle of the night if the web server goes down.

If you page people in the middle of the night over things that aren’t their responsibility, you’re just training them to ignore their pagers.  I once worked with someone who was, according to legend, the only person ever to work at [name of company redacted] to successfully flush a pager.  (And they didn’t even have Nagios at that time!)

I feel the same way about people who receive daily “CRITICAL!!!” emails that their servers’ drives are 98% full.   Nagios is supposed to be informing you about things that are unusual.  If your SQL Server typically uses 96% of its RAM (mine do), don’t turn off warnings and only receive notifications for critical, and don’t receive daily emails saying that the servers are using too much RAM.  Up the thresholds to sane numbers that indicate an unusual condition.  What do you think happens if, in the slew of daily emails about “CRITICAL!!!” there’s a disk that usually isn’t 100% full, or a service down, or a memory leak?  No, no.  You don’t want your slew of “Situation Normal:  All Frelled Up” emails, you want to know when something unusual is occurring.

If you’re like me, you resist this. “Dammit, my C: drive should be at least 20% free!”  There comes a time when you have to accept that a number is not an attainable number and work from there.
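For example (the numbers here are illustrative; pick whatever is actually unusual for your servers), a memory check on a SQL Server that routinely sits at 96% RAM might warn at 98% and go critical at 99%:

```
define service{
        use                     generic-service
        host_name               sqlserver
        service_description     Memory Use
        check_command           check_nt!MEMUSE!-w 98 -c 99
        }
```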


Tracking total file usage, DBs only.

Yes, I’ve been bad about updating.  I was traumatized by a bunch of friends getting laid off from a former employer, and then I had two very busy weeks.

This system of tracking total file usage, DBs only, came up talking to someone recently (Eric, are you reading this?) and I thought I would share.

First, there’s the query to get SQL Server to tell you how much space it’s using.  Sure, you could map the drive or remote out to the server (unless, you know, you can’t), but this is a good sanity-checking number that you can compare to what the OS says you’re using.  I had an issue recently where that was handy information.  I may have downloaded this off the internet somewhere, or may have written it.  I forget which.  So, if I just stole your query and posted it as my own, I’m sorry!  (I did Google and didn’t find it.)

declare @totalsize float,
@bytes float,
@kb float,
@mb float,
@gb float,
@tb float

CREATE TABLE #temp
(
size int
)

insert into #temp (size) EXECUTE sp_msforeachdb 'SELECT size FROM [?].sys.database_files'

select @totalsize = SUM(size) from #temp
set @bytes = (@totalsize * 8192)
set @kb = (@bytes / 1024)
set @mb = (@kb / 1024)
set @gb = (@mb / 1024)
set @tb = (@gb / 1024)

drop table #temp

--print @totalsize
--print @kb
--print @mb
print @gb
print @tb


You can tell which data sizes I generally deal with by which two print statements aren’t commented out.  Comment or uncomment as suits your situation.
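The unit math, for reference: sys.database_files reports size in 8 KB pages, so the script multiplies by 8192 and then divides down.  As a quick sanity check, here's the same arithmetic in Python (my own sketch, not part of the T-SQL above):

```python
# size from sys.database_files is a count of 8 KB pages
def pages_to_gb(pages):
    """Convert a page count to gigabytes, as the T-SQL script does."""
    return pages * 8192 / 1024.0 / 1024.0 / 1024.0

# 131072 pages * 8192 bytes/page = 2^30 bytes = exactly 1 GB
print(pages_to_gb(131072))  # 1.0
```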

Well, that’s fine and dandy, but you might have more than one server, or more than one instance, or you might want to track those numbers over time (which is what I was going for, yes).  I do, so I have a table:

CREATE TABLE [dbo].[datasize](
[id] [int] IDENTITY(1,1) NOT NULL,
[instance] [varchar](50) NULL,
[datasize] [float] NULL,
[dateadded] [datetime] NULL,
CONSTRAINT [PK_datasize] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[datasize] ADD  CONSTRAINT [DF_datasize_dateadded]  DEFAULT (getdate()) FOR [dateadded]
GO


I generally store the GB number in the datasize column, because some instances are bigger than others (GB is my smallest sane number), but you can store whichever size is meaningful for your situation.  Just, you know.  Always store the same measure (duh, you knew that).

Sadly, I have not yet automated populating this table because I’m working across two domains that don’t trust each other, so I’m C&Ping the results from six instances.  That’s not that bad.  You could probably easily automate that, though.  I’m automating tracking log size and usage (for reasons I may address later), so, you know.  Easy peasy.  Make SQL Agent do it for you. (I’m a sysadmin.  We’re lazy.)  Especially if you have, like, a million instances.

Okay, so you have this data.  Now what?

I have a view that’s the current data:

CREATE VIEW [dbo].[current_size_by_instance]
AS
SELECT instance, datasize,
       CONVERT(varchar, CAST(DATEDIFF(dd, 0, dateadded) AS datetime), 110) AS date
FROM dbo.datasize
WHERE dateadded >
      (SELECT MAX(CAST(DATEDIFF(dd, 0, dateadded) AS datetime))
       FROM dbo.datasize AS datasize_1)

GO


Have SQL sum, average, or do any other slicing and dicing you want there.  However, the really shiny part for me is tracking data usage:

CREATE TABLE #filegrowth (instance varchar(255), maxdata float, mindata float, maxdate datetime, mindate datetime)

insert into #filegrowth (instance, maxdata, maxdate)
(select instance, datasize as maxdata, dateadded from Maintenance.dbo.datasize where dateadded >
(SELECT CONVERT(varchar, MAX(dateadded), 112)
FROM Maintenance.dbo.datasize))

insert into #filegrowth (instance, mindata, mindate)
(select instance, datasize as mindata, dateadded from Maintenance.dbo.datasize where dateadded <
(SELECT CONVERT(varchar, MIN(dateadded) + 1, 112)
FROM Maintenance.dbo.datasize))

select instance, (MAX(maxdata) - MAX(mindata)) as filegrowth, datediff(d, MAX(mindate), MAX(maxdate)) as timeframe_in_days from #filegrowth group by instance
union
select 'total' as instance, (SUM(maxdata) - SUM(mindata)) as filegrowth, datediff(d, MAX(mindate), MAX(maxdate)) as timeframe from #filegrowth

drop table #filegrowth

SELECT sum(datasize) / 1024 as TB, CONVERT(varchar, CAST(DATEDIFF(dd, 0, dateadded) AS datetime), 110) as date
FROM Maintenance.dbo.datasize GROUP BY CAST(DATEDIFF(dd, 0, dateadded) AS datetime)

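The arithmetic in that growth query boils down to newest total minus oldest total over the elapsed days.  Here's a sketch of the same calculation in Python (a hypothetical helper for illustration, not part of the scripts above):

```python
from datetime import date

def file_growth(samples):
    """samples: list of (date, total_gb) pairs.
    Returns (growth_gb, timeframe_in_days), like the #filegrowth query."""
    samples = sorted(samples)  # tuples sort by date first
    (mindate, mindata), (maxdate, maxdata) = samples[0], samples[-1]
    return maxdata - mindata, (maxdate - mindate).days

# a week of samples: 500 GB grew to 512 GB
growth, days = file_growth([(date(2013, 1, 1), 500.0),
                            (date(2013, 1, 8), 512.0)])
print(growth, days)  # 12.0 GB over 7 days
```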

I actually have that as a stored procedure so I don’t have to open a file to load a script. (Lazy!)  So, you can, too.  Just paste that as the body of CREATE PROCEDURE SP_OMGLAZYBUM and go from there.

And yes, this is in addition to SQL Monitor and Nagios.  But it came in handy recently.  We were having DAS issues and there was some question about SQL’s file usage, and I was able to confirm based on numbers from last week and fresh numbers that yes, those are sane numbers.


Perfmon Link

Awesome webcast by Brent Ozar.

There’s more here.  I’d say more, but I’m busy enjoying Memorial Day.


Your Servers’ Baby Monitor

Do you know about Write or Die?  It’s described as “putting the prod into productivity” and is for procrastinating writers to force themselves to write.  (Writers procrastinate.  It’s a thing.  You can spend hours surfing the web for baby name pages to come up with the perfect name for your walk-on character.  Or you can name him John Doe and fix it in revisions.  The latter is probably more productive.)  You enter a word goal and a time limit and click “Write,” and any time you stop writing the screen turns red.  If you stop long enough, an annoying sound will play.  You might get RickRolled, or have painfully bad violin practice, but I’ve set my copy of the desktop edition to exclusively play the crying babies sound.

This is the perfect metaphor for my monitoring philosophy.

For this reason, it makes me a little insane that I have 392 new email messages from SQL Monitor today about fragmented indexes.  (My phone said 687.)  That’s a whole lot of crying babies. Apparently, I have some work to do.

I’m much happier when my baby monitor is silent and the monitoring page shows a lot of happy, peaceful servers. You know, when I come in the morning and look and they’re all cheerfully perking away doing their thing.  I used to keep my Nagios screen 100% green, and it made me a little wacky when we merged Nagios servers and I added the servers of the guys who actually like getting their daily, “Yes, your hard drive is still 100% full!” emails.  *twitch*  Ah well, it makes them happy.

There’s a Nagios plugin for Firefox that plays a sound when you have a problem.  I’d like to get that to play the crying babies sound.  The advantage to that would be that if anything of mine ever broke, not only would my sensibilities be offended, but if I didn’t fix it promptly my coworkers would kill me and no one would ever find my body!  Now there’s putting the prod into productivity!

And now?  Apparently, I need to go into the nursery and shut up, er, calm some babies.  (Not about the indexes, about Something Else.)
