Category Archives: monitoring

Nagios Checks for Dell OpenManage Disk Health Written in PowerShell

I wrote these in PowerShell to run with nrpe/nsclient.  They query Dell OpenManage’s command line and return a Nagios-readable result.

(There are plugins available on Nagios Exchange, but they all seemed to be… more than I wanted.  I just wanted to know, “Is my RAID okay?”  Because whenever I’ve had an Event, I like to monitor for that event happening again.)

This checks the Physical Disks in the array.  If one or more disks reports anything other than OK, it alerts:

$status = 0
omreport storage pdisk controller=0 | Where-Object {$_ -match "^status"} | ForEach-Object {if ($_ -notlike "*OK*") {$status = 2}}

If ($status -eq 0) {
    Write-Host "OK:  Physical Disks report OK"
} else {
    Write-Host "CRITICAL:  Check OpenManage"
}
exit $status

You might have to edit the controller and vdisk numbers in the scripts below to match your own Virtual Disk layout.  It could probably be made more elegant, but it suits my purposes.

This script checks the health of my C drive (vdisk 0):

$status = 0
omreport storage vdisk controller=0 vdisk=0 | Where-Object {$_ -match "^status"} | ForEach-Object {if ($_ -notlike "*OK*") {$status = 2}}

If ($status -eq 0) {
    Write-Host "OK:  Virtual Disk (OS) reports OK"
} else {
    Write-Host "CRITICAL:  Check OpenManage"
}
exit $status

This script checks the health of my data drive (E, vdisk 1):

$status = 0
omreport storage vdisk controller=0 vdisk=1 | Where-Object {$_ -match "^status"} | ForEach-Object {if ($_ -notlike "*OK*") {$status = 2}}

If ($status -eq 0) {
    Write-Host "OK:  Virtual Disk (data) reports OK"
} else {
    Write-Host "CRITICAL:  Check OpenManage"
}
exit $status

Running these involves adding lines like this to the external script section of nsclient.ini or equivalent:

check_physicaldisk = cmd /c echo scripts\pdiskcheck.ps1; exit($lastexitcode) | powershell.exe -command -

check_virtualdisk = cmd /c echo scripts\vdiskcheck.ps1; exit($lastexitcode) | powershell.exe -command -

And lines like this to command.cfg in Nagios:

define command {
command_name    check_physicaldisk
command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_physicaldisk
}

define command {
command_name    check_CRaid
command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_virtualdisk
}
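The matching service definitions might look something like this (the host name here is hypothetical; the check_command values just need to match the command_name entries in commands.cfg):

```
define service {
        use                     generic-service
        host_name               dellserver
        service_description     Physical Disks
        check_command           check_physicaldisk
}

define service {
        use                     generic-service
        host_name               dellserver
        service_description     Virtual Disk (OS)
        check_command           check_CRaid
}
```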


Filed under monitoring, powershell

Katherine’s Excellent Log-Shipping Adventure

Or, Log-Shipping over 1200 databases automagically.




Filed under monitoring, powershell, sql

Nagios Event Handlers on Windows

Nagios event handlers are WHERE IT’S AT, BABY, YEAH!  There are some services that I can just automagically restart without any problems.  (WSUS, SQL Agent, etc.) This way, instead of notifying me, Nagios can just fix the problem for me and We Need Never Know.

These instructions assume I’m running NSClient++.

The script is

@echo off
net start %1
@exit 0

(This is kept intentionally minimal so it’ll be reusable.)  I’m referring to this in nsclient.ini, under the “; A list of scripts available to run from the CheckExternalScripts module. Syntax is: <command>=<script> <arguments>” header.

restartwsus=scripts\runcmd.bat wsusservice
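Since runcmd.bat just hands its first argument to net start, one copy of the script can back any number of handlers. A second entry might look like this (hypothetical; check the actual service name with sc query first):

```
restartsqlagent=scripts\runcmd.bat SQLSERVERAGENT
```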

On the Nagios server, I’ve defined the check in commands.cfg as:

define command{
 command_name restartwsus
 command_line /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c restartwsus
}

and in the service definition as:

define service{
        use                     generic-service
        host_name               wsusserver
        service_description     WSUS
        contacts                me
        notification_options    w,c,r
        notification_period     24x7
        notification_interval   0
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l WsusService
        event_handler           restartwsus
}

It looks like this is copy and paste-able.


Filed under monitoring

Nagios Twitter Notifications – working again!


Twitter changed their authentication, and my old Twitter notifications (based on Ed Voncken’s work) seized up and failed.  I had to update the Python tweepy library to get them to work.

pip install tweepy --upgrade

And they’re back!

I love Twitter notifications, BTW.  <3


Filed under monitoring

Jabber Nagios Notifications – Working Again!

I like using non-email notifications, especially when monitoring, you know, email.  So I have notifications going out via Twitter and Google Talk as well as email.  Of the three, the order of speediness is:

  1. Google Talk
  2. Twitter
  3. Email

So I was really sad when the Google Talk notifications stopped working late last week.  It took a while for me to find time to fix them, though, and basically I just found a Google Groups post telling me what to do.  Namely, add this at the top:

use IO::Socket::SSL;
no warnings 'redefine';
my $old_connect_SSL = \&IO::Socket::SSL::connect_SSL;
*IO::Socket::SSL::connect_SSL = sub {
    my $sock = $_[0];
    ${*$sock}{_SSL_arguments}{SSL_cipher_list} = 'RC4-MD5';
    goto $old_connect_SSL;
};
Many thanks to Cédric Bouvier for the fix!


Filed under monitoring

Thanksgiving Gluttony

Yum, Nagios gluttony!

I’m donating Nagios monitoring to a couple of nonprofits, and this brings up how Nagios configurations grow.  In short, you learn over time what you need to keep an eye on.

For example:  On one nonprofit, someone forgot to renew the domain (oops!).  It just so happens that there’s a plugin for that.  Godaddy outage hoses DNS?  Add a check for that.  SSL cert expires (oops!)?  Add a check for that.  The web site returns 200 OK (thereby showing up as okay in Nagios) but no content appears?  Add a check for that.

And then apply all those checks to your other hosts.  So the same thing doesn’t happen to them.

And this is how you end up with so many checks.

define command {
command_name check_content
command_line $USER1$/check_http -r "</body>" -H $HOSTADDRESS$
}

define command {
command_name DNS_resolving
command_line $USER1$/check_dns -H $HOSTADDRESS$
}

define command {
command_name check_domain
command_line $USER1$/check_domain -d $HOSTADDRESS$
}

define command {
command_name check_cert
command_line $USER1$/check_http --ssl -C 14 -H $HOSTADDRESS$
}

Yes, I added checks on Thanksgiving.  *facepalm*


Filed under monitoring

True Confessions

I run Nagios at home.  It texts me when my machines need patches.

I told this to a charming gentleman who was my dinner companion for the evening and he gave me a look that implied that I was not all there.  (He’s a Nagios admin, too, but not willingly.)  I found myself spluttering defensively, “I’m testing things for work!”

It’s absolutely true that I’m testing things for work.  In fact, I just set it up to check and make sure work’s email and web were up today, because they lost connectivity earlier this weekend.  But it’s also true that it’s fun.

My home Nagios server also Twitters.  I don’t remember if I told the charming gentleman that or not.  I suspect that I did.

Maybe I shouldn’t tell him about the webcam that tweets whenever someone is in my driveway.


Filed under monitoring

Clearly, you’re doing it wrong.

So, I have this friend.  (No, really, it’s my friend, it’s not me, I set up my own Nagios server.)  She’s a DBA with no responsibility for anything outside of a bunch of SQL Servers. Nagios wakes her up in the middle of the night if the web server goes down.

If you page people in the middle of the night over things that aren’t their responsibility, you’re just training them to ignore their pagers.  I once worked with someone who was, according to legend, the only person ever to work at [name of company redacted] ever to successfully flush a pager.  (And they didn’t even have Nagios at that time!)

I feel the same way about people who receive daily “CRITICAL!!!” emails that their servers’ drives are 98% full.  Nagios is supposed to inform you about things that are unusual.  If your SQL Server typically uses 96% of its RAM (mine do), don’t turn off warnings and alert only on critical, and don’t resign yourself to daily emails saying that the servers are using too much RAM.  Up the thresholds to sane numbers that indicate an unusual condition.  What do you think happens if, buried in the slew of daily “CRITICAL!!!” emails, there’s a disk that usually isn’t 100% full, or a service down, or a memory leak?  No, no.  You don’t want a slew of “Situation Normal:  All Frelled Up” emails; you want to know when something unusual is occurring.
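To make "sane thresholds" concrete: for a SQL Server that normally sits at 96% RAM, a check_nt memory check might look something like this (hypothetical numbers; the point is that the warning only fires above the machine's normal steady state):

```
check_command           check_nt!MEMUSE!-w 98 -c 99
```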

If you’re like me, you resist this. “Dammit, my C: drive should be at least 20% free!”  There comes a time when you have to accept that a number is not an attainable number and work from there.


Filed under monitoring

Tracking total file usage, DBs only.

Yes, I’ve been bad about updating.  I was traumatized by a bunch of friends getting laid off from a former employer, and then I had two very busy weeks.

This system of tracking total file usage, DBs only, came up talking to someone recently (Eric, are you reading this?) and I thought I would share.

First, there’s the query to get SQL Server to tell you how much space it’s using.  Sure, you could map the drive or remote out to the server (unless, you know, you can’t), but this is a good sanity-checking number that you can compare to what the OS says you’re using.  I had an issue recently where that was handy information.  I may have downloaded this off the internet somewhere, or may have written it.  I forget which.  So, if I just stole your query and posted it as my own, I’m sorry!  (I did Google and didn’t find it.)

declare @totalsize float,
@bytes float,
@kb float,
@mb float,
@gb float,
@tb float

create table #temp (size int)

insert into #temp (size) EXECUTE sp_msforeachdb 'SELECT size FROM [?].sys.database_files'

select @totalsize = SUM(size) from #temp
set @bytes = (@totalsize * 8192)
set @kb = (@bytes / 1024)
set @mb = (@kb / 1024)
set @gb = (@mb / 1024)
set @tb = (@gb / 1024)

drop table #temp

--print @totalsize
--print @kb
--print @mb
print @gb
print @tb


You can tell which data sizes I’m generally dealing with by which two aren’t commented out.  Comment or uncomment as suits your situation.

Well, that’s fine and dandy, but you might have more than one server, or more than one instance, or you might want to track those numbers over time (which is what I was going for, yes).  I do, so I have a table:

CREATE TABLE [dbo].[datasize](
[id] [int] IDENTITY(1,1) NOT NULL,
[instance] [varchar](50) NULL,
[datasize] [float] NULL,
[dateadded] [datetime] NULL,
CONSTRAINT [PK_datasize] PRIMARY KEY CLUSTERED
(
[id] ASC
)
)

ALTER TABLE [dbo].[datasize] ADD CONSTRAINT [DF_datasize_dateadded] DEFAULT (getdate()) FOR [dateadded]


I generally store the GB number in the datasize column, because some instances are bigger than others (GB is my smallest sane number), but you can store whichever size is meaningful for your situation.  Just, you know.  Always store the same measure (duh, you knew that).

Sadly, I have not yet automated populating this table because I’m working across two domains that don’t trust each other, so I’m C&Ping the results from six instances.  That’s not that bad.  You could probably easily automate that, though.  I’m automating tracking log size and usage (for reasons I may address later), so, you know.  Easy peasy.  Make SQL Agent do it for you. (I’m a sysadmin.  We’re lazy.)  Especially if you have, like, a million instances.
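That said, where an instance can reach the Maintenance database directly, populating the table could be a single SQL Agent job step along these lines (a sketch, not something I'm running: it assumes the datasize table above and stamps each row with @@SERVERNAME):

```
-- Sketch of an Agent job step: total up data file pages and record the GB figure
create table #temp (size int)
insert into #temp (size) EXECUTE sp_msforeachdb 'SELECT size FROM [?].sys.database_files'

-- size is in 8 KB pages, so pages * 8192 bytes, then down to GB
declare @gb float
select @gb = (SUM(size) * 8192.0) / 1024 / 1024 / 1024 from #temp

insert into Maintenance.dbo.datasize (instance, datasize)
values (@@SERVERNAME, @gb)
-- dateadded fills itself in via the getdate() default

drop table #temp
```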

Okay, so you have this data.  Now what?

I have a view that’s the current data:

CREATE VIEW [dbo].[current_size_by_instance]
AS
SELECT instance, datasize, CONVERT(varchar, CAST(DATEDIFF(dd, 0, dateadded) AS Datetime), 110) AS date
FROM dbo.datasize
WHERE (dateadded >
(SELECT MAX(CAST(DATEDIFF(dd, 0, dateadded) AS Datetime)) AS Expr1
FROM dbo.datasize AS datasize_1))



Make SQL sum or average or any other kind of slicing and dicing you want there. However, the really shiny part for me is tracking data usage:

CREATE TABLE #filegrowth (instance varchar(255), maxdata float, mindata float, maxdate datetime, mindate datetime)

insert into #filegrowth (instance, maxdata, maxdate)
(select instance, datasize as maxdata, dateadded from Maintenance.dbo.datasize where dateadded >
(SELECT CONVERT(varchar, MAX(dateadded), 112)
FROM Maintenance.dbo.datasize))
insert into #filegrowth (instance, mindata, mindate)
(select instance, datasize as mindata, dateadded from Maintenance.dbo.datasize where dateadded <
(SELECT CONVERT(varchar, MIN(dateadded) + 1, 112)
FROM Maintenance.dbo.datasize))

select instance, (MAX(maxdata) - MAX(mindata)) as filegrowth, datediff(d, (max(mindate)), (MAX(maxdate))) as timeframe_in_days from #filegrowth group by instance
select 'total' as instance, (SUM(maxdata) - SUM(mindata)) as filegrowth, datediff(d, (max(mindate)), (MAX(maxdate))) as timeframe from #filegrowth

drop table #filegrowth

SELECT sum(datasize) / 1024 as TB, CONVERT(varchar, CAST(DATEDIFF(dd,0,dateadded) AS Datetime), 110) as date
FROM Maintenance.dbo.datasize GROUP BY CAST(DATEDIFF(dd,0,dateadded) AS Datetime)


I actually have that as a stored procedure so I don’t have to open a file to load a script. (Lazy!)  So, you can, too.  Just paste that as the body of CREATE PROCEDURE SP_OMGLAZYBUM and go from there.

And yes, this is in addition to SQL Monitor and Nagios.  But it came in handy recently.  We were having DAS issues and there was some question about SQL’s file usage, and I was able to confirm based on numbers from last week and fresh numbers that yes, those are sane numbers.


Filed under monitoring

Perfmon Link

Awesome webcast by Brent Ozar.

There’s more here.  I’d say more, but I’m busy enjoying Memorial Day.


Filed under monitoring