ClearChain

Raid Status Script

One thing that has always bugged me about the free operating systems is the lack of notification when things go wrong. I run a FreeBSD server at home with a SATA RAID 1 on a Promise controller:

atapci0: <Promise PDC20371 SATA150 controller> port 0xc000-0xc03f,0xc400-0xc40f,0xc800-0xc87f mem 0xde031000-0xde031fff,0xde000000-0xde01ffff irq 11 at device 1.0 on pci1

The problem is that I only get an email alert at the end of the day if something goes wrong. What I really want is for the server to start driving me nuts, calling out my name, flashing lights and kicking up all merry hell.

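Alas, I couldn’t find an easy way for it to do that. Hence the program below, which checks the RAID status of one or more ar devices and, if one of them is not normal, plays a tune on the PC speaker (/dev/speaker) and emails root the output of atacontrol status for that array.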

The other benefit is that it keeps playing the tune whilst the RAID is rebuilding and keeps emailing me status reports. A bit like the old Alphas did – <sigh> I miss the Alpha.
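The rule for when to raise the alarm can be sketched as a small function. This is an illustration only, not the program itself: the ST_* flag values are placeholders standing in for the AR_READY, AR_DEGRADED and AR_REBUILDING constants from <sys/ata.h>, and classify_status, needs_alert and health_label are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values for illustration only; the real program uses
 * AR_READY, AR_DEGRADED and AR_REBUILDING from <sys/ata.h>. */
enum {
    ST_READY      = 1,
    ST_DEGRADED   = 2,
    ST_REBUILDING = 4
};

enum raid_health {
    HEALTH_OK,
    HEALTH_DEGRADED,
    HEALTH_REBUILDING,
    HEALTH_BROKEN
};

/* Map a status word to a health state by matching exact flag
 * combinations, the same way the program's switch statement does. */
static enum raid_health classify_status(int status)
{
    switch (status) {
    case ST_READY:
        return HEALTH_OK;
    case ST_READY | ST_DEGRADED:
        return HEALTH_DEGRADED;
    case ST_READY | ST_DEGRADED | ST_REBUILDING:
        return HEALTH_REBUILDING;
    default:
        return HEALTH_BROKEN;
    }
}

/* Anything other than a fully healthy array should trigger the tune
 * and the email, including an array that is still rebuilding. */
static int needs_alert(enum raid_health health)
{
    return health != HEALTH_OK;
}

/* Human-readable label for a health state. */
static const char *health_label(enum raid_health health)
{
    static const char *const labels[] = {
        "READY", "DEGRADED", "REBUILDING", "BROKEN"
    };

    assert((size_t)health < sizeof labels / sizeof labels[0]);
    return labels[health];
}
```

Because a rebuilding array still counts as needing an alert, a check run from cron keeps nagging until the rebuild has finished.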

Installation

To install and use this program simply:

  1. Grab the source code (drop me an email as a thank-you if you can 🙂)
  2. Modify the RAID_ARRAYS line to indicate which arrays you want monitored. E.g. {"ar0", "ar1", "ar2", NULL} would monitor 3 arrays
  3. Build the executable using: gcc -o raidstatus yourfilename.c
  4. Run the application to test it: ./raidstatus
  5. If you want sound you’ll need to install the speaker kernel module. Putting speaker_load="YES" in /boot/loader.conf will make sure it loads at boot time.
  6. Finally, set a cron job to have it monitor your system as often as you want. With its output sent to /dev/null it stays quiet while everything is healthy, so you won’t get an email from cron. However, if something fails you’ll start hearing the tune and receiving email. The following crontab checks the raid status every 10 minutes.
*/10 * * * *    /root/bin/raidstatus > /dev/null
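Step 2 relies on the list being NULL-terminated and on each name having the form arN. A minimal sketch of that parsing follows, assuming you want to sanity-check your list before compiling it in. parse_array_unit and count_valid_arrays are hypothetical helpers, not part of the program, and the check is a little stricter than the program's own (it also rejects trailing characters).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: extract N from a name of the form "arN".
 * Returns the unit number, or -1 if the name is not a valid array name. */
static int parse_array_unit(const char *name)
{
    int unit;
    char trailing;

    assert(name != NULL);

    /* The extra %c picks up trailing junk such as "ar0x": a valid name
     * converts exactly one item, the unit number. */
    if (sscanf(name, "ar%d%c", &unit, &trailing) != 1 || unit < 0)
        return -1;
    return unit;
}

/* Count the valid entries in a NULL-terminated list such as RAID_ARRAYS. */
static int count_valid_arrays(const char *const *names)
{
    int valid = 0;

    for (; *names != NULL; names++) {
        if (parse_array_unit(*names) >= 0)
            valid++;
    }
    return valid;
}
```

Forgetting the trailing NULL would make a loop like this run off the end of the array, which is why the example in step 2 ends with it.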

Code

This script is designed to run on FreeBSD 6.x.

It should work on 5.2+ but won’t work on FreeBSD 4.x or below due to changes in the ata subsystem.

It can monitor any ata-based RAID, including RAID0, RAID1, RAID3, RAID4 & RAID5 (note that at the time of writing RAID5 operates like a RAID0 under FreeBSD – see: http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/106431 )

Hope the script helps!

–Benjsc 08:18, 8 May 2007 (EIT)

/**
 * A simple little application that queries an ata raid
 * device looking for any errors. It prints the type, subdisks
 * and status of each device to stdout.
 * If a device is degraded, rebuilding or broken it exits with
 * error code 127. Any error creates a return status greater than
 * zero; On failure root is also emailed indicating the failed raid
 * device and a tune is played to /dev/speaker
 *
 * Copyright(C) 2007 Benjamin Close <Benjamin.Close@clearchain.com>
 *
 * Usage: raidstatus
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>     /* write(), close() */
#include <errno.h>
#include <err.h>
#include <sys/types.h>
#include <sys/ioctl.h>  /* ioctl() */
#include <sys/ata.h>
#include <sys/time.h>

#define RAID_ARRAYS {"ar0",NULL}

int main(int argc, char **argv)
{
    int fd, fd2, i;
    char *devices[] = RAID_ARRAYS;
    char **device = devices;
    int status = 0;
    int notified= 0;
    struct ata_ioc_raid_config config;

    if((fd=open("/dev/ata", O_RDWR)) < 0 ){
        /* err() prints the message plus the errno text, then exits */
        err(1,"control device not found");
    }

    while (*device != NULL ){

        if (sscanf(*device, "ar%d", &config.lun) != 1) {
            /* A bad RAID_ARRAYS entry leaves config.lun unset, so stop
             * here rather than pass garbage to the ioctl below */
            errx(1, "Invalid array %s", *device);
        }
        if (ioctl(fd, IOCATARAIDSTATUS, &config) < 0)
          err(1, "ioctl(IOCATARAIDSTATUS)");

        printf("ar%d: ATA ", config.lun);
        switch (config.type) {
            case AR_RAID0:
                printf("RAID0 stripesize=%d", config.interleave);
                break;
            case AR_RAID1:
                printf("RAID1");
                break;
            case AR_RAID01:
                printf("RAID0+1 stripesize=%d", config.interleave);
                break;
            case AR_RAID5:
                printf("RAID5 stripesize=%d", config.interleave);
                break;
            case AR_JBOD:
                printf("JBOD");
                break;
            case AR_SPAN:
                printf("SPAN");
                break;
        }
        printf(" subdisks: ");
        for (i = 0; i < config.total_disks; i++) {
            if (config.disks[i] >= 0)
              printf("ad%d ", config.disks[i]);
            else
              printf("DOWN ");
        }
        printf("status: ");
        switch (config.status) {
            case AR_READY:
                printf("READY\n");
                break;
            case AR_READY | AR_DEGRADED:
                printf("DEGRADED\n");
                status=127;
                break;
            case AR_READY | AR_DEGRADED | AR_REBUILDING:
                printf("REBUILDING %d%% completed\n",
                            config.progress);
                status=127;
                break;
            default:
                printf("BROKEN\n");
                status=127;
        }

        if ( status > 0 && ! notified ){

            char buffer[1024];

            const char *song = "mst200o2ola.l8bc.~a.~>l2d#";

            // Play a tune to the speaker
            if ( (fd2 = open("/dev/speaker",O_RDWR)) < 0 ){
                fprintf(stderr, "Unable to open speaker\n");
            } else {
                // Send the tune to the speaker
                write(fd2, song, strlen(song));
                close(fd2);
            }

            snprintf(buffer, sizeof(buffer), "/sbin/atacontrol status %s | /usr/bin/mail -s Raid_Fault root", *device);

            // Send an email indicating things are broken
            system(buffer);

            notified = 1;
        }

        device++;
    }

    close(fd);
    exit(status);
}
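
If you extend the program to take device names from somewhere other than the compiled-in RAID_ARRAYS list, the string handed to system() deserves a bit more care. Here is a rough sketch of one way to build it, assuming device names only ever contain lowercase letters and digits. build_mail_command is a hypothetical helper, not part of the program above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the shell command that mails the status of
 * one array to root. Returns 0 on success, or -1 if the device name looks
 * unsafe or the buffer is too small to hold the whole command. */
static int build_mail_command(char *buf, size_t size, const char *device)
{
    static const char allowed[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    int n;

    assert(buf != NULL && device != NULL);

    /* Accept only lowercase letters and digits (an assumption about the
     * naming scheme) so nothing in the name is interpreted by the shell. */
    if (*device == '\0' || strspn(device, allowed) != strlen(device))
        return -1;

    n = snprintf(buf, size,
                 "/sbin/atacontrol status %s | /usr/bin/mail -s Raid_Fault root",
                 device);

    /* snprintf returns the length it wanted to write; a value of size or
     * more means the command was truncated and must not be run. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

The caller would only pass buf to system() when the helper returns 0.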