Zabbix Agent Daemon for IDM Driver Monitoring Installation

This document describes how to add IDM driver checking to an existing Zabbix monitoring solution.

It does NOT describe how to deploy the Zabbix server itself, just the agents and the configuration of the items and triggers.

Table of contents

  1. Create the Bash script and store this per server

    1. Assign execution rights for this script to Zabbix daemon

  2. Extend schema in IDM to include Zabbix heartbeats

  • Create the Bash script


Create a file called "check_dxml_drvstate.sh" in /usr/local/bin.
Edit it and paste in the script below:
#!/bin/bash
# ZABBIX script to determine the state of a dirxml driver
# based on the Nagios plugin by:
# copyright (c) Lothar Haeger (lothar.haeger@brummelhook.com)
# Modified by Jan van Zanten (j.vanzanten@carmel.nl)
# v1.0, 2006-04-10, initial release
# v1.1, 2007-05-21, added support for IDM 3.5 and more detailed return messages
# v1.2, 2007-07-31, added support for edir 8.8
# new command line option "-i" to invert return codes of running and
# stopped drivers. This is meant to help monitoring usually inactive
# backup servers associated to a driver set.
# all changes in v1.2 based on enhancements by Rainer Brunold, many thanks!
# v1.3, 2007-12-05, added TAO file size monitoring
# username must now be ldap typed (for TAO file size monitoring)
# take driver startup mode into consideration when driver not running:
# disabled -> STATE_OK,
# manual -> STATE_WARNING
# auto -> STATE_CRITICAL
# added long command line options
# v1.4, 2008-01-22, added heartbeat monitoring, requires a schema extension (aux class), driver
# heartbeat and a special policy on the driver
# new command line option --br to add html line breaks to text output
# text output now shows warning/critical values for TAO file size and
# heartbeat monitoring
# v1.5, 2008-08-26, added -Z parameter to ldapsearchs
# improved TAO filesize determination for various "ls -l" output styles
# v1.6, 2008-09-01, fixed wrong $TAODIR for Edir 8.8x
#
# v1.7, 2015-10-06 Carmel customization
# added status MAINTENANCE for better monitoring.
# triggers when the driver is stopped and startup type is set to disabled.
# v1.8, 2015-10-13 verbose parameter added, stdout redirect to stderr by default
# output changed to number state
# v 1.9 Change temp file and cleanup

VERSION="1.9, 2015-12-09"

PROGNAME=`/bin/basename $0`
PROGPATH=`echo $0 | /bin/sed -e 's,[\\/][^\\/][^\\/]*$,,'`
INVERT=false
INVERTMSG=''

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4
STATE_MAINTENANCE=6

BR=
TREE="data"
HBATTR_LDAPNAME="nagiosLastHeartBeat"

# Redirect stdout, stderr
# exec 2>/dev/null # keep quiet
# exec 3>&1
# exec 1>&2 # redirect stdout to stderr

# setup an exit function.
print_exit() {
echo $1 >&3
exit $1
}


print_help() {
echo """\
Usage: $PROGNAME [-s hostname|ip-address] -u username, -p password -d driver-dn [-i] [-tw warnsize -tc criticalsize [-tree treename]]
Usage: $PROGNAME [-h | --help]
Novell DirXML 1.1 and Identity Manager 2.x/3.x driver state detector plugin for Zabbix
Version $VERSION

-s, --server dirxml/idm server ip or hostname, e.g. 127.0.0.1 or myserver.mydomain.org.
Leave out this option to check drivers running on the same machine as nrpe
-u, --username account used to check driver state, ldap typed syntax, e.g. cn=admin,o=novell
-p, --password password in cleartext (good reason to use a restricted account :-)
-d, --driver driver to check, ldap typed syntax, cn=drv_test,cn=my_driverset,o=system
-i, --invert invert return codes to monitor inactive backup servers in a driverset.
A running driver will return STATE_CRITICAL (2), a stopped one STATE_OK (0)
--tw max TAO file size before STATE_WARNING (1) will be reported
--tc max TAO file size before STATE_CRITICAL (2) will be reported.
If neither --tw nor --tc is set, TAO file size checking will be disabled
--hbw max time in seconds since last heartbeat before STATE_WARNING (1) will be reported
--hbc max time in seconds since last heartbeat before STATE_CRITICAL (2) will be reported.
If neither --hbw nor --hbc is set, heartbeat checking will be disabled
Please note that a schema extension and a special event transform policy on the
driver are necessary to support heartbeat checking
--hbattr ldap name of the attr that stores the last heartbeat timestamp if a non-default
schema extension is used
--tree treename of the driver to be checked. only needed with TAO file size monitoring
on edir 8.8 running multiple instances (development feature - not tested!)
--br add <br> tags to output for better readability in HTML display
-h, --help this help screen
-v, --verbose show the status message instead of just a number

Please report bugs to <j.vanzanten@carmel.nl>."""
}


if [ $# -lt 1 ]; then
print_help
exit $STATE_UNKNOWN
fi

while test -n "$1"; do
case "$1" in
--help)
print_help
exit $STATE_OK
;;
-h)
print_help
exit $STATE_OK
;;
-u)
usernameldap=$2
shift
;;
--username)
usernameldap=$2
shift
;;
-p)
password=$2
shift
;;
--password)
password=$2
shift
;;
-s)
server=$2
shift
;;
--server)
server=$2
shift
;;
-d)
driverdnldap=$2
shift
;;
--driver)
driverdnldap=$2
shift
;;
-i)
INVERT=true
INVERTMSG='(on backup server)'
;;
--invert)
INVERT=true
INVERTMSG='(on backup server)'
;;
--br)
BR="<br>"
;;
--tw)
TAOWARNING=$2
shift
;;
--tc)
TAOCRITICAL=$2
shift
;;
--hbw)
HBWARNING=$2
shift
;;
--hbc)
HBCRITICAL=$2
shift
;;
--hbattr)
HBATTR_LDAPNAME=$2
shift
;;
--tree)
TREE=$2
shift
;;
-v)
VERBOSE=on
;;
--verbose)
VERBOSE=on
;;
*)
echo "Unknown argument: $1"
print_help
exit $STATE_UNKNOWN
;;
esac
shift
done

# if the verbose parameter is given, stdout stays as is. Otherwise stdout is redirected to stderr,
# which is silenced, so only the numeric state written to fd 3 reaches the caller
if [ X$VERBOSE = Xon ] ; then
exec 3>&1
else
exec 2>/dev/null # keep quiet
exec 3>&1
exec 1>&2 # redirect stdout to stderr
fi

# setup an exit function.
print_exit() {
echo $1 >&3
exit $1
}

username=${usernameldap//,/.}
username=${username//cn=/}
username=${username//ou=/}
username=${username//o=/}
username=${username//l=/}
username=${username//s=/}

driverdn=${driverdnldap//,/.}
driverdn=${driverdn//cn=/}
driverdn=${driverdn//ou=/}
driverdn=${driverdn//o=/}
driverdn=${driverdn//l=/}
driverdn=${driverdn//s=/}

#echo $usernameldap
#echo $username
#echo $paddword
#echo $driverdnldap
#echo $driverdn
#echo $server
#echo $driverdn

# location of dxcmd varies by edir version:
# edir 8.7.3.x has dxcmd located in /usr/bin/dxcmd,
# with edir 8.8.x it's in /opt/novell/eDirectory/bin/dxcmd
if [ -x "/usr/bin/dxcmd" ]; then
DXCMD="/usr/bin/dxcmd"
TAODIR="/var/nds/dib"
elif [ -x "/opt/novell/eDirectory/bin/dxcmd" ]; then
DXCMD="/opt/novell/eDirectory/bin/dxcmd"
TAODIR="/var/opt/novell/eDirectory/$TREE/dib"
else
DXCMD="dxcmd"
TAOWARNING=
TAOCRITICAL=
fi
# $$ is the current processid
DXERR="/tmp/$PROGNAME.$$.err"


if [ "$server" == "" ]; then
dxcmd_output=`$DXCMD -user "$username" -password "$password" -getstate "$driverdn" 2>$DXERR`
else
dxcmd_output=`$DXCMD -host "$server" -user "$username" -password "$password" -getstate "$driverdn" 2>$DXERR`
fi

dxml_drvstate=$?
#echo "$dxcmd_output"
#echo "Driver State: $dxml_drvstate"

case $dxml_drvstate in
0) #stopped
if [ "$server" == "" ]; then
dxcmd_output=`$DXCMD -user "$username" -password "$password" -getstartoption "$driverdn" 2>$DXERR`
else
dxcmd_output=`$DXCMD -host "$server" -user "$username" -password "$password" -getstartoption "$driverdn" 2>$DXERR`
fi
dxml_drvstartoption=$?
#echo $dxml_drvstartoption
if [ $dxml_drvstartoption = 0 ]; then
echo "Driver $driverdn is DISABLED. ${INVERTMSG}"
print_exit $STATE_MAINTENANCE
else
echo -n "Driver $driverdn is STOPPED. ${INVERTMSG}"
if [ "${INVERT}" = "true" ]; then
dxml_drvstate=$STATE_OK
elif [ $dxml_drvstartoption = 1 ]; then
dxml_drvstate=$STATE_WARNING
else
dxml_drvstate=$STATE_CRITICAL
fi
fi
;;
1) #starting
echo -n "Driver $driverdn is STARTING... ${INVERTMSG}"
dxml_drvstate=$STATE_OK
;;
2) #running
echo -n "Driver $driverdn is RUNNING. ${INVERTMSG}"
if [ "${INVERT}" = "true" ]; then
dxml_drvstate=$STATE_CRITICAL
else
dxml_drvstate=$STATE_OK
fi
;;
3) #stopping
echo -n "Driver $driverdn is STOPPING... ${INVERTMSG}"
dxml_drvstate=$STATE_WARNING
;;
11) #getting schema
echo -n "Driver $driverdn is GETTING the application SCHEMA... ${INVERTMSG}"
dxml_drvstate=$STATE_OK
;;
96) #access forbidden
echo -n "Driver $driverdn could not be checked because $username is NOT AUTHORIZED to do so. "
dxml_drvstate=$STATE_CRITICAL
;;
167) #does not exist
echo -n "Driver $driverdn DOES NOT EXIST."
dxml_drvstate=$STATE_CRITICAL
;;
255) #generic error
echo -n "Driver $driverdn could not be checked due to an UNKNOWN ERROR (`egrep 'xception' $DXERR`)."
dxml_drvstate=$STATE_CRITICAL
;;
*) #other dxcmd error
echo -n "Driver $driverdn could not be checked due to an UNKNOWN ERROR (Error code $dxml_drvstate)"
dxml_drvstate=$STATE_CRITICAL
;;
esac

if [ "$username" != "$usernameldap" ] && [ "$driverdn" != "$driverdnldap" ]; then

if [ "$TAOWARNING" != "" ] || [ "$TAOCRITICAL" != "" ]; then

#drivercn=`echo $driverdn | cut -d "." -f 1`
ENTRYID=`ldapsearch -x -Z -D "$usernameldap" -w "$password" -b "$driverdnldap" -s base localentryid | grep localEntryID: | cut -d " " -f 2`
TAOSIZE=`ls -l "$TAODIR/$ENTRYID.TAO" | sed -r "s/ +/ /g" | cut -d " " -f 5`
echo -n "$BR Cache file $TAODIR/$ENTRYID.TAO is $TAOSIZE bytes ($TAOWARNING/$TAOCRITICAL)."

dxml_taostate=$STATE_OK

if [ "$TAOWARNING" != "" ] && [ $TAOSIZE -gt $TAOWARNING ]; then
dxml_taostate=$STATE_WARNING
fi
if [ "$TAOCRITICAL" != "" ] && [ $TAOSIZE -gt $TAOCRITICAL ]; then
dxml_taostate=$STATE_CRITICAL
fi

if [ $dxml_taostate -gt $dxml_drvstate ]; then
dxml_drvstate=$dxml_taostate
fi
fi

if [ "$HBWARNING" != "" ] || [ "$HBCRITICAL" != "" ]; then

LHB=`ldapsearch -x -Z -D "$usernameldap" -w "$password" -b "$driverdnldap" -s base "$HBATTR_LDAPNAME" | grep -i "$HBATTR_LDAPNAME:" | cut -d " " -f 2`
LHB="${LHB:0:4}-${LHB:4:2}-${LHB:6:2} ${LHB:8:2}:${LHB:10:2}:${LHB:12:2} UTC"
#echo
#echo "Last Heartbeat: `date +"%Y-%m-%d %X %Z" -d "$LHB"`"
#echo "Now: `date +"%Y-%m-%d %X %Z"`"
#echo "Last Heartbeat: `date +%s -d "$LHB"`"
#echo "Now: `date +%s`"

let DIFF=`date +%s`-`date +%s -d "$LHB"`
echo -n "$BR Last Heartbeat occurred $DIFF seconds ago at `date +"%Y-%m-%d %X %Z" -d "$LHB"` ($HBWARNING/$HBCRITICAL)."

dxml_hbstate=$STATE_OK

if [ "$HBWARNING" != "" ] && [ $DIFF -gt $HBWARNING ]; then
dxml_hbstate=$STATE_WARNING
fi
if [ "$HBCRITICAL" != "" ] && [ $DIFF -gt $HBCRITICAL ]; then
dxml_hbstate=$STATE_CRITICAL
fi

if [ $dxml_hbstate -gt $dxml_drvstate ]; then
dxml_drvstate=$dxml_hbstate
fi
fi
fi
echo
#cleanup
rm -f "$DXERR"
print_exit $dxml_drvstate
#EOF
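
The script takes LDAP-typed names on the command line and converts them to the dot notation that dxcmd expects. As a minimal sketch of that conversion (the function name ldap_to_dot is made up for this example and is not part of the script), the same substitutions can be written with sed:

```shell
#!/bin/sh
# Hypothetical helper, not part of check_dxml_drvstate.sh.
# It mirrors the script's substitutions: commas become dots, then the
# naming prefixes cn=, ou=, o=, l= and s= are stripped.
ldap_to_dot() {
    printf '%s\n' "$1" | sed -e 's/,/./g' \
        -e 's/cn=//g' -e 's/ou=//g' -e 's/o=//g' \
        -e 's/l=//g' -e 's/s=//g'
}

# Prints the dot-notation form of the example driver DN from the help text.
ldap_to_dot "cn=drv_test,cn=my_driverset,o=system"
```

Note that the script only runs the TAO file size and heartbeat checks when both the username and the driver DN were given in LDAP-typed form, because those checks go through ldapsearch.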



 

  • Assign rights for this script to Zabbix daemon.


 
chmod +x /usr/local/bin/check_dxml_drvstate.sh
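
To confirm that the permissions took effect, a quick check along these lines may help. It is only a sketch: the function name can_execute is invented here, and it tests the permissions of whichever user runs it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given path is a regular file
# that the current user is allowed to execute.
can_execute() {
    [ -f "$1" ] && [ -x "$1" ]
}

script=/usr/local/bin/check_dxml_drvstate.sh
if can_execute "$script"; then
    echo "$script is executable"
else
    echo "$script is missing or not executable"
fi
```

For a meaningful result, run it as the account the agent uses (assumed here to be called zabbix), for example through sudo -u zabbix.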



  • Extend schema on IDM to enable Zabbix heartbeats.


 

Save the text below as an LDIF file, then import it with your preferred tool for extending the schema in IDM.
# Record 1
# Syntax TIME
# generated ASN2
dn: cn=schema
changetype: modify
add: attributetypes
attributetypes: (
  nagioslastheartbeat-oid
  NAME 'nagiosLastHeartbeat'
  SYNTAX '1.3.6.1.4.1.1466.115.121.1.24'
  SINGLE-VALUE
  X-NDS_NEVER_SYNC '1'
  X-NDS_NOT_SCHED_SYNC_IMMEDIATE '1'
  )

# Record 2
dn: cn=schema
changetype: modify
add: objectClasses
objectClasses: (
  nagioshelper-oid
  NAME 'nagiosHelper'
  AUXILIARY
  MAY ( 'nagiosLastHeartbeat' )
  X-NDS_NOT_CONTAINER '1'
  )
#EOF
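
The nagiosLastHeartbeat attribute uses the Generalized Time syntax (OID 1.3.6.1.4.1.1466.115.121.1.24), so its values look like 20240131120000Z. The following is a rough sketch of how such a value can be turned into an age in seconds and compared with warning and critical thresholds, similar in spirit to the script's --hbw and --hbc handling. The function names are invented for this example, and date -d assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script. Assumes GNU date.

# Convert a Generalized Time value (YYYYMMDDhhmmssZ) to seconds since the epoch.
hb_to_epoch() {
    ts=$1
    y=$(printf '%s' "$ts" | cut -c1-4)
    mo=$(printf '%s' "$ts" | cut -c5-6)
    d=$(printf '%s' "$ts" | cut -c7-8)
    h=$(printf '%s' "$ts" | cut -c9-10)
    mi=$(printf '%s' "$ts" | cut -c11-12)
    s=$(printf '%s' "$ts" | cut -c13-14)
    date -u -d "$y-$mo-$d $h:$mi:$s UTC" +%s
}

# Map a heartbeat age in seconds to 0 (ok), 1 (warning) or 2 (critical).
hb_state() {
    age=$1
    warn=$2
    crit=$3
    if [ "$age" -gt "$crit" ]; then
        echo 2
    elif [ "$age" -gt "$warn" ]; then
        echo 1
    else
        echo 0
    fi
}

last=$(hb_to_epoch "20000101000000Z")
now=$(hb_to_epoch "20000101000500Z")   # pretend "now" is five minutes later
hb_state $((now - last)) 120 600       # 300 s old: above warning, below critical
```

The thresholds 120 and 600 are arbitrary illustration values; pick ones that fit the heartbeat interval configured on your driver.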



  • Install Zabbix agent daemon on every IDM server.


Each flavor of Linux has its own package, described here:
https://www.zabbix.com/documentation/3.0/manual/installation/install_from_packages

On SLE 11, I used the following:

For SLE 11 SP4 run the following as root:
zypper addrepo http://download.opensuse.org/repositories/server:monitoring/SLE_11_SP4/server:monitoring.repo
zypper refresh
zypper install zabbix-agent

For SLE 11 SP3 run the following as root:
zypper addrepo http://download.opensuse.org/repositories/server:monitoring/SLE_11_SP3/server:monitoring.repo
zypper refresh
zypper install zabbix-agent



 

  • Set rights for Zabbix user on several folders and files


 
mkdir -p /var/run/zabbix
chown zabbix:zabbix /var/run/zabbix



  • Edit Zabbix config (/etc/zabbix/zabbix-agentd.conf)


 

These entries are mandatory:
PidFile=/var/run/zabbix/zabbix-agentd.pid
LogFile=/var/log/zabbix/zabbix-agentd.log
EnableRemoteCommands=1
Server=<your Zabbix server IP or DNS name>
ServerActive=<your Zabbix server IP or DNS name>
Hostname=<This IDM Server Hostname>
Timeout=10

- EnableRemoteCommands=1 is required so that the agent may execute the Bash script (the items use system.run).
- Timeout must be long enough for the check to complete without timing out. 5 might work; 10 is safe to work with.
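
As a small sanity check before starting the daemon, a sketch like the one below can report which of the mandatory settings are absent from a config file. The function name check_agent_conf is hypothetical, and the parameter names are assumed to be spelled as in the Zabbix agent documentation (PidFile, LogFile, and so on); matching is case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper: list the mandatory settings that are missing from
# an agent configuration file. Returns 0 if all are present, 1 otherwise.
check_agent_conf() {
    conf=$1
    missing=0
    for key in PidFile LogFile EnableRemoteCommands Server ServerActive Hostname Timeout; do
        if ! grep -q "^${key}=" "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return $missing
}

# Demonstration against a throwaway file with only two settings.
tmpconf=$(mktemp)
printf 'Server=192.0.2.10\nTimeout=10\n' > "$tmpconf"
check_agent_conf "$tmpconf" || echo "config incomplete"
rm -f "$tmpconf"
```

In practice you would point it at the real file, for example check_agent_conf /etc/zabbix/zabbix-agentd.conf.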



  • Start Daemon.


 

Auto start SLES: chkconfig zabbix-agentd on
Auto start Ubuntu: update-rc.d zabbix-agentd enable

Start Agent SLES: rczabbix-agentd restart
Start Agent Ubuntu: service zabbix-agentd restart



  • Create the IDM templates on Zabbix


Open Zabbix Administrator
Select Configuration->Templates->Create Template.
Name it, for example: Novell Identity Manager




      1. Set the variables for username and password as global macros in Zabbix.
        Open Zabbix Administrator.
        Select Administration -> General.
        Select top right: Macros
        Create two macros:
        {$USER} = <IDM_User> -> Example: ZabbixAgent.Users.Administration.IDM.SCC
        {$PASS} = <IDM_User_Password>

        This user must have rights to read driver states.
        The minimum rights to get the driver state are "read and compare" for the attribute "DirXML-AccessRun" on the driverset.

        img-1

      2. Create value mappings for the IDM driver output.
        Click Administration->General, then select Value Mapping at the top right.
        Name: Novell IDM value mapping
        Add values:
        0 ⇒ ok
        1 ⇒ warning
        2 ⇒ critical
        3 ⇒ unknown
        6 ⇒ maintenance

      3. Create the item for a specific driver.
        Open Zabbix Administrator.
        Select Configuration->Templates->Select the “Novell Identity Manager” template.
        Select Items
        Click “Create Item”
        Name: <Driver Name>
        Type: Zabbix Agent
        Key: system.run[/usr/local/bin/check_dxml_drvstate.sh -u {$USER} -p {$PASS} -d "<Literal Drivername>.Driverset.<FQDN>"]
        Example: system.run[/usr/local/bin/check_dxml_drvstate.sh -u {$USER} -p {$PASS} -d "Entitlement Management.Driverset.IDM.SCC"]
        Type of information: Numeric (unsigned)
        Data type: Decimal
        Update Interval: <seconds> example: 120
        Store Value: As is.
        Show Value: Select “Novell IDM value mapping”
        1st time: New application: IDM
        Then Applications: Select IDM
        Click: Add

        Rinse and repeat for every driver.

        img-2

      4. Create the trigger for that item.
        Open Zabbix Administrator.
        Select Configuration->Templates->Select the “Novell Identity Manager” template.
        Select Triggers, click “Add Trigger”
        Name: Something readable (will be displayed in trigger mails).
        Severity: Whatever suits your needs.
        Expression:
        {Template App IDM:system.run[/usr/local/bin/check_dxml_drvstate.sh -u {$USER} -p {$PASS} -d "<Item name>.Driverset.<FQDN>"].last()}<>0 and {Template App IDM:system.run[/usr/local/bin/check_dxml_drvstate.sh -u {$USER} -p {$PASS} -d "<Item name>.Driverset.<FQDN>"].last()}<>6

        The part before the colon is the name of the template (use the name you gave yours), and the key must match the item key exactly, including the driver name.

        The trigger fires whenever the last value is neither 0 (ok) nor 6 (maintenance). A driver that is stopped and whose startup option is disabled reports 6 and therefore never fires the trigger, which is very handy in IDM environments with multiple servers.

        Alternative: In expression builder:

        img-3

         

        img-4
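
To make the value mapping and the trigger condition concrete, the sketch below reproduces both in shell. It is illustrative only: the function names state_label and trigger_fires are made up, and Zabbix evaluates the real expression itself.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the value mapping and the trigger condition.

# Translate the script's numeric result into the mapped label.
state_label() {
    case "$1" in
        0) echo ok ;;
        1) echo warning ;;
        2) echo critical ;;
        3) echo unknown ;;
        6) echo maintenance ;;
        *) echo "unmapped ($1)" ;;
    esac
}

# Succeed when the trigger expression (<>0 and <>6) would be true.
trigger_fires() {
    [ "$1" -ne 0 ] && [ "$1" -ne 6 ]
}

# Walk through every mapped value and show how the trigger would react.
for value in 0 1 2 3 6; do
    if trigger_fires "$value"; then
        echo "$value ($(state_label "$value")): PROBLEM"
    else
        echo "$value ($(state_label "$value")): no alert"
    fi
done
```

Only the values 1, 2 and 3 lead to a problem state; 0 and 6 stay quiet.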






  • Add hosts to monitor


Follow your own procedures to create hosts; use the agent interface on port 10050.
Add each IDM server to the group “IDM Machines” (used in actions->Mails).
Add the template “Novell Identity Manager” to this server (see 3.) to automatically add all items and triggers. Don’t forget the OS Linux template as well!



  • Create action for sending mails on trigger


Open Zabbix Administrator
Select Configuration->Actions->Create Action
Name: Sending mails on IDM driver Triggers

Leave “Maintenance status not in maintenance” as is.
Add: Host group = IDM Machines
Optionally add Trigger severity >= Information, so that only the more severe triggers are mailed.

Warning: Zabbix 2.4 and earlier: before enabling recovery messages or escalations, make sure to add “Trigger value = PROBLEM” condition to the action, otherwise remedy events can become escalated as well.

Click Operations.

The minimal information in the message screen should be:
Trigger: {TRIGGER.NAME}
Trigger status: {TRIGGER.STATUS}
Trigger severity: {TRIGGER.SEVERITY}
Trigger URL: {TRIGGER.URL}

Item values:

1. {ITEM.NAME1} ({HOST.NAME1})

Original event ID: {EVENT.ID}
Check “Pause operations while in maintenance”

Under Operations, click new, then create the action:
Type: Send Message
Add users/groups who should receive the mails.

I won’t go deeper into this action; as a Zabbix admin you should know enough to work from here.

This concludes this document.

Special thanks:

Guus Snijders: for helping me tweak the bash script to add extra functionality to the original code.

Bas Penris: for being my Jedi for basically anything related to IDM.

Lothar Haeger: for the original Bash script (also available on Cool Solutions!) which got me going.

Comment List
  • When testing the heartbeat, the script fails the very first time it is run, before the driver has set the value. Replacing line 352 of the script (the line that reformats $LHB) with the following catches this case and effectively ignores the heartbeat.
    ##Heartbeat never set. Silent ignore by setting it to now
    if [ "$LHB" = "" ]; then
    LHB=`date +"%Y-%m-%d %X %Z"`
    else
    LHB="${LHB:0:4}-${LHB:4:2}-${LHB:6:2} ${LHB:8:2}:${LHB:10:2}:${LHB:12:2} UTC"
    fi