Monitoring Script
From PDP/Grid Wiki
Revision as of 15:48, 18 November 2009 by Fbernabe@nikhef.nl (talk | contribs)
The following script checks every worker node (WN) for jobs that are no longer running. For now the jobs are not deleted; an alert email is sent instead. For each affected job ID, the script also collects extra information via 'tracejob':
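# Report file, named after the current time and date (HHMMDDMMYY).
# %H is used instead of %k, which pads single-digit hours with a space.
FILE="results/Monitoring_Results_$(date +%H%M%d%m%y)"
y=""
job_id_list_of_list=()
i=0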
echo "===============================================================================" >> "$FILE"
echo "=========================== AFFECTED WN DIAGNOSIS =============================" >> "$FILE"
echo "===============================================================================" >> "$FILE"
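# wn_list must be defined before this point; its definition is not shown on this page.
for wn in "${wn_list[@]}"
do
    # IDs of the jobs on this WN that momctl does not report as RUNNING
    y="$(momctl -d0 -h "$wn" | grep "sidlist" | grep -v "RUNNING" | sed -e 's/job\[//g' | cut -f1 -d'.')"
    len=${#y}
    if [ "$len" -ne 0 ]
    then
        momctl -d2 -h "$wn" >> "$FILE"
        echo "_______________________________________________________________________________" >> "$FILE"
        job_id_list_of_list[i]="${y}"
        i=$((i+1))
    fi
done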
echo "===============================================================================" >> "$FILE"
echo "=========================== JOBS ELIGIBLE TO BE DELETED =======================" >> "$FILE"
echo "===============================================================================" >> "$FILE"

# Each array element holds the job IDs found on one WN, separated by newlines.
for job_id_list in "${job_id_list_of_list[@]}"
do
    # Left unquoted on purpose so that the list is split into single job IDs
    for job_id in $job_id_list
    do
        echo "$job_id"
        tracejob -n 8 -q "$job_id" >> "$FILE"
        echo "_______________________________________________________________________________" >> "$FILE"
    done
done
mail -s "Monitoring Results" fbernabe@nikhef.nl < "$FILE"
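The job ID filter used in the script can be tried without access to a WN. The following is a minimal sketch, assuming that 'momctl -d0' prints one line per job of roughly the form shown in the sample text. The function name extract_stale_job_ids, the sample lines, and the host name server.example.org are hypothetical and only serve as illustration; they are not captured output from a real node.

```shell
#!/bin/bash
# Sketch only: the sample text below is a made-up approximation of
# 'momctl -d0' job lines, not output captured from a real node.

# Read momctl-style text on stdin and print the numeric ID of every job
# whose line does not contain "RUNNING" (same filter as the script above).
extract_stale_job_ids() {
    grep "sidlist" | grep -v "RUNNING" | sed -e 's/job\[//g' | cut -f1 -d'.'
}

# Hypothetical sample input (assumed format).
sample_output='job[101.server.example.org]  state=RUNNING  sidlist=4242
job[102.server.example.org]  state=EXITING  sidlist=4250
job[103.server.example.org]  state=PRERUN  sidlist='

# Collect the IDs of the jobs that are not reported as RUNNING.
stale_ids=$(printf '%s\n' "$sample_output" | extract_stale_job_ids)
echo "$stale_ids"
```

Note that the filter flags every job whose line lacks the word RUNNING, so a job that is legitimately starting up or exiting at the moment of the check would be reported as well. This may be one reason to keep the script in alert-only mode rather than deleting jobs automatically.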