Difference between revisions of "Monitoring Script"

From PDP/Grid Wiki

Revision as of 15:48, 18 November 2009

The following script checks every worker node (WN) for jobs that are no longer running. For now the jobs are not deleted; an email is sent as an alert instead. For each job ID it finds, the script appends extra information obtained via 'tracejob':

 # Report file; %H (zero-padded hour) avoids the blank that %k can put into the name.
 FILE="results/Monitoring_Results_$(date +%H%M%d%m%y)"
 y=""
 job_id_list_of_list=()   # one entry per affected WN, holding its non-running job ids
 i=0
 echo "===============================================================================" >> "$FILE"
 echo "=========================== AFFECTED WN DIAGNOSIS =============================" >> "$FILE"
 echo "===============================================================================" >> "$FILE"
 # NOTE: wn_list is not defined in this excerpt; it is assumed to be set beforehand.
 for wn in "${wn_list[@]}"
 do
       # Ids of the jobs on this WN whose momctl line does not say RUNNING
       y="$(momctl -d0 -h "$wn" | grep "sidlist" | grep -v "RUNNING" | sed -e 's/job\[//g' | cut -f1 -d'.')"
       if [ -n "$y" ]
       then
               momctl -d2 -h "$wn" >> "$FILE"
               echo "_______________________________________________________________________________" >> "$FILE"
               job_id_list_of_list[i]="${y}"
               i=$((i+1))
       fi
 done
 echo "===============================================================================" >> "$FILE"
 echo "=========================== JOBS ELIGIBLE TO BE DELETED =======================" >> "$FILE"
 echo "===============================================================================" >> "$FILE"
 for job_id_list in "${job_id_list_of_list[@]}"
 do
       # An entry may hold several job ids separated by whitespace, so split it here
       for job_id in $job_id_list
       do
               echo "$job_id"
               tracejob -n 8 -q "$job_id" >> "$FILE"
               echo "_______________________________________________________________________________" >> "$FILE"
       done
 done
 mail -s "Monitoring Results" fbernabe@nikhef.nl < "$FILE"
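
The filtering and the nested iteration over per-WN job lists can be tried without a batch system. The following is a minimal sketch, not part of the original script: the function names extract_stale_job_ids and collect_ids, the array name lists, and the sample lines are all hypothetical. The sample lines are made up only to match the shape that the grep/sed/cut pipeline expects; real momctl output may look different.

```shell
#!/bin/bash
# Hypothetical helper: the same filter the script applies to "momctl -d0" output.
# Keeps lines mentioning "sidlist" that are not RUNNING and reduces each one
# to the text between "job[" and the first dot.
extract_stale_job_ids() {
    grep "sidlist" | grep -v "RUNNING" | sed -e 's/job\[//g' | cut -f1 -d'.'
}

# Made-up input (an assumption, not captured from a real worker node).
sample_output='job[1001.server.example]  state=RUNNING  sidlist=4242
job[1002.server.example]  state=EXITING  sidlist=4243
job[1003.server.example]  state=OBIT  sidlist=4244'

ids="$(printf '%s\n' "$sample_output" | extract_stale_job_ids)"

# One array entry per worker node; an entry may hold several ids.
lists=()
lists[0]="$ids"
lists[1]="2001"

# Quote the array expansion so each entry stays intact, then let the
# unquoted inner expansion split the entry into individual ids.
collect_ids() {
    local list id
    for list in "${lists[@]}"; do
        for id in $list; do
            echo "$id"
        done
    done
}

collect_ids
```

With the sample input above, collect_ids prints 1002, 1003 and 2001, one per line. Bash arrays are one-dimensional, so storing a whitespace-separated list in each entry and splitting it in the inner loop is one simple way to emulate a list of lists.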