Cvmfs errors/warnings

From PDP/Grid Wiki

Revision as of 13:50, 11 September 2012

Below is an overview of cvmfs errors and warnings as reported by the Nagios check.

Note: this list is not yet complete!


SERVICE STATUS: 33 I/O errors detected: repository revision 1195

This warning means that I/O errors have occurred. Nothing needs to be fixed. To clear the warning, log in on the node and run the following command:

cvmfs-talk -i <repository> reset error counters

The event handler for Nagios will try to clear the error counters a few times using this command. Therefore, this warning will usually disappear after some time. If the warning persists for more than 2 hours, a manual reset of the counters may be needed.
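For a manual reset on a node that mounts several repositories, the command above can be wrapped in a loop. The Bash function below is a minimal sketch, not an official tool. The function name `reset_cvmfs_errors` and the `CVMFS_TALK` override variable are hypothetical, and the sketch assumes that `cvmfs-talk` is on the `PATH` and that the repository names are given as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset the CVMFS error counters for every repository
# given on the command line, using "cvmfs-talk -i <repository> reset error counters".
# CVMFS_TALK is an assumed override (defaults to the real cvmfs-talk binary),
# so the loop can be tried without a CVMFS client installed.
reset_cvmfs_errors() {
    local talk="${CVMFS_TALK:-cvmfs-talk}"
    local repo status=0
    if [ "$#" -eq 0 ]; then
        echo "usage: reset_cvmfs_errors <repository>..." >&2
        return 2
    fi
    for repo in "$@"; do
        if "$talk" -i "$repo" reset error counters; then
            echo "reset OK: $repo"
        else
            # Keep going, but remember that at least one reset failed.
            echo "reset FAILED: $repo" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage on a node, for example: `reset_cvmfs_errors atlas.cern.ch lhcb.cern.ch`. The function returns non-zero if any reset failed, so it can also be called from a script.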

SERVICE STATUS: offline (<repository> via <squid server>): repository revision <rev>

The full warning may comprise multiple repositories and/or Squid servers:

SERVICE STATUS: offline (http://cernvmfs.gridpp.rl.ac.uk/opt/atlas via http://pachter.nikhef.nl:3128): offline (http://cernvmfs.gridpp.rl.ac.uk/opt/atlas via http://zonnewijzer.nikhef.nl:3128): offline (http://cernvmfs.gridpp.rl.ac.uk/opt/atlas via http://karnton.nikhef.nl:3128): repository revision 1197

This warning means that the client could not connect to the repository via the listed Squid servers. If the remote repository is the same for all local Squid servers, the problem is most likely the connection to the remote site, particularly if the warning occurs on many clients. The message above refers to the remote repository at RAL, reached via three local Squid servers, which points to a problem at the remote end or on the network path to it.

However, if connections to more than one remote repository are failing (RAL, CERN and/or BNL), the problem is likely to be in the local Squid servers. In this situation, a restart of the Squid servers may be required.
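The rule of thumb above can be sketched as a small Bash function that reads the full warning text, counts the distinct remote repository hosts and local Squid servers, and prints a first guess. This is a hypothetical helper (the name `diagnose_cvmfs_offline` is illustrative), it assumes the message format shown in the example above, and it treats a different host name as a different remote site (RAL, CERN, BNL). It only suggests where to look; it does not replace checking the Squid servers themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an "offline" warning from the cvmfs Nagios check.
#   several remote repository hosts failing        -> local Squid servers suspect
#   one remote host failing via several Squids     -> remote site / network problem
diagnose_cvmfs_offline() {
    local msg="$1" pairs repos squids
    # Each failing path looks like: offline (<repository> via <squid server>)
    pairs="$(printf '%s\n' "$msg" | grep -o 'offline ([^)]* via [^)]*)' || true)"
    if [ -z "$pairs" ]; then
        echo "no offline entries found"
        return 1
    fi
    # Distinct remote hosts: take the repository URL, then keep only its host part.
    repos="$(printf '%s\n' "$pairs" \
        | sed -E 's|^offline \(([^ ]*) via .*$|\1|' \
        | sed -E 's|^[a-zA-Z]+://([^/]*).*$|\1|' \
        | sort -u | grep -c .)"
    # Distinct local Squid servers: the part after " via " up to the closing ")".
    squids="$(printf '%s\n' "$pairs" \
        | sed -E 's|^.* via ([^)]*)\)$|\1|' \
        | sort -u | grep -c .)"
    if [ "$repos" -gt 1 ]; then
        echo "local squid suspect: $repos remote repository hosts unreachable"
    elif [ "$squids" -gt 1 ]; then
        echo "remote problem likely: 1 repository host unreachable via $squids squids"
    else
        echo "inconclusive: 1 repository host via 1 squid"
    fi
}
```

Fed the example warning above (RAL via three Nikhef Squid servers), the function reports a likely remote problem, matching the reasoning in the text.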