Estimated response time

From PDP/Grid Wiki

The estimated response time gives a user an idea of how long she would have to wait before her jobs start running if she submitted them now. By definition, the response time is the time between a job's submission to the cluster and the moment it starts executing on a worker node. Note that it does not account for any delay incurred while the job waits for matchmaking at a resource broker.

The current estimated response time algorithm is based on observed wait times in the cluster. These are usually fairly good estimators: during periods of stable workload they give excellent results. The results can be substantially wrong, however, during periods when the workload is changing rapidly. For example, if your VO has no waiting jobs on the cluster and there are free CPUs, the estimated response time is zero. Suppose this VO now submits 600 jobs of 17 hours' duration to the cluster, which has 125 cores. Clearly the last few of these jobs will have to wait on the order of days before they run, yet the system will report a low number, since none of the jobs has been waiting very long (yet).
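
To put a number on "on the order of days", the example can be worked through with a small simulation. This is only a rough sketch under simplifying assumptions that are not stated on this page: all 125 cores are free at submission time and serve only this VO, jobs are started in first-in-first-out order, each job uses one core, and each job runs for exactly 17 hours. The function name `start_times` is made up for this illustration and is not part of any scheduler.

```python
import heapq


def start_times(n_jobs, duration_h, n_cores):
    """Return each job's start time in hours after submission.

    Assumptions (illustrative only): FIFO order, one core per job,
    all cores free at time 0, every job runs exactly duration_h hours.
    """
    # Min-heap holding the time at which each core next becomes free.
    free_at = [0.0] * n_cores
    heapq.heapify(free_at)
    starts = []
    for _ in range(n_jobs):
        t = heapq.heappop(free_at)   # earliest available core
        starts.append(t)             # the job starts as soon as that core is free
        heapq.heappush(free_at, t + duration_h)
    return starts


starts = start_times(n_jobs=600, duration_h=17.0, n_cores=125)
last_wait_h = starts[-1]
print(f"last job waits {last_wait_h:.0f} h ({last_wait_h / 24:.1f} days)")
# prints: last job waits 68 h (2.8 days)
```

Under these assumptions the jobs run in five waves (600 / 125 = 4.8), so the last 100 jobs start only after four full waves of 17 hours, that is, after 68 hours. Immediately after submission, an estimate based on wait times observed so far would still be close to zero.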