Fair share

From PDP/Grid Wiki

A share is a fraction of the available computing time on the cluster. Suppose there are 125 "job slots" on the cluster (meaning up to 125 jobs can run concurrently); if your virtual organization (VO) has a 6% share, then your VO has been allocated the equivalent of 7.5 job slots. It is important to remember that the main factor in the fair share is time used, not the number of jobs. Suppose that on a given day your VO runs 270 jobs, each lasting 17 minutes. This is a total of 17 * 60 * 270 = 275,400 processor-seconds of computing. Per day the cluster can deliver 125 * 86,400 = 10.8 million processor-seconds of computing, hence on this day your VO has used a share of 2.55% of the cluster. Your VO could therefore have used more cycles that day, since 2.55% is less than the 6% share allocated, even though 270 jobs is far above the 7.5 "allocated" job slots and even more than the 125 "available" job slots. The total time is what matters.
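The arithmetic of this example can be reproduced with a short Python sketch. The function name and its parameters are made up for illustration and are not part of the scheduler; the numbers are the ones from the paragraph above.

```python
SECONDS_PER_DAY = 86_400

def daily_share_used(jobs, minutes_per_job, slots):
    """Fraction of one day's cluster capacity consumed by a set of equal-length jobs.

    Hypothetical helper for illustration only; not a scheduler API.
    """
    used = jobs * minutes_per_job * 60      # processor-seconds consumed
    capacity = slots * SECONDS_PER_DAY      # processor-seconds the cluster offers per day
    return used / capacity

allocated_slots = 0.06 * 125                # 6% share of 125 job slots
fraction = daily_share_used(jobs=270, minutes_per_job=17, slots=125)

print(f"allocated slots: {allocated_slots}")  # 7.5
print(f"share used: {fraction:.2%}")          # 2.55%
```

Changing the job length rather than the job count shows why time is what counts: the same 270 jobs at 40 minutes each would come to exactly 6% of the day's capacity.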

The share used is computed over a period of 24 days, with a slight exponential damping (a factor of 0.99 per day) of the influence of older usage. At any given time, the system can report the "share used" by a given group. This number is the total processor time used by the group over the last 24 days, weighted by the exponential damping, divided by the total time used by ALL groups on the cluster over the same period, with the same damping envelope.
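A minimal sketch of this calculation in Python follows. It is an illustration of the description above, not the scheduler's actual code. It assumes that the most recent day has weight 1 and that a day k days back has weight 0.99^k; the exact indexing the scheduler uses is not stated here. All names are hypothetical.

```python
DECAY = 0.99   # damping factor per day (from the text above)
DEPTH = 24     # number of one-day windows considered

def damped_total(daily_usage):
    """Sum processor-seconds over the last DEPTH days with exponential damping.

    daily_usage[0] is the most recent day, daily_usage[1] the day before, etc.
    Assumption: weight of a day that is `age` days back is DECAY ** age.
    Entries older than DEPTH days are ignored.
    """
    return sum(u * DECAY ** age for age, u in enumerate(daily_usage[:DEPTH]))

def share_used(group_daily, all_groups_daily):
    """Damped usage of one group divided by the damped usage of all groups."""
    total = damped_total(all_groups_daily)
    return damped_total(group_daily) / total if total else 0.0

# A group that steadily uses 6% of what all groups use each day
# is reported with a share of 6%, regardless of the damping.
group = [6_000.0] * 30
everyone = [100_000.0] * 30
steady = share_used(group, everyone)
print(f"{steady:.2%}")
```

Note that the denominator is the time actually used by all groups, not the theoretical capacity of the cluster. Under the indexing assumed here, the oldest day in the window still carries a weight of roughly 0.79, so the damping is indeed slight.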

The number "24" comes from our scheduler, which computes the total share over 24 system-specified units of time. We chose the "day" as the specified unit of time as we have to report usage per month; we don't care about large fluctuations in usage per day or per week, but we'd like the monthly usage to come out close to the amounts assigned.