- Compute per project
- Storage per project
- Backup and Restore
- Availability, Maintenance, and Unplanned Disruptions
- Monitoring and Arbitration
1. Compute per project
- Access to up to 200 job slots/CPU cores at a time, subject to availability. This can be increased by arrangement.
- Ability to run jobs with RAM allocation up to 200GB
2. Storage per project
- 50GB of resilient, backed-up personal home space.
- In addition, users are allocated a quota of 1TB scratch space for working storage.
- Additional Vault storage may be available by negotiation. Vault facilitates longer term storage on Maxwell. It is useful where projects need to repeatedly use the same data over time.
3. Backup and Restore
|Storage|HPC Backup and Restore Policy|
|Home Space|Data is backed up as follows:|
|Shared scratch|Not backed up|
|Vault|Not backed up|
The cluster runs Slurm workload manager to automatically allocate jobs submitted by users onto available compute nodes. Projects that supply grant or other funds to support Maxwell usage are given priority on the scheduler.
- The scheduler balances the availability of slots among all users to permit fair access to the system.
- It considers the specific requirements of each job (e.g., number of CPUs, amount of RAM, job duration, and node affinity requirements) and prioritisation.
- The scheduler starts queued jobs as space becomes available and can be set to advise users of job status by email.
- Larger jobs requiring more time and resource are more difficult to schedule, so it is to the advantage of all users to make sure the requested resource is as accurate as possible.
- Smaller jobs will be scheduled to run in (backfill) available space and may therefore start and complete earlier than larger jobs.
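The backfilling behaviour described above can be illustrated with a toy model. This is a simplified sketch, not Slurm's actual algorithm: the `Job` class, the `backfill` function, and all the numbers are hypothetical, and it assumes that running jobs use their full requested time and that only the highest-priority blocked job is protected from delay.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    hours: float  # requested run time


def backfill(free_cores, running, queue):
    """Return the names of queued jobs that can start now.

    running: list of (cores, hours_remaining) for jobs already on the cluster.
    queue:   list of Job in priority order, highest priority first.
    """
    started = []
    running = list(running)
    waiting = list(queue)

    # Start jobs strictly in priority order for as long as they fit.
    while waiting and waiting[0].cores <= free_cores:
        job = waiting.pop(0)
        free_cores -= job.cores
        running.append((job.cores, job.hours))
        started.append(job.name)
    if not waiting:
        return started

    # The head job is blocked. Work out how long until enough cores are free
    # for it, assuming running jobs use their full requested time.
    head = waiting[0]
    available = free_cores
    wait_hours = None
    for cores, remaining in sorted(running, key=lambda r: r[1]):
        available += cores
        if available >= head.cores:
            wait_hours = remaining
            break
    if wait_hours is None:
        return started  # the head job can never fit in this toy cluster

    # Backfill: a lower-priority job may start now only if it fits in the
    # idle cores and will finish before the head job is due to start.
    for job in waiting[1:]:
        if job.cores <= free_cores and job.hours <= wait_hours:
            free_cores -= job.cores
            started.append(job.name)
    return started


queue = [Job("large", 32, 24), Job("small", 4, 2), Job("small_long", 4, 48)]
print(backfill(free_cores=8, running=[(32, 10)], queue=queue))  # ['small']
```

In this example the 2-hour job slips into the idle cores, while the otherwise identical job that requested 48 hours does not, which is one reason accurate time requests benefit the requester as well as other users.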
An awareness of the following will help users provide accurate information when scheduling a job:
- When insufficient memory is requested for a job, the job will not run, and will need to be rescheduled with more memory requested.
- Where more memory is requested than is used, the full requested amount is still attributed to the user, because it is reserved for the job and cannot be used elsewhere.
- The default runtime of any job is 24 hours. If more time is needed, this must be explicitly requested. Less time can also be requested.
- When insufficient time is allocated to a job, the job will be stopped when the allocated time has elapsed and will need to be rescheduled with more time requested.
- Where the actual time used is less than the time requested, only the actual time will be attributed to a user’s account.
- Interactive jobs can run only when the requested resources (e.g. CPUs and memory) are immediately available on Maxwell.
- Once scheduled, data on the job are available from Maxwell using the ‘squeue’ command. This advises users of the status of a running job or the priority of a queued job.
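As an illustration of the points above, the following sketch builds a Slurm batch script with explicit CPU, memory, and time requests. The `#SBATCH` options used (`--cpus-per-task`, `--mem`, `--time`, `--mail-type`, `--mail-user`) are standard Slurm options, but the helper name `sbatch_script`, its default values, and the example command are hypothetical. The limits are taken from the figures quoted in this document and may not match the limits actually enforced on Maxwell.

```python
MAX_CORES = 200     # job slots/CPU cores quoted in this document
MAX_MEM_GB = 200    # maximum RAM allocation quoted in this document
DEFAULT_HOURS = 24  # default runtime quoted in this document


def sbatch_script(command, cores=1, mem_gb=4, hours=DEFAULT_HOURS, email=None):
    """Return the text of a batch script with explicit resource requests."""
    if not 1 <= cores <= MAX_CORES:
        raise ValueError(f"cores must be between 1 and {MAX_CORES}")
    if not 1 <= mem_gb <= MAX_MEM_GB:
        raise ValueError(f"mem_gb must be between 1 and {MAX_MEM_GB}")
    if hours <= 0:
        raise ValueError("hours must be positive")

    # Slurm accepts time limits in hours:minutes:seconds form.
    h, m = divmod(round(hours * 60), 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={h}:{m:02d}:00",
    ]
    if email:
        # Ask the scheduler to email when the job ends or fails.
        lines.append("#SBATCH --mail-type=END,FAIL")
        lines.append(f"#SBATCH --mail-user={email}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical job: 8 cores, 32 GB of memory, 36 hours (more than the default).
print(sbatch_script("python analysis.py", cores=8, mem_gb=32, hours=36))
```

The resulting text could be saved to a file and submitted with `sbatch`, after which `squeue -u $USER` lists the user's own jobs.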
5. Availability, Maintenance, and Unplanned Disruptions
- Maxwell is designed to ensure maximum availability and continued run time even if some cores or nodes stop functioning correctly. When this happens, these issues will be resolved, as far as possible, without any additional disruption to the cluster.
- Planned maintenance will be communicated in advance to all users and will be scheduled to cause minimum disruption.
- Every effort will be made to ensure there are no unplanned disruptions to the service. Where events, either internal or external to the HPC, do cause disruption, every effort will be made by Digital Research to restore service as quickly as possible. This may involve work with our suppliers.
6. Monitoring and Arbitration
- The Digital Research Services Team is responsible for monitoring use of the system and should be contacted via firstname.lastname@example.org to resolve any perceived scheduling or prioritisation issues.
- The HPC uses a queuing algorithm which prioritises funded projects.
Costs must be included in Worktribe grant applications:
- £100 minimum per funded project (HPC account, set-up support, and up to 1000 CPU core hours)
- 10p per core hour for compute
- plus 10p per core hour per GPU (where GPU required)
- £400 per day for additional support (e.g. installation and troubleshooting of bespoke applications)
- Additional storage by negotiation.
- For a quote, please contact email@example.com
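For a rough figure to include in a grant application, the rates above can be combined as in the sketch below. This is one reading of the price list, not an official calculator: it assumes the £100 minimum acts as a floor on compute plus GPU charges, that the GPU surcharge applies to every core hour for each GPU requested, and that support days are charged on top. The function names are hypothetical, and a quote from the team should be treated as authoritative.

```python
# Rates from the price list above, held in pence to avoid rounding errors.
MIN_CHARGE_PENCE = 100_00    # £100 minimum per funded project
CORE_HOUR_PENCE = 10         # 10p per core hour
GPU_CORE_HOUR_PENCE = 10     # plus 10p per core hour per GPU
SUPPORT_DAY_PENCE = 400_00   # £400 per day of additional support


def estimate_cost_pence(core_hours, gpus=0, support_days=0):
    """Estimate a project's cost in pence under the assumptions stated above."""
    compute = core_hours * CORE_HOUR_PENCE
    gpu = core_hours * gpus * GPU_CORE_HOUR_PENCE
    usage = max(MIN_CHARGE_PENCE, compute + gpu)  # assumed: minimum is a floor
    return usage + support_days * SUPPORT_DAY_PENCE


def as_pounds(pence):
    """Format a pence amount as pounds."""
    return f"£{pence / 100:,.2f}"


print(as_pounds(estimate_cost_pence(500)))                           # £100.00
print(as_pounds(estimate_cost_pence(5000)))                          # £500.00
print(as_pounds(estimate_cost_pence(2000, gpus=1, support_days=1)))  # £800.00
```

Additional storage is priced by negotiation and is therefore left out of the estimate.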
Limited free use (up to 1000 core hours) for:
- Small pilot projects
- Unfunded PGR projects
- Unfunded UG/MSc projects
Test the HPC
Free of Charge (up to 500 core hours).
8. Support and Documentation
- See our HPC web site for further information
- Find out more on the University's award-winning Toolkit.
- See our helpful short training videos on using HPC
- See HPC documentation here.
For additional support, please contact firstname.lastname@example.org