I’ve recently started evaluating Flight in our organisation, and have run into an issue where we are unable to fully utilize all available cores on our nodes.
Our test cluster runs four nodes, each with eight cores. I can submit multiple jobs and they run simultaneously on the nodes, unless doing so would require cores 6-8. If I submit a new job that requests 1-5 cores, but running it would mean using cores 6-8 on a node because other jobs are already running there, the new job sits in the queue until existing jobs complete, even when enough cores are idle to run it immediately.
If I submit a job that requests between 6 and 8 cores, it sits in the queue indefinitely with the “no suitable queue” message.
We’re using release 2016.2r4, and I set the number of cores for each job with a “#$ -pe smp-verbose X” line in a bash script. Autoscaling is enabled (although I get the same results with it disabled and all nodes running). qstat -f shows a total of eight cores available for each queue.
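For reference, a stripped-down job script of the kind described might look like this. The value 4 stands in for X, and the echo line is a placeholder for the real workload. The script relies on Grid Engine setting NSLOTS to the number of slots actually granted, and it falls back to 1 when run outside the scheduler.

```bash
#!/bin/bash
# Request 4 slots (cores) from the smp-verbose parallel environment.
# 4 is an illustrative value standing in for X.
#$ -pe smp-verbose 4
# Run the job from the directory it was submitted from.
#$ -cwd

# Grid Engine exports NSLOTS with the number of granted slots;
# default to 1 if the script is run directly, outside the scheduler.
slots="${NSLOTS:-1}"
echo "Running on $(hostname) with ${slots} slot(s)"
```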
Any ideas how we can fully utilize all cores on our nodes?