SGE queue problem on new cluster


#1

Hi,

When submitting jobs to SGE with a script that previously worked without problems, I get the “no suitable queues” error on a cluster I started today (c4.8xlarge on AWS), even though there should be enough resources. I then submitted the simple example script as a test and got the same error. qstat -j gives strange, very long output, which I am posting below. Any help would be much appreciated.

Thank you very much,
JJ

output:

alces template copy 4 ./template4.sh
alces template copy: template 'simple' copied to './template4.sh'
[alces@login1(services1) src]$ qsub ./template4.sh 
Unable to run job: warning: alces your job is not allowed to run in any queue
warning: no suitable queues
Your job 5 ("template4.sh") has been submitted.
Exiting.

qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
      5 5.90234 template4. alces        qw    04/12/2017 16:07:09                                    1    

qstat -j 5
==============================================================
job_number:                 5  
exec_file:                  job_scripts/5
submission_time:            Wed Apr 12 16:07:09 2017
owner:                      alces
uid:                        1000
group:                      alces
gid:                        1000
sge_o_home:                 /home/alces
sge_o_log_name:             alces
sge_o_path:                 /opt/clusterware/opt/genders/bin:/opt/clusterware/opt/pdsh/bin:/opt/clusterware/opt/s3cmd:/opt/clusterware/opt/gridscheduler/bin/linux-x64:/opt/clusterware/opt/aws/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/alces/.local/bin:/home/alces/bin
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/alces/RRBS/002-20160205-027/src
sge_o_host:                 login1
account:                    sge
cwd:                        /home/alces/RRBS/002-20160205-027/src
merge:                      y  
hard resource_list:         h_rt=259200,h_vmem=1G
mail_list:                  alces@login1.services1.prv.alces.network
notify:                     FALSE
job_name:                   template4.sh
stdout_path_list:           NONE:NONE:$HOME/$JOB_NAME.$JOB_ID.output
priority:                   -100
jobshare:                   0  
env_list:                   LC_PAPER=de_BE.UTF-8,MANPATH=/opt/clusterware/opt/genders/share/man:/opt/clusterware/opt/pdsh/man:/opt/clusterware/opt/gridscheduler/man:/usr/share/man,LC_ADDRESS=de_B
E.UTF-8,XDG_SESSION_ID=3,LC_MONETARY=de_BE.UTF-8,HOSTNAME=login1,SHELL=/bin/bash,TERM=screen,HISTSIZE=1000,SSH_CLIENT=212.166.47.73 56906 22,SGE_CELL=etc,LC_NUMERIC=de_BE.UTF-8,OLDPWD=/home/alces
/RRBS,SSH_TTY=/dev/pts/0,USER=alces,LD_LIBRARY_PATH=/opt/clusterware/opt/gridscheduler/lib/linux-x64:/opt/clusterware/opt/ruby/lib:,LC_TELEPHONE=de_BE.UTF-8,LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00
:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz
=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.
xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=
01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;
35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;
35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.
axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:,AWSDIR=/opt/clusterware/opt/aws,PDSHBIN=/opt/clusterware/opt/pdsh/bin,TERMCAP=SC|screen|VT 100/ANSI X3.64 virtual terminal:\
        :DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
        :cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
        :do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
        :le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
        :li#52:co#195:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
        :cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
        :im=\E[4h:ei=\E[4l:mi:IC=\E[%d@:ks=\E[?1h\E=:\
        :ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
        :ti=\E[?1049h:te=\E[?1049l:us=\E[4m:ue=\E[24m:so=\E[3m:\
        :se=\E[23m:mb=\E[5m:md=\E[1m:mr=\E[7m:me=\E[m:ms:\
        :Co#8:pa#64:AF=\E[3%dm:AB=\E[4%dm:op=\E[39;49m:AX:\
        :vb=\Eg:G0:as=\E(0:ae=\E(B:\
        :ac=\140\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\
        :po=\E[5i:pf=\E[4i:Km=\E[M:k0=\E[10~:k1=\EOP:k2=\EOQ:\
        :k3=\EOR:k4=\EOS:k5=\E[15~:k6=\E[17~:k7=\E[18~:\
        :k8=\E[19~:k9=\E[20~:k;=\E[21~:F1=\E[23~:F2=\E[24~:\
        :F3=\E[1;2P:F4=\E[1;2Q:F5=\E[1;2R:F6=\E[1;2S:\
        :F7=\E[15;2~:F8=\E[17;2~:F9=\E[18;2~:FA=\E[19;2~:kb=:\
        :K2=\EOE:kB=\E[Z:kF=\E[1;2B:kR=\E[1;2A:*4=\E[3;2~:\
        :*7=\E[1;2F:#2=\E[1;2H:#3=\E[2;2~:#4=\E[1;2D:%c=\E[6;2~:\
        :%e=\E[5;2~:%i=\E[1;2C:kh=\E[1~:@1=\E[1~:kH=\E[4~:\
        :@7=\E[4~:kN=\E[6~:kP=\E[5~:kI=\E[2~:kD=\E[3~:ku=\EOA:\
        :kd=\EOB:kr=\EOC:kl=\EOD:km:,PDSHDIR=/opt/clusterware/opt/pdsh,PATH=/opt/clusterware/opt/genders/bin:/opt/clusterware/opt/pdsh/bin:/opt/clusterware/opt/s3cmd:/opt/clusterware/opt/gridscheduler/bin/linux-x64:/opt/clusterware/opt/aws/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/alces/.local/bin:/home/alces/bin,MAIL=/var/spool/mail/alces,STY=6556.pts-0.login1,LC_IDENTIFICATION=de_BE.UTF-8,PWD=/home/alces/RRBS/002-20160205-027/src,GRIDSCHEDULERBIN=/opt/clusterware/opt/gridscheduler/bin/linux-x64,_LMFILES_=/opt/clusterware/etc/modules/services/aws:/opt/clusterware/etc/modules/services/gridscheduler:/opt/clusterware/etc/modules/services/s3cmd:/opt/gridware/local/el7/etc/modules/null:/opt/clusterware/etc/modules/services/pdsh,SGE_EXECD_PORT=6445,LANG=en_US.UTF-8,CW_DOCPATH=/opt/clusterware/var/lib/docs/gridscheduler,GRIDSCHEDULERDIR=/opt/clusterware/opt/gridscheduler,MODULEPATH=/opt/gridware/local/el7/etc/modules:/opt/clusterware/etc/modules,SGE_QMASTER_PORT=6444,LC_MEASUREMENT=de_BE.UTF-8,LOADEDMODULES=services/aws:services/gridscheduler:services/s3cmd:null:services/pdsh,SGE_ROOT=/opt/clusterware/opt/gridscheduler,S3CMDBIN=/opt/clusterware/opt/s3cmd,HISTCONTROL=ignoredups,PDSH_GENDERS_FILE=/opt/clusterware/etc/genders,HOME=/home/alces,SHLVL=2,cw_MODULES_VERBOSE=0,cw_SHELL=bash,AWSBIN=/opt/clusterware/opt/aws/bin,LOGNAME=alces,WINDOW=2,SSH_CONNECTION=212.166.47.73 56906 10.75.128.168 22,cw_DIST=el7,S3CMDDIR=/opt/clusterware/opt/s3cmd,LESSOPEN=||/usr/bin/lesspipe.sh %s,XDG_RUNTIME_DIR=/run/user/1000,SGE_CLUSTER_NAME=cluster,GENDERS_FILE=/opt/clusterware/etc/genders,LC_NAME=de_BE.UTF-8,BASH_FUNC__cw_root()=() {  _cw_ROOT=${_cw_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")"/../.. && pwd)};
 echo "${_cw_ROOT}"
},BASH_FUNC_module()=() {  alces module "$@"
},BASH_FUNC_alces()=() {  local errlvl _cw_ROOT;
 if [[ -t 1 && "$TERM" != linux ]]; then
 export cw_COLOUR=1;
 else
 export cw_COLOUR=0;
 fi;
 _cw_ROOT="$(_cw_root)";
 [[ -s "${_cw_ROOT}"/bin/alces ]] && case $1 in
 mo*)
 if [[ ! $(ps -o 'command=' -p "$$" 2>/dev/null) =~ ^- ]]; then
 if [[ ! ":$cw_FLAGS:" =~ :verbose-modules: ]]; then
 export cw_MODULES_VERBOSE=0;
 fi;
 fi;
 case $2 in
 al* | h* | -h | --help)
 if [[ ":$cw_FLAGS:" =~ :nopager: ]]; then
 "${_cw_ROOT}"/bin/alces "$@" 0>&1 2>&1;

else
 if [ -n "$POSIXLY_CORRECT" ]; then
 eval $("${_cw_ROOT}"/bin/alces "$@") 2>&1;
 else
 if [ "$2" == "load" -o "$2" == "add" ]; then
 eval $("${_cw_ROOT}"/bin/alces "$@") 2>&1;
 else
 local p;
 p="${_cw_ROOT}";
 eval $(${p}/bin/alces "$@" 2> >(less -FRX >&2)) 2>&1;
 fi;
 fi;
 fi
 ;;
 esac
 ;;
 gr*)
 case $2 in
 dep*)
 case $3 in
 en* | d* | p* | ini* | ins*)
 eval $("${_cw_ROOT}"/bin/alces "$@") 2>&1
 ;;
 *)
 "${_cw_ROOT}"/bin/alces "$@"
 ;;
 esac
 ;;
 *)
 "${_cw_ROOT}"/bin/alces "$@"
 ;;
 esac
 ;;
 *)
 "${_cw_ROOT}"/bin/alces "$@"
 ;;
 esac;
 errlvl=$?;
 unset cw_COLOUR;
 return $errlvl
},_=/opt/clusterware/opt/gridscheduler/bin/linux-x64/qsub
script_file:                ./template4.sh
verify_suitable_queues:     1
project:                    default.prj
scheduling info:            All queues dropped because of overload or full

#2

Hi JJ,

Can you verify that you have some nodes running? This kind of error can occur when the master (login) node has started but no compute resources are available yet. The output from qhost and qstat -f would probably provide some further clues.
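
As a rough way to check this from the login node, something like the sketch below could count the execution hosts that qhost reports. The helper name count_exec_hosts is made up for illustration and is not part of Grid Scheduler. It assumes the usual qhost layout: two header lines, followed by one line per host, including a "global" pseudo-host that is not a real node.

```shell
# Hypothetical helper: count execution hosts in qhost output read from stdin.
# Assumes two header lines, then one line per host; the "global" entry is
# a pseudo-host and is skipped, as are blank lines.
count_exec_hosts() {
    awk 'NR > 2 && NF > 0 && $1 != "global" { n++ } END { print n + 0 }'
}

# Intended use on the cluster (not executed here):
#   qhost | count_exec_hosts
```

If this prints 0, no compute nodes have registered with the scheduler yet, which would fit the "All queues dropped because of overload or full" message. qstat -f should then show whether any queue instances exist and what state they are in.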


BTW: you can use <code> blocks to help with formatting output from commands, e.g.:

<code>
[pasted output in here]
</code>