To run programs on Argo, submit your executable program (user-written or vendor-supplied software) to the torque system, which runs it on one of the compute nodes. Torque is a networked subsystem for submitting, monitoring, and controlling a workload of jobs on the cluster. It is not limited to batch work: interactive jobs such as ANSYS and GaussView may also be run in it.
Below is a sample text file, called my_script, that contains some basic directives (commands) to torque:
#!/bin/csh
#PBS -m be
#PBS -e /home/homes51/jsmith/a.out.error
#PBS -o /home/homes51/jsmith/a.out.output
#PBS -N a.out
/home/homes51/jsmith/a.out
In the above example, replace /home/homes51/jsmith with the full path to your home directory. To determine your home directory path use the following command:
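The command itself is missing from this copy of the page. As a minimal sketch, assuming a standard login shell where the HOME variable is set, the following prints the path (the variable name home_dir is only illustrative):

```shell
# Show the full path to your home directory.
home_dir="$HOME"      # kept in a variable only so it can be reused later
echo "$home_dir"
```

Use the printed path wherever the sample script says /home/homes51/jsmith.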
To submit the text file to torque use the qsub command:
qsub my_script
Upon submitting a job for execution the job is assigned a number, a jobid, which identifies the job and is used with other commands to monitor the job status. You are not limited to just what you see in the sample script. Other commands are available, including shell commands like cd, ls, grep, and rm. Each line that has a torque directive must start with #PBS. Shell commands, on the other hand, do not.
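The difference between directive lines and shell commands can be sketched as follows. This is an illustration under stated assumptions, not the site's recommended workflow: the scratch directory and the variable names (script_dir, directive_count, jobid) are hypothetical, and the submission step runs only where qsub is installed.

```shell
# Write a small job script to a scratch directory (hypothetical location).
script_dir=$(mktemp -d)
cat > "$script_dir/my_script" <<'EOF'
#!/bin/bash
#PBS -N a.out
#PBS -m be
# Ordinary shell commands carry no #PBS prefix.
cd "$HOME"
ls
"$HOME/a.out"
EOF

# Only lines that start with #PBS are torque directives; count them.
directive_count=$(grep -c '^#PBS' "$script_dir/my_script")
echo "directives: $directive_count"

# Submit only where qsub exists; qsub prints the job ID on standard output.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub -V "$script_dir/my_script")
    echo "submitted as $jobid"
fi
```

Capturing the job ID in a variable makes it easy to pass to whichever monitoring command you use afterwards.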
Let's analyze the script line by line:
#!/bin/csh
This line tells torque to run the script under the C-shell. If you want to run your script under the bash shell, replace the line with #!/bin/bash. This line is not a torque directive, so it does not start with #PBS.
#PBS -m be
The second line is a torque directive which tells torque to send an email when your job starts and when it ends. This line is not required; it may be altered or removed entirely. Here are two other permutations:
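The two examples are missing from this copy of the page, so the lines below are a sketch rather than the original text. They assume the standard torque mail options: a (job aborted), b (job begins), e (job ends), and n (no mail).

```shell
# Send email only when the job ends.
#PBS -m e

# Send email when the job aborts, begins, or ends.
#PBS -m abe
```

Use only one -m line in a given script.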
The execution of your job might not start immediately. Your job may wait until requested resources become available. Or, it may never run because you requested too many resources or resources that will never be available to you. That's why the email, letting you know when your program begins executing, is useful. If you submit a job that will run for an extended period of time (days or weeks), it's convenient to be informed of its completion. The email will be sent to your NetID@uic.edu email address.
#PBS -e /home/homes51/jsmith/a.out.error
This optional line tells torque to write standard error to the file a.out.error. If you remove this line, torque uses a default name built from the job name and the job ID: jobname.ejobid (the job name, then .e, then the job number). For example, if you submit the a.out job and it is assigned job number 3603, the error file is a.out.e3603.
The advantage of the default is that each new job submission creates a new error file instead of replacing an existing one. The downside is the same fact: each submission adds another file, and your home directory can fill up with hundreds of them. Once too many accumulate, shell commands such as ls and rm respond slowly. You are strongly encouraged to delete error files after reviewing their contents.
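A cleanup pass might look like the sketch below. To keep it safe to run anywhere, it works in a throwaway directory with made-up file names (demo_dir and the job numbers are hypothetical). On the cluster you would run the ls and rm lines against the directory that actually holds your files.

```shell
# Hypothetical demo directory populated with default-named files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.out.e3603" "$demo_dir/a.out.e3604" "$demo_dir/a.out.o3603"

# Review what matches first, then delete only the error files.
ls "$demo_dir"/a.out.e*
rm -f "$demo_dir"/a.out.e*

# Whatever is left was not matched by the pattern.
remaining=$(ls "$demo_dir")
echo "$remaining"
```

Listing before deleting shows exactly which files the pattern matches, which matters when job names share a prefix.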
#PBS -o /home/homes51/jsmith/a.out.output
This line does the same for standard output. The same logic that applies to standard error applies here: if you use the default, a new file is created with each submission. The default standard output file is named jobname.ojobid (the job name, then .o, then the job number).
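Following the a.out.e3603 example above, the default names can be sketched as simple string concatenation. The variable names are illustrative only. Torque builds these names itself, so nothing like this belongs in a job script.

```shell
jobname="a.out"   # value given with #PBS -N
jobid="3603"      # job number assigned at submission (example from the text)

# Default names: job name, then .e or .o, then the job number.
error_file="${jobname}.e${jobid}"
output_file="${jobname}.o${jobid}"

echo "$error_file"
echo "$output_file"
```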
#PBS -N a.out
This line assigns the name a.out to the job. Assigning a name is not required; the line may be removed.
/home/homes51/jsmith/a.out
This line directs torque to execute your program. It is required. Without the line, your script does nothing. You are strongly encouraged to use the full path to the executable. The following is a bad way to identify the location of your executable and will, most likely, cause you problems:
If you write your own programs, always include the full path to your files in open statements. Or, include the cd command in your script. Example:
#!/bin/csh
cd /home/homes51/jsmith/tmp
/home/homes51/jsmith/tmp/a.out
Files that the a.out program opens without a full path will then be written to /home/homes51/jsmith/tmp.
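The effect of the cd command can be checked with a small sketch. It uses a throwaway directory in place of /home/homes51/jsmith/tmp, and an echo redirect in place of a program's open statement. Both are stand-ins chosen for illustration.

```shell
# Hypothetical scratch directory standing in for /home/homes51/jsmith/tmp.
work_dir=$(mktemp -d)
start_dir=$(pwd)

cd "$work_dir"
# A relative file name, like an open statement that gives no path.
echo "sample output" > results.txt
cd "$start_dir"

# The file landed in the directory that was current when it was opened.
ls "$work_dir"
```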
The command to submit your script file to torque is qsub. The operand to the qsub command must always be a text file and not a binary, executable program. If you use the executable a.out as the operand to qsub, you'll get something like the following message in your standard error file:
-bash: /var/spool/PBS/mom_priv/jobs/3629.argo.c.SC: cannot execute binary file
The script file may be complex with hundreds of directives or it may be nothing more than the path and name of your executable:
/home/homes51/jsmith/a.out
To get a list of environment variables, use the env command:
env
You may use environment variables in your script. For example, the path to your home directory is contained in the variable $HOME. Instead of hard-coding the path to your home directory, you may substitute the variable $HOME:
$HOME/a.out
If you use environment variables, you must inform torque that you are doing so. To do that, pass the -V option to qsub:
qsub -V script_name
Best practice: always include the -V option.
September 21, 2016