The default torque server name is "torqueserver". This name is defined in the file /var/lib/torque/server_name. You can either change this, or, if you have access to the nameserver, add a torqueserver CNAME record pointing to the actual machine.
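
For the first option, a minimal sketch of changing the name is shown below. The host name headnode.example.org is a placeholder, and the helper set_server_name is only an illustration; it takes the file path as an argument so that it can be tried on a scratch file before touching the real one.

```shell
#!/bin/sh
# Write a new torque server name into a server_name file.
# Usage: set_server_name NAME [FILE]
set_server_name() {
    name=$1
    file=${2:-/var/lib/torque/server_name}
    printf '%s\n' "$name" > "$file"
}

# Try it on a scratch file first; headnode.example.org is a made-up name.
scratch=$(mktemp)
set_server_name headnode.example.org "$scratch"
cat "$scratch"
rm -f "$scratch"
```

To change the real file, run set_server_name with only the host name, as root.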

I will describe a simple setup in which the head node does not itself act as a compute server. Submitted jobs are executed on the compute nodes.

In our setup, users have their home directories on NFS-mounted shares that are accessible from all workstations and from the compute nodes (via autofs).

For the system to work smoothly, it is important that you have set up passwordless ssh, so compute nodes can copy the batch queue log files back to the users' disk area.
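
When home directories are shared between all machines, one possible way to get passwordless ssh is for each user to create a key pair and authorize it in their own account. The sketch below assumes that approach (host-based authentication is another option); the helper authorize_key is hypothetical and only adds the key if it is not already present.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already there.
# Usage: authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
authorize_key() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    chmod 600 "$auth"
    # -x: match whole lines, -F: fixed strings, -f: read patterns from file
    grep -qxFf "$pub" "$auth" || cat "$pub" >> "$auth"
}

# Typical use (run as the user, not as root):
#   ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

A key without a passphrase is convenient for batch use, but anyone who can read the private key file can log in as that user.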

Head node

On the head node, you need to run torque-server and torque-scheduler. The scheduler is a simple one; you can replace it with a more sophisticated one such as maui, also distributed from clusterressources.org. (Unfortunately, the maui license is too strict for it to be distributed as a package.)

Compute node

On each of the compute nodes, you need to install at least torque-mom.
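
The head node and the compute nodes also have to know about each other. As a rough sketch, the helpers below print the lines that would go into server_priv/nodes on the head node and into mom_priv/config on each compute node. The host names and the core count (np=4) are invented for the example, and the helper names are hypothetical.

```shell
#!/bin/sh
# Print server_priv/nodes lines for a list of compute nodes.
# Usage: nodes_lines NP HOST...
nodes_lines() {
    np=$1
    shift
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np"
    done
}

# Print the mom_priv/config line that tells pbs_mom where the server is.
# Usage: mom_config_line SERVER
mom_config_line() {
    printf '$pbsserver %s\n' "$1"
}

# Example with made-up names: two compute nodes with 4 cores each.
nodes_lines 4 node01.example.org node02.example.org
mom_config_line headnode.example.org
```

On the head node, the output of nodes_lines would be redirected to /var/lib/torque/server_priv/nodes; on each compute node, the output of mom_config_line would go to /var/lib/torque/mom_priv/config.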

Submit hosts

You also need a number of submit hosts, which may or may not include the head and compute nodes. On these, you need to install the torque-client package. In our case, users do their work on desktop workstations in their offices or in the computing room. Jobs can be submitted from any of these; they are forwarded to the head node (torque-server) and scheduled for execution there.
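
Submitting a job from one of these hosts might look like the sketch below. The job name, the resource request and the queue name "batch" are assumptions for the example; use whatever queue your server actually defines. The helper write_job is hypothetical.

```shell
#!/bin/sh
# Write a small example job script; the #PBS lines are read by qsub.
# Usage: write_job FILE
write_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -q batch
#PBS -l nodes=1,walltime=00:05:00
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF
}

job=$(mktemp)
write_job "$job"
# Submit only if the torque client tools are installed on this host.
if command -v qsub >/dev/null 2>&1; then
    qsub "$job"
else
    echo "qsub not found; job script left in $job"
fi
```

PBS_O_WORKDIR is set by torque to the directory from which the job was submitted, so the job starts where the user ran qsub.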

Add to /etc/services

If you want to have names associated with the ports that are used by torque, you may want to patch /etc/services:

#(torque
# Standard PBS services
pbs           15001/tcp           # pbs server (pbs_server)
pbs           15001/udp           # pbs server (pbs_server)
pbs_mom       15002/tcp           # mom to/from server
pbs_mom       15002/udp           # mom to/from server
pbs_resmom    15003/tcp           # mom resource management requests
pbs_resmom    15003/udp           # mom resource management requests
pbs_sched     15004/tcp           # scheduler
pbs_sched     15004/udp           # scheduler
#)
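
A possible way to apply this patch without adding the block twice is sketched below. The helper patch_services is hypothetical; it takes the file to patch as an optional argument so that it can be tried on a copy first.

```shell
#!/bin/sh
# Append the torque block to a services file unless it is already present.
# Usage: patch_services [FILE]
patch_services() {
    target=${1:-/etc/services}
    # The '#(torque' marker shows that the block has already been added.
    grep -q '#(torque' "$target" && return 0
    cat >> "$target" <<'EOF'
#(torque
# Standard PBS services
pbs           15001/tcp           # pbs server (pbs_server)
pbs           15001/udp           # pbs server (pbs_server)
pbs_mom       15002/tcp           # mom to/from server
pbs_mom       15002/udp           # mom to/from server
pbs_resmom    15003/tcp           # mom resource management requests
pbs_resmom    15003/udp           # mom resource management requests
pbs_sched     15004/tcp           # scheduler
pbs_sched     15004/udp           # scheduler
#)
EOF
}

# Typical use (as root):
#   cp /etc/services /tmp/services.test && patch_services /tmp/services.test
#   patch_services
```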

If you ever need to remove this entry again, run this script:

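target=/etc/services
# Nothing to do if the torque block is not present
grep -q '#(torque' "$target" || exit 0
tmp=$(mktemp -p /tmp) || exit 1
printf 'unpatching %s... ' "$target"
# Delete everything from the '#(torque' line through the closing '#)' line,
# and only overwrite the target if sed succeeded
if sed -r '/#\(torque/,/#\)/d' < "$target" > "$tmp" && cp "$tmp" "$target"; then
    echo done
else
    echo failed
fi
rm -f "$tmp"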

Create a batch server on localhost

You can run a batch server on a single host. This can be useful if you want to try out the software, or if you regularly run many jobs and you need some way of keeping track of them.

You need to install the following packages:

sudo apt-get install torque-mom torque-server torque-client torque-scheduler

and then run the following script (as root):

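#! /bin/sh
# Start the three daemons at boot
update-rc.d pbs_mom defaults
update-rc.d pbs_server defaults
update-rc.d pbs_sched defaults
# This host is both the server and the only compute node
host=$(hostname --long)
echo "$host" > /var/lib/torque/server_priv/nodes
echo "$host" > /var/lib/torque/server_name
# mom_priv/config takes directives; $pbsserver names the server host
echo "\$pbsserver $host" > /var/lib/torque/mom_priv/config
# Create a new server database, then configure it
# (in qmgr, s = set, c = create, q = queue, n = node)
pbs_server -t create
qmgr -c "s s scheduling=true"
qmgr -c "c q batch queue_type=execution"
qmgr -c "s q batch started=true"
qmgr -c "s q batch enabled=true"
qmgr -c "s q batch resources_default.nodes=1"
qmgr -c "s q batch resources_default.walltime=3600"
qmgr -c "s s default_queue=batch"
# Restart the daemons so that they pick up the new configuration
invoke-rc.d pbs_mom restart
invoke-rc.d pbs_server restart
invoke-rc.d pbs_sched restart
qmgr -c "s n $host state=free" -e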

and enjoy your very own batch server!
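
To check that the server really works, something along the following lines can be run as an ordinary user. This is only a sketch: the helper missing_tools is hypothetical, and the exact output of the torque commands depends on your installation.

```shell
#!/bin/sh
# Print those of the given commands that are not on the PATH, one per line.
# Usage: missing_tools CMD...
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_tools qmgr qstat pbsnodes qsub)
if [ -n "$missing" ]; then
    echo "not installed:" $missing
else
    qmgr -c "p s"               # print the server configuration
    qstat -q                    # the batch queue should be listed
    pbsnodes -a                 # the node should report state = free
    echo "sleep 30" | qsub      # submit a trivial job
    qstat                       # the job should show up here
fi
```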

-- Morten Kjeldgaard <mok@bioxray.au.dk>, Sat,  9 Feb 2008 23:01:19 +0100