Running SEQUEST® at UWPR

Helpful Linux commands:

mkdir directory      make a directory
rmdir directory      removes an empty directory
rm -rf directory     removes a directory without prompting, even if not empty
rm file              removes a file
rm -f file           removes a file without prompting
less file            show contents of a (text) file on screen
cp file1 file2       copy file1 to file2
mv file1 file2       move/rename file1 to file2
ln -s file1 file2    create symbolic link file2 pointing to file1
cd directory         change to directory
cd ..                move up a level/directory
ls                   directory listing
ls -l                long directory listing incl. file size, date, etc.
pwd                  print present working directory
nano file            simple text editor
history              list of previous commands
!num                 execute command num from history
up-arrow             will scroll through previous commands


To run Sequest:

1) You will need a Genome Sciences user account.  See
      http://gs-its.gs.washington.edu/accounts/
   to set one up if you don't have such an account.  Once your
   GS account is in good standing, it will need access to a couple
   of our servers to allow you to run the analysis.  Your account
   will also need to be added to the 'pr' and 'pr-apache' groups.

   If you're in the Genome Sciences department, you can run
   these searches on the "sage" cluster: simply log into "sage"
   and follow the directions below.  If you are a UWPR user, you
   can log into "tephra" to access the UWPR cluster.

2) Have your account set up correctly.  This includes passwordless
   authentication on sage/tephra nodes.  This is
   accomplished by adding your public key to the respective server's
   .ssh/authorized_keys file.  You can do this by typing the following
   two commands from within your home directory:

      ssh-keygen  <hit enter to all prompts>
      cat .ssh/id_rsa.pub >> .ssh/authorized_keys
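
   Note that sshd commonly ignores authorized_keys when that file or
   the .ssh directory is writable by group or others (its default
   StrictModes behavior), which can happen with a permissive umask.
   The following is a minimal sketch, not part of the UWPR tooling,
   that tightens those permissions; the function name fix_ssh_perms
   is made up for illustration.

```shell
# Tighten permissions on .ssh and authorized_keys so sshd accepts them.
# $1 = home directory to fix (defaults to $HOME).
fix_ssh_perms() {
    home_dir="${1:-$HOME}"
    ssh_dir="$home_dir/.ssh"
    if [ ! -d "$ssh_dir" ]; then
        echo "no $ssh_dir found; run ssh-keygen first" >&2
        return 1
    fi
    # Directory: owner only.
    chmod 700 "$ssh_dir"
    # Key list: owner read/write only, if the file exists yet.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
    return 0
}
```

   Calling fix_ssh_perms with no arguments applies it to your own
   home directory.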

   Account setup also includes adding the following two directories
   to your PATH environment variable:

      /net/pr/vol1/ProteomicsResource/bin/
      /net/pr/vol1/ProteomicsResource/bin/TPP/bin/tpp/bin/

   Most GS users will have the 'bash' shell by default.  For this shell,
   edit the .bashrc file in your home directory.  Do this using

      nano .bashrc

   Add the following lines to the end of that file:

      export PATH=$PATH:/net/pr/vol1/ProteomicsResource/bin/:/net/pr/vol1/ProteomicsResource/bin/TPP/bin/tpp/bin/
      export TMOUT=14400
      umask 002

   Also run "winesetup.sh" once.  Ignore all error messages about
   permissions.
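
   To confirm that the PATH change took effect in a new login shell,
   a small check such as the following may help.  This is only a
   sketch; the helper name path_has is hypothetical, and the two
   directories are the ones listed in this step.

```shell
# Return 0 if directory $1 is an entry in the colon-separated list $2
# (defaults to $PATH).  A trailing slash on either side is ignored.
path_has() {
    want="${1%/}"
    list="${2:-$PATH}"
    old_ifs="$IFS"
    IFS=:
    for entry in $list; do
        if [ "${entry%/}" = "$want" ]; then
            IFS="$old_ifs"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

# Report on the two directories this step asks for.
for d in /net/pr/vol1/ProteomicsResource/bin \
         /net/pr/vol1/ProteomicsResource/bin/TPP/bin/tpp/bin; do
    if path_has "$d"; then
        echo "on PATH: $d"
    else
        echo "MISSING from PATH: $d"
    fi
done
```

   If either directory is reported as missing, re-check the export
   line in .bashrc and log in again.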

2b) As of version 2012010, you will need to set an environment
   variable to point to the license file.  To do this, add this to
   your .bashrc file:

      export UW_SEQUEST_LICENSE=/net/pr/vol1/ProteomicsResource/bin/uw.sequest.license

   Or if you are a c-shell user, add this to your .cshrc file:

      setenv UW_SEQUEST_LICENSE /net/pr/vol1/ProteomicsResource/bin/uw.sequest.license
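
   A quick way to verify the license setting before queuing any
   searches is sketched below for bash users.  The function name
   check_license is made up for illustration; it only tests that the
   variable is set and that the file it names is readable, not that
   the license itself is valid.

```shell
# Succeed only if UW_SEQUEST_LICENSE is set and names a readable file.
check_license() {
    if [ -z "${UW_SEQUEST_LICENSE:-}" ]; then
        echo "UW_SEQUEST_LICENSE is not set" >&2
        return 1
    fi
    if [ ! -r "$UW_SEQUEST_LICENSE" ]; then
        echo "license file not readable: $UW_SEQUEST_LICENSE" >&2
        return 1
    fi
    echo "license file found: $UW_SEQUEST_LICENSE"
}
```

   Run check_license in a fresh login shell; an error here usually
   means the export line is missing from .bashrc.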

3) If needed, convert all RAW files to mzXML

   For dept. users, 'ssh' into sage and issue a 'qlogin'.  UWPR users
   should 'ssh' into 'tephra' instead.

   Unless you're in Foege, you will have to connect through the nexus2
   firewall machine first: ssh to nexus2.gs.washington.edu (ssh uses
   port 22) and then, from nexus2, ssh to tephra.  Windows users can
   use the PuTTY program.
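
   With a reasonably recent OpenSSH client (7.3 or later), the two
   hops can be combined using the -J (ProxyJump) option.  The sketch
   below only prints the command for a given user name so it can be
   inspected first.  The function name jump_cmd is made up, and it
   assumes the short name 'tephra' resolves from nexus2 as it does in
   the manual two-step login.

```shell
# Print (do not run) an ssh command that reaches tephra via nexus2.
# $1 = your Genome Sciences user name.
jump_cmd() {
    user="$1"
    printf 'ssh -J %s@nexus2.gs.washington.edu %s@tephra\n' "$user" "$user"
}
```

   For example, 'jump_cmd yourname' prints the command to paste into
   a terminal.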

   Then issue one of the two conversion commands below, depending on
   the case of your file extension:

      convert.sh *.RAW
      convert.sh *.raw

   This will run ReAdW under Linux/Wine.  Ignore warning/error
   messages mentioning "Xserver" and "$DISPLAY".
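
   If a directory mixes .RAW and .raw extensions, a bracket glob can
   match both in one pass.  A minimal sketch follows; list_raw_files
   is a hypothetical helper, and only convert.sh comes from the UWPR
   setup.

```shell
# List RAW files in directory $1 (default: current directory), matching
# the extension in any letter case, one path per line.
list_raw_files() {
    dir="${1:-.}"
    for f in "$dir"/*.[Rr][Aa][Ww]; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

   The same pattern can be given to the converter directly:

      convert.sh *.[Rr][Aa][Ww]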

4) Place mzXMLs and a sequest.params in a directory.

   As of release 2010.01.0, you can generate a sequest.params
   file using the command "sequest -p".

   Be sure to set "output_format = 2" in your sequest.params file
   to generate .out files.  (Before the 2012.01 release, the setting
   for .out files was "output_format = 1".)  Then on 'tephra', issue
   the command

      runSequestQ *.mzXML

   This issues multiple 'qsub' commands for each mzXML file, with each
   job telling Sequest to search a subset of the scan range.  The jobs
   are all queued and run across the nodes in the cluster; the
   queuing software shares cluster resources if multiple users
   submit jobs simultaneously.
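
   Before submitting, it may be worth confirming that sequest.params
   requests .out output.  The sketch below assumes the file uses one
   "name = value" setting per line; has_out_format is a hypothetical
   helper name.

```shell
# Succeed if params file $1 contains a line setting output_format to 2.
# Lines that begin with other text (e.g. a commented-out setting) and
# values such as 20 do not match.
has_out_format() {
    grep -Eq '^[[:space:]]*output_format[[:space:]]*=[[:space:]]*2([^0-9]|$)' "$1"
}
```

   One possible use, run from the search directory:

      has_out_format sequest.params || echo "check output_format"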

   Because many log files are created, qsub logs are placed in a
   'qsublogs' subdirectory.  Feel free to delete this directory
   after the searches are done; it is also deleted automatically
   as part of running step 5.

   To check the status of the queue:
      qstat 

   To delete all jobs that you submitted:
      qdel -u <username>
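
   To tell when all searches have finished, the qstat output can be
   counted rather than read by eye.  The sketch below assumes Grid
   Engine style output, where two header lines precede one line per
   job; that layout is an assumption and should be checked against
   what 'qstat' prints on tephra.  The name count_jobs is made up.

```shell
# Read qstat output on stdin and print the number of job lines,
# skipping the two header lines and any blank lines.
count_jobs() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}
```

   A polling loop could then look like this (it also assumes that
   'qstat -u' filters by user in the same way 'qdel -u' does):

      while [ "$(qstat -u "$USER" | count_jobs)" -gt 0 ]; do sleep 60; done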

5) When searches are done, you can use runSequestQ to run the
   Prophets.  You can:

   - just generate pep.xml files and tar up search results:

        runSequestQ --wosequest *.mzXML

   - generate pep.xml files and run PeptideProphet and
     ProteinProphet, combining all runs into a single analysis:

        runSequestQ --wosequest --all *.mzXML

   - generate pep.xml files and run the Prophets individually on
     each input file:

        runSequestQ --wosequest --single *.mzXML

   - you can do a combination of --single and --all if you wish.

   Type 'runSequestQ' without any arguments to see all command
   line options.  The most useful are likely the '--accurate' flag,
   which turns on PeptideProphet's accurate mass model (useful for
   high mass accuracy instruments), and the '--decoy <string>' flag,
   which specifies the string that marks decoy entries (for a
   combined target-decoy search).

   Here's one common invocation:

      runSequestQ --wosequest --single --all --decoy rev_ *.mzXML

   Another example with SEQUEST internal decoys and accurate mass:

      runSequestQ --wosequest --single --all --decoy DECOY_ --accurate *.mzXML
   
6) Alternatively, you can run the TPP tools with your own command
   line options using the standard 'xinteract' program.  The input
   is the set of individual .pep.xml files generated in step 5.
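
   As a rough illustration of such a command line, the sketch below
   prints (but does not run) an xinteract invocation.  The flag
   meanings are recalled from xinteract's usage message and are
   assumptions here: -N for the output name, -O followed by
   PeptideProphet option letters (A for accurate mass binning, p to
   run ProteinProphet afterwards), and -d for the decoy prefix.  Run
   'xinteract' with no arguments to confirm them for the installed
   TPP version.  The function name xinteract_cmd is made up.

```shell
# Print an xinteract command line for inspection.
# $1 = output file name, $2 = decoy prefix, remaining args = .pep.xml inputs.
xinteract_cmd() {
    out="$1"
    decoy="$2"
    shift 2
    # Flags below are assumptions to verify against xinteract's own usage text.
    printf 'xinteract -N%s -OAp -d%s' "$out" "$decoy"
    for f in "$@"; do
        printf ' %s' "$f"
    done
    printf '\n'
}
```

   For example:

      xinteract_cmd interact.pep.xml DECOY_ *.pep.xml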