Running Comet at UWPR

Cliff notes version of running a Comet search

Generate a comet.params file:

   comet -p
   mv comet.params.new comet.params

Run a search:

   runCometQ *.RAW    (to do mzXML conversion and submit search in 1 step)
   runCometQ *.mzXML  (if you already did mzXML conversion)

To check if searches are running:

   qstat                    (or qstat -q pr-short.q)
   qstat | grep userid      (this will only list out your jobs, replace "userid" with your id)

Another way to look at search status:

   cat qsublogs/* | less     (list out all output in the log files including Comet reporting)

Lastly, the best way to know if searches are done is to check that the last
line of each .pep.xml file contains "</msms_pipeline_analysis>".  You can check
with the command below.  If you don't see "</msms_pipeline_analysis>" for every
entry, your searches are not complete.

   tail -n 1 *.pep.xml 
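
With many files, the same check can be scripted so that each file is labelled
explicitly.  The following is only a sketch: check_pepxml_done is a made-up
helper name, not part of runCometQ or any UWPR tool.

```shell
# check_pepxml_done is a hypothetical helper, not a UWPR tool.
# For each file given it prints "DONE <file>", "NOT DONE <file>" or
# "MISSING <file>", and it returns non-zero if any file is not finished.
check_pepxml_done() {
    pepxml_status=0
    for pepxml_file in "$@"; do
        if [ ! -f "$pepxml_file" ]; then
            echo "MISSING $pepxml_file"
            pepxml_status=1
        elif tail -n 1 "$pepxml_file" | grep -q '</msms_pipeline_analysis>'; then
            echo "DONE $pepxml_file"
        else
            echo "NOT DONE $pepxml_file"
            pepxml_status=1
        fi
    done
    return $pepxml_status
}
```

After pasting the function into your shell, run "check_pepxml_done *.pep.xml"
from the search directory.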

Note that by default, runCometQ requests job resources for processes that run up
to 48 hours.  If your search will likely run longer than 48 hours, use the
"--hours" command line option to request up to 167 hours (about 7 days) of run
time.  This option is currently implemented for runCometQ.2015022 and newer.
The only drawback to always requesting 167 hours of runtime is that your job
could be blocked by a scheduled maintenance that's 1 week out, even though the
search would otherwise have finished before that time.

   runCometQ --hours 167 *.mzXML

Example commands to run after the searches are done (these run the Prophets,
see step 4 below):

   runCometQ --wocomet --all *.RAW
   runCometQ --wocomet --deleteraw --single *.mzXML
   runCometQ --wocomet --all --decoy DECOY_ *.mzXML


Open the .pep.xml or .prot.xml file in your browser.  That's it.
Type "runCometQ" without arguments to see all options.

------------------------------------------------------------------------------

Helpful Linux commands:

mkdir directory      make a directory
rmdir directory      removes an empty directory
rm -rf directory     removes a directory that is not empty
rm file              removes a file
rm -f file           removes a file without prompting
less file            show contents of a (text) file to screen
cp file1 file2       copy file1 to file2
mv file1 file2       move/rename file1 to file2
ln -s file1 file2    create symbolic link named file2 that points to file1
cd directory         change to directory
cd ..                move up a level/directory
ls                   directory listing
ls -l                long directory listing incl. file size, date, etc.
pwd                  print present working directory
nano file            simple text editor
history              list of previous commands
!num                 execute command num from history
up-arrow             will scroll through previous commands

------------------------------------------------------------------------------

1) You will need a Genome Sciences user account.  See
      http://gs-its.gs.washington.edu/accounts/
   to set one up if you don't have such an account.  Once your GS account is in
   good standing, it will need access to a couple of our servers to allow you to
   run the analysis.  Your account will also need to be added to the 'pr' and
   'pr-apache_g' groups; I have to authorize this with GS-IT.

   If you're in the Genome Sciences department, you can run these searches on the
   "sage" cluster.  Simply log into "sage" and follow directions below.  If you
   are a UWPR user, you could log into "tephra" to access the UWPR cluster.

2) You will need to be added to the "pr" and "pr-apache_g" groups in order to log
   in to tephra or any other UWPR machine.  You will also need to be added to the
   appropriate cluster project group.  I (Jimmy) need to make the GS-IT request
   to get these set up.

   Have your account set up correctly.  This includes passwordless authentication
   on sage/tephra nodes.  This is accomplished by adding your public key to the
   respective server's .ssh/authorized_keys file.  You can do this by typing the
   following two commands from within your home directory (the ">>" appends, so
   any keys already listed in authorized_keys are kept):

      ssh-keygen  <hit enter to all prompts>
      cat .ssh/id_rsa.pub >> .ssh/authorized_keys

   This also includes adding the following two directories to your PATH environment
   variable:

      /net/pr/vol1/ProteomicsResource/bin/
      /net/pr/vol1/ProteomicsResource/bin/TPP/bin/tpp/bin/

   Most GS users will have the 'bash' shell by default.  For this shell, edit the
   .bashrc file in your home directory.  Do this using

      nano ~/.bashrc

   Add the following lines to the end of that file

   export PATH=$PATH:/net/pr/vol1/ProteomicsResource/bin/:/net/pr/vol1/ProteomicsResource/bin/TPP/bin/tpp/bin/
   export TMOUT=14400
   export PS1='\n\[\e[1;31m\]\h \w  \n> \[\e[0m\]'
   alias cd='cd -P'
   umask 002
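
   After opening a new shell, it can be worth confirming that the PATH change
   took effect.  This is a rough sketch: check_on_path is a made-up helper, and
   the assumption that runCometQ, convert.sh and xinteract live in the two
   directories above is mine.

```shell
# check_on_path is a hypothetical helper: it reports whether each named
# command can be found through the current PATH and returns non-zero if
# any of them is missing.
check_on_path() {
    path_missing=0
    for path_cmd in "$@"; do
        if command -v "$path_cmd" >/dev/null 2>&1; then
            echo "found: $path_cmd"
        else
            echo "missing: $path_cmd"
            path_missing=1
        fi
    done
    return $path_missing
}
```

   For example, "check_on_path runCometQ convert.sh xinteract" should report
   all three as found if the PATH line was picked up.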

   As of 10/2015, you will also need to set/edit the file .sge_request.bak in
   your home directory.  Add one of "-P pr_bruce", "-P pr_maccoss",
   "-P pr_yadlin" as a line in this text file.

      nano ~/.sge_request.bak
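
   If you would rather not open an editor, the project line can be appended
   from the command line.  This is a sketch only: add_sge_project is a made-up
   helper, and you still have to pick the project name that applies to you.

```shell
# add_sge_project is a hypothetical helper: it appends a "-P <project>" line
# to the given file unless an identical line is already there.
add_sge_project() {
    sge_line="-P $1"
    sge_file=$2
    # -x matches whole lines, -F treats the text literally, -e is needed
    # because the text itself starts with a dash.
    if [ -f "$sge_file" ] && grep -qxF -e "$sge_line" "$sge_file"; then
        echo "already set: $sge_line"
    else
        printf '%s\n' "$sge_line" >> "$sge_file"
        echo "added: $sge_line"
    fi
}
```

   For example: "add_sge_project pr_maccoss ~/.sge_request.bak".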

   (The actual file should be .sge_request, but that globally sets values for
   all cluster commands, which is bad for users who want to submit jobs to other
   clusters.  So I edited our runCometQ scripts to read from .sge_request.bak.)

   Also run "winesetup.sh" once.  Ignore all error messages about permissions.
   Wine is not installed on nexus2, so this command needs to be run on tephra.

   Sign up for an account on the UWPR web application using the "Not registered?"
   link on the top left of this page.  For our purposes, this is simply used to
   set your web password that our analysis tools use to access/view search results.
   Use the same username as your UW NetID.  Choose any password you want.  Again,
   this is for access to visualize data through our web server which requires its
   own authentication.

   As of 2017/02/21 with the TPP 5.0 update, your account needs to be set up to
   run an updated Perl that's available through environment modules.  So add the
   following 2 lines to your .bashrc and .bash_profile files in your home
   directory (assuming you're using the bash shell):

      . /etc/profile.d/modules.sh 
      module load modules modules-init modules-gs perl/5.24.0

2a) If you want to view data through the webserver (e.g. open pep.xml and
   prot.xml files), you must register a UWPR account.  You can do this by clicking
   on the "Not registered?" link on the top left corner of this page.  We use
   the username and password that you set here to authenticate our webserver
   access.  You're probably going to have to remind me that you did this step as
   I have to do a manual step with this also in order to allow your access.

3) Convert data to mzXML and run search:

      runCometQ *.RAW

   Or if you want to do the mzXML conversion and search separately:

      convert.sh *.RAW
      runCometQ *.mzXML

   Ignore warning/error messages mentioning "Xserver", "$DISPLAY", "fixme:",
   etc.  which come up doing the mzXML conversion step (using ReAdW running
   under a Windows emulator on linux).

   To use the Genome Sciences cluster, 'ssh' into sage and issue a 'qlogin'.
   Or 'ssh' into 'tephra'.

   Unless you're in Foege, you will have to connect through the nexus2 firewall
   machine first, and then from nexus2 connect to tephra.  For Windows users,
   use the PuTTY program, although my favorite Windows terminal program is now
   MobaXterm.  In either program, ssh to nexus2.gs.washington.edu first and
   then to tephra.
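
   From a Linux or Mac terminal, the two hops can usually be combined into one
   command.  The sketch below assumes your OpenSSH client supports the -J
   (ProxyJump) option, which was added in OpenSSH 7.3, and that the short host
   name "tephra" resolves from nexus2.  tephra_login is a made-up wrapper name.

```shell
# tephra_login is a hypothetical wrapper around "ssh -J": it connects to
# nexus2 first and hops from there to tephra.
# Set DRY_RUN=1 to print the command instead of running it.
tephra_login() {
    login_user=$1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh -J $login_user@nexus2.gs.washington.edu $login_user@tephra"
    else
        ssh -J "$login_user@nexus2.gs.washington.edu" "$login_user@tephra"
    fi
}
```

   Call it as "tephra_login userid", replacing "userid" with your id.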

   runCometQ will submit (aka 'qsub') each mzXML as a separate 'job' to the
   cluster, i.e. one mzXML will be searched on one node.  In addition to any
   other output formats you want, make sure to set "output_pepxmlfile = 1" in
   the params file.

   qsub logs are placed in a 'qsublogs' subdirectory since there are going to
   be many log files created.  Feel free to delete this directory after the
   searches are done; it does get deleted as part of running step 5.

   To check the status of the queue:

      qstat

   To delete all jobs that you submitted:

      qdel -u <username>

   Note that as of 10/2015, we switched over to using a shared job submission
   queue so 'qstat' may show a long list of jobs that are submitted to different
   clusters.  To just see the UWPR queue, use:

      qstat -q pr-short.q
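
   When the queue listing is long, a per-state count of your own jobs is easier
   to read.  The sketch below is not a UWPR tool; count_my_jobs is a made-up
   helper, and it assumes the usual Grid Engine qstat layout in which the 4th
   column is the user and the 5th column is the job state (for example "r" for
   running and "qw" for waiting in the queue).

```shell
# count_my_jobs is a hypothetical helper.  It reads qstat output on stdin
# and prints one "<state> <count>" line per job state for the given user.
# Assumption: column 4 is the user and column 5 is the state.
count_my_jobs() {
    awk -v user="$1" '
        $4 == user { count[$5]++ }
        END { for (state in count) print state, count[state] }
    ' | sort
}
```

   Use it as "qstat | count_my_jobs userid".  Note that qstat may truncate long
   user names, in which case pass the truncated form.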

4) When searches are done, you can use runCometQ to run the Prophets.  You can:

   - run PeptideProphet and ProteinProphet, combining all runs into a single
     analysis:

        runCometQ --wocomet --all *.mzXML

   - run the Prophets individually on each input file

        runCometQ --wocomet --single *.mzXML

   - you can do a combination of --single and --all if you wish.

   Type 'runCometQ' without any arguments to see all command line options.  Of
   most use would be the '--decoy <string>' flag to specify decoy entries (for
   a combined target-decoy search).

   Here's one common option:

      runCometQ --wocomet --single --all --decoy rev_ *.mzXML

   Another example with Comet internal decoys:

      runCometQ --wocomet --single --all --decoy DECOY_ *.mzXML
   
5) Or you can run the TPP tools with your custom command line options using
   the standard 'xinteract' program.  The input would be the individual
   .pep.xml files generated above.


6) Consider signing up for the UWPR_computation mailing list if you want to be
   kept abreast of UWPR computational announcements such as Comet software
   updates, cluster issues, etc.  You can subscribe/unsubscribe at the link
   above.