If you do not use the E-value scores or perform internal decoy searches, you are not affected by the issues that this minor release addresses.
Here are runtimes comparing 2011.01.1 vs. 2012.01.1. The numbers are close to the same benchmarks run with 2012.01.0, with a few small improvements.
dbase   enzyme         variable mods   2011.01.1   2012.01.1   foldΔ
------------------------------------------------------------------------
                                        HH:MM:SS    HH:MM:SS
yeast   full tryptic   16M                  2:27        0:10   14.7x
yeast   full tryptic   16M, 80STY           6:05        0:55    6.6x
yeast   semi tryptic   16M                  9:27        1:34    6.0x
yeast   semi tryptic   16M, 80STY        6:26:47       18:35   20.8x
yeast   no enzyme      16M                 24:07       15:15    1.6x
human   full tryptic   16M                 26:26        1:07   23.7x  (26 min to 1 min)
human   full tryptic   16M, 80STY          57:37        7:40    7.5x
human   semi tryptic   16M               1:49:14       17:23    6.3x
human   semi tryptic   16M, 80STY       57:54:17     2:25:22   23.9x  (2+ days to 2.5 hrs)
human   no enzyme      16M               4:39:59     2:53:39    1.6x
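The foldΔ column is the old runtime divided by the new runtime. As a minimal sketch of that arithmetic (the function names here are illustrative and not part of SEQUEST), the values can be re-derived from the time strings in the table:

```python
def to_seconds(hms: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' (as printed in the table) to seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def fold_change(old: str, new: str) -> float:
    """Speed-up factor: old runtime divided by new runtime, to one decimal."""
    return round(to_seconds(old) / to_seconds(new), 1)


# Two rows taken from the table above.
print(fold_change("2:27", "0:10"))          # yeast, full tryptic, 16M
print(fold_change("57:54:17", "2:25:22"))   # human, semi tryptic, 16M + 80STY
```

The same helper applies to any other row, which makes it easy to check a fold value after rerunning a benchmark.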
Based on the number of PSMs at 1% FDR using E-value scores, the 2012.01.1
release does a little better than the 2012.01.0 release but still lags the
2011.01.1 performance for the human semi- and no-enzyme searches. For the
other searches listed below, it is equivalent or better. The chart below
replicates the TPP analysis chart further down on this page, restricted to the
E-value analysis and showing updated numbers. The numbers compare UW SEQUEST version 2012.01.1
against 2011.01.1.
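This page does not spell out how the "PSMs at 1% FDR" counts were computed. As an illustration only, here is a sketch of the common target-decoy estimate for a concatenated search, where FDR at a given E-value threshold is approximated as decoys / targets. The function name and the toy data are hypothetical.

```python
def count_psms_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at the given FDR.

    psms: iterable of (e_value, is_decoy) pairs, one per top-ranked PSM.
    PSMs are walked from best (lowest) E-value to worst; the result is the
    largest number of targets seen at any point where decoys / targets
    does not exceed max_fdr.
    """
    targets = decoys = accepted = 0
    for _e_value, is_decoy in sorted(psms, key=lambda p: p[0]):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


# Hypothetical toy data: three confident targets, one decoy, one weak target.
toy = [(1e-5, False), (2e-5, False), (3e-5, False), (1e-3, True), (1e-2, False)]
print(count_psms_at_fdr(toy, max_fdr=0.01))
```

Other estimators exist (for example, ones that double the decoy count), so counts from this sketch are not expected to match the chart exactly.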
NOTE: you must use a new sequest.params file associated with this version, because some parameters have been retired and the behavior of the "output_format" parameter has changed. A new parameter file can be generated using the "-p" command line option.
On the UWPR system, this executable is named sequest.2012010 which is invoked using runSequestQ.2012010. The plain runSequestQ is always symlinked to the most current version of the tool.
The previous version loaded the database into memory and searched each spectrum individually in each thread. (The previous version also pre-loaded all spectra into memory.) This version loads all spectra into memory and reads sequences from the database on demand, so the memory requirements are different: you need enough memory to hold all MS/MS spectra plus N sequences, where N is the number of threads. Memory requirements are also a function of the bin size defined by the "fragment_ion_tolerance" parameter. Searches may have to be split up to work around memory limitations. Here are examples of the memory requirements:
bin_size   RES    #spectra     RES = code + data (actual physical memory being consumed)
-------------------------
0.36       9.7g    72,773
0.36       3.4g    23,759
0.36       1.7g    11,811
0.36       724m     4,515
0.10       *       72,773     * memory limited on 16GB machine;
0.10       5.5g    23,759       searches must be divided up to run
0.10       2.9g    11,811
0.10       1.4g     4,515
0.01       *       72,773
0.01       *       23,759
0.01       *       11,811
0.01       9.6g     4,515
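To decide how far a search has to be split, the measurements above can be turned into a rough per-spectrum cost. The sketch below is a back-of-the-envelope estimate, not something SEQUEST provides. It assumes memory grows roughly linearly with the number of spectra, treats "g" as 1024 MB, uses the most expensive measured row for each bin size to stay conservative, and ignores memory needed by the rest of the system.

```python
import math

# (bin_size, RES in MB, number of spectra), copied from the table above.
# Rows marked "*" (runs that did not fit in memory) are left out.
MEASURED = [
    (0.36, 9.7 * 1024, 72773),
    (0.36, 3.4 * 1024, 23759),
    (0.36, 1.7 * 1024, 11811),
    (0.36, 724.0, 4515),
    (0.10, 5.5 * 1024, 23759),
    (0.10, 2.9 * 1024, 11811),
    (0.10, 1.4 * 1024, 4515),
    (0.01, 9.6 * 1024, 4515),
]


def worst_mb_per_spectrum(bin_size: float) -> float:
    """Highest measured MB-per-spectrum for this bin size (conservative)."""
    costs = [res / n for b, res, n in MEASURED if b == bin_size]
    if not costs:
        raise ValueError(f"no measurements for bin size {bin_size}")
    return max(costs)


def batches_needed(n_spectra: int, bin_size: float, budget_mb: float) -> int:
    """Estimate how many pieces a search must be split into."""
    per_batch = int(budget_mb // worst_mb_per_spectrum(bin_size))
    if per_batch < 1:
        raise ValueError("memory budget too small for even one spectrum")
    return math.ceil(n_spectra / per_batch)


# 72,773 spectra at 0.10 bin size with a 16 GB budget.
print(batches_needed(72773, 0.10, 16 * 1024))
```

In practice the budget should be set below the machine's physical memory, since the estimate assumes all of it is available to the search.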
Here are examples of runtimes (4,515 spectra, 0.36 bin size, ±3.0 Da tolerance, 8 threads, Intel Xeon E5345 2.33GHz CPU, top 50 peptides stored):
dbase   enzyme         variable mods   2011.01.1   2012.01.0   foldΔ
------------------------------------------------------------------------
                                        HH:MM:SS    HH:MM:SS
yeast   full tryptic   16M                  2:27        0:10   14.7x
yeast   full tryptic   16M, 80STY           6:05        0:58    6.3x
yeast   semi tryptic   16M                  9:27        1:32    6.2x
yeast   semi tryptic   16M, 80STY        6:26:47       19:42   19.6x  (6 hrs to 20 min)
yeast   no enzyme      16M                 24:07       14:43    1.6x
human   full tryptic   16M                 26:26        1:07   23.7x  (26 min to 1 min)
human   full tryptic   16M, 80STY          57:37        7:50    7.4x
human   semi tryptic   16M               1:49:14       16:38    6.6x
human   semi tryptic   16M, 80STY       57:54:17     2:32:15   22.8x  (2+ days to 2.5 hrs)
human   no enzyme      16M               4:39:59     2:51:19    1.6x
Since the underlying algorithm has changed fairly significantly (xcorr replaces Sp as the primary score function; this was optional previously but is mandatory now), here is a comparison of results for this UWPR2012010 version versus the previous UWPR2011010 version on datasets analyzed in various ways. For the first few datasets, the files were processed through the Percolator algorithm using separate target/decoy database searches. The remaining datasets were processed through the TPP using concatenated target/decoy database searches.
Percolator: number of unique peptide IDs at 1% q-value FDR cutoff.
All samples are C. elegans searched through the hermie pipeline
in the MacCoss lab where the FT and Orbi runs were also processed through
Bullseye. Percolator version 2.01.
TPP: reported ID counts at 1% error rate either by PeptideProphet or ProteinProphet
(TPP 4.5.2, accurate mass option) and by FDR analysis based on SEQUEST E-values:
Here are some E-value distribution plots showing how well the LMA algorithm (red line) fits the empirical xcorr data (blue dots). Per the analysis above, this new release does not perform as well as the previous release in E-value based FDR analysis. Whether this is inherent in the change in scoring or due to a lack of optimization of this calculation is unknown and will have to be revisited.
# comment lines begin with a '#' in the first position
[SEQUEST]
database_name = /some/path/yeast.fasta
decoy_search = 0                       ; 0=no (default), 1=concatenated search, 2=separate search
num_threads = 0                        ; 0=poll CPU to set num threads; else specify num threads directly (max 32)
#
# masses
#
peptide_mass_tolerance = 3.00
peptide_mass_units = 0                 ; 0=amu, 1=mmu, 2=ppm
mass_type_parent = 1                   ; 0=average masses, 1=monoisotopic masses
mass_type_fragment = 1                 ; 0=average masses, 1=monoisotopic masses
precursor_tolerance_type = 0           ; 0=MH+ (default), 1=precursor m/z
isotope_error = 0                      ; 0=off, 1= on -1/0/1/2/3 (standard C13 error), 2= -8/-4/0/4/8 (for +4/+8 labeling)
#
# enzyme
#
enzyme_number = 1                      ; choose from list at end of this params file
num_enzyme_termini = 2                 ; valid values are 1 (semi-digested), 2 (fully digested, default), 8 N-term, 9 C-term
max_num_internal_cleavage_sites = 2    ; maximum value is 5; for enzyme search
#
# Up to 6 differential modifications are supported
#
diff_search_options = 15.9949 M 0.0 X 0.0 X 0.0 X 0.0 X 0.0 X
diff_search_type = 0 0 0 0 0 0         ; 0=variable mod, 1=binary mod
diff_search_count = 4 4 4 4 4 4        ; max num of modified AA per each variable mod in a peptide
max_num_differential_per_peptide = 10  ; max num of total variable mods in a peptide (not including terminal mods)
#
# fragment ions
#
# ion trap ms/ms: 0.36 tolerance, 0.11 offset (mono masses)
# high res ms/ms: 0.01 tolerance, 0.00 offset (mono masses)
# historical: 1.0005079 tolerance, 0.00 offset (mono masses)
#
# ion_series line: 0 0 nl A B C D V W X Y Z (nl=use neutral loss); 1st two digits are unused
#
ion_series = 0 0 1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
theoretical_fragment_ions = 0          ; 0=default peak shape, 1=M peak only
fragment_ion_tolerance = 0.36          ; binning to use on fragment ions
fragment_bin_startoffset = 0.11        ; offset position to start the binning
#
# output
#
output_format = 2                      ; 0=sqt stdout (default), 1=sqt file, 2=out files
print_expect_score = 1                 ; 0=no, 1=yes, replace Sp with expect
num_output_lines = 10                  ; num peptide results to show
show_fragment_ions = 0                 ; 0=no, 1=yes
#
# mzXML parameters
#
scan_range = 0 0                       ; start and scan scan range to search; 0 as 1st entry ignores parameter
precursor_charge = 0 0                 ; precursor charge range to analyze; does not override mzXML charge; 0 as 1st entry ignores parameter
ms_level = 2                           ; MS level to analyze, valid are levels 2 (default) or 3
activation_method = ALL                ; activation method; used if activation method set; allowed ALL, CID, ECD, ETD, PQD, HCD, IRMPD
#
# misc parameters
#
digest_mass_range = 600.0 5000.0       ; MH+ peptide mass range to analyze
num_results = 50                       ; num prelim score results to store for xcorr analysis
skip_researching = 1                   ; for '.out' file output only, 0=search everything again (default), 1=don't search if .out exists
max_fragment_charge = 0                ; 0=use default else set maximum fragment charge state to input value
max_precursor_charge = 6               ; 0=use default (5) else set maximum precursor charge state to analyze
nucleotide_reading_frame = 0           ; 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six
clip_nterm_methionine = 0              ; 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine
#
# spectral processing
#
minimum_peaks = 5                      ; minimum num. of peaks in spectrum to search (default 5)
minimum_intensity = 0                  ; minimum intensity value to read in
remove_precursor_peak = 0              ; 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD)
remove_precursor_tolerance = 1.5       ; +- Da tolerance for precursor removal
#
# additional modifications
#
variable_C_terminus = 0.0
variable_N_terminus = 0.0
variable_C_terminus_distance = -1      ; -1=all peptides, 0=protein terminus, 1-N = maximum offset from C-terminus
variable_N_terminus_distance = -1      ; -1=all peptides, 0=protein terminus, 1-N = maximum offset from N-terminus
add_Cterm_peptide = 0.0
add_Nterm_peptide = 0.0
add_Cterm_protein = 0.0
add_Nterm_protein = 0.0
add_G_Glycine = 0.0000                 ; added to G - avg. 57.0513, mono. 57.02146
add_A_Alanine = 0.0000                 ; added to A - avg. 71.0779, mono. 71.03711
add_S_Serine = 0.0000                  ; added to S - avg. 87.0773, mono. 87.02303
add_P_Proline = 0.0000                 ; added to P - avg. 97.1152, mono. 97.05276
add_V_Valine = 0.0000                  ; added to V - avg. 99.1311, mono. 99.06841
add_T_Threonine = 0.0000               ; added to T - avg. 101.1038, mono. 101.04768
add_C_Cysteine = 57.021464             ; added to C - avg. 103.1429, mono. 103.00918
add_L_Leucine = 0.0000                 ; added to L - avg. 113.1576, mono. 113.08406
add_I_Isoleucine = 0.0000              ; added to I - avg. 113.1576, mono. 113.08406
add_X_LorI = 9000.0                    ; added to X - avg. 113.1576, mono. 113.08406
add_N_Asparagine = 0.0000              ; added to N - avg. 114.1026, mono. 114.04293
add_B_avg_NandD = 0.0000               ; added to B - avg. 114.5950, mono. 114.53494
add_D_Aspartic_Acid = 0.0000           ; added to D - avg. 115.0874, mono. 115.02694
add_Q_Glutamine = 0.0000               ; added to Q - avg. 128.1292, mono. 128.05858
add_K_Lysine = 0.0000                  ; added to K - avg. 128.1723, mono. 128.09496
add_Z_avg_QandE = 0.0000               ; added to Z - avg. 128.6216, mono. 128.55059
add_E_Glutamic_Acid = 0.0000           ; added to E - avg. 129.1140, mono. 129.04259
add_M_Methionine = 0.0000              ; added to M - avg. 131.1961, mono. 131.04048
add_O_Ornithine = 0.0000               ; added to O - avg. 132.1610, mono 132.08988
add_H_Histidine = 0.0000               ; added to H - avg. 137.1393, mono. 137.05891
add_F_Phenyalanine = 0.0000            ; added to F - avg. 147.1739, mono. 147.06841
add_R_Arginine = 0.0000                ; added to R - avg. 156.1857, mono. 156.10111
add_Y_Tyrosine = 0.0000                ; added to Y - avg. 163.0633, mono. 163.06333
add_W_Tryptophan = 0.0000              ; added to W - avg. 186.0793, mono. 186.07931
add_U_user_amino_acid = 0.0000         ; added to U - avg. 0.0000, mono. 0.00000
add_J_user_amino_acid = 0.0000         ; added to J - avg. 0.0000, mono. 0.00000
#
# SEQUEST_ENZYME_INFO _must_ be at the end of this parameters file
#
[SEQUEST_ENZYME_INFO]
0.  No_Enzyme              0      -           -
1.  Trypsin                1      KR          P
2.  Chymotrypsin           1      FWY         P
3.  Clostripain            1      R           -
4.  Cyanogen_Bromide       1      M           -
5.  IodosoBenzoate         1      W           -
6.  Proline_Endopept       1      P           -
7.  Staph_Protease         1      E           -
8.  Trypsin_K              1      K           P
9.  Trypsin_R              1      R           P
10. AspN                   0      D           -
11. Cymotryp/Modified      1      FWYL        P
12. Elastase               1      ALIV        P
13. Elastase/Tryp/Chymo    1      ALIVKRWFY   P
14. Trypsin                1      KRD         -
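For scripts that need to inspect or validate a parameters file (for example, to confirm "output_format" before launching a batch of searches), the layout above is simple enough to read directly. The following is a minimal sketch, not SEQUEST's own parser. It assumes one entry per line, '#' comments starting in the first column, ';' starting a trailing comment, and bracketed section names. The function name and the dictionary keys are illustrative, and the third enzyme column is stored without interpretation.

```python
def parse_params(text: str):
    """Parse a sequest.params-style file into (params, enzymes).

    params  maps parameter name -> value string (trailing ';' comment removed)
    enzymes maps enzyme number  -> dict built from the enzyme table columns
    """
    params, enzymes = {}, {}
    section = None
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # comment line or blank line
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "SEQUEST_ENZYME_INFO":
            # Example row: "1.  Trypsin  1  KR  P"
            number, name, flag, cleave, no_cleave = line.split()
            enzymes[int(number.rstrip("."))] = {
                "name": name,
                "flag": int(flag),
                "cleave": cleave,
                "no_cleave": no_cleave,
            }
        elif "=" in line:
            key, _, value = line.partition("=")
            params[key.strip()] = value.split(";", 1)[0].strip()
    return params, enzymes


# A short excerpt of the file above, used as sample input.
SAMPLE = """\
# comment lines begin with a '#' in the first position
[SEQUEST]
database_name = /some/path/yeast.fasta
decoy_search = 0          ; 0=no (default), 1=concatenated search, 2=separate search
fragment_ion_tolerance = 0.36   ; binning to use on fragment ions
output_format = 2         ; 0=sqt stdout (default), 1=sqt file, 2=out files
[SEQUEST_ENZYME_INFO]
0.  No_Enzyme  0  -   -
1.  Trypsin    1  KR  P
"""

params, enzymes = parse_params(SAMPLE)
print(params["output_format"], enzymes[1]["name"])
```

Values are kept as strings so that multi-valued entries such as "diff_search_options" survive unchanged; callers can convert individual fields as needed.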