2002 Omics Protein Mixture Dataset

NOTE: unfortunately these spectrum files are in the old SEQUEST .dta format.
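A .dta file is plain text. Its first line holds the singly protonated precursor mass [M+H]+ and the charge state assumed for that file, and each following line is a fragment m/z and intensity pair. This layout is the general SEQUEST convention and is not stated on this page. The Python sketch below reads such files directly out of one of the run .zip archives without unpacking them. It assumes whitespace-separated columns and treats every archive member ending in .dta as a spectrum. `DtaSpectrum`, `parse_dta` and `iter_dta_in_zip` are hypothetical names.

```python
import zipfile
from dataclasses import dataclass
from typing import IO, Iterator, List, Tuple, Union

PROTON_MASS = 1.007276  # Da, mass of a proton


@dataclass
class DtaSpectrum:
    mh_plus: float                    # precursor mass as [M+H]+ (header line, column 1)
    charge: int                       # charge state assumed for this file (header line, column 2)
    peaks: List[Tuple[float, float]]  # fragment (m/z, intensity) pairs

    @property
    def precursor_mz(self) -> float:
        # [M+H]+ -> m/z at charge z: (M + z*p) / z == (MH+ + (z - 1)*p) / z
        return (self.mh_plus + (self.charge - 1) * PROTON_MASS) / self.charge


def parse_dta(text: str) -> DtaSpectrum:
    """Parse the text of one .dta file (assumed whitespace-separated columns)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty .dta content")
    mh_plus, charge = float(rows[0][0]), int(float(rows[0][1]))
    peaks = [(float(r[0]), float(r[1])) for r in rows[1:]]
    return DtaSpectrum(mh_plus, charge, peaks)


def iter_dta_in_zip(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, DtaSpectrum]]:
    """Yield (member name, spectrum) for every .dta member of a run archive."""
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".dta"):
                # latin-1 decodes any byte sequence, so stray characters never raise
                yield name, parse_dta(zf.read(name).decode("latin-1"))
```

For example, `for name, spectrum in iter_dta_in_zip('sergey_digest_A_full_01.zip'): ...` walks one run. Because each spectrum was searched as both 2+ and 3+, expect the same scan range to appear twice, once with a .2 and once with a .3 ending.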

A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, and E. Kolker, "Experimental Protein Mixture for Validating Tandem Mass Spectra Analysis", OMICS 6(2), 207-212 (2002). https://pubmed.ncbi.nlm.nih.gov/12143966/

If you're interested in this data, email me (engj@uw.edu) or anyone else at the UWPR and we will send you a download link. The files are not linked here for direct download, to avoid constant downloads by data crawlers.

The content:

  1. 22 .zip files, one per LC/MS/MS run: 14 runs on control mixture A and 8 on control mixture B. For example, the file 'sergey_digest_A_full_01.zip' contains all MS/MS (.dta) files generated in the first LC/MS/MS run on mixture A.
  2. annotation file, 'list_of_positives.txt'. This file lists all MS/MS spectra that were correctly identified by SEQUEST. The format:

    spectrum charge_state protein peptide

    The ending '...2/3' in the spectrum name indicates that the same MS/MS spectrum was searched with SEQUEST twice: once assuming a 2+ precursor ion (spectrum name ending in .2) and once assuming a 3+ precursor ion (spectrum name ending in .3). The number following the spectrum name gives the correct charge state.

    For example,

    ./sergei_digest_A_full_01.0469.0471.2/3 2 sp|P02666|CASB_BOVIN K.VKEAMAPK.H

    means that the spectrum 'sergei_digest_A_full_01.0469.0471.2' was assigned the peptide K.VKEAMAPK.H from the CASB_BOVIN protein (one of the 18 control proteins), and that this identification is correct.

  3. database file, 'control_mixture.db' with the sequences of the 18 control proteins and known contaminants.
  4. Folder called SEQUEST with
    • 22 SEQUEST output files (html files)
    • sequest.xls, which contains all assignments from the 22 SEQUEST output files in Excel format. The most important columns are: B (spectrum name), C (precursor ion mass), D (difference between measured and theoretical peptide mass), E (Xcorr), F (delta Cn), H (Sp rank), K (protein name), and M (peptide sequence).
  5. omics_6_2002.pdf - the manuscript describing the data set.
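The annotation format described for 'list_of_positives.txt' can be turned into records with a short Python sketch. It assumes the four fields are whitespace-separated, that protein names contain no spaces, and that peptides use the 'X.PEPTIDE.Y' notation with flanking residues, as in the CASB_BOVIN example. `Positive`, `resolve_spectrum` and `parse_positives` are hypothetical names. `resolve_spectrum` strips the leading './' and rewrites the '.2/3' ending to the correct charge, so the result matches the spectrum name given in the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Positive:
    spectrum: str  # spectrum name with the correct charge suffix, e.g. '....0471.2'
    charge: int    # correct precursor charge state (second column)
    protein: str   # e.g. 'sp|P02666|CASB_BOVIN'
    peptide: str   # bare sequence without flanking residues
    prev_aa: str   # residue before the peptide
    next_aa: str   # residue after the peptide


def resolve_spectrum(raw: str, charge: int) -> str:
    """Drop a leading './' and replace a '.2/3' ending with the correct charge."""
    name = raw[2:] if raw.startswith("./") else raw
    if name.endswith(".2/3"):
        name = name[: -len("2/3")] + str(charge)
    return name  # names without a '.2/3' ending are returned unchanged


def parse_positives(lines: Iterable[str]) -> List[Positive]:
    """Parse 'spectrum charge_state protein peptide' lines into records."""
    out: List[Positive] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        raw, charge_s, protein, peptide = fields[:4]
        charge = int(charge_s)
        # 'K.VKEAMAPK.H' -> 'K', 'VKEAMAPK', 'H'
        prev_aa, rest = peptide.split(".", 1)
        core, next_aa = rest.rsplit(".", 1)
        out.append(Positive(resolve_spectrum(raw, charge), charge, protein, core, prev_aa, next_aa))
    return out
```

A set built from the resolved `spectrum` names can then label every spectrum in a run as a known positive or not, for example when comparing the SEQUEST scores in sequest.xls against the annotation.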