Sequencing from the C-terminus: One approach to manually determining a peptide sequence from CID data is to begin at the C-terminus. To begin sequencing a tryptic peptide, I make an initial assumption that the C-terminus of the peptide is either lysine or arginine. This assumption is usually true, except for peptides produced by non-tryptic cleavage (due to contaminating chymotryptic activity) or peptides encompassing the C-terminus of the original protein, where the C-terminal residue of the protein is not lysine or arginine. The y1 ion is calculated by adding 19.018 u (three hydrogens and one oxygen) to the residue mass of lysine or arginine, which gives 147.113 u for lysine and 175.119 u for arginine. If either mass is present, make a note of it. If you are dealing with ion trap data, the y1 ions will be below the low mass cutoff; however, you often find the corresponding high m/z b-type ion containing all of the residues except the arginine or lysine at the C-terminus (see above). These b-type ions are calculated by subtracting 17.002 u (one oxygen and one hydrogen) from the peptide molecular weight, and then subtracting from this value the residue mass of arginine or lysine. If a b-type ion corresponding to the loss of arginine or lysine is found, make a note of it. Sometimes I don't find such ions in ion trap data, in which case I make a note of what the calculated values would be for these ions (to be used later). If a y1 ion for lysine or arginine is found, I then subtract the y1 mass from the masses of the higher m/z product ions, and check the residue mass table to see if any of the mass differences correspond to an amino acid.
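The arithmetic for this first step can be sketched in a few lines of Python. This is only an illustration, not a tool described in this text: the function names and the peak-matching helper are hypothetical, and the offsets (19.018 u and 17.002 u) are the ones quoted above. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (u) of the two expected tryptic C-termini.
RESIDUE = {"K": 128.09496, "R": 156.10111}

Y_OFFSET = 19.018  # three hydrogens plus one oxygen, as quoted in the text
B_OFFSET = 17.002  # one oxygen plus one hydrogen, as quoted in the text


def y1_mass(residue):
    """Calculated y1 ion for a C-terminal K or R."""
    return RESIDUE[residue] + Y_OFFSET


def high_b_mass(peptide_mw, residue):
    """Calculated high m/z b-type ion containing everything except the C-terminal K or R."""
    return peptide_mw - B_OFFSET - RESIDUE[residue]


def find_peak(peaks, target, tol):
    """Return the observed peak closest to target within tol, or None (hypothetical helper)."""
    hits = [p for p in peaks if abs(p - target) <= tol]
    return min(hits, key=lambda p: abs(p - target)) if hits else None
```

With these definitions, y1_mass("K") and y1_mass("R") come to roughly 147.113 and 175.119, and find_peak can be run over a peak list with whatever tolerance suits the instrument. If nothing is found, the calculated values are still worth writing down for later.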
If I can make any amino acid residue mass jumps up from the y1 ion, I make a note of what these putative y2 ions might be, and then subtract these residue masses from the aforementioned high mass b-type ions corresponding to loss of arginine or lysine (for ion trap data; for Qtof data I don't bother, since I rarely see high m/z b-type ions). I keep stepping up to the y3 ion and higher, while at the same time checking to see if the corresponding high m/z b-type ion is present. Eventually, as amino acid residue masses are added to the y-type ion series, it passes by the b-type ion series from which the same amino acid residue masses are subtracted. Of course, any partial sequence that has both the high m/z b-type ions and the corresponding y-type ions is the favored partial sequence. Eventually, you will get a complete sequence for the peptide, where your hypothesized sequence has a calculated mass that equals the observed peptide mass (within the error tolerance of the peptide mass measurement). Often you cannot get a complete sequence all the way to the N-terminus, since it's not uncommon for a CID spectrum to lack fragmentation between the first and second amino acids at the N-terminus. In that case, the N-terminus of your proposed sequence is not a sequence, but is instead the combined residue mass of the two amino acids. You should, however, make sure that this unsequenced mass at the N-terminus corresponds to the sum of two amino acid residue masses. For example, an unsequenced N-terminal mass of 150 u is not possible in the absence of the additional mass of a post-translational modification.
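The check on the unsequenced N-terminal mass is easy to automate. The following is a minimal sketch, assuming standard monoisotopic residue masses for the 20 common amino acids and no modifications; the function name residue_pairs is hypothetical.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (u) of the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_pairs(mass, tol):
    """List residue pairs (order unknown) whose summed residue mass is within tol of mass."""
    return [
        a + b
        for a, b in combinations_with_replacement(sorted(RESIDUE), 2)
        if abs(RESIDUE[a] + RESIDUE[b] - mass) <= tol
    ]
```

Calling residue_pairs(150.0, 0.4) returns an empty list, which is the 150 u case mentioned above, whereas residue_pairs(128.06, 0.02) returns only the alanine plus glycine pair. The order of the two residues within a pair cannot be determined from the mass alone.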
Sequencing from the middle somewhere, identifying the N-terminus, and proceeding to the C-terminus: A different approach is to get some partial sequence from the middle of the peptide and try to connect this partial sequence to the peptide N-terminus. For Qtof or triple quadrupole data one often finds a short stretch of fairly intense ions at an m/z greater than the precursor m/z, where the mass differences between the ions in the series correspond to amino acid residue masses (these are the so-called sequence tags introduced by Matthias Mann). In principle, one does not know if these are b-type or y-type ions (and hence, whether the partial sequence goes forward or backwards), but for Qtof and triple quad data for tryptic peptides it is usually safe to guess that this is a partial y-type ion series. If I find such a short series of ions located at higher m/z than the precursor, I subtract the highest mass ion in this series from the mass corresponding to the peptide mass plus 2.016 u (this is the mass of two hydrogens; for now don't worry about why I do this). This mass difference would correspond to a hypothetical lower mass b-type ion. Often a high m/z y-type ion series will encompass all but the two N-terminal amino acids, so that the aforementioned mass difference corresponds to a b2 ion. If such an ion is present, I'll also check to see if there is another ion 27.995 u lower, which would possibly be the matching a2 ion. If I find all of these ions (the putative high m/z y-type ions plus the alleged low m/z b-type ion, plus the a-type ion), I will then start thinking I'm on the right track. If you are lucky, the high m/z y-type ion series extends all the way to the N-terminus, in which case this mass difference corresponds to a b1 ion (an amino acid residue mass plus a hydrogen). Don't bother looking for a b1 ion; they are almost never observed.
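The complementary-ion calculation can be written out as follows. This is a sketch under the assumption that all fragment ions are singly charged; the function names are hypothetical, and the constants are the 2.016 u and 27.995 u values given above.

```python
TWO_H = 2.016  # mass of two hydrogens, as quoted in the text
CO = 27.995    # mass difference between a b-type ion and its a-type ion


def complementary_b(peptide_mw, y_mass):
    """Hypothetical low m/z b-type ion that pairs with an observed singly charged y-type ion."""
    return peptide_mw + TWO_H - y_mass


def a_from_b(b_mass):
    """a-type ion expected 27.995 u below a b-type ion."""
    return b_mass - CO
```

As an illustration with made-up numbers, a peptide of molecular weight 1180.543 u whose highest y-type ion in the series sits at 947.450 would have a calculated b-type ion near 235.109 and a matching a-type ion near 207.114. Those are the two low m/z values to look for in the spectrum.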
If I really have found a partial y-type ion series, I then try to identify the low mass b-type ion series that corresponds to the high m/z y-type ion series. For Qtof and triple quad data of tryptic peptides, the b-type ion series usually starts to peter out, while at the same time the y-type ion series (as I work towards the C-terminus) starts to head to the low m/z end of the spectrum. This is where you start to run into most of the ambiguities in Qtof and triple quad data, since you've usually lost the b-type ion series at the high m/z end and your sequence determination is now relying solely on ions at the low m/z end of the spectrum, where one usually observes many more fragment ions of different types (i.e., they could be internal ions, b-type, a-type, y-type, etc.). Anyway, you keep trying to extend the high m/z y-type ion series until you reach a y1 ion for the C-terminal lysine or arginine (147.113 or 175.119 u, respectively).
Checking a proposed sequence: If you think you've deduced the sequence, check off the ions that would correspond to the y-type, b-type, and a-type ions, as well as the losses of ammonia or water from these ion types. Then check to see if any of the remaining unaccounted ions are possibly due to internal fragmentations. Internal fragments are usually fairly short (less than five or so residues), and are calculated by summing the amino acid residue masses together and adding the mass of one hydrogen. These ions can also lose water, ammonia, and/or CO, so check for the presence of these, too (they're usually less intense than the original internal fragment). In particular, check for internal fragments that have proline at the N-terminus of the fragment (e.g., the sequence FSTPEDLMNK would very likely have the internal fragments PE, PED, and PEDL). At this point you should have accounted for most of the more abundant ions; in particular, you should be able to account for the more abundant ions at an m/z greater than the precursor m/z. There will always be a few ions left over.
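Listing the proline-initiated internal fragments for a proposed sequence is a mechanical job, sketched below. The function name, the default maximum length of four residues, and the rounded hydrogen mass of 1.008 u are assumptions made for this illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (u) of the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
HYDROGEN = 1.008  # rounded mass of one hydrogen (assumed precision)


def internal_fragments(sequence, max_len=4, n_term="P"):
    """Map each internal fragment starting at n_term to its calculated mass.

    Fragments contain 2..max_len residues and never include the first
    or last residue of the peptide.
    """
    frags = {}
    last_end = len(sequence) - 1  # slice end that still excludes the C-terminal residue
    for start in range(1, len(sequence) - 1):
        if sequence[start] != n_term:
            continue
        for end in range(start + 2, min(start + max_len, last_end) + 1):
            frag = sequence[start:end]
            frags[frag] = sum(RESIDUE[r] for r in frag) + HYDROGEN
    return frags
```

For FSTPEDLMNK this yields PE, PED, and PEDL, with calculated masses of roughly 227.103, 342.130, and 455.214. Subtracting the mass of water, ammonia, or CO from each value gives the neutral-loss ions to check next.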
Further verification using an ion trap: For ion trap data using a nanospray source, it's fairly easy to obtain MS3 spectra, where you select a fragment ion in the MS/MS spectrum for further CID and analysis. There has been some gossip that CID performed on a b-type ion can result in rearrangements in the ion trap, and for this reason I prefer to select y-type ions (y-type ions presumably have the same ion structure as protonated peptides, and should behave themselves). Since I usually don't know which ions are which, and since nanospray affords significant spray time, I generally obtain MS3 spectra for several fragment ions.
I have three ways of selecting a fragment ion from an MS/MS spectrum for MS3 analysis.
Once I've interpreted the MS/MS spectrum, I can then determine if the fragment ion precursors for my MS3 spectra are y-type ions. If, according to my interpretation of the MS/MS spectrum, I have an MS3 spectrum of a y-type ion, I then have a hypothetical structure for this y-type ion, and I can check to see if the fragmentation pattern in the MS3 spectrum matches what would be predicted from the supposed structure. If the deduced peptide sequence accounts for the major ions in both the MS/MS and MS3 spectra, then there is a good chance I've stumbled onto the correct sequence.
Further verification using a Qtof: MS3 spectra cannot be obtained on a Qtof; however, the high mass accuracy can be used to eliminate some of the ambiguity. Ion traps have a mass accuracy of +/- 0.4 u, whereas in my hands I get +/- 0.02 u with the Qtof. When interpreting MS/MS data it is a very good idea to keep these mass accuracies in mind. For example, if the difference between two ions in a spectrum is 115.2 u, on an ion trap this would fit with the residue mass of aspartic acid. However, for Qtof data this mass difference is too high for aspartic acid, since the observed mass difference is outside of the calculated residue mass of 115.03 u plus a tolerance of 0.02 u. Hence, fewer possible sequences can be generated to account for the data.
Digestion of proteins in the presence of mixtures of 16O and 18O water has been used to tag the C-terminus of proteolytic peptides. For peptide sequencing by mass spectrometry, tryptic digestion is typically done in buffer made with water containing a 1:1 mixture of these isotopes, and the MS/MS spectrum is acquired with the precursor resolution reduced so that precursor ions containing both isotopes are subjected to CID. Because the C-terminus is tagged with a stable isotope doublet, fragment ions containing the C-terminus (e.g., y-type ions) exhibit a 2 u doublet, and fragments lacking the C-terminus do not. Sorting out which fragment ions contain the C-terminus can greatly reduce the ambiguity of a sequence determination. Of course, this stable isotope trick can be done with any instrumentation, but for some reason it has become popular with the Qtof people.
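Both Qtof checks described in this section come down to comparing mass differences against a tolerance, as the sketch below shows. The function names are hypothetical. The doublet spacing of 2.004 u is the monoisotopic difference between 18O and 16O for a singly charged fragment, which is an added assumption here; the text above simply calls it a 2 u doublet.

```python
# Monoisotopic residue masses (u) of the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_matching(delta, tol):
    """Residues whose mass lies within tol of an observed mass difference."""
    return sorted(r for r, m in RESIDUE.items() if abs(m - delta) <= tol)


def c_terminal_doublets(peaks, spacing=2.004, tol=0.02):
    """Pairs of peaks separated by the assumed 16O/18O spacing (singly charged ions)."""
    ordered = sorted(peaks)
    return [
        (light, heavy)
        for light in ordered
        for heavy in ordered
        if abs((heavy - light) - spacing) <= tol
    ]
```

With the ion trap tolerance, residues_matching(115.2, 0.4) returns aspartic acid, while with the Qtof tolerance residues_matching(115.2, 0.02) returns nothing. The doublet finder flags candidate C-terminal fragments, such as a peak pair at 147.113 and 149.117, and leaves unpaired peaks alone.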