How to sequence tryptic peptides using low energy CID data

By Rich Johnson


See the ASMS link (*the page this link went to no longer exists - JIV 4/01) for official mass spec nomenclature. Collision induced dissociation (CID), technically speaking, does not imply any sort of precursor mass selection or fragment ion detection; rather, it just refers to the process of an ion bumping into gas atoms so hard that it breaks apart. Likewise, tandem mass spectrometry (MS/MS) does not necessarily mean that CID was employed to induce dissociations -- one could also use lasers or surfaces, for example. In any case, I will erroneously use "CID spectra" and "MS/MS spectra" interchangeably, and by both of these terms I mean that a precursor (intact) peptide ion is mass selected and allowed to undergo low energy (a few eV) collisions with neutral gas such that product ions are produced and subsequently measured. By "MS3 spectrum" I mean that an intact precursor ion is mass selected and undergoes CID, one of the resulting fragment ions is mass selected for further CID, and then the resulting product ions are measured.

Why trypsin?

The following section gives a brief reminder on how to sequence tryptic peptides using mass spectral data. The reason we're focusing on peptides derived from tryptic proteolysis is two-fold. First, usable CID data usually can only be obtained from peptides less than 2-3 kDa, and trypsin generally produces peptides of this size. Using high resolution FTMS instrumentation, partial sequences have been deduced from CID spectra of large peptides or proteins, but in most cases the data one obtains for higher mass molecules is a mess. It seems that the easiest spectra to interpret are those obtained from doubly-charged precursors, where the resulting fragment ions are mostly singly-charged with only a few doubly-charged fragments. Doubly-charged precursors also tend to fragment so that most of the peptide bonds break with comparable frequency, which makes it more likely that one can derive a complete sequence. Spectra obtained from triply-charged precursors are less likely to provide sufficient information to derive a complete sequence; however, in an ion trap (see below) it is possible to extend the sequence information of a triply-charged ion by acquiring MS3 spectra of the doubly-charged y-type ions (perhaps this will become more clear as you read on).

The second reason for using trypsin proteolysis has to do with the desirability of placing basic residues, notably arginine, at the C-terminus of a peptide. It is a general observation in low energy CID that the presence of arginine in the middle of a peptide will often result in the absence of fragmentations at several contiguous peptide bonds adjacent to the arginine. Trypsin cleaves on the C-terminal side of arginine and lysine, so tryptic peptides carry their basic residue at the C-terminus and fragment in a more predictable manner throughout the length of the peptide.
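As a small illustration of the cleavage rule, here is a minimal Python sketch of an in silico tryptic digest. It assumes only the simplified rule stated above (cut after every K or R) and ignores missed cleavages and any other exceptions; the function name and the example sequence are made up for illustration.

```python
def tryptic_digest(protein):
    """Split a one-letter protein sequence after every K or R.

    Simplified rule only: no missed cleavages, no special cases.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Whatever is left is the C-terminal peptide of the protein,
        # which need not end in K or R.
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence; every peptide except possibly the last ends in K or R.
print(tryptic_digest("MSAMPLERGGAKTLW"))
```

Each peptide produced this way (other than the protein's own C-terminal peptide) has its basic residue at the C-terminus, which is the property the paragraph above relies on.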


Sequence-specific fragment ions

Low energy CID of peptides results in a limited number of fragment ions. The key sequence-specific fragment ions are the y-type and b-type ions, and both of these can lose water (18.011 u) or ammonia (17.027 u). Usually, the y-type and b-type ions are more abundant than their corresponding losses of water or ammonia; however, this is not always the case. For example, peptides with an N-terminal glutamine will sometimes yield spectra where the b-ions are absent or of very low abundance, while the corresponding ions due to loss of ammonia are fairly intense. Similar observations can sometimes be made for peptides with an N-terminal glutamic acid or peptides rich in serine or threonine. In addition to losing water or ammonia, the b-type ions can also lose CO (27.995 u) to give the so-called a-type ions, although these ions seem to occur most commonly for the lower mass fragments containing two, three, or four of the N-terminal residues. The bond N-terminal to a proline residue seems to be particularly labile, whereas the bond on the C-terminal side is not. For peptides containing proline, this produces a pattern where a y-type series may have a particularly abundant ion due to cleavage on the N-terminal side of proline, while the y-type ion resulting from cleavage on the C-terminal side of proline has a much reduced abundance or is sometimes absent. A similar phenomenon may be observed for peptides containing glycine, where the cleavage on the C-terminal side results in ions of reduced abundance.
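To make the arithmetic concrete, here is a minimal Python sketch that lists singly-charged b- and y-type ions together with the a-type ions and the water and ammonia losses mentioned above. It assumes standard monoisotopic residue masses, unmodified residues, and a charge of +1; the peptide SAMPLER and the function names are hypothetical and only serve as an example.

```python
# Monoisotopic residue masses (u) for the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, AMMONIA, CO = 1.007276, 18.010565, 17.026549, 27.994915


def b_ions(peptide):
    """Singly-charged b1..b(n-1): N-terminal residues plus a proton."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly-charged y1..y(n-1): C-terminal residues plus water plus a proton."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        masses.append(total + WATER + PROTON)
    return masses


peptide = "SAMPLER"  # hypothetical example sequence
b, y = b_ions(peptide), y_ions(peptide)
for i, (bm, ym) in enumerate(zip(b, y), start=1):
    # a-type ion = b-type ion minus CO; neutral losses shown for the y-type ion.
    print(f"b{i} {bm:9.4f}  a{i} {bm - CO:9.4f}  "
          f"y{i} {ym:9.4f}  y{i}-H2O {ym - WATER:9.4f}  y{i}-NH3 {ym - AMMONIA:9.4f}")
```

For a tryptic peptide ending in arginine, the first y-type ion computed this way falls at about m/z 175.119; for one ending in lysine, at about 147.113.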


Non-sequence-specific fragment ions -- internal and immonium ions

To complicate things further, there are a few non-sequence-specific ion types to consider. It is not uncommon to find fragment ions that result from y- and b-type cleavages at two peptide bonds, which yields the so-called internal fragment ions that contain neither the C- nor N-terminus of the peptide. These internal fragment ions can also lose water, ammonia, or CO. Often the more abundant internal fragment ions contain proline at the N-terminus of the fragment. Immonium ions cannot be used for sequence determination, but they are indicative of the amino acids present in the sequence. One should note, however, that the absence of a particular immonium ion cannot be taken as proof for the absence of the corresponding amino acid in the sequence. The converse of this is generally true; i.e., the presence of an immonium ion of sufficient abundance typically means that the corresponding amino acid is present in the sequence.
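As a rough illustration, the sketch below computes immonium ion m/z values and enumerates b-type internal fragments for a peptide. It assumes the usual convention that an immonium ion corresponds to the residue mass minus CO plus a proton, and that a b-type internal fragment is the sum of its residue masses plus a proton; the function names and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (u) for the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, CO = 1.007276, 27.994915


def immonium_mz(aa):
    """Immonium ion m/z for one residue (residue mass - CO + proton)."""
    return RESIDUE[aa] - CO + PROTON


def internal_fragments(peptide, min_len=2):
    """b-type internal fragments: stretches containing neither terminus."""
    n = len(peptide)
    fragments = {}
    for start in range(1, n - 1):              # skip the N-terminal residue
        for end in range(start + min_len, n):  # slice stops before the C-terminal residue
            frag = peptide[start:end]
            fragments[frag] = sum(RESIDUE[aa] for aa in frag) + PROTON
    return fragments


peptide = "SAMPLER"  # hypothetical example sequence
for aa in sorted(set(peptide)):
    print(f"immonium {aa}: {immonium_mz(aa):.4f}")
for frag, mz in internal_fragments(peptide).items():
    flag = "  (proline at N-terminus)" if frag.startswith("P") else ""
    print(f"internal {frag}: {mz:.4f}{flag}")
```

The proline flag simply marks the fragments that, according to the text above, are often the more abundant ones; the sketch says nothing about actual intensities.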


Peculiarities of ion trap versus Qtof (or triple quad) CID data of tryptic peptides

The prototypical CID spectrum of a tryptic peptide acquired using a triple quadrupole or Qtof contains a continuous series of y-type ions. The b-type ions are usually seen only at lower masses below the precursor m/z value and contain only a few of the N-terminal amino acids. Usually one finds fairly intense b2 ions that are often paired with a less intense a2 ion 27.995 u lower; as you go to b3 and b4 the ion intensity will fall off and eventually disappear.
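The b2/a2 pairing suggests a simple check when looking at the low mass end of a spectrum: search for pairs of peaks separated by the mass of CO. The following is only a sketch; the peak list, the tolerance, and the function name are assumptions chosen for illustration, and intensities are ignored.

```python
CO = 27.994915  # monoisotopic mass of CO (u)


def find_a_b_pairs(peaks, tol=0.01):
    """Return (a, b) candidate pairs: peaks separated by CO within tol (u)."""
    peaks = sorted(peaks)
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            if abs((high - low) - CO) <= tol:
                pairs.append((low, high))
    return pairs


# Hypothetical low-mass peak list (m/z values only).
peaks = [101.071, 129.066, 175.119, 246.156]
print(find_a_b_pairs(peaks))
```

A pair found this way is only a candidate a2/b2 pair; it still has to make sense as the sum of two residue masses plus a proton.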

Ion trap CID data of tryptic peptides is a bit different in that one often finds a continuous series of both b-type and y-type ions throughout the spectrum. This difference in intensity of b-type ions compared to Qtofs and triple quads is presumably due to fragment ions being out of resonance with the excitation frequency in the trap -- once the precursor breaks apart in the trap, its fragments cease to undergo further fragmentation. This is in contrast to quadrupole collision cells, where fragment ions continue to bump into neutral gas atoms with higher collision energy. Presumably, b-type ions are less stable than y-type ions, and when they form in a triple quad or Qtof, further collisions involving these fragments cause them to break down into smaller bits. I initially had reservations about being able to perform de novo sequencing of peptides using ion trap CID data, since the low mass end of an ion trap CID spectrum is missing (ions below about one third of the precursor m/z are lost in a trap). Hence, the low mass y-type ions that are critical for sequence determinations (as when using triple quadrupole and Qtof data) are missing in ion trap data. However, this lack of low mass y-type ions in ion trap data is compensated by the fact that one usually observes the corresponding higher mass b-type ions, which can be used to delineate the sequence at the C-terminus of the peptide. The downside of having fragment ions fall out of resonance with the activation frequency is that ion trap CID data will sometimes contain only a few ions that are very intense. For example, a peptide with proline in the third position has a particularly labile bond between the second and third residue, and in the ion trap this bond breaks very readily whereas the remaining peptide bonds break very infrequently. The end result is a spectrum containing a very intense y-type ion due to cleavage between the second and third residue, and little else. This can make it difficult to make a sequence determination.
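The low mass cutoff is easy to apply to a list of predicted fragments. The sketch below assumes the rough "one third of the precursor m/z" figure quoted above, which is only an approximation; the function name and the numbers in the example are hypothetical.

```python
def trap_observable(fragment_mzs, precursor_mz, cutoff_fraction=1 / 3):
    """Keep only fragments at or above the approximate ion trap low mass cutoff."""
    cutoff = precursor_mz * cutoff_fraction
    return [mz for mz in fragment_mzs if mz >= cutoff]


# Hypothetical doubly-charged precursor at m/z 600 with a few predicted fragments.
fragments = [147.11, 175.12, 250.0, 800.3]
print(trap_observable(fragments, precursor_mz=600.0))
```

In this example the two lowest fragments, which would be typical low mass y-type ions, fall below the cutoff of about m/z 200 and would not be expected in the ion trap spectrum.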


Annoying things to remember when sequencing tryptic peptides using low energy CID


Sequencing tryptic peptides using low energy CID data

When manually sequencing a peptide from CID data, I generally take one of two approaches (I'm sure there are others, but this is what I do):

Once a sequence is hypothesized, it should be tested to see if it can be disproved somehow: