Created: 1st September 2000, last updated: 30th October 2000, © 2000 ABRF
Curtis G. Croker, John O. Pearcy, Douglas C. Stahl, Roger E. Moore, Denise A. Keen, and Terry D. Lee
Beckman Research Institute of the City of Hope, Duarte, CA
Mass spectrometry has become an indispensable analytical tool for studies related to the structure and function of peptides and proteins. The variety of analytical methods, the range of instrument capabilities, and the complexity of the data obtained make it difficult for most laboratories to acquire the necessary expertise to make optimal use of their instrumentation. We describe an expert system approach to automating specific types of analyses in a way that makes it easier to transfer the capability to do specific experiments to other laboratories. Central to the approach is the creation of a computer program (ie, a virtual instrument) that controls the operation of physical components, analyzes incoming data, automatically adjusts instrument parameters to achieve the goal of the analysis, and reports the results. By interacting with the mass spectrometer through the computer operating system, it is possible to add useful functions to the system without altering any of the manufacturer-controlled data system software. The usefulness of this approach is illustrated by the automation of experiments to confirm the sequences of synthetic peptides and perform LC/MS/MS peak parking experiments and real-time database searches. (J Biomol Tech 2000;11:135-141)
Key Words: automation, data-controlled analysis, expert systems, liquid chromatography, tandem mass spectrometry, virtual instruments.
Address correspondence and reprint requests to: Terry D. Lee, Division of Immunology, Beckman Research Institute of the City of Hope, 1450 East Duarte Road, Duarte, CA 91010 (email: tdlee@coh.org).
In recent years, tandem mass spectrometry (also known as mass spectrometry/mass spectrometry, MS/MS, or MS2) has become an indispensable tool for peptide and protein structural analysis. The ability to select one mass-to-charge (m/z) value, selectively fragment those ions, and record the resulting spectrum makes it possible to obtain structural data on peptides even when they are components of complex mixtures. Mass spectrometer data systems have sophisticated programs for controlling the instrument operation and analyzing the collected data. Most provide some degree of programming capability that allows users to configure the system for particular types of analyses. A good example is the ability of some systems to collect both full-range mass spectra and fragment ion spectra (MS/MS) for mixture components eluting from a liquid chromatography (LC) column. To accomplish this, full-range mass spectra are analyzed in real time, and ions are selected for collision-induced dissociation. Such data-controlled [1,2] or data-dependent [3] operations mark the beginning of more "intelligent" methods of data collection. The rate of further development of these data-controlled analyses has been limited by the rate at which mass spectrometer manufacturers can incorporate new features into their data system software.
More rapid development of data-controlled analyses could occur if mass spectrometer companies provided an interface in their data system software that made it possible for users to interact with the mass spectrometer using a computer program. This would allow the development of problem-specific applications and enable the user to more easily incorporate the mass spectrometer into a larger analytical system. The instrument manufacturer would retain full control of its own data system software, particularly that which interacts directly with the hardware components of the instrument. The mass spectrometry community as a whole could then contribute to the development of new functions that could be readily distributed to other researchers.
We demonstrated the feasibility of this approach by incorporating a ThermoQuest LCQ mass spectrometer (San Jose, CA) as part of an automated system for peptide analysis. National Instruments' LabVIEW programming environment (Houston, TX) [4] was used to create a program module that interacted with the LCQ software directly through the computer operating system. In LabVIEW, such program modules are called virtual instruments (VIs). VIs can be linked together using a graphic interface to form complex analytical systems. The VI approach has also proved useful for developing rule-based decision-making programs or expert systems [5] that have been used to analyze incoming MS data and automatically adjust instrument parameters and mode of operation. The term "expert virtual instrument" (EVI) refers to a computer program that combines the elements of instrument control with real-time data analysis.
All software was created using National Instruments' LabVIEW version 5.1 operating under Windows NT version 4.0, service pack 5 (Microsoft, Redmond, WA).
All mass spectra were acquired on a ThermoQuest LCQ mass spectrometer equipped with a custom microscale electrospray (ES) interface for either nanospray or capillary liquid chromatography-mass spectrometry (LC/MS) operation. The data system was operated using either the ThermoQuest LCQ software version 1.2 or Xcalibur version 1.0. Nanospray needles were pulled from 1 mm outer diameter × 0.7 mm inner diameter borosilicate glass (World Precision Instruments, Sarasota, FL) using a Sutter Instruments Model 2000 laser puller (Novato, CA). Sample solutions (1-5 µl, approximately 5 pmol/µl in 49:49:2 water:acetonitrile:acetic acid) were loaded into the needles using a gel-loading micropipette tip. The needles were mounted on a Pt wire attached to an x-y-z positioner. The Pt wire served as the support as well as the electrode for the ES potential (500-700 V). The ES needles for liquid chromatography-tandem mass spectrometry (LC/MS/MS) analyses were pulled from 350 µm outer diameter × 150 µm inner diameter fused silica capillary tubing and packed with Vydac 218TP C18 reverse-phase chromatography support (The Separations Group, Hesperia, CA) as previously described [6].
All separations were done using a custom-built gradient loop LC system. System components are essentially the same as previously described [7], except that a LabVIEW program was used as a controller. A detailed parts list and assembly instructions for the LC system and the Windows NT compatible control software can be downloaded from the Internet (http://www.cityofhope.org/microseq/download.html).
Bruce Kaplan (City of Hope Peptide and DNA Synthesis Facility, Duarte, CA) synthesized the synthetic peptide (SHLVEALYLVCG). The phosphopeptide (LFTGHPET*LEK, where T* denotes phosphothreonine) was obtained from the Protein/DNA Technology Center at The Rockefeller University as part of the 1997 Association of Biomolecular Resource Facilities (ABRF) Mass Spectrometry Committee Collaborative Study (http://www.abrf.org/ABRF/ResearchCommittees/msrcreports/abrf97ms.html). The phosphopeptide (0.5 pmol) was combined with the mixture of peptides (2 pmol) obtained by digesting equine cytochrome c with trypsin. Database searches were done using Sequest [8], supplied as part of the ThermoQuest LCQ mass spectrometer data system. For the specific example shown, the database consisted of the sequence of equine cytochrome c, to which was added the sequence of the phosphopeptide.
As we have designed it, the EVI consists of a library of experiments that have various resources in common (Fig. 1). In this instance, the resources consist of an ion trap mass spectrometer, a gradient loop LC system, and the Sequest database search program. Each experiment can function as a stand-alone EVI. However, conflicts could arise if more than one experiment tried to access the same resource at the same time. By placing the experiments under the control of an experiment selector, conflicts are avoided and order is imposed on system operations. For example, an experiment that uses only MS can be performed at the same time as an experiment that uses only LC. However, any experiment requiring both resources could not be selected unless both were available. To the user, the experiment selector is simply a menu from which available experiments can be selected. Experiments can range in complexity from simple routine housekeeping operations, such as refilling the LC syringe pump, to a fully automated LC/MS/MS analysis of a complex mixture of peptides. Each experiment has a built-in user interface where parameters for the analysis and information about the sample can be entered, instrument parameters can be displayed, and progress of the experiment can be monitored. Reports of the results from experiments are displayed on the computer screen and can also be printed.
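The arbitration performed by the experiment selector can be illustrated with a short Python sketch. This is not the LabVIEW implementation described here; the class name, method names, and resource labels ("MS", "LC") are hypothetical and chosen only to show the rule that an experiment may start when none of the resources it needs is already claimed.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentSelector:
    """Hypothetical stand-in for the experiment selector: tracks claimed resources."""
    busy: set = field(default_factory=set)

    def can_start(self, needed):
        # An experiment is selectable only if none of its resources is in use.
        return self.busy.isdisjoint(needed)

    def start(self, needed):
        needed = set(needed)
        if not self.can_start(needed):
            return False
        self.busy |= needed  # claim every resource the experiment requires
        return True

    def finish(self, needed):
        self.busy -= set(needed)  # release the resources when the experiment ends


selector = ExperimentSelector()
ms_only = {"MS"}        # e.g. a nanospray analysis
lc_only = {"LC"}        # e.g. refilling the LC syringe pump
lc_ms = {"MS", "LC"}    # e.g. an LC/MS/MS analysis

started_ms = selector.start(ms_only)    # accepted: the MS is free
started_lc = selector.start(lc_only)    # accepted: the LC is free, so both run together
started_both = selector.start(lc_ms)    # refused: both resources are already claimed
```

In this sketch the LC/MS/MS experiment becomes selectable again only after both of the simpler experiments have called `finish`.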
FIGURE 1. Diagram of an expert virtual instrument (EVI)-controlled LC/MS system.
The interface between the EVI and system resources varies depending on the nature of the software programs that control that resource. Sequest is only a program, with no hardware components other than the computer. The EVI initiates a Sequest search on a spectrum by creating a .dta file and sending a command for Sequest to do the search on that file and write the output file where the EVI can access it. The gradient loop LC is a collection of pumps and valves controlled by a LabVIEW program. A communication interface was built into the LC software, and commands are received from the EVI using the TCP/IP protocol. The LCQ software, as provided by ThermoQuest, does not contain an interface that can be used to control the instrument using an outside computer program. This kind of control was achieved by working through the Win32 API functions of the Windows NT operating system. The Tune Plus program in the LCQ software provides the user with access to instrument controls through the keyboard. This program uses the Win32 API to create the windows, buttons, and data-entry fields that are used to enter instrument parameters and control data acquisitions. LabVIEW VIs were created to access these elements and make the necessary entries as if they were being made through the computer keyboard. The net result is that a human operator is replaced with a computer program. Real-time access to the incoming mass spectral data was achieved by reading it directly out of the computer random access memory.
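As an illustration of the first step of the Sequest hand-off, the following Python sketch formats a fragment ion spectrum as .dta text. It assumes the commonly used .dta layout, in which the first line holds the singly protonated precursor mass and the charge state and each following line holds one m/z and intensity pair. The function names are hypothetical, and the command that launches the search is deliberately omitted because it is not specified here.

```python
PROTON = 1.00728  # mass of a proton in Da


def format_dta(precursor_mz, charge, peaks):
    """Return .dta text for one MS/MS spectrum (assumed layout, see lead-in)."""
    # Convert the observed precursor m/z to the singly protonated mass (M+H)+.
    mh_plus = precursor_mz * charge - (charge - 1) * PROTON
    lines = [f"{mh_plus:.4f} {charge}"]
    lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


def write_dta(path, precursor_mz, charge, peaks):
    """Write the spectrum to a file that a search program could then read."""
    with open(path, "w") as handle:
        handle.write(format_dta(precursor_mz, charge, peaks))


# Illustrative values only, not data from the experiments described here.
example_text = format_dta(500.0, 2, [(100.0, 10.0), (250.5, 42.0)])
```

The output file written by the search program would then be read back by the controlling program, as described in the preceding paragraph.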
Programming the EVI was most efficient if the mass spectrometer was considered to be a collection of functions rather than a single entity. Sub-VIs were created with everything needed to perform a particular function and return the result. For example, the full MS sub-VI sets the LCQ to full MS mode, reads the collected spectrum out of random access memory and returns it to the EVI. The full MS2 sub-VI sets the LCQ to MS/MS mode, sets the precursor ion mass and relative collision energy, and returns the collected fragment ion spectrum to the EVI. LabVIEW uses a graphic interface to perform the actual programming operation. Different VIs and other program elements are linked together using a wiring diagram. The MS function sub-VIs can be placed in any diagram in which the result of that function is needed. The wiring diagram provides a visual representation of the program structure and is also an interface that can be opened whenever it is necessary to troubleshoot problems or alter the program function.
Each EVI experiment has an interface for the user to input problem-specific information and set any user-defined parameters. The design of an EVI experiment should include sufficient computer-coded expertise to adequately analyze the data and report the results of the analysis. Results of an analysis are returned in the form of a report displayed on the screen, sent to a printer, or written to a file. After the final structure for an EVI experiment has been set, the program can be compiled along with the LabVIEW runtime engine for easy distribution and use by computers that do not have LabVIEW installed.
Tandem mass spectrometry is ideally suited to confirm the structure of chemically synthesized peptides. A correct observed mass value confirms the amino acid composition and removal of all protecting groups. A fragment ion spectrum contains information that can confirm that the amino acids have been assembled in the correct order. In most instances, MS is a much more efficient method of checking the structure of a synthetic peptide than Edman microsequencing. The analysis takes minutes rather than hours and can be used for peptides that contain nonstandard amino acid residues. The analysis of synthetic peptides by ion trap MS was selected as the first example of an analysis that could be fully automated from start to finish.
In addition to preparing the sample and mounting it in front of the mass spectrometer, the user inputs the expected sequence of the peptide and sets parameters related to ion intensity thresholds used in analyzing and reporting the data. The EVI calculates the molecular weight of the expected sequence and the m/z values for charge states 1 through 3. It also calculates the expected fragment ions for the A, B, and Y" fragment ion series. A full-range mass scan is acquired, as are MS/MS spectra for ions corresponding to each of the expected charge state m/z values. Fragment ions in the MS/MS spectrum are correlated to those calculated for the expected sequence. If gaps in the sequence information are found, additional MS/MS experiments are performed in an attempt to fill those gaps. When the analysis is complete, a report is created.
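The calculations in this step can be sketched in Python as follows. The sketch assumes monoisotopic masses, unmodified residues (including a free cysteine thiol), and singly charged fragments; the function names and the 0.5 m/z matching tolerance are illustrative choices, not the values used in the EVI. The b series is generated through the full length of the sequence so that the last member corresponds to loss of water from the protonated molecule.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CO = 27.99491  # mass difference between a b ion and the matching a ion


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def charge_state_mz(sequence, max_charge=3):
    """Expected m/z of the protonated molecule for charge states 1..max_charge."""
    mass = peptide_mass(sequence)
    return {z: (mass + z * PROTON) / z for z in range(1, max_charge + 1)}


def fragment_ions(sequence):
    """Singly charged a, b, and y series; list index i holds ion number i + 1."""
    prefix, b_ions = 0.0, []
    for aa in sequence:
        prefix += RESIDUE_MASS[aa]
        b_ions.append(prefix + PROTON)
    suffix, y_ions = 0.0, []
    for aa in reversed(sequence[1:]):
        suffix += RESIDUE_MASS[aa]
        y_ions.append(suffix + WATER + PROTON)
    a_ions = [mz - CO for mz in b_ions]
    return {"a": a_ions, "b": b_ions, "y": y_ions}


def matched_positions(theoretical, observed, tol=0.5):
    """Ion numbers whose calculated m/z lies within tol of an observed peak."""
    return [i + 1 for i, mz in enumerate(theoretical)
            if any(abs(mz - peak) <= tol for peak in observed)]


sequence = "SHLVEALYLVCG"
precursors = charge_state_mz(sequence)
ions = fragment_ions(sequence)
```

For SHLVEALYLVCG this sketch gives a singly protonated m/z near 1303.67 and a b6 ion near 637.33. Ion numbers missing from the list returned by `matched_positions` mark the gaps that additional MS/MS experiments would need to fill.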
The analysis of a peptide with the sequence SHLVEALYLVCG is used as an example (Fig. 2). In the report, the full-range mass scan (Fig. 2A) was annotated to indicate the different charge states for the expected sequence. Sodium adducts are frequently observed in the nanospray spectra of synthetic peptides, and these ions were also marked. Ions that were not marked are normal low-mass background ions or ions from contaminants in the sample. Only the 1+ charge state was observed for this peptide. The zoom scan (Fig. 2B) provided the observed monoisotopic mass value, and the 1-mass unit spacing of the isotope cluster confirmed the charge state assignment.
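The charge state check from the zoom scan follows from the fact that adjacent isotope peaks of an ion with charge z are separated by approximately 1/z on the m/z scale. A minimal Python sketch, with a hypothetical function name and illustrative peak values rather than measured data:

```python
def charge_from_isotope_spacing(cluster_mz):
    """Estimate the charge state from the m/z values of one isotope cluster."""
    peaks = sorted(cluster_mz)
    if len(peaks) < 2:
        raise ValueError("at least two isotope peaks are needed")
    # Average the spacing between neighboring isotope peaks.
    spacings = [high - low for low, high in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    # Spacing of about 1 means 1+, about 0.5 means 2+, about 0.33 means 3+.
    return round(1.0 / mean_spacing)


singly_charged = charge_from_isotope_spacing([1303.67, 1304.67, 1305.68])
doubly_charged = charge_from_isotope_spacing([652.34, 652.84, 653.34])
```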
FIGURE 2. Mass spectral data acquired from the expert virtual instrument analysis of a synthetic peptide. (A) Full-range mass spectrum. Ions marked with an "F" are consistent with fragments of the peptide. (B) Zoom scan of the 1+ charge state ion cluster. (C) MS2 spectrum of the 1+ charge state. (D) Full-range mass spectrum with in-source collision-induced dissociation. (E) MS2 of 637 fragment ion.
For this peptide, the most likely sites for protonation are the amino terminus and the histidine residue that is one amino acid away from the amino terminus. As a consequence of having the charge located at the amino terminus, the MS/MS (MS2) spectrum of this peptide (Fig. 2C) contained only B ion series fragments and associated water losses. The ion series was complete from B6 through B12. Those ions confirmed the order of the last six residues. Confirmation of the rest of the sequence would require further fragmentation of one or more of the fragment ions. On the LCQ ion trap mass spectrometer, such confirmation can be obtained by selecting an ion from the MS2 spectrum and fragmenting that ion to yield an MS3 spectrum. Alternatively, the ion source can be tuned such that collision-induced dissociation produces fragments that appear in the full-range mass spectrum. When this was done for the peptide in question, the full-range mass spectrum (Fig. 2D) yielded a B6 ion of good intensity that was selected for MS/MS analysis. The resulting spectrum (Fig. 2E) contained enough additional sequence information to confirm the order of all remaining residues except for the first two. The approach of performing MS2 analyses on in-source collision-induced dissociation-produced fragments can be done on instruments that lack the capacity to collect MS3 or higher spectra.
In its present form, the EVI program has only limited capabilities for analyzing peptides that have incorrect structures. If no sequence is given or no ion is found corresponding to one of the predicted charge states in the full MS spectrum, zoom and MS/MS spectra are collected on the base peak, and the analysis is terminated. Most errors in peptide synthesis arise from failure to remove a protecting group, deletions of one or more residues, or the substitution of one amino acid for another. Future versions of the program could incorporate additional expertise to determine where in the sequence these errors have occurred.
All of the spectra produced by the EVI-directed analysis are also collected in a normal LCQ data file. The report includes the file name and scan number of each annotated spectrum. If there is a need to examine any of the original data, this can be done using the standard data system software. The compiled version of the EVI can be installed and used on any LCQ data system computer. No changes are needed in either the mass spectrometer hardware or software to perform this automated analysis.
The second example of an EVI-directed analysis was chosen to illustrate the ability to coordinate the activities of individual components of a complex analytical system. The gradient loop LC system and the microscale ES interface described previously are capable of rapid flow rate changes while maintaining stable ES emission. At reduced flow rates, peaks take longer to elute from the column, extending the time available for the mass spectrometer to collect spectra. Automated "peak parking" requires real-time analysis of the incoming spectral data and adjustment of the instrument parameters for both the LC and the mass spectrometer. The EVI was programmed (Fig. 3) to initiate a parking event when an ion not on the dynamic exclusion list in the full-range mass spectrum had an intensity higher than the user-defined threshold. In this instance, an MS2 spectrum is acquired for that ion, and its m/z value is written to the exclusion list. The MS2 spectrum is analyzed for the intense -98 and -49 (2+ charge state) fragment ions that are characteristic of phosphopeptides. If found, the MS2 spectrum is sent to Sequest, which matches it to the sequence of the protein. If the spectrum matches a phosphopeptide with a good score, the next full-range mass spectrum is collected. If no match is obtained, an MS3 spectrum is collected on the most intense fragment ion to obtain more sequence information. If the characteristic -98 or -49 losses are not observed, the next full-range mass spectrum is collected. If no nonexcluded ions are above threshold, the parking event is terminated.
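The decision rules in this experiment can be summarized with a Python sketch. It is an interpretation of the logic described in the text, not the LabVIEW code: the function names, the returned action labels, the tolerance, and the relative intensity cutoff are hypothetical, and `sequest_match` stands in for whatever test decides that a Sequest result is a good phosphopeptide match. The neutral loss is modeled as H3PO4 (about 98 Da), which appears at about -98 for a 1+ precursor and about -49 for a 2+ precursor.

```python
H3PO4 = 97.977  # neutral loss characteristic of phosphopeptides, in Da


def pick_precursor(full_scan, exclusion, threshold, tol=0.5):
    """Most intense non-excluded ion above threshold, or None to end parking."""
    candidates = [(intensity, mz) for mz, intensity in full_scan
                  if intensity > threshold
                  and all(abs(mz - excluded) > tol for excluded in exclusion)]
    return max(candidates)[1] if candidates else None


def has_phosphate_loss(precursor_mz, charge, ms2_peaks, tol=0.5, min_rel=0.3):
    """True if an intense fragment sits at the precursor m/z minus 98/charge."""
    if not ms2_peaks:
        return False
    base = max(intensity for _, intensity in ms2_peaks)
    target = precursor_mz - H3PO4 / charge
    return any(abs(mz - target) <= tol and intensity >= min_rel * base
               for mz, intensity in ms2_peaks)


def after_ms2(precursor_mz, charge, ms2_peaks, sequest_match):
    """Choose the next action once an MS2 spectrum has been acquired."""
    if not has_phosphate_loss(precursor_mz, charge, ms2_peaks):
        return "next_full_scan"   # no characteristic loss: not a candidate
    if sequest_match(ms2_peaks):
        return "next_full_scan"   # identified with a good score
    return "collect_ms3"          # loss seen but no match: get more sequence data


# Illustrative values only, not data from the experiments described here.
scan = [(500.0, 10.0), (676.3, 80.0), (700.0, 90.0)]
chosen = pick_precursor(scan, exclusion=[700.0], threshold=50.0)
action = after_ms2(676.3, 2, [(627.3, 100.0), (400.0, 20.0)], lambda peaks: False)
```

In a full control loop, the selected m/z would be appended to the exclusion list after each MS2 acquisition, and a `None` result from `pick_precursor` would restore the normal LC flow rate.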
FIGURE 3. Flow diagram for the expert virtual instrument experiment designed to peak park, collect MS2 spectra, and perform Sequest searches on phosphopeptides.
Operation of the EVI experiment was tested on the mixture of peptides obtained from the trypsin digestion of cytochrome c spiked with a small amount of a synthetic phosphopeptide. MS2 spectra were collected for all of the cytochrome c peptides commonly observed (data not shown) as well as for the singly and doubly protonated forms of the phosphopeptide (Fig. 4). Both phosphopeptide spectra scored well enough in the Sequest search to confirm the identity of the peptide and to identify the correct threonine residue as the phosphorylated amino acid. With sufficiently good Sequest scores for these spectra, no MS3 spectra were collected for these ions. However, one other MS2 spectrum (derived from an impurity) contained an intense -98 fragment. Sequest could not match the spectrum, and an MS3 spectrum was collected for that ion (data not shown). Neither the MS2 nor the MS3 spectrum could be interpreted to provide the identity of the contaminant.
FIGURE 4. Full-range mass and MS2 spectra for each charge state of the phosphopeptide LFTGHPET*LEK collected during an expert virtual instrument-controlled analysis of a tryptic digest mixture. Results of the real-time Sequest database search are shown above each MS2 spectrum.
The principal limitations of the EVI approach are those imposed by the components of the analytical system. The pumps for the LC system come with an interface to communicate with an external computer and well-documented program language for sending commands and receiving sensory data. Incorporating these into the system was relatively easy. Conversely, the mass spectrometer has no interface to interact with a computer program. Although tools exist to create an interface without the assistance of the manufacturer, this requires considerable computer programming expertise. More importantly, much of the work has to be redone each time there is a new release of the mass spectrometer data system software. This limitation to the EVI approach would no longer exist if mass spectrometer companies made their instruments as accessible to computer programs as they are to human operators.
The EVI approach described herein demonstrates the usefulness of combining real-time instrument control and data analysis to fully automate different types of MS analysis of peptides and proteins. By using expert system approaches to analyze the data and decide the course of the analysis, it is possible to make the analysis more efficient and avoid collecting redundant or superfluous data. Expertise coded in a computer program is more easily transferred to another laboratory. With the EVI approach, functions can be added to a mass spectrometer without altering any of the manufacturer's hardware or software components. The principal limitation is the need to reengineer the interface to the mass spectrometer with each new release of the data system software.
This work was supported in part by grants RR06217 and CA3572 from the Public Health Service, National Institutes of Health, Bethesda, Maryland.
1. Stahl DC, Swiderek KM, Davis MT, Lee TD. Data-controlled automation of liquid chromatography tandem mass spectrometry analysis of peptide mixtures. J Am Soc Mass Spectrom 1996;7:532-540.
2. Stahl DC, Martino PA, Swiderek KM, Davis MT, Lee TD. Automated LC/MS/MS analysis of peptide mixtures using capillary HPLC and electrospray ionization on a triple sector quadrupole mass spectrometer. In The 40th Conference on Mass Spectrometry and Allied Topics. Washington, DC: American Society of Mass Spectrometry, 1992:1801-1802.
3. Lim HK, Stellingweif S, Sisenwine S, Chan KW. Rapid drug metabolite profiling using fast liquid chromatography, automated multiple-stage mass spectrometry and receptor-binding. J Chromatogr A 1999;831:227-241.
4. Johnson GW. LabVIEW graphical programming: practical applications in instrumentation and control. In Machover C, ed. McGraw-Hill Series on Visual Technology. New York: McGraw-Hill, 1994:522.
5. Waterman DA. A guide to expert systems. In Hayes-Roth F, ed. The Teknowledge Series in Knowledge Engineering. Reading, MA: Addison-Wesley, 1986:419.
6. Davis MT, Lee TD. Rapid protein identification using a microscale electrospray LC/MS system on an ion trap mass spectrometer. J Am Soc Mass Spectrom 1998;9:194-201.
7. Davis MT, Stahl DC, Lee TD. Low flow high-performance liquid chromatography solvent delivery system designed for tandem capillary liquid chromatography mass spectrometry. J Am Soc Mass Spectrom 1995;6:571-577.
8. Eng JK, McCormack AL, Yates III JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994;5:976-989.