Members and sponsors are encouraged to share ideas, applications research and experiences, and technical problems by contributing to this column. In addition, short reviews covering aspects of biotechnology that are of interest to members are also encouraged. All contributions will be subject to peer review and the deadline for submitting articles for the September issue is August 1, 1994. Periodically, articles will also be solicited from non-members by the Editors. Contributions may be mailed or faxed to the Editor, Ken Williams, W.M. Keck Foundation Biotechnology Resource Laboratory, P.O. Box 9812, 295 Congress Avenue, New Haven, CT 06536-0812, (203) 737-2206, Fax (203) 737-2638. Contributions will not necessarily reflect the views of the Editorial Board or the Association.
Kathryn M. Ivanetich and Donald French
Biomolecular Resource Center and Department of Pharmaceutical
Chemistry,
University of California at San Francisco
Box 0541
San Francisco, CA 94143
Phone: (415) 476-1839; FAX (415) 476-7974.
E. Mail address: french@cgl.ucsf.edu
How heavily does the quality of your database or the lack thereof impact on your biotechnology service or facility? From the experience in our facility over the last six years, the utilization of a high quality customized database strikingly increases the efficiency and cost effectiveness of facility operation. Efficiency and cost effectiveness concern not only facility managers, but impact facility users since both factors affect facility charges. We feel strongly that increased efficiency through computerization and automation is essential for providing either a single biotechnology service or multiple services in a large facility.
The evolution of tracking service records in our facility over the last six years was as follows: The first stage involved manual tracking of orders through hard copy filing, with manual or hand calculator calculation of the charges for each order. The second stage was the introduction and use of Filemaker, a flat file DBMS (data base management system). With Filemaker we could easily customize such things as input-output layouts for each service. However, it was incapable of addressing our more complex database needs. It was not relational, was lacking complex programming capability, and did not have a robust user interface.(*)
The third phase was to design and develop, over several years, a comprehensive database program employing a powerful relational DBMS. This program currently handles over 10,000 records, is fast, efficient and essential to our daily facility operation.(**) We would like to share our ideas and expertise in biotechnology facility database development to aid those interested in developing their own in-house software or evaluating available software products for use in facilitating scientific and accounting/billing aspects of their facility's function. This report focuses on several factors important for the development and/or evaluation of a database for biotechnology facilities.
Concepts and features to be considered include:
There were a number of general features that we felt should be built into the customized database. The database had to be easy to use. In addition, it should combine both scientific and accounting aspects relevant to biotechnology services, linking these aspects in each service record. We felt that the database should automate data entry and calculations as much as possible and yet be flexible. We wanted data editing wherever possible to catch entry errors, especially for sequence entry. The database had to be versatile for the different biotechnology services.
We wanted all staff members to be able to perform their own data input/output. Because of high turnover of staff in certain positions, it was essential that the program be easy to use - whether it was new to the individual, used on a daily basis, or used only occasionally.
We wanted the database program to minimize data entry and to reduce or eliminate input error wherever possible. There are several ways in which a well-designed database program can help by automating certain processes. One common problem arises from slight variations in entering such information as investigator name into the service record, causing later summary reports to be inaccurate. Automatic data entry avoids this type of problem and saves user entry time through use of investigator codes/on screen call up of relevant accounts (see below for details). We also wanted any investigators with customized charging structures to have the correct charges entered and totals calculated automatically with no effort by the entry person. To avoid double sequence entry and associated errors, we developed a mechanism to send sequences from the database directly to our Applied Biosystems Inc. (ABI) oligonucleotide synthesizers.
In summary, the automation features allow us to save time, avoid errors, facilitate searches and selections, automatically number records and communicate with our oligo synthesizers.
The changing needs of our facility had to be reflected in our ability for user customization of the database. What if we some day want to include an HPLC profile with our oligo synthesis data sheet? Might we want to introduce a new scale of synthesis next week? We needed versatility accessible to the user for the occasions when pricing, scales of synthesis, pricing structures or discounts were changed, or when specialized reports needed to be generated. Our database had to be able to keep up with these needs - preferably by user modification, rather than resorting to a programmer.
We sought a computer and DBMS that were extremely friendly, intuitive and easy to use. We felt the Macintosh was the computer of choice. Macintosh's ease of use results from its ground-breaking graphical user interface (GUI). It has a simple "point-and-click" technology with pull-down and pop-up menus, and visual icons instead of a confusing text-oriented command line. We made our commitment to the Mac in 1989, and still favor the Mac for our computer database needs over a PC with Microsoft Windows. We currently run the data base on either a Macintosh Centris 650 (recently rechristened the Quadra 650) or Macintosh IIfx, with 8 meg of RAM and a 250 or 510 meg hard drive.
What type of DBMS should we choose? One way that database management systems are classified is as basic, flexible, or complex (1,2). The basic variety has little or no programming capability and is virtually always a "flat-file" DBMS. It is adequate for simple systems with uncomplicated data structures. The flexible variety is designed for more demanding environments where some programming capability is called for. These DBMS's are either flat-file or pseudo-relational, seldom fully relational. The complex variety has advanced programming capability, extensive processing and reporting capability and is virtually always a fully relational DBMS. We already had experience with the basic variety and knew we needed more capability than it had to offer. The flexible type offered many of the advantages we wanted but was not capable of providing all the flexibility and sophistication we needed. The complex DBMS offered all the capabilities we wanted, but designing the end product required considerable input time from facility members plus the services of a professional database designer/programmer for extended periods of time. If you can't live with the long development time and expense, you might consider compromising with a less complex DBMS or look into purchasing a "package database" designed especially for the biotechnology laboratory.
4th Dimension is the DBMS we settled on because it had all the extensive features which allowed us to design the ideal system with no compromises. It is fully relational, has a rich programming language, an interactive development environment, and many developers' tools such as a program editor, a compiler, a debugger, and a cross-referencing tool. The advantage to the relational feature is that data from a related file is automatically entered into reports and input screens by virtue of the relationship definition. With flat file and pseudo- relational systems, if data from related files can be included in a report at all, it is only with a great deal of effort and normally with a significant cost in processing time. We take advantage of these features for such things as automatic posting of investigator information and setting of correct charge structures. User input time and input errors are both decreased as a result. Processing time is an increasingly important issue as the size of your database grows. This is another reason to have a compiled versus an interpreted DBMS. (Compilers convert programs into very fast running "machine code"). With compiled 4th Dimension we got the speed we needed even with tens of thousands of records in our database. 4th Dimension also has robust and well-designed search, sort, and select features which help to make the user's job easier. Finally, there is a built-in hierarchical password system to help provide data access security. 4th dimension was and still is the most powerful relational DBMS available for the Mac. Several built-in features help to make it both a pleasure to use and a relatively error-free system. "Wild card" capability allows one to type a few letters of a field such as the investigator's last name and have the database program automatically present a list of all matching names to choose from. It also provides the capability to speak back the sequences for final verification, and the ability to send DNA sequences directly to our oligo synthesizers, which greatly reduced sequence entry errors.
As mentioned above, taking advantage of the many capabilities required the skills of a professional database programmer with a specialty in 4th Dimension. Thanks to careful planning at the outset, once we got past the usual debugging phase, we needed continuing programming support on only a limited basis.
We run the database on one Macintosh centrally located, in either the office area adjacent to the laboratory or in the laboratory itself. That Macintosh is connected to all of our ABI oligo synthesizers for sequence sending via a cable from the modem port on the Macintosh to a switcher box and then by cables to the serial ports of the ABI's. All reports are printed on an Apple Laser Writer II.
INVESTIGATOR MODULE
The investigator module collects all relevant information on each investigator or specific account for an investigator into one record for automatic posting to all relevant service records, for automatic billing, etc. Each investigator record includes fields for investigator name, address, billing information, account number and account type. Two mechanisms for modulating the charges for each account/investigator are provided (Figure 1). The first option is a global percentage modifier, either a discount (<100%) or surcharge (>100%) (such as for outside users) that would be automatically posted to records for all services. The second option is more flexible; it allows individual charge schedules to be customized for any or all charge options for any service and scale for a given investigator account. These individual charge schedules override the standard charges. This might be used, for example, for an investigator who supplies technical assistance for one service, to set charges for that service at cost of reagents only, and other services at standard charges.
Figure 1. Third page of investigator account record, showing two mechanisms for customizing charges for individual accounts.
For reasons indicated earlier, we wanted to automatically call up information from investigator records into service records. For this we used the 4th Dimension "wild card" capability, which allows one to type a few letters of an investigator's last name plus an "@" to automatically call up a list of all matching names/accounts. Click on the correct account, and the program automatically posts all relevant investigator and billing information to the top portion of the service record (Figure 2). In addition, any special charges for that account are automatically posted into the charges section of the record. Otherwise, the standard charge structure (from the constants file) is posted.
CONSTANTS FILE
Access to the constants file is controlled by an Administrator's password, which differs from the User's password. This file allows for setting of charges for service, scales of service, record number, and other parameters for automatic posting to all relevant records for a given service. For Investigator records, the Constants file contains the names and field lengths for six account number fields and whether to invoice or not for each of the various account types. For oligo synthesis, one can choose up to four scales of synthesis, and set the charges for each scale for set up, per base, mix rate, purification rate and inosine charge. For peptides, one can set the charge rates for two user chosen scales of synthesis, plus double couple rate and tryptophan rate, if desired (Figure 3).
Figure 2. Top portion of service records. Fields that are half-boxed are enterable, other fields are automatically posted.
Figure 3. Peptide synthesis constants screen.
SERVICE MODULES
Service-specific modules were needed for each biotechnology service. This included five major biotechnology services (3): DNA synthesis, peptide synthesis, DNA sequencing, protein sequencing, and amino acid analysis plus miscellaneous services. There were some common features in all service modules. These include investigator information (see above), enterable fields for order date, sample name, notebook reference, comments, notes, delivery date and charge date. In addition, there are six searchable, assignable fields for any special needs for each service. For each service, charges, constants and parameters set in the Constants file are automatically posted into service records. Each service module has an additional charges field, in case any specialized charges need to be added into a record. The identification number of each new service record and investigator record is automatically provided by the database. Numbers are in sequence from either 1 or the initial number set for that module in the Constants file.
Each service had specific scientific and charging requirements. Specific features were incorporated into the individual services as indicated below.
Three mechanisms for checking DNA sequence were developed. First, the program computes the correct sequence length from the entered sequence, including sequences with base mixtures. If a sequence is entered which does not match the user-entered sequence length, it is brought to the operator's attention. Second, the sequence field has base recognition capability; and only accepts the standard bases A, G, C or T, I for inosine, X or Z for unusual, namable bases or modifiers, and any mix of A,G,C and/or T. Third, the computer reads the entered sequence to you, for comparison to the original order. Sequence accuracy is particularly important since we transmit sequences directly from the database to our ABI 380B and 394 oligo synthesizers.
Once a DNA sequence has been entered and verified to be accurate, it is available to be transmitted to one of the ABI synthesizers. Our two ABI 380B's and one ABI 394 are connected by serial cables through a switch box to the centrally located Mac which runs the database. The connection is to the Appletalk port on the 394 and the serial communication RS232 port of the 380Bs. No other Macintosh software, such as ABI's Oligo Net is required.
To send a sequence, the desired synthesizer is selected on the switcher box and the internal Syncom program is activated on the ABI synthesizer. Within the database program, the operator selects the oligo record and clicks on the "Send to ABI" button. The sequence is sent in seconds. We usually send batches of three to ten oligos at a time. This method of sequence entry into the synthesizers is a great improvement in time saving and accuracy over manual entry using the ABI touch screen or keypad.
We have clients from within our University and clients from outside, each with different billing requirements. The in-house University clients need their charges accumulated and recharged to the appropriate account and fund numbers on a monthly basis. A special module was added to the database which could recognize the internal accounts, process their records and create a file to be transmitted to the University accounting office in their required format.
The outside account processing does not require creation of files to be transmitted, but does require printing bills, maintaining balances, both billed and due. It is effectively a simple accounts receivable system. In designing the database we needed to include features which would accommodate both types of accounts and various unusual reporting needs.
Many of the unusual reports are simply "quick reports" which, once designed, can be run with any desired selection of records. We built flexibility into this phase of the database because of the variety of clients and unpredictable special requests. We have included the ability to maintain credit accounts, include special one-time messages on the bills, select the clients processed in any run based on any combination of criteria possible, specify any billing period, and re- process any client's account for any billing period.
Figure 4 (options A & B). Page one of DNA synthesis input screen, showing pop up box for choosing scale of synthesis.
Figure 5. Page one of DNA sequencing entry screen.
Figure 6. A typical "quick Report".
Requests for custom reports are a common occurrence. The "Quick- report" capability of 4th Dimension allows facile creation of reports, including automatic sorting, level breaks, and totaling. Once a report is designed, its format may be saved for future use. Similarly, there are built-in graphing and label-writing capabilities. There are standard label templates which came with 4th Dimension, but new templates may be created with ease. Like designing reports, designing label output is a simple matter of clicking on field names and dragging and dropping them onto the form in the position desired. Although we do not often use the graphing capabilities, the ability exists to create, view, and print graphs on any selected part of the database. Bar graphs, line graphs, area graphs, scatter diagrams, and pie charts are all possible.
Import/Export options. It's important to be able to easily import data into a database. Most labs have data in an existing database, which needs to be imported to get started. It is also important to be able to export sets of data in formats compatible with other DBMS's or spreadsheets. Like most DBMS's, 4th Dimension has import and export capability.
Client Data Entry Module We also wrote a separate stand-alone data entry module to allow our clients to enter their orders at their own sites and to provide the orders on a floppy disk or transmit them to us over the campus network. Our main database program will import these orders with very little effort on the part of our data entry personnel. With this capability, clients may copy their sequences from other documents and paste them into the order form. They may also get spoken verification of the sequences. Batch entry is supported for those who have multiple items to order and a batch may be saved, edited, and printed. The client's module was written as an Excel macro rather than in 4th Dimension, so that each client would not have to purchase a run time version of 4th Dimension.
ACIUS's 4th Dimension for the Macintosh was selected as the DBMS platform upon which to develop an integrated database program to fulfill the scientific, management, and accounting information requirements of our biotechnology facility. We chose this complex DBMS since it offered the most power, speed, flexibility, and ease-of-use. The design and planning phase was equally as important as software/hardware selection. For flexibility, we designed the database around a set of configuration records called the constants files, which allowed users to change factors such as scales of synthesis and charging schedules for each service individually. By virtue of the relational nature of the DBMS, the service charging schedules and the client profiles are automatically posted to service and billing records. Our design goals held accuracy and ease-of-use as paramount. For example, we now consider speak-back verification as indispensable, whereas until we attempted to define the ideal method of maintaining accuracy we had never even considered it a possibility. The same is true for sending our sequences directly from the database to the synthesizers.
In conclusion, we feel that the reward for our labor-intensive effort was great. In our opinion, it is easy to underestimate how useful a fully functioning automated custom database is to the operation of your facility until you have one in place. We view our database as absolutely essential to the day to day operation of our biotechnology facility. We hope that our ideas and experience will assist you in the choice or development of an in-house database program for your facility.
Footnotes:
References:
We wish to acknowledge the assistance of Daniel V. Santi in the conception and development of this database program.
Return to the The ABRF Home Page