
Bioinformatics Project: Visualization of Conserved Regions of Proteins 30 Points

This exercise was created by Wendy Shuttleworth at Lewis and Clark State College and Celeste Brown at the University of Idaho. It was adapted for the NGBW by Celeste Brown and Mark Miller.
Intended Audience: Upper division Biology/Biochemistry Courses.
Flash walkthrough of the laboratory:

Goal: The goal of this project is to introduce biochemistry students to some easily accessible protein database tools and to let them explore these databases so they get a glimpse of what is available. In this exercise, the protein myoglobin is explored, since it and hemoglobin were discussed in detail in the previous class lectures. Other proteins with structures in the PDB could be used as well. Multiple myoglobin sequences are aligned against human myoglobin to determine which residues are conserved across all species. These conserved residues are then located on the human myoglobin structure. This exercise uses the Next Generation Biology Workbench (NGBW) both for multiple sequence alignment (MSA) tools and for viewing structures. It therefore requires only internet access and a relatively current Java installation (Java 1.4 or greater; this is standard equipment). The exercise was adapted for the NGBW by Dr. Mark Miller, UCSD.

Overview: One of the most powerful bioinformatics tools for the study of proteins is the ability to search databases for similar sequences and to make multiple alignments of those sequences. Increased computing power in the last few years has made these techniques readily available to anyone with a PC and internet access. Study of these alignments identifies conserved regions, which can reveal much about the structure-function relationships in a given protein. Moreover, the alignments can also be used to explore the relationships between the organisms (to be addressed in another bioinformatics exercise).

In this exercise, the protein myoglobin will be explored as this, along with hemoglobin, has been discussed in detail in the CHEM 481 lectures. Multiple myoglobin sequences will be lined up against human myoglobin to determine which residues are conserved across all species. These conserved residues will then be located on the human myoglobin structure.
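
To make the idea of a "conserved residue" concrete, here is a minimal Python sketch (not part of the NGBW workflow). It assumes the alignment is already available as equal-length strings with "-" marking gaps. The function name and the toy sequences are made up for illustration and are not real myoglobin data.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # A column counts as conserved only if it holds one residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Toy alignment, invented for illustration only.
toy_alignment = [
    "MGLSDGEW",
    "MGLSEGEW",
    "MG-SDAEW",
]
print(conserved_columns(toy_alignment))  # prints [0, 1, 3, 6, 7]
```

A column containing a gap is treated as not conserved, which is why fragments or partial sequences reduce the number of conserved positions an alignment reports.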


Part 1 Alignment

1. It is assumed here that you can access the NGBW site, create an account, and log in. Log onto the San Diego Supercomputer Center Next Generation Biology Workbench by typing its URL into your browser.

2. Set up a free account by clicking the register button. Simply follow the instructions, and be sure to remember your user name and password. If you have any trouble, you can access Flash presentations showing how to do this under the help section.

3. The NGBW is based on the use of folders for data and tasks. Once you log in, the folders will appear on the left hand side of the screen. Click on Create a New Folder for this project, and then give your folder a name, and a description if you like. The NGBW also has Flash help files to assist you in undertaking the analyses described below.

4. In this exercise, we will explore conserved residues in myoglobin, so first we will retrieve some myoglobin sequences from one of the public databases stored in the NGBW. To do this, click on the Data icon attached to the folder you just created, and when the Data Management pane appears, click the "Search for Data" button. When the Data Search pane appears, type the exact string "human myoglobin" into the query window. Use the two drop-down menus to specify that you wish to search for a Protein (the NGBW calls this the "Entity Type") Sequence (and this is the "Data Type"). You will be presented with a third drop-down menu with a list of "Data Sets" that contain Protein Sequences. Select Swissprot and click "Submit Search". A list of results will be displayed. Select: P02144 MYG_HUMAN Myoglobin Homo sapiens 82 SWISSPROT by checking the box to the left of the sequence, and then click the "Save Results" button. You will receive a green success message at the top of the page when the sequence is transferred to your data area.

5. Now you want to search for related myoglobin sequences. In this step you will do this by comparing the sequence of Human Myoglobin to all Protein sequences (currently 5,324,740) in the Swissprot database, and selecting the ones that are similar, in order of relative similarity. Swissprot is a highly curated sequence database that has few of the "junk" sequences found in less heavily vetted databases. Comparisons between sequences are accomplished using algorithms that measure similarity. Today you will use BLAST (Basic Local Alignment Search Tool), but there are other tools to do this as well, such as FASTA. Each has its own algorithm for comparing sequences and measuring similarity. BLAST happens to be one of the fastest.

6. To run a BLAST search that compares a protein sequence to a set of protein sequences, one uses the BlastP tool. To do this in the NGBW, click on the Tasks folder for this project. When the Task Management pane opens, click the "Create a New Task" button. When the Task Creation pane opens, enter some descriptive text in the "Description" box, and click the "Set Description" button.

7. Now click on the "Select Input Data" button. Find the Myoglobin data file, select it by checking the box to the left of the sequence, and then click the "Select Data" button at the bottom of the page. This will return you to the Task Creation pane.

8. Now click the "Select Tool" button. Under the Toolkit pane, find and click on the "BlastP" tool. It is under the "Protein Tools" tab. This will return you to the Task Creation pane. The most important part of creating a BLAST job is to specify the database you will be searching. To do this, click on the "Set Parameters" button, and when the Parameters pane opens, find the "protein db" dropdown, and satisfy yourself that it is set to search "SWISSPROT".

9. While you are on the Parameters pane, set the "Expect value" to "0.1" (this reduces the number of poor matches). It is just below the protein database dropdown. Now expand the Advanced Parameters section by clicking the link. Check under Scoring Options to see that the default setting for the Matrix (-M) is "Blosum62". The other settings are left at their default values. Click "Save Parameters" (at the bottom of the page).

10. When the parameters have been saved, click on the "Save and Run Task" button at the bottom of the page. This will deploy your job and return you to the Task Management pane. On this pane you can watch your job progress (it will take a few minutes to complete). While you are waiting, you can begin creating the next job, or just click the "Refresh Tasks" button until you see the text in the right-most column change from "View Status" to "View Output".

11. To view your results, click the "View Output" button, and this will expose all the results produced by your search. Click on the link to "blast2.txt", and this will expose the list of sequences with strong similarity to the Human Myoglobin sequence. Glance briefly at the header of this file, which tells you what analysis you ran. Find how many sequences you compared yours to, and confirm that the sequence you searched with is the one you intended. These checks are part of good practice. Now scroll down slightly and you will see a list of the top matches, with the measure of similarity (E-value) in the right hand column. Satisfy yourself that there is a steady decrease in similarity as you go down the list.

12. To select individual sequences for additional exploration, click on the "View" link at the top of the page. It is inside a black and white box. The program will display a list of sequences identified by the BLAST search; the first sequence is the protein whose sequence was used as the query.

13. Select a number of the sequences from the search. You can do this as follows: change the number of sequences displayed from 20 (default) to 200, using the drop-down at the top of the results pane. Choose sequences at random from the list, or go down and pick your favorite critters. Stick to myoglobin (at the lower part of the list you will see hemoglobins, etc.), and bear in mind that you want to align a good cross section of myoglobin molecules, so choose some of those with the least similarity to human myoglobin. Avoid selecting entries that say "partial sequences" in their description. Now click on the "Save Results" button to transfer the data to your personal data area. NOTE: If you select the Human Myoglobin sequence at the top of this list, it will be listed in your data area twice. You must be careful not to select it twice in the next step, or the CLUSTALW program will fail.

14. Now return to the Task Management pane by clicking on the "Tasks" folder. Create a task, just as above, except when you choose your data, select all of the Myoglobin sequences you saved (but be careful not to select Human Myoglobin twice; CLUSTALW will not accept sequences with the same name). Then click the "Select Data" button, and select "CLUSTALW_P" from the "Protein Sequence" tool list. Once the task is constructed, click "Save and Run Task". CLUSTALW is a multiple sequence alignment tool: it first aligns each sequence with every other sequence to find a best fit, then builds the multiple alignment starting with the most similar sequences, until all sequences are aligned to each other.

15. It may take a short while for the alignment to run, especially if the server is busy. When the job is complete, go to the Results pane, and eyeball your alignment by clicking on the "outfile.aln" link. You can see changes and large areas that are identical, and sometimes blank spaces are inserted in some sequences to optimize the alignment. Note: if any of the sequences you chose are fragments, you will need to remove them from the alignment, as the program will not show consensus in the regions where there are no amino acids; I ran into trouble with a giant panda sequence that was not complete, so it messed up the alignment in the missing areas.

16. You can color code the conserved residues to make them easier to view, using the tool Boxshade. Start by saving the alignment to your data area by clicking the "Save to Current Folder" button. Give the data item a name, and tell the application it is a "Protein" "Sequence Alignment" in "Clustal" format.

17. Create the Boxshade task just as you created the others. Click the "Return to Task List" button, then "Create New Task", and follow the procedure for task creation, selecting your alignment as Input Data, and Boxshade, which is under "Phylogeny/Alignment Tools", as the Tool. Open the Parameter pane and you can set the coloring yourself. Choose HTML output. Be sure to check the box for "Special label for identical residues in all sequences", and it is a good idea to check the boxes for a consensus line and a ruler. For colors, typically blue is reserved for completely conserved residues, green is for identical residues, yellow is for residues that are similar (i.e. their chemical properties are similar), and black on white is for residues that show no consensus. However, play with these until you feel you can glance at the data and understand the basic patterns found there. Asterisks below the alignment mark the completely conserved residues.

18. Once the job is done, use "View Results" to see the output. You should now save the file as "Protein" "Sequence Alignment" "HTML" (or "Unknown"), since the NGBW cannot use this data for anything except display. Now you are ready to move on to the second part of this exercise, where you will examine the human myoglobin structure and see where the completely conserved residues are found on it. You can start by using the ruler on the Boxshade output to construct a list of all the residues that are conserved across all the myoglobins you chose.

Part 2 Protein Structure Visualization

1. The Protein Data Bank (PDB) is an international resource for protein structure models that have been determined experimentally using X-ray crystallography and/or NMR. Many people get protein structures by going to the PDB site. You might want to take a couple of minutes to explore the site; there is usually a "molecule of the month" on the front page.

2. In the NGBW, we access PDB data directly (via web services) with our structure viewing tool called Sirius. As a result, you need not leave the NGBW to analyze a protein structure. For this exercise, we want to look at the conserved regions of a myoglobin molecule in the context of the three dimensional structure of the molecule. To do this, go to the Structure Tools pane of the Toolkit, and click on the Sirius link. This will open the Sirius manual page. Clicking the Sirius link on the manual page will start Sirius. When the Sirius tool loads, click "File", then "Load from PDB". Enter the PDB ID of the molecule of interest (2MM1 for human myoglobin) into the search box at the top of the page. If you feel like playing around, go back to the PDB site and search on myoglobin. Collect some IDs of other animal myoglobins you are interested in, and use them as well. Since we are looking at invariant residues, it doesn't really matter which one you pick, but it is convenient to use human, since the numbering of the residues will be the same as that in the Boxshade figure.

3. Once you enter the ID number, the molecule will appear with all of its non-hydrogen atoms shown. There are many options for how to display the molecule, and some of these are illustrated in the Flash help file. Look for the heme group: this is a non-protein component of myoglobin that gives it the red color you know so well. It is composed of a large planar aromatic ring system with an iron atom in the center, held in place by 4 pyrrole nitrogen atoms. Under physiological conditions, the iron is Fe(II), and in this state it carries molecular oxygen reversibly. In this oxygenated state, it is bright red, but once outside the body, the iron goes to Fe(III), which turns the pigment brown.

4. Once the molecule is displayed, you can also display the protein sequence in a second window. To do this, click "Tools" in the top menu bar, then "Sequence Viewer". This opens a second window containing the sequence. Use this sequence viewer to locate the conserved residues. To highlight a residue of interest, just click on its corresponding letter in the sequence viewer, and the entire residue will turn yellow in the structure viewer.

5. Start by paying attention to Histidine residues. For some of these, you can tell immediately why they are important, and with others it will be harder to tell. The heme of oxygen binding proteins is nearly always bound to the protein by a single His residue. This residue is known as the "proximal" His. The oxygen molecule binds to the iron on the opposite face of the heme from the proximal ligand. Typically, but not always, a His residue is positioned near the open iron coordination site, and it interacts with the bound oxygen molecule to control its reactivity. This His is referred to as the "distal" His.

6. Play with your display of conserved residues to create a figure displaying as many conserved residues as possible. The figure might get too busy if you add them all, but put in a number of them to see where they lie. Try to produce a nice final image with a number of them shown; hopefully you can find the proximal and distal histidines and place these on your molecule. Manipulate the image to best show off the conserved residues. If you prefer, prepare a couple of final images, one with e.g. the histidines and a second one with other conserved residues shown.
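
Reading conserved positions off the Boxshade ruler by hand is error-prone when the reference sequence contains gaps. The following is a minimal Python sketch of that bookkeeping, assuming the aligned human sequence and the other aligned sequences are available as equal-length strings with "-" for gaps. The function name and the toy sequences are hypothetical, and the numbering simply counts the non-gap letters of the reference starting at 1.

```python
def conserved_reference_positions(reference, others):
    """Return (residue_number, letter) pairs of the reference conserved in all sequences."""
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    residue_number = 0
    for col, letter in enumerate(reference):
        if letter == "-":
            continue  # gap in the reference: no residue number is consumed
        residue_number += 1
        if all(seq[col] == letter for seq in others):
            positions.append((residue_number, letter))
    return positions


# Toy aligned sequences, invented for illustration only.
toy_reference = "MG-LSDH"
toy_others = ["MGALSEH", "MG-LTDH"]
print(conserved_reference_positions(toy_reference, toy_others))
# prints [(1, 'M'), (2, 'G'), (3, 'L'), (6, 'H')]
```

Residue numbers in a structure file can be offset from those counted along a database sequence, so it is worth confirming a few positions against the Sequence Viewer before relying on such a list.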

Report

Please turn in a report containing the following items in an easy to read format:

1. Your sequence alignment as a TeXshade or Boxshade figure.

2. A list of the conserved residues from your alignment.

3. One or more images showing the positions of the conserved residues on the human myoglobin molecule.

4. Are the conserved residues clustered in any specific areas of the protein?

