Coursework 1 – data analysis report
The coursework has to be submitted online as a single PDF or Word file, using the Turnitin submission box on the Moodle site for the unit.
Background
Genetic variation is sometimes (but not always) associated with phenotypic variation. In this coursework, you will apply your skills to identify a gene of interest, determine the link between phenotype and genotype, and discuss the biological relevance of your findings. Answer each question in order and illustrate what you did with figures. Take care when interpreting your results!
We will start with as yet unidentified sequences of an mRNA product, obtained from individuals classed into four different phenotypes (Group 1, Group 2, Group 3 and Group 4, as indicated in the sequence names).
Tasks
- Align all sequences found in the file Workshop_sequences_1.fas. Retrieve the protein sequences and explain (briefly) how you obtained them. You can use Expasy, BioEdit (go to Sequence > Translate or Reverse-translate) or MEGA. What do you observe when comparing sequences between groups (check both DNA and amino-acid alignments)? What are the variants that may explain differences in phenotypes? (25/100) Notes: You can use a complete sequence as a reference to obtain coordinates for variants (e.g. Phenotypic_Group_4(2)). Some sequences contain ambiguity codes (e.g. R). Their meaning can be found here:
https://www.bioinformatics.org/sms/iupac.html
- Retrieve the sequences associated with GenBank accession numbers AF134416.1 and AF134413.1 in NCBI (the individuals belong to Group 4 and Group 2, respectively). What are these sequences? From which species do they originate? Which gene do they cover? What does the gene do? Run an NCBI BLAST search on representatives of the four phenotypic groups to identify the actual phenotypes. (20/100) Hint: Pay attention to the “keywords” field on the GenBank accession page to figure out the phenotype.
- Align the sequences you just retrieved with the others. Perform a Maximum Likelihood phylogenetic analysis based on all sequences except heterozygous ones. To root your tree, include the sequence with accession number NM_001194925.2 in your alignment. Describe what you did, and feel free to add a few figures. You may use MEGA. Describe your phylogenetic tree. Is your phylogeny well supported? How do alleles group together? Do phenotypic groups form monophyletic clades? Why did we choose sequence NM_001194925.2 as a root? (25/100)
- Briefly summarize your findings (150 words should be more than enough). Find the name of the submitting author in the GenBank accessions starting with “AF”. He published a review in 2021 in a journal called “Annals of Blood”. Find the reference and read it (you can also ask the lecturer for a copy of the article if you do not have access). Based on the evidence collected in this coursework and external reading, discuss in 500-1000 words the evolutionary dynamics of the alleles, their diversity, and possible reasons for the maintenance of polymorphisms. (30/100) Advice: Do not just read the reference from Annals of Blood!
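Optional illustration: the first task (translating sequences, listing variants against a reference, and spotting sequences with ambiguity codes) can also be checked with a short script. What follows is only a minimal sketch in plain Python, not a required or official method. It assumes the sequences are already aligned, in frame and free of gaps. The function names (translate, variants, has_ambiguity) and the three toy sequences are made up for illustration and do not come from Workshop_sequences_1.fas.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC codes that stand for more than one base (possible heterozygous sites).
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def translate(dna):
    """Translate an in-frame DNA/mRNA string; codons that cannot be resolved become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def variants(reference, other):
    """Return (1-based position, reference residue, other residue) for two aligned strings."""
    pairs = zip(reference.upper(), other.upper())
    return [(i + 1, r, o) for i, (r, o) in enumerate(pairs) if r != o]


def has_ambiguity(dna):
    """True if the sequence contains at least one IUPAC ambiguity code."""
    return any(base in AMBIGUITY_CODES for base in dna.upper())


# Toy sequences (hypothetical, for illustration only).
reference = "ATGGCTCGTTAA"
other = "ATGGGTCGTTAA"
heterozygous = "ATGGSTCGTTAA"  # S means C or G

print(variants(reference, other))                        # [(5, 'C', 'G')]
print(variants(translate(reference), translate(other)))  # [(2, 'A', 'G')]
print(has_ambiguity(heterozygous))                       # True
```

Comparing the DNA-level and protein-level variant lists shows whether a nucleotide change is synonymous or alters the amino acid. The ambiguity check is one possible way to flag heterozygous sequences before the phylogenetic analysis.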