BS2040: Bioinformatics Assignment

BS2040: Bioinformatics

Practical Class 1: Using BLAST to identify genes

  1. Dates/Times/Locations:

08/02/21, 10.00 am – 2.00 pm: SETUP Sessions 1-4 (synchronous, in Blackboard Collaborate)

08/02/21 – 26/02/21: SUPPORT online via Blackboard Discussion Board > BLAST Practical Support

01/03/21, 0800 DEADLINE (NB – updated from start of module!) for submission of BLAST Practical Report.

  1. Resources required

This practical requires only access to the internet and installation of either the NoMachine Remote Desktop software (Windows / macOS / Linux) or the SSH Chrome plugin (Chrome OS). This will be used to access the SPECTRE2 (Special Computational Teaching and Research Environment) Linux Cluster, using your University IT username and password. You will need to use the command line to run BLAST searches on SPECTRE2, but full instructions are provided below.

  1. Entry skills

To take part in this practical you will need to be able to carry out the following tasks:

  1. Logging onto a UoL client PC
  2. Using Chrome to access websites (including the Blackboard VLE) with multiple windows or tabs
  3. Retrieval and recording of analysis results (by copying and pasting)
  4. Installation of NoMachine Remote Desktop client on your personal computer
  5. Accessing the SPECTRE2 Linux Cluster using NoMachine Enterprise Client
  6. Manipulating files and running BLAST on the command line
  7. Use of Microsoft Word to produce a practical report
  8. Electronic submission of your report as an assignment, using the Blackboard VLE

NOTE: If there are any tasks that you are not sure how to do, or about which you need clarification, don’t forget to ask during the practical class!

  1. Introduction

During this practical class, you will explore the use of the BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1990) program to identify a mystery gene sequence. This is one of the simplest and most often used applications in sequence database searching and seeks to answer the question “what gene does my novel sequence come from?” Each student will have an individual sequence to analyse. Once you have answered this basic question you will need to do some research to answer a series of short questions about “your” gene. You will also need to make a blog post about the human disease(s) caused by sequence variants (mutations) in your gene. A description of your analysis and the answers to the questions should be included in your practical report, which should be submitted using Blackboard.

To give you insight into how bioinformaticians really use BLAST, we will be using BLAST at the command line interface (CLI) on the University of Leicester Teaching Linux Cluster, a computer called SPECTRE2. You already have an account on SPECTRE2, and the username and password are the same as your University IT services account. Using the CLI can seem strange to start with, but it is how most bioinformaticians (and the programs they write) interact with the computers used for their analyses. Below is a brief guide to logging onto and manipulating files in the Linux environment, written by Professor Raymond Dalgleish (and adapted for this practical).

  1. Using the command line interface (CLI) on Linux (Unix) systems (briefly!)

Some CONVENTIONS…

In common with most computer systems, the commands in the Linux operating system use a rigorous set of rules. So, if you don’t type in a command exactly as it appears in the instructions you will not get the computer to do what you want it to! You must be sure to type in spaces and full stops etc. exactly as written in this protocol. A very important aspect of the Linux operating system is that it is “case sensitive”. This means that the computer will understand the command ls (which lists the files in your directory) but not LS. Be sure that you use the correct case!

In the descriptions used here, the commands that you type are shown in bold face Courier, with any extra items (such as a filename which you must choose) shown in bold face Courier italics.

Commands are entered at the command prompt, which looks like [abc123@spectre12 ~]$ where “abc123” is your username. You do not type the prompt! Linux puts it on the screen for you. NOTE – you may have a different number after the word “spectre” in the prompt. The cluster has a number of login nodes and you may be assigned to any of them, indicated by the number, so it is not a problem if your prompt has a different number.

If you have to press a special key such as the Enter key (also known as the Return key), this is shown as <ENTER>. Occasionally you may have to terminate a program before it finishes (for example, if you have typed a command incorrectly). This can usually be achieved by typing <CONTROL>c: press the Control (or Ctrl) key and, whilst holding it down, type a c (lower case), release the c key and finally release the Control key. Do not try to hit the two keys simultaneously as you will usually not manage it.

You will be prompted at various times to provide filenames for your data. A filename can be (almost) anything you like, but don’t make it too long, as you will get fed up typing it in after the first couple of times, and do not include spaces in the name. Unlike in some other operating systems, there is no real convention used for the naming of files. However, it is convenient to use filenames of the form filename.ext where ext is a three-letter “extension” which tells you what sort of file it is; an example might be sequence.dat. The use of an extension is not compulsory, it can be any length, and you can even have more than one, e.g. filename.ext1.ext2. The computer will sometimes add an extension automatically, but it is good practice to always specify one yourself. You will find it extremely useful to make a written note of filenames and their contents. If you have a problem, or the computer does not behave as you expect, then ask for help!

IMPORTANT

You should also be careful not to confuse the letter l (el) with the number 1 (one), as the two are not interchangeable. Similarly, the upper-case (capital) letter O and the number 0 are also not interchangeable.

  1. PRACTICAL INSTRUCTIONS

INSTALLING NoMachine Remote Desktop Client / Chrome SSH

Once you have connected to SPECTRE2 the instructions are the same for all students, but how you access SPECTRE2 depends upon the operating system that you are using. We canvassed students at the start of the module: most were using MS Windows, and smaller but equal proportions were using Apple macOS and Google Chrome OS. Please follow the appropriate instructions below to get connected.

MS Windows and Apple macOS (not iPhones / iPads!)

  1. Go to the NoMachine homepage (https://www.nomachine.com/), and click the large red “Download now” button. NoMachine should guess your operating system and indicate this within the button – if this is not correct, click on the “Other operating systems” link, to find the correct version for your operating system.
  2. Follow the download and installation instructions (including entering your user password if required). On Apple macOS, if you cannot open the NoMachine app once installed, you can adjust your security preferences as instructed here: https://support.apple.com/en-gb/guide/mac-help/mh40620/mac
  3. Go to the Research Computing page which details accessing SPECTRE2 with NoMachine and download the “SPECTRE2 Connection file” (linked about three quarters of the way down the page). Save the file somewhere easy to find (e.g. on your desktop).

Double-clicking the SPECTRE2 connection file should launch NoMachine. Follow the instructions below from “LOGGING IN TO SPECTRE2 with NoMachine”.

Google Chrome OS

  1. Unfortunately, there is not a NoMachine client for Chrome OS (yet…!) so Chrome users will have to download the SSH app from the Chrome Web Store (https://chrome.google.com/webstore/category/extensions): search for “Secure Shell App”. The SECOND search result (with the icon and description below) is the one that you want – click on it and install the app.
  2. Once you have installed and started up SSH (see screen shot below) you will need to fill in the “username” field with your UoL username (WITHOUT @student.le.ac.uk), and the “hostname” field with spectre2.le.ac.uk
  3. Click the “[ENTER] Connect” button, and input your UoL password at the prompt – there will be no asterisks to indicate you have typed a character, so do this carefully.
  4. If the password is correct, say “Y” to any questions and the window will fill with information about SPECTRE2. You can now follow the instructions from “BASIC FILE HANDLING COMMANDS”.
  5. To quit SSH (once you have finished doing BLAST searches) type “exit” at the command line prompt, and close the Chrome window.

LOGGING IN TO SPECTRE2 with NoMachine

After startup, and once you have typed in your username (WITHOUT @student.le.ac.uk) and UoL password, the following screens provide guidance on how to use NoMachine. Read through the guidance on the first occasion that you log in, but note that there is a check box (bottom left of the window) to stop the guidance prompts popping up every time you log in. Click the OK button to progress to the next step. You can resize the NoMachine window in the usual fashion to suit your needs.

If you are asked whether you want to “save the password in the connection file” – DO NOT select this option, as it is always best practice to enter your password manually for each session.

The NoMachine Client program provides a connection to the SPECTRE2 computer system in combination with NoMachine Server, which runs on SPECTRE2 itself. Together, they allow the user to have a graphical interface (known as the “desktop”) on the SPECTRE2 system. To help keep things organised, it is better to display the NoMachine session as a floating window, rather than running it in full-screen mode. This will allow you to see your Windows 10 desktop.

When you log in to SPECTRE2, there may be messages (in a BIG RED BOX) on the SPECTRE2 desktop containing any current system announcements. These may be ignored. The message box may be dismissed by clicking the Close button, at the bottom right corner.


GETTING STARTED WITH THE TERMINAL (NoMachine)

A terminal is opened by double clicking on the MATE Terminal icon (a black screen with a command prompt “>”) on the SPECTRE2 menu bar or by selecting Applications > System Tools > MATE Terminal from the menu at the top left corner of the desktop. Each time you do so, you will launch another instance of the terminal program and you can have several running simultaneously, if necessary. For convenience, we will simply refer to a running instance of the MATE Terminal program as a “terminal” or sometimes as the “command line”, as commands are issued to Linux computers from the terminal.

You will see that the terminal has a set of menus at the top and a large black area with a white flashing block cursor adjacent to the prompt, which looks like [abc123@spectre12 ~]$ where abc123 is replaced by your username. When the prompt is visible you can type commands.

Only 24 lines of text (80 characters wide) are displayed in the terminal window. The terminal allows you to scroll the screen backwards to view items that may have scrolled off the top of the screen and out of sight. This can be done using the standard slider or arrows in the right-hand edge of the terminal window, or by using the mouse scroll wheel. By default, only 512 scroll-back lines are saved.

BASIC FILE HANDLING COMMANDS

Now that you are logged into SPECTRE2 and have a terminal open, you will need to know some basic commands. All commands must be completed by pressing the <ENTER> key otherwise the computer does not know that you have finished typing the command.

If you want to see a list of the names of your files, type ls<ENTER> at the prompt. This will give you a “basic” listing of your files showing only their names. A more complete listing can be obtained by typing ls -l<ENTER>, which also shows the size, the date and time of last modification and other data about the files. If you are logging in to SPECTRE2 for the first time, you will not have any files yet and the ls command will not be able to show any results.

If you want to delete a file, type rm filename.ext<ENTER> at the prompt. Be careful, as SPECTRE2 will delete the file immediately if the deletion is carried out at the command line in a terminal. The Wastebasket that you might have noticed on the desktop “saves” ONLY files that are deleted using programs, such as the Caja file browser, that have a graphical interface. Otherwise, once it’s removed with rm, it’s gone.

To view the contents of a file, use more filename.ext<ENTER> at the prompt, which allows large files to be viewed one screen-full at a time. To see the next screen press <SPACEBAR>; to move down one line press <ENTER>; to quit from the more program type q; to scroll back up the file type b, and to scroll forward again type f.

TOP TIP! At no time will typing the name of a data-file alone have any effect other than to generate an error message. You must always type a command to tell SPECTRE2 what to do with the file, such as typing more to view the contents of the file.

Summary of some useful Linux/SPECTRE2 commands

ls                                    list the contents of your directory
more filename.ext                     view a file one screen-full at a time
less filename.ext                     view a file one line or one screen-full at a time, backwards or forwards
rm filename.ext                       delete a file
cp oldfilename.ext newfilename.ext    copy a file (keeps both files)
mv oldfilename.ext newfilename.ext    change a file name or move a file (removes old file)
cd directoryname                      change directory
mkdir directoryname                   make a directory
rmdir directoryname                   remove a directory (but only if it is empty)
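The commands in this summary can be tried out safely in a scratch directory before you use them on real data. The following is a minimal sketch, assuming a bash shell. The directory and file names (cli_practice, notes.txt and so on) are made up for illustration, and mktemp and echo are extra commands that are not otherwise needed in this practical.

```shell
# Work in a throwaway directory so that nothing important can be deleted.
scratch=$(mktemp -d)            # create a temporary directory
cd "$scratch"

mkdir cli_practice              # make a directory
cd cli_practice                 # move into it
echo "test line" > notes.txt    # create a small file to practise on
cp notes.txt copy.txt           # copy: both files now exist
mv copy.txt renamed.txt         # rename: copy.txt no longer exists
ls                              # lists notes.txt and renamed.txt
rm notes.txt renamed.txt        # delete both files (there is no undo)
cd ..                           # go up one level
rmdir cli_practice              # works because the directory is now empty
```

When you have finished, typing cd on its own returns you to your home directory.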

TOP TIP! To save typing a whole filename, if you are currently in the same directory as a file (i.e. it is listed when you type ls<ENTER>) then typing the first few letters and pressing the <TAB> key will prompt Linux to automatically complete the filename. If the first few letters are shared by two files, then the common part of the filename will be added – you will need to type more letters and press <TAB> until Linux knows EXACTLY which file you want.

TOP TIP! If something goes horribly wrong (in Linux this is usually accompanied by a cryptic error message – success is usually indicated by NO RESPONSE AT ALL) then you can save typing by using the Linux shell’s “history” feature: pressing the <up> and <down> arrow keys will scroll through a list of commands you have issued – you can find the command that caused the problem and edit it (perhaps you misspelled a filename and got a “No such file or directory” error?). You can use the <left> and <right> arrow keys to move the cursor within the line, and when done editing press <ENTER> to execute the command. There is no need to move the cursor to the end of the line before pressing the <ENTER> key.

  1. COLLECTING YOUR MYSTERY SEQUENCES
    1. Download the assignment document BS2040_19_BLAST_assign.pdf from the BLAST Practical folder which is in the Learning Materials > Practical Classes > BLAST Practical section of the BS2040 Blackboard site. The assignment document is under the heading “Assignments of students to sequences…” and lists the query sequence number that you have been assigned.
    2. On SPECTRE2, navigate to the directory where the mystery sequences are stored by typing cd /data/bioinf/Teaching/BS2040<ENTER>. Typing ls in this directory will reveal two subdirectories (“DNA” and “Protein”); you need to collect the sequences with your assigned number from EACH directory.
    3. Use the cd command to move into the DNA or Protein subdirectory (cd DNA OR cd Protein) and the ls command to list the directory contents. There will be a file named queryXXp.fa.fasta or queryXXn.fa.fasta (depending on the directory), where XX corresponds to your assigned sequence number.
    4. To copy your sequence file into your home directory type:

cp queryXXp.fa.fasta ~

where XX is your assigned sequence number. (The ~ symbol is shorthand for your home directory, and saves you typing the full path). To confirm that the copy operation has worked (by viewing the copy of the file that SHOULD now be there), you need to navigate BACK to your home directory. Fortunately, the default behaviour of the cd command (if issued with no “arguments”) is to take you to your home directory. Just type cd and then the ls command to list the directory contents. If you already have many files in your home directory it would be a good idea to create a new directory:

mkdir BLAST_practical

and then move your sequence files into it, e.g.:

mv queryXXp.fa.fasta BLAST_practical

  1. The sequences are in the common (and very simple) FASTA format, indicated by a description line starting with a “>”. There is no need to put the sequences in separate DNA and Protein directories. You can view the sequences by typing:

less queryXXn.fa.fasta OR less queryXXp.fa.fasta

The less command allows you to move forward and backwards through a file line by line using the <up> and <down> arrow keys, and page by page by using the f (forward) and b (back) keys. When you have finished looking at the file contents type q to quit the less program.
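Because a FASTA file is plain text, standard Linux tools can be used to check what you have collected. The following is a minimal sketch, assuming a bash shell; it builds a tiny made-up FASTA file (the description line and sequence are invented) and then counts the sequences and residues. The grep, tr and wc commands are not required for the practical and are shown only as an illustration.

```shell
# Create a tiny, made-up FASTA file purely for demonstration.
fasta=$(mktemp)
printf '>example_seq hypothetical description\nACGTACGTAC\nGGCCTTAA\n' > "$fasta"

# Count the description lines (one per sequence in the file).
n_seqs=$(grep -c '^>' "$fasta")

# Count the residues: drop description lines, strip line breaks, count characters.
seq_len=$(grep -v '^>' "$fasta" | tr -d '\n' | wc -c | tr -d ' ')

echo "sequences: $n_seqs, total length: $seq_len"
```

To apply the same checks to your own data, replace "$fasta" with the name of your query file, for example queryXXn.fa.fasta.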

  1. Once you are sure that you have collected the DNA and Protein sequence with your assigned sequence number, you are ready to do a BLAST search (check with a demonstrator if you are not sure).
  1. Perform a BLAST search to identify your mystery sequence
    1. Navigate to the directory that contains your query sequences and load the module to run the blast algorithm by typing:

module load blast+

  1. Next type in the appropriate command to compare your unknown nucleotide sequence to the installed nucleotide (nt) database, as shown below (it might be helpful to drag the edge of the terminal to the right to give a usefully long space in to which to type commands):

blastn -query queryXXn.fa.fasta -db nt -evalue 0.01 -out first_DNA_blast.txt

As you might expect, blastn refers to the appropriate algorithm (program) for nucleotide sequences and the -query “flag” or “argument” (an additional command that modifies the behaviour of the main command) tells blastn which sequence file you would like to compare to the database. Also, the -db flag specifies the built-in nucleotide (nt) database. -evalue sets the expect value threshold, so that the blast search only reports matches that have an expect value less than the evalue parameter. Finally, the name of the output file is specified by the -out flag; you should change this for each search, otherwise blast will silently overwrite the last set of results with the new set.

NOTE: One search parameter that is NOT included in the default blast output is the setting of the evalue threshold. If you do searches at different values of -evalue then you should note this by including the value in the name of the output file (e.g. -out DNA_blast_E0.01.txt). N.B. – If the command has been executed successfully there will be NO feedback from blast – this is the default behaviour of the UNIX / Linux command line. If you get an error (which may or may not be intelligible) contact a demonstrator for help.
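One way to keep the threshold and the output file name in step is to store the E-value in a shell variable and reuse it in both places. The following is a minimal sketch, assuming a bash shell; the variable names (evalue, query, out) are chosen here for illustration. The command is only printed, not run, so that it can be checked first.

```shell
# The threshold is stored once, then reused in the flag and the file name.
evalue=0.01
query=queryXXn.fa.fasta                 # replace XX with your sequence number
out="DNA_blast_E${evalue}.txt"          # gives DNA_blast_E0.01.txt

# Print the full command so that it can be checked before it is run.
echo blastn -query "$query" -db nt -evalue "$evalue" -out "$out"
```

Removing the word echo from the last line would run the search itself (after module load blast+).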

  1. You will ONLY know that the blast search is complete when the command prompt appears again – then you can view the output file (it should have the name that follows the -out flag in your blast command) using the less command to see what blast has found.

Remember

Nucleotide searches are best for finding out what a particular sequence is, while protein searches can identify what a protein is AND if there are related sequences in the database. Nucleotide searches can also find related sequences but, due to stabilising selection on protein structure and redundancy in the genetic code, protein searches can identify more distantly related sequences.

  1. Things to note about the blast output are:
    1. The top (header) section of the report shows which version of the BLAST algorithm was used (with a citation), the name and size of the database searched, and the name and length of the query.
    2. The next section (hit table) is a list of sequences in the database with significant alignments to the query – their identifiers, score and E value of the match are listed. Only hits with E values below the threshold are reported.
    3. The third section (alignments) shows the pairwise alignment of each database hit (Sbjct) with your query. Above each alignment are its statistics, which are particularly useful.
    4. The fourth section reports the Karlin / Altschul statistics for the search (Lambda, K and H) for ungapped and gapped alignments, as well as the details of the database and the scoring matrix and gap penalties used.

What type of gap penalties are used by default: constant, proportional or affine? What is the scoring matrix used by default by the blastn search? (HINT: refer to Lecture 7).
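In a large report it can be quicker to find where each of these sections starts than to page through the whole file. The following is a minimal sketch, assuming a bash shell and the default report layout; the function name report_sections is made up, and the search strings are assumptions about the wording of the report that may differ between BLAST+ versions.

```shell
# Print the line numbers at which the main parts of a report begin.
# The search strings are assumptions about the default report wording.
report_sections() {
    grep -n -e 'Query=' \
            -e 'Database:' \
            -e 'Sequences producing significant alignments' \
            -e 'Lambda  *K' "$1"
}

# Example use (only runs if the results file exists in this directory).
if [ -f first_DNA_blast.txt ]; then
    report_sections first_DNA_blast.txt
fi
```

Inside less, typing a line number followed by g jumps to that line, so the numbers printed here can be used to move straight to a section.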

  1. Exploring your nucleotide BLAST results further
    1. Examine the nucleotide blast results file – is there an exact match (100%) to your query sequence?
    2. If so, use the hit identifier – the accession and version number of the sequence – at the far left of the hit table section to get more information about the database sequence which your query has matched. If you enter the accession and version number into the search box on the NCBI front page, selecting the “Nucleotide” option from the dropdown menu just to the left of the search box, you will be able to access more information on the sequence. The data may be presented in a variety of ways on different websites, but ultimately is derived from the “flat file” data format, where the information is split up into sections with standard fields (Review Lecture 6 on Archives, information retrieval and data mining). From what species does your mystery sequence originate?
    3. If there isn’t an exact match (100%), what is the origin of the most similar sequence?
    4. Examine the Reference field in the flat file to find a paper associated with your sequence: there may be a PubMed identifier (PMID) number that links directly to the abstract of the paper, or you can paste the PMID number into the PubMed search box at NCBI. If there is not a PMID you will have to locate the paper by doing a literature search.

Is it a paper specifically about the gene you are studying, or something else?
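The hit identifier is written as an accession followed by a dot and a version number, and it can be useful to record the two parts separately. The following is a minimal sketch, assuming a bash shell; the identifier XX000000.1 is a made-up placeholder, not a real database record.

```shell
# A made-up identifier in the accession.version style (not a real record).
id="XX000000.1"

accession=${id%.*}    # remove the shortest suffix matching ".*" -> XX000000
version=${id##*.}     # remove the longest prefix matching "*."  -> 1

echo "accession: $accession, version: $version"
```

Searching with the full identifier (accession and version together) retrieves that specific version of the record.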

  1. Next steps
    1. Next perform a protein BLAST search with the protein (queryXXp.fa.fasta) version of your sequence.

blastp -query queryXXp.fa.fasta -db swissprot -evalue 0.01 -num_alignments 10000 -num_descriptions 10000 -out first_Protein_blast.txt

NOTE: the -num_alignments and -num_descriptions parameters (which control the number of alignments and descriptions that blast reports, respectively) are set at a high level to ensure that you are able to examine non-significant (i.e. E >0.01) database matches. This may mean that your blast report will be a very large file, but this will vary between queries.

blastp refers to the appropriate algorithm for amino acid sequences and the -query flag identifies your sequence file. Again the -db flag specifies the built-in database; in the case of protein searches, this is UniProt/SwissProt. -evalue sets the expect value threshold, so that the blast search only reports matches that have an expect value less than the evalue parameter. Finally, the name of the output file is specified by the -out flag; you should change this for each search, otherwise blast will silently overwrite the last set of results with the new set. NOTE: One search parameter that is NOT included in the default blast output is the setting of the evalue threshold. If you do searches at different values of -evalue then you could note this by including the value in the name of the output file (e.g. -out Protein_blast_E0.01.txt).

  1. You will know that the blast search is complete (it should be quicker than the nucleotide blast – why?) when the command prompt returns; then you can view the output file using the less command to see what blast has found.
  2. Things to note about the blast output are:
    1. The top (header) section of the report shows which version of the BLAST algorithm was used (with a citation), the name and size of the database searched, and the name and length of the query.

Is the UniProt/SwissProt database redundant or not?

  1. The next section (hit table) is a list of sequences in the database with significant alignments to the query — their identifiers, score and E value of the match are listed. Only hits with E values below the threshold are reported.
  2. The third section (alignments) shows the pairwise alignment of each database hit (Sbjct) with the query. Above the alignment are statistics for the alignment, which are particularly useful. Note that protein alignments report identities AND similarities.

How are these indicated in the pairwise alignment?

  1. The fourth section reports the Karlin / Altschul statistics for the search (Lambda, K and H) for ungapped and gapped alignments, as well as the details of the database and the scoring matrix and gap penalties used.

What type of gap penalties are used by default: constant, proportional or affine? What is the scoring matrix used by default by the blastp search? (HINT: refer to Lecture 7).

  1. Exploring your protein BLAST results further
    1. Examine the protein blast results file — is there an exact match (100%) to your query sequence?
    2. If so, use the hit identifier – the accession and version number – at the far left of the hit table section to get more information about the database sequence which your query has matched. If you do a text search on the UniProt website (https://www.uniprot.org/) with the accession and version number you will be able to access more information on the sequence. The data are presented in a richly interlinked form, but are fairly easy to navigate – you should review Lecture 6: “Scientific Publications and Archives” to guide you through UniProt. From what species does your mystery sequence originate?
    3. If there isn’t an exact match (100%), what is the origin of the most similar sequence?
    4. Examine the Publications section in UniProt to find a paper associated with your sequence — the PubMed identification (PMID) number will often link directly to the abstract of the paper.

Is it a paper specifically about the gene you are studying, or something else?

  1. Retrieving your blast search results from SPECTRE2 (NoMachine)
    1. You can extract much of the required information from the blast results file using the less command and copying and pasting, but there is a problem: using the standard Windows copy shortcut (Ctrl-C) will terminate the less program. It is sufficient to just highlight text with the mouse cursor within the command window on SPECTRE2 to copy it to the Windows clipboard, from where you can paste it (into a text document) with Ctrl-V. However, it may be simpler to have the entire text results file on your Windows desktop for including (parts of it) in your write up.

  1. To download a particular file from SPECTRE2 (for example your nucleotide BLAST results, which can be quite large) you can click on the “iM” icon at the top right of the NoMachine window. This will reveal a drop down menu with “Download file from the server” as an option. Click this and a file browser (with the title “NoMachine – Select file to send”) will appear. Navigate to the file you want in your SPECTRE2 home directory, select the file and click OK. You will then be asked to “Select the file destination” on your computer. Once downloaded you can open the text file with any text editor or MS Word.
  2. Retrieving your blast search results from SPECTRE2 (Chrome SSH app)

To access your BLAST results using the Chrome SSH app you will need to use the SFTP function of the app. This involves exiting from your current login (typing “exit” at the command prompt, and closing the Chrome tab / window). Start the SSH app again and complete the login details for SPECTRE2 (they should have been retained, but refer to the screen shot in section 3.0 if not).

  1. THIS TIME click the “SFTP” button, next to the “[ENTER] Connect” button. This will start an SFTP (Secure File Transfer Protocol) session on SPECTRE2 – you will need to give your UoL password, and type “Y” in response to any questions.
  2. Once the login is complete you will have the usual command prompt and should navigate to the file within your directory as before (using cd etc.).
  3. In the directory containing the results file type: get filename (where “filename” is the name of the results file).
  4. Pressing the enter key will cause the file to be downloaded and placed in your “Downloads” directory in Chrome.
  5. You can close the SFTP session by typing “exit” at the command prompt, AFTER you have checked that the file has downloaded and you can open it in a text editor.
  1. Getting started with your research about “your” gene…

To help you start your research on the gene that you have (hopefully) identified in the course of the practical, we have given each student a blog where you can record the results of your research. Find your blog by clicking the “BLAST / Galaxy Practical Blogs” button on the left sidebar of the BS2040 Blackboard page, and then the “BLAST Practical blogs” link. Your task is to identify the human gene that is orthologous to your mystery DNA and protein sequences, and then use online resources to find out about the diseases that are caused by mutations in this gene. You should post the results of your research on your blog, but first make sure that you check with a demonstrator that you have identified the correct gene / disease pair. For your blog post it is ACCEPTABLE to copy information from websites or journal articles BUT ONLY IF you surround the copied material with double quotes (“…”) AND include the URL of the source immediately after the quote. Used in this way, the blog can be a tool to record your research and will help you to answer the questions posed below. However, in your PRACTICAL REPORT you MUST WRITE IN YOUR OWN WORDS AND CITE YOUR SOURCES – to NOT do so IS PLAGIARISM, and the usual penalties will apply.

NOTE: the BLAST Practical blog is NOT assessed – it is simply a useful place to store notes on the practical session, the results of your research (and perhaps a summary of your BLAST results). You are strongly encouraged to use the blog in this way – in the past, students have found it very useful to have an online, web accessible record of their practical and research notes.

  1. Results – questions to answer in the results section of your report

Section 4.1 [35 marks in total]

  1. Which organism does your query DNA sequence come from? In other words, from what species does the sequence that most closely matches your query come? You should find the common English name (if one exists) and the binomial Linnean (Latin) name of the species. If there are two (or more) 100% identical sequences you should make clear which (first, second, third, etc.) you are referring to. [2 marks]

2)

  1. What is the next most closely related nucleotide sequence, from a different organism, which shows significant similarity to your query DNA sequence? If there are two (or more) sequences showing the same level of identity you should make clear which (first, second, third, etc.) you are referring to. [2 marks]
  2. What percentage of your query sequence does this other sequence cover (calculated as the length of the alignment, divided by the length of the query)? [2 marks]
  3. What is the E-value associated with this match? [2 marks]
  4. How similar (in terms of % identity) are these sequences? [2 marks] [Total of 8 marks]
  1. Using your PROTEIN query sequence, find out which HUMAN protein sequence is MOST similar to your mystery gene?
    1. You should not only name the protein that satisfies this criterion… [2 marks]
    2. but also describe how you found it… [4 marks]
    3. the percentage identity to your query… [2 marks]
    4. and the E-value of the alignment [2 marks]. [Total of 10 marks]

NOTE: Protein alignments report both identity (where the same amino acids are aligned) and similarity (where chemically similar amino acids align) — you should report identity.
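
As a rough illustration of the two percentages asked for in questions 2 and 3, the sketch below computes query coverage using the definition given in question 2 (alignment length divided by query length) and percentage identity as identical positions divided by alignment length. The function names and the numbers are hypothetical and are not taken from any real BLAST result; you should read the actual values from your own BLAST output.

```python
def query_coverage(alignment_length: int, query_length: int) -> float:
    """Percentage of the query covered by the alignment (question 2 definition)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * alignment_length / query_length


def percent_identity(identical_positions: int, alignment_length: int) -> float:
    """Percentage of aligned positions where both sequences have the same residue."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return 100.0 * identical_positions / alignment_length


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    query_len = 500       # length of the query sequence
    alignment_len = 450   # length of the alignment to the hit
    identical = 405       # aligned positions that are identical

    print(f"Query coverage: {query_coverage(alignment_len, query_len):.1f}%")
    print(f"Identity: {percent_identity(identical, alignment_len):.1f}%")
```

Note that the alignment length can include gap positions, so a value calculated this way may differ slightly from the coverage figure that BLAST itself reports.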

  4. What is the human reference messenger RNA sequence (NCBI RefSeq) corresponding to the MOST similar HUMAN protein sequence identified in question 3?

HINT: the simplest way to locate the reference mRNA sequence is to perform an NCBI Gene search using the protein name from question 3: paste it into the NCBI search box and use the dropdown box to select the “Gene” database. For some genes, there are a number of search results; you should select the result (usually the top one) that corresponds to the human gene name that you searched for. Clicking on the link for this result takes you to the Gene report, where data on the gene are organised into a number of sections. You should scroll down to the “Genomic Regions Transcripts and Products” section. If there are multiple NM_ mRNA sequences (likely due to alternative transcripts) simply select the longest isoform (the one with the largest “Aligned Length” in the dropdown box that appears when you hover over the mRNA). This is an arbitrary choice: it may correspond exactly to your protein query sequence, but to be sure you would need to align the protein sequence to the selected mRNA using the software tool exonerate, which is outside the scope of this practical. For your selected mRNA you should:

     (a) give the accession number of the reference mRNA (starts NM_…) [1 mark]
     (b) give the version number [1 mark]
     (c) and describe how you found it [3 marks] [Total of 5 marks]
  5. What is the protein sequence, from another organism, with the lowest alignment score that nevertheless shows significant similarity to your query protein sequence? Refer to the note below for a definition of what “significant similarity” means. In your answer you should not only name the protein that satisfies this criterion [(a), 2 marks], but also describe how you found it [(b), 4 marks], the percentage identity to your query [(c), 2 marks], and the E-value of the alignment [(d), 2 marks]. [Total of 10 marks]

NOTE: Protein alignments report both identity (where the same amino acids are aligned) and similarity (where chemically similar amino acids align) — you should report identity.

REMEMBER: “E-values” (expect values) can be used to assess the likelihood that a given BLAST hit is biologically meaningful (i.e. it involves similarity between two sequences which diverged from a common ancestor). The E-value is the number of matches as good as (or better than) the one observed that you would EXPECT to see between your query and completely biologically unrelated sequences in the database, simply due to chance. High E-values (>1) therefore indicate a match that is no better than would be expected by chance. For example, an E-value of 3.0 indicates that you would EXPECT to see as good a match (or better) to three totally unrelated sequences in the database, each time you did the search, simply due to chance. As a “rule of thumb” we can say that BLAST hits with E-values of less than 0.01 are significant and likely indicate a biological relationship. In other words, an E-value of 0.01 indicates that if the search were done 100 times, only once (1%) would we expect to see as good a match between the query sequence and a biologically unrelated sequence.
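
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 0.01 cut-off is the rule of thumb from this handout, and the conversion from an E-value to a probability assumes that chance hits follow a Poisson distribution (the usual assumption in BLAST statistics, but not something stated in this handout), which gives P = 1 − exp(−E).

```python
import math

# Rule-of-thumb cut-off taken from the handout text above.
SIGNIFICANCE_THRESHOLD = 0.01


def chance_hit_probability(e_value: float) -> float:
    """Probability of seeing at least one chance match this good.

    Assumes the number of chance hits is Poisson distributed with
    mean equal to the E-value, so P = 1 - exp(-E).
    """
    if e_value < 0:
        raise ValueError("e_value cannot be negative")
    # -expm1(-E) equals 1 - exp(-E) and stays accurate for very small E.
    return -math.expm1(-e_value)


def is_significant(e_value: float, threshold: float = SIGNIFICANCE_THRESHOLD) -> bool:
    """Apply the handout's rule of thumb: significant if E < threshold."""
    return e_value < threshold


if __name__ == "__main__":
    # Hypothetical E-values, not taken from any real search.
    for e in (1e-50, 0.001, 0.01, 3.0):
        print(e, is_significant(e), chance_hit_probability(e))
```

For small E-values the probability and the E-value are almost identical, which is why an E-value of 0.01 can be read as roughly a 1% chance of a match this good arising between unrelated sequences.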

Section 4.2 [30 marks in total]

NOTE: You should answer the questions below by referring to the reference mRNA sequence (RefSeq) which you located in Section 4.1, question 4. If you are not sure that you have the correct sequence, check with a demonstrator.

  1. What is the RefSeqGene entry (starts NG_…) for the human gene that corresponds to the RefSeq mRNA sequence you have identified? [(a), 1 mark] You should give the HGNC (HUGO Gene Nomenclature Committee) approved symbol for the gene (https://www.genenames.org/) [(b), 1 mark] and the HGNC approved name [(c), 1 mark]. [Total of 3 marks] HINT: the “Summary” section of the NCBI Gene entry contains a link to this information, under “Primary Source”.
  2. Is this gene the human orthologue of the gene from which your query sequences are derived? In your answer, you should define what an orthologue is [(a), 3 marks] and explain how your BLAST results support this conclusion [(b), 5 marks]. [Total of 8 marks]
  3. What human disease(s) is / are associated with mutations in the gene you have identified? You should name the disease (or one of the diseases, if there is more than one) [(a), 2 marks] and cite an original primary research paper (NOT a review paper) that supports this association [(b), 3 marks]. [Total of 5 marks] HINT: OMIM might be useful here.

NOTE: If you are not sure how to cite a primary research paper correctly, ask for help.

  4. What is meant by the term a “conserved protein domain”? Does your gene encode any such domains? You can check by copying and pasting your protein sequence into the search box. In your answer, you should define what a conserved protein domain is [(a), 3 marks] and describe any conserved domains present in your protein OR how you know there are NO conserved domains [(b), 3 marks]. [Total of 6 marks]
  5. What is the “function” of the protein product of your gene, if known? Your answer should describe the protein’s function [(a), 5 marks] and how this is related to the disease caused when it is mutated [(b), 3 marks].

OR if the function of the protein product of your gene is not known you should summarise any information on the predicted function that can be inferred from any conserved domains of the protein [(a), 5 marks] and what is known about how the protein might cause the disease phenotype, based on your research [(b), 3 marks]. [Total of 8 marks]

5.0 Practical Report Details

You should produce a practical report summarising the results of your research and answering the questions posed in sections 4.1 and 4.2 above.

You should organise your report with a brief Introduction – describing what BLAST is, how it is used and what you used it for in the practical [Total of 5 marks]. This should be followed by a Methods section where you describe what YOU did (if you followed the practical instructions EXACTLY then there is no need to re-write them, simply reference these Practical instructions, but you should note any variations from the instructions that you carried out, so that markers can reproduce your analysis) [Total of 2 marks]. The Methods section should be followed by a Results section, in which you should answer the questions posed in Sections 4.1 [Total of 35 marks] and 4.2 [Total of 30 marks].

Your answers to the questions should be EXPLICITLY identified in the text, for example “Section 4.1, Question 1a. Answer: My DNA sequence (Query28N) comes from the organism…”. You should finish your report with a brief summary of your results [Total of 3 marks]. Your report should be at least 2 pages long, in 12pt Arial font with margins (top/bottom/left/right) of 1.5 cm, and be fully referenced (using EndNote or RefWorks) [Total of 5 marks].

NOTE: The Practical report will be marked out of a total of 80 marks, against the Grademark Rubric on the Submission Page. You should consult the rubric to make sure that you have fulfilled the requirements for each section in your report.

The report should be produced in Microsoft Word and saved as a PDF (.pdf) format file. The name of the file should include the words “BS2040_BLAST” and your University Student ID number (e.g. BS2040_BLAST_123456789.pdf). Please feel free to use screenshots (with appropriate citations) and figures to illustrate your report; both screenshots and figures should have descriptive legends. The practical report should be submitted via Blackboard using the appropriate assignment page on the BS2040 Blackboard site, by 8 am on Monday 1st March 2021. It will be marked electronically and your mark and feedback returned using the GradeMark system.

NOTE: On submission, your work will be electronically scanned for plagiarism using the “TurnitinUK” system, which compares your work to that of all other past and present students as well as all available sources of information on the internet. Please take care to explain your research in your own words since “copying and pasting” from websites IS plagiarism and can result in a mark of zero for this piece of work.

References

[This reference uses the Harvard reference style as implemented in RefWorks – please use this style for your report]

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J Mol Biol. 215, 403-10.
