生物信息代写 | CSDS 458 – Homework 2

这个作业是通过编程来分析计算基因相关的问题
CSDS 458 – Homework 2

1. (10pts) Show the recursive function that is used to calculate the Fibonacci
numbers runs in exponential time.
2. (10pts) Write pseudo code of the dynamic programming algorithm to solve the
matrix chain multiplication problem. Explain your variables and steps. Define the
order in filling out the table. Analyze its time and space complexity. You do not
need to provide code for this one.
3. (20pts) In solving the sequence alignment problem using dynamic programming,
a traceback path is the backtracking path that follows the pointers in a table to
reconstruct an alignment. Since the traceback paths in a dynamic programming
table from the bottom right corner to the top left corner correspond one-to-one
with the optimal alignments, the number of distinct optimal alignments can be
obtained by computing the number of distinct traceback paths. Give an algorithm
to compute this number in O(|x||y|) time for two strings x and y.
For students with CS backgrounds, please finish question 4. For students
with backgrounds other than CS, you have the freedom to choose either Q4
or Q5.
4. (60pts) Implement the dynamic programming algorithms for global alignments
and local alignments using what ever language you like. You program read a file
with parameters and inputs, you need to output
a. The optimal score (5pts for global and 5pts for local)
b. The number of solutions that achieve the optimal score (5pts for global and
5pts for local)
c. The actual alignments with the optimal score. (20pts for global and 20pts for
local)
The format of your input file is:
The first line is either a “g” for global alignments or a “l” for local
alignments.
The second line has three numbers, indicate the score for match, mismatch
and indel (insertion and deletion)
The third and the fourth line are the two sequences you need to align.
You can assume the alphabet of the input sequence is {A, T, G, C}. You should
upload your source code and executable code in Canvas. Your program will be
evaluated based on some example inputs that will not be available to you during
your development.
5. (60pts) Use Matlab (or any other high level languages) to do global and local
alignments.
a. The gene we will use is called ‘PTEN/Pten’, first try to obtain information
about this gene for human, mouse and fly within Matlab (see Matlab
Demos/Bioinformatics Toolboxes/Sequence Analysis/Alignment and see
example output below).
b. If you cannot obtain the gene info using the above approach, please directly
download the fasta format of the gene for these three species and read them in
Matlab. Output the length of this gene for all the three species.
c. Try to do global alignment between human and mouse, and local alignment
between human and fly. If you have memory issues and the program cannot
finish the alignment, what can you do? Please describe some realistic
solutions you may come up with.
d. Convert these DNA sequences into protein sequences and perform protein
sequence alignments, again, global alignment between human and mouse and
local alignment between human and fly. What is the problem with this
conversion? If you have memory issues and the program cannot finish the
alignment, what can you do?
e. If you don’t know the gene structure of a gene, you cannot directly convert
your DNA sequence to protein sequence. However, as a first step, you can
find their open reading frames (ORFs) from DNA sequences. Find and show
the ORFs of each of the sequences (see example below).
f. Find the longest ORF from each species and convert it into protein sequence.
And then perform global alignment between human and mouse and local
alignment between human and fly. Visualize your results using dot plot and
the actual aligbments (see example below).
For item a, something like the one blow. Of course, they are different genes.
The human gene is loaded into the MATLAB Workspace as a structure.