Learn to Program: The Fundamentals - Assignment 2
A2 Problem Domain: Deoxyribonucleic Acid (DNA)
The problem domain for A2 is Deoxyribonucleic Acid (DNA), the double-stranded molecule that encodes genetic information for living organisms. DNA is made up of four kinds of nucleotides, which are molecules that bond together to form DNA sequences.
The four nucleotides are adenine (A), guanine (G), cytosine (C), and thymine (T). Each strand of DNA is a sequence of nucleotides, for example AGCTAC. In a program, we will represent this as a string, "AGCTAC".
DNA has 2 strands in a double helix. The nucleotides in one
strand are bonded to the nucleotides in the other strand. A and T can be
bonded together, and thus complement each other; similarly, C and G are complements of each other.
You can see a picture of this on the Wikipedia page for DNA. The two strands in DNA are complementary
because each nucleotide in one strand is bonded with its complement in
the other strand. Thus, given the DNA sequence ACGTACG, its
complementary strand is TGCATGC.
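The pairing rule described above can be sketched as a small lookup. This is only an illustration: the dictionary `PAIRS` and the helper `are_complementary` are hypothetical names that do not appear in the assignment, a dictionary is used for brevity rather than because the handout asks for one, and the sketch assumes both strands contain only the four nucleotide letters.

```python
# Hypothetical illustration of the pairing rule; not part of the assignment.
# Assumes every character in both strands is one of 'A', 'T', 'C', 'G'.
PAIRS = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def are_complementary(strand1, strand2):
    """Return True if and only if each nucleotide in strand1 is paired with
    its complement at the same position in strand2."""
    if len(strand1) != len(strand2):
        return False
    for i in range(len(strand1)):
        if PAIRS[strand1[i]] != strand2[i]:
            return False
    return True


print(are_complementary('ACGTACG', 'TGCATGC'))  # True (the handout's pair)
print(are_complementary('ACGT', 'ACGT'))        # False
```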
Terminology in this handout
A DNA sequence is a sequence of nucleotides, such as TCATGT.
What to do
Step 2: Write functions get_length, is_longer, count_nucleotides, and contains_sequence. (Be sure to test your code with empty strings where appropriate.)
For these functions, you may use built-in functions, str operations (for example: in, +, indexing), and str methods.
| Function name: (Parameter types) -> Return type | Description |
|---|---|
| get_length: (str) -> int | The parameter is a DNA sequence. Return the length of that sequence. |
| is_longer: (str, str) -> bool | The two parameters are DNA sequences. Return True if and only if the first DNA sequence is longer than the second DNA sequence. (If they are the same length, return False.) |
| count_nucleotides: (str, str) -> int | The first parameter is a DNA sequence and the second parameter is a nucleotide ('A', 'T', 'C' or 'G'). Return the number of times the nucleotide occurs in the DNA sequence. |
| contains_sequence: (str, str) -> bool | The two parameters are DNA sequences. Return True if and only if the first DNA sequence contains the second DNA sequence. |
def get_length(dna):
    """ (str) -> int
    Return the length of the DNA sequence dna.
    >>> get_length('ATCGAT')
    6
    >>> get_length('ATCG')
    4
    """
    return len(dna)

def is_longer(dna1, dna2):
    """ (str, str) -> bool
    Return True if and only if DNA sequence dna1 is longer than DNA sequence
    dna2.
    >>> is_longer('ATCG', 'AT')
    True
    >>> is_longer('ATCG', 'ATCGGA')
    False
    """
    return len(dna1) > len(dna2)

def count_nucleotides(dna, nucleotide):
    """ (str, str) -> int
    Return the number of occurrences of nucleotide in the DNA sequence dna.
    >>> count_nucleotides('ATCGGC', 'G')
    2
    >>> count_nucleotides('ATCTA', 'G')
    0
    """
    number = 0
    # Compare each character of the sequence with the given nucleotide.
    for char in dna:
        if char == nucleotide:
            number += 1
    return number

def contains_sequence(dna1, dna2):
    """ (str, str) -> bool
    Return True if and only if DNA sequence dna2 occurs in the DNA sequence
    dna1.
    >>> contains_sequence('ATCGGC', 'GG')
    True
    >>> contains_sequence('ATCGGC', 'GT')
    False
    """
    return dna2 in dna1
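The reminder in Step 2 about empty strings can be acted on with a few quick checks. The sketch below repeats compact versions of the four functions so that it runs on its own; the particular checks are suggestions, not part of the handout.

```python
def get_length(dna):
    return len(dna)


def is_longer(dna1, dna2):
    return len(dna1) > len(dna2)


def count_nucleotides(dna, nucleotide):
    number = 0
    for char in dna:
        if char == nucleotide:
            number += 1
    return number


def contains_sequence(dna1, dna2):
    return dna2 in dna1


# Edge cases involving the empty string.
print(get_length(''))                 # 0
print(is_longer('', ''))              # False (same length)
print(is_longer('A', ''))             # True
print(count_nucleotides('', 'A'))     # 0
print(contains_sequence('ATCG', ''))  # True: '' occurs in every string
print(contains_sequence('', 'A'))     # False
```

Note that the empty sequence counts as being contained in any sequence, because Python's in operator treats '' as a substring of every string.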
Step 3: Write functions is_valid_sequence and insert_sequence.
There is no starter code for these functions. Use the design
recipe to complete them. We have given you some suggestions for
examples to try, but you should come up with more on your own based on
the descriptions.
For these functions, you may use built-in functions and str operations (for example: in, +, indexing). Do not use str methods.
| Function name: (Parameter types) -> Return type | Description |
|---|---|
| is_valid_sequence: (str) -> bool | The parameter is a potential DNA sequence. Return True if and only if the DNA sequence is valid (that is, it contains no characters other than 'A', 'T', 'C' and 'G'). There are at least 2 ways to approach this. One way is to count the number of characters that are not nucleotides and then at the end check whether there were more than zero. Another way is to use a Boolean variable that represents whether the sequence is still valid; it would start off as True and would be set to False if you found something that wasn't an 'A', 'T', 'C' or 'G'. You should construct examples that contain only 'A's, 'T's, 'C's and 'G's, and you should also create examples that contain other characters. A string is not a valid DNA sequence if it contains lowercase letters. |
| insert_sequence: (str, str, int) -> str | The first two parameters are DNA sequences and the third parameter is an index. Return the DNA sequence obtained by inserting the second DNA sequence into the first DNA sequence at the given index. (You can assume that the index is valid.) For example, if you call this function with arguments 'CCGG', 'AT', and 2, then it should return 'CCATGG'. When coming up with more examples, think about where the second DNA sequence might be inserted: what are the extremes? |
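The two approaches suggested for is_valid_sequence can be sketched side by side. The function names `is_valid_by_count` and `is_valid_by_flag` are hypothetical and chosen only to tell the variants apart; both use only the in operator and a loop, in keeping with the restriction on str methods.

```python
def is_valid_by_count(dna):
    """Counting approach: tally the non-nucleotide characters, then check
    whether any were found."""
    invalid = 0
    for char in dna:
        if char not in 'ATCG':
            invalid += 1
    return invalid == 0


def is_valid_by_flag(dna):
    """Boolean approach: start off as True and switch to False as soon as
    a non-nucleotide character is seen."""
    valid = True
    for char in dna:
        if char not in 'ATCG':
            valid = False
    return valid


for candidate in ['ATCG', 'ATCGX', 'atcg', '']:
    print(candidate, is_valid_by_count(candidate), is_valid_by_flag(candidate))
```

Both variants examine every character, so they agree on every input; under this reading the empty string is treated as valid because it contains no invalid characters.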
Once you have finished writing these functions, in IDLE,
choose Run -> Run Module. In the shell, test your function by running
some example function calls.
def is_valid_sequence(dna):
    ''' (str) -> bool
    Return True if and only if the DNA sequence dna is valid, that is, it
    contains no characters other than 'A', 'T', 'C' and 'G'.
    >>> is_valid_sequence('ATCG')
    True
    >>> is_valid_sequence('LTPROCKS')
    False
    '''
    for n in dna:
        # Any character outside the four nucleotides makes the sequence invalid.
        if n not in 'ATCG':
            return False
    # Only reached once every character has been checked.
    return True

def insert_sequence(dna1, dna2, index):
    ''' (str, str, int) -> str
    Return the DNA sequence obtained by inserting the second DNA sequence
    into the first DNA sequence at the given index.
    >>> insert_sequence('CCGG', 'AT', 2)
    'CCATGG'
    '''
    return dna1[:index] + dna2 + dna1[index:]
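The handout's question about extremes for insert_sequence can be explored with a few calls. The sketch below repeats the slicing-based function so that it runs on its own; the chosen cases (index 0, index equal to the length, and empty sequences) are one possible reading of "the extremes".

```python
def insert_sequence(dna1, dna2, index):
    # Everything before index, then the inserted sequence, then the rest.
    return dna1[:index] + dna2 + dna1[index:]


print(insert_sequence('CCGG', 'AT', 0))  # ATCCGG (insert at the very start)
print(insert_sequence('CCGG', 'AT', 4))  # CCGGAT (insert at the very end)
print(insert_sequence('CCGG', '', 2))    # CCGG (nothing to insert)
print(insert_sequence('', 'AT', 0))      # AT (insert into an empty sequence)
```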
Step 4: Write functions get_complement and get_complementary_sequence.
There is no starter code for these functions. Use the design
recipe to complete them. We have given you some suggestions for
examples to try, but you should come up with more on your own based on
the descriptions.
For these functions, you may use built-in functions and str operations (for example: in, +, indexing). Do not use str methods.
| Function name: (Parameter types) -> Return type | Description |
|---|---|
| get_complement: (str) -> str | The parameter is a nucleotide ('A', 'T', 'C' or 'G'). Return the nucleotide's complement. We have intentionally not given you any examples for this function. The Problem Domain section explains what a nucleotide is and what a complement is. |
| get_complementary_sequence: (str) -> str | The parameter is a DNA sequence. Return the DNA sequence that is complementary to the given DNA sequence. For example, if you call this function with 'AT' as the argument, it should return 'TA'. |
Once you have finished writing these functions, in IDLE,
choose Run -> Run Module. In the shell, test your function by running
some example function calls.
def get_complement(n):
    '''(str) -> str
    Return the complement of the nucleotide n.
    >>> get_complement('A')
    'T'
    >>> get_complement('C')
    'G'
    '''
    if n == 'A':
        return 'T'
    elif n == 'T':
        return 'A'
    elif n == 'C':
        return 'G'
    elif n == 'G':
        return 'C'

def get_complementary_sequence(dna):
    '''(str) -> str
    Return the DNA sequence that is complementary to the DNA sequence dna.
    >>> get_complementary_sequence('ATCGGACT')
    'TAGCCTGA'
    >>> get_complementary_sequence('GCACTCC')
    'CGTGAGG'
    '''
    complementary_seq = ''
    # Build the result one nucleotide at a time.
    for n in dna:
        complementary_seq += get_complement(n)
    return complementary_seq
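One way to gain confidence in these two functions is to check them against the example from the Problem Domain section and to confirm that taking the complement twice gives back the original sequence. The sketch below repeats the functions so that it runs on its own; the round-trip check is a suggestion, not something the handout requires.

```python
def get_complement(n):
    if n == 'A':
        return 'T'
    elif n == 'T':
        return 'A'
    elif n == 'C':
        return 'G'
    elif n == 'G':
        return 'C'


def get_complementary_sequence(dna):
    complementary_seq = ''
    for n in dna:
        complementary_seq += get_complement(n)
    return complementary_seq


original = 'ACGTACG'  # example sequence from the Problem Domain section
complement = get_complementary_sequence(original)
print(complement)                                          # TGCATGC
print(get_complementary_sequence(complement) == original)  # True
print(get_complementary_sequence('') == '')                # True
```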