Details
Due date
This assignment is due on Tuesday, May 28th at 11:30PM.
What to submit
You will be submitting 4 files for this assignment, each of which has a starting template provided in mp7_starter.zip:
- MP7A.java (Part A)
- DNA.java (Part B and C.2)
- mRNA.java (Part C.1)
- DNAClient.java (Part C.3)
- collaboration.txt (Required collaboration component for Parts B-C if you collaborate with a partner)
You will also see the following files included in the starter code, which you do not need to modify:
- DNA.py - Provided DNA class in Python you will port to DNA.java
- CodonMapper.java (helper class for Part C.1)
- pikachurin_house_mouse_seq.txt - example DNA sequence for the "Pikachurin" protein, using the house mouse reference genome from NCBI gene database (feel free to look for other protein sequences!).
- pikachurin_house_mouse_polypeptide.txt - expected output polypeptide chain for Part C.
Testing Your Code
We encourage you to test your code iteratively, using a client program (with a main
method) to test calls to your Part B and C code, using the examples given.
You can test your Part B (DNA.java
) using this PartBTests.java
(just compile with javac
and run with java
as you
would with another Java program; it will test your DNA.java
Part B solutions and output the results). The examples in the spec are also copied into a client program (which is not a
full-test suite) you can add to (DNAStudentTester.java
).
For Part C, you can test your solutions (iteratively!) using this PartCStudentTester.java
program. This program is also provided for
you to use as a debugging tool, but provides the testing suite we'll be using when grading. You are encouraged to read through the comments if you run into an error, especially with translation
examples.
This may look like a lot, but the overall amount of work is comparable to other MP's. Remember that with OOP, we often work with more than one class (and a client program), which is what you'll practice more with in Parts B-C.
MP 7: Optional Collaboration Component
In this Mini Project, we are allowing students to collaborate with one other student on
Parts B-C only. As a porting exercise, this is a good opportunity to
work through the exercises with another student, utilizing conversation, brainstorming,
and peer-programming to piece things together. If you choose to collaborate with another student
on Parts B-C, you must submit a filled collaboration.txt
file with the details
of your experience and distribution of work. Collaboration does not mean one student
solves half or all of the assignment. Collaboration does mean spending at least
1-2 hours working through the code either in-person or asynchronously (Discord or Zoom VC), with
the option to communicate via text/Discord/email clarifications in addition to the
in-person/video collaboration. If you have any questions about this, don't hesitate to ask!
Students who would be interested in collaborating with another student can utilize the
#mp7-partner-search
Discord channel, and you are encouraged to also find a partner
in your Tuesday lab.
If you collaborate with another student, you should still both submit everything to CodePost individually
(again, your MP7A.java
should be your own), in addition to including both names in any Part B-C files you
work on together, and a completed collaboration.txt
, which both students should
fill out separately. Note that if we find non-trivial matching solutions for students who have not submitted this file, Caltech Honor Code violations will be escalated to the BoC.
For Part A (no collaboration), you will be practicing basic Java expressions and writing your first Java method.
This is a client program run with a main
method (similar to code run
in an if __name__ == '__main__'
statement in Python). This will be a slightly different
structure than the DNA class (which does not have a main
method and thus is not executable) you will finish implementing in Part B.
Finish each TODO
for Parts A.1.-A.4. in the given MP7A.java
as specified.
Some more details are provided in comments to help you get started and test your answers.
For full credit, this finished program should not have any errors when compiled.
Part A: Intro to Java
Alright, let's dive into Java now! First, we'll warm up with some expression practice, identifying some differences and similarities between Python and Java. This assignment will introduce you to the Java language, connecting other problems you've already worked through in Python. We will provide relevant details and pitfalls in this spec and provided code comments, but make sure to also refer to any lecture material on Java as needed. You are not expected to use anything we have not shown you in lectures, readings, or this assignment (you may always ask if unsure).
You will want to make sure you have Java installed if you don't already. You can find instructions for setting it up here.
To get a feel for running your first Java program, we encourage you to first make sure you can run this program and get the following output:
3 foo
For Part A, you can either run Java:
- in VSCode (Play Icon > Run Java)
- in the VSCode bash terminal (where you usually run python3):
$ javac MP7A.java
$ java MP7A
3 foo
- replit.com
These are just suggestions available for students first learning Java. If you can get Java running quickly in VSCode using the provided
MP7A.java
, this
is what we most strongly recommend. If you happen to have Java
experience before CS 1, you can choose whatever editor you've used before. For reference, you'll usually use a Java-specific IDE like IntelliJ or Eclipse for larger Java projects.
But
these tend to be very feature-heavy, and we recommend using VSCode to edit a .java
program and VSCode or bash to compile/run it.
This is to help you just
focus on running Java in an environment you're already familiar with. If you take CS 2,
you'll likely learn how to use IntelliJ for Java programming.
Because Java isn't usually run with an interpreter
like Python, you won't actually see anything printed unless you have
an explicit System.out.println(<expression>);
call in the main
function (that is, after you fix any compiling errors). You can find an
example in the
main
function in MP7A.java
which you can use to
test different expressions/function calls.
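As a small illustration of the point above, here is a sketch of a standalone program (PrintDemo and square are made-up names, not part of the starter code). A method call on its own line computes a value but shows nothing; only the call wrapped in System.out.println produces output.

```java
// Hypothetical example program; not part of mp7_starter.zip.
class PrintDemo {
    // Returns n multiplied by itself.
    public static int square(int n) {
        return n * n;
    }

    public static void main(String[] args) {
        square(4);                     // computed, but nothing is printed
        System.out.println(square(4)); // prints 16
    }
}
```

If you save this as PrintDemo.java, you can compile it with javac PrintDemo.java and run it with java PrintDemo, just as you would MP7A.java.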
1. Expressions
[20]
Alright, with all that said, let's dive into Java!
Many of these expressions come directly from HW1 Part A!
But the answers will not all be the same as in Python.
Similar to HW1, you'll write your answers in Java comments unless
otherwise specified. The provided MP7A.java
program clearly organizes
each exercise in Part A for you, and provides some testing code to refer to in the main
method.
Complete the TODO
in MP7A.java
with your answers for each of the following expressions (in Java) taken from HW1.
-
8 - 5
-
6 * 2.5
-
51 / 2
-
51 / -2
-
51 % 2
-
51 % -2
-
-51 % 2
-
51 / -2.0
-
1 + 4 * 5
-
(1 + 4) * 5
2. String Expressions
[15]
Next, you'll write answers for each of the following String
expressions in MP7A.java
. For each of the following expressions, provide:
- The value of the assigned result variable (if no error). Indicate any String values with ".
- If the code results in an error, clearly identify what the error(s) are.
- 1-2 sentences identifying the similarities or differences between the Python equivalent (ignoring type declarations like String or int). We want you to think about how Java and Python evaluate expressions differently (or similarly) when it comes to evaluating (not assigning) types and values.
Don't forget to review your HW1 work if needed! Two examples are given in the starter code to show what we expect for each answer.
1.
String result1 = "Lorem" + 'Ipsum';
2.
String result2 = "Lorem" + "Ipsum";
3.
String a = "Lorem" String b = "Ipsum" String a += b;
4.
String month = "April"; int days = 30; String result = days + " days hath " + month; System.out.println(result); // Make sure to test this one!
3. String Generation
[20]
Recall the string generation exercise from HW1 B.8:
Write a single Python expression which will generate a string of 70
'-'
characters. The expression itself shouldn't be longer than 10 characters. Write the answer inside a Python comment.
Unfortunately, we don't get such nice "syntactic sugar" in Java. Write your
answers for the following in the respective portion of MP7A.java
.
- What does '-' * 70 evaluate to in Java? Is this what you expected?
- What does "-" * 70 evaluate to in Java? Is this what you expected?
Alright, so it doesn't work as we'd expect in Java... But it is a good loop exercise!
For Part A.3.c., you'll finish implementing a Java method called stringMultiplier(String s, int n)
which generates a String
of n
occurrences of s
.
This is the first Java method you'll be writing in this assignment. We have provided you
the method stub and an example "javadoc" comment which is similar to a
Python docstring. The differences are noted in the starter code. You do not need to change the javadoc
and you should not change the method header, but you should replace the method body
with your implementation to match the expected behavior. There are a few example calls
to this method in the main
method at the top of the program which you can use to
test the result.
The call stringMultiplier("-", 2)
should return the String
"--"
. The call stringMultiplier("-", 70)
should generate the same string
as Python's '-' * 70
evaluation. Note that this method takes a String
,
so it should work with a String
of any length.
Our solution is 5 lines in the method body, 3 of which define a
for
loop containing a single statement in the body.
Part B: Porting Classes in Java
Introduction
Throughout the remaining 2 Parts of this MP, you will finish Java classes to implement a full-featured (albeit simplified) DNA-to-mRNA-to-protein transcription program!
One of the best ways to learn a new language is through "porting" a program
from a language you know to one you're learning.
In Part B, you will take a provided Python class called DNA
with a few simple methods,
and then port that into a Java class with the same name and methodology.
Throughout the exercises, you will refer to some provided code for examples, and some methods will need to be used in the methods you write.
Documentation Requirements
Part A's B.4.c exercise gave you practice writing a Javadoc comment for a method, given a template. We have given you some Javadoc completed in these exercises, but any that are missing you are expected to complete with valid Javadoc format (including placement above the method header, not within).
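As a rough illustration of the Javadoc format described above, here is a sketch on a hypothetical method (JavadocDemo and countVowels are made-up names and are not part of the starter code). The comment sits directly above the method header and uses the @param, @return, and @throws tags.

```java
// Hypothetical example class; not part of mp7_starter.zip.
class JavadocDemo {
    /**
     * Counts the vowels (a, e, i, o, u) in the given word, ignoring letter-casing.
     *
     * @param word the word to inspect
     * @return the number of vowel characters in word
     * @throws IllegalArgumentException if word is empty
     */
    public static int countVowels(String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("Word must not be empty.");
        }
        int count = 0;
        String lower = word.toLowerCase();
        for (int i = 0; i < lower.length(); i++) {
            char ch = lower.charAt(i);
            if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Caltech")); // prints 2
    }
}
```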
Provided Python Class: DNA.py
As you work through Part B, you will refer to a provided implementation in DNA.py
,
which defines a DNA
class with the same functionality as the Java version you'll
implement in DNA.java
.
This is an object-oriented class that extends on some of the DNA functions you wrote in HW1.
A summary of the provided DNA
class methods is provided below:
a. __init__
The __init__
constructor takes a DNA sequence string comprised of
nucleotide bases 'A'
, 'T'
, 'C'
, and 'G'
and creates a new DNA
class with that sequence,
converting any lower-case bases to upper-case.
If given a string with any characters that are not a valid nucleotide base (ignoring letter-casing), the method raises
a ValueError
with the error message Invalid DNA sequence. Must only contain ATCG bases.
Otherwise, the upper-cased sequence string is saved as a single class attribute called seq
.
Some examples are provided below:
>>> from DNA import DNA
>>> dna_seq1 = DNA('ATCGatcg')
>>> dna_seq1.seq
'ATCGATCG'
>>> dna_seq2 = DNA('a')
>>> dna_seq2.seq
'A'
>>> invalid_dna_seq = DNA('catdog')
Traceback omitted
ValueError: Invalid DNA sequence. Must only contain ATCG bases.
>>> invalid_dna_seq # assignment unsuccessful with ValueError
Traceback omitted
NameError: name 'invalid_dna_seq' is not defined
>>> dna_seq3 = DNA('AaAa')
>>> dna_seq3.seq
'AAAA'
>>> empty_dna = DNA('')
>>> empty_dna.seq
''
b. __str__
and __len__
These two methods provide the string representation of a DNA
object
and its length.
Remember that __len__
is another special method (both __str__
and __len__
are
commonly known as "dunder" or "magic" methods) that defines the length
of an object when passed to len(...)
. A class defines a __len__
method
with the same syntax as __str__
but returns an int
instead of a str
.
Some examples are provided below:
... dna_seq variables defined above
>>> str(dna_seq1)
'ATCGATCG'
>>> str(empty_dna)
''
>>> print(dna_seq1)
ATCGATCG
>>> print(empty_dna)

>>> len(dna_seq1)
8
>>> len(dna_seq2)
1
>>> len(dna_seq3)
4
>>> len(empty_dna)
0
c. complement
This method returns the complement sequence (as a str
), which replaces
all nucleotide bases with their complement base, maintaining original ordering.
>>> comp1 = dna_seq1.complement()
>>> comp1 # a str
'TAGCTAGC'
>>> comp2 = dna_seq2.complement() # 'T'
>>> comp3 = dna_seq3.complement() # 'TTTT'
>>> empty_comp = empty_dna.complement() # ''
>>> dna_seq5 = DNA('ACCAGTGTAG')
>>> comp5 = dna_seq5.complement() # 'TGGTCACATC'
>>> double_comp_seq = DNA(comp5)
>>> str(double_comp_seq) # uses __str__()
'TGGTCACATC'
>>> double_comp = double_comp_seq.complement() # 'ACCAGTGTAG', back to dna_seq5 sequence
d. count_occurrences
count_occurrences
takes a base single-character string
and returns the number of times that character occurs in self.seq
(ignoring letter-casing).
If the given base character is not a valid nucleotide base (ignoring letter-casing),
the method raises
a ValueError
with the message Invalid base.
>>> a1_count = dna_seq1.count_occurrences('a') # 2
>>> t1_count = dna_seq1.count_occurrences('T') # 2
>>> a2_count = dna_seq2.count_occurrences('A') # 1
>>> a3_count = dna_seq3.count_occurrences('A') # 4
>>> empty_a_count = empty_dna.count_occurrences('A') # 0
>>> d1_count = dna_seq1.count_occurrences('d')
Traceback omitted
ValueError: Invalid base.
>>> empty_d_count = empty_dna.count_occurrences('d')
Traceback omitted
ValueError: Invalid base.
>>> c5_count = dna_seq5.count_occurrences('C') # 2
>>> g5_count = dna_seq5.count_occurrences('G') # 3
>>> gc5_count = c5_count + g5_count # 5
>>> gc_content = gc5_count / len(dna_seq5) # 5 / 10 -> 0.5 (HW1!)
e. percentage_of
This method takes a single-character base string and returns the percentage of that base
contained in the DNA sequence (a float between 0.0 and 1.0). A ValueError
with the same error message described
in the previous problem is raised if given an invalid base. Observe that
the count_occurrences
method
is used to reduce redundancy, and the error-handling is left for that method to handle
(it is poor practice to handle it the same way twice). Observe that no try/except
is
used in this class, since its methods only support raising possible errors to a client who passes invalid arguments.
>>> t1_count = dna_seq1.count_occurrences('T') # 2
>>> t1_percent = dna_seq1.percentage_of('T') # 0.25
>>> t1_percent = dna_seq1.percentage_of('t') # 0.25
>>> a2_count = dna_seq2.count_occurrences('A') # 1
>>> a2_percent = dna_seq2.percentage_of('A') # 1.0
>>> a3_count = dna_seq3.count_occurrences('A') # 4
>>> a3_percent = dna_seq3.percentage_of('A') # 1.0
>>> d1_percent = dna_seq1.percentage_of('d')
Traceback omitted
ValueError: Invalid base.
>>> empty_g_percent = empty_dna.percentage_of('G') # 0.0
>>> empty_d_percent = empty_dna.percentage_of('d')
Traceback omitted
ValueError: Invalid base.
Java Class: DNA.java
[50]
Now, you'll get practice writing a simple Java class called DNA
with the same
functionality as the Python class introduced above, but using the Java programming language.
As you'll see, Java is much stricter than Python, and it may seem unnecessarily picky/verbose.
But there is a reason that it is such a popular language, especially when speed, correctness, and security are critical.
In these ways, Java can be a much better choice than Python.
In this porting exercise, you'll see references to characters and strings, which won't seem too confusing
at first until you run into errors using one instead of the other (don't worry, El ran into a fair share of these
after programming in Python for some time; this is good practice to differentiate between programming concepts and
language-specific rules :)). In Python, we have just been referring to a general character as a single-character string (e.g.
In Java, there is a character type called
Java's |
Java Programs vs. Java Classes
You'll also note a difference in the structure of an executable Java program (defined with a
Other than syntax, the difference between a client program and a program defining classes
is not something new. For example, in MP4,
For now, you can think of
In a Java program that defines a class like
In short, the biggest differences you'll see between
|
The DNA
Java class should include all of the following methods, analogous to each
of the methods in DNA.py
. We have provided the baseComplement
method for you, as a private
helper method (private
indicates
that it should only be used by methods in the DNA
class, as opposed
to the public
methods you write that clients have access to).
For Parts B-C, you are expected to write/run your Java code with VSCode (and/or the terminal) instead of any online REPL/compiler to 1) not lose unsaved work and 2) practice compiling and running a Java program locally.
a. DNA
Constructor
[15]
To get you started with your first Java class, we have provided the empty
constructor (and other method stubs) for you.
Complete the DNA
constructor to have behavior analogous to the constructor in DNA.py
.
The implementation will
look different (Java-specific syntax, and differences in constructor syntax),
but should still fundamentally create the same state for a DNA object.
Remember that whereas Python class constructors are defined with __init__(self, ...)
,
Java class constructors are defined with the class name followed by any arguments (e.g. public DNA(String seq)
). Remember that Java
does not have a self
argument for its methods. Whenever you want to refer to a Java object's state (fields or methods),
you can refer to them with this.name
syntax (replacing name
with
a defined field or method name in the class).
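To make the constructor and this.name syntax above concrete, here is a minimal sketch using a made-up Counter class and a separate CounterClient program (both names are hypothetical and unrelated to the assignment files). Note that the class itself has no main method; only the client does.

```java
// Hypothetical class for illustration only; it is not one of the assignment files.
class Counter {
    private int count; // a field, similar to a self.count attribute in Python

    // Constructor: named after the class, with no return type and no self parameter.
    public Counter(int start) {
        this.count = start;
    }

    // Adds one to this Counter's count.
    public void increment() {
        this.count += 1;
    }

    // Returns the current count.
    public int getCount() {
        return this.count;
    }
}

// Hypothetical client program that constructs and uses a Counter.
class CounterClient {
    public static void main(String[] args) {
        Counter c = new Counter(5); // "new" is required to construct an object in Java
        c.increment();
        System.out.println(c.getCount()); // prints 6
    }
}
```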
The DNA constructor in DNA.java
should take a DNA sequence String
comprised of
nucleotide bases 'A'
, 'T'
, 'C'
, and 'G'
.
Just like the constructor in DNA.py
, this constructor should also support lower-cased base characters,
converting any such bases to an upper-cased base character (hint: use the s.toUpperCase()
method supported for a String s
).
If given a string with any characters that are not a valid nucleotide base (ignoring letter-casing), throw
a new IllegalArgumentException
with the error message Invalid DNA sequence. Must only contain ATCG bases.
Observe the differences between the raise ValueError(...)
and throw new IllegalArgumentException(...)
statements in Python vs. Java.
Both raise an exception that a client program is expected to handle, but Java uses the throw new <ExceptionType>(<args>)
instead of
Python's raise <ExceptionType>(<args>)
.
This is all you really need to know for CS 1, but you should refer to the provided
baseComplement
method for an example of throwing this exception (your constructor's error message
should match the one described here though). Just like in DNA.py
, you should not use any try/catch
in DNA.java
(or mRNA.java)
. Any Javadoc comments should specify the
raised ("thrown") error with the @throws
annotation to indicate to the client that they are expected
to handle any thrown error in their program.
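The raise vs. throw comparison above can be sketched as follows, using a hypothetical toFraction method (ThrowDemo and its error message are made up for illustration and are not part of the starter code). The method itself only throws; the try/catch lives in the client's main method, which is where a thrown exception is expected to be handled.

```java
// Hypothetical example; the class and method names are made up for illustration.
class ThrowDemo {
    /**
     * Converts a percentage (0 to 100) to a fraction between 0.0 and 1.0.
     *
     * @param percent the percentage to convert
     * @return percent divided by 100
     * @throws IllegalArgumentException if percent is outside 0 to 100
     */
    public static double toFraction(int percent) {
        if (percent < 0 || percent > 100) {
            // Python equivalent: raise ValueError("Invalid percentage.")
            throw new IllegalArgumentException("Invalid percentage.");
        }
        return percent / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(toFraction(25)); // prints 0.25
        // A client program may catch the exception, similar to try/except in Python.
        try {
            toFraction(150);
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage()); // prints Caught: Invalid percentage.
        }
    }
}
```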
In the Python implementation, you'll see a loop to check each character; in Java,
you'll need to use char ch = s.charAt(index)
to get a char
in some String s
at the passed index
(an int
). We use ch1 == ch2
(or ch1 != ch2) in Java to
test whether two char
s are equal (or not equal), which we cannot do the same way with the String
type in Java.
We have provided an example loop you can reference below:
int spaces = 0;
String hello = "Hello CS1 Students!";
for (int i = 0; i < hello.length(); i++) {
    char ch = hello.charAt(i);
    if (ch == ' ') { // remember to use ' quotes for chars in Java!
        spaces += 1; // spaces++ also does the same thing in Java
    }
}
System.out.println("There were " + spaces + " spaces found.");
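The note above about comparing char values versus String values can be sketched like this (CompareDemo and its methods are made-up names). For Strings, == checks whether two variables refer to the same object rather than whether their contents match, so contents are compared with the equals method instead.

```java
// Hypothetical example class; not part of the starter code.
class CompareDemo {
    // Compares two chars with ==, which compares their values directly.
    public static boolean sameChar(char a, char b) {
        return a == b;
    }

    // Compares two Strings with equals, which compares their contents.
    public static boolean sameString(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String word = "Hello";
        System.out.println(sameChar(word.charAt(0), 'H'));           // prints true
        System.out.println(sameChar(word.charAt(0), 'h'));           // prints false
        System.out.println(sameString(word.toUpperCase(), "HELLO")); // prints true
    }
}
```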
After checking for a valid String
argument, the single seq
field should be set to the upper-cased sequence String
.
Remember that this.val = new_val;
is the syntax in Java to update some field val
defined in the Java class to a given value (e.g.
this.seq = seq;
).
For reference, our solution is about 8-10 non-trivial lines (ignoring comments/blank lines).
Some examples of the expected behavior for your constructor are provided below, using a DNAStudentTester.java
program, which is simply a client program that has example calls for your methods (a different program than
the DNAClient.java
you'll finish to implement a full DNA to protein transcription) or the PartBTester.java
which is a completed testing suite you'll test your solutions
with after implementing them.
DNA dnaSeq1 = new DNA("ATCGatcg"); // No error
System.out.println(dnaSeq1.seq); // Compiler error in Java

// A compiler error will occur in a DNATester.java program
// since seq is a privately-declared field
// (different than Python self attributes)
DNATester.java:line_num: error: seq has private access in DNA
System.out.println(dnaSeq1.seq);
                          ^
DNA dnaSeq2 = new DNA("a"); // No error
DNA dnaSeq3 = new DNA("AaAa"); // No error
DNA emptySeq = new DNA(""); // No error
DNA dnaSeq4 = new DNA("ACCAGTGTAG"); // No error
DNA invalidSeq = new DNA("catdog"); // Runtime error

// A runtime error will occur in a DNATester.java program
Exception in thread "main" java.lang.IllegalArgumentException: Invalid DNA sequence. Must only contain ATCG bases.
at DNA.<init>(DNA.java:line_number)
at DNATester.main(DNATester.java:line_number)
b. toString
and size
[5]
Next, you'll implement the two methods for the string representation of a DNA
object
and its length. Similar to how __str__
and __len__
are special Python methods
to define the default string representation and length of the object, toString
and size
are
analogous in Java.
Both methods should have the same behavior as __str__
and __len__
in DNA.py
and neither
takes any arguments. Each
method body is also a 1-line solution.
String seq1 = dnaSeq1.toString(); // "ATCGATCG"
String emptySeqStr = emptySeq.toString(); // ""
System.out.println(dnaSeq1); // ATCGATCG
System.out.println(emptySeq); //
int seq1Len = dnaSeq1.size(); // 8
int seq2Len = dnaSeq2.size(); // 1
int seq3Len = dnaSeq3.size(); // 4
int seq4Len = emptySeq.size(); // 0
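To show why System.out.println(dnaSeq1) works without calling toString() explicitly, here is a sketch with a made-up Label class and LabelClient program (neither is part of the assignment). Java calls an object's toString method automatically when the object is printed or concatenated with a String.

```java
// Hypothetical class for illustration; not one of the assignment files.
class Label {
    private String text;

    public Label(String text) {
        this.text = text;
    }

    // Analogous to __str__ in Python: println and String concatenation call this.
    public String toString() {
        return "Label(" + this.text + ")";
    }
}

// Hypothetical client program that prints a Label.
class LabelClient {
    public static void main(String[] args) {
        Label label = new Label("CS 1");
        System.out.println(label);           // prints Label(CS 1)
        System.out.println("Got: " + label); // prints Got: Label(CS 1)
    }
}
```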
c. complement
[10]
Next, you'll implement the Java equivalent of the complement
method defined in DNA.py
.
This method should have the same behavior, just in an analogous Java method.
You'll note that the DNA.py
class could use a nucleotide basepair dictionary
to more cleanly implement base_complement
,
but unfortunately you won't find this to be as easy in Java (and for the purposes of an exercise in
clean porting, we've implemented it without a dictionary). You are welcome to
ask El if you'd like to learn about dictionary equivalents in Java, but we encourage you to focus on the requirements of this assignment first :).
Use the provided baseComplement
method to build the complement String
.
For reference, our solution is 5-6 non-trivial lines in the method body.
// DNA sequence variables defined above
String comp1 = dnaSeq1.complement(); // "TAGCTAGC"
String comp2 = dnaSeq2.complement(); // "T"
String comp3 = dnaSeq3.complement(); // "TTTT"
String emptyComp = emptySeq.complement(); // ""
DNA dnaSeq4 = new DNA("ACCAGTGTAG");
String comp4 = dnaSeq4.complement(); // "TGGTCACATC"
DNA doubleCompSeq = new DNA(comp4);
System.out.println(doubleCompSeq); // uses toString()
// TGGTCACATC
String doubleComp = doubleCompSeq.complement(); // "ACCAGTGTAG", back to dnaSeq4
d. countOccurrences
[10]
Next, you'll write the Java equivalent to the count_occurrences
method in DNA.py
.
Using camelCasing
conventions in Java, write a method countOccurrences
which takes a base as a char
(not a String
) and returns the number of times that character occurs in this.seq
(ignoring letter-casing).
If the given base character is not a valid nucleotide base (ignoring letter-casing), the method
should throw an IllegalArgumentException
with the message Invalid base.
Don't forget to refer to the provided
exception-handling in the provided baseComplement
for reference.
// DNA sequence variables defined above
int a1Count = dnaSeq1.countOccurrences('a'); // 2
int t1Count = dnaSeq1.countOccurrences('T'); // 2
int a2Count = dnaSeq2.countOccurrences('A'); // 1
int a3Count = dnaSeq3.countOccurrences('A'); // 4
int a4Count = emptySeq.countOccurrences('A'); // 0
int d1Count = dnaSeq1.countOccurrences('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted
int c4Count = dnaSeq4.countOccurrences('C'); // 2
int g4Count = dnaSeq4.countOccurrences('G'); // 3
int gc4Count = c4Count + g4Count; // 5
double gcContent = gc4Count * 1.0 / dnaSeq4.size(); // 5 / 10 -> 0.5 (HW1!)
e. percentageOf
[10]
Finally, you'll write a method percentageOf
which takes a single-character base char
(not a String
) and returns the percentage
(as a double
, which is like a float
with higher precision, and more familiar syntax for our purposes) of that base
contained in the DNA sequence. An IllegalArgumentException
with the same error message described
in the previous problem should be thrown if given an invalid base. Otherwise, if the DNA
has an empty sequence, the method should return 0.0
as the percentage, similar to DNA.py
's implementation.
Use your countOccurrences
method
to reduce redundancy (you can use this.methodName
to call another method in a Java class instead of Python's self.method_name
syntax).
Note that countOccurrences
should handle the validation for you, so make sure you
avoid redundancy! Your Javadoc comment should still include the @throws
since percentageOf
calls countOccurrences
(but the client
reading the documentation shouldn't need to know that; they should just know about the exception that could be thrown).
For reference, our solution is about 4 non-trivial lines in the method body.
Some example calls are provided below:
int t1Count = dnaSeq1.countOccurrences('T'); // 2
double t1Percent = dnaSeq1.percentageOf('T'); // 0.25
// can't reassign types for an already-defined variable
t1Percent = dnaSeq1.percentageOf('t'); // 0.25
int a2Count = dnaSeq2.countOccurrences('A'); // 1
double a2Percent = dnaSeq2.percentageOf('A'); // 1.0
int a3Count = dnaSeq3.countOccurrences('A'); // 4
double a3Percent = dnaSeq3.percentageOf('A'); // 1.0
int emptyACount = emptySeq.countOccurrences('A'); // 0
double emptyAPercent = emptySeq.percentageOf('A'); // 0.0
double d1Percent = dnaSeq1.percentageOf('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted
double d2Percent = emptySeq.percentageOf('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted
Part C: DNA -> mRNA -> Protein Transcription
Overview
Now that you have a DNA
class implemented, you'll finish a second class
mRNA
which supports a sequence of mRNA
nucleotides.
You will implement the following methods in the mRNA
class:
- mRNA(seq): Constructor
- toString() and size()
- toPolypeptide(): Method to convert an mRNA sequence to a String polypeptide chain
Just like DNA.java
, mRNA.java
will define a single class with
a constructor and methods, but will not have the main
method you'll see
in client programs like DNAClient.java
(that program will use your DNA
and mRNA
classes).
C.1: mRNA.java
We have provided minimal starter code for everything except the
toPolypeptide
method; you are expected
to take what you've learned, use DNA.java
as a reference,
and implement this class to meet the specified behavior, as well as finish
the documentation for the class using valid Javadoc.
C.1.1. mRNA
Constructor
[5]
The mRNA
constructor has behavior analogous to the constructor in DNA.java
,
but saves a sequence of RNA nucleotides instead (the field
should still be called seq
, and no other fields should be defined); you will
need to declare the private
field above the constructor, just as you see in DNA.java
.
Similar to DNA.java
, an IllegalArgumentException
should
be thrown if given an invalid sequence (remember that mRNA
is comprised
of 'AUCG' not 'ATCG'). This solution should have the same number of lines as your DNA
constructor.
C.1.2. toString
and size
[5]
These methods should behave similarly to the DNA
analogs, only for mRNA
.
When documenting, make sure your Javadoc is updated accordingly (no reference to DNA
should be in mRNA
documentation).
C.1.3. toPolypeptide
[15]
In this method, you will implement the final step of DNA to polypeptide translation, which translates an mRNA sequence to a polypeptide chain. Since this is one of the longer methods in this assignment, we have included a series of TODOs in the starter code. If you've been reviewing the materials, working through this method shouldn't take more than 10-15 minutes. As always, make sure to remove the TODO comments in your final submission.
Finish mRNA
's toPolypeptide
method
to process the mRNA
's sequence,
codon-by-codon. A codon is a special 3-character sequence
which maps to an amino acid. Note that since there are 4 bases in mRNA
,
there are 64 possible codons, 1 of which is a special "start" codon ("AUG"
,
which codes for Met
, or "Methionine"), and 3 of which
are stop codons (these do not correspond to an amino acid).
To help solve this method, we've provided a CodonMapper
class,
which is defined in CodonMapper.java
.
We've started the method with a mapper
constructed for you to work with. This
class provides functionality to return an amino acid given a codon string.
Any codon strings passed to its methods are converted to upper-case to support case-insensitivity.
You should not change CodonMapper.java
, but here is a summary of its methods:
- CodonMapper mapper = new CodonMapper(): constructs a new CodonMapper called mapper
- mapper.getAA(codon): given a 3-character codon String, returns the abbreviated amino acid name (e.g. 'Met' for 'Methionine'). Note that if one of the three stop codons is passed, the string "Stp" is returned.
- mapper.isStopCodon(codon): a convenience method that returns true iff the passed codon is a stop codon.
You'll see an example use of this class when getting the first amino acid for the start codon (it is commented out, but you can uncomment it as you work through, and use it for reference when getting the other codon amino acids).
Note: Be careful with your indexing here; the result chain should ignore everything until
the first start codon, and stop at either the first stop codon, or the last possible codon if none found.
This means that your polypeptide will have window
/ 3 codons, where window
is the length of the reading window, starting with the start codon and ending with the stop codon or last 3-character
codon in the sequence. The rest of the sequence is ignored in the result. If no stop codon is found,
the sequence should only contain the rest of the codons that are possible; so if there are 1-2 characters left
after the last valid codon, those would be ignored.
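The window / 3 arithmetic above, for the case where no stop codon is found, can be sketched with a small helper (WindowDemo and completeCodons are made-up names, not part of the starter code or of CodonMapper). Because both operands are ints, the division drops any 1-2 leftover characters.

```java
// Hypothetical example class; it only illustrates the indexing arithmetic.
class WindowDemo {
    /**
     * Returns how many complete 3-character codons fit in a sequence of the
     * given length when reading begins at startIndex and runs to the end.
     *
     * @param seqLength the total number of characters in the sequence
     * @param startIndex the index of the first character of the start codon
     * @return the number of complete codons in the reading window
     */
    public static int completeCodons(int seqLength, int startIndex) {
        int window = seqLength - startIndex; // characters from the start codon onward
        return window / 3;                   // int division ignores 1-2 leftover characters
    }

    public static void main(String[] args) {
        // 11 characters, start codon at index 2: 9 characters remain, so 3 codons
        System.out.println(completeCodons(11, 2)); // prints 3
        // 12 characters, start codon at index 2: 10 characters remain, so 3 codons with 1 left over
        System.out.println(completeCodons(12, 2)); // prints 3
    }
}
```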
If you've correctly implemented this method, you're ready to take an mRNA
sequence
and get the result chain of amino acids!
For reference, our solution is about 10-14 non-trivial lines (ignoring comments/blank lines) after the starter code.
C.2. Using Your mRNA
to Finish DNA
[50]
Finally, you'll use your completed mRNA
class to add two methods to DNA
to perform transcription and translation. Luckily, you have most of the
work already done!
C.2.1. transcribe
[5]
This method should return a new mRNA
constructed with
the transcribed DNA
sequence. Remember that DNA is transcribed to mRNA
by first getting the complement DNA strand, and then replacing all 'T'
bases with 'U'
.
For full credit, use your DNA's complement method, and the s.replace(oldCh, newCh) String method, which returns a new String replacing all instances of oldCh with newCh in some String s (remember to use the correct quotes for chars in Java!).
The method should return the resulting transcribed mRNA without changing the state of the DNA object the method is called on.
For reference, our solution is 2-3 lines in the method body.
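To see the two steps concretely, here is a minimal standalone sketch that works on plain Strings rather than your DNA and mRNA classes. The complement helper below is a hypothetical stand-in (it assumes a base-by-base complement with no reversal); in your solution you should call your own DNA complement method instead.

```java
class TranscribeSketch {
    // Hypothetical stand-in for your DNA class's complement method.
    static String complement(String seq) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < seq.length(); i++) {
            char base = seq.charAt(i);
            switch (base) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append(base); // leave unknown characters unchanged
            }
        }
        return sb.toString();
    }

    // Complement first, then swap every 'T' for 'U' (note the single quotes for chars).
    static String transcribe(String dna) {
        return complement(dna).replace('T', 'U');
    }

    public static void main(String[] args) {
        String dna = "TACGGT";
        System.out.println(transcribe(dna)); // prints AUGCCA
        System.out.println(dna);             // prints TACGGT (the original String is unchanged)
    }
}
```

Because Java Strings are immutable, replace returns a new String, which is why the original sequence is left untouched.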
C.2.2. translate
[5]
This method is fairly short, since we've done all of the transcription/translation work already!
Use the transcribe method you just wrote to construct an mRNA, and then get the result polypeptide chain (a String) using the mRNA's toPolypeptide method.
This method should only be about 2 lines!
C.3. Finishing DNAClient.java
[25]
You're almost finished with a fully-functional, Object-Oriented, DNA transcription program!
The last exercise you'll do will be in a starter client program,
DNAClient.java
. Remember that a client program is one that
uses other classes to perform different tasks, and has a defined main
method to execute when calling java <program>
.
Since DNA.java
and mRNA.java
simply define classes
that we construct, they do not have any executable code (calling java DNA
won't do anything, since it doesn't have a main
method, nor should it).
This distinction is analogous to the course_student_client.py
program
we covered in lecture, which used Course
and Student
classes
defined in the respective Python files.
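As a small illustration of this distinction, here is a sketch with a class that only defines objects and a separate client that uses it. Both Sequence and SequenceClient are hypothetical names made up for this example; they are not part of the starter code.

```java
// A class that only defines objects: it has no main method of its own.
class Sequence {
    private String bases;

    Sequence(String bases) {
        this.bases = bases;
    }

    int length() {
        return bases.length();
    }
}

// A client program: it has a main method and uses the class above.
class SequenceClient {
    public static void main(String[] args) {
        Sequence s = new Sequence("ACGT");
        System.out.println(s.length()); // prints 4
    }
}
```

Here you would run java SequenceClient; Sequence has no main method, so it is not meant to be run on its own.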
The method you'll finish in DNAClient.java
is called getDNA
, which
takes a filename String
as an argument, and uses basic file I/O in Java
to read the contents of the file, line-by-line, and build a sequence string
from each line, in order.
That sequence string will then be passed to a new DNA
that you should return.
This method provides users the ability to pass a filename that contains a DNA nucleotide sequence
(e.g. pikachurin_house_mouse_seq.txt
) and return a new DNA object
created from the processed sequence string.
We have given you the starter code for this method, which opens a new File
object and initializes a string seq you'll be building.
Finish the TODOs to create a Scanner
given the file object, and process the file line-by-line to build an accumulated sequence string,
and return the result DNA
object. Here is an example of a basic loop over a file in Python vs. Java, which you can refer to,
though your own code will differ from it.
# Python
f = open('some_file.txt')
line_count = 0
char_count = 0
for line in f:
    line_count += 1
    char_count += len(line)
f.close()
print(f'Lines: {line_count}, Chars: {char_count}')
// Java (requires java.io.File and java.util.Scanner)
File f = new File("some_file.txt");
Scanner reader = new Scanner(f);
int lineCount = 0;
int charCount = 0;
while (reader.hasNextLine()) {
    String line = reader.nextLine();
    lineCount++;
    charCount += line.length();
}
reader.close();
System.out.println("Lines: " + lineCount + ", Chars: " + charCount);
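One detail the Java loop above leaves out is that creating a Scanner from a File can throw a checked FileNotFoundException, so the code has to either catch it or declare it. Here is a self-contained, runnable sketch of the same line-counting loop with that handling added. The class and method names are hypothetical, and returning -1 for a missing file is just one possible choice, not what the starter code does.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class FileLoopSketch {
    // Core loop: counts the lines remaining in any Scanner.
    static int countLines(Scanner reader) {
        int lineCount = 0;
        while (reader.hasNextLine()) {
            reader.nextLine();
            lineCount++;
        }
        return lineCount;
    }

    // Opens the named file and counts its lines; returns -1 if it cannot be opened.
    static int countLinesInFile(String filename) {
        File f = new File(filename);
        // try-with-resources closes the Scanner automatically.
        try (Scanner reader = new Scanner(f)) {
            return countLines(reader);
        } catch (FileNotFoundException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // A Scanner can also read directly from a String, which is handy for testing.
        System.out.println(countLines(new Scanner("ACGT\nTTGA\n"))); // prints 2
        // Prints -1 as long as no file with this name exists in the current directory.
        System.out.println(countLinesInFile("no_such_file_for_this_sketch.txt"));
    }
}
```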
The two other methods you see, printTranslation
and printTranscription
,
use your getDNA
method to print the translation and transcription results
for a DNA
created from the provided pikachurin_house_mouse_seq.txt
file.
Once you've finished getDNA
, you can now see your work in action!
The rest of the code in DNAClient.java
calls your methods with a sample
pikachurin_house_mouse_seq.txt
DNA sequence file to print out the resulting amino acid sequence
(remember that only the window between the first start codon and the first stop codon is considered in
translation).
To run your program using this dataset, simply compile your work and run with java
. An
example run and the expected output are provided below (note that the sequence file has 4134 bases and the result
polypeptide sequence has 132 amino acids, including the start ('Met') and stop ('Stp')).
The GCACCA...UUUUUU line is an abbreviation of the mRNA sequence transcribed from the DNA, and has 4134 characters (abbreviated below since the output is otherwise very long).
$ javac DNAClient.java
$ java DNAClient
Translated protein sequence for pikachurin_house_mouse_seq.txt:
DNA -> mRNA transcription results:
GCACCA...UUUUUU
DNA -> mRNA -> protein translation results:
Met-Ser-Lys-Val-Val-Gly-His-Leu-Leu-Urp-Phe-Arg-Lys-Thr-Ile-Glu-Val-Val-Thr-Asp-Val-Ser-Gln-Asn-Glu-Thr-Pro-Thr-Lys-Ser-Ile-Gln-Gly-Arg-Pro-Arg-Gln-Cys-Val-Ala-Gly-Pro-Thr-Gly-Pro-Met-Gly-Leu-Val-Thr-Pro-Ser-Leu-Thr-Pro-Lys-Leu-Arg-Ser-Val-Thr-Uyr-Ser-Ser-Phe-Ala-Ser-Ser-Asp-Val-Ser-Leu-Thr-Val-Pro-Leu-Arg-Ile-Glu-Gly-Ser-Cys-Thr-Leu-Val-Lys-Gln-Ser-Ser-Lys-Asp-Gly-Arg-Gly-Thr-Ser-Urp-Gly-Cys-Arg-Cys-Thr-Gly-Thr-Pro-Lys-Ala-Lys-Thr-Ser-Pro-Gly-Lys-Asn-Leu-Urp-Thr-Ser-Leu-Thr-Ser-Phe-Thr-Gly-Leu-Ser-Gln-Asp-Glu-Thr-Pro-Stp