Details

Due date

This assignment is due on Tuesday, May 28th at 11:30PM.

What to submit

You will submit 4 Java files for this assignment (plus collaboration.txt if you work with a partner), each of which has a starter template in mp7_starter.zip:

  • MP7A.java (Part A)
  • DNA.java (Part B and C.2)
  • mRNA.java (Part C.1)
  • DNAClient.java (Part C.3)
  • collaboration.txt (Required collaboration component for Parts B-C if you collaborate with a partner)

You will also see the following files included in the starter code, which you do not need to modify:

  • DNA.py - Provided DNA class in Python you will port to DNA.java
  • CodonMapper.java (helper class for Part C.1)
  • pikachurin_house_mouse_seq.txt - example DNA sequence for the "Pikachurin" protein, using the house mouse reference genome from the NCBI Gene database (feel free to look for other protein sequences!).
  • pikachurin_house_mouse_polypeptide.txt - expected output polypeptide chain for Part C.

Testing Your Code

We encourage you to test your code iteratively, using a client program (with a main method) to test calls to your Part B and C code, using the examples given.

You can test your Part B (DNA.java) using this PartBTests.java (just compile with javac and run with java as you would with any other Java program; it will test your DNA.java Part B solutions and output the results). The examples in the spec are also copied into a client program, DNAStudentTester.java, which is not a full test suite but which you can add to.

For Part C, you can test your solutions (iteratively!) using this PartCStudentTester.java program. This program is also provided for you to use as a debugging tool, but provides the testing suite we'll be using when grading. You are encouraged to read through the comments if you run into an error, especially with translation examples.

This may look like a lot, but the overall amount of work is comparable to other MPs. Remember that with OOP, we often work with more than one class (and a client program), which is what you'll practice more with in Parts B-C.

MP 7: Optional Collaboration Component

In this Mini Project, we are allowing students to collaborate with one other student on Parts B-C only. As a porting exercise, this is a good opportunity to work through the exercises with another student, utilizing conversation, brainstorming, and peer-programming to piece things together. If you choose to collaborate with another student on Parts B-C, you must submit a filled collaboration.txt file with the details of your experience and distribution of work. Collaboration does not mean one student solves half or all of the assignment. Collaboration does mean spending at least 1-2 hours working through the code either in-person or asynchronously (Discord or Zoom VC), with the option to communicate via text/Discord/email clarifications in addition to the in-person/video collaboration. If you have any questions about this, don't hesitate to ask! Students who would be interested in collaborating with another student can utilize the #mp7-partner-search Discord channel, and you are encouraged to also find a partner in your Tuesday lab.

If you collaborate with another student, you should still both submit everything to CodePost individually (again, your MP7A.java should be your own), in addition to including both names in any Part B-C files you work on together, and a completed collaboration.txt, which both students should fill out separately. Note that if we find non-trivial matching solutions for students who have not submitted this file, Caltech Honor Code violations will be escalated to the BoC.

For Part A (no collaboration), you will be practicing basic Java expressions and writing your first Java method. This is a client program run with a main method (similar to code run in an if __name__ == '__main__' statement in Python). This will be a slightly different structure than the DNA class you will finish implementing in Part B (which does not have a main method and thus is not executable).

Finish each TODO for Parts A.1.-A.4. in the given MP7A.java as specified. Some more details are provided in comments to help you get started and test your answers. For full credit, this finished program should not have any errors when compiled.

Part A: Intro to Java

Alright, let's dive into Java now! First, we'll warm up with some expression practice, identifying some differences and similarities between Python and Java. This assignment will introduce you to the Java language, connecting other problems you've already worked through in Python. We will provide relevant details and pitfalls in this spec and provided code comments, but make sure to also refer to any lecture material on Java as needed. You are not expected to use anything we have not shown you in lectures, readings, or this assignment (you may always ask if unsure).

You will want to make sure you have Java installed if you don't already. You can find instructions for setting it up here.

To get a feel for running your first Java program, we encourage you to first make sure you can run this program and get the following output:

3
foo

For Part A, you can either run Java:

  1. in VSCode (Play Icon > Run Java)
  2. in the VSCode bash terminal (where you usually run python3)

    $ javac MP7A.java
    $ java MP7A
    3
    foo
  3. replit.com

These are just suggestions available for students first learning Java. If you can get Java running quickly in VSCode using the provided MP7A.java, this is what we most strongly recommend. If you happen to have Java experience before CS 1, you can choose whatever editor you've used before. For reference, you'll usually use a Java-specific IDE like IntelliJ or Eclipse for larger Java projects. But these tend to be very feature-heavy, and we recommend using VSCode to edit a .java program and VSCode or bash to compile/run it. This is to help you just focus on running Java in an environment you're already familiar with. If you take CS 2, you'll likely learn how to use IntelliJ for Java programming.

Because Java isn't usually run with an interpreter like Python, you won't actually see anything printed unless you have an explicit System.out.println(<expression>); call in the main function (that is, after you fix any compiling errors). You can find an example in the main function in MP7A.java which you can use to test different expressions/function calls.
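As a small illustration of the point above, here is a minimal sketch of a Java program with a main method. The class name PrintDemo and the helper add are made up for this example and are not part of the starter code. The first call is evaluated but never shown; only values passed to System.out.println appear in the output.

```java
// Hypothetical example; PrintDemo is not part of the MP7 starter code.
public class PrintDemo {
    // A small helper so there is a function call to test.
    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int unused = add(1, 2);          // computed, but nothing is printed
        System.out.println(add(1, 2));   // prints 3
        System.out.println("foo");       // prints foo
    }
}
```

If this were saved as PrintDemo.java, it would be compiled with javac PrintDemo.java and run with java PrintDemo, just like MP7A.java.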

1. Expressions

[20]

Alright, with all that said, let's dive into Java!

Many of these expressions come directly from HW1 Part A! But the answers will not all be the same as in Python. Similar to HW1, you'll write your answers in Java comments unless otherwise specified. The provided MP7A.java program clearly organizes each exercise in Part A for you, and provides some testing code to refer to in the main method.

Complete the TODO in MP7A.java with your answers for each of the following expressions (in Java) taken from HW1.

  1. 8 - 5

  2. 6 * 2.5

  3. 51 / 2

  4. 51 / -2

  5. 51 % 2

  6. 51 % -2

  7. -51 % 2

  8. 51 / -2.0

  9. 1 + 4 * 5

  10. (1 + 4) * 5

2. String Expressions

[15]

Next, you'll write answers for each of the following String expressions in MP7A.java. For each of the following expressions, provide:

  1. The value of the assigned result variable (if no error). Indicate any String values with ".

  2. If the code results in an error, clearly identify what the error(s) are

  3. 1-2 sentences identifying the similarities or differences between the Python equivalent (ignoring type declarations like String or int). We want you to think about how Java and Python evaluate expressions differently (or similarly) when it comes to evaluating (not assigning) types and values.

Don't forget to review your HW1 work if needed! Two examples are given in the starter code to show what we expect for each answer.

1.

String result1 = "Lorem" + 'Ipsum';

2.

String result2 = "Lorem" + "Ipsum";

3.

String a = "Lorem"
String b = "Ipsum"
String a += b;

4.

String month = "April";
int days = 30;
String result = days + " days hath " + month;
System.out.println(result); // Make sure to test this one!

3. String Generation

[20]

Recall the string generation exercise from HW1 B.8:

Write a single Python expression which will generate a string of 70 '-' characters. The expression itself shouldn't be longer than 10 characters. Write the answer inside a Python comment.

Unfortunately, we don't get such nice "syntactic sugar" in Java. Write your answers for the following in the respective portion of MP7A.java.

  1. What does '-' * 70 evaluate to in Java? Is this what you expected?

  2. What does "-" * 70 evaluate to in Java? Is this what you expected?

Alright, so it doesn't work as we'd expect in Java... But it is a good loop exercise!

For Part A.3.c., you'll finish implementing a Java method called stringMultiplier(String s, int n) which generates a String of n occurrences of s.

This is the first Java method you'll be writing in this assignment. We have provided you the method stub and an example "javadoc" comment which is similar to a Python docstring. The differences are noted in the starter code. You do not need to change the javadoc and you should not change the method header, but you should replace the method body with your implementation to match the expected behavior. There are a few example calls to this method in the main method at the top of the program which you can use to test the result.

The call stringMultiplier("-", 2) should return the String "--". The call stringMultiplier("-", 70) should generate the same string as Python's '-' * 70 evaluation. Note that this method takes a String, so it should work with a String of any length.

Our solution is 5 lines in the method body, 3 of which define a for loop containing a single statement in the body.

Part B: Porting Classes in Java

Introduction

Throughout the remaining 2 Parts of this MP, you will finish Java classes to implement a full-featured (albeit simplified) DNA-to-mRNA-to-protein transcription program!

One of the best ways to learn a new language is through "porting" a program from a language you know to one you're learning. In Part B, you will take a provided Python class called DNA with a few simple methods, and then port that into a Java class with the same name and methodology.

Throughout the exercises, you will refer to some provided code for examples, and some methods will need to be used in the methods you write.

Documentation Requirements

Part A.3.c gave you practice working with a Javadoc comment for a method, given a template. We have given you some Javadoc completed in these exercises, but any that are missing you are expected to complete with valid Javadoc format (including placement above the method header, not within).

Provided Python Class: DNA.py

As you work through Part B, you will refer to a provided implementation in DNA.py, which defines a DNA class with the same functionality as the Java version you'll implement in DNA.java.

This is an object-oriented class that builds on some of the DNA functions you wrote in HW1.

A summary of the provided DNA class methods is provided below:

a. __init__

The __init__ constructor takes a DNA sequence string comprised of nucleotide bases 'A', 'T', 'C', and 'G' and creates a new DNA object with that sequence, converting any lower-case bases to upper-case. If given a string with any characters that are not a valid nucleotide base (ignoring letter-casing), the method raises a ValueError with the error message Invalid DNA sequence. Must only contain ATCG bases. Otherwise, the upper-cased sequence string is saved as a single class attribute called seq.

Some examples are provided below:

>>> from DNA import DNA
>>> dna_seq1 = DNA('ATCGatcg')
>>> dna_seq1.seq
'ATCGATCG'
>>> dna_seq2 = DNA('a')
>>> dna_seq2.seq
'A'
>>> invalid_dna_seq = DNA('catdog')
Traceback omitted
ValueError: Invalid DNA sequence. Must only contain ATCG bases.
>>> invalid_dna_seq # assignment unsuccessful with ValueError
Traceback omitted
NameError: name 'invalid_dna_seq' is not defined
>>> dna_seq3 = DNA('AaAa')
>>> dna_seq3.seq
'AAAA'
>>> empty_dna = DNA('')
>>> empty_dna.seq
''

b. __str__ and __len__

These two methods provide the string representation of a DNA object and its length. Remember that __len__ is another special method (both __str__ and __len__ are commonly known as "dunder" or "magic" methods) that defines the length of an object when passed to len(...). A class defines a __len__ method with the same syntax as __str__ but returns an int instead of a str.

Some examples are provided below:

... dna_seq variables defined above
>>> str(dna_seq1)
'ATCGATCG'
>>> str(empty_dna)
''
>>> print(dna_seq1)
ATCGATCG
>>> print(empty_dna)

>>> len(dna_seq1)
8
>>> len(dna_seq2)
1
>>> len(dna_seq3)
4
>>> len(empty_dna)
0

c. complement

This method returns the complement sequence (as a str), which replaces all nucleotide bases with their complement base, maintaining original ordering.

>>> comp1 = dna_seq1.complement()
>>> comp1 # a str
'TAGCTAGC'
>>> comp2 = dna_seq2.complement()
# 'T'
>>> comp3 = dna_seq3.complement()
# 'TTTT'
>>> empty_comp = empty_dna.complement()
# ''
>>> dna_seq5 = DNA('ACCAGTGTAG')
>>> comp5 = dna_seq5.complement()
# 'TGGTCACATC'
>>> double_comp_seq = DNA(comp5)
>>> str(double_comp_seq)  # uses __str__()
# 'TGGTCACATC'
>>> double_comp = double_comp_seq.complement()
# 'ACCAGTGTAG', back to dna_seq5 sequence

d. count_occurrences

count_occurrences takes a single-character base string and returns the number of times that character occurs in self.seq (ignoring letter-casing). If the given base character is not a valid nucleotide base (ignoring letter-casing), the method raises a ValueError with the message Invalid base. (the message includes the trailing period).

>>> a1_count = dna_seq1.count_occurrences('a')
# 2
>>> t1_count = dna_seq1.count_occurrences('T')
# 2
>>> a2_count = dna_seq2.count_occurrences('A')
# 1
>>> a3_count = dna_seq3.count_occurrences('A')
# 4
>>> empty_a_count = empty_dna.count_occurrences('A')
# 0
>>> d1_count = dna_seq1.count_occurrences('d')
Traceback omitted
ValueError: Invalid base.
>>> empty_d_count = empty_dna.count_occurrences('d')
Traceback omitted
ValueError: Invalid base.
>>> c5_count = dna_seq5.count_occurrences('C')
# 2
>>> g5_count = dna_seq5.count_occurrences('G')
# 3
>>> gc5_count = c5_count + g5_count
# 5
>>> gc_content = gc5_count / len(dna_seq5)
# 5 / 10 -> 0.5 (HW1!)

e. percentage_of

This method takes a single-character base string and returns the percentage of that base contained in the DNA sequence (a float between 0.0 and 1.0). A ValueError with the same error message described in the previous problem is raised if given an invalid base. Observe that the count_occurrences method is used to reduce redundancy, and the error-handling is left for that method to handle (it is poor practice to handle it the same way twice). Observe that no try/except is used in this class, since its methods only support raising possible errors to a client who passes invalid arguments.

>>> t1_count = dna_seq1.count_occurrences('T')
# 2
>>> t1_percent = dna_seq1.percentage_of('T')
# 0.25
>>> t1_percent = dna_seq1.percentage_of('t')
# 0.25
>>> a2_count = dna_seq2.count_occurrences('A')
# 1
>>> a2_percent = dna_seq2.percentage_of('A')
# 1.0
>>> a3_count = dna_seq3.count_occurrences('A')
# 4
>>> a3_percent = dna_seq3.percentage_of('A')
# 1.0
>>> d1_percent = dna_seq1.percentage_of('d')
Traceback omitted
ValueError: Invalid base.
>>> empty_g_percent = empty_dna.percentage_of('G')
# 0.0
>>> empty_d_percent = empty_dna.percentage_of('d')
Traceback omitted
ValueError: Invalid base.

Java Class: DNA.java

[50]

Now, you'll get practice writing a simple Java class called DNA with the same functionality as the Python class introduced above, but using the Java programming language. As you'll see, Java is much stricter than Python, and it may seem unnecessarily picky/verbose. But there is a reason that it is such a popular language, especially when speed, correctness, and security are critical. In these ways, Java can be a much better choice than Python.

String vs. char

In this porting exercise, you'll see references to characters and strings, which won't seem too confusing at first until you run into errors using one instead of the other (don't worry, El ran into a fair share of these after programming in Python for some time; this is good practice to differentiate between programming concepts and language-specific rules :)). In Python, we have just been referring to a general character as a single-character string (e.g. 'A' and 'AGT' are both type str). There is no "character" type in Python.

In Java, there is a character type called char that is used whenever using a single-character in (required) ' quotes.

Java's String type represents a sequence of characters (strictly in " quotes, not '). We'll do our best to refer to general characters as we have been throughout our journey in Python, and will clearly state when a type must be char instead of String. You are welcome to ask on Discord if you have any questions interpreting between the way types are described in both languages.
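To make the char/String distinction concrete, here is a minimal sketch. The class CharVsString and its startsWith helper are hypothetical names used only for illustration; charAt, length, and equals are standard String methods.

```java
// Hypothetical example; CharVsString is not part of the starter code.
public class CharVsString {
    // Returns true if s is non-empty and its first character equals c.
    public static boolean startsWith(String s, char c) {
        return s.length() > 0 && s.charAt(0) == c;   // chars compare with ==
    }

    public static void main(String[] args) {
        char base = 'A';        // a single char uses ' quotes
        String codon = "AGT";   // a String uses " quotes
        String single = "A";    // still a String, even with one character

        System.out.println(startsWith(codon, base));   // prints true
        System.out.println(single.equals("A"));        // prints true (Strings compare with equals)
        System.out.println(codon.charAt(1));           // prints G
        // char wrong = "A";   // would not compile: a String is not a char
    }
}
```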

Java Programs vs. Java Classes

You'll also note a difference in the structure of an executable Java program (defined with a main method) and a non-executable Java class (not to be confused with .class files, which are a separate topic). In Part A, you finished implementing a Java program that could be run with java MP7A (or the run button in VSCode). You won't actually be able to run DNA.java (or Part C's mRNA.java) the same way, because it defines a class that only other executable client programs can use when their main method is run.

Other than syntax, the difference between a client program and a program defining classes is not something new. For example, in MP4, mp4_main.py was a client program you ran which started a UI prompt interface from the if __name__ == '__main__' statement. The program imported mp4_pokemon.py which defined functions (and could define classes) used in mp4_main.py (mp4_pokemon.py did not have an if __name__ == '__main__' block since it was just used to provide function definitions, not have any runtime behavior).

For now, you can think of MP7A.java as an analogous client program (though it doesn't import any classes you've written) and DNA.java as a program that defines a single class that can be imported in a separate (executable) client program.

In a Java program that defines a class like DNA, we refer to class methods instead of functions, just like we refer to methods in DNA.py and functions in Python programs like lab6c.py. Inside of a Java class, we can refer to state using this. notation instead of self. notation, and Java methods do not take an argument referring to this/self; they just take any arguments needed for the method (possibly none).

In short, the biggest differences you'll see between MP7A.java and DNA.java are:

  • public static void main only in MP7A.java, invoked when the program is run.
  • static before any function names in MP7A.java, omitted completely in DNA.java. This essentially is referring to a function that does not refer to state (which we often do refer to within class methods).
  • A single class field (analogous to a Python class attribute accessed with self) declared private in DNA.java that can be referred to in any method with this.seq. private just means that the field cannot be accessed outside of any DNA method definition, which is an advantage in Java compared to class attributes in Python.
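The differences listed above can be sketched with a small, unrelated class. Counter and CounterClient are made-up names, shown together in one listing for brevity (in this assignment each class lives in its own file). This is only an illustration of private fields, this. notation, and static main, not a template for DNA.java.

```java
// Hypothetical example; Counter and CounterClient are made-up names.
class Counter {
    private int count;   // private field: only Counter's methods can access it

    // Constructor: named after the class, with no self/this parameter.
    public Counter(int start) {
        this.count = start;
    }

    // Instance method (no static): updates state through this.
    public void increment() {
        this.count += 1;
    }

    // Instance method that reads state through this.
    public int getCount() {
        return this.count;
    }
}

public class CounterClient {
    // static: main does not belong to any particular object.
    public static void main(String[] args) {
        Counter c = new Counter(5);
        c.increment();
        System.out.println(c.getCount());   // prints 6
        // System.out.println(c.count);     // would not compile: count is private
    }
}
```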

The DNA Java class should include all of the following methods, analogous to each of the methods in DNA.py. We have provided the baseComplement method for you, as a private helper method (private indicates that it should only be used by methods in the DNA class, as opposed to the public methods you write that clients have access to).

For Parts B-C, you are expected to write/run your Java code with VSCode (and/or the terminal) instead of any online REPL/compiler to 1) not lose unsaved work and 2) practice compiling and running a Java program locally.

a. DNA Constructor

[15]

To get you started with your first Java class, we have provided the empty constructor (and other method stubs) for you. Complete the DNA constructor to have behavior analogous to the constructor in DNA.py. The implementation will look different (Java-specific syntax, and differences in constructor syntax), but should still fundamentally create the same state for a DNA object.

Remember that whereas Python class constructors are defined with __init__(self, ...), Java class constructors are defined with the class name followed by any arguments (e.g. public DNA(String seq)). Remember that Java does not have a self argument for its methods. Whenever you want to refer to a Java object's state (fields or methods), you can refer to them with this.name syntax (replacing name with a defined field or method name in the class).

The DNA constructor in DNA.java should take a DNA sequence String comprised of nucleotide bases 'A', 'T', 'C', and 'G'. Just like the constructor in DNA.py, this constructor should also support lower-cased base characters, converting any such bases to an upper-cased base character (hint: use the s.toUpperCase() method supported for a String s).

If given a string with any characters that are not a valid nucleotide base (ignoring letter-casing), throw a new IllegalArgumentException with the error message Invalid DNA sequence. Must only contain ATCG bases. Observe the differences between the raise ValueError(...) and throw new IllegalArgumentException(...) statements in Python vs. Java. Both raise an exception that a client program is expected to handle, but Java uses throw new <ExceptionType>(<args>) instead of Python's raise <ExceptionType>(<args>). This is all you really need to know for CS 1, but you should refer to the provided baseComplement method for an example of throwing this exception (your constructor's error message should match the one described here though). Just like in DNA.py, you should not use any try/catch in DNA.java (or mRNA.java). Any Javadoc comments should specify the raised ("thrown") error with the @throws annotation to indicate to the client that they are expected to handle any thrown error in their program.
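For a feel of the throw syntax and the @throws annotation outside of the DNA context, here is a minimal sketch. The Thermometer class, its toCelsius method, and its error message are invented for this example; the provided baseComplement method remains the reference for this assignment. The try/catch appears only in the client's main, to show where handling belongs.

```java
// Hypothetical example; Thermometer is not part of the starter code.
public class Thermometer {
    /**
     * Converts a temperature in Kelvin to degrees Celsius.
     *
     * @param kelvin the temperature in Kelvin
     * @return the temperature in degrees Celsius
     * @throws IllegalArgumentException if kelvin is negative
     */
    public static double toCelsius(double kelvin) {
        if (kelvin < 0) {
            // Java: throw new <ExceptionType>(<args>); Python: raise <ExceptionType>(<args>)
            throw new IllegalArgumentException("Kelvin cannot be negative.");
        }
        return kelvin - 273.15;
    }

    public static void main(String[] args) {
        System.out.println(toCelsius(273.15));   // prints 0.0
        try {
            toCelsius(-1.0);                     // throws IllegalArgumentException
        } catch (IllegalArgumentException e) {
            // The client, not the method itself, decides how to handle the error.
            System.out.println(e.getMessage());  // prints Kelvin cannot be negative.
        }
    }
}
```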

In the Python implementation, you'll see a loop to check each character; in Java, you'll need to use char ch = s.charAt(index) to get a char in some String s at the passed index (an int). We use ch1 == ch2 in Java to test whether two chars are equal (and ch1 != ch2 to test whether they differ), which we cannot do the same way with the String type in Java. We have provided an example loop you can reference below:

int spaces = 0;
String hello = "Hello CS1 Students!";
for (int i = 0; i < hello.length(); i++) {
    char ch = hello.charAt(i);
    if (ch == ' ') { // remember to use ' quotes for chars in Java!
        spaces += 1; // spaces++ also does the same thing in Java
    }
}
System.out.println("There were " + spaces + " spaces found.");

After checking for a valid String argument, the single seq field should be set to the upper-cased sequence String. Remember that this.val = new_val; is the syntax in Java to update some field val defined in the Java class to a given value (e.g. this.seq = seq;).

For reference, our solution is about 8-10 non-trivial lines (ignoring comments/blank lines).

Some examples of the expected behavior for your constructor are provided below, using a DNAStudentTester.java program, which is simply a client program that has example calls for your methods (a different program than the DNAClient.java you'll finish to implement a full DNA to protein transcription) or the PartBTester.java which is a completed testing suite you'll test your solutions with after implementing them.

DNA dnaSeq1 = new DNA("ATCGatcg");  // No error
System.out.println(dnaSeq1.seq);    // Compiler error in Java
// A compiler error will occur in a DNATester.java program
// since seq is a privately-declared field
// (different than Python self attributes)
DNATester.java:line_num: error: seq has private access in DNA
        System.out.println(dnaSeq1.seq);
                               ^
DNA dnaSeq2 = new DNA("a");           // No error
DNA dnaSeq3 = new DNA("AaAa");        // No error
DNA emptySeq = new DNA("");           // No error
DNA dnaSeq4 = new DNA("ACCAGTGTAG");  // No error
DNA invalidSeq = new DNA("catdog");   // Runtime error

// A runtime error will occur in a DNATester.java program
Exception in thread "main" java.lang.IllegalArgumentException: Invalid DNA sequence. Must only contain ATCG bases.
        at DNA.<init>(DNA.java:line_number)
        at DNATester.main(DNATester.java:line_number)

b. toString and size

[5]

Next, you'll implement the two methods for the string representation of a DNA object and its length. Similar to how __str__ and __len__ are special Python methods to define the default string representation and length of the object, toString and size are analogous in Java.

Both methods should have the same behavior as __str__ and __len__ in DNA.py and neither takes any arguments. Each method body is also a 1-line solution.

String seq1 = dnaSeq1.toString();
// "ATCGATCG"
String emptySeqStr = emptySeq.toString();
// ""
System.out.println(dnaSeq1);
// ATCGATCG
System.out.println(emptySeq);
//
int seq1Len = dnaSeq1.size(); 
// 8
int seq2Len = dnaSeq2.size(); 
// 1
int seq3Len = dnaSeq3.size();
// 4
int seq4Len = emptySeq.size();
// 0
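To see why System.out.println(dnaSeq1) in the examples above prints the sequence without an explicit toString() call, here is a minimal sketch using an unrelated, made-up Point class: Java calls an object's toString method whenever the object is printed or concatenated with a String. Unlike Python's len(...), there is no built-in Java function that looks up size; it is just an ordinary method clients call by name.

```java
// Hypothetical example; Point and PointClient are not part of the starter code.
class Point {
    private int x;
    private int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Java's analog of Python's __str__.
    public String toString() {
        return "(" + this.x + ", " + this.y + ")";
    }
}

public class PointClient {
    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(p.toString());   // prints (3, 4)
        System.out.println(p);              // also prints (3, 4)
        String label = "p = " + p;          // toString() is used here too
        System.out.println(label);          // prints p = (3, 4)
    }
}
```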

c. complement

[10]

Next, you'll implement the Java equivalent of the complement method defined in DNA.py. This method should have the same behavior, just in an analogous Java method.

You'll note that the DNA.py class could use a nucleotide basepair dictionary to more cleanly implement base_complement, but unfortunately you won't find this to be as easy in Java (and for the purposes of an exercise in clean porting, we've implemented it without a dictionary). You are welcome to ask El if you'd like to learn about dictionary equivalents in Java, but we encourage you to focus on the requirements of this assignment first :). Use the provided baseComplement method to build the complement String.

For reference, our solution is 5-6 non-trivial lines in the method body.

// DNA sequence variables defined above
String comp1 = dnaSeq1.complement();
// "TAGCTAGC"
String comp2 = dnaSeq2.complement();
// "T"
String comp3 = dnaSeq3.complement();
// "TTTT"
String emptyComp = emptySeq.complement();
// ""
DNA dnaSeq4 = new DNA("ACCAGTGTAG");
String comp4 = dnaSeq4.complement();
// "TGGTCACATC"
DNA doubleCompSeq = new DNA(comp4);
System.out.println(doubleCompSeq);  // uses toString()
// TGGTCACATC
String doubleComp = doubleCompSeq.complement();
// "ACCAGTGTAG", back to dnaSeq4

d. countOccurrences

[10]

Next, you'll write the Java equivalent to the count_occurrences method in DNA.py. Using camelCasing conventions in Java, write a method countOccurrences which takes a base as a char (not a String) and returns the number of times that character occurs in this.seq (ignoring letter-casing).

If the given base character is not a valid nucleotide base (ignoring letter-casing), the method should throw an IllegalArgumentException with the message Invalid base. Don't forget to refer to the provided exception-handling in the provided baseComplement for reference.

// DNA sequence variables defined above
int a1Count = dnaSeq1.countOccurrences('a');
// 2
int t1Count = dnaSeq1.countOccurrences('T');
// 2
int a2Count = dnaSeq2.countOccurrences('A');
// 1
int a3Count = dnaSeq3.countOccurrences('A');
// 4
int a4Count = emptySeq.countOccurrences('A');
// 0
int d1Count = dnaSeq1.countOccurrences('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted
int c4Count = dnaSeq4.countOccurrences('C');
// 2
int g4Count = dnaSeq4.countOccurrences('G');
// 3
int gc4Count = c4Count + g4Count;
// 5
double gcContent = gc4Count * 1.0 / dnaSeq4.size();
// 5 / 10 -> 0.5 (HW1!)
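The * 1.0 in the gcContent line above matters because dividing one int by another int in Java produces an int. A minimal sketch of the difference, using a made-up DivisionDemo class and ratio helper (neither is part of the starter code):

```java
// Hypothetical example; DivisionDemo is not part of the starter code.
public class DivisionDemo {
    // Returns part / total as a double by forcing double arithmetic.
    public static double ratio(int part, int total) {
        return part * 1.0 / total;   // part * 1.0 is a double, so / keeps the decimals
    }

    public static void main(String[] args) {
        int gc = 5;
        int length = 10;
        System.out.println(gc / length);         // prints 0 (int / int drops the fraction)
        System.out.println(gc * 1.0 / length);   // prints 0.5
        System.out.println(ratio(gc, length));   // prints 0.5
    }
}
```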

e. percentageOf

[10]

Finally, you'll write a method percentageOf which takes a single-character base char (not a String) and returns the percentage (as a double, which is like a float with higher precision, and more familiar syntax for our purposes) of that base contained in the DNA sequence. An IllegalArgumentException with the same error message described in the previous problem should be thrown if given an invalid base. Otherwise, if the DNA has an empty sequence, the method should return 0.0 as the percentage, similar to DNA.py's implementation.

Use your countOccurrences method to reduce redundancy (you can use this.methodName to call another method in a Java class instead of Python's self.method_name syntax). Note that countOccurrences should handle the validation for you, so make sure you avoid redundancy! Your Javadoc comment should still include the @throws since percentageOf calls countOccurrences (but the client reading the documentation shouldn't need to know that; they should just know about the exception that could be thrown).
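The same pattern, one method delegating both its work and its validation to another via this., can be sketched in an unrelated setting. The Word class, its method names, and its error message are invented for illustration; Character.isLetter and String.indexOf are standard library calls. Note that both Javadoc comments list @throws even though only one method contains the throw statement.

```java
// Hypothetical example; Word and WordClient are not part of the starter code.
class Word {
    private String text;

    public Word(String text) {
        this.text = text;
    }

    /**
     * Returns the index of the first occurrence of ch in this word.
     *
     * @param ch the letter to search for
     * @return the first index of ch, or -1 if ch does not occur
     * @throws IllegalArgumentException if ch is not a letter
     */
    public int indexOfLetter(char ch) {
        if (!Character.isLetter(ch)) {
            throw new IllegalArgumentException("Not a letter.");
        }
        return this.text.indexOf(ch);
    }

    /**
     * Returns whether this word contains ch.
     *
     * @param ch the letter to search for
     * @return true if ch occurs in this word, false otherwise
     * @throws IllegalArgumentException if ch is not a letter
     */
    public boolean containsLetter(char ch) {
        // No validation here: indexOfLetter already throws for bad input.
        return this.indexOfLetter(ch) >= 0;
    }
}

public class WordClient {
    public static void main(String[] args) {
        Word w = new Word("java");
        System.out.println(w.containsLetter('v'));   // prints true
        System.out.println(w.containsLetter('z'));   // prints false
    }
}
```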

For reference, our solution is about 4 non-trivial lines in the method body.

Some example calls are provided below:

int t1Count = dnaSeq1.countOccurrences('T');
// 2
double t1Percent = dnaSeq1.percentageOf('T');
// 0.25
// t1Percent is already declared, so assign without repeating the type
t1Percent = dnaSeq1.percentageOf('t');
// 0.25
int a2Count = dnaSeq2.countOccurrences('A');
// 1
double a2Percent = dnaSeq2.percentageOf('A');
// 1.0
int a3Count = dnaSeq3.countOccurrences('A');
// 4
double a3Percent = dnaSeq3.percentageOf('A');
// 1.0
int emptyACount = emptySeq.countOccurrences('A');
// 0
double emptyAPercent = emptySeq.percentageOf('A');
// 0.0
double d1Percent = dnaSeq1.percentageOf('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted
double d2Percent = emptySeq.percentageOf('d');
// Exception java.lang.IllegalArgumentException: Invalid base.
// Traceback omitted

Part C: DNA -> mRNA -> Protein Transcription

Overview

Now that you have a DNA class implemented, you'll finish a second class mRNA which supports a sequence of mRNA nucleotides.

You will implement the following methods in the mRNA class:

  • mRNA(seq): Constructor
  • toString() and size()
  • toPolypeptide(): Method to convert an mRNA sequence to a String polypeptide chain

Just like DNA.java, mRNA.java will define a single class with a constructor and methods, but will not have the main method you'll see in client programs like DNAClient.java (that program will use your DNA and mRNA classes).

C.1: mRNA.java

We have provided minimal starter code for everything except the toPolypeptide method; you are expected to take what you've learned, use DNA.java as a reference, and implement this class to meet the specified behavior, as well as finish the documentation for the class using valid Javadoc.

C.1.1. mRNA Constructor

[5]

The mRNA constructor has behavior analogous to the constructor in DNA.java, but saves a sequence of RNA nucleotides instead (the field should still be called seq, and no other fields should be defined); you will need to declare the private field above the constructor, just as you see in DNA.java. Similar to DNA.java, an IllegalArgumentException should be thrown if given an invalid sequence (remember that mRNA is comprised of 'AUCG' not 'ATCG'). This solution should have the same number of lines as your DNA constructor.

C.1.2. toString and size

[5]

These methods should behave similarly to the DNA analogs, only for mRNA. When documenting, make sure your Javadoc is updated accordingly (no reference to DNA should be in mRNA documentation).

C.1.3. toPolypeptide

[15]

In this method, you will implement the final step of DNA to polypeptide translation, which translates an mRNA sequence to a polypeptide chain. Since this is one of the longer methods in this assignment, we have included a series of TODOs in the starter code. If you've been reviewing the materials, working through this method shouldn't take more than 10-15 minutes. As always, make sure to remove the TODO comments in your final submission.

Finish mRNA's toPolypeptide method to process the mRNA's sequence, codon-by-codon. A codon is a special 3-character sequence which maps to an amino acid. Note that since there are 4 bases in mRNA, there are 64 possible codons, 1 of which is a special "start" codon ("AUG", which codes for Met, or "Methionine"), and 3 of which are stop codons (these do not correspond to an amino acid).

To help solve this method, we've provided a CodonMapper class, which is defined in CodonMapper.java. We've started the method with a mapper constructed for you to work with. This class provides functionality to return an amino acid given a codon string. Any codon strings passed to its methods are converted to upper-case to support case-insensitivity. You should not change CodonMapper.java, but here is a summary of its methods:

  • CodonMapper mapper = new CodonMapper(): constructs a new CodonMapper called mapper
  • mapper.getAA(codon): given a 3-character codon String, returns the abbreviated amino acid name (e.g. 'Met' for 'Methionine'). Note that if one of the three stop codons is passed, the string "Stp" is returned.
  • mapper.isStopCodon(codon): a convenience method that returns true iff the passed codon is a stop codon.

You'll see an example use of this class when getting the first amino acid for the start codon (it is commented out, but you can uncomment it as you work through, and use it for reference when getting the other codon amino acids).

Note: Be careful with your indexing here. The result chain should ignore everything before the first start codon and stop at the first stop codon, or at the last complete codon if no stop codon is found. This means that your polypeptide will have window / 3 codons, where window is the length of the reading window, which starts with the start codon and ends with the stop codon or the last complete 3-character codon in the sequence. The rest of the sequence is ignored in the result. In particular, if no stop codon is found and 1-2 characters are left after the last complete codon, those characters are ignored.
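To make the reading-window rule concrete, here is a small sketch on toy sequences. Everything in it is an assumption made for illustration: the class ReadingWindowSketch and its TABLE are hypothetical stand-ins for CodonMapper covering only four codons, the start codon is searched for at any offset, a sequence with no start codon yields an empty chain, and amino acids are joined with "-" to match the example output later in this spec.

```java
import java.util.Map;

// Hypothetical sketch of the reading-window logic. TABLE stands in for
// CodonMapper and only knows the codons used in the examples below.
class ReadingWindowSketch {
    static final Map<String, String> TABLE = Map.of(
        "AUG", "Met", "UUU", "Phe", "GGC", "Gly", "UAA", "Stp");

    static String translate(String seq) {
        int start = seq.indexOf("AUG");      // first start codon, at any offset
        if (start < 0) {
            return "";                       // assumption: no start codon, empty chain
        }
        StringBuilder chain = new StringBuilder();
        // The bound i + 3 <= length skips a trailing partial codon (1-2 chars).
        for (int i = start; i + 3 <= seq.length(); i += 3) {
            String aa = TABLE.get(seq.substring(i, i + 3));
            if (chain.length() > 0) {
                chain.append("-");
            }
            chain.append(aa);
            if (aa.equals("Stp")) {
                break;                       // stop codon ends the window
            }
        }
        return chain.toString();
    }

    public static void main(String[] args) {
        // "CC" before AUG and "GGC" after UAA are outside the window.
        System.out.println(translate("CCAUGUUUGGCUAAGGC")); // Met-Phe-Gly-Stp
        // No stop codon; the trailing "GG" is an incomplete codon.
        System.out.println(translate("AUGUUUGG"));          // Met-Phe
    }
}
```

In the first call the window is AUGUUUGGCUAA, which is 12 characters long, so the chain has 12 / 3 = 4 entries.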

If you've correctly implemented this method, you're ready to take an mRNA sequence and get the result chain of amino acids!

For reference, our solution is about 10-14 non-trivial lines (ignoring comments/blank lines) after the starter code.

C.2. Using Your mRNA to Finish DNA

[50]

Finally, you'll use your completed mRNA class to add two methods to DNA to perform transcription and translation. Luckily, you have most of the work already done!

C.2.1. transcribe

[5]

This method should return a new mRNA constructed with the transcribed DNA sequence. Remember that DNA is transcribed to mRNA by first getting the complement DNA strand, and then replacing all 'T' bases with 'U'.

For full credit, use your DNA's complement method, and the s.replace(oldCh, newCh) String method, which returns a new String with all instances of oldCh in some String s replaced by newCh (remember to use the correct quotes for chars in Java!).

The method should return the resulting transcribed mRNA without changing the state of the DNA object the method is called on. For reference, our solution is 2-3 lines in the method body.
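The two steps can be tried out on plain Strings, as in the sketch below. The class TranscribeSketch and its complement helper are hypothetical stand-ins: your DNA class already has its own complement method, which may return a different type than a String. Because Java Strings are immutable, replace returns a new String and leaves the original untouched.

```java
// Hypothetical sketch working on plain Strings rather than DNA/mRNA objects.
class TranscribeSketch {
    // Stand-in for DNA's complement method: A<->T and C<->G.
    static String complement(String dna) {
        StringBuilder sb = new StringBuilder();
        for (char base : dna.toCharArray()) {
            switch (base) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("bad base: " + base);
            }
        }
        return sb.toString();
    }

    // Complement first, then swap every 'T' for 'U' (single quotes: chars).
    static String toRna(String dna) {
        return complement(dna).replace('T', 'U');
    }

    public static void main(String[] args) {
        String dna = "TACAAA";
        System.out.println(toRna(dna)); // AUGUUU
        System.out.println(dna);        // TACAAA (the original is unchanged)
    }
}
```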

C.2.2. translate

[5]

This method is fairly short, since we've done all of the transcription/translation work already!

Use the transcribe method you just wrote to construct an mRNA, and then get the resulting polypeptide chain (a String) using the mRNA's toPolypeptide method. This method should only be about 2 lines!

C.3. Finishing DNAClient.java

[25]

You're almost finished with a fully-functional, Object-Oriented, DNA transcription program!

The last exercise you'll do will be in a started client program, DNAClient.java. Remember that a client program is one that uses other classes to perform different tasks, and has a defined main method to execute when calling java <program>. Since DNA.java and mRNA.java simply define classes that we construct, they have no entry point of their own (calling java DNA will fail with an error, since the class doesn't have a main method, nor should it).

This distinction is analogous to the course_student_client.py program we covered in lecture, which used Course and Student classes defined in the respective python files.

The method you'll finish in DNAClient.java is called getDNA, which takes a filename String as an argument and uses basic file I/O in Java to read the contents of the file, line by line, building a sequence string from each line, in order. That sequence string will then be passed to a new DNA that you should return.

This method provides users the ability to pass a filename that contains a DNA nucleotide sequence (e.g. pikachurin_house_mouse_seq.txt) and return a new DNA object created from the processed sequence string.

We have given you the starter code for this method, which opens a new File object and initializes a string seq you'll be building. Finish the TODOs to create a Scanner given the file object, process the file line by line to build an accumulated sequence string, and return the resulting DNA object. Here is an example of a basic loop over a file in Python vs. Java, which you can refer to, but your code will differ from it.

Python:

f = open('some_file.txt')
line_count = 0
char_count = 0
for line in f:
    line_count += 1
    char_count += len(line)
f.close()
print(f'Lines: {line_count}, Chars: {char_count}')

Java:

import java.io.File;
import java.util.Scanner;

// new Scanner(File) can throw FileNotFoundException, so the enclosing
// method must declare or catch it.
File f = new File("some_file.txt");
Scanner reader = new Scanner(f);
int lineCount = 0;
int charCount = 0;
while (reader.hasNextLine()) {
    String line = reader.nextLine();  // line without its trailing newline
    lineCount++;
    charCount += line.length();
}
reader.close();
System.out.println("Lines: " + lineCount + ", Chars: " + charCount);
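If you'd like a version of the Java loop that runs on its own, the sketch below wraps the same counting loop in a method and feeds it a temporary two-line file. The class FileLoopSketch, its method names, and the file contents are all made up for illustration. It uses try-with-resources, which closes the Scanner automatically; that is one option, not a requirement of this assignment.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical sketch: counts lines and characters in a file.
class FileLoopSketch {
    static int[] countLinesAndChars(File f) throws IOException {
        int lineCount = 0;
        int charCount = 0;
        // try-with-resources closes the Scanner even if an exception occurs.
        try (Scanner reader = new Scanner(f)) {
            while (reader.hasNextLine()) {
                String line = reader.nextLine();
                lineCount++;
                charCount += line.length();
            }
        }
        return new int[] {lineCount, charCount};
    }

    // Writes a made-up two-line file, counts it, then deletes it.
    static int[] demo() {
        try {
            Path tmp = Files.createTempFile("seq", ".txt");
            Files.write(tmp, Arrays.asList("ACGT", "TTGA"));
            int[] counts = countLinesAndChars(tmp.toFile());
            Files.delete(tmp);
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("Lines: " + counts[0] + ", Chars: " + counts[1]);
        // prints: Lines: 2, Chars: 8
    }
}
```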

The two other methods you see, printTranslation and printTranscription, use your getDNA method to print the translation and transcription results for a DNA created from the provided pikachurin_house_mouse_seq.txt file.

Once you've finished getDNA, you can now see your work in action! The rest of the code in DNAClient.java calls your methods with a sample pikachurin_house_mouse_seq.txt DNA sequence file to print out the resulting amino acid sequence (remember that only the window between the first start codon and the first stop codon is considered in translation).

To run your program using this dataset, simply compile your work and run with java. An example run and the expected output are provided below (note that the sequence file has 4134 bases and the resulting polypeptide sequence has 132 amino acids, including the start ('Met') and stop ('Stp')). The GCACCA...UUUUUU line is an abbreviation of the mRNA sequence transcribed from the DNA, which has 4134 characters (abbreviated below since the output is otherwise very long).

$ javac DNAClient.java
$ java DNAClient
Translated protein sequence for pikachurin_house_mouse_seq.txt:
DNA -> mRNA transcription results:
GCACCA...UUUUUU
DNA -> mRNA -> protein translation results:
Met-Ser-Lys-Val-Val-Gly-His-Leu-Leu-Urp-Phe-Arg-Lys-Thr-Ile-Glu-Val-Val-Thr-Asp-Val-Ser-Gln-Asn-Glu-Thr-Pro-Thr-Lys-Ser-Ile-Gln-Gly-Arg-Pro-Arg-Gln-Cys-Val-Ala-Gly-Pro-Thr-Gly-Pro-Met-Gly-Leu-Val-Thr-Pro-Ser-Leu-Thr-Pro-Lys-Leu-Arg-Ser-Val-Thr-Uyr-Ser-Ser-Phe-Ala-Ser-Ser-Asp-Val-Ser-Leu-Thr-Val-Pro-Leu-Arg-Ile-Glu-Gly-Ser-Cys-Thr-Leu-Val-Lys-Gln-Ser-Ser-Lys-Asp-Gly-Arg-Gly-Thr-Ser-Urp-Gly-Cys-Arg-Cys-Thr-Gly-Thr-Pro-Lys-Ala-Lys-Thr-Ser-Pro-Gly-Lys-Asn-Leu-Urp-Thr-Ser-Leu-Thr-Ser-Phe-Thr-Gly-Leu-Ser-Gln-Asp-Glu-Thr-Pro-Stp