Details
Due date
This assignment is due on Friday, Apr. 26th, at 11:30 PM.
Coverage
This assignment covers the material up to Lecture 9, focusing on file I/O, dictionaries and tuples, and program decomposition.
What to hand in
You will be handing in the following completed files, all of which are started for you in mp3_starter.zip:
analyzer.py
fixer.py
generator.py
Reminder: Documentation Requirements
Similar to MP 2, make sure you write a docstring for every function we ask you to write. It should describe what the function does, what the arguments represent, and what the return value represents, as we’ve discussed in class and in the readings (refer to Lecture 4 Slides and recent lecture code for some examples). Do not include implementation details. For very short or very simple functions you can just write a one-line description of what the function does; this is in line with the guidelines in PEP 257.
Note: We have provided both docstrings in Part 1's analyzer.py, which you can leave as-is and use as examples of documentation for file-processing functions.
Supporting files
There are a number of supporting files you need to download for this assignment, which you can find in mp3_starter.zip. They include:
- 3 starter files for you to complete:
analyzer.py
fixer.py
generator.py
- Example files to test with, found in sample_files.zip (we have zipped them for you so you can unzip them again if you accidentally change them)
- Test scripts, provided with instructions and a demo video walking you through getting them set up and running here.
NOTE: Just because a test script works does not guarantee that your code is perfect; the test script may not be comprehensive. You should always test your code by hand (interactively in the Python interpreter) to make sure it does what you expect. Test scripts are very helpful, but they are not a crutch.
Template files
We are supplying you with 3 template files to get started. These include data you will need, implementations of some functions you don’t have to write, and stubs for all functions that you do have to write. You should fill in the functions by editing the docstrings (replacing any """<TODO: docstring>""" with an actual, good docstring) and replacing the TODO/pass lines with the body of your function. Remove any TODOs and pass statements in your final submissions.
The purpose of the starter code is to save you time by writing some of the less interesting code for you, especially UI logic (some of which is used in the tests). Part of this assignment is also to practice reading and understanding a full-featured program as you work through implementing each step (we recommend implementing functions in the order they are introduced in this spec).
Program Overview
In this Mini Project, you will finish implementing a program which provides several features to analyze, fix, and generate Python programs. This project is a great example of using Python to automate tasks, especially in file-processing. We encourage you to utilize the features in your future Python programs, and consider adding your own functions as you learn more programming skills through the course!
The provided mp3_pythonify.py is the main interface for clients to analyze or fix existing programs, or to generate program templates given function names and arguments/returns. You do not need to change this file at all, and will not submit it (the three required programs you submit will be tested with the same program and our own grading tests).
The three programs you will implement are introduced roughly in order of difficulty, and are summarized below.
analyzer.py - A library of 2 functions to analyze Python programs (printing function information and file statistics).
fixer.py - A library of 1 function to "fix" a specified file such that any tab characters are replaced with spaces. This is inspired by automated tools like autopep8, which allows programmers to auto-fix Python programs according to PEP 8 guidelines (for this project, you only need to implement the tabs-to-spaces feature, though there are other functions, like enforce_snake_case, which you may find helpful to implement on your own).
generator.py - A collection of functions (3 of which you will implement from scratch, 2 of which you will finish, and 3 of which are provided). This is the longest part of the assignment, but also the most rewarding! As you work through the exercises, you will build a feature to generate a complete Python program template, with auto-populated functions and docstrings conforming to Python standards.
Part 1: Analyzer
Summary
This program provides functionality to analyze files and report statistics to users. The learning objectives covered in Part 1 include File IO (reading), loops, and string formatting. All output must match exactly as described.
1. print_fns
Finish the print_fns function, which takes a filename string as an argument and prints out all of the function names and arguments in order (without the 'def ', ':', or any trailing whitespace/newlines).
The first line output by the function (provided for you) should be:
Functions in <filename>:
replacing <filename> with the given filename string (without < or >).
Whenever working with files, it's best practice to handle cases when files are not found. We will learn more about how to do this with try/except later, but for this assignment we have provided code for handling unknown file names for you using the os.path.exists function. Do not change any provided code for these cases, and do not use any other features of os. These requirements apply to this entire assignment.
When processing the file's functions, you must only consider lines that are Python function definitions starting with 'def ' (you don't need to do any more syntax validation than this, and don't need to enforce any file extensions). Hint: Don't forget about the string's startswith method covered in Reading 5!
Each function header printed should be indented by four spaces (do not use '\t') and should be printed in the order the functions are defined.
For example, if a program math_fns.py has the following content:
"""
Provides a simple interface for a user to calculate math operations for inputs.
Currently only supports average, but more function definitions will be
supported in upcoming versions.

Note to self (also a test case to ignore in print_fns!)
The keyword def is used to define a function in Python!
"""

def average(x, y):
    """
    docstring ...
    """
    return x + y / 2

def print_intro():
    """
    docstring ...
    """
    print('Welcome to the math function program!')

def start():
    """
    docstring ...
    """
    print_intro()
    # ... rest of function
    print(f'Average of {x} and {y}: {average(x, y)}')

if __name__ == '__main__':
    start()
The resulting output when calling print_fns('math_fns.py') should be displayed as follows (we have provided example code to test your output in the interpreter as well, but you should of course not include any of the >>> lines in your output):
Note: For all examples in this spec, anything bold in code examples represents user input to help guide your testing in the interpreter. Any code not in bold represents expected output.
>>> from analyzer import print_fns
>>> print_fns('math_fns.py')
Functions in math_fns.py:
    average(x, y)
    print_intro()
    start()
>>>
Your solution should not have more than one loop, and should not use readlines() or read().
For reference, our solution replaces the TODO with 8-12 lines ignoring comments (depending on whether you use a while loop or a for loop).
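The per-line string handling described above can be tried on plain strings before any file I/O is involved. The helper name header_text is hypothetical (it is not part of the starter code), and this sketch assumes each header line ends with ':' possibly followed by whitespace:

```python
def header_text(line):
    """Returns the function header in `line` without the leading 'def ',
    the trailing ':', or trailing whitespace; returns None if `line`
    does not start with 'def '. (Hypothetical helper for illustration.)"""
    if not line.startswith('def '):
        return None
    header = line[len('def '):].rstrip()  # drop 'def ' and trailing whitespace/newline
    if header.endswith(':'):
        header = header[:-1]              # drop the trailing colon
    return header


print(header_text('def average(x, y):\n'))                                     # average(x, y)
print(header_text('The keyword def is used to define a function in Python!\n'))  # None
```

Checking for 'def ' with the trailing space (rather than just 'def') also avoids matching lines such as default = 3.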
2. file_info
The second function you will write allows users to retrieve statistics about any file (not just Python programs).
Finish the file_info function to take a single filename string argument and return statistics of the file in a dictionary.
The returned dictionary should contain the line count, word count, and character count using the keys 'lines', 'words', and 'characters', respectively (these key names must match for full credit).
Lines are defined to be sequences of characters ending in the '\n' newline character, as we saw in lecture. You can use the split method on strings (which, when called with no arguments, splits a str on runs of whitespace and returns the resulting list of str parts) to identify the number of words in a line. Don’t worry about punctuation; "words" here just means sequences of characters separated by whitespace (spaces, tabs, or newlines). An example of the string's split method is given below:
>>> sentence = ' Lorem (Ipsum) is a very good boi indeed.'
>>> parts = sentence.split()
>>> parts
['Lorem', '(Ipsum)', 'is', 'a', 'very', 'good', 'boi', 'indeed.']
>>>
The character count should include the newline characters at the end of each line (in other words, do not strip any whitespace in this function).
Don’t forget to close the file before your code returns (unless you use with, as discussed in Lectures 8/9).
Similar to Exercise 1.1, we have provided the code to handle files that do not exist, and your solution should be written below it.
Hint: Use counter variables to store the current values of the three items you’re computing.
This function does the same thing as the Linux wc program, whose name stands for "word count". You can test your solutions in your terminal as well, using the first test as an example:
# Assume that the files exist (e.g. 'hamlet.txt' exists and has 7996 lines, 32006 words
# and 197341 characters).
$ wc 'hamlet.txt'
7996 32006 197341 hamlet.txt
$ python3
>>> from analyzer import file_info
>>> file_info('hamlet.txt')
{'lines': 7996, 'words': 32006, 'characters': 197341}
>>> file_info('babbage_tabbed.txt')
{'lines': 154, 'words': 1159, 'characters': 7435}
>>> file_info('math_fns.py') # note that this provided program has \t characters
{'lines': 48, 'words': 173, 'characters': 1090}
>>> file_info('two_tab_test.txt') # 2 \t characters on one line + 1 ending \n
{'lines': 1, 'words': 0, 'characters': 3}
>>> file_info('four_tab_test.txt')
{'lines': 2, 'words': 7, 'characters': 45}
Your solution should not have more than one loop, and you may use the readline function in this exercise (though you aren’t required to; see the lecture slides on different ways to process files line-by-line, such as with a for loop over a file object).
Again, do not use the readlines or read functions in this assignment, because they may cause excessive memory use if the file is very large.
For reference, our solution replaces the TODO with 9-12 lines ignoring comments (depending on whether you use a while loop or a for loop).
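The counter-variable hint can be sketched over any iterable of lines. Here the hypothetical count_stats takes a list of strings instead of a filename, so it runs without the sample files; it is not the required file_info, which must still open the file and keep the provided handling for missing files:

```python
def count_stats(lines):
    """Returns a dict of line, word, and character counts for an iterable
    of newline-terminated strings. (Hypothetical helper for illustration.)"""
    line_count = 0
    word_count = 0
    char_count = 0
    for line in lines:                    # a single loop, one line at a time
        line_count += 1
        word_count += len(line.split())   # split() with no argument splits on any whitespace
        char_count += len(line)           # includes the trailing newline
    return {'lines': line_count, 'words': word_count, 'characters': char_count}


print(count_stats(['\t\t\n']))                               # {'lines': 1, 'words': 0, 'characters': 3}
print(count_stats(['Lorem ipsum dolor\n', 'sit amet\n']))    # {'lines': 2, 'words': 5, 'characters': 27}
```

Because the loop only iterates line by line, a text-mode file object can be passed in the same way, and the whole file is never held in memory at once.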
Part 2: Fixer (Mini PEP8 Enforcement)
1. tabs_to_spaces
As you'll learn, tabs are a very annoying culprit in codebases, leading to some serious (╯°□°)╯︵ ┻━┻. Let's write a function to quickly swap out those subtle nuances. ┬─┬ノ(^_^ノ)
Write a function tabs_to_spaces which takes a filename string and a number of tab-spaces and converts each tab ('\t') character in that file to the given number of spaces per tab.
The file with the given filename should not change; instead, the results should be written to a new file named spaced_<filename> (e.g. spaced_test.py for test.py), which should be identical to the original file except that the tabs are replaced with spaces as specified (hint: you can use the string's replace(old, new) method to help, but make sure you aren't making any assumptions about the number/location of tab characters in a string).
Your function should not return anything, but should print out the number of lines found with tabs and the number of tab characters replaced in the new file. An example is given below, testing the program on the provided math_fns.py program, which has 34 tab characters (if you opened that file in VSCode, you may need to replace it with the copy you downloaded, since your editor may automatically replace tabs in any file it opens).
Hint: You should expect this function to take the most time in the assignment, so if you would find it helpful to jump to Part 3 first, you can. The Case Study El went over is very similar to this problem, and a video of that review session is posted, covering strategies, pitfalls, and common questions for this similar problem. If you find yourself getting stuck on this problem, we encourage you to review the video and the substitution.py code!
>>> from fixer import tabs_to_spaces
>>> tabs_to_spaces('math_fns.py', 4)
Lines with tabs: 29
Tabs replaced: 34
>>> tabs_to_spaces('two_tab_test.txt', 4)
Lines with tabs: 1
Tabs replaced: 2
>>> tabs_to_spaces('four_tab_test.txt', 4)
Lines with tabs: 2
Tabs replaced: 4
From the above output, you can see that the provided math_fns.py file has 34 '\t' characters spanning 29 lines, and the function call should result in a copy called spaced_math_fns.py with every '\t' character replaced with spaces as described above.
You can also test your results with the following bash commands as a sanity check (you are of course not expected to know the magical world of grep, but grep -o prints each occurrence of a pattern in a file on its own line, and piping the result to wc -l counts those lines, giving the number of occurrences :))
$ grep -o $'\t' math_fns.py | wc -l
34
$ grep -o $'\t' spaced_math_fns.py | wc -l
0
$ grep -o ' ' math_fns.py | wc -l
137
$ grep -o ' ' spaced_math_fns.py | wc -l
273
$ grep -o $'\t' two_tab_test.txt | wc -l
2
$ grep -o ' ' two_tab_test.txt | wc -l
0
$ grep -o $'\t' spaced_two_tab_test.txt | wc -l
0
$ grep -o ' ' spaced_two_tab_test.txt | wc -l
8
$ grep -o $'\t' four_tab_test.txt | wc -l
4
$ grep -o ' ' four_tab_test.txt | wc -l
3
$ grep -o $'\t' spaced_four_tab_test.txt | wc -l
0
$ grep -o ' ' spaced_four_tab_test.txt | wc -l
19
You can test changing the space argument as well:
$ python3
>>> from fixer import tabs_to_spaces
>>> tabs_to_spaces('math_fns.py', 2)
Lines with tabs: 29
Tabs replaced: 34
>>> quit()
$ grep -o ' ' spaced_math_fns.py | wc -l
205
Requirements: Your function should not loop through any line more than once, and should not use read() or readlines(). Your function should have exactly one (un-nested) loop to process the input file. You can use the writelines method to write a list of strings to a writable file object. For reference, our solution is 20 lines (not including comments or blank lines).
Hint: First, try to represent a space with '_' (underscore) when using replace, since you won't necessarily be able to see the differences between tabs and spaces in the output file. Remember that the tab-space number parameter determines how many space characters should replace a single '\t' tab character. Make sure to change this back to ' ' (space character) before your submission, and try the above bash commands if you'd like to do a final check.
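The replace and underscore hints can be tried on a single string first. This sketch uses a hypothetical helper replace_tabs and io.StringIO as a stand-in for a writable file, so nothing is written to disk; it is not the full tabs_to_spaces, which still has to read the input file, track both counts across lines, and write spaced_<filename>:

```python
import io


def replace_tabs(line, spaces_per_tab, fill=' '):
    """Returns (new_line, tab_count), where every tab character in `line` is
    replaced by `spaces_per_tab` copies of `fill`. Pass fill='_' while
    debugging so the replacements are visible. (Hypothetical helper.)"""
    tab_count = line.count('\t')                          # tabs anywhere in the line
    new_line = line.replace('\t', fill * spaces_per_tab)  # replaces every occurrence
    return new_line, tab_count


new_line, tabs = replace_tabs('\tif x:\t# two tabs\n', 4, fill='_')
print(repr(new_line), tabs)       # '____if x:____# two tabs\n' 2

# writelines writes each string in the list as-is (it does not add newlines)
out = io.StringIO()
out.writelines([new_line, 'no tabs here\n'])
print(repr(out.getvalue()))       # '____if x:____# two tabs\nno tabs here\n'
```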
Part 3: Program Generator
1. to_snake_case
Write a function named to_snake_case that takes a string as an argument and returns a new string converting the argument to Python snake_case conventions (all lower-case, words separated with _, and numbers allowed; _ must be between characters, not at the ends of strings). In particular, this function should handle strings following camelCasing (each word after the first capitalized) and PascalCasing (all words capitalized) such that any capital letter is replaced with _ followed by the lowercased version of the letter, unless the capital letter is the first or last letter in the string, in which case it should just be lowercased.
>>> from generator import to_snake_case
>>> s = to_snake_case('snek_case')
>>> s
'snek_case'
>>> to_snake_case('camelCase')
'camel_case'
>>> to_snake_case('PascalCase')
'pascal_case'
>>> to_snake_case('removeAll3')
'remove_all3'
>>> to_snake_case('cAtTeRpIlLaRcAsE')
'c_at_te_rp_il_la_rc_ase'
>>> to_snake_case('')
''
The string method isupper may be useful to you here. Type help(''.isupper) at the Python prompt to learn more about it, or see the online Python documentation.
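isupper returns True only for strings with at least one cased character and no lowercase ones, so digits and underscores report False. The small exploration below (capital_positions is a hypothetical helper, not part of generator.py) lists which indices to_snake_case would need to treat specially; comparing each index with 0 and len(s) - 1 is one way to apply the first/last-letter rule:

```python
def capital_positions(s):
    """Returns a list of (index, lowercased letter) pairs for each
    uppercase character in `s`. (Hypothetical helper for exploring isupper.)"""
    positions = []
    for i, ch in enumerate(s):
        if ch.isupper():
            positions.append((i, ch.lower()))
    return positions


print(capital_positions('camelCase'))   # [(5, 'c')]
print(capital_positions('PascalCase'))  # [(0, 'p'), (6, 'c')]
print('3'.isupper(), '_'.isupper())     # False False
```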
2. format_fn_header
Write a function named format_fn_header which returns a Python function header string using a given function name (which should be converted to snake_case) and a list of argument name strings. This is a list of strings, not a dictionary, but when this function is called, that list could be one generated with args.keys(), where args is the dictionary passed to build_fn_str (this is how build_fn_str uses this function). In that case, it's technically a dict_keys object, but it works the same as a list here as long as you don't index it, since dict_keys objects don't support index access (for a good reason).
You can assume the argument name strings are already in snake_case.
Note: While you see a use of keys() here for the purpose of an example, the only use of keys() should be in the build_fn_str function, which uses format_fn_header when passing the argument names from its args dictionary as a list structure. Do not use any dictionaries in the format_fn_header function.
Don't forget about the string's join method! s.join(lst) returns a string joining all elements in lst, separated by s (what is each string in the argument list separated by in the expected result string?). This works for any iterable of strings, including a dict_keys object.
>>> from generator import format_fn_header
>>> fn_name = 'dice'
>>> args = {'n': ('int', 'Number of dice to roll (>= 1)'),
...         'm': ('int', 'Number of sides per dice (>= 1)')}
>>> list(args.keys()) # This is included to show the list, but args.keys() will also work
['n', 'm']
>>> fn_header = format_fn_header(fn_name, args.keys())
>>> fn_header
'def dice(n, m):'
>>> fn_header = format_fn_header(fn_name, ['n', 'm']) # this would also work
>>> fn_header
'def dice(n, m):'
>>> fn_name = 'factorial'
>>> args = {'n': ('int', '')}
>>> fn_header = format_fn_header(fn_name, args.keys())
>>> fn_header
'def factorial(n):'
>>> fn_name = 'sayHello'
>>> args = {}
>>> fn_header = format_fn_header(fn_name, args.keys())
>>> fn_header
'def say_hello():'
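join accepts any iterable of strings, which is why passing args.keys() works even though the result is a dict_keys view rather than a list. A quick check in the interpreter, reusing the dice arguments from the example above (this only demonstrates join and dict_keys; it is not format_fn_header itself):

```python
args = {'n': ('int', 'Number of dice to roll (>= 1)'),
        'm': ('int', 'Number of sides per dice (>= 1)')}
arg_names = args.keys()                 # a dict_keys view, not a list

print(type(arg_names).__name__)         # dict_keys
print(', '.join(arg_names))             # n, m
print(', '.join(['n', 'm']))            # n, m
print(repr(', '.join([])))              # '' (no arguments gives an empty string)

try:
    arg_names[0]                        # dict_keys does not support indexing
except TypeError:
    print('dict_keys objects cannot be indexed')
```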
3. generate_args_data
These last few functions deal with user input and dictionaries, and they complete a provided utility function (generate_fn) that generates Python function stubs (with docstrings!) for us (and yes, you can use that function on your assignments/this midterm!). This problem and generate_return_data are started for you, with a few TODOs to fill in, to practice reading and understanding given code that collects data with tuples and dictionaries. The more tedious input handling is provided for you, but we expect you to understand the logic needed to complete the TODOs.
generate_args_data takes no arguments but prompts a user for as many arguments as they'd like to specify. Your task in this exercise is to finish the 3 TODOs so that the function returns a dict with each argument name as a key mapping to a 2-string tuple containing (arg type, description).
Each argument will be assigned a desired name, type, and description (optional) using the prompts as follows:
Add an argument name (<return> for none): argname
What is the expected type of `argname`? argtype
Description of `argname`: description
The given argument name should be converted to snake_case as soon as it is entered (and this snake_cased version should be the key used for the argument in the result dictionary). Remember that if a string is already in snake_case, your to_snake_case should return it unchanged.
If no type is given in the second line, 'unspecified' should be assigned as the argument's type string.
If no description is given, '' should be assigned as that argument's description.
A blank line should be printed after each "Description of ..." prompt.
A user can specify as many arguments as they want, stopping when they press the <return> ("Enter") key on the prompt for an argument name.
If no arguments are added in the prompt, the function should return {}.
>>> from generator import generate_args_data
>>> args = generate_args_data()
Add an argument name (<return> for none): dnaString
What is the expected type of `dna_string`? str
Description of `dna_string`: DNA sequence

Add an argument name (<return> for none): base
What is the expected type of `base`? str
Description of `base`: Single-character nucleotide base

Add an argument name (<return> for none):
>>> args
{'dna_string': ('str', 'DNA sequence'), 'base': ('str', 'Single-character nucleotide base')}
>>> args = generate_args_data()
Add an argument name (<return> for none): n
What is the expected type of `n`?
Description of `n`:

Add an argument name (<return> for none):
>>> args
{'n': ('unspecified', '')}
>>> args = generate_args_data()
Add an argument name (<return> for none):
>>> args
{}
Do not change any existing code, and remove the three # TODO comments in your final solution. Each TODO task is exactly one line (written on the same line where the TODO is located in the function).
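The data shape generate_args_data returns (each argument name mapped to a (type, description) tuple, with the 'unspecified' and '' defaults) can be sketched separately from the prompting. The hypothetical add_arg below takes already-entered responses instead of calling input and skips the snake_case conversion; it only illustrates the dictionary/tuple structure, not the starter code's TODO lines:

```python
def add_arg(args, name, arg_type, description):
    """Adds `name` to the `args` dict, mapped to an (arg type, description)
    tuple, substituting 'unspecified' for an empty type. (Hypothetical helper.)"""
    if arg_type == '':
        arg_type = 'unspecified'
    args[name] = (arg_type, description)   # an empty description stays ''


args = {}
add_arg(args, 'dna_string', 'str', 'DNA sequence')
add_arg(args, 'n', '', '')
print(args)
# {'dna_string': ('str', 'DNA sequence'), 'n': ('unspecified', '')}
```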
4. generate_return_data
Next, you'll finish a function generate_return_data to get the type and description of a function's return value.
You only need to replace the single TODO to return a tuple as described, but we have provided a summary of this function here for reference.
This function takes no arguments, but prompts a user for a type and a description for a return value. If no type is given, the type is assigned the string 'unspecified', similar to generate_args_data (the description is also optional, and is '' if not provided).
A blank line should be printed after each "Description of ..." prompt.
The function should return a tuple containing the inputted type string and description (in this order).
>>> from generator import generate_return_data
>>> ret_data = generate_return_data()
What is the expected type of the return? float
Description of return: Percentage of `base` in `dna_string` (0.0 to 100.0).

>>> ret_data
('float', 'Percentage of `base` in `dna_string` (0.0 to 100.0).')
>>> ret_data = generate_return_data()
What is the expected type of the return? float
Description of return:

>>> ret_data
('float', '')
>>> ret_data = generate_return_data()
What is the expected type of the return?
Description of return:

>>> ret_data
('unspecified', '')
You may notice some redundancy between prompting for arguments and returns in Exercises 3.3 and 3.4. You may add a helper function to reduce redundancy (just make sure it is correct when called from both functions).
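One shape such a helper could take is sketched below. The name prompt_type_and_desc and its input_fn parameter are hypothetical (not part of the starter code); passing the input function in lets the sketch be exercised without typing:

```python
def prompt_type_and_desc(type_prompt, desc_prompt, input_fn=input):
    """Prompts for a type and a description, returning a (type, description)
    tuple with 'unspecified' as the default type and '' as the default
    description. `input_fn` defaults to the built-in input but can be swapped
    out for testing. (Hypothetical helper, not part of the starter code.)"""
    value_type = input_fn(type_prompt)
    if value_type == '':
        value_type = 'unspecified'
    description = input_fn(desc_prompt)
    print()                                  # blank line after the "Description of ..." prompt
    return (value_type, description)


# Simulate a user who presses <return> at both prompts
responses = iter(['', ''])
ret_data = prompt_type_and_desc('What is the expected type of the return? ',
                                'Description of return: ',
                                input_fn=lambda prompt: next(responses))
print(ret_data)   # ('unspecified', '')
```

Whether or not you use a helper like this, the prompts and blank lines must still match the examples exactly.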
Given: build_fn_str and generate_fn
We're almost there! The last function you will implement in this project is called save_program, which takes a program template string and saves it to a new file. The three given functions in generator.py are implemented for you, as they factor out a lot of the tedious UI management to prompt the user for program information using the functions you've implemented so far. We've provided a brief summary of the function-generation functions here, which you can test in your Python interpreter to make sure everything is ready to go before you move on to the final save_program function.
These functions should not be changed, and are documented for you. They require that you have correctly implemented Exercises 3.1-3.4 at this point.
generate_fn takes a function name string as an argument (which is converted to snake_case using your to_snake_case function), and then prompts the user for the arguments and any return value for that function (using your generate_args_data and generate_return_data, respectively). After collecting this data from the user, it generates the function stub using the build_fn_str function we've provided. Note that we could have merged both of these functions into one, but factoring out the function-stub-string generation demonstrates good practice in program decomposition.
If all of your functions above are correctly implemented, the following example run-throughs should produce the output shown below:
>>> from generator import build_fn_str, generate_fn
>>> fn_name = 'dice'
>>> args = {'n': ('int', 'Number of dice to roll (>= 1)'),
...         'm': ('int', 'Number of sides per dice (>= 1)')}
>>> ret_data = ('int', 'Sum of `n` randomly-rolled `m`-sided dice.')
>>> fn_desc = 'Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.'
>>> fn_str = build_fn_str(fn_name, args, ret_data, fn_desc)
>>> print(fn_str)
def dice(n, m):
    """
    Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.
    Arguments:
        `n` (int) - Number of dice to roll (>= 1)
        `m` (int) - Number of sides per dice (>= 1)
    Returns:
        (int) - Sum of `n` randomly-rolled `m`-sided dice.
    """
    pass
>>> fn_name = 'FooBAR'
>>> args = {'c_at_er_pi_ll_ar': ('unspecified', '')}
>>> ret_data = ('dict', '')
>>> fn_desc = 'Who knows...'
>>> fn_str = build_fn_str(fn_name, args, ret_data, fn_desc)
>>> print(fn_str)
def foo_b_ar(c_at_er_pi_ll_ar):
    """
    Who knows...
    Arguments:
        `c_at_er_pi_ll_ar` (unspecified)
    Returns:
        (dict)
    """
    pass
>>> body = generate_fn('dice') # This function uses build_fn_str, but manages user input to collect data
Add an argument name (<return> for none): n
What is the expected type of `n`? int
Description of `n`: Number of dice to roll (>= 1)

Add an argument name (<return> for none): m
What is the expected type of `m`? int
Description of `m`: Number of sides per dice (>= 1)

Add an argument name (<return> for none):
Does your function have a return? (y for yes) Y
What is the expected type of the return? int
Description of return: Sum of `n` randomly rolled `m`-sided dice.

Finally, provide a description of your function: Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.
Finished generating a function stub for dice!
>>> print(body)
def dice(n, m):
    """
    Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.
    Arguments:
        `n` (int) - Number of dice to roll (>= 1)
        `m` (int) - Number of sides per dice (>= 1)
    Returns:
        (int) - Sum of `n` randomly rolled `m`-sided dice.
    """
    pass
5. save_program
Finally, you will implement the last function, which saves your generated programs! Note that the final block of code in the provided generate_program function calls save_program with the generated program body string (everything is ready to go!).
Finish the save_program function in generator.py, which takes a single string argument representing the body of the program, with each line separated by '\n' (assuming all functions are working as expected at this point). This function should prompt the user for a file name to save to, reprompting until they give a non-empty file name.
Note: While we will not require you to handle this, it is not uncommon for new programmers to accidentally overwrite a file. To help prevent this without adding more tasks to this assignment, we have provided a check_rewrite(filename) function which you may use to confirm with a user that they want to overwrite an existing file. To use it, update your break condition (which should at minimum check for a non-empty filename) to break only when passing the non-empty filename string to check_rewrite returns True (when the file doesn't yet exist or when the user has confirmed they want to overwrite it).
Once a valid filename is found, save the body to a new file (hint: use f.writelines) and print a success message in the same format as shown in the examples below. Note: We have added an example of the overwrite confirmation in case you choose to implement this feature.
For reference, our solution is 9 lines, most of which handle the re-prompting for empty filenames.
>>> from generator import save_program
>>> # This is a test "body" string
>>> body = '"""TODO: File header"""\n\ndef foo():\n    pass\n\n'
>>> print(body)
"""TODO: File header"""

def foo():
    pass

>>> save_program(body)
What is the name of your file? foo_test.py
Successfully wrote to foo_test.py!
>>> save_program(body)
What is the name of your file?
Please provide a non-empty file name, e.g. test.py
What is the name of your file? test.py
Successfully wrote to test.py!
>>> # Optional handling of existing files
>>> save_program(body)
What is the name of your file? foo_test.py
A file is already found with that name. Are you sure you want to overwrite? n
What is the name of your file? foo_test.py
A file is already found with that name. Are you sure you want to overwrite? Y
Successfully wrote to foo_test.py!
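The re-prompting part of save_program follows a common while True pattern: loop until the input is acceptable, then leave the loop. The hypothetical prompt_nonempty_filename below shows only that loop (with the input function passed in so it can be exercised without typing); the optional check_rewrite condition and the actual file writing are left out:

```python
def prompt_nonempty_filename(input_fn=input):
    """Prompts until a non-empty file name is entered and returns it.
    `input_fn` defaults to the built-in input but can be replaced for
    testing. (Hypothetical helper; the overwrite check is left out.)"""
    while True:
        filename = input_fn('What is the name of your file? ')
        if filename != '':
            return filename
        print('Please provide a non-empty file name, e.g. test.py')


# Simulate a user who first presses <return>, then types test.py
responses = iter(['', 'test.py'])
print(prompt_nonempty_filename(lambda prompt: next(responses)))
```

To add the optional overwrite check, the condition could become filename != '' and check_rewrite(filename); because and short-circuits, check_rewrite is never called with an empty name.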
Running it all with generate_program (Given)
Finally, you'll be able to reap the rewards of your hard work! The provided generate_program function at the end of the generator.py program you've been working on runs the full UI for generating a program from user prompts, optionally saving the results to a new file using your save_program function.
If all of your functions above are correctly implemented, the following example should create a new file as shown:
>>> from generator import *
>>> generate_program()
First, you'll input some function stub data, then have the option to save the results to a file!
Please input a new function name (<return> to quit): dice
Add an argument name (<return> for none): n
What is the expected type of `n`? int
Description of `n`: Number of dice to roll (>= 1)

Add an argument name (<return> for none): m
What is the expected type of `m`? int
Description of `m`: Number of sides per dice (>= 1)

Add an argument name (<return> for none):
Does your function have a return (y for yes)? Y
What is the expected type of the return? int
Description of return: Sum of `n` randomly rolled `m`-sided dice.

Finally, provide a description of your function: Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.
Finished generating a function stub for dice!
Please input a new function name (<return> to quit):
Finished generating your program template!
Do you want to save your program to a file (y for yes): y
What is the name of your file? dice_example.py
Successfully wrote to dice_example.py!
If you look at the contents of the new dice_example.py file created by your save_program function, you should see the following:
"""
TODO: Author, description
"""

def dice(n, m):
    """
    Simulates `n` randomly-rolled `m`-sided dice, returning the sum of all rolls.
    Arguments:
        `n` (int) - Number of dice to roll (>= 1)
        `m` (int) - Number of sides per dice (>= 1)
    Returns:
        (int) - Sum of `n` randomly rolled `m`-sided dice.
    """
    pass