CMPT 120, Spring 2020, Project Page 1 of 7
Instructor: Diana Cukierman

ASSIGNMENT – PROJECT: Data processing and recommendation of movies.
This is team work for up to 3 people (you may also work in a team of 2 or individually if you
prefer). While you may discuss generalities about this exercise with colleagues from other teams,
you cannot develop the same code, share code among teams, or obtain code from other sources.
Being a team exercise, it places a big responsibility on each individual. You want to respect and be
honest with your partners and with yourself: DO YOUR SHARE, and BE KNOWLEDGEABLE OF THE
WHOLE ASSIGNMENT. Working on this assignment is also a way for you to prepare for the final exam.

Deadline: Sunday April 19, 11:59 pm.
A solution will be posted after the deadline.
The following materials are posted with this assignment/project:

1. A problem solving suggestion guide (separate from the present document)
2. Two text (.txt) data files (to be read by your program, use them to debug your program)
3. Python code that you can use directly or adapt (just acknowledge in comments)
4. Sample runs to illustrate how the program needs to work
5. A recording with a demo of how the system is meant to work
1. FIRST: Read the Problem Solving Suggestions document. IT IS BRIEF AND HIGHLY
You are asked to implement a Python program which provides movie recommendations to the user.
The user will be able to obtain several movie recommendations. The recommendations will be based on
data files and the user preferences.
The system will recommend a movie based on exact feature requests from the user. The program will
also generate a csv (comma-separated values) file with all the recommendations. For bonus points, the
system will calculate the number of features in common between one pair of movies (and some other
Notes about the data files:
1) Two text data files are provided. They have to be placed in the same folder as your program.
2) To expand testing, you may also create your own data files with the same name and format.
3) Basic code is provided which reads and writes text data files to and from lists of strings respectively.
You may use (copy/paste) this code or modify it. You should leave the original comments in the code to
acknowledge the source, and you should include additional comments if you revise the code.
4) The exercise will be marked using data files with the same names but possibly different content
SEE MORE DETAILS BELOW. The sample runs are also considered part of the problem description.
Make sure that your program produces the same results as in the sample runs when the program
reads data from the data files provided. The dialog, messages and results have to be similar and in the
same order as in the sample runs.
Planning stage:
• What is the main level structure of the whole program?
• Which functions would be useful to develop (and call from the main level)? What parameters
would be useful for those functions to have? What should those functions return?
• Which major data structures will you use? List of strings? List of numeric variables? List of lists?
Several lists?
• Create a flowchart or pseudocode (at a high level) to give you a general sense.
• Come up with descriptive names for variables and functions. Perhaps distinguish in the naming
what is a string, what is a list, what is a function. Be clear which functions return values.
Once the program works, add improvements to it gradually (save copies of your
previous working versions!). The stages are recommendations; do what works best for you.
Stage I: Implement the system to provide only one movie recommendation per request. Do not validate
user data yet. Do not yet write any results to an output file. Use the already provided input data file to
make sure that the program works correctly and produces similar output as the sample run 1.
Stage II: Add basic user input validation (the codes need to be integers, for example 0 for a Canadian movie; for
now, trust that the user provides valid codes). Offer the user several recommendations
with the same requested features. See sample run 2.
Stage III: Validate codes are numeric AND within the valid values. Write the recommended movies to a
file. Check the samplerun3 and also the recorded demo (the recording does not illustrate the validation)
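The Stage III check (numeric AND within the valid values) could be sketched as follows. This is only one possible approach, not the posted solution; the function name is_valid_code and its parameters are hypothetical.

```python
def is_valid_code(text, max_code):
    """Return True if text is a whole number between 0 and max_code (inclusive).

    Hypothetical helper: the name and parameters are assumptions for
    illustration, not part of the provided code.
    """
    text = text.strip()
    # isdecimal() is False for "", "abc", "-1" and "2.5",
    # so int(text) is only evaluated when it is safe
    return text.isdecimal() and int(text) <= max_code


# Origin codes are 0 (Canadian) or 1 (Foreign), so max_code is 1
print(is_valid_code("1", 1))    # a valid origin code
print(is_valid_code("7", 1))    # numeric, but out of range
print(is_valid_code("abc", 1))  # not numeric
```

Checking that the text is numeric first avoids the error that int() would raise on non-numeric input.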
Bonus: Give the user the option of calculating the number of common features that two movies
have. The maximum number of common features is 7: 1 each for type, rating and origin, plus up to 4 (1
for each genre). Check the samplerun4.
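A minimal sketch of the bonus calculation, assuming each movie is still in its one-line string form and that every genre code is a single character (as in the example data). The function name count_common_features is hypothetical, and the sample runs remain the reference for the exact expected behaviour.

```python
def count_common_features(line_a, line_b):
    """Count the features two movie lines share (at most 7).

    Assumes the format: name type genres rating origin, with
    single-character genre codes and 'X' for an unknown genre.
    """
    name_a, type_a, genres_a, rating_a, origin_a = line_a.split()
    name_b, type_b, genres_b, rating_b, origin_b = line_b.split()

    common = 0
    # 1 point each for type, rating and origin
    if type_a == type_b:
        common += 1
    if rating_a == rating_b:
        common += 1
    if origin_a == origin_b:
        common += 1
    # 1 point for each known genre present in both movies
    for code in genres_a:
        if code != "X" and code in genres_b:
            common += 1
    return common


print(count_common_features("Harry_Potter 1 3456 0 1", "The_Matrix 1 145X 4 1"))
```

Here the two movies share type, origin and genres 4 and 5, so the call returns 4.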
Data files are provided for (1) genres and for (2) movies with features, with the format as
described next.
1) Input to the program: IN_genres.txt
The genres file has one genre per ‘line’. One ‘line’ is a string, where the last character is ‘\n’.
Note that viewing the file in a text editor will not show the ‘\n’ but rather a new line.
The provided IN_genres.txt file is (as viewed in a text editor)
As a string this is: “action\ncomedy\ndrama\n … romance\n” (… stands for more genres)
The genres are to be understood as coded 0, 1, 2, 3… etc based on the order in this file.
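To illustrate the coding by position, here is a small sketch that uses a shortened genre string (only the first three genres; the real file has more). Once the lines are in a list, the list index is the genre code.

```python
# Shortened example string, not the full content of IN_genres.txt
genres_as_string = "action\ncomedy\ndrama\n"

# splitlines() drops the '\n' characters and keeps the file order
genre_names = genres_as_string.splitlines()

# The index in the list is the genre code
print(genre_names[1])              # the genre with code 1
print(genre_names.index("drama"))  # the code of the genre 'drama'
```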

2) Input to the program: IN_all_data.txt
• There is one ‘line’ per movie in the file. One ‘line’ is a string, where the last character is ‘\n’.
(Again most text editors will not show the ‘\n’ but rather include a new line)
• Each line in this file contains the movie name, then at least one space, and then the movie
features, coded with digits in a pre-established order. Features are separated by spaces, but the
genres are all together. At the end of one movie ‘line’ there is the ‘\n’ character.
For example the line for the movie Harry_Potter is
Harry_Potter 1 3456 0 1
It has 4 features: 1, 3456, 0 and 1. 3456 are the codes of 4 genres
• One all-data file is provided, with 8 movies, each with 4 features; the genres feature holds
from 1 to 4 genres. You can invent other files, with different movies, a different number of
movies and different values for the features.
• The genres coding in the IN_all_data.txt file should be consistent with the IN_genres.txt file.
• Further assumptions about the file format:
1. Movies’ names do not include spaces (multi word names are connected with “_”)
2. The four features of a movie are separated by at least one space, and ordered exactly:
type, genres (1 up to 4 different genres), rating, origin
3. Features are coded with numbers (or an ‘X’ if there is no known value for genres)
• type: 0 – ‘TV series’, 1- ‘movie’
• genres are coded with numbers 0,1,2 … based on the IN_genres.txt file data. An X
represents an unknown genre. Any movie will have at least 1 genre (not X)
• rating: 0- ‘G’ , 1- ‘PG’, 2- ‘PG 13’, 3- ‘NC17’, 4- ‘R’
• origin: 0- ‘Canadian’, 1- ‘Foreign’
Example with 3 movies and their type, genres (4 different genres), rating, origin
Viewed in an editor:
Harry_Potter 1 3456 0 1
The_Matrix 1 145X 4 1
Black_mirror 0 45XX 3 1
Viewed as a string:
“Harry_Potter 1 3456 0 1\nThe_Matrix 1 145X 4 1\nBlack_mirror 0 45XX 3 1\n”
So, Harry Potter is a ‘movie’, with genres: ‘fantasy’, ‘fiction’, ‘mystery’ , ‘romance’, rating ‘G’, ‘foreign’
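The decoding just described could be sketched like this. It is an illustration only: the function name decode_movie_line is hypothetical, and the seven-genre list is an assumption inferred from the Harry_Potter example (your program should take the genres from IN_genres.txt instead).

```python
TYPE_NAMES = ["TV series", "movie"]
RATING_NAMES = ["G", "PG", "PG 13", "NC17", "R"]
ORIGIN_NAMES = ["Canadian", "Foreign"]
# Assumed genre order, inferred from the example above
GENRE_NAMES = ["action", "comedy", "drama", "fantasy",
               "fiction", "mystery", "romance"]


def decode_movie_line(line, genre_names):
    """Return [name, type, list of genres, rating, origin] for one movie line."""
    # split() with no argument handles one or more spaces between features
    name, type_code, genre_codes, rating_code, origin_code = line.split()

    genres = []
    for code in genre_codes:
        if code != "X":  # 'X' marks an unknown genre and is skipped
            genres.append(genre_names[int(code)])

    return [name,
            TYPE_NAMES[int(type_code)],
            genres,
            RATING_NAMES[int(rating_code)],
            ORIGIN_NAMES[int(origin_code)]]


print(decode_movie_line("The_Matrix 1 145X 4 1", GENRE_NAMES))
```

Keeping the genres as a list inside the result is one design choice among several; separate lists per feature would work as well.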
3) Output from the program: OUT_recommended.csv
The output file will also have one ‘line’ per recommended movie, each line including the name and
ending in ‘\n’.
Example of output file without the request number, viewed in an editor
Viewed as a string:
You are encouraged to use the interactive textbook as a reference for more details about reading
and writing data files; sections in Chapter 11 are recommended as a no-points reading assignment.
Still, Python code is provided to you for both reading and writing data from/to files.
Python code (the function read_string_list_from_file(…) ) is provided to read data from a text file
and save the lines read in a list of strings. When this function is called (or invoked), the function will
return a list of strings, where each string contains the data associated with one line, in the same order that
the data is in the data file, and without the ‘\n’. You may develop a different function to read the data.
You may incorporate this function verbatim in your code, with comments. If you revise the function,
clarify how you revise it.
Following the previous example, with three movies only, when calling this function it will return:
[“Harry_Potter 1 3456 0 1”, “The_Matrix 1 145X 4 1”, “Black_mirror 0 45XX 3 1”]
As with any function, you can call the function providing the argument directly as in:
read_string_list_from_file(“IN_all_data.txt”) ,
OR saving the value in a variable
file_name = “IN_all_data.txt”
list_with_all_data_strings = read_string_list_from_file(file_name)
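If you decide to develop your own reading function, a minimal sketch could look like the one below. It is not the provided read_string_list_from_file code; the name read_lines_without_newline is hypothetical, and the sketch assumes the file exists in the same folder as the program.

```python
def read_lines_without_newline(file_name):
    """Return the lines of a text file as a list of strings, without the newline.

    Hypothetical alternative to the provided function, for illustration only.
    """
    lines = []
    # 'with' closes the file automatically when the block ends
    with open(file_name) as data_file:
        for line in data_file:
            lines.append(line.rstrip("\n"))
    return lines


# Possible usage, assuming IN_all_data.txt is in the program's folder:
# list_with_all_data_strings = read_lines_without_newline("IN_all_data.txt")
```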
Python code (the function write_perstudent_to_file (…) ) is provided to write data to a file. Check the
requirements and assumptions in the function code and comments.
The information that your program asks from and shows to the user should be analogous to and in
the same order and detail as in the sample runs. Ask if in doubt.
6) REQUIREMENTS IN DETAIL (anything ‘required’ gets points)
Execution requirements
a) The program should execute
b) The program should have a dialog and options analogous to the one presented in the sample runs
c) The results obtained by your program with the data files provided should be the same as the results
shown in the sample runs.
d) Your program should work well with data files with other values as long as the formatting
conventions in the data files are respected
Coding details and style requirements
Note: The provided functions do not count. For functions to count they need to be called (i.e. invoked)
e) Your program should have at least 3 “fruitful” or “productive” functions.
f) Your program should have at least 2 “void functions”
h) Your program should have at least 4 functions receiving parameters (and so that the parameters
are correctly used inside the function) (these functions may be productive or void)
i) Your program should have a reasonable main level which shows the general structure of the
program. The main level can be the program top level or it can be inside a “main” function, (and in
the latter case at the top level you just call the main function).
j) You may use some variables as global (i.e. defined at the top level and not passed as parameters).
The reason to have these variables as global would be that they are relevant at the main level and
may be frequently used by many functions. Yet, given the requirements above you will also have to
have local variables in your functions (including parameters) and return values from functions also.
k) Name your variables and functions appropriately
l) At the top of the program file include as a comment the authors’ names and the dates of the versions
m) Include comments, with general descriptions of functions, special situations being true at a certain
place in the program, etc. On the other hand, do not include redundant comments. For example
the statement “i = i + 1” does not need the comment “i is increased by 1”. Keep in mind that good
naming of variables and functions reduces the need for comments.
n) Include “Trace printing” as you debug your code. When you submit your solution you may comment
out some tracing prints. However, you need to leave in your program tracing print analogous to
o) Bonus points will be given if you do not break loops (with break or return statements), and rather
use while statements with one or more conditions.
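As an illustration of this bonus item, a search loop can stop early without break by adding a second condition to the while statement. This is a sketch with hypothetical names (find_position, found), not required code.

```python
def find_position(target, items):
    """Return the index of target in items, or -1 if it is not there."""
    position = 0
    found = False
    # The loop ends when the list is exhausted OR the target is found
    while position < len(items) and not found:
        if items[position] == target:
            found = True
        else:
            position += 1

    if not found:
        position = -1
    return position  # single return, after the loop has ended normally


names = ["Harry_Potter", "The_Matrix", "Black_mirror"]
print(find_position("The_Matrix", names))
```

The flag variable found takes the place of the break statement, and the function has a single return at the end.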
Requirement – admin file (group.txt or individual.txt)
p) You need to submit an admin file (as in previous assignments). Name your admin file “group.txt”
when you are a group of 2 or 3 members. In this case you need to include the group members’
names inside this file and clarify how you distributed tasks among the team members.
q) If your group has members from different sections, you name your file groupD1+D2.txt
r) If you are working individually, name the file “individual.txt”. In this latter case the file may be
empty or it may include any comments about how you worked with this exercise.
s) It would be useful for you if you keep track of the time you spend on this exercise, including the
total hours dedicated, considering the total time by all the team members. If you submit this
information, include it in the admin file.
Requirement – Flowchart for top level
t) Submit a flowchart describing only the main/top level, possibly referring to some of the global
variables. You do NOT need to do a flowchart for all the details!! It should fit in one page. You may
use flowgorithm or similar or draw the flowchart by hand and take a picture/capture the screen and
submit a jpg or png file.
a. The code (Python file) of your final submission (allowing to see the Trace printing as indicated in the
sample runs)
b. Flowchart of your main level program
c. A sample run of your most polished version (copy/pasted from the execution window in a txt file)
d. The admin text file (group.txt or individual.txt)
If you submit as a group, as before, all members should join a preexisting CANVAS group associated
with this assignment. One of you submits for all members. If members are from different sections (D1,
D2), you need to submit the admin file named groupD1+D2.txt. Points will be subtracted if you do
not follow these indications.
Check email and Canvas announcements in case of additional clarifications.
If you have questions, join remote office hours or use the course Canvas Discussion forum for questions of
interest to the whole class. For individual consultations, join remote office hours or email the course help
list (reaching the TAs and instructor).
End of description of the final assignment/ project.