Python辅导 | FIT3155 S2/2019: Assignment 3

FIT3155 S2/2019: Assignment 3  
Due midnight 11:59pm on Sunday 27 October 2019)  
(
[
Weight: 20 = 15 + 5 marks.]  
Your assignment will be marked on the performance/efficiency of your program. You must  
write all the code yourself, and should not use any external library routines, except those  
that are considered standard. The usual input/output and other unavoidable routines are  
exempted.  
Follow these procedures while submitting this assignment:  
The assignment should be submitted online via moodle strictly as follows:  
All your scripts MUST contain your name and student ID.  
Use gzip or Winzip to bundle your work into an archive which uses your student ID  
as the file name. (STRICTLY AVOID UPLOADING .rar ARCHIVES!)  
Your archive should extract to a directory which is your student ID.  
This directory should contain a subdirectory for each of the two questions, named  
as q1/ and q2/.  
Your corresponding scripts and work should be tucked within those subdirectories.  
Submit your zipped file electronically via Moodle.  
Academic integrity, plagiarism and collusion  
Monash University is committed to upholding high standards of honesty and academic  
integrity. As a Monash student your responsibilities include developing the knowl-  
edge and skills to avoid plagiarism and collusion. Read carefully the material available  
at https://www.monash.edu/students/academic/policies/academic-integrity to understand  
your responsibilities. As per FIT policy, all submissions will be scanned via MOSS.  
Assignment Questions  
1
. Write an encoder and decoder that implements Lempel-Ziv-Storer-Szymanski (LZSS)  
variation of LZ77 algorithm with the following specifications.  
Strictly follow the specification below to address this question:  
1

ENCODER SPEC:  
[10 marks]  
Program name: lzss encoder.py  
Arguments to your program: (a) An input ASCII text file.  
(
b) Search window size (integer) W  
(
c) Lookahead buffer size (integer) L  
Command line usage of your script:  
lzss encoder.py <input text file> <W> <L>  
Output file name: output lzss encoder.bin  
Output format: The output is a binary stream of bits that losslessly en-  
codes the input text file over two parts: (i) the header part, and (ii) the  
data part. The information encoded in each of these two parts is given  
below:  
Information encoded in the header part:  
Encode using variable-length Elias integer code (see slides 24-30 in lecture  
), the number of unique ASCII characters in the input text.  
For each unique character in the text:  
9
Encode using fixed-length 8-bit ASCII code the unique character.  
Encode using variable-length Elias code the length of the Huffman  
code assigned to that unique character.  
Concatenate the variable-length Huffman codeword assigned to that  
unique character.  
Information encoded in the data part:  
Encode using variable-length Elias code the total number of Format-0/1  
fields (see slide 38 in lecture 9 slides) required to encode the input text.  
Successively encode information in each Format-0/1 field as:  
For Format-0: h0-bit, offset, lengthi, where offset and length are  
each encoded using the variable-length Elias code.  
For Format-1: h1-bit, characteri, where character is encoded using  
its assigned variable-length Huffman code defined in the header.  
Encoding example: This example is a truncation of the example on slide 39 of  
your week 9 lecture. Assume W = 6, L = 4.  
Assume that the input file contained the following text:  
aacaacabcaba  
Note, there are 3 unique characters in the text, {a, b, c}. A feasible set of Huff-  
man codewords for {a, b, c} are {1, 00, 01} respectively. Using LZSS approach  
we get the following Format-0/1 fields:  
h1, ai, h1, ai, h1, ci, h0, 3, 4i, h1, bi, h0, 3, 3i, and h1, ai.  
The header part will contain:  
2

the number of unique characters, 3 in this example, encoded using Elias  
code as 011  
ASCII code of each unique character followed by the Elias code of the length  
of its assigned Huffman codeword, followed by the statement of the Huffman  
codeword:  
Statement of a with Huffman codeword1 of length 1: 01100001, followed  
by 1, followed by 1  
Statement of b with Huffman codeword ‘00’ of length 2: 01100010, fol-  
lowed by 010, followed by 00  
Statement of c with Huffman codeword of ‘01’ of length 2 : 01100011,  
followed by 010, followed by 01  
Thus, concatenating all of the above codes, the header part is encoded as:  
0
11011000011101100010010000110001101001  
The data part will contain:  
The encoding of the total number of Format-0/1 fields. In this example,  
The encoded information of successive Format-0/1 fields:  
h1, ai encoded as 11,  
h1, ai encoded as 11,  
h1, ci encoded as 101,  
h0, 3, 4i encoded as 0011000100,  
h1, bi encoded as 100,  
h0, 3, 3i encoded as 0011011, and finally  
h1, ai encoded as 11.  
Thus concatenating all codes in the data part, we get the encoding:  
0
0011111111010011000100100001101111  
Finally, concatenating the header and data parts gives the lossless encoding of  
the input text to be written out in binary:  
0
1101100001110110001001000011000110100100011111111010011000100100001101111  
DECODER SPEC:  
[5 marks]  
Program name: lzss decoder.py  
Arguments to your program: (a) Output file from your encoder program imple-  
mented in the question above.  
Command line usage of your script:  
lzss decoder.py <output lzss encoder.bin>  
Output file name: output lzss decoder.txt  
Output format: The output is the decoded ASCII text.  
3

Example: If the input binary encoded file contained this bit stream:  
0
1101100001110110001001000011000110100100011111111010011000100100001101111  
the output file will decode the above as:  
aacaacabcaba  
2
. Let P100, P99, · · · , Pk, · · · , P1 be a descending sequence of 100 largest prime numbers  
less than some given N. (Assume N 547.) Corresponding to these prime numbers let  
C , C , · · · , C , · · · , C denote 100 composite numbers, where any C is of the form  
1
00  
99  
k
1
k
C = P + 1. Your task is to factorize each of these 100 composite numbers to their  
k
k
respective prime factors.  
Strictly follow the specification below to address this question:  
Program name: factors.py  
Argument to your program: N (assume N 547.)  
Command line usage of your script:  
factors.py <N>  
Output file name: output factors.txt  
Output format of each line of the output:  
<
<
.
<
C_1>  
C_2>  
… (and so on)  
<prime factors of C_1>  
<prime factors of C_2>  
C_100> <prime factors of C_100>  
Example output for N = 550:  
4
2^2  
6
2^1 x 3^1  
2^3  
2^2 x 3^1  
8
1
.
2
..  
5
5
5
5
22  
24  
42  
48  
2^1 x 3^2 x 29^1  
2^2 x 131^1  
2^1 x 271^1  
2^2 x 137^1  
———-=o0o=———–  
THE END  
&
BEST OF LUCK FOR YOUR EXAM  
———–=o0o=———–  
4