COMPSCI 340 Assignment 1
We no longer live in a world where computing speeds increase along with Moore’s Law.
Instead we maintain increased throughput in our computers by running more cores and
increasing the amounts of parallelism. Parallelism on a multi-core (multi-processor) machine
is looked after by the operating system.
In this assignment you have to parallelise a simple program in a number of different ways and
answer questions summarising your findings.
This assignment is to be done on a Unix based operating system. The distributed code runs on
Linux, but you can modify it to run on any Unix based operating system such as MacOS. It
should run on the Windows subsystem for Linux. To run your code the markers will use the
Ubuntu image provided on flexit.auckland.ac.nz. So please ensure that your submitted
programs run in that environment. To get the results for this assignment you will need a
machine (real or VM) with 4 or more cores.
When you come to time your programs run them all on the same machine.
The incredibly slow insertion sort
The algorithm you have to parallelise is an insertion sort. As you will recall from COMPSCI
130, insertion sort is an O(n²) sort, but it is one of the better such sorts.
The original version for the assignment is a1.0.c.
Always compile using:
gcc -O2 filename.c -o filename -lm
(you will need to add -lpthread for some programs; the flags are slightly different on MacOS).
All the programs (except the first version of a1.1.c) should be written to correctly sort the
data. If the markers find that a program does not sort the data, you will not get the marks for
that program.
Things to do
Read through and understand the code in the file a1.0.c.
Compile and run the program. An important part of the program is the is_sorted function
near the end. You will always need to call this to ensure that any changes you make to the
implementation do in fact return correctly sorted data.
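The exact signature in a1.0.c may differ; as a minimal sketch, assuming the data is an int array with a known length, such a check could look like:

```c
#include <stddef.h>

/* Returns 1 if the array is in non-decreasing order, 0 otherwise.
   (Illustrative sketch; the real is_sorted in a1.0.c may be organised
   differently.) */
static int is_sorted(const int *data, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (data[i - 1] > data[i]) {
            return 0;   /* found an adjacent pair out of order */
        }
    }
    return 1;           /* every adjacent pair is ordered */
}
```

A single pass over adjacent pairs is enough: an array is sorted exactly when no element is greater than its successor.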
If you run the program without any command line parameters e.g.
./a1.0
it will use a default array of only 16 values.
Run it with a larger amount of random data by including the power-of-two size parameter on
the command line e.g.
./a1.0 10
which runs the program with an array of 1024 random values. This is the largest size which
will print all of the values.
Find a parameter size which makes the program take more than 30 seconds to complete. On
one of my VMs this was a value of 20.
Question 1 [1 mark]:
Describe the environment you used, operating system, number of cores, amount of real
memory. What parameter gave a time of > 30 seconds? How long did it take to complete in
clock ticks and elapsed time?
You don’t need to submit a1.0.c.
The a1.0.c program has several stages. The first stage splits the array of data into 4 roughly
equal-size bins. (This of course is totally pointless before we start to parallelise the program.)
This step is based on trying to split the data faster. The split_data function runs through
the whole array and deposits values into one of 4 bins, with values from 0 to 249, 250 to 499,
500 to 749, and 750 to 999.
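Since the four bins have equal width (250 values each), the bin index falls out of an integer division. A serial sketch of such a split, with illustrative names rather than the exact ones in a1.0.c, might look like:

```c
#include <stddef.h>

#define NUM_BINS  4
#define BIN_WIDTH 250   /* data values run from 0 to 999 */

/* Deposit each value into one of 4 equal-width bins.
   (Sketch only; a1.0.c's actual bookkeeping may differ.)
   bins[b] must have room for n values; counts[b] ends up holding
   how many values landed in bin b. */
static void split_data(const int *data, size_t n,
                       int *bins[NUM_BINS], size_t counts[NUM_BINS]) {
    for (int b = 0; b < NUM_BINS; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++) {
        int b = data[i] / BIN_WIDTH;   /* 0-249 -> bin 0, 250-499 -> bin 1, ... */
        bins[b][counts[b]++] = data[i];
    }
}
```

The counts[b]++ update is the interesting part: it is a read-modify-write on shared state, which is exactly what becomes unsafe once several threads deposit into the same bins.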
The file a1.1.c contains an attempt at doing this.
Run the a1.1.c program with the same parameter as Step 0.
Question 2 [2 marks]:
What happens when the program runs? Explain why.
Modify the program so that the 4 threads successfully split the data into the 4 bins and the
data is sorted properly. Submit this program as a1.1.c. Put your login/UPI in the file.
Question 3 [2 marks]:
What did you do to enable the parallel split_data function to work properly?
Question 4 [2 marks]:
Is it a good idea to split the data using multiple threads? Explain why or why not, using data
you collected. (I recommend you time only the splitting of the data not the sorting in order to
answer this question.)