CS546 “Parallel and Distributed Processing” Programming Assignment 2
Exercise 1:
Implement dense matrix multiplication of two N x N matrices (the result C is also N x N) and then
 parallelize it with OpenMP.
1) Compare the performance of the OpenMP version with the serial version and
 show how it scales as the number of threads increases.
2) Try different loop orders to see the impact of cache locality on performance (e.g., changing the
 order of the outer loop and inner loop when doing the matrix multiplication). What is the
 overall best performing version?
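
A minimal sketch of the two parts of Exercise 1, assuming the matrices are stored row-major in flat arrays of double. The function names matmul_ijk and matmul_ikj are illustrative and not part of the assignment. Both parallelize the outer loop over rows of C, so each thread writes to its own rows and no synchronization is needed. With row-major storage the i-k-j order usually has better cache behaviour because its inner loop walks B and C along rows, but the measurements should decide which version wins.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for row-major n x n matrices, loop order i-j-k.
   The inner loop reads B with stride n (down a column). */
void matmul_ijk(int n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = sum;
        }
    }
}

/* Same product with loop order i-k-j.
   The inner loop reads B and writes C with stride 1 (along a row). */
void matmul_ikj(int n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double *c_row = C + (size_t)i * n;
        for (int j = 0; j < n; j++)
            c_row[j] = 0.0;
        for (int k = 0; k < n; k++) {
            double a = A[(size_t)i * n + k];
            const double *b_row = B + (size_t)k * n;
            for (int j = 0; j < n; j++)
                c_row[j] += a * b_row[j];
        }
    }
}
```

With GCC, compiling with -fopenmp enables the pragmas; without that flag they are ignored and the same code runs serially, which gives a convenient baseline for the comparison in item 1.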
Exercise 2:
Implement a parallel LU decomposition of a square matrix A (N x N) with OpenMP. Assume that the
 square matrix A is generated randomly. Show the performance of the LU decomposition as the number
 of threads changes.
Hint: LU decomposition is one of the methods for solving square systems of linear equations. It
 decomposes the matrix A into a product of two matrices: a lower triangular matrix L and an upper
 triangular matrix U. The decomposition can be represented as follows:
 A = LU
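
The factorization A = LU can be sketched as follows. This is one possible approach, not a prescribed solution: it uses the Doolittle scheme without pivoting, overwrites A in place (U in the upper triangle including the diagonal, the multipliers of L below the diagonal, with the unit diagonal of L left implicit), and assumes row-major storage. The name lu_decompose and the error convention are illustrative. Because there is no pivoting, a zero pivot aborts the factorization; a randomly generated matrix may need to be made diagonally dominant, or pivoting added, for numerical stability.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU decomposition (Doolittle, no pivoting) of a row-major n x n matrix.
   Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_decompose(int n, double *A)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        const double *row_k = A + (size_t)k * n;
        double pivot = row_k[k];
        if (pivot == 0.0)
            return -1;

        /* The steps over k are sequential, but for a fixed k the updates
           of rows k+1 .. n-1 are independent of one another. */
        #pragma omp parallel for
        for (int i = k + 1; i < n; i++) {
            double *row_i = A + (size_t)i * n;
            double m = row_i[k] / pivot;   /* multiplier, stored as L[i][k] */
            row_i[k] = m;
            for (int j = k + 1; j < n; j++)
                row_i[j] -= m * row_k[j];  /* eliminate column k from row i */
        }
    }
    return 0;
}
```

The amount of parallel work shrinks as k grows, so the last elimination steps have few rows to share among the threads; this is worth keeping in mind when interpreting the scaling results.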

Submission Information
Each program must work correctly and be well documented. You should hand in:
1. Report file (in PDF format): This is your chance to explain your solution, possible
 optimizations, any insights gained, problems encountered, etc. The report should include the
 performance results for your solution and your analysis of the results.
2. Readme: This file should include the instructions to build and run your program.
3. Source Code: You must hand in all your source code.
4. Output file with timings of your performance testing. It should be consistent with your
 report.
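
The timings for item 4 could be collected with a small harness like the one below. It is a sketch under assumptions: time_kernel, print_scaling and sum_kernel are hypothetical names, and sum_kernel is only a stand-in workload to be replaced by the matrix multiplication or LU routine. When compiled with OpenMP it uses omp_set_num_threads and omp_get_wtime; otherwise it falls back to clock() and runs serially.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds (CPU time when built without OpenMP). */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Run kernel(arg) once with the requested thread count; return elapsed seconds. */
double time_kernel(int threads, void (*kernel)(void *), void *arg)
{
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  /* serial build: thread count has no effect */
#endif
    double start = now_seconds();
    kernel(arg);
    return now_seconds() - start;
}

/* Write one line per thread count; speedup is relative to the first entry. */
void print_scaling(FILE *out, const int *thread_counts, int count,
                   void (*kernel)(void *), void *arg)
{
    double base = 0.0;
    for (int t = 0; t < count; t++) {
        double secs = time_kernel(thread_counts[t], kernel, arg);
        if (t == 0)
            base = secs;
        fprintf(out, "threads=%d time=%.6f s speedup=%.2f\n",
                thread_counts[t], secs, secs > 0.0 ? base / secs : 0.0);
    }
}

/* Placeholder workload: sums 1..1000000 into *(double *)arg. */
void sum_kernel(void *arg)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++)
        sum += (double)i;
    *(double *)arg = sum;
}
```

Listing the single-thread run first makes the reported speedup relative to one thread. Repeating each measurement several times and reporting the minimum or the mean reduces noise in the output file.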
