# OpenMP Assignment Help | CS546 “Parallel and Distributed Processing” Programming Assignment 2

This assignment-help request from the US is an OpenMP assignment on parallel and distributed processing.

**Exercise 1:**

Implement a dense matrix multiplication of two N x N matrices (the result C is also N x N), and then parallelize it with OpenMP.

1) Compare the performance of the OpenMP version with that of the serial version, and show how it scales as the number of threads increases.

2) Try different loop orders to see the impact of cache locality on performance (e.g., swapping the order of the outer and inner loops in the matrix multiplication). Which version performs best overall?
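As a starting point for the loop-order comparison, here is a minimal sketch rather than a reference solution. It assumes row-major matrices stored in flat `double` arrays, and the function names `matmul_ijk` and `matmul_ikj_omp` are made up for illustration. The first function is a serial i-j-k baseline whose inner loop reads B with stride n; the second uses i-k-j order, so the inner loop touches B and C with unit stride, and it splits the rows of C across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major n x n matrices in flat arrays: element (i, j) is m[i * n + j].
   Function names and storage layout are illustrative assumptions. */

/* Serial baseline, i-j-k order: the inner loop reads B down a column,
   i.e. with stride n, which tends to be unfriendly to the cache. */
void matmul_ijk(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* OpenMP version, i-k-j order: the inner loop walks a row of B and a row
   of C with unit stride. Each thread owns whole rows of C, so no two
   threads write the same element and no synchronization is needed. */
void matmul_ikj_omp(const double *a, const double *b, double *c, int n)
{
    assert(n > 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            c[(size_t)i * n + j] = 0.0;
        for (int k = 0; k < n; k++) {
            const double aik = a[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
        }
    }
}
```

With GCC this can be built using `-fopenmp`, and the thread count can be varied through the `OMP_NUM_THREADS` environment variable. Which loop order actually wins depends on N, the compiler, and the machine, so it should be measured rather than assumed.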

**Exercise 2:**

Implement a parallel LU decomposition of a square matrix A (N x N) with OpenMP. Assume that the matrix A is generated randomly. Show how the performance of the LU decomposition changes with the number of threads.

Hint: LU decomposition is one of the methods for solving square systems of linear equations. It decomposes the matrix A into a product of two matrices: a lower triangular matrix L and an upper triangular matrix U. The decomposition can be represented as follows:

A = LU
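The factorization above can be sketched as follows. This is one possible approach, not a prescribed solution: it assumes in-place Doolittle elimination without pivoting on a row-major flat array, and the function name `lu_decompose_omp` is hypothetical. Skipping pivoting is a simplification; a random matrix may need partial pivoting (or a diagonally dominant generator) to stay numerically stable.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU decomposition (Doolittle form, NO pivoting; an assumption).
   On return, the strictly lower triangle of a holds L (its unit diagonal
   is implied) and the upper triangle, including the diagonal, holds U.
   Returns 0 on success, -1 if an exactly zero pivot is encountered. */
int lu_decompose_omp(double *a, int n)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        const double pivot = a[(size_t)k * n + k];
        if (pivot == 0.0)
            return -1;

        /* Step k depends on step k - 1, so the k loop stays serial.
           Rows below the pivot are independent of each other: each
           thread updates its own rows and only reads row k. */
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < n; i++) {
            const double lik = a[(size_t)i * n + k] / pivot;
            a[(size_t)i * n + k] = lik;            /* entry of L */
            for (int j = k + 1; j < n; j++)        /* update toward U */
                a[(size_t)i * n + j] -= lik * a[(size_t)k * n + j];
        }
    }
    return 0;
}
```

For example, the 2 x 2 matrix with rows (4, 3) and (6, 3) is overwritten with rows (4, 3) and (1.5, -1.5), meaning L has 1.5 below its unit diagonal and U has rows (4, 3) and (0, -1.5). The parallel region is re-entered at every step k, so for small N the overhead can outweigh the gain.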

**Submission Information**

Each program must work correctly and be well documented. You should hand in:

1. Report file (in PDF format): This is your chance to explain your solution, possible optimizations, any insights gained, problems encountered, etc. The report should include the performance results for your solution and your analysis of those results.

2. Readme: This file should include the instructions to build and run your program.

3. Source Code: You must hand in all your source code.

4. Output file with the timings from your performance testing. It should be consistent with your report.