Computer Science Assignment Help | Assignment 4 COMP3370 Computer Organization

This is a computer science assignment-help sample from Canada on computer architecture.

1. Consider the following loop.

LOOP: LDUR X10, [X1, #0]
      LDUR X11, [X1, #8]
      ADD  X12, X10, X11
      SUBI X1, X1, #16
      CBNZ X12, LOOP

Assume (i) that perfect branch prediction is used (no stalls due to control hazards); (ii) that there are no delay slots; (iii) that the pipeline has full forwarding support; and (iv) that branches are resolved in the EX (as opposed to the ID) stage.

(a) Show a pipeline execution diagram for the first two iterations of this loop.

(b) Mark pipeline stages that do not perform useful work. How often while the pipeline is full do we have a cycle in which all five pipeline stages are doing useful work? (Begin with the cycle during which the SUBI is in the IF stage and end with the cycle during which the CBNZ is in the IF stage.)
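One way to sanity-check a hand-drawn diagram is a small timing model. The sketch below is not part of the assignment: it assumes a classic IF/ID/EX/MEM/WB pipeline in which the first instruction reaches EX in cycle 3, an ALU result can be forwarded to the next instruction's EX stage, and a loaded value is available one cycle later (the usual load-use stall). The names `LOOP_BODY` and `ex_cycles` are hypothetical.

```python
# Each entry: (text, kind, destination register or None, source registers).
LOOP_BODY = [
    ("LDUR X10, [X1, #0]", "load", "X10", ["X1"]),
    ("LDUR X11, [X1, #8]", "load", "X11", ["X1"]),
    ("ADD X12, X10, X11", "alu", "X12", ["X10", "X11"]),
    ("SUBI X1, X1, #16", "alu", "X1", ["X1"]),
    ("CBNZ X12, LOOP", "branch", None, ["X12"]),
]


def ex_cycles(instructions):
    """Return the cycle in which each instruction occupies EX.

    Assumes in-order issue, full forwarding, and perfect branch prediction.
    """
    ready = {}   # register -> first cycle in which a consumer may be in EX
    cycles = []
    for position, (_, kind, dest, sources) in enumerate(instructions):
        # In-order: at best one cycle behind the previous instruction.
        earliest = 3 if position == 0 else cycles[-1] + 1
        for reg in sources:
            earliest = max(earliest, ready.get(reg, 0))
        cycles.append(earliest)
        if dest is not None:
            # Load data is forwarded from MEM, ALU results from EX.
            ready[dest] = earliest + (2 if kind == "load" else 1)
    return cycles


# Two iterations of the loop, as in part (a).
schedule = ex_cycles(LOOP_BODY * 2)
# Bubbles = how far the last EX slipped past the stall-free schedule.
stalls = schedule[-1] - (3 + len(schedule) - 1)
for (text, *_), cycle in zip(LOOP_BODY * 2, schedule):
    print(f"{text:<22} EX in cycle {cycle}")
print("stall cycles:", stalls)
```

Under these assumptions the only bubbles come from the ADD waiting on the second LDUR; the remaining stages of each instruction follow from its EX cycle (MEM and WB one and two cycles later).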

2. Consider a program with the following cache behaviors.

(a) Suppose a CPU with a write-through, write-allocate cache achieves a CPI of 2. What are the read and write bandwidths (measured in bytes per cycle) between RAM and the cache? (Assume each miss generates a request for one block.)

(b) For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the read and write bandwidths needed for a CPI of 2?
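The arithmetic behind parts (a) and (b) can be sketched as two small functions. This is only an illustration of one common way to set up the calculation: the parameter names are hypothetical, the 8-byte store size is an assumption, and the numbers in the usage lines are placeholders rather than values from the assignment's table.

```python
def write_through_bandwidth(cpi, reads_per_instr, writes_per_instr,
                            i_miss_rate, d_miss_rate, block_bytes,
                            store_bytes=8):
    """Return (read, write) bandwidth in bytes per cycle.

    Assumes every instruction or data miss fetches one block (write-allocate)
    and every store also writes `store_bytes` through to RAM.
    """
    instr_per_cycle = 1 / cpi
    misses_per_instr = (i_miss_rate
                        + (reads_per_instr + writes_per_instr) * d_miss_rate)
    read_bw = instr_per_cycle * misses_per_instr * block_bytes
    write_bw = instr_per_cycle * writes_per_instr * store_bytes
    return read_bw, write_bw


def write_back_bandwidth(cpi, reads_per_instr, writes_per_instr,
                         i_miss_rate, d_miss_rate, block_bytes,
                         dirty_fraction):
    """Return (read, write) bandwidth in bytes per cycle.

    Assumes each data-cache miss replaces a block, and a replaced block is
    written back only if it is dirty.
    """
    instr_per_cycle = 1 / cpi
    d_misses_per_instr = (reads_per_instr + writes_per_instr) * d_miss_rate
    read_bw = instr_per_cycle * (i_miss_rate + d_misses_per_instr) * block_bytes
    write_bw = instr_per_cycle * d_misses_per_instr * dirty_fraction * block_bytes
    return read_bw, write_bw


# Placeholder inputs for illustration only (not the assignment's data).
wt_read, wt_write = write_through_bandwidth(
    cpi=2, reads_per_instr=0.25, writes_per_instr=0.10,
    i_miss_rate=0.003, d_miss_rate=0.02, block_bytes=64)
wb_read, wb_write = write_back_bandwidth(
    cpi=2, reads_per_instr=0.25, writes_per_instr=0.10,
    i_miss_rate=0.003, d_miss_rate=0.02, block_bytes=64, dirty_fraction=0.30)
print(wt_read, wt_write, wb_read, wb_write)
```

Substituting the values from the problem's table for the placeholders gives the requested bandwidths under these assumptions.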

(c) Do additional calculations to (separately) demonstrate the changes in the bandwidth if we

3. Consider the following instruction sequence, running on a 5-stage pipeline datapath:

ADD  X5, X2, X1
LDUR X3, [X5, #4]
LDUR X2, [X2, #0]
ORR  X3, X5, X3
STUR X3, [X5, #0]

(a) If there is no forwarding or hazard detection, insert NOPs to ensure correct execution.
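A short script can check a hand-worked answer to part (a). The sketch below assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer (two instructions or NOPs in between). The function `insert_nops` and the tuple encoding are hypothetical, introduced only for this illustration.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("ADD X5, X2, X1", "X5", ["X2", "X1"]),
    ("LDUR X3, [X5, #4]", "X3", ["X5"]),
    ("LDUR X2, [X2, #0]", "X2", ["X2"]),
    ("ORR X3, X5, X3", "X3", ["X5", "X3"]),
    ("STUR X3, [X5, #0]", None, ["X3", "X5"]),
]


def insert_nops(program, gap=3):
    """Pad the program with NOPs so every read-after-write dependence
    is separated by at least `gap` instruction slots."""
    out = []          # scheduled instruction texts, including NOPs
    last_write = {}   # register -> slot index of its most recent producer
    for text, dest, sources in program:
        earliest = len(out)
        for reg in sources:
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + gap)
        out.extend(["NOP"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


padded = insert_nops(PROGRAM)
print("\n".join(padded))
print("NOPs inserted:", padded.count("NOP"))
```

If the register file cannot be written and read in the same cycle, calling `insert_nops(PROGRAM, gap=4)` models the stricter requirement.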

(b) Now, change and/or rearrange the code to minimize the number of NOPs needed. You can assume register X7 can be used to hold temporary values in your modified code.

(c) If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when the original code executes?

(d) If there is forwarding, for the first seven cycles during the execution of this code, specify which signals are asserted in each cycle by the hazard detection and forwarding units in the figure below:

(e) If there is no forwarding, what new input and output signals do we need for the hazard detection unit in the above figure? Using this instruction sequence as an example, explain why each signal is needed.

4. Although a cache is named, by convention, according to the amount of data it holds (e.g., a 4 KiB cache can hold 4 KiB of data), caches also require SRAM to store metadata such as tags and valid bits. In the following questions, you will examine how a cache's configuration affects the total amount of SRAM needed to implement it as well as the performance of the cache. Assume that the caches are byte addressable, and that addresses and words are 64 bits.

(a) Calculate the total number of bits required to implement a 32 KiB cache with 2-word blocks.

(b) Calculate the total number of bits required to implement a 64 KiB cache with 16-word blocks. How much bigger is this cache than the 32 KiB cache described in the previous question? Why can the amount of data be increased by only increasing the block size?
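The bit counts in parts (a) and (b) follow from splitting the address into tag, index, and block offset. The sketch below assumes a direct-mapped cache with one valid bit per block and no other metadata (no dirty or replacement bits), which the questions do not state explicitly; `cache_total_bits` is a hypothetical helper.

```python
def cache_total_bits(data_bytes, words_per_block, word_bits=64, address_bits=64):
    """Total SRAM bits (data + tag + valid) for a direct-mapped cache.

    Assumes power-of-two sizes and byte addressing.
    """
    block_bytes = words_per_block * word_bits // 8
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1    # log2 of the block count
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = words_per_block * word_bits + tag_bits + 1  # +1 valid bit
    return num_blocks * bits_per_block


small = cache_total_bits(32 * 1024, words_per_block=2)
large = cache_total_bits(64 * 1024, words_per_block=16)
print(small, large, round(large / small, 3))
```

With larger blocks there are fewer blocks to tag, so the per-block overhead of the tag and valid bit is spread over more data; this is why the total grows by less than the data capacity does.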

(c) Explain why the above 64 KiB cache, despite its larger data size, might provide slower performance than the first cache.

(d) Generate a series of read requests that have a lower miss rate on a 32 KiB 2-way set associative cache than on the cache described above.
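A candidate request sequence for part (d) can be checked with a tiny cache model. The following is a minimal sketch, assuming LRU replacement and, for the 2-way cache, 2-word (16-byte) blocks, since the question does not give its block size; `count_misses` and the example trace are illustrative only.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, block_bytes, ways):
    """Count read misses in a set-associative cache with LRU replacement.

    ways=1 models a direct-mapped cache.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[tag] = True
    return misses


# Two addresses that map to the same line in either direct-mapped cache.
trace = [0, 65536] * 8

dm_32k = count_misses(trace, 32 * 1024, block_bytes=16, ways=1)
dm_64k = count_misses(trace, 64 * 1024, block_bytes=128, ways=1)
two_way = count_misses(trace, 32 * 1024, block_bytes=16, ways=2)
print(dm_32k, dm_64k, two_way)
```

Alternating between two conflicting addresses makes a direct-mapped cache evict on every access, while the 2-way cache keeps both blocks in the same set and misses only on the first touch of each.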