2023年12月3日

数据库代写｜W4112 Database Systems Implementation Homework 2

1. In class we saw that hash partitioning of a table T larger than RAM could be useful for various
relational algebra operations. Suppose our RAM is M blocks big. Suppose also that (unlike what we
covered in class) we separately count the number s of sequential I/O requests and the number r of
random I/O requests. The overall cost would then be s∗Cs +r ∗Cr where Cs is the cost of a sequential
I/O and Cr is the cost of a random I/O. For some devices, Cr is much bigger than Cs.

In this question, we’ll try to convert some of the random I/Os into sequential I/Os. In particular,
instead of bringing in one block from the input, we’ll bring in k contiguous blocks at a time. Similarly,
for each hash partition, we’ll buffer k blocks and only write out data to the disk in units of k contiguous
blocks.

(a) Calculate r and s for a single partitioning pass through the input, in terms of k and jT j (the
number of blocks in T).

(b) Write an inequality that represents the conditions on k and jT j that ensure the partitions will
each fit in RAM after one partitioning pass. For this question you may assume that the hash
function perfectly partitions the data into equal-sized pieces.

(c) How would your answer to the previous question change if you also implemented double buffering
to overlap CPU and I/O latencies?

(d) Given your answers above, how would you choose k in practice?

2. Table R has 500,000,000 records, a clustering B-tree index on R:A, a secondary hash index on R:B, and
a secondary B-tree index on (R:C; R:D). A disk page holds 100 records from R on average, and tree
nodes hold 500 records on average in a disk page. According to the catalog, A is a key for the table,
there are 1,000,000 distinct B values, 15,000 distinct C values, and 100 distinct D values. For each
of the SQL or relational algebra queries below, give the best access plan for the query and estimate
its cost. You can use Figure 15.3 for cost formulas, and assume tS = 5 milliseconds and tT = 0:1
milliseconds.

(a) σA=5(R).
(b) σB=5(R).
(c) σC=5(R).
(d) σB=5 and C=5(R).
(e) σC=5 and D=5(R).
(f) σB=5 or C=5(R).
(g) σC=5 or D=5(R).
(h) Select C, sum(D) From R Group by C
(i) Select D, sum(C) From R Group by D

3. We saw in class that multidimensional histograms can help determine selectivities for correlated
columns. There are several ways to choose the bucket boundaries for such histograms. Take a look at

http://www.vldb.org/conf/1997/P486.PDF and answer the following:

(a) Based on Section 5 of that paper, explain how the MHIST-2 method chooses bucket boundaries
for 2-dimensional histograms. (Two paragraphs is about right.)

(b) Based on Section 7, outline how the authors determine that MHIST-2 performs well. (Two
paragraphs is plenty.)

4. Take a look at https://www.ibm.com/docs/en/db2/11.1?topic=methods-scan-sharing and answer
the following:

(a) Explain the concept of scan sharing. What performance advantages does it offer?

程序辅导定制C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB

本网站支持 Alipay WeChatPay PayPal等支付方式

E-mail:vipdue@outlook.com 微信:vipnxx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。

CS代写,留学生编程代写,CS作业代写,Java代写,程序代写，代码代写 | ITCS代写

数据库辅导

网页代写 | ICTWEB429 Assessment 1 – Develop 2 Mark-up Documents 大数据代写 | FIT5202 – Data processing for Big Data Assignment 2