数据库代写｜COMP0022 Database and Information Management Systems
There are FOUR questions in total.
Answer All FOUR questions.
Marks for each part of each question are indicated in square brackets [ n].
Standard calculators are permitted.
Consider a database that records details about school students, their exam entries and exams:
Students(studentID. firstname, familyname, address, borough)
Entries(studentID, examID. mark, result)
Exams(exam ID, subject, qualification, board, date)
(a)Write a SQL query on this database to find all students living in the borough of Haringey whose result was a pass where the subject was Maths. Your query should use two joins and no sub-query. [4 marks]
(b)Re-express the query from (i) in relational algebra where the joins are performed on the tables before the tables are reduced by the selection operators. [4 marks]
(c)Calculate the cost of processing the query in terms of disc accesses if you assume the following statistics ( nb. answers may not need all of these statistics):
10 rows are in the Students table.
100 rows are in the Entries table.
10 rows in the Entries table are for the Maths exam.
15 rows are in the Exams table.
1 exam has the subject ‘Maths’.
80 exam entries have a ‘pass’ result.
There were 8 passes of the Maths exam.
5 students live in the borough of ‘Haringey’, they all take the Maths exam and 4 pass.
Assume also that the results of all intermediate operations ( each join and each selection operation) are stored on disc, that rows are accessed one at a time and that main memory is large enough to process entire tables for each operation.
Assume that the Entries and Exams tables are joined before being joined to the Students table. Explain each stage in your calculation. [8 marks]
(d) What would be the total disc accesses if the query to find Haringey students who passed the Maths exam used only one join (i.e., on the Student and Entries tables) and a sub-query to find the examID for the Maths exam (i.e., for an equality search condition in the Where clause of the outer query)? [7 marks]
(e)’Selection cardinality of attribute A in relation R’ is an example of a statistic about a table that a Database Management System might maintain for the purpose of Query Optimisation. Give the equivalent descriptions of two other statistics used for this purpose. [2 marks]
(a)Statements (i) to (xv) below are about methods for managing transactions. In your answer books, say whether each statement is true or false. You will gain one mark for a correct answer but you will lose half a mark for an incorrect answer (the lowest mark you can receive for all of this part of the question is zero). You may state that you don’t know the answer, in which case no marks will be gained or lost. If you are unsure about an answer then you should provide a brief explanation for the answer you have given and this will be taken into account if your answer is incorrect.
(i)A schedule is ‘view-serialisable’ if concurrent transactions read and write the same data values as they would in a serial (non-overlapping) schedule of the same transactions.
(ii)View serialisable is more stringent than conflict serialisable because it requires that pairs of conflicting operations have the same chronological ordering as they would have with a serial schedule.
(iii)Any view serializable schedule that is not conflict serializable contains one or more blind writes.
(iv)In a Recoverable Schedule, for each pair of transactions Ti and Ti, if Ti reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Ti.
(v)To prevent a Cascading Rollback with 2 Phase Locking, the release of all locks should be done immediately they are no longer required.
(vi)Recovery from deadlock must avoid ‘starvation’ which occurs when the same transaction is aborted repeatedly, never getting to read a locked data item it needs and never completing.
(vii)The ‘Wait-Die’ strategy for Deadlock prevention allows only an older transaction to wait for a younger one. If the waiting transaction is younger, it is restarted with the same timestamp rather than a new timestamp.
(viii) The Wound-Wait strategy for Deadlock prevention allows an older transaction to wait for a younger one. If an older transaction requests a lock held by a younger one, the older one is aborted.
(ix)Multi-version timestamp ordering allows multiple transactions to read and write copies of the same data item; it ensures that each transaction sees a consistent set of versions for all data items a transaction accesses.
(x)Multi-version timestamp ordering allows concurrent transactions to operate on copies of the same data item. A transaction reading that item will select the version of the item corresponding with the timestamp of the transaction.
(xi)Optimistic techniques assume that conflicts are rare and allow some transactions to make early commits before they are complete. For read only transactions, the commit is made and validated if still correct at the completion of the transaction.
(xii)The choice of granularity of data items for protection by concurrency protocols involves a trade-off: a fine grain means a lower degree of concurrency and a coarse grain means more locking information needs to be stored and processed.
(xiii)The benefits of decomposing a transaction into nested sub-transactions include: (i) being able to run sub-transactions in parallel; (ii) the ability to control recovery of a failed part of a transaction rather than the complete transaction.
(xiv)In the nested transaction model, a transaction abort at one level does not have to affect transaction in progress at a lower level.
(xv) n the nested transaction model, a parent may perform its own recovery and may ignore the failure of non-vital sub-transactions. [15 marks]
(b)Considering the facilities provided by the Database Management System for database recovery following a system failure, explain: how the state of a transaction determines the recovery procedure used; the three main recovery techniques. [10 marks]