Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Computer Engineering - Data Bases 2

Full exam

Databases 2 - exam S. Comai, P. Fraternali, D. Martinenghi February 4th, 2022 A. Ranking (8 points) The following ranked lists report salaries of employees (in Ke) and their departmental budgets (in Me), respectively, in descending order. 1.Use TA to find the top-1 employee according to the salary/budget ratio. 2.Use now FA for the same purpose.salary (Ke)budget (Me)Dan: 60.0Pam: 40.0 Abe: 50.0Joe: 30.0 Pam: 30.0Dan: 30.0 Joe: 20.0Abe: 10.0 3.Discuss correctness of TA and FA for this problem: a)using the data shown above;b)in general. In Steps 1 and 2, provide enough details for us to check correctness of your execution with TA and FA. Solution. 1.With TA, we make one sorted access tosalaryretrieving Dan (60), and one random access tobudget to retrieve Dan’s budget (30); simmetrically, one s.a. tobudget(Pam, 40) and a corresponding r.a. to salary(30). With this, Dan’s score is 2, Pam’s is 0.75, while the threshold point (60,40) has a score of 1.5. So TA stops and recognizes Dan as the top-1 employee. 2.With FA, we make sorted accesses to both lists until a common ob ject is found, which happens at depth3 (both Dan and Pam are in common, so their score can be computed); when the sorted access phase stops, all employees have been seen. For Abe and Joe, FA determines their missing partial score through random access. The highest score is Abe’s (5), which is the top-1 employee by FA. 3.Clearly, TA fails to recognize the top employee (Abe), while FA succeeds. The problem is the useof a scoring function that is not monotonic in its arguments, so the determination of the threshold is meaningless, particularly because budgets decrease, so overall scores may increase. However, FA may also fail with this scoring function: if we swapped Pam’s and Dan’s salaries, then Pam would be found on both lists at depth 1 and FA would stop immediately, failing to recognize that Abe has a much higher score. B. Physical databases (8 points + bonus) TableEmployee(ID, Name, HireDate, Salary, ManagerID, Site) stores 30K tuples in an entry-sequenced primary sequential organization with 2K blocks. Evaluate the query plan and cost for the query below in each of the following two scenarios, assuming that val(Site)=60 and that 5% of the employees are top managers (i.e., theirManagerIDis NULL); there can only be either top managers or regular employees (non-managers). 1.No auxiliary structures; 2.Secondary indexes: B+(ManagerID) with 500 leaves, B+(Site) with 600 leaves, both with 3 levels, F=70. SELECT * FROM Employee WHERE ManagerID IN (SELECT ID FROM Employee WHERE Site=’Milan’ AND ManagerID IS NULL) Bonus(1 extra point): how would the results change if we replaced the first line of the query withUPDATE Employee SET Salary = 1.1 * Salary? Motivate your answer. Solution.Scenario 1Start from the nested query: scan the 2K blocks of the primary structure. During the scan find the tuples whereSite=’Milan’andManagerID IS NULL: they are 30K * 5% * 1/60 = 25 tuples. Cache these 25 tuples (managers from Milan), in particular theirIDs. For the external query: scan again the 2K blocks of the primary structure and find the employees where ManagerIDin the set of the 25 managersIDs. TOTAL COST = 2K + 2K = 4K I/O operations Scenario 2Start from B+(Site) and search for’Milan’: this requires to read 2 intermediate nodes + all the leaf nodes that contain the pointers to’Milan’ employees. There are 30K/60=500 employees from Milan. A leaf node can contain 30K/600=50 pointers. To read 500 pointers we need to read 10 leaf nodes (rounded to 11). Then, we follow the 500 pointers and check if theManagerID IS NULL. We will find 25 managers (500*5%). We cache them.For the external query one may proceed like in scenario 1, but we can optimize the plan by exploiting also the B+(ManagerID). For each top manager from Milan we can search the ID in the B+(ManagerID). This has 2 intermediate levels. For eachManagerIDwe expect to have 95% / 5%=19 employees. Each leaf node can contain 30K/500 pointers, therefore the 19 pointers in most of the cases will be in a single leaf 1 node; we need to read 3 nodes in the B+ tree. For each ManagerID, we follow the 19 pointers to find the full tuples of the employees that have that ID in theManagerIDfield to be returned in the select clause. TOTAL COST = 2 (int. levels) + 11 (leaf nodes) + 500 pointers (to reach the full tuples of the’Milan’ employees) + 25 * (3 ( levels in B+(ManagerID)) + 19 (pointers)) = 1063 I/O operations If the plan starts from B+(ManagerID) the cost for the nested query becomes: 2 intermediate nodes + (1500/60 +1 = 26) = 28 leaf nodes; then, follow 30K*5%=1500 pointers to reach the full tuples to check if the site is’Milan’. This cost (2+28+1500) is higher than (2+11+500). Bonus: Updating a tuple requires the containing block to be read end thenoverwritten; we can assume that both the costs of reading and overwriting can be measured as 1 I/O, for simplicity. If we replace the first part with an update we have to do exactly the same steps described before, we just need to add the “write part”: for all the blocks that contain an employee with a Manager from’Milan’ (they are 25*19 = 475), the salary is updated and the block must be persisted into secondary memory. In the worst case, the 475 tuples are contained in 475 different blocks: this is an upper-bound for the cost that must be added to the costs of the two previous scenarios that already consider the flead part” of the update. C. Trigger (8 points) A DBMS manages the enrolment of students in the current exam call, using the following two tables: ENROLLEDSTUDENT(StudentID, Name, AssignedRoom) VIRTUALROOM(RoomID, NumberOfStudents) TableVIRTUALROOM keeps a list of virtual rooms with the number of students assigned to them. A virtual room can hold up to 25 students. When a student unsubscribes, the corresponding row is deleted from ENROLLEDSTUDENT ; the system then checks whether it can merge two classes into one with at most 25 students. Virtual rooms with 0 students are deleted. Write a set of triggers enforcing the above policy when reacting to unsubscribe events. Solution. CREATE TRIGGER T1 AFTER DELETE ON ENROLLED_STUDENT FOR EACH ROW BEGIN UPDATE VIRTUAL_ROOM SET NumberOfStudents = NumberOfStudents - 1 WHERE RoomID = old.assignedRoom; END CREATE TRIGGER T2 AFTER UPDATE OF NumberOfStudents ON VIRTUAL_ROOM FOR EACH ROW WHEN new.NumberOfStudents = 0 BEGIN DELETE FROM virtual_room WHERE RoomID=new.RoomID; END CREATE TRIGGER T3 AFTER UPDATE OF NumberOfStudents ON VIRTUAL_ROOM FOR EACH ROW WHEN new.NumberOfStudents > 0 AND new.NumberOfStudents < old.NumberOfStudents BEGIN DECLARE R INT; SELECT IFNULL((SELECT RoomID FROM VIRTUAL_ROOM WHERE NumberOfStudents