#assignment #sdsc5003

题目链接 SDSC5003 - Question of Assignment 2

SDSC5003 - Assignment 2

Part 1: Relational Algebra (35 points)

a. List the names and ages of employees who work in both the Hardware department and the Software department. (7 points)

πename,age(Empeid(πeid(Worksdidσdname=’Hardware’(Dept))πeid(Worksdidσdname=’Software’(Dept))))\pi_{\text{ename}, \text{age}} \left( \text{Emp} \bowtie_{\text{eid}} \left( \pi_{\text{eid}} \left( \text{Works} \bowtie_{\text{did}} \sigma_{\text{dname} = \text{'Hardware'}}(\text{Dept}) \right) \cap \pi_{\text{eid}} \left( \text{Works} \bowtie_{\text{did}} \sigma_{\text{dname} = \text{'Software'}}(\text{Dept}) \right) \right) \right)

b. List the names of employees whose salary exceeds the budget of all departments they work in. (7 points)

S=EmpeidWorksdidDeptBad_eids=πeid(σsalarybudget(S))All_eids=πeid(Emp)Good_eids=All_eidsBad_eidsResult=πename(Good_eidseidEmp)\begin{aligned} S &= \text{Emp} \bowtie_{eid} \text{Works} \bowtie_{did} \text{Dept} \\ \text{Bad\_eids} &= \pi_{eid} \left( \sigma_{salary \leq budget}(S) \right) \\ \text{All\_eids} &= \pi_{eid}(\text{Emp}) \\ \text{Good\_eids} &= \text{All\_eids} - \text{Bad\_eids} \\ \text{Result} &= \pi_{ename} \left( \text{Good\_eids} \bowtie_{eid} \text{Emp} \right) \\ \end{aligned}

c. Find the names and ages of managers who manage only departments with a budget exceeding $1,000,000. (7 points)

All_managers=πmanagerid(Dept)Bad_managers=πmanagerid(σbudget1000000(Dept))Good_managers=All_managersBad_managersResult=πename,age(Good_managersmanagerid=eidEmp)\begin{aligned} \text{All\_managers} &= \pi_{managerid}(\text{Dept}) \\ \text{Bad\_managers} &= \pi_{managerid} \left( \sigma_{budget \leq 1000000}(\text{Dept}) \right) \\ \text{Good\_managers} &= \text{All\_managers} - \text{Bad\_managers} \\ \text{Result} &= \pi_{ename, age} \left( \text{Good\_managers} \bowtie_{managerid = eid} \text{Emp} \right) \\ \end{aligned}

d. Find the name of the manager who manages the department with the highest budget. (14 points)

D1=ρD1(Dept),D2=ρD2(Dept)NonMax_did=πD1.did(σD2.budget>D1.budget(D1×D2))MaxDept_did=πdid(Dept)NonMax_didManager_ids=πmanagerid(MaxDept_diddidDept)Result=πename(Manager_idsmanagerid=eidEmp)\begin{aligned} D1 &= \rho_{D1}(\text{Dept}), \quad D2 = \rho_{D2}(\text{Dept}) \\ \text{NonMax\_did} &= \pi_{D1.did} \left( \sigma_{D2.budget > D1.budget}(D1 \times D2) \right) \\ \text{MaxDept\_did} &= \pi_{did}(\text{Dept}) - \text{NonMax\_did} \\ \text{Manager\_ids} &= \pi_{managerid} \left( \text{MaxDept\_did} \bowtie_{did} \text{Dept} \right) \\ \text{Result} &= \pi_{ename} \left( \text{Manager\_ids} \bowtie_{managerid = eid} \text{Emp} \right) \\ \end{aligned}


Part 2: SQL Queries (40 points)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- part2a
SELECT DISTINCT E.ename, E.age
FROM Emp E
WHERE E.eid IN (
SELECT W.eid
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE D.dname = 'Hardware'
) AND E.eid IN (
SELECT W.eid
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE D.dname = 'Software'
);

-- part2b
SELECT E.ename
FROM Emp E
WHERE E.salary > (
SELECT MAX(D.budget)
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE W.eid = E.eid
);

-- part2c
SELECT E.ename, E.age
FROM Emp E
WHERE E.eid IN (
SELECT managerid
FROM Dept
WHERE budget > 1000000
) AND E.eid NOT IN (
SELECT managerid
FROM Dept
WHERE budget <= 1000000
);

-- part2d
SELECT E.ename
FROM Emp E
JOIN Dept D ON E.eid = D.managerid
WHERE D.budget = (
SELECT MAX(budget)
FROM Dept
);

-- part2e
SELECT W.did, COUNT(DISTINCT W.eid) AS employee_count
FROM Works W
GROUP BY W.did
HAVING SUM(W.pct_time) >= 2000;

Part2.png


Part 3: Design Theory (25 points)

Relation schema R(A1,A2,A3,A4)R(A1,A2,A3,A4) satisfies the functional dependencies (FDs):

  1. A3A1A3 \rightarrow A1

  2. A4A2A4 \rightarrow A2

  3. A1,A2A3A1,A2 \rightarrow A3

  4. A1,A2A4A1,A2 \rightarrow A4

1. List all candidate keys of RR. (6 points)

Compute attribute closures:

  • {A1,A2}+={A1,A2,A3,A4}\{A1,A2\}^+ = \{A1,A2,A3,A4\} (by FD3 and FD4)

  • {A1,A4}+={A1,A4,A2}\{A1,A4\}^+ = \{A1,A4,A2\} (by FD2), then {A1,A2}+={A1,A2,A3,A4}\{A1,A2\}^+ = \{A1,A2,A3,A4\} (by FD3 and FD4)

  • {A2,A3}+={A2,A3,A1}\{A2,A3\}^+ = \{A2,A3,A1\} (by FD1), then {A1,A2}+={A1,A2,A3,A4}\{A1,A2\}^+ = \{A1,A2,A3,A4\} (by FD3 and FD4)

  • {A3,A4}+={A3,A4,A1,A2}\{A3,A4\}^+ = \{A3,A4,A1,A2\} (by FD1 and FD2)

2. The main requirement of BCNF is that the left-hand side of an FD must be a superkey. Do the above FDs violate this requirement? Explain each one. (6 points)

BCNF requires: for every non-trivial FD XYX \rightarrow Y, XX must be a superkey.

  • FD1 A3A1A3 \rightarrow A1: A3A3 is not a superkey (A3+={A1,A3}A3^+ = \{A1,A3\}, does not contain all attributes). Violates BCNF.

  • FD2 A4A2A4 \rightarrow A2: A4A4 is not a superkey (A4+={A2,A4}A4^+ = \{A2,A4\}, does not contain all attributes). Violates BCNF.

  • FD3 A1,A2A3A1,A2 \rightarrow A3: {A1,A2}\{A1,A2\} is a candidate key (superkey). Does not violate BCNF.

  • FD4 A1,A2A4A1,A2 \rightarrow A4: {A1,A2}\{A1,A2\} is a candidate key (superkey). Does not violate BCNF.

3. The secondary requirement of 3NF is that for an FD XYX \rightarrow Y, YY must be part of a candidate key. Do the above FDs violate this requirement? Explain each one. (4 points)

  • FD1 A3A1A3 \rightarrow A1: A3A3 is not a superkey, but A1A1 is a prime attribute (contained in candidate keys {A1,A2}\{A1,A2\} and {A1,A4}\{A1,A4\}). Does not violate 3NF.

  • FD2 A4A2A4 \rightarrow A2: A4A4 is not a superkey, but A2A2 is a prime attribute (contained in candidate keys {A1,A2}\{A1,A2\} and {A2,A3}\{A2,A3\}). Does not violate 3NF.

  • FD3 A1,A2A3A1,A2 \rightarrow A3: A1,A2A1,A2 is a superkey. Does not violate 3NF.

  • FD4 A1,A2A4A1,A2 \rightarrow A4: A1,A2A1,A2 is a superkey. Does not violate 3NF.

All FDs satisfy the 3NF requirement; no violations.

4. Provide a lossless BCNF decomposition of RR with the minimum number of tables. (9 points)

Since FD1 and FD2 violate BCNF, decomposition is needed. The minimum number of tables is 3 (because the dependencies cannot satisfy BCNF with only 2 tables). Here is a lossless join decomposition:

  • R1(A1,A3)R1(A1, A3), FD A3A1A3 \rightarrow A1, candidate key {A3}\{A3\} (BCNF).

  • R2(A2,A4)R2(A2, A4), FD A4A2A4 \rightarrow A2, candidate key {A4}\{A4\} (BCNF).

  • R3(A1,A2)R3(A1, A2), no non-trivial FDs (except key dependencies), candidate key {A1,A2}\{A1,A2\} (BCNF).