SDSC5003 - Assignment 2

Part 1: Relational Algebra (35 points)

a. List the names and ages of employees who work in both the Hardware department and the Software department. (7 points)

$\pi_{\text{ename}, \text{age}} \left( \text{Emp} \bowtie_{\text{eid}} \left( \pi_{\text{eid}} \left( \text{Works} \bowtie_{\text{did}} \sigma_{\text{dname} = \text{'Hardware'}}(\text{Dept}) \right) \cap \pi_{\text{eid}} \left( \text{Works} \bowtie_{\text{did}} \sigma_{\text{dname} = \text{'Software'}}(\text{Dept}) \right) \right) \right)$

b. List the names of employees whose salary exceeds the budget of all departments they work in. (7 points)

$\begin{aligned} S &= \text{Emp} \bowtie_{eid} \text{Works} \bowtie_{did} \text{Dept} \\ \text{Bad\_eids} &= \pi_{eid} \left( \sigma_{salary \leq budget}(S) \right) \\ \text{All\_eids} &= \pi_{eid}(\text{Emp}) \\ \text{Good\_eids} &= \text{All\_eids} - \text{Bad\_eids} \\ \text{Result} &= \pi_{ename} \left( \text{Good\_eids} \bowtie_{eid} \text{Emp} \right) \\ \end{aligned}$

c. Find the names and ages of managers who manage only departments with a budget exceeding $1,000,000. (7 points)

$\begin{aligned} \text{All\_managers} &= \pi_{managerid}(\text{Dept}) \\ \text{Bad\_managers} &= \pi_{managerid} \left( \sigma_{budget \leq 1000000}(\text{Dept}) \right) \\ \text{Good\_managers} &= \text{All\_managers} - \text{Bad\_managers} \\ \text{Result} &= \pi_{ename, age} \left( \text{Good\_managers} \bowtie_{managerid = eid} \text{Emp} \right) \\ \end{aligned}$

d. Find the name of the manager who manages the department with the highest budget. (14 points)

$\begin{aligned} D1 &= \rho_{D1}(\text{Dept}), \quad D2 = \rho_{D2}(\text{Dept}) \\ \text{NonMax\_did} &= \pi_{D1.did} \left( \sigma_{D2.budget > D1.budget}(D1 \times D2) \right) \\ \text{MaxDept\_did} &= \pi_{did}(\text{Dept}) - \text{NonMax\_did} \\ \text{Manager\_ids} &= \pi_{managerid} \left( \text{MaxDept\_did} \bowtie_{did} \text{Dept} \right) \\ \text{Result} &= \pi_{ename} \left( \text{Manager\_ids} \bowtie_{managerid = eid} \text{Emp} \right) \\ \end{aligned}$

Part 2: SQL Queries (40 points)

-- part2a
SELECT DISTINCT E.ename, E.age
FROM Emp E
WHERE E.eid IN (
SELECT W.eid
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE D.dname = 'Hardware'
) AND E.eid IN (
SELECT W.eid
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE D.dname = 'Software'
);

-- part2b
SELECT E.ename
FROM Emp E
WHERE E.salary > (
SELECT MAX(D.budget)
FROM Works W
JOIN Dept D ON W.did = D.did
WHERE W.eid = E.eid
);

-- part2c
SELECT E.ename, E.age
FROM Emp E
WHERE E.eid IN (
SELECT managerid
FROM Dept
WHERE budget > 1000000
) AND E.eid NOT IN (
SELECT managerid
FROM Dept
WHERE budget <= 1000000
);

-- part2d
SELECT E.ename
FROM Emp E
JOIN Dept D ON E.eid = D.managerid
WHERE D.budget = (
SELECT MAX(budget)
FROM Dept
);

-- part2e
SELECT W.did, COUNT(DISTINCT W.eid) AS employee_count
FROM Works W
GROUP BY W.did
HAVING SUM(W.pct_time) >= 2000;

Part 3: Design Theory (25 points)

Relation schema $R(A1,A2,A3,A4)$ satisfies the functional dependencies (FDs):

$A3 \rightarrow A1$
$A4 \rightarrow A2$
$A1,A2 \rightarrow A3$
$A1,A2 \rightarrow A4$

1. List all candidate keys of $R$ . (6 points)

Compute attribute closures:

$\{A1,A2\}^+ = \{A1,A2,A3,A4\}$ (by FD3 and FD4)
$\{A1,A4\}^+ = \{A1,A4,A2\}$ (by FD2), then $\{A1,A2\}^+ = \{A1,A2,A3,A4\}$ (by FD3 and FD4)
$\{A2,A3\}^+ = \{A2,A3,A1\}$ (by FD1), then $\{A1,A2\}^+ = \{A1,A2,A3,A4\}$ (by FD3 and FD4)
$\{A3,A4\}^+ = \{A3,A4,A1,A2\}$ (by FD1 and FD2)

2. The main requirement of BCNF is that the left-hand side of an FD must be a superkey. Do the above FDs violate this requirement? Explain each one. (6 points)

BCNF requires: for every non-trivial FD $X \rightarrow Y$ , $X$ must be a superkey.

FD1 $A3 \rightarrow A1$ : $A3$ is not a superkey ( $A3^+ = \{A1,A3\}$ , does not contain all attributes). Violates BCNF.
FD2 $A4 \rightarrow A2$ : $A4$ is not a superkey ( $A4^+ = \{A2,A4\}$ , does not contain all attributes). Violates BCNF.
FD3 $A1,A2 \rightarrow A3$ : $\{A1,A2\}$ is a candidate key (superkey). Does not violate BCNF.
FD4 $A1,A2 \rightarrow A4$ : $\{A1,A2\}$ is a candidate key (superkey). Does not violate BCNF.

3. The secondary requirement of 3NF is that for an FD $X \rightarrow Y$ , $Y$ must be part of a candidate key. Do the above FDs violate this requirement? Explain each one. (4 points)

FD1 $A3 \rightarrow A1$ : $A3$ is not a superkey, but $A1$ is a prime attribute (contained in candidate keys $\{A1,A2\}$ and $\{A1,A4\}$ ). Does not violate 3NF.
FD2 $A4 \rightarrow A2$ : $A4$ is not a superkey, but $A2$ is a prime attribute (contained in candidate keys $\{A1,A2\}$ and $\{A2,A3\}$ ). Does not violate 3NF.
FD3 $A1,A2 \rightarrow A3$ : $A1,A2$ is a superkey. Does not violate 3NF.
FD4 $A1,A2 \rightarrow A4$ : $A1,A2$ is a superkey. Does not violate 3NF.

All FDs satisfy the 3NF requirement; no violations.

4. Provide a lossless BCNF decomposition of $R$ with the minimum number of tables. (9 points)

Since FD1 and FD2 violate BCNF, decomposition is needed. The minimum number of tables is 3 (because the dependencies cannot satisfy BCNF with only 2 tables). Here is a lossless join decomposition:

$R1(A1, A3)$ , FD $A3 \rightarrow A1$ , candidate key $\{A3\}$ (BCNF).
$R2(A2, A4)$ , FD $A4 \rightarrow A2$ , candidate key $\{A4\}$ (BCNF).
$R3(A1, A2)$ , no non-trivial FDs (except key dependencies), candidate key $\{A1,A2\}$ (BCNF).