SQL 실습, Big Data (2018 spring), Postechx

1. Write a SQL query that is equivalent to the following relational albebra expression.

π_{term} (σ_{docid=2}(Documents)) U π_{term} (σ_{count=3}(Documents))
Try both union and union all to see the difference
Try "or" instead of "union" and compare results with using union

Answer

input

select Term from Documents where Docid=3
UNION
select Term from Documents where count=3;

output

base
ds
final
mid
network
sd
structure

input

select Term from Documents where Docid=3
UNION ALL
select Term from Documents where count=3;

output

ds
sd
final
mid
base
structure
sd
sd
network

input

select Term from Documents where Docid=3
EXCEPT
select Term
from (select * from Documents where Docid=3 EXCEPT select * from Documents where count=3) x
where count=3;

output

sd

2. Write a SQL query to count the number of documents containing the word "data".

Answer

input

select count(Docid)
from Documents
where Term="data";

output

3. Write a SQL query to find all documents that have more than 3 terms.

Answer

input

select Docid 
from Documents 
group by Docid 
having count(Term) > 3;

output

4. Write a SQL query to count the number of documents that contain both the word "data" and "base".

Answer

input

select count(a.Docid) 
from (select * from Documents where Term="data") a, 
     (select * from Documents where Term="base") b 
where a.Docid = b.Docid;

output

5. Write a SQL query to compute the similarity of every pair of documents in Documents.

The similarity here is computed by summing the same term counts of two documents. For example, the similarity of Doc1<'a':2, 'b':1, 'c':3> and Doc2<'b':2, 'c':1, 'd':4>is 1*2('b') + 3*1('c') = 5.

(Hint: to avoid computing the similarity of both (Doc1, Doc2) and (Doc2, Doc1), add a condition of the form a.DocID < b.DocID.)

Answer

input

select x.aID, x.bID, sum(x.value) as result 
from (select a.DocID as aID, b.DocID as bID, a.count*b.count as value 
      from Documents a JOIN Documents b on a.Term = b.Term 
      where a.DocID < b.DocID 
      order by a.DocID, b.DocID) x 
group by x.aID, x.bID;

output

1,2,10
1,4,13
1,5,30
1,6,20
2,4,2
2,5,22
2,6,10
3,4,17
4,6,4
5,6,8

GzuPark/Excercise_SQL.md

1. Write a SQL query that is equivalent to the following relational albebra expression.

Answer

2. Write a SQL query to count the number of documents containing the word "data".

Answer

3. Write a SQL query to find all documents that have more than 3 terms.

Answer

4. Write a SQL query to count the number of documents that contain both the word "data" and "base".

Answer

5. Write a SQL query to compute the similarity of every pair of documents in Documents.

Answer