- π_{term} (σ_{docid=2}(Documents)) U π_{term} (σ_{count=3}(Documents))
- Try both union and union all to see the difference
- Try "or" instead of "union" and compare results with using union
- input
select Term from Documents where Docid=3
UNION
select Term from Documents where count=3;
- output
base
ds
final
mid
network
sd
structure
- input
select Term from Documents where Docid=3
UNION ALL
select Term from Documents where count=3;
- output
ds
sd
final
mid
base
structure
sd
sd
network
- input
select Term from Documents where Docid=3
EXCEPT
select Term
from (select * from Documents where Docid=3 EXCEPT select * from Documents where count=3) x
where count=3;
- output
sd
- input
select count(Docid)
from Documents
where Term="data";
- output
4
- input
select Docid
from Documents
group by Docid
having count(Term) > 3;
- output
1
2
3
4
5
- input
select count(a.Docid)
from (select * from Documents where Term="data") a,
(select * from Documents where Term="base") b
where a.Docid = b.Docid;
- output
3
The similarity here is computed by summing the same term counts of two documents. For example, the similarity of Doc1<'a':2, 'b':1, 'c':3>
and Doc2<'b':2, 'c':1, 'd':4>
is 1*2('b') + 3*1('c') = 5
.
(Hint: to avoid computing the similarity of both (Doc1, Doc2) and (Doc2, Doc1), add a condition of the form a.DocID < b.DocID
.)
- input
select x.aID, x.bID, sum(x.value) as result
from (select a.DocID as aID, b.DocID as bID, a.count*b.count as value
from Documents a JOIN Documents b on a.Term = b.Term
where a.DocID < b.DocID
order by a.DocID, b.DocID) x
group by x.aID, x.bID;
- output
1,2,10
1,4,13
1,5,30
1,6,20
2,4,2
2,5,22
2,6,10
3,4,17
4,6,4
5,6,8