Skip to content

Instantly share code, notes, and snippets.

@tinylamb
Created March 12, 2014 06:48
Show Gist options
  • Select an option

  • Save tinylamb/9502067 to your computer and use it in GitHub Desktop.

Select an option

Save tinylamb/9502067 to your computer and use it in GitHub Desktop.
信息检索之布尔查询
问题:有文档集合D,给定词汇t1,t2,我想知道同时包含t1 t2的文档有哪些?
问题转化为,倒排索引表上的交,并,差 运算。结合Weien图
倒排索引的由来:
所谓倒排就是某个词在哪些文档中出现,正好与一个文档中包含哪些词相反。
词汇-文档矩阵包含上述两个概念。
倒排表的交运算
L1={ai},L2={bi} ;L1 L2均有序,现求L1 ∩ L2
Intersection(L1,L2):
inter=[]
p1=L1.head,p2=L2.head;
while p1 && p2://求交,这里是&&
if *p1 == *p2:
inter.append(*p1)
p1++
p2++
elseif *p1 < *p2:
p1++
else
p2++
倒排表的并运算
Merge(L1,L2):
merge=[]
p1=L1.head,p2=L2.head
//求并,这里与求交不同,不是&&
len = L1.length + L2.length
for i in range(len):
if (p2 ==NULL || (p1 && *p1 <= *p2)):
merge.append(*p1)
++p1
if (*p1 == *p2)
++p2
else
merge.append(*p2)
++p2
倒排表的差运算
Except(L1,L2)
except=[]
p1=L1.head,p2=L2.head
while p1
if (p2 = =NULL || *p1 < *p2)
except.append(*p1)
++p1
elseif (*p1 == *p2)
++p1
++p2
else
++p1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment