@kmaehashi
Last active June 20, 2016 08:50
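BM25 RPC benchmark

Average latency of Classifier.train() RPCs against jubaclassifier with three string weighting settings: bin-bin, tf-idf and tf-bm25.
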
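# -*- coding: utf-8 -*-
import time

import jubatus

# Create a labeled datum: 100 string fields, each holding the same sentence.
sentence = 'あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほabcd'  # 94 bytes in UTF-8 (~100 bytes)
sentences = {}
for i in range(100):
    sentences['data_{0}'.format(i)] = sentence
d = jubatus.classifier.types.LabeledDatum('pos', jubatus.common.Datum(sentences))

# Connect to the classifier server (host, port, cluster name, timeout).
c = jubatus.Classifier("127.0.0.1", 9199, '', 0)

batch_size = 1000
total = 0.0  # cumulative seconds spent in train() calls
count = 0    # cumulative number of train() calls
while True:  # runs until interrupted (Ctrl-C)
    begin = time.time()
    for i in range(batch_size):
        c.train([d])  # one RPC per call, each carrying a single datum
    total += time.time() - begin
    count += batch_size
    # Prints the cumulative average over all calls so far, not the last batch only.
    print('[{0} queries sent] ... average latency: {1} ms'.format(count, total * 1000 / count))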
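The loop above prints a cumulative average, so a change in per-call cost shows up only gradually. A minimal sketch of a reusable harness that reports both the per-batch and the cumulative average, assuming Python 3 (`time.perf_counter`) and any zero-argument callable such as `lambda: c.train([d])`. The function name `measure_batches` is hypothetical and not part of the Jubatus client.

```python
import time
from typing import Callable, Iterator, Tuple


def measure_batches(call: Callable[[], object], batch_size: int = 1000,
                    batches: int = 10) -> Iterator[Tuple[int, float, float]]:
    """Yield (calls_so_far, batch_avg_ms, cumulative_avg_ms) after each batch.

    `call` is any zero-argument callable, e.g. lambda: c.train([d]).
    """
    total = 0.0  # cumulative seconds across all batches
    count = 0    # cumulative number of calls
    for _ in range(batches):
        begin = time.perf_counter()  # monotonic, high-resolution clock
        for _ in range(batch_size):
            call()
        elapsed = time.perf_counter() - begin
        total += elapsed
        count += batch_size
        yield count, elapsed * 1000 / batch_size, total * 1000 / count


if __name__ == "__main__":
    # Stand-in workload; swap in lambda: c.train([d]) against a live server.
    for n, batch_ms, avg_ms in measure_batches(lambda: sum(range(100)),
                                               batch_size=1000, batches=3):
        print('[{0} queries sent] ... batch: {1:.3f} ms, average: {2:.3f} ms'.format(
            n, batch_ms, avg_ms))
```

Keeping the timed callable outside the harness lets the same loop compare server configs or client settings without editing the measurement code.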

Server config: bin-bin

{
  "method": "perceptron",
  "converter": {
    "string_filter_types": {},
    "string_filter_rules": [],
    "num_filter_types": {},
    "num_filter_rules": [],
    "string_types": {},
    "string_rules": [
      { "key": "*", "type": "space", "sample_weight": "bin", "global_weight": "bin" }
    ],
    "num_types": {},
    "num_rules": []
  }
}

Server config: tf-bm25

{
  "method": "perceptron",
  "converter": {
    "string_filter_types": {},
    "string_filter_rules": [],
    "num_filter_types": {},
    "num_filter_rules": [],
    "string_types": {},
    "string_rules": [
      { "key": "*", "type": "space", "sample_weight": "tf", "global_weight": "bm25" }
    ],
    "num_types": {},
    "num_rules": []
  }
}
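The tf-bm25 config differs from the others only in `sample_weight`/`global_weight`. For intuition, a sketch of the textbook weights behind the three settings. This is the standard Okapi BM25 formula with a plain log(N/df) idf factor, not necessarily what Jubatus computes internally; `k1=1.2` and `b=0.75` are common defaults assumed here, and all function names are hypothetical. Unlike bin and tf-idf, BM25 also depends on the document length and the average document length.

```python
import math


def bin_weight(tf: int) -> float:
    """bin/bin: every feature present gets weight 1, regardless of counts."""
    return 1.0 if tf > 0 else 0.0


def idf(num_docs: int, doc_freq: int) -> float:
    """Plain inverse document frequency, log(N / df)."""
    return math.log(num_docs / doc_freq)


def tf_idf_weight(tf: int, num_docs: int, doc_freq: int) -> float:
    """tf/idf: raw term frequency scaled by idf."""
    return tf * idf(num_docs, doc_freq)


def bm25_weight(tf: int, num_docs: int, doc_freq: int, doc_len: int,
                avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 term weight: saturating tf, normalized by document length."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf(num_docs, doc_freq) * tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    # Hypothetical statistics: 1000 documents, the term occurs in 10 of them.
    n, df = 1000, 10
    for tf in (1, 2, 5):
        print(tf, bin_weight(tf), round(tf_idf_weight(tf, n, df), 3),
              round(bm25_weight(tf, n, df, doc_len=100, avg_doc_len=100.0), 3))
```

With `doc_len == avg_doc_len` and `tf == 1`, the BM25 weight reduces to the idf factor; larger `tf` values saturate toward `idf * (k1 + 1)` instead of growing linearly as in tf-idf.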

Server config: tf-idf

{
  "method": "perceptron",
  "converter": {
    "string_filter_types": {},
    "string_filter_rules": [],
    "num_filter_types": {},
    "num_filter_rules": [],
    "string_types": {},
    "string_rules": [
      { "key": "*", "type": "space", "sample_weight": "tf", "global_weight": "idf" }
    ],
    "num_types": {},
    "num_rules": []
  }
}
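The three configs are identical except for the `sample_weight`/`global_weight` pair in the single `string_rules` entry. A sketch that generates them from one template, so further weighting variants can be added without copying JSON by hand; `VARIANTS` and `make_config` are hypothetical names, and the output would be saved to a file and loaded into jubaclassifier as usual (e.g. via `--configpath`).

```python
import json

# (sample_weight, global_weight) pairs benchmarked in this gist.
VARIANTS = {
    "bin-bin": ("bin", "bin"),
    "tf-idf": ("tf", "idf"),
    "tf-bm25": ("tf", "bm25"),
}


def make_config(sample_weight: str, global_weight: str) -> dict:
    """Build a perceptron classifier config with one catch-all string rule."""
    return {
        "method": "perceptron",
        "converter": {
            "string_filter_types": {},
            "string_filter_rules": [],
            "num_filter_types": {},
            "num_filter_rules": [],
            "string_types": {},
            "string_rules": [
                {"key": "*", "type": "space",
                 "sample_weight": sample_weight, "global_weight": global_weight},
            ],
            "num_types": {},
            "num_rules": [],
        },
    }


if __name__ == "__main__":
    for name, (sw, gw) in VARIANTS.items():
        print("Server config: " + name)
        print(json.dumps(make_config(sw, gw), indent=2))
```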

Before BM25

bin-bin

[1000 queries sent] ... average latency: 0.951684951782 ms
[2000 queries sent] ... average latency: 0.944710016251 ms
[3000 queries sent] ... average latency: 0.974693377813 ms
[4000 queries sent] ... average latency: 0.97652053833 ms
[5000 queries sent] ... average latency: 0.973491048813 ms
[6000 queries sent] ... average latency: 0.975795030594 ms
[7000 queries sent] ... average latency: 0.9784215859 ms
[8000 queries sent] ... average latency: 0.979064255953 ms
[9000 queries sent] ... average latency: 0.973398447037 ms
[10000 queries sent] ... average latency: 0.973693299294 ms

tf-idf

[1000 queries sent] ... average latency: 0.92867398262 ms
[2000 queries sent] ... average latency: 0.92799448967 ms
[3000 queries sent] ... average latency: 0.959951321284 ms
[4000 queries sent] ... average latency: 0.957108020782 ms
[5000 queries sent] ... average latency: 0.958620786667 ms
[6000 queries sent] ... average latency: 0.966298977534 ms
[7000 queries sent] ... average latency: 0.959840672357 ms
[8000 queries sent] ... average latency: 0.967863976955 ms
[9000 queries sent] ... average latency: 0.957330438826 ms
[10000 queries sent] ... average latency: 0.932580208778 ms

After BM25

bin-bin

[1000 queries sent] ... average latency: 0.933595895767 ms
[2000 queries sent] ... average latency: 0.910020470619 ms
[3000 queries sent] ... average latency: 0.924420674642 ms
[4000 queries sent] ... average latency: 0.912827253342 ms
[5000 queries sent] ... average latency: 0.91952662468 ms
[6000 queries sent] ... average latency: 0.921235521634 ms
[7000 queries sent] ... average latency: 0.922405447279 ms
[8000 queries sent] ... average latency: 0.922378003597 ms
[9000 queries sent] ... average latency: 0.921071343952 ms
[10000 queries sent] ... average latency: 0.919125413895 ms

tf-idf

[1000 queries sent] ... average latency: 0.954237937927 ms
[2000 queries sent] ... average latency: 0.968870520592 ms
[3000 queries sent] ... average latency: 0.96574529012 ms
[4000 queries sent] ... average latency: 0.972022950649 ms
[5000 queries sent] ... average latency: 0.962139368057 ms
[6000 queries sent] ... average latency: 0.956581473351 ms
[7000 queries sent] ... average latency: 0.958241428648 ms
[8000 queries sent] ... average latency: 0.960849881172 ms
[9000 queries sent] ... average latency: 0.967716428969 ms
[10000 queries sent] ... average latency: 0.96981818676 ms

tf-bm25

[1000 queries sent] ... average latency: 1.24666714668 ms
[2000 queries sent] ... average latency: 1.14513397217 ms
[3000 queries sent] ... average latency: 1.16408427556 ms
[4000 queries sent] ... average latency: 1.17657291889 ms
[5000 queries sent] ... average latency: 1.20339832306 ms
[6000 queries sent] ... average latency: 1.22321108977 ms
[7000 queries sent] ... average latency: 1.23770809174 ms
[8000 queries sent] ... average latency: 1.24913406372 ms
[9000 queries sent] ... average latency: 1.26171326637 ms
[10000 queries sent] ... average latency: 1.26992533207 ms
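The printed averages are cumulative, so the last line of each run is the overall average for that run. A small sketch that extracts it from the log format above and expresses each "After BM25" run relative to bin-bin; the helper name `final_average` is hypothetical, and the three input lines are copied from the results above.

```python
import re

# Matches lines such as "[1000 queries sent] ... average latency: 0.95 ms".
LINE_RE = re.compile(r"\[(\d+) queries sent\] \.\.\. average latency: ([\d.]+) ms")


def final_average(log_text: str) -> float:
    """Return the cumulative average latency (ms) from the last line of a run."""
    matches = LINE_RE.findall(log_text)
    if not matches:
        raise ValueError("no latency lines found")
    return float(matches[-1][1])


# Last lines of the "After BM25" runs above (10000 queries each).
runs = {
    "bin-bin": "[10000 queries sent] ... average latency: 0.919125413895 ms",
    "tf-idf": "[10000 queries sent] ... average latency: 0.96981818676 ms",
    "tf-bm25": "[10000 queries sent] ... average latency: 1.26992533207 ms",
}
baseline = final_average(runs["bin-bin"])
for name, text in runs.items():
    avg = final_average(text)
    print("{0}: {1:.3f} ms ({2:+.1f}% vs bin-bin)".format(
        name, avg, (avg / baseline - 1) * 100))
```

Comparing within the same build avoids mixing in differences between the "Before BM25" and "After BM25" server binaries.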