nlpo3-cli vs newmm
- Computer: Scaleway's Mac mini M1
- Rustc: rustc 1.54.0 (a178d0322 2021-07-26)
- Python: Python 3.8.2
- OS: Darwin 506124d8-4acf-4595-9d46-8ca4b44b8110 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:27 PDT 2021; root:xnu-7195.141.2~5/RELEASE_ARM64_T8101 arm64
- Script:
#!/bin/bash
set -x
INPUT=thwik-head1m.txt
for i in {1..10}
do
    { time python3 newmm.py < "$INPUT" > newmm.out ; } 2>> bench_newmm.txt
    { time nlpo3 segment < "$INPUT" > cham.out ; } 2>> bench_o3.txt
done
- Command-line wrapper for newmm (newmm.py, as called by the script above):
from pythainlp import word_tokenize
import sys
for line in sys.stdin:
    print("|".join(word_tokenize(line[:-1])))
- nlpo3 version: 1.1.2
- nlpo3-cli version: 0.0.1
- chamkho version: 0.5.0
- dataset: https://file.veer66.rocks/langbench/thwik-head1m.txt
% grep real bench_o3.txt
real 2m10.923s
real 2m12.014s
real 2m10.931s
real 2m9.448s
real 2m9.055s
real 2m10.570s
real 2m10.672s
real 2m10.140s
real 2m11.220s
real 2m9.941s
% grep real bench_newmm.txt
real 7m52.180s
real 7m58.090s
real 7m57.071s
real 8m9.779s
real 7m54.576s
real 7m52.807s
real 7m59.109s
real 7m58.489s
real 7m59.604s
real 7m57.844s
- nlpo3: mean real time over the 10 runs, in seconds
% grep real bench_o3.txt | ruby -lane 'BEGIN { all = 0.0; cnt = 0 }; cols = $F[1].split(/[ms]/).map {|x| x.to_f }; v = cols[0]*60 + cols[1]; all += v; cnt += 1; END { p all/cnt}'
130.49140000000003
- newmm: mean real time over the 10 runs, in seconds
% grep real bench_newmm.txt | ruby -lane 'BEGIN { all = 0.0; cnt = 0 }; cols = $F[1].split(/[ms]/).map {|x| x.to_f }; v = cols[0]*60 + cols[1]; all += v; cnt += 1; END { p all/cnt}'
477.9549
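- Ratio of means: 477.9549 / 130.4914 ≈ 3.66, so nlpo3-cli took about 3.66 times less wall-clock time than newmm on this dataset.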
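The averaging done by the Ruby one-liners above, and the ratio between the two means, can also be sketched in Python. This is only an illustration, not part of the original benchmark: the helper names `real_seconds` and `speedup` are made up here, and the regular expression assumes the `real 2m10.923s` format that bash's `time` keyword writes.

```python
import re
from statistics import mean

# Matches lines such as "real 2m10.923s" (bash separates the fields with a tab).
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s$")


def real_seconds(lines):
    """Return the wall-clock times, in seconds, found among `time` output lines."""
    seconds = []
    for line in lines:
        match = REAL_RE.match(line.strip())
        if match:  # skip "user", "sys", and `set -x` trace lines
            seconds.append(int(match.group(1)) * 60 + float(match.group(2)))
    return seconds


def speedup(fast_lines, slow_lines):
    """Return mean(slow) / mean(fast) for two sets of `time` output lines."""
    return mean(real_seconds(slow_lines)) / mean(real_seconds(fast_lines))


# Small demonstration using the first two runs of each tool listed above.
o3_sample = ["real 2m10.923s", "real 2m12.014s"]
newmm_sample = ["real 7m52.180s", "real 7m58.090s"]
print(f"speedup on the sample: {speedup(o3_sample, newmm_sample):.2f}")
```

To process the full logs, pass open file objects instead of the sample lists, for example `speedup(open("bench_o3.txt"), open("bench_newmm.txt"))`.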