Last active
December 26, 2015 20:00
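The Nexus5 is an oddball and never behaves as expected; md5p performs well in every scenario; parallelism = 4 is stable in every scenario; the Huawei Honor 3X should be an 8-core phone, so parallelism = 8 is always best on it; mainstream phones today are mostly quad-core, so md5 with parallelism = 4 is the best choice. Follow-up test: segmented mmap + parallel algorithms on large files. Open question: is the bottleneck the CPU or file I/O?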
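Android phone test data
Test file: 460MB
Source: see https://gist.github.com/9468305/97dca7c470ee02a6867c
Use case: the requirement comes from https://gist.github.com/9468305/fa8f1307ea4738225fca
Approach: measure the performance of various data digest algorithms on different phones using mmap, buffered file I/O reads, and different OpenMP parallelism degrees, then compare the results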
First, mmap the entire file into memory and hash directly over the 460MB mapped address range
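A minimal sketch of the whole-file mmap approach, assuming POSIX `open`/`fstat`/`mmap`. The gist's real source is linked above and is not reproduced here; to keep the sketch self-contained it hashes with 64-bit FNV-1a as a stand-in for md5/blake2, and `hash_file_mmap` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in digest: 64-bit FNV-1a, NOT md5/blake2; it only keeps the sketch self-contained. */
#define FNV_OFFSET 14695981039346656037ULL
static uint64_t fnv1a64(uint64_t h, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Map the whole file read-only and hash the mapped range in a single pass.
   Returns 0 on success, -1 on error. */
static int hash_file_mmap(const char *path, uint64_t *out) {
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;
    if (len == 0) {                 /* mmap() rejects a zero length */
        close(fd);
        *out = FNV_OFFSET;
        return 0;
    }
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (base == MAP_FAILED) return -1;
    *out = fnv1a64(FNV_OFFSET, (const unsigned char *)base, len);
    munmap(base, len);
    return 0;
}
```

Because the whole file is one contiguous range, the hash sees a single huge "update", which matters for the blake2sp behaviour analysed later in the log.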
blake2sp and blake2bp use a parallelism of 8, the default from the official site
Huawei Honor 3X, ndk-build, thumb2, gcc -O3, 2 runs
blake2s time = 15.230977 seconds, 15.149100 seconds
blake2b time = 23.496808 seconds, 23.273819 seconds
blake2sp time = 3.458435 seconds, 3.083542 seconds
blake2bp time = 6.946069 seconds, 7.100667 seconds
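Every result in this log is a `time = N seconds` line. The harness itself isn't shown, so this is only a guess at a comparable timer: a monotonic clock read around one run, printed in the same format. `now_seconds`, `time_run` and the `busy_sum` workload are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds from a monotonic clock, which is not affected by wall-clock adjustments. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run fn(arg) once and report elapsed wall-clock time in the log's format. */
static double time_run(const char *name, void (*fn)(void *), void *arg) {
    assert(name != NULL && fn != NULL);
    double start = now_seconds();
    fn(arg);
    double elapsed = now_seconds() - start;
    printf("%s time = %f seconds\n", name, elapsed);
    return elapsed;
}

/* Example workload: sum 0..n-1 into a volatile so the loop is not optimized away. */
static void busy_sum(void *arg) {
    unsigned long n = *(const unsigned long *)arg;
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++) acc += i;
}
```

Wall-clock time is the meaningful measure for the parallel variants: `clock()` reports processor time for the whole process, which on Linux adds up all threads and would make a parallel run look slower rather than faster.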
Huawei Honor 3X, ndk-build, thumb2, gcc -O2
run test on device
blake2s time = 15.519730 seconds
blake2b time = 22.998304 seconds
blake2sp time = 3.352492 seconds
blake2bp time = 6.810990 seconds
Huawei Honor 3X, ndk-build, thumb2, gcc -Os
blake2s time = 15.541925 seconds
blake2b time = 22.988108 seconds
blake2sp time = 3.389670 seconds
blake2bp time = 6.830233 seconds
Huawei Honor 3X, ndk-build, arm, gcc -O3
blake2s time = 15.135739 seconds
blake2b time = 23.559449 seconds
blake2sp time = 3.393177 seconds
blake2bp time = 6.834560 seconds
Analysis: thumb and arm perform the same; -O2, -O3 and -Os perform the same
Next test: the following runs use thumb -O3
Xiaomi Mi 4, thumb -O3, balanced mode (the screen may have turned off during the run, so the blake2bp figure is unreliable)
blake2s time = 11.874335 seconds
blake2b time = 31.524835 seconds
blake2sp time = 3.485516 seconds
blake2bp time = 20.172217 seconds
Xiaomi Mi 4, power-saving mode (CPU downclocked)
blake2s time = 21.074071 seconds
blake2b time = 48.200784 seconds
blake2sp time = 9.167828 seconds
blake2bp time = 23.293334 seconds
Xiaomi Mi 4, performance mode (CPU frequency not locked: it can scale up to the maximum dynamically, but does not always run at the highest frequency)
blake2s time = 6.652548 seconds
blake2b time = 20.712190 seconds
blake2sp time = 2.550314 seconds
blake2bp time = 6.108460 seconds
Nexus5, thumb2 (16/32-bit instruction set), 2 runs (the Nexus5 is rather odd: 32-bit CPU + 64-bit OS, with all kinds of anomalous behavior)
blake2s time = 6.078702 seconds, 6.857189 seconds
blake2b time = 19.730120 seconds, 19.689735 seconds
blake2sp time = 8.525794 seconds, 10.762844 seconds
blake2bp time = 21.908520 seconds, 23.456138 seconds
Nexus5, arm (32-bit instruction set)
blake2s time = 6.817943 seconds
blake2b time = 20.754325 seconds
blake2sp time = 10.346421 seconds
blake2bp time = 23.718929 seconds
Analysis: the parallelism degree may affect execution efficiency
Next test: change the parallelism of blake2sp and blake2bp to 4
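In the BLAKE2 reference code, BLAKE2sp's parallelism is the compile-time constant `PARALLELISM_DEGREE`, and `blake2sp_update` deals 64-byte blocks round-robin to that many BLAKE2s leaves. The sketch below only models that byte-to-leaf assignment (it does no hashing); the helper names are hypothetical. As far as the reference code goes, changing the degree changes both which bytes each leaf sees and the fanout recorded in the parameter block, so a degree-4 build produces digests that will not match standard degree-8 BLAKE2sp.

```c
#include <assert.h>
#include <stddef.h>

#define BLAKE2S_BLOCK 64  /* BLAKE2s block size in bytes */

/* Leaf that consumes the byte at `offset` when blocks are dealt round-robin
   across `degree` leaves: blocks 0..degree-1 go to leaves 0..degree-1, then repeat. */
static size_t leaf_of_offset(size_t offset, size_t degree) {
    assert(degree > 0);
    return (offset / BLAKE2S_BLOCK) % degree;
}

/* Number of message bytes leaf `leaf` receives for an input of `len` bytes,
   including its share of the final, partial stripe. */
static size_t leaf_bytes(size_t len, size_t degree, size_t leaf) {
    assert(degree > 0 && leaf < degree);
    size_t stripe = degree * BLAKE2S_BLOCK;          /* one block per leaf */
    size_t bytes = (len / stripe) * BLAKE2S_BLOCK;   /* full stripes */
    size_t rem = len % stripe;
    size_t start = leaf * BLAKE2S_BLOCK;             /* this leaf's slot in a stripe */
    if (rem > start) {
        size_t tail = rem - start;
        bytes += tail < BLAKE2S_BLOCK ? tail : BLAKE2S_BLOCK;
    }
    return bytes;
}
```

With degree 8, a 1636-byte input gives 256 bytes to leaf 0, 228 to leaf 1 and 192 to each of the others; every leaf touches data spread across the whole input rather than one contiguous range.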
Nexus5, 2 runs (the phone may have entered power-saving mode midway)
blake2s time = 10.197930 seconds, 10.092280 seconds
blake2b time = 15.541306 seconds, 15.102294 seconds
blake2sp time = 3.458573 seconds, 3.345040 seconds
blake2bp time = 4.747609 seconds, 4.683213 seconds
Xiaomi Mi 4, 2 runs
blake2s time = 6.572716 seconds, 6.555986 seconds
blake2b time = 20.772833 seconds, 20.788557 seconds
blake2sp time = 2.553117 seconds, 2.616197 seconds
blake2bp time = 6.034184 seconds, 6.170549 seconds
Nexus5, 2 runs (screen on)
blake2s time = 5.844344 seconds, 5.960309 seconds
blake2b time = 19.377594 seconds, 19.664046 seconds
blake2sp time = 7.031088 seconds, 7.153667 seconds
blake2bp time = 21.665652 seconds, 21.794273 seconds
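Analysis: blake2b and blake2bp perform too poorly on a 32-bit OS; the parallelism degree affects the results
Next test: drop blake2b and blake2bp, and observe blake2s and blake2sp with parallelism = 2
Xiaomi Mi 4
blake2s time = 6.574686 seconds
blake2sp time = 3.746227 seconds
Huawei Honor 3X
blake2s time = 10.452700 seconds
blake2sp time = 6.534018 seconds
Nexus5
blake2s time = 6.045295 seconds
blake2sp time = 6.597525 seconds
Analysis: on the Xiaomi Mi 4 and Huawei Honor 3X, parallelism of 2, 4 or 8 makes little difference; on the Nexus5, the lower the parallelism the better the performance, yet it still trails the other phones (even though its CPU is not weak)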
Next test: read the file with C file I/O using an 8KB buffer instead of mmap
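A minimal sketch of the buffered-read variant, assuming plain C stdio: `fread` fills a fixed-size buffer and each chunk is fed to a streaming hash, one update per buffer. `hash_file_fread` is a hypothetical name and FNV-1a stands in for blake2s/blake2sp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in digest: 64-bit FNV-1a, NOT blake2; streaming, so chunking does not change it. */
#define FNV_OFFSET 14695981039346656037ULL
static uint64_t fnv1a64(uint64_t h, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash a file with stdio reads of buf_size bytes per call (8 KB in the test above). */
static int hash_file_fread(const char *path, size_t buf_size, uint64_t *out) {
    assert(out != NULL);
    if (buf_size == 0) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char *buf = malloc(buf_size);
    if (!buf) { fclose(f); return -1; }
    uint64_t h = FNV_OFFSET;
    size_t n;
    while ((n = fread(buf, 1, buf_size, f)) > 0)
        h = fnv1a64(h, buf, n);          /* one "update" per filled buffer */
    int err = ferror(f);
    free(buf);
    fclose(f);
    if (err) return -1;
    *out = h;
    return 0;
}
```

For a serial hash the buffer size only changes how many update calls are made, not the digest. For blake2sp it also decides how often the parallel section is entered, which the analysis below identifies as the cause of the parallel slowdown.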
Huawei Honor 3X, 2 runs
blake2s time = 10.288910 seconds, 10.249259 seconds
blake2sp time = 4.657944 seconds, 4.876143 seconds
Nexus5, 2 runs
blake2s time = 5.689329 seconds, 6.569514 seconds
blake2sp time = 38.843955 seconds, 36.973709 seconds
Xiaomi Mi 4
blake2s time = 7.024298 seconds, 6.848578 seconds
blake2sp time = 5.210849 seconds, 5.023147 seconds
Xiaomi Mi 4, io buffer = 16KB: no different from 8KB
blake2s time = 6.938758 seconds, 6.939108 seconds
blake2sp time = 11.459279 seconds, 22.057534 seconds (the screen turned off midway and the CPU was downclocked)
Xiaomi Mi 4, io buffer = 1 byte: the blake2sp code does not validate its size argument, and parallel hashing of 1-byte inputs fails
blake2s time = 24.402502 seconds
blake2sp failed: infinite loop
Xiaomi Mi 4, io buffer = 32KB
blake2s time = 6.626112 seconds, 6.779171 seconds (no different from 8KB)
blake2sp time = 4.760458 seconds, 4.642409 seconds (now a difference appears)
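Analysis: the io buffer size has little effect on serial hashing but a large effect on parallel hashing.
Reading the blake2sp source shows that it parallelizes the buffer passed to each update call, so the smaller the buffer, the more often threads are spun up and the more that overhead costs, which makes it slower.
Mapping the whole file with mmap and hashing it in one call sidesteps this code flaw.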
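The analysis above suggests a workaround for callers that cannot use mmap: accumulate the small read buffers into a large staging area and enter the parallel routine only when it is full. This is a hypothetical wrapper, not part of the BLAKE2 API; `parallel_update` is a stand-in that merely counts calls in place of a real parallel hash update, and the 64 KiB stage size is an arbitrary assumption. Since a streaming hash yields the same digest however its input is chunked, staging changes the cost but not the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_BYTES (64u * 1024u)   /* hypothetical: give the parallel hash >= 64 KiB per call */

/* Stand-in for an expensive parallel update (e.g. one parallel region per call).
   Here it only records how often it was entered and how many bytes it saw. */
typedef struct {
    size_t calls;
    size_t bytes;
} parallel_stats;

static void parallel_update(parallel_stats *ps, const unsigned char *p, size_t n) {
    (void)p;
    ps->calls++;
    ps->bytes += n;
}

typedef struct {
    unsigned char buf[STAGE_BYTES];
    size_t used;
    parallel_stats stats;
} staged_state;

/* Copy caller data into the stage; enter the parallel routine only on a full stage. */
static void staged_update(staged_state *s, const unsigned char *p, size_t n) {
    while (n > 0) {
        size_t room = STAGE_BYTES - s->used;
        size_t take = n < room ? n : room;
        memcpy(s->buf + s->used, p, take);
        s->used += take;
        p += take;
        n -= take;
        if (s->used == STAGE_BYTES) {
            parallel_update(&s->stats, s->buf, s->used);
            s->used = 0;
        }
    }
}

/* Hand over whatever is left (a final, partial stage). */
static void staged_flush(staged_state *s) {
    if (s->used > 0) {
        parallel_update(&s->stats, s->buf, s->used);
        s->used = 0;
    }
}
```

Fed 200000 one-byte updates, the parallel routine is entered only 4 times instead of 200000. The price is an extra memcpy per byte, which the mmap route avoids.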
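Next test: a new parallel scheme that splits the whole file into as many segments as the parallelism degree, reading each segment through its own file I/O buffer
blake2sp_file source: https://gist.github.com/9468305/20068a1af16910361278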
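The linked gist has the actual blake2sp_file code; this is only a sketch of the scheme as described: split the file into `degree` contiguous segments, hash each with its own buffer via `pread`, then hash the per-segment digests in order. FNV-1a stands in for BLAKE2s/MD5, `hash_file_segments` is hypothetical, and how the real code combines segment digests is an assumption. Judging by its description as a file-parallel md5 scheme, md5p presumably follows the same shape. Compiled with `-fopenmp`, segments run on separate threads; without it the pragma is ignored and the loop runs serially with the same result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in digest: 64-bit FNV-1a, NOT blake2s/md5. */
#define FNV_OFFSET 14695981039346656037ULL
static uint64_t fnv1a64(uint64_t h, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

#define MAX_SEGMENTS 64
#define SEG_BUF 8192   /* per-segment read buffer, 8 KB as in the log */

/* Hypothetical blake2sp_file-style hash: `degree` contiguous segments, each read
   with pread() into its own buffer, then the segment digests hashed in order.
   The result depends on `degree` and differs from a serial hash of the file. */
static int hash_file_segments(const char *path, int degree, uint64_t *out) {
    assert(out != NULL);
    if (degree < 1 || degree > MAX_SEGMENTS) return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    long long size = (long long)st.st_size;   /* 64-bit math: size * i overflows 32 bits */
    uint64_t seg_hash[MAX_SEGMENTS];
    int failed = 0;

    #pragma omp parallel for num_threads(degree) reduction(|:failed)
    for (int i = 0; i < degree; i++) {
        off_t pos = (off_t)(size * i / degree);
        off_t end = (off_t)(size * (i + 1) / degree);
        unsigned char buf[SEG_BUF];
        uint64_t h = FNV_OFFSET;
        while (pos < end) {
            off_t remain = end - pos;
            size_t want = remain < SEG_BUF ? (size_t)remain : SEG_BUF;
            ssize_t n = pread(fd, buf, want, pos);   /* positional: no shared file offset */
            if (n <= 0) { failed = 1; break; }
            h = fnv1a64(h, buf, (size_t)n);
            pos += n;
        }
        seg_hash[i] = h;
    }
    close(fd);
    if (failed) return -1;
    /* Combine: hash the segment digests (raw bytes, host byte order) in segment order. */
    *out = fnv1a64(FNV_OFFSET, (const unsigned char *)seg_hash,
                   (size_t)degree * sizeof seg_hash[0]);
    return 0;
}
```

Each thread does one large parallel "update" worth of work per segment, so the per-update overhead identified above is paid `degree` times in total rather than once per 8KB read. Note that the digest depends on `degree`, so any stored digest must record the degree it was computed with.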
Xiaomi Mi 4, 4 runs
blake2s time = 6.881021 seconds, 6.881021 seconds, 6.890889 seconds, 6.816782 seconds
blake2sp time = 4.752358 seconds, 4.752358 seconds, 6.570885 seconds, 4.864570 seconds
blake2sp_file time = 2.652560 seconds, 2.652560 seconds, 2.652320 seconds, 2.791247 seconds
Added standard md5 (serial)
md5 time = 4.476925 seconds, 4.487960 seconds
Huawei Honor 3X, 2 runs
blake2s time = 11.003261 seconds, 11.111040 seconds
blake2sp time = 7.405163 seconds, 6.993986 seconds
blake2sp_file time = 3.407042 seconds, 4.195505 seconds
md5 time = 3.657673 seconds, 3.724206 seconds
Nexus5
blake2s time = 5.720441 seconds
blake2sp time = 30.024116 seconds
blake2sp_file time = 7.610792 seconds
md5 time = 3.173777 seconds
Analysis: serial md5 beats all blake2 variants; blake2sp_file beats blake2sp
Next test: add standard serial md5 and a file-parallel md5 scheme and observe performance; start with md5 parallelism = 8
Added the parallel MD5 scheme md5p; source: https://gist.github.com/9468305/97dca7c470ee02a6867c
Huawei Honor 3X
blake2s time = 10.355605 seconds, 10.297682 seconds
blake2sp time = 4.699469 seconds, 4.719653 seconds
blake2sp_file time = 3.587286 seconds, 3.332196 seconds
md5 time = 3.927048 seconds, 3.923544 seconds
md5p time = 1.160357 seconds, 1.052565 seconds
Nexus5
blake2s time = 5.444591 seconds, 5.516199 seconds
blake2sp time = 34.665423 seconds, 36.844409 seconds
blake2sp_file time = 7.044548 seconds, 7.199510 seconds
md5 time = 2.865387 seconds, 3.132674 seconds
md5p time = 3.096923 seconds, 3.492756 seconds
Xiaomi Mi 2S, balanced mode
blake2s time = 10.639981 seconds
blake2sp time = 11.367557 seconds
blake2sp_file time = 4.067023 seconds
md5 time = 7.117229 seconds
md5p time = 3.617916 seconds
Xiaomi Mi 2S, performance mode
blake2s time = 11.341095 seconds
blake2sp time = 7.885427 seconds
blake2sp_file time = 3.922051 seconds
md5 time = 7.437327 seconds
md5p time = 3.234458 seconds
Huawei Honor 3X
blake2s time = 11.012500 seconds
blake2sp time = 6.982329 seconds
blake2sp_file time = 3.445275 seconds
md5 time = 3.889710 seconds
md5p time = 1.171731 seconds
Huawei Honor 3X, md5 parallelism = 8
blake2s time = 10.212562 seconds
blake2sp time = 4.839898 seconds
blake2sp_file time = 3.537396 seconds
md5 time = 3.851837 seconds
md5p time = 0.819049 seconds
Huawei Honor 3X, parallelism = 8 for all algorithms
blake2s time = 10.210714 seconds
blake2sp time = 5.407414 seconds
blake2sp_file time = 1.791010 seconds
md5 time = 3.841578 seconds
md5p time = 0.606765 seconds
Xiaomi Mi 4, parallelism = 8
blake2s time = 7.481798 seconds, 6.739367 seconds
blake2sp time = 10.937392 seconds, 10.036826 seconds
blake2sp_file time = 2.343030 seconds, 2.327568 seconds
md5 time = 4.670945 seconds, 4.498684 seconds
md5p time = 1.817163 seconds, 1.622922 seconds
Xiaomi Mi 4, parallelism = 4
blake2s time = 6.784404 seconds, 6.810516 seconds
blake2sp time = 10.230556 seconds, 9.958661 seconds
blake2sp_file time = 2.353576 seconds, 2.342131 seconds
md5 time = 4.474723 seconds, 4.493030 seconds
md5p time = 1.787866 seconds, 1.813598 seconds
Nexus5, md5 parallelism = 4
blake2s time = 5.591858 seconds, 5.504601 seconds
blake2sp time = 20.066783 seconds, 20.679386 seconds
blake2sp_file time = 7.147342 seconds, 7.157790 seconds
md5 time = 2.969569 seconds, 2.886641 seconds
md5p time = 3.274280 seconds, 3.064426 seconds
Nexus5, md5 parallelism = 8
blake2s time = 6.177822 seconds
blake2sp time = 19.658719 seconds
blake2sp_file time = 7.498131 seconds
md5 time = 3.202704 seconds
md5p time = 3.490773 seconds
Analysis: blake2sp_file beats blake2sp; md5p beats md5; with the segmented file-read scheme, the parallelism degree no longer matters much
Next test: drop blake2sp
Nexus5
blake2s time = 5.575001 seconds
blake2sp_file time = 7.127042 seconds
md5 time = 2.874951 seconds
md5p time = 3.333800 seconds
Huawei Honor 3X, md5p parallelism = 4
blake2s time = 11.114438 seconds, 10.220039 seconds
blake2sp_file time = 2.727083 seconds, 1.895031 seconds
md5 time = 3.851460 seconds, 3.850437 seconds
md5p time = 1.082791 seconds, 0.995741 seconds
Huawei Honor 3X, md5p parallelism = 8
blake2s time = 10.171139 seconds, 10.433916 seconds
blake2sp_file time = 1.989670 seconds, 2.010275 seconds
md5 time = 3.873485 seconds, 3.869215 seconds
md5p time = 0.618082 seconds, 0.664903 seconds
Xiaomi Mi 4, md5p parallelism = 8
blake2s time = 6.755866 seconds, 6.765623 seconds
blake2sp_file time = 2.260821 seconds, 2.245780 seconds
md5 time = 4.471778 seconds, 4.477018 seconds
md5p time = 1.802037 seconds, 1.611127 seconds
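Summary:
The Nexus5 is an oddball and never behaves as expected;
md5p performs well in every scenario;
parallelism = 4 is stable in every scenario;
the Huawei Honor 3X should be an 8-core phone, so parallelism = 8 is always best on it;
mainstream phones today are mostly quad-core, so md5 with parallelism = 4 is the best choice;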
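The summary's rule of thumb (4 for mainstream quad-core phones, 8 on the 8-core Honor 3X) could be a runtime choice instead of a constant. A hypothetical helper, assuming `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is on Linux and Android libcs, though POSIX does not require it), with 4 as the fallback the log recommends:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical policy: one segment per online core, clamped to [1, max_degree];
   falls back to 4 (the log's recommendation) if the core count is unavailable. */
static int pick_degree(int max_degree) {
    assert(max_degree >= 1);
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1) cores = 4;
    if (cores > max_degree) cores = max_degree;
    return (int)cores;
}
```

On phones that take cores offline to save power, the online count can be lower than the number of cores present; `_SC_NPROCESSORS_CONF` reports the configured count if that is the better fit.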
Follow-up tests:
Test segmented mmap of large files combined with the parallel algorithms
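A sketch of the planned segmented mmap, assuming each thread maps only its own segment instead of the whole 460MB file (a single 460MB mapping in a 32-bit process needs one large contiguous stretch of free address space, and may fail). `hash_range_mmap` is hypothetical and FNV-1a again stands in for the real digest. The non-obvious detail is alignment: mmap offsets generally must be a multiple of the page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in digest: 64-bit FNV-1a, NOT md5/blake2. */
#define FNV_OFFSET 14695981039346656037ULL
static uint64_t fnv1a64(uint64_t h, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash bytes [begin, end) of an open file by mapping only that window.
   The window starts at the page boundary at or below `begin`, and the
   leading slack bytes are skipped. */
static int hash_range_mmap(int fd, off_t begin, off_t end, uint64_t *out) {
    assert(out != NULL && begin >= 0);
    if (end <= begin) { *out = FNV_OFFSET; return 0; }
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    off_t aligned = begin - begin % page;          /* page-aligned mmap offset */
    size_t slack = (size_t)(begin - aligned);
    size_t len = (size_t)(end - aligned);
    void *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED) return -1;
    *out = fnv1a64(FNV_OFFSET, (const unsigned char *)base + slack, (size_t)(end - begin));
    munmap(base, len);
    return 0;
}
```

Each worker in a segmented scheme would call this on its own `[begin, end)`, so only one segment's worth of address space is mapped per thread.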
Open question:
Is the bottleneck the CPU or file I/O?
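One way to probe this question is to time the two halves separately on the same file: a pass that only reads, and a pass that only hashes bytes already in memory. This is a hypothetical harness (`measure_split`, `split_timing`) with FNV-1a standing in for md5/blake2, so absolute hashing times will not match theirs. A caveat: after the first run the file is usually in the page cache, so the read pass then measures copying from memory more than flash I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in digest: 64-bit FNV-1a, NOT md5/blake2. */
#define FNV_OFFSET 14695981039346656037ULL
static uint64_t fnv1a64(uint64_t h, const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static volatile uint64_t g_sink;   /* keeps the hash loop from being optimized away */

typedef struct {
    double io_s;    /* time spent only reading the file */
    double cpu_s;   /* time spent only hashing the same number of bytes in memory */
    size_t bytes;   /* bytes read */
} split_timing;

static int measure_split(const char *path, size_t buf_size, split_timing *t) {
    assert(t != NULL);
    if (buf_size == 0) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char *buf = malloc(buf_size);
    if (!buf) { fclose(f); return -1; }

    size_t n, total = 0;
    double t0 = now_seconds();
    while ((n = fread(buf, 1, buf_size, f)) > 0) total += n;   /* I/O only */
    double t1 = now_seconds();
    int err = ferror(f);
    fclose(f);

    /* CPU only: re-hash the resident buffer until `total` bytes are covered.
       Every hashed byte was written by fread above; its value does not affect speed. */
    uint64_t h = FNV_OFFSET;
    double t2 = now_seconds();
    for (size_t left = total; left > 0;) {
        size_t chunk = left < buf_size ? left : buf_size;
        h = fnv1a64(h, buf, chunk);
        left -= chunk;
    }
    double t3 = now_seconds();
    g_sink = h;
    free(buf);
    if (err) return -1;
    t->io_s = t1 - t0;
    t->cpu_s = t3 - t2;
    t->bytes = total;
    return 0;
}
```

If `io_s` is small next to `cpu_s`, hashing dominates and more threads can help, as md5p did on the 8-core phone; if `io_s` dominates, extra threads are unlikely to help much. CPU frequency scaling, visible in the power-saving runs above, affects both numbers.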