admin c93614

Multi-keyword exact-match text search (mss - Multi-String Search)

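The basic requirement is to match keywords against a given block of text. There are multiple keywords to match, possibly a very large number of them (thousands or tens of thousands), and the output is the position of each matched keyword. This can be used for keyword highlighting, and also for sensitive-word filtering.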

For reference, see the Linux command-line tool fgrep (equivalent to grep -F).

Input: a block of text. Output: the positions in that text where keywords were matched.
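The text does not say which algorithm mss uses. A standard way to match thousands of keywords in a single pass is the Aho-Corasick automaton: build a trie of the keywords, add failure links so the scan never backs up in the text, and report every keyword that ends at each position. The Python sketch that follows illustrates that technique only; the class and method names are hypothetical, and it assumes positions are character offsets into the text (not byte offsets).

```python
from collections import deque


class AhoCorasick:
    """Aho-Corasick automaton for exact matching of many keywords at once."""

    def __init__(self, keywords):
        self.goto = [{}]   # goto[state][char] -> child state (the keyword trie)
        self.fail = [0]    # fail[state] -> state of the longest proper suffix that is in the trie
        self.out = [[]]    # out[state] -> keywords that end when this state is reached
        for kw in keywords:
            self._add(kw)
        self._build_failure_links()

    def _add(self, kw):
        if not kw:
            return  # an empty keyword would match at every position; ignore it
        state = 0
        for ch in kw:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        if kw not in self.out[state]:
            self.out[state].append(kw)

    def _build_failure_links(self):
        # Breadth-first, so every shallower state is finished before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Keywords that are suffixes of this path also end here.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_offset, keyword) for every occurrence, overlaps included."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for kw in self.out[state]:
                hits.append((i - len(kw) + 1, kw))
        return hits
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).search("ushers")` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. Building the automaton is linear in the total keyword length and the scan is linear in the text length plus the number of matches, which is why this approach scales to very large keyword lists. Overlapping matches are all reported; a highlighter would still need to decide how to merge them.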

For example, the keyword list:

Visit the OpenWrt [wiki page][wr703n-openwrt] for the WR703N, find the download link [squashfs-factory.bin][flash.bin] in the Flashing section, and after downloading, don't forget to [check the md5][md5sum].^[1]
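On Linux the usual tool for this check is md5sum. As a portable alternative, this Python sketch hashes the downloaded image in chunks and compares the result with the checksum published next to the download. The helper names are hypothetical, and the file name and digest in the usage line are placeholders.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks so it is never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """True if the file matches the published checksum (surrounding whitespace and case ignored)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Usage: `verify_md5("squashfs-factory.bin", "<checksum from the download page>")`; flash the image only if it returns `True`.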

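Open the router's management interface (the factory default address is http://192.168.1.1, and both the username and password are admin), go to the firmware update page, select the downloaded file, and start the update.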

	<?php

	/**
	* Sends statistics to the stats daemon over UDP
	*
	**/

	class StatsD {
		// ... (class body not included in this excerpt)
	}
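The excerpt stops before the class body. To illustrate what "sends statistics to the stats daemon over UDP" involves, here is a minimal Python sketch that uses the plain-text StatsD line format (`name:value|type`, with an optional `|@rate` suffix for sampled events) and the conventional StatsD port 8125. The function `statsd_send` and its parameters are hypothetical and do not describe the PHP class's API.

```python
import random
import socket


def statsd_send(metric, value, metric_type="c", host="127.0.0.1",
                port=8125, sample_rate=1.0):
    """Send one metric as a StatsD line ("name:value|type") over UDP.

    metric_type is e.g. "c" (counter), "ms" (timer) or "g" (gauge).
    Returns the bytes sent, or None if the event was dropped by sampling.
    """
    if sample_rate < 1.0:
        if random.random() >= sample_rate:
            return None  # client-side sampling: skip this event
        # Tell the daemon the rate so it can scale counters back up.
        line = f"{metric}:{value}|{metric_type}|@{sample_rate}"
    else:
        line = f"{metric}:{value}|{metric_type}"
    data = line.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Fire-and-forget: UDP gives no delivery confirmation.
        sock.sendto(data, (host, port))
    return data
```

UDP is the usual choice here because a lost metric packet is cheaper than blocking the application on a slow or unavailable stats daemon.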

These weights are often combined into a tf-idf value, simply by multiplying them together. The best-scoring words under tf-idf are uncommon ones that are repeated many times in the text, which led early web search engines to be vulnerable to pages stuffed with repeated terms to trick the search engines into ranking them highly for those keywords. For that reason, more complex weighting schemes are generally used, but tf-idf is still a good first step, especially for systems where no one is trying to game the system.

There are a lot of variations on the basic tf-idf idea, but a straightforward implementation might look like:

	<?php
	$tfidf = $term_frequency *                                // tf
	    log($total_document_count / $documents_with_term, 2); // idf
	?>

It's worth repeating that the IDF is the total document count over the count of the documents containing the term. So, if there were 50 documents in the collection, and two of them contained the term in question, the IDF would be 50/2 = 25. To be accurate, we s
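A small Python sketch of the same straightforward formula (raw term count times log2 of total documents over documents containing the term), applied to a toy corpus. Lowercasing and whitespace tokenization are simplifying assumptions, and the function name `tf_idf_scores` is hypothetical. With the log applied, a term found in 2 of 50 documents gets idf = log2(50/2) = log2(25) ≈ 4.64.

```python
import math
from collections import Counter


def tf_idf_scores(documents):
    """Return one {term: tf-idf} dict per document.

    tf  = raw count of the term in the document
    idf = log2(total documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]  # naive whitespace tokens
    total = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term at most once per document
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: n * math.log2(total / doc_freq[t]) for t, n in tf.items()})
    return scores
```

For `["apple banana apple", "banana cherry", "banana"]`, "banana" appears in every document, so its idf is log2(3/3) = 0 and it scores 0 everywhere, while "apple" in the first document scores 2 × log2(3).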

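	# Run the test suite with coverage for the name_utils package
	test:
		clear
		nosetests --with-coverage --cover-package name_utils test_name_utils.py

	# Remove compiled Python files and editor backup files
	clean:
		find . -regex '.*\.pyc' -exec rm {} \;
		find . -regex '.*~' -exec rm {} \;

	.PHONY: test clean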

	<?php
	/**
	* CIDR.php
	*
	* Utility Functions for IPv4 ip addresses.
	* Supports PHP 5.3+ (32 & 64 bit)
	* @author Jonavon Wilcox
	* @revision Carlos Guimarães
	* @version Wed Mar 12 13:00:00 EDT 2014
	*/
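The excerpt shows only the file header of CIDR.php. As an illustration of the kind of IPv4 utilities it describes, here is a Python sketch of the core mask arithmetic on 32-bit integers: prefix length to netmask, membership test, and first/last address of a block. The standard `ipaddress` module is used only to parse and format dotted quads; the function names are hypothetical and do not mirror CIDR.php's API.

```python
import ipaddress

MASK32 = 0xFFFFFFFF


def prefix_to_mask(prefix):
    """Netmask for a prefix length 0-32 as a 32-bit integer (24 -> 0xFFFFFF00)."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return (MASK32 << (32 - prefix)) & MASK32


def parse_cidr(cidr):
    """Split 'a.b.c.d/n' into (address as int, prefix); a bare address means /32."""
    addr, _, prefix = cidr.partition("/")
    return int(ipaddress.IPv4Address(addr)), (int(prefix) if prefix else 32)


def ip_in_cidr(ip, cidr):
    """True if ip falls inside the CIDR block."""
    base, prefix = parse_cidr(cidr)
    mask = prefix_to_mask(prefix)
    return (int(ipaddress.IPv4Address(ip)) & mask) == (base & mask)


def cidr_range(cidr):
    """(first, last) address of the block as dotted-quad strings."""
    base, prefix = parse_cidr(cidr)
    mask = prefix_to_mask(prefix)
    first = base & mask
    last = first | (~mask & MASK32)  # set all host bits
    return str(ipaddress.IPv4Address(first)), str(ipaddress.IPv4Address(last))
```

Because Python integers are arbitrary precision, the 32-bit versus 64-bit concern mentioned in the header does not arise here; masking with `MASK32` keeps every value in the unsigned 32-bit range.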


	require 'eventmachine'
	require 'em-http-request'

	# Reference:
	# https://github.com/igrigorik/em-http-request/wiki/Parallel-Requests
	# http://rdoc.info/github/eventmachine/eventmachine/master/EventMachine/Iterator

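	urls = ['http://www.google.com', 'http://www.cloudamqp.com']
	EventMachine.run do
		# Fetch at most 2 URLs at a time; iter.next hands the slot to the next URL
		EM::Iterator.new(urls, 2).each(proc { |url, iter|
			http = EventMachine::HttpRequest.new(url).get
			http.callback { puts "#{url}: #{http.response_header.status}"; iter.next }
			http.errback { puts "#{url}: request failed"; iter.next }
		}, proc { EventMachine.stop }) # runs once every URL has been handled
	end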

	var parser = document.createElement('a');
	parser.href = "http://example.com:3000/pathname/?search=test#hash";

	parser.protocol; // => "http:"
	parser.hostname; // => "example.com"
	parser.port; // => "3000"
	parser.pathname; // => "/pathname/"
	parser.search; // => "?search=test"
	parser.hash; // => "#hash"
	parser.host; // => "example.com:3000"
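The anchor-element trick only works in a browser. For comparison, this Python sketch maps the same URL onto the same property names using `urllib.parse.urlsplit`. The helper `describe_url` is hypothetical, and unlike a browser it does not drop default ports (such as 80 for http); `host` comes from the network location, which would also include any `user:password@` prefix.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Mimic the anchor element's URL properties with urllib.parse.urlsplit."""
    p = urlsplit(url)
    return {
        "protocol": p.scheme + ":",  # the anchor keeps the trailing colon
        "hostname": p.hostname or "",
        "port": str(p.port) if p.port is not None else "",
        "pathname": p.path or "/",
        "search": "?" + p.query if p.query else "",
        "hash": "#" + p.fragment if p.fragment else "",
        "host": p.netloc,
    }
```

Calling `describe_url("http://example.com:3000/pathname/?search=test#hash")` yields the same strings as the browser example, e.g. `"protocol": "http:"` and `"host": "example.com:3000"`.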

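	var express = require('express');
	var http = require('http');
	var fs = require('fs');

	var app = express();                    // replaces the old Express 2.x createServer()
	var server = http.createServer(app);
	var io = require('socket.io')(server);  // attach Socket.IO to the HTTP server

	server.listen(8008);

	// routing
	app.get('/', function (req, res) {
		res.sendFile(__dirname + '/chat.html');  // sendfile() is deprecated in Express 4
	});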