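The basic requirement is keyword matching against a given block of text. There are multiple keywords to match, possibly a great many (thousands or tens of thousands), and the output is the position of each matched keyword. This can be used for keyword highlighting, and of course also for sensitive-word filtering.
See the Linux command-line tool fgrep.
Input: a piece of text. Output: the positions of the matched keywords within the input text.
For example, the keyword list: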
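The requirement above (many fixed keywords, one text, report every position) is commonly met with an Aho-Corasick automaton, which scans the text once regardless of how many keywords there are. The source does not say which algorithm it uses, so the following is only a minimal sketch under that assumption. The names `build_automaton` and `find_keywords` are hypothetical, and positions are assumed to be 0-based character offsets.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie edges, failure links and output lists (Aho-Corasick)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> keywords that end when this state is reached

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        if not word:
            continue  # an empty keyword would match everywhere; skip it
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the suffix state as well.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, keywords):
    """Return (start_offset, keyword) for every occurrence in text."""
    goto, fail, out = build_automaton(keywords)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or we hit the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches


if __name__ == "__main__":
    for start, word in find_keywords("ushers", ["he", "she", "his", "hers"]):
        print(start, word)
```

Overlapping matches are all reported, which suits highlighting. A sensitive-word filter would more likely merge the overlapping ranges before masking them.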
<?php
/**
 * Sends statistics to the stats daemon over UDP
 *
 **/
class StatsD {
    /**
These weights are often combined into a tf-idf value, simply by multiplying them together. The best scoring words under tf-idf are uncommon ones which are repeated many times in the text, which led early web search engines to be vulnerable to pages being stuffed with repeated terms to trick the search engines into ranking them highly for those keywords. For that reason, more complex weighting schemes are generally used, but tf-idf is still a good first step, especially for systems where no one is trying to game the system.
There are a lot of variations on the basic tf-idf idea, but a straightforward implementation might look like:
<?php
$tfidf = $term_frequency * // tf
    log( $total_document_count / $documents_with_term, 2); // idf
?>
It's worth repeating that the IDF is the total document count over the count of the ones containing the term. So, if there were 50 documents in the collection, and two of them contained the term in question, the IDF would be 50/2 = 25. To be accurate, we s
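As a rough cross-check of the PHP fragment above, the same calculation can be sketched in Python. The function name `tf_idf` and the error handling are assumptions added for illustration. The base-2 logarithm mirrors the `log(..., 2)` call in the PHP code.

```python
import math


def tf_idf(term_frequency, total_document_count, documents_with_term):
    """Term frequency times the base-2 log of the document-count ratio."""
    if documents_with_term <= 0:
        # Assumption: a term that appears in no document has no defined IDF.
        raise ValueError("documents_with_term must be positive")
    idf = math.log2(total_document_count / documents_with_term)
    return term_frequency * idf


if __name__ == "__main__":
    # The 50-document example from the text: the ratio is 50 / 2 = 25,
    # and the logarithm is applied to that ratio.
    print(tf_idf(3, 50, 2))
```

A term that appears in every document gets a weight of zero here, since the logarithm of 1 is 0.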
<?php
function http_get($host) {
    // create event base
    $base_fd = event_base_new();
    // create a new event
    $event_fd = event_new();
    // resource to be monitored
test:
	clear
	nosetests --with-coverage --cover-package name_utils test_name_utils.py
clean:
	find . -regex '.*\.pyc' -exec rm {} \;
	find . -regex '.*~' -exec rm {} \;
.PHONY: test clean
<?php
/**
 * CIDR.php
 *
 * Utility Functions for IPv4 ip addresses.
 * Supports PHP 5.3+ (32 & 64 bit)
 * @author Jonavon Wilcox <[email protected]>
 * @revision Carlos Guimarães <[email protected]>
 * @version Wed Mar 12 13:00:00 EDT 2014
 */
require 'eventmachine'
require 'em-http-request'
# Reference:
# https://github.com/igrigorik/em-http-request/wiki/Parallel-Requests
# http://rdoc.info/github/eventmachine/eventmachine/master/EventMachine/Iterator
urls = ['http://www.google.com', 'http://www.cloudamqp.com']
var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";
parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"
var app = require('express').createServer();
var io = require('socket.io').listen(app);
var fs = require('fs');
app.listen(8008);
// routing
app.get('/', function (req, res) {
    res.sendfile(__dirname + '/chat.html');
});