# clf = sklearn.linear_model.LogisticRegression
# significant_terms = set of terms appearing more than n times in training
def classify(left_name, right_name):
    """
    Classifies names using delta term analysis.

    :return:
        A tuple (p_is_duplicate, exact_match_rare_terms, one_side_rare_terms).
        * p_is_duplicate is the score from the log-linear classifier. It's
          probably the most relevant signal.
require 'date'
require 'nokogiri'
require 'rest-client'
require 'reverse_markdown'
# Match [caption <stuff>]...[/caption] tags
# example: http://rubular.com/r/r2FH3QSOpL
CAPTION_REGEX = /\[caption.*\](?=.*\[)|\[\/caption\]/
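The snippet above is cut off before it shows how `CAPTION_REGEX` is used. As a minimal sketch, assuming the intent is to strip WordPress-style caption shortcodes while keeping the text between them, the pattern could be applied with `String#gsub`. The helper name `strip_captions` and the sample string are hypothetical and not part of the original gist.

```ruby
# Same pattern as in the snippet above: matches an opening [caption ...] tag
# (only when another "[" follows later on the same line) or a closing
# [/caption] tag.
CAPTION_REGEX = /\[caption.*\](?=.*\[)|\[\/caption\]/

# Hypothetical helper (an assumption, not from the original gist):
# removes the caption tags and leaves the enclosed text in place.
def strip_captions(text)
  text.gsub(CAPTION_REGEX, '')
end

# Made-up sample input for illustration only.
sample = '[caption id="attachment_1" width="300"]A photo of the bay[/caption]'
puts strip_captions(sample)
```

Because `.` does not match newlines by default in Ruby, an opening tag is only removed when its closing tag (or some other `[`) appears later on the same line; a caption that spans several lines would likely need a different pattern.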
{
  "location": {
    "city": "some_city",
    "state": "ABC"
  }
}
ruby|ruby ⇒ ruby sample.rb lookup --business-id=tropisueño-san-francisco-3
Found business with id yelp-san-francisco:
{
  "categories": [
    {
      "alias": "localflavor",
      "title": "Local Flavor"
    },
    {
      "alias": "massmedia",