Maybe treper

  • Shanghai
// main.cpp : Defines the entry point for the console application.
/*
g++ step1-ocr.cpp -I /home/ataosky/software/cvblobs8.3_linux -L /home/ataosky/software/cvblobs8.3_linux -lopencv_core -lopencv_ml -lopencv_imgproc -lopencv_highgui -lblob -o step2-ocr
*/
//
//#include "stdafx.h" //AO
#include "opencv/cv.h"
#include "opencv/highgui.h"
#include "opencv/ml.h"

##### lshkit data format and load code

void Matrix<T>::load (std::istream &is)
{
    unsigned header[3]; /* entry size, row, col */
    assert(sizeof header == 3*4);
    is.read((char *)header, sizeof header);
    BOOST_VERIFY(is);
    BOOST_VERIFY(header[0] == sizeof(T));
    reset(header[2], header[1]);
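
The header layout read by the load code above (three unsigned values: entry size, row count, column count) can be sketched as a small write/read round trip. This is only an illustration, not lshkit's own code: `write_matrix` and `read_matrix` are hypothetical helper names, the element type is fixed to `float`, and the row-major payload that follows the header is an assumption, since the excerpt stops before the data is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: write a 3-word header {entry size, rows, cols}
// followed by the raw float payload (assumed row-major).
void write_matrix(std::ostream &os, const std::vector<float> &data,
                  std::uint32_t rows, std::uint32_t cols) {
    assert(data.size() == static_cast<std::size_t>(rows) * cols);
    const std::uint32_t header[3] = {
        static_cast<std::uint32_t>(sizeof(float)), rows, cols};
    os.write(reinterpret_cast<const char *>(header), sizeof header);
    os.write(reinterpret_cast<const char *>(data.data()),
             static_cast<std::streamsize>(data.size() * sizeof(float)));
}

// Hypothetical helper: read the header back, check the entry size as the
// load code does, then read rows * cols floats.
std::vector<float> read_matrix(std::istream &is, std::uint32_t &rows,
                               std::uint32_t &cols) {
    std::uint32_t header[3]; /* entry size, row, col */
    is.read(reinterpret_cast<char *>(header), sizeof header);
    if (!is || header[0] != sizeof(float))
        throw std::runtime_error("bad matrix header");
    rows = header[1];
    cols = header[2];
    std::vector<float> data(static_cast<std::size_t>(rows) * cols);
    is.read(reinterpret_cast<char *>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(float)));
    if (!is)
        throw std::runtime_error("truncated matrix payload");
    return data;
}
```

Writing a 2 x 3 matrix to a `std::stringstream` and reading it back should return the same six values, with `rows` set to 2 and `cols` set to 3. The header uses the host byte order, so a file written this way is not portable across machines with different endianness.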
@treper
treper / SimilarityJoin.md
Last active March 11, 2021 02:45
Scaling Up All Pairs Similarity Search

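#### Scaling Up All Pairs Similarity Search
#### Understanding the paper:
maxweight_i(V): the maximum value in column (dimension) i over all vectors in V
maxweight(x): the maximum value in x[:]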
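
A minimal sketch of these two quantities, assuming dense vectors with non-negative weights. The names `Vec`, `maxweight`, and `maxweight_i` are illustrative and are not taken from Google's implementation, which works on sparse data.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // assumption: dense, non-negative weights

// maxweight(x): the largest weight appearing anywhere in a single vector x.
// Returns 0 for an empty vector.
double maxweight(const Vec &x) {
    return x.empty() ? 0.0 : *std::max_element(x.begin(), x.end());
}

// maxweight_i(V): the largest weight in dimension (column) i across all
// vectors in the dataset V. Vectors shorter than i + 1 are treated as 0.
double maxweight_i(const std::vector<Vec> &V, std::size_t i) {
    double best = 0.0;
    for (const Vec &x : V) {
        if (i < x.size()) {
            best = std::max(best, x[i]);
        }
    }
    return best;
}
```

For V = {{0.1, 0.9, 0.0}, {0.5, 0.2, 0.3}}, `maxweight(V[0])` is 0.9 and `maxweight_i(V, 0)` is 0.5. Both values serve as upper bounds when estimating how large a dot product between two vectors can still become.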

#### Code reading

Google's open-source implementation is the All-Pairs-Binary variant.

  • Several important data structures
/*
This script parses a text file like:
3 {"list":[[8702,9630,3],[192470,8502,3],[25234,4160,3]]}
into:
3\t8702\t192470\t25234
and sorts by tagId count in ascending order to fit the format described in the paper
Scaling Up All Pairs Similarity Search,
in order to calculate item-to-item similarity.
*/
//generate tagId-tagId-cos similarity result
import spark.util.Vector
import scala.math.sqrt
import java.io._
val word_vec_size=150
def parseVector(line: String): Vector = {
return new Vector(line.split(' ').slice(1,word_vec_size+1).map(_.toDouble))
}
@treper
treper / NeighborCount.scala
Last active August 29, 2015 14:02
Tag neighbor count; PageRank may be more appropriate
import scala.util.parsing.json._
import org.json4s._
import org.json4s.native.JsonMethods._
import scala.collection.mutable.ArrayBuffer
import java.io._
def parseTagTransaction(line:String):ArrayBuffer[String]={
var tagList = line.split(" ").filter(m => m.length>1);
var result = ArrayBuffer[String]()
if(tagList.length>1)

Sublime Text 2 – Useful Shortcuts (PC)

Loosely ordered with the commands I use most towards the top. Sublime also offers full documentation.

Editing

Ctrl+C copy current line (if no selection)
Ctrl+X cut current line (if no selection)
Ctrl+⇧+K delete line
Ctrl+↩ insert line after
@treper
treper / clearRAM.sh
Created November 18, 2015 05:48 — forked from pklaus/clearRAM.sh
A Script to Clear Cached RAM on Linux
#!/bin/bash
## Bash Script to clear cached memory on (Ubuntu/Debian) Linux
## By Philipp Klaus
## see <http://blog.philippklaus.de/2011/02/clear-cached-memory-on-ubuntu/>
if [ "$(whoami)" != "root" ]
then
echo "You have to run this script as Superuser!"
exit 1
fi
@treper
treper / TestHiveSQL-in-SparkShell.scala
Created December 28, 2015 09:55
TestHiveSQL-in-SparkShell
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val pp = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
pp.registerTempTable("people")
sqlContext.sql("select concat('test_',single) from people").collect().foreach(println)
@treper
treper / spark-defaults.conf
Created January 15, 2016 12:30 — forked from deenar/spark-defaults.conf
CDH 5.4 and Spark 1.5.1
sysJupiterDev@gbrdcr00015n02: /bigdata/projects/MERCURY
$ ls spark-1.5.1-bin-hadoop2.6/conf/yarn-conf/
core-site.xml hadoop-env.sh hdfs-site.xml hive-site.xml mapred-site.xml ssl-client.xml topology.map topology.py yarn-site.xml