⌘T | go to file |
⌘⌃P | go to project |
⌘R | go to methods |
⌃G | go to line |
⌘KB | toggle side bar |
⌘⇧P | command prompt |
class A
class A2 extends A
class B
trait M[X]
//
// Upper Type Bound
//
def upperTypeBound[AA <: A](x: AA): A = x
The INSTALL instructions that come with Vowpal Wabbit do not appear to work on Mac OS X Lion. Here's what I did to get it to compile. You will need the developer tools that come with the Xcode installation.
The only dependency VW has is the Boost C++ library, so first download and install Boost.
To install Boost, do the following:
$ cp ~/Downloads/boost_1_48_0.tar.bz2 ./
/*
 * Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at
 * the Lawrence Livermore National Laboratory. Written by Keith Stevens,
 * [email protected] OCEC-10-073 All rights reserved.
 *
 * This file is part of the S-Space package and is covered under the terms and
 * conditions therein.
 *
 * The S-Space package is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as published
Here are the areas I've been researching, some things I've read and some open source packages...
Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
The vectors are often reweighted with transforms such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
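To make the two steps above concrete, here is a minimal sketch in plain Scala of turning documents into sparse term vectors and reweighting them with TF-IDF. The object name `TfIdfSketch`, the naive regex tokenizer and the unsmoothed `log(N / df)` weighting are my own assumptions for illustration; real packages usually add smoothing and length normalisation.

```scala
object TfIdfSketch {
  // Hypothetical, deliberately naive tokenizer: lower-case, split on non-letters.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Term frequency: raw count of each token within one document.
  def termFrequencies(tokens: Seq[String]): Map[String, Int] =
    tokens.groupBy(identity).map { case (term, occ) => term -> occ.size }

  // Inverse document frequency: log(N / df), where df is the number of
  // documents that contain the term at least once (no smoothing).
  def inverseDocFrequencies(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, occ) => term -> math.log(n / occ.size) }
  }

  // One sparse vector (term -> weight) per input document.
  def tfIdf(docs: Seq[String]): Seq[Map[String, Double]] = {
    val tokenized = docs.map(tokenize)
    val idf = inverseDocFrequencies(tokenized)
    tokenized.map { tokens =>
      termFrequencies(tokens).map { case (term, tf) => term -> tf * idf(term) }
    }
  }
}
```

A word that occurs in every document gets weight zero under this scheme, which is how the transform damps terms that are too frequent to be informative.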
Collocation detection finds two or more words that occur together more often than they do separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it were independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
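One way to score candidate collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below is an assumption on my part, not necessarily the measure used in the linked post; `CollocationSketch`, `bigramPmi` and the `minCount` cut-off are hypothetical names chosen for illustration.

```scala
object CollocationSketch {
  // PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ), estimated from raw counts.
  // Pairs seen fewer than minCount times are dropped, because PMI is
  // unreliable for very rare pairs.
  def bigramPmi(tokens: Seq[String], minCount: Int = 2): Map[(String, String), Double] = {
    val unigramCounts = tokens.groupBy(identity).map { case (t, occ) => t -> occ.size }
    val bigrams = tokens.zip(tokens.drop(1)) // adjacent word pairs
    val bigramCounts = bigrams.groupBy(identity).map { case (b, occ) => b -> occ.size }
    val nUni = tokens.size.toDouble
    val nBi = bigrams.size.toDouble

    bigramCounts.collect {
      case ((a, b), count) if count >= minCount =>
        val pAB = count / nBi
        val pA = unigramCounts(a) / nUni
        val pB = unigramCounts(b) / nUni
        (a, b) -> math.log(pAB / (pA * pB))
    }
  }
}
```

Pairs with a high positive score are candidates for merging into a single n-gram token before building the document vectors.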
// Just an ordinary method
def sum(x: Int, y: Int, z: Int) = x + y + z
// A tuple of arguments
val args = (1, 2, 3)
// Eta-expand the method into a Function3 value, whose tupled method
// returns an equivalent function that takes a single Tuple3
(sum _).tupled(args)
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000