| ⌘T | go to file |
| ⌘⌃P | go to project |
| ⌘R | go to methods |
| ⌃G | go to line |
| ⌘KB | toggle side bar |
| ⌘⇧P | command prompt |
| class A | |
| class A2 extends A | |
| class B | |
| trait M[X] | |
| // | |
| // Upper Type Bound | |
| // | |
| def upperTypeBound[AA <: A](x: AA): A = x |
The INSTALL instructions that come with Vowpal Wabbit appear not to work on Mac OS X Lion. Here's what I did to get it to compile. You will need the developer tools that come with the Xcode installation.
VW's only dependency is the Boost C++ library, so first download and install Boost.
To install Boost, do the following:
$ cp ~/Downloads/boost_1_48_0.tar.bz2 ./
| /* | |
| * Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at | |
| * the Lawrence Livermore National Laboratory. Written by Keith Stevens, | |
| * [email protected] OCEC-10-073 All rights reserved. | |
| * | |
| * This file is part of the S-Space package and is covered under the terms and | |
| * conditions therein. | |
| * | |
| * The S-Space package is free software: you can redistribute it and/or modify | |
| * it under the terms of the GNU General Public License version 2 as published |
Here are the areas I've been researching, some things I've read and some open source packages...
Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
The vectors are often reweighted with transforms such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
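As a rough illustration of the weighting, here is a minimal sketch in plain Scala, assuming documents are already tokenised into sequences of words. The object and method names (`TfIdfSketch`, `termFrequencies`, `inverseDocumentFrequencies`, `tfIdf`) are made up for this example, and it uses the simplest variant (raw term count times log(N / document frequency)); real libraries typically add smoothing.

```scala
object TfIdfSketch {
  // Term frequency: raw count of each token within a single document.
  def termFrequencies(doc: Seq[String]): Map[String, Int] =
    doc.groupBy(identity).map { case (term, occurrences) => term -> occurrences.size }

  // Inverse document frequency: log(N / df), where df is the number of
  // documents that contain the term at least once.
  def inverseDocumentFrequencies(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, hits) => term -> math.log(n / hits.size) }
  }

  // One sparse weight vector (term -> tf * idf) per document.
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val idf = inverseDocumentFrequencies(docs)
    docs.map(doc => termFrequencies(doc).map { case (term, tf) => term -> tf * idf(term) })
  }
}
```

A word that appears in every document gets weight zero under this variant, which is how very frequent words are kept from dominating the vectors.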
Collocation detection is a technique for finding two or more words that occur together more often than their individual frequencies would predict (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it's independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
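One way to score candidate collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below is only an illustration under that assumption: PMI is a common choice but not necessarily the measure used in the linked post, and the names `CollocationSketch`, `bigramPmi` and `minCount` are hypothetical.

```scala
object CollocationSketch {
  // PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ) for each adjacent pair (a, b).
  // Pairs seen fewer than minCount times are dropped, because PMI is
  // unreliable for rare events.
  def bigramPmi(tokens: Seq[String], minCount: Int = 2): Map[(String, String), Double] = {
    val unigramCounts = tokens.groupBy(identity).map { case (w, ws) => w -> ws.size }
    val bigrams = tokens.zip(tokens.drop(1))
    val bigramCounts = bigrams.groupBy(identity).map { case (b, bs) => b -> bs.size }
    val nUnigrams = tokens.size.toDouble
    val nBigrams = bigrams.size.toDouble

    bigramCounts.collect {
      case ((a, b), count) if count >= minCount =>
        val pAB = count / nBigrams
        val pA = unigramCounts(a) / nUnigrams
        val pB = unigramCounts(b) / nUnigrams
        (a, b) -> math.log(pAB / (pA * pB))
    }
  }
}
```

Pairs with a high score can then be merged into a single token (for example `wishy_washy`) before the text is vectorised.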
| // Just an ordinary function | |
| def sum(x: Int, y: Int, z: Int) = x + y + z | |
| // A tuple of arguments | |
| val args = (1, 2, 3) | |
| // Eta-expand the method into a Function3 value (sum _), which has a tupled method | |
| // that accepts a matching tuple (current Scala defines tupled on Function2 to Function22) | |
| (sum _).tupled(args) |
| import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover} | |
| import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer} | |
| import org.apache.spark.mllib.linalg.Vector | |
| import sqlContext.implicits._ | |
| val numTopics: Int = 100 | |
| val maxIterations: Int = 100 | |
| val vocabSize: Int = 10000 |