
@cldellow
cldellow / LDA_SparkDocs
Last active February 16, 2016 18:40 — forked from jkbradley/LDA_SparkDocs
LDA Example: Modeling topics in the Spark documentation
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
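LDA works on bag-of-words input. Each document becomes a vector of term counts over a shared vocabulary, keyed by a document ID, which is the shape `spark.mllib`'s LDA takes. The Scala code above is cut off after its first import, so the following is not the gist's code. It is a small pure-Python sketch of that preprocessing step under assumed choices: lowercasing, letters-only tokens, dropping tokens shorter than four characters, and keeping only the most frequent terms. `tokenize` and `term_count_vectors` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(doc, min_len=4):
    """Lowercase, split on non-letters, keep tokens of at least min_len chars (assumed rule)."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if len(t) >= min_len]


def term_count_vectors(docs, vocab_size=10000):
    """Return (vocab, vectors): vocab lists the most frequent terms, and each
    vector is (doc_id, {term_index: count}), a sparse bag-of-words row."""
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(t for toks in tokenized for t in toks)
    # Sort by total count, highest first; sorted() is stable, so ties keep
    # the order in which terms were first seen.
    vocab = sorted(totals, key=totals.get, reverse=True)[:vocab_size]
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc_id, toks in enumerate(tokenized):
        # Terms outside the vocabulary are simply dropped.
        counts = Counter(index[t] for t in toks if t in index)
        vectors.append((doc_id, dict(counts)))
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = term_count_vectors(["Spark runs LDA on Spark clusters", "LDA finds topics"])
    print(vocab)
    print(vectors)
```

Short tokens such as "lda" and "on" fall under the assumed four-character cutoff, which is a crude stand-in for a real stop-word list.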
cldellow / gist:34cf5c2088f366f25d97
Created February 23, 2016 23:13
Message when trying to log in
ValueError: invalid literal for int() with base 10: 'default'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/contrib/fixers.py", line 152, in __call__
    return self.app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
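The error message is exactly what Python's built-in `int()` raises when handed a non-numeric string. The traceback is cut off before the frame that actually called `int()`, so where the string 'default' came from (a config value, a form field, a URL parameter) isn't shown. Below is a minimal reproduction plus one defensive pattern, written for Python 3 (`int()` reports the same message on Python 2.7). `to_int` is a hypothetical helper, and whether a silent fallback or a clearer error is the right fix depends on the app.

```python
def to_int(value, fallback):
    """Parse value as an int; return fallback for values like 'default' or None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return fallback


# Reproduce the message from the login error.
try:
    int("default")
except ValueError as exc:
    message = str(exc)

print(message)
print(to_int("42", 0), to_int("default", 0))
```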
# 100 Topics
401206 Andrea Jones
2630212 Lee and Kennedy
2162471 Isaac Elzevir
2837219 Ludwig Winter
1221394 David Glenn (garden designer)
341088 Alfred Parsons (artist)
3902967 Rowland Hilder
4884640 Wild Flowers Worth Knowing
Value Description
IAB1 Arts & Entertainment
IAB1-1 Books & Literature
IAB1-2 Celebrity Fan/Gossip
IAB1-3 Fine Art
IAB1-4 Humor
IAB1-5 Movies
IAB1-6 Music
IAB1-7 Television
IAB2 Automotive
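The codes in this table are hierarchical: a tier-2 code such as IAB1-3 sits under the tier-1 code before the hyphen (IAB1). A minimal sketch of parsing the rows above and resolving a code to a full label; `parse_taxonomy`, `parent_code` and `full_label` are hypothetical names, and the " > " separator is an arbitrary choice.

```python
IAB_ROWS = """\
IAB1 Arts & Entertainment
IAB1-1 Books & Literature
IAB1-2 Celebrity Fan/Gossip
IAB1-3 Fine Art
IAB1-4 Humor
IAB1-5 Movies
IAB1-6 Music
IAB1-7 Television
IAB2 Automotive"""


def parse_taxonomy(text):
    """Map each code to its description; the code is everything before the first space."""
    table = {}
    for line in text.strip().splitlines():
        code, description = line.split(" ", 1)
        table[code] = description
    return table


def parent_code(code):
    """Tier-2 codes like 'IAB1-3' belong to the tier-1 code before the hyphen."""
    return code.split("-", 1)[0]


def full_label(table, code):
    """Return 'Parent > Child' for tier-2 codes and the plain description for tier-1."""
    parent = parent_code(code)
    if parent == code:
        return table[code]
    return f"{table[parent]} > {table[code]}"


taxonomy = parse_taxonomy(IAB_ROWS)
print(full_label(taxonomy, "IAB1-3"))
print(full_label(taxonomy, "IAB2"))
```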
Comparisons of 20 runs of keywords/prefix "a", limit of 100,000:
old-times = master
new-times = sorting on the raw strings, which avoids parsing them and avoids allocating an unbounded number of KeywordResults
new-times-geometric = changing how bigNHits grows
new-times-geometric-no-sign = no longer permitting negative numbers in the freq column
x old-times
+ new-times
* new-times-geometric
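The legend mentions changing how bigNHits grows, and the variant name says the new growth is geometric. The code itself isn't shown, so this is only a sketch under an assumption: that bigNHits is the number of candidate hits fetched before filtering, retried with a larger value when too few results survive. Doubling reaches the 100,000 limit in 11 rounds starting from 100, whereas adding 100 each time would take 1,000. `geometric_sizes`, `search_with_growth` and the `fetch` callback are hypothetical names.

```python
def geometric_sizes(start, limit, factor=2):
    """Yield start, start*factor, start*factor**2, ... below limit, then limit itself."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor greater than 1")
    size = start
    while size < limit:
        yield size
        size *= factor
    yield limit


def search_with_growth(fetch, wanted, start, limit, factor=2):
    """Call fetch(n) with geometrically growing n until it returns at least
    `wanted` hits or n reaches limit. Returns (hits truncated to wanted, last n)."""
    for size in geometric_sizes(start, limit, factor):
        hits = fetch(size)
        if len(hits) >= wanted:
            return hits[:wanted], size
    # geometric_sizes always yields limit last, so hits and size are bound here.
    return hits[:wanted], size


# Hypothetical fetch where only every 50th candidate survives filtering.
hits, tried = search_with_growth(lambda n: list(range(0, n, 50)), wanted=10, start=100, limit=100_000)
print(tried, len(list(geometric_sizes(100, 100_000))))  # 800 11
```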
10 donate charity 0.007943817637577712
10 car covers 0.009440478931614092
10 hair extensions 0.009440478931614092
10 torch 0.010937140225650472
10 transport 0.013124568270780564
10 truck driver jobs 0.015196868524061708
10 low calorie beer 0.01577250748330647
10 background check 0.015887635275155423
10 multivitamin 0.015887635275155423
10 resume builder 0.01611789085885333
http://s3.amazonaws.com/sortable-assets/misc/colin/eduardo/1/index.html
http://s3.amazonaws.com/sortable-assets/misc/colin/eduardo/2/index.html
aggregator_expire thread:
root@metrics1:/build1-data# strace -p 25551 -Tc
Process 25551 attached
^CProcess 25551 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.95   21.433371      154197       139           nanosleep
  1.88    0.411658          32     12876           write
  0.12    0.026664          33       798       135 futex
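In this summary, usecs/call is the seconds column times 10^6 divided by calls, rounded to the nearest integer. For example, nanosleep works out to about 154 ms per call. The three rows add up to 99.95% of the time, so the remaining 0.05% belongs to syscalls not listed here. A minimal sketch that recomputes the column from the rows above:

```python
# (syscall, % time, seconds, calls) for the three rows shown above
ROWS = [
    ("nanosleep", 97.95, 21.433371, 139),
    ("write", 1.88, 0.411658, 12876),
    ("futex", 0.12, 0.026664, 798),
]


def usecs_per_call(seconds, calls):
    """Average time per call in microseconds, rounded to the nearest integer."""
    return round(seconds * 1_000_000 / calls)


per_call = {name: usecs_per_call(secs, calls) for name, _, secs, calls in ROWS}
shown_share = round(sum(pct for _, pct, _, _ in ROWS), 2)

for name, usecs in per_call.items():
    print(f"{name:>10} {usecs:>8} us/call")
print(f"rows shown cover {shown_share}% of the summarized syscall time")
```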

False negatives (NO_MATCH -> model)

Olympus PEN E-PL1

listing manufacturer: Olympus
        title:        OLYMPUS Pen E-PL1 - champagne + M.ZUIKO DIGITAL ED 14-42 mm Lens + Pix Medium Case + Pocket in black + 16 GB SDHC Memory Card + PS-BLS1 Battery
        price:        499.99 GBP

our answer:       olympus_pen_e-pl1
    manufacturer: Olympus
    model:        PEN E-PL1
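The listing spells the model "Pen E-PL1" while the product record has "PEN E-PL1", and the title buries it among bundle items. One tolerant way to line the two up is to lowercase both sides, split on anything that isn't a letter or digit, and require the model's tokens to appear as a contiguous run in the title. This is a minimal sketch of that rule, not necessarily the matcher that produced "our answer" above; `tokens`, `contains_run` and `listing_matches` are hypothetical names.

```python
import re


def tokens(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_run(haystack, needle):
    """True if the non-empty list needle appears as a contiguous run inside haystack."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def listing_matches(product_manufacturer, product_model, listing_manufacturer, listing_title):
    """Manufacturers must agree token-for-token; the model's tokens must occur in the title."""
    if tokens(product_manufacturer) != tokens(listing_manufacturer):
        return False
    return contains_run(tokens(listing_title), tokens(product_model))


TITLE = ("OLYMPUS Pen E-PL1 - champagne + M.ZUIKO DIGITAL ED 14-42 mm Lens + Pix Medium Case"
         " + Pocket in black + 16 GB SDHC Memory Card + PS-BLS1 Battery")
print(listing_matches("Olympus", "PEN E-PL1", "Olympus", TITLE))
```

Note that this rule alone says nothing about whether a bundle like this one should count as a match; that decision needs a separate policy.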