mysql:[email protected] [staging]> SELECT -6.4 BETWEEN -5.1 AND -3.1;
+----------------------------+
| -6.4 BETWEEN -5.1 AND -3.1 |
+----------------------------+
|                          0 |
+----------------------------+
1 row in set (0.00 sec)
mysql:[email protected] [staging]> SELECT -3.4 BETWEEN -5.1 AND -3.1;
+----------------------------+
CREATE TABLE halfak.searches AS
SELECT dt, ip, user_agent, uri_host, uri_query
FROM webrequest
WHERE
    uri_query LIKE "%title=Special%3ASearch%" AND
    uri_query LIKE "%search=%" AND
    uri_path = "/w/index.php" AND
    year = 2014;
import json
import sys
"""
HEADERS = [
    ('index', 'index'),
    ('product/productId', 'product_id'),
    ('product/productTitle', 'product_title'),
    ('product/price', 'price'),
    ('review/userId', 'review_user_id'),
mysql:[email protected] [enwiki]> SELECT COUNT(*) FROM revision WHERE rev_timestamp BETWEEN "20140101" AND "20140102";
+----------+
| COUNT(*) |
+----------+
|   138753 |
+----------+
1 row in set (0.47 sec)
mysql:[email protected] [enwiki]> SELECT COUNT(*) FROM revision WHERE rev_timestamp BETWEEN "2014-01-01" AND "2014-01-02";
+----------+
[halfak@stat1003: ~/projects/productivity]
$ rsync -rv simplewiki_20141025.fields_and_diffs.head.tsv stat1002.wikimedia.org::a/halfak/diffengine/
rsync: getaddrinfo: stat1002.wikimedia.org 873: Name or service not known
rsync error: error in socket IO (code 10) at clientserver.c(128) [sender=3.1.0]
[halfak@stat1003: ~/projects/productivity]
$ rsync -rv simplewiki_20141025.fields_and_diffs.head.tsv stat1002.eqiad.wmnet::a/halfak/diffengine/
@ERROR: access denied to a from stat1003.wikimedia.org (208.80.154.82)
rsync error: error starting client-server protocol (code 5) at main.c(1653) [sender=3.1.0]
gini <- function(x, unbiased = TRUE, na.rm = FALSE){
    if (!is.numeric(x)){
        warning("'x' is not numeric; returning NA")
        return(NA)
    }
    # Compute the NA mask up front so it also exists when na.rm = TRUE
    na.ind <- is.na(x)
    if (!na.rm && any(na.ind))
        stop("'x' contains NAs")
    if (na.rm)
        x <- x[!na.ind]
    n <- length(x)
$ ssh wikimedia.altiscale
Last login: Wed Jan 14 17:02:27 2015 from 10.252.17.5
_ _ _ _
| | | | (_) | |
__ _ | |_| |_ _ ___ ___ __ _ | | ___
/ _` || |_ _| |/ __| / __| / _` || | / _ \
| (_| || | | | | |\__ \| (__ | (_| || || __/
\__,_||_| |_| |_||___/ \___| \__,_||_| \___|
[halfak@desktop-wikimedia ~]$ df -h
>>> import revscores
>>> dir(revscores)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> from revscores import languages
>>> dir(languages)
['Language', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'english', 'language', 'portuguese']
Notice that the first dir() doesn't list "languages". This is because the languages subpackage is not imported when revscores itself is imported.
But when we run dir() on languages, we can see "english", "portuguese" and "language". This is because these modules are imported as soon as languages is imported.
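
The same behaviour can be reproduced without revscores. The sketch below is only an illustration: it builds a throwaway package called demo_pkg (a hypothetical name, as is its file layout) whose top-level __init__.py imports nothing and whose languages/__init__.py imports two submodules. It assumes that revscores is laid out in a similar way, which the session above suggests but does not prove.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for revscores: demo_pkg/languages/{english,portuguese}.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
lang_dir = os.path.join(pkg_dir, "languages")
os.makedirs(lang_dir)


def write(path, text):
    """Write a small source file to disk."""
    with open(path, "w") as f:
        f.write(text)


# The top-level package imports nothing on its own.
write(os.path.join(pkg_dir, "__init__.py"), "")
# The subpackage imports its language modules as soon as it is loaded.
write(os.path.join(lang_dir, "__init__.py"), "from . import english, portuguese\n")
write(os.path.join(lang_dir, "english.py"), "NAME = 'english'\n")
write(os.path.join(lang_dir, "portuguese.py"), "NAME = 'portuguese'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

# Nothing has imported the subpackage yet, so it is not an attribute.
before = "languages" in dir(demo_pkg)

from demo_pkg import languages

# Importing the subpackage binds it on the parent package...
after = "languages" in dir(demo_pkg)
# ...and its __init__.py has already pulled in the submodules.
has_english = "english" in dir(languages)
has_portuguese = "portuguese" in dir(languages)

print(before, after, has_english, has_portuguese)
```

If the top-level package should expose languages right after a plain import, one option is to add an explicit "from . import languages" to its __init__.py, at the cost of loading every language module up front.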
(3.4) [halfak@stat1002: ~]
$ scp foo wikimedia.altiscale:
foo 100% 39 0.0KB/s 00:00
(3.4) [halfak@stat1002: ~]
$ ssh -N -L 14000:wikimedia.z42.altiscale.com:14000 wikimedia.altiscale &
[1] 13510
(3.4) [halfak@stat1002: ~]
$ hdfs dfs -ls webhdfs://localhost:14000/user/halfak/streaming/enwiki-20141106/json-bz2/
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
[20:46:05] <harej> halfak: as a gentle reminder: https://meta.wikimedia.org/wiki/Research:WikiProjects_and_Subject_Area_Activity_(English_Wikipedia)
[20:46:42] <halfak> Harej, did you want me to look at the methods section?
[20:46:50] <harej> I think that was what it was
[20:46:58] <halfak> What's a longitudinal factor?
[20:46:59] <harej> I am also interested in information about your quality heuristics!
[20:47:24] <halfak> longitudinal factor == https://en.wikipedia.org/wiki/Censoring_(statistics)
[20:47:52] <harej> the longitudinal factors that affect wikiprojects mostly have to do with how some wikiprojects were active years ago even if they are not active now; differing levels of activity throughout a project's life. To keep everything even from a time scale perspective I am just doing things from July 1 to December 31
[20:48:39] <halfak> I'm not sure this will help. Many WikiProjects will be in different lifecycle stages between July 1 and Dec. 31
[20:49:02] <halfak> Might we try to control for