q = calibrated probability
= p / (p + (1-p) / w)
https://pdfs.semanticscholar.org/daf9/ed5dc6c6bad5367d7fd8561527da30e9b8dd.pdf
where
p = predicted probability
w = negative down-sampling rate
= (Neg / (Neg + Pos*k)) / (Neg / (Neg + Pos))
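
The calibration formula above can be sketched in Python as follows. This is a minimal illustration, not code from the original note: the function name `calibrate` and the input checks are assumptions, and `w` is taken to be a positive down-sampling rate as defined above.

```python
def calibrate(p: float, w: float) -> float:
    """Map a probability predicted on a negatively down-sampled
    training set back to the original scale: q = p / (p + (1 - p) / w).

    p: predicted probability, in [0, 1]
    w: negative down-sampling rate (assumed > 0)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if w <= 0.0:
        raise ValueError("w must be positive")
    return p / (p + (1.0 - p) / w)


if __name__ == "__main__":
    # With w = 1 (no down-sampling) the prediction is unchanged.
    print(calibrate(0.5, 1.0))
    # With w < 1 the calibrated probability is pulled towards 0.
    print(calibrate(0.5, 0.1))
```

When `w` is 1 the mapping is the identity; for `w` below 1 every prediction strictly between 0 and 1 is reduced.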
69.613 129.070 52.111
70.670 128.161 52.446
72.303 128.450 52.853
73.759 127.522 51.786
74.085 129.067 53.352
74.561 134.031 50.992
74.911 134.944 50.744
75.205 129.162 52.800
75.395 129.711 52.844
75.554 132.642 51.427
def auc(num_positives, num_negatives, predicted):
    l_sorted = sorted(range(len(predicted)), key=lambda i: predicted[i],
                      reverse=True)
    fp_cur = 0.0
    tp_cur = 0.0
    fp_prev = 0.0
    tp_prev = 0.0
    fp_sum = 0.0
    auc_tmp = 0.0
    last_score = float("nan")
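
The preview of `auc` stops after its variable initialisation. As a rough illustration of how a trapezoidal ROC AUC over score-sorted predictions can be computed, here is a self-contained sketch. It is not the remainder of the original function: the name `auc_sketch` and the explicit `labels` argument (0/1 ground truth) are assumptions made for this example.

```python
def auc_sketch(labels, predicted):
    """ROC AUC by the trapezoidal rule (illustrative sketch).

    labels:    sequence of 0/1 ground-truth values (assumed input)
    predicted: sequence of scores, higher means more likely positive
    """
    num_pos = sum(1 for y in labels if y == 1)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Visit examples from the highest score to the lowest.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i],
                   reverse=True)

    tp = fp = 0.0            # running true/false positive counts
    tp_prev = fp_prev = 0.0  # counts at the previous distinct score
    area = 0.0
    last_score = float("nan")  # NaN compares unequal to every score

    for i in order:
        if predicted[i] != last_score:
            # Close the trapezoid for the previous group of tied scores.
            area += (fp - fp_prev) * (tp + tp_prev) / 2.0
            fp_prev, tp_prev = fp, tp
            last_score = predicted[i]
        if labels[i] == 1:
            tp += 1.0
        else:
            fp += 1.0

    # Close the final trapezoid and normalise to [0, 1].
    area += (fp - fp_prev) * (tp + tp_prev) / 2.0
    return area / (num_pos * num_neg)
```

Grouping tied scores before closing each trapezoid means a constant predictor scores 0.5 rather than depending on the sort order.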
# -*- coding: utf-8 -*-
# sort list.txt | uniq | grep -v '#' | grep -v 'noreply' | grep -v 'local' | grep -e '\.' | grep -v 'internal' | grep -v 'contact'
import os
import sys
import requests
import time
from github3 import login
from tqdm import tqdm
create table page (
  docid int,
  contents string
);
INSERT OVERWRITE TABLE page_exploded
select
  d.docid,
  normalize_unicode(t.word) as word
from
WITH term_frequency as (
  select
    docid,
    word,
    freq
  from (
    select
      docid,
      tf(word) as word2freq
    from
--------------------
Hivemall
Hivemall is a library for machine learning implemented as Hive
UDFs/UDAFs/UDTFs.
Hivemall has been incubating since 2016-09-13.
Three most important issues to address in the move towards graduation:
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0