Assume we have a binary classifier that outputs the probability of a sample being positive, in the [0.0, 1.0] range. The Area Under the ROC Curve (AUC) quantitatively measures the accuracy of the predictions made by such a classification model. Intuitively, AUC checks whether the positive (i.e., label=1) samples in a validation set receive a higher probability of being positive than the negative ones.
The AUC metric ultimately yields a single value in [0.0, 1.0]. If we have five test samples sorted by their prediction results as follows, we can see that the classifier assigned higher probabilities to all positive samples, #1, #2, and #4, than to the others. This best-case scenario corresponds to an AUC of 1.0.
| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 1 | 0.8 | 1 |
| 2 | 0.7 | 1 |
| 4 | 0.6 | 1 |
| 3 | 0.5 | 0 |
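As a quick check, the AUC for the ranking above can be computed with scikit-learn's `roc_auc_score`. This is a minimal sketch; the use of scikit-learn is an assumption here, and only the samples listed in the table are included:

```python
from sklearn.metrics import roc_auc_score

# True labels and predicted probabilities for the samples in the table above
y_true = [1, 1, 1, 0]           # samples #1, #2, #4, #3
y_score = [0.8, 0.7, 0.6, 0.5]  # predicted probability of label=1

# Every positive sample is ranked above the negative one, so AUC is 1.0
print(roc_auc_score(y_true, y_score))  # => 1.0
```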