Data preparation

-- set mapred.max.split.size=128000000;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set hive.mapjoin.smalltable.filesize=30000000;
-- set hive.optimize.s3.query=true;
set hive.exec.dynamic.partition.mode=nonstrict; 
set hive.optimize.sort.dynamic.partition=false;
bitcoin = cryptos[0]
bitcoin_cash = cryptos[1]
dash = cryptos[2]
ethereum_classic = cryptos[3]
bitconnect = cryptos[4]
litecoin = cryptos[5]
monero = cryptos[6]
nem = cryptos[7]
neo = cryptos[8]
numeraire = cryptos[9]
#!/usr/bin/env python
# coding: utf-8
# ## Perform a chi-square test for bank churn prediction (find patterns in why customers leave the bank). Only a few columns are considered here to keep things clear.
# ### Import libraries
# In[2]:
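The notebook body is not shown in this preview, so the following is only a minimal sketch of what a chi-square test of independence between one categorical column and the churn flag could look like. The column names `gender` and `exited` and the toy values are assumptions made for illustration, not data from the original notebook. The statistic is computed by hand with the standard library and without Yates' continuity correction (note that `scipy.stats.chi2_contingency` applies that correction by default for 2x2 tables, so its result would differ).

```python
from collections import Counter


def chi_square_statistic(rows, cols):
    """Return the Pearson chi-square statistic and degrees of freedom
    for two equally long sequences of categorical values."""
    observed = Counter(zip(rows, cols))   # observed count per (row, col) cell
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    n = len(rows)

    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n   # expected count under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical toy data: customer gender and whether the customer left the bank.
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
exited = [1, 1, 1, 0, 0, 0, 0, 1]

stat, dof = chi_square_statistic(gender, exited)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would suggest that the column and churn are not independent; on a real dataset the p-value would normally come from a statistics library.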
SELECT m.* FROM #matches m
INNER JOIN #matches m1 ON m.fromid = m1.toid AND m.toid = m1.fromid AND m1.fromid <= m1.toid
ORDER BY m.toteam
SELECT t.id fromid,t.Team fromteam,t1.id toid,t1.Team toteam
INTO #matches
FROM #Team t
INNER JOIN #Team t1 ON t.id <> t1.id
SELECT * FROM #matches
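For readers who want to check the pairing logic outside SQL, here is a rough Python sketch of the same idea. The self-join on `t.id <> t1.id` produces every ordered pair of teams, and the filter `m1.fromid <= m1.toid` on the reversed row keeps only one row per unordered pair (the one where `fromid` is greater than `toid`). The team ids and names below are hypothetical, since the contents of `#Team` are not shown.

```python
from itertools import permutations

# Hypothetical stand-in for the #Team table: id -> team name.
teams = {1: "A", 2: "B", 3: "C"}

# Equivalent of the self-join with t.id <> t1.id: every ordered pair of teams,
# as (fromid, fromteam, toid, toteam).
all_matches = [
    (from_id, teams[from_id], to_id, teams[to_id])
    for from_id, to_id in permutations(teams, 2)
]

# Equivalent of joining each row to its reverse and requiring
# m1.fromid <= m1.toid: keep only rows with fromid > toid,
# then order by the "to" team name.
unique_matches = sorted(
    (m for m in all_matches if m[0] > m[2]),
    key=lambda m: m[3],
)

for match in unique_matches:
    print(match)
```

With three teams, the first list holds six ordered pairings and the second holds three, one per pair of teams.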
# rating greater than 3 -> 1, otherwise (3 or less) -> 0
data_df['preference'] = np.where(data_df['rating'] > 3, 1, 0)
data_df.head()
Sandy4321 / Criteo.ipynb
Created January 26, 2020 19:55 — forked from stsievert/Criteo.ipynb
Criteo dataset example