ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
5524,1957,Graduation,Single,58138,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,3,11,1
2174,1954,Graduation,Single,46344,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,3,11,0
4141,1965,Graduation,Together,71613,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,3,11,0
6182,1984,Graduation,Together,26646,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,3,11,0
5324,1981,PhD,Married,58293,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,3,11,0
7446,1967,Master,Together,62513,0,1,16,520,42,98,0,42,14,2,6,4,10,6,0,0,0,0,0,0,3,11,0
965,1971,Graduation,Divorced,55635,0,1,34,235,65,164,50,49,27,4,7,3,7,6,0,0,0,0,0,0,3,11,0
6177,1985,PhD,Married,33454,1,0,32,76,10,56,3,1,23,2,4,0,
ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
5524,1957,Graduation,Single,58138,0,0,2012-09-04,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,3,11,1
2174,1954,Graduation,Single,46344,1,1,2014-03-08,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,3,11,0
4141,1965,Graduation,Together,71613,0,0,2013-08-21,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,3,11,0
6182,1984,Graduation,Together,26646,1,0,2014-02-10,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,3,11,0
5324,1981,PhD,Married,58293,1,0,2014-01-19,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,3,11,0
7446,1967,Master,Together,62513,0,1,2013-09-09,16,520,42,98,0,42,14,2,6,4,10,6,0,0,0,0,0,0,3,11,0
965,1971,Graduation,Divorced,55635,0,1,2012-11-13,34,235,65,164,50,49,
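2021/10/28 (Thu) Social trends in synthetic data
Synthetic data in official statistics
 Synthetic data: micro-level data whose attribute values are pseudo-generated from some statistic, statistical model, or the like
 In Europe and the US: synthetic data is in use, for example as public use files (PUF, publicly released microdata)
  Missing-value imputation, simulation
Methods
 Methods that generate the data from the original microdata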
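
The note above defines synthetic data as micro-level records whose attribute values are generated from statistics or a statistical model fitted to real microdata. A minimal sketch of that idea follows. It is not the method described in the note, which is cut off here. It assumes the simplest possible model: each column is summarised on its own (mean and standard deviation for numeric columns, category frequencies for categorical ones) and sampled independently. The function names `fit_marginals` and `sample_synthetic` are hypothetical, and the sample records reuse three columns from the customer CSV preview earlier on this page.

```python
import random
import statistics


def fit_marginals(records, numeric_cols, categorical_cols):
    """Summarise each column of the microdata independently (assumed model)."""
    model = {}
    for col in numeric_cols:
        values = [r[col] for r in records]
        # Numeric column: keep only its mean and population standard deviation.
        model[col] = ("numeric", statistics.mean(values), statistics.pstdev(values))
    for col in categorical_cols:
        counts = {}
        for r in records:
            counts[r[col]] = counts.get(r[col], 0) + 1
        # Categorical column: keep the observed categories and their counts.
        model[col] = ("categorical", list(counts.keys()), list(counts.values()))
    return model


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic records, one column at a time, from the fitted summaries."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mu, sigma = spec
                row[col] = rng.gauss(mu, sigma)
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Illustrative microdata: three columns from the first five CSV rows shown above.
microdata = [
    {"Year_Birth": 1957, "Education": "Graduation", "Income": 58138},
    {"Year_Birth": 1954, "Education": "Graduation", "Income": 46344},
    {"Year_Birth": 1965, "Education": "Graduation", "Income": 71613},
    {"Year_Birth": 1984, "Education": "Graduation", "Income": 26646},
    {"Year_Birth": 1981, "Education": "PhD", "Income": 58293},
]

model = fit_marginals(microdata, ["Year_Birth", "Income"], ["Education"])
synthetic = sample_synthetic(model, 5)
for row in synthetic:
    print(row)
```

Because the columns are sampled independently, this sketch discards any correlation between attributes (for example between education and income). A generator meant for real use would need a joint model, and this sketch makes no privacy guarantee.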
39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
42,Private,159449,Bachelor
New York City 40.72 74.00
Los Angeles 34.05 118.25
Chicago 41.88 87.63
Houston 29.77 95.38
Phoenix 33.45 112.07
Philadelphia 39.95 75.17
San Antonio 29.53 98.47
Dallas 32.78 96.80
San Diego 32.78 117.15
San Jose 37.30 121.87
@gghatano
gghatano / attack.bash
Last active September 13, 2021 00:12
PWSCUP2021: run an attack with the sample script
# summary:
## create E files for pwscup2021 attack phase by using "rlink.py"
# preparation:
## put the directories "pre_anony_d" and "pre_attack" in the same directory as "attack.bash"
## get "rlink.py"
# parameters:
AM0001 Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
AM0002 Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0
AM0003 Male 78.0 White HighSchool Married 28.8 0 0 0 0 Q3 1
AM0004 Female 56.0 White Graduate Parther 42.4 1 0 0 0 Q3 0
AM0005 Female 42.0 Black College Divorced 20.3 1 0 0 0 Q4 0
AM0006 Female 72.0 Mexican 11th Separated 28.6 0 0 0 0 Q1 0
AM0007 Male 56.0 Black HighSchool Divorced 33.6 0 0 0 0 Q3 1
AM0008 Male 46.0 White Graduate Parther 27.6 0 0 0 0 Q3 0
AM0009 Male 45.0 Other 11th Never 24.1 0 0 0 0 Q3 0
AM0010 Female 30.0 Hispanic College Parther 26.6 0 0 0 0 Q4 0
gghatano / B.csv
Last active September 9, 2021 03:25
PWSCUP2021 NHANES data
Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0
Male 78.0 White HighSchool Married 28.8 0 0 0 0 Q3 1
Female 56.0 White Graduate Parther 42.4 1 0 0 0 Q3 0
Female 42.0 Black College Divorced 20.3 1 0 0 0 Q4 0
Female 72.0 Mexican 11th Separated 28.6 0 0 0 0 Q1 0
Male 56.0 Black HighSchool Divorced 33.6 0 0 0 0 Q3 1
Male 46.0 White Graduate Parther 27.6 0 0 0 0 Q3 0
Male 45.0 Other 11th Never 24.1 0 0 0 0 Q3 0
Female 30.0 Hispanic College Parther 26.6 0 0 0 0 Q4 0
FROM centos:7
ENV PYTHONPATH "/opt/python/library"
ENV LANG en_US.utf8
LABEL maintainer="PWSCUP_ADMIN (Twitter: @PWScup_Admin)"
ARG version="3.7.3"
COPY ./jupyter_notebook_config.py /tmp/jupyter_notebook_config.py
gghatano / T.csv
Created July 13, 2021 05:13
PWSCUP2018 final T data
12583,2010/12/1,22728,3.75,24
12583,2010/12/1,22727,3.75,24
12583,2010/12/1,22726,3.75,12
12583,2010/12/1,21724,0.85,12
12583,2010/12/1,21883,0.65,24
12583,2010/12/1,10002,0.85,48
12583,2010/12/1,21791,1.25,24
12583,2010/12/1,21035,2.95,18
12583,2010/12/1,22326,2.95,24
12583,2010/12/1,22629,1.95,24