cristiano74 · June 15, 2013 13:47
diff --git a/chi square python b/chi square python
 #Chi square - python   



 [toc]



 ##Definizione del processo

 1. Generazione di una popolazione a partire da una tupla generica
 2. estrazione casuale di un campione di n_elementi dalla pop creata
 3. calcolo della frequenze attese e ossevarte in base alla tupla estratta
 4. calcolo del chi-quadrato con p_value associato e definizione dei livelli di accuratezza del risulato.  

 + State your conclusion in terms of your hypothesis.

 + **If the p value for the calculated chi-square is p > 0.05, accept your hypothesis. 'The deviation is small enough that chance alone accounts for it**.   

 + A p value of **0.6**, for example, means that **there is a 60% probability that any deviation from expected is *due to chance only***. This is within the range of acceptable deviation.  

 + If the p value for the calculated chi-square is ***p < 0.05*, REJECT your hypothesis, and conclude that some factor other than chance** is operating for the deviation to be so great.   

 + For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.  

 + The chi-square test will be used to test for the "goodness to fit" between observed and expected data from several laboratory investigations in this lab manual. 

 LINK (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)


 ##Esempio  

 **La tupla generica estratta è:**  

 `    px [0.0, 0.05, 0.15, 0.1, 0.35, 0.15, 0.2]`

 Da una popolazione di 100,000 elementi , estraggo 1000 elementi:  
 Quindi la freq. **attesa** assoluta (FA) è la seguente (totale = 1000 elementi):  
   
 `fa_pre [50, 150, 100, 350, 150, 200]`

 *Notare che il **valore 0** è stato eliminato dal vettore, per [bug con il sistema di calcolo python][6]


 Quindi la freq. **osservata** assoluta (FO) è :  

 `    fo_pre [53, 155, 96, 332, 162, 202]`

 Il confronto fra FA e FO:  

 `[ 50 150 100 350 150 200] 	--> FA`  
 `[ 53 155  96 332 162 202]  --> FO`


 Calcolo del pvalue: **`Chi square test: 0.789628937062`**

 A ciscun valore di p_value corrisponde un livello di accuratezza, come nella tabella sotto:

 Dalla elaborazione della [v1.1](#v11), appare come risultato quanto segue:

 ###Risultato ottenuto con [CSTP](#cstp) v1.1

 BASH line$ in linux: `python test4.py 500 100000 1000 10  `


 ` ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 149, ' Med:', 201, ' Low:', 150, 'nullo', 0)
 `


 name					|	label
 ------------------------|
 'pop=', 100000,			|   100k casi considerati come  pop
 'campione=', 1000    	|   1000 casi estratti (sample)
 'Tuple estratte=' 500,  |   500 combinazioni di freschezza su 230k totali utilizzate nel processo
 ' High:', 149,   		| 149 su 500 combinazioni hanno dato p value alto (vedere [tabella](#ptable))
 ' Med:', 201,   		| 201 su 500 combinazioni hanno dato p value medio
 ' Low:', 150,   		| 150 su 500 combinazioni hanno dato p value medio
 'nullo', 0  			| da trascurare - solo per controllo

 Il processo è ripetuto n volte, con la stessa numerosità della poopolazione e del campione, ma con differenti tuple estratte.
 I risultati sono memorizzati nel file [**SAMPLERESULTS_CHISQUARE.txt**](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit]).

 Un estratto del file :
 ```
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 139, ' Med:', 221, ' Low:', 140, 'nullo', 0)
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 163, ' Med:', 177, ' Low:', 160, 'nullo', 0)
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 170, ' Med:', 179, ' Low:', 151, 'nullo', 0)
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 142, ' Med:', 209, ' Low:', 149, 'nullo', 0)
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 195, ' Low:', 159, 'nullo', 0)
 ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 204, ' Low:', 150, 'nullo', 0)
 ```

 ###<a id="ptable">Tabella di referimento per il p_value</a>

 p value   | level
 --------- | -----
 <0.30     | low accuracy
 0.30-0.70 | medium accuracy
 \>0.70	  | High accuracy


 ----------
 ## CODE 
 ### Create chi square test process [CSTP] <a id="cstp"></a>- test4.py 
 > note: Chi Square Test Process = CSTP  

 **`CODICE per la generazione del pvalue CHI SQUARE`**
 [source code ] [7]  



 ####<a id="v11">V1.1</a>
 > - introdotto il ciclo sulle ripetizioni del processo, param 4
 > 	- la lista delle tuple viene ri-estratta ogni ciclo
 > - save dei risultati in: 	[**SAMPLERESULTS_CHISQUARE.txt**](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit])
 > - dipende dalle tuple create con [Cretate tuples v1.0](#createtuplesv1.0)

 ```
 ############################################
 #   USAGE:
 #   python test4.py 100 100000 5000
 #   param1= numero estrazioni casuali di tuple
 #   param2= num pop
 #   param3= num sample
 #   param4= num di ripetizioni del processo
 ############################################
 import sys
 import random
 import collections
 import numpy
 #QUI SI APRE IL FILE CON LE COMBINAZIONI
 #################################
 lista=[]
 import pickle
 #f = file("test.txt", "r")
 f = file("outfile.txt", "r")
 lista=pickle.load(f)
 #############################
 #from scipy.stats import mannwhitneyu
 #from scipy.stats import chisquare
 #from scipy.stats import kruskal
 from scipy.stats import rv_discrete

 import scipy.stats.mstats as mst

 tuple_tot = 0
 acc=0
 high = 0
 low = 0
 med = 0
 nullo = 0
 p_value=0
 # a= random.sample(range(100), 100)
 n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni

 fa = []
 fa_pre = []
 fo_pre = []
 fo =[]
 tot=int(sys.argv[2])
 Nsample = int(sys.argv[3])

 for i in xrange(int(sys.argv[4])):
    lista = random.sample(lista,n_extract)
    acc=0
    high = 0
    low = 0
    med = 0
    nullo = 0
    p_value=0
    for e in lista:

        print e

        #px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)]  # 7 rep]

        e1= ((e[0]/float(100)))
        e2= ((e[1]/float(100)))
        e3= ((e[2]/float(100)))
        e4= ((e[3]/float(100)))
        e5= ((e[4]/float(100)))
        e6= ((e[5]/float(100)))
        e7= ((e[6]/float(100)))


        px=[e1,e2,e3,e4,e5,e6,e7]  # 7 rep]
        print "px", px
        x =[1,2,3,4,5,6,7]
        fa_pre=([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])
        fa_pre[:] = (value for value in fa_pre if value != 0.0)
        print "fa_pre",fa_pre
        fa= numpy.array(fa_pre)
        #fa = numpy.array([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])

        pop = rv_discrete(values=(x, px)).rvs(size=tot)
        random.shuffle(pop)
        sample1 = random.sample(pop,Nsample)
        counter_sample=collections.Counter(sample1)
        fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
        fo_pre[:] = (value for value in fo_pre if value != 0.0)
        print "fo_pre",fo_pre
        fo = numpy.array(fo_pre)
        #fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
        print fa
        print fo
        #print mst.chisquare(fo, fa)

        #print(samples)
        try:
            u, p_value = mst.chisquare(fo, fa)
            #u, p_value  =   2*mannwhitneyu(pop, sample1)
            #u, p_value =   kruskal(pop, sample1)
            if p_value > 0.70:
                acc += 1
                test = "HIGH precision"
                high += 1
            else:
                if p_value < 0.30:
                    test = "Low precision"
                    low += 1
                else:
                    test = "Medium precision"
                    med += 1
        except:
                nullo +=1
                p_value = 0
                pass
        print "Chi square test:", p_value
    final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
    print final
    log = open("SAMPLERESULTS_CHISQUARE.txt","a")
    print >> log,final
    log.close()

 ```

 ####v 1.0



    ############################################
    #   USAGE:
    #   python test4.py 100 100000 5000
    #   param1= replicazioni test, casuali tuple
    #   param2= num pop
    #   param3= num sample
    ############################################
    import sys
    import random
    import collections
    import numpy
    
 	#QUI SI APRE IL FILE CON LE COMBINAZIONI
    #################################
    lista=[]
    import pickle
    #f = file("test.txt", "r")
    f = file("outfile.txt", "r")
    lista=pickle.load(f)
    #############################

    #from scipy.stats import mannwhitneyu
    #from scipy.stats import chisquare
    #from scipy.stats import kruskal
    from scipy.stats import rv_discrete
    
    import scipy.stats.mstats as mst
    
    tuple_tot = 0
    acc=0
    high = 0
    low = 0
    med = 0
    nullo = 0
    p_value=0
    # a= random.sample(range(100), 100)
    n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni
    lista = random.sample(lista,n_extract)
    fa = []
    fa_pre = []
    fo_pre = []
    fo =[]
    for e in lista:
        tot=int(sys.argv[2])
        Nsample = int(sys.argv[3])
        print e
    
        #px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)]  # 7 rep]
    
        e1= ((e[0]/float(100)))
        e2= ((e[1]/float(100)))
        e3= ((e[2]/float(100)))
        e4= ((e[3]/float(100)))
        e5= ((e[4]/float(100)))
        e6= ((e[5]/float(100)))
        e7= ((e[6]/float(100)))
    
    
        px=[e1,e2,e3,e4,e5,e6,e7]  # 7 rep]
        print "px", px
        x =[1,2,3,4,5,6,7]
        fa_pre=([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])
        fa_pre[:] = (value for value in fa_pre if value != 0.0)
        print "fa_pre",fa_pre
        fa= numpy.array(fa_pre)
        #fa = numpy.array([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])
    
        pop = rv_discrete(values=(x, px)).rvs(size=tot)
        random.shuffle(pop)
        sample1 = random.sample(pop,Nsample)
        counter_sample=collections.Counter(sample1)
        fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
        fo_pre[:] = (value for value in fo_pre if value != 0.0)
        print "fo_pre",fo_pre
        fo = numpy.array(fo_pre)
        #fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
        print fa
        print fo
        #print mst.chisquare(fo, fa)
    
        #print(samples)
        try:
            u, p_value = mst.chisquare(fo, fa)
            #u, p_value  =   2*mannwhitneyu(pop, sample1)
            #u, p_value =   kruskal(pop, sample1)
            if p_value > 0.70:
                acc += 1
                test = "HIGH precision"
                high += 1
            else:
                if p_value < 0.30:
                    test = "Low precision"
                    low += 1
                else:
                    test = "Medium precision"
                    med += 1
        except:
                nullo +=1
                p_value = 0
                pass
        print "Chi square test:", p_value
    final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
    print final

 ###<a id="createtuples">Create tuples-test1.py</a>
 > file per la generazione delle tuple con permutazione , con i seguenti valori di percentuale:  

 `v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];`

 >[Source code for test1.py][8]

 Il risultato delle 230230 tuple è memorizzato nel file OUTFILE.TXT
 ####<a id="createtuplesv1.0">V1.0</a>  

 > esempio BASH : **python filename.py 1**   
 > il valore 1 indica che il testmode=1, cioe' falso. 
 ```
 ##################################################
 #QUESTO FILE GENERA LE TUPLE DA UTILIZZARE NEL FILE
 #DA ESEGUIRE UNA SOLA VOLTA
 #IL FILE : OUTFILE.TXT
 ##################################################

 import itertools;
 v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];

 import sys

 # esempio BASH : python filename.py 1  



 #testmode=0
 testmode=int(sys.argv[1])

 target=100      #somma da raggiungere - test

 if testmode==1:
    n_rep=3

 else:
    n_rep=7

 ############### CREAZIONE TUPLA
 combos = [a for a in itertools.product(v,repeat=n_rep) if sum(a)==target]
 #############################

 lista = []
 pop = []

 ############### AGGIUNGE TUPLA ALLA LISTA
 for e in combos:
        # print e     #indica la tupla
        lista.append(e)
 #########################################

 ############### SCRIVE LISTA SU FILE OUTFILE.TXT
 import pickle
 f = file("outfile.txt", "w")
 pickle.dump(lista, f)
 #print lista
 ##################################

 ```
 ##Notes

 ####Chi square Goodness of fit: - python
 1. [Articolo di un modulo python per chi square GOF][4]  
 2. [**How to Test Your Discrete Distribution- Minitab**] [5]
 3. [Difference between scipy.stats.mstats.chisquare and 
 scipy.stats.mstats in Python][6]  
 1. [p VALUE - INTERPRETATION][9]
 1. [How to generate numbers based on an arbitrary **discrete distribution**?][2]
 2.	[ANOVA, Nonparametric Tests, and More Added to SVS via New **Python** Capabilities][3]

 ####PYTHON CODE. SUM and permutation
 1. http://mathoverflow.net/questions/9477/uniquely-generate-all-permutations-of-three-digits-that-sum-to-a-particular-value
 2. http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum  
 4. Utilizzo *LIST* in python: http://effbot.org/zone/python-list.htm
 5. [Sample size for pilot studies](http://www.statstodo.com/SSiz2Props_Pgm.php)


 ####Algorithm
 1. http://web.eecs.utk.edu/~vose/Publications/random.pdf
 2. Bootstrap Python : https://github.com/cgevans/scikits-bootstrap
 3. [wilcoxon](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wicoxon.html#scipy.stats.wilcoxon)
 5. [mann withney](http://www.alglib.net/hypothesistesting/mannwhitneyu.php)


 ----------


 [1]: https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/6_example.py?edit
 [2]: http://stats.stackexchange.com/questions/26858/how-to-generate-numbers-based-on-an-arbitrary-discrete-distribution
 [3]:[http://blog.goldenhelix.com/?p=595]  
 [4]: (http://statsmodels.sourceforge.net/stable/stats.html#goodness-of-fit-tests-and-measures)
 [5]: http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution
 [6]:(http://stackoverflow.com/questions/13913572/difference-between-scipy-stats-mstats-chisquare-and-scipy-stats-mstats-in-python)
 [7]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test4.py?edit
 [8]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test1.py?edit
 [9]: (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)
	#Chi square - python



	[toc]



	##Definizione del processo

	1. Generazione di una popolazione a partire da una tupla generica
	2. estrazione casuale di un campione di n_elementi dalla pop creata
	3. calcolo della frequenze attese e ossevarte in base alla tupla estratta
	4. calcolo del chi-quadrato con p_value associato e definizione dei livelli di accuratezza del risulato.

	+ State your conclusion in terms of your hypothesis.

	+ If the p value for the calculated chi-square is p > 0.05, accept your hypothesis. 'The deviation is small enough that chance alone accounts for it.

	+ A p value of 0.6, for example, means that *there is a 60% probability that any deviation from expected is due to chance only***. This is within the range of acceptable deviation.

	+ If the p value for the calculated chi-square is **p < 0.05, REJECT your hypothesis, and conclude that some factor other than chance** is operating for the deviation to be so great.

	+ For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.

	+ The chi-square test will be used to test for the "goodness to fit" between observed and expected data from several laboratory investigations in this lab manual.

	LINK (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)


	##Esempio

	La tupla generica estratta è:

	` px [0.0, 0.05, 0.15, 0.1, 0.35, 0.15, 0.2]`

	Da una popolazione di 100,000 elementi , estraggo 1000 elementi:
	Quindi la freq. attesa assoluta (FA) è la seguente (totale = 1000 elementi):

	`fa_pre [50, 150, 100, 350, 150, 200]`

	Notare che il valore 0* è stato eliminato dal vettore, per [bug con il sistema di calcolo python][6]


	Quindi la freq. osservata assoluta (FO) è :

	` fo_pre [53, 155, 96, 332, 162, 202]`

	Il confronto fra FA e FO:

	`[ 50 150 100 350 150 200] --> FA`
	`[ 53 155 96 332 162 202] --> FO`


	Calcolo del pvalue: `Chi square test: 0.789628937062`

	A ciscun valore di p_value corrisponde un livello di accuratezza, come nella tabella sotto:

	Dalla elaborazione della [v1.1](#v11), appare come risultato quanto segue:

	###Risultato ottenuto con [CSTP](#cstp) v1.1

	BASH line$ in linux: `python test4.py 500 100000 1000 10 `


	` ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 149, ' Med:', 201, ' Low:', 150, 'nullo', 0)
	`


	name \| label
	------------------------\|
	'pop=', 100000, \| 100k casi considerati come pop
	'campione=', 1000 \| 1000 casi estratti (sample)
	'Tuple estratte=' 500, \| 500 combinazioni di freschezza su 230k totali utilizzate nel processo
	' High:', 149, \| 149 su 500 combinazioni hanno dato p value alto (vedere [tabella](#ptable))
	' Med:', 201, \| 201 su 500 combinazioni hanno dato p value medio
	' Low:', 150, \| 150 su 500 combinazioni hanno dato p value medio
	'nullo', 0 \| da trascurare - solo per controllo

	Il processo è ripetuto n volte, con la stessa numerosità della poopolazione e del campione, ma con differenti tuple estratte.
	I risultati sono memorizzati nel file [SAMPLERESULTS_CHISQUARE.txt](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit]).

	Un estratto del file :
	```
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 139, ' Med:', 221, ' Low:', 140, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 163, ' Med:', 177, ' Low:', 160, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 170, ' Med:', 179, ' Low:', 151, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 142, ' Med:', 209, ' Low:', 149, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 195, ' Low:', 159, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 204, ' Low:', 150, 'nullo', 0)
	```

	###<a id="ptable">Tabella di referimento per il p_value</a>

	p value \| level
	--------- \| -----
	<0.30 \| low accuracy
	0.30-0.70 \| medium accuracy
	\>0.70 \| High accuracy


	----------
	## CODE
	### Create chi square test process [CSTP] <a id="cstp"></a>- test4.py
	> note: Chi Square Test Process = CSTP

	`CODICE per la generazione del pvalue CHI SQUARE`
	[source code ] [7]



	####<a id="v11">V1.1</a>
	> - introdotto il ciclo sulle ripetizioni del processo, param 4
	> - la lista delle tuple viene ri-estratta ogni ciclo
	> - save dei risultati in: [SAMPLERESULTS_CHISQUARE.txt](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit])
	> - dipende dalle tuple create con [Cretate tuples v1.0](#createtuplesv1.0)

	```
	############################################
	# USAGE:
	# python test4.py 100 100000 5000
	# param1= numero estrazioni casuali di tuple
	# param2= num pop
	# param3= num sample
	# param4= num di ripetizioni del processo
	############################################
	import sys
	import random
	import collections
	import numpy
	#QUI SI APRE IL FILE CON LE COMBINAZIONI
	#################################
	lista=[]
	import pickle
	#f = file("test.txt", "r")
	f = file("outfile.txt", "r")
	lista=pickle.load(f)
	#############################
	#from scipy.stats import mannwhitneyu
	#from scipy.stats import chisquare
	#from scipy.stats import kruskal
	from scipy.stats import rv_discrete

	import scipy.stats.mstats as mst

	tuple_tot = 0
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	# a= random.sample(range(100), 100)
	n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni

	fa = []
	fa_pre = []
	fo_pre = []
	fo =[]
	tot=int(sys.argv[2])
	Nsample = int(sys.argv[3])

	for i in xrange(int(sys.argv[4])):
	lista = random.sample(lista,n_extract)
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	for e in lista:

	print e

	#px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)] # 7 rep]

	e1= ((e[0]/float(100)))
	e2= ((e[1]/float(100)))
	e3= ((e[2]/float(100)))
	e4= ((e[3]/float(100)))
	e5= ((e[4]/float(100)))
	e6= ((e[5]/float(100)))
	e7= ((e[6]/float(100)))


	px=[e1,e2,e3,e4,e5,e6,e7] # 7 rep]
	print "px", px
	x =[1,2,3,4,5,6,7]
	fa_pre=([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])
	fa_pre[:] = (value for value in fa_pre if value != 0.0)
	print "fa_pre",fa_pre
	fa= numpy.array(fa_pre)
	#fa = numpy.array([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])

	pop = rv_discrete(values=(x, px)).rvs(size=tot)
	random.shuffle(pop)
	sample1 = random.sample(pop,Nsample)
	counter_sample=collections.Counter(sample1)
	fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
	fo_pre[:] = (value for value in fo_pre if value != 0.0)
	print "fo_pre",fo_pre
	fo = numpy.array(fo_pre)
	#fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
	print fa
	print fo
	#print mst.chisquare(fo, fa)

	#print(samples)
	try:
	u, p_value = mst.chisquare(fo, fa)
	#u, p_value = 2*mannwhitneyu(pop, sample1)
	#u, p_value = kruskal(pop, sample1)
	if p_value > 0.70:
	acc += 1
	test = "HIGH precision"
	high += 1
	else:
	if p_value < 0.30:
	test = "Low precision"
	low += 1
	else:
	test = "Medium precision"
	med += 1
	except:
	nullo +=1
	p_value = 0
	pass
	print "Chi square test:", p_value
	final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
	print final
	log = open("SAMPLERESULTS_CHISQUARE.txt","a")
	print >> log,final
	log.close()

	```

	####v 1.0



	############################################
	# USAGE:
	# python test4.py 100 100000 5000
	# param1= replicazioni test, casuali tuple
	# param2= num pop
	# param3= num sample
	############################################
	import sys
	import random
	import collections
	import numpy

	#QUI SI APRE IL FILE CON LE COMBINAZIONI
	#################################
	lista=[]
	import pickle
	#f = file("test.txt", "r")
	f = file("outfile.txt", "r")
	lista=pickle.load(f)
	#############################

	#from scipy.stats import mannwhitneyu
	#from scipy.stats import chisquare
	#from scipy.stats import kruskal
	from scipy.stats import rv_discrete

	import scipy.stats.mstats as mst

	tuple_tot = 0
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	# a= random.sample(range(100), 100)
	n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni
	lista = random.sample(lista,n_extract)
	fa = []
	fa_pre = []
	fo_pre = []
	fo =[]
	for e in lista:
	tot=int(sys.argv[2])
	Nsample = int(sys.argv[3])
	print e

	#px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)] # 7 rep]

	e1= ((e[0]/float(100)))
	e2= ((e[1]/float(100)))
	e3= ((e[2]/float(100)))
	e4= ((e[3]/float(100)))
	e5= ((e[4]/float(100)))
	e6= ((e[5]/float(100)))
	e7= ((e[6]/float(100)))


	px=[e1,e2,e3,e4,e5,e6,e7] # 7 rep]
	print "px", px
	x =[1,2,3,4,5,6,7]
	fa_pre=([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])
	fa_pre[:] = (value for value in fa_pre if value != 0.0)
	print "fa_pre",fa_pre
	fa= numpy.array(fa_pre)
	#fa = numpy.array([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])

	pop = rv_discrete(values=(x, px)).rvs(size=tot)
	random.shuffle(pop)
	sample1 = random.sample(pop,Nsample)
	counter_sample=collections.Counter(sample1)
	fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
	fo_pre[:] = (value for value in fo_pre if value != 0.0)
	print "fo_pre",fo_pre
	fo = numpy.array(fo_pre)
	#fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
	print fa
	print fo
	#print mst.chisquare(fo, fa)

	#print(samples)
	try:
	u, p_value = mst.chisquare(fo, fa)
	#u, p_value = 2*mannwhitneyu(pop, sample1)
	#u, p_value = kruskal(pop, sample1)
	if p_value > 0.70:
	acc += 1
	test = "HIGH precision"
	high += 1
	else:
	if p_value < 0.30:
	test = "Low precision"
	low += 1
	else:
	test = "Medium precision"
	med += 1
	except:
	nullo +=1
	p_value = 0
	pass
	print "Chi square test:", p_value
	final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
	print final

	###<a id="createtuples">Create tuples-test1.py</a>
	> file per la generazione delle tuple con permutazione , con i seguenti valori di percentuale:

	`v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];`

	>[Source code for test1.py][8]

	Il risultato delle 230230 tuple è memorizzato nel file OUTFILE.TXT
	####<a id="createtuplesv1.0">V1.0</a>

	> esempio BASH : python filename.py 1
	> il valore 1 indica che il testmode=1, cioe' falso.
	```
	##################################################
	#QUESTO FILE GENERA LE TUPLE DA UTILIZZARE NEL FILE
	#DA ESEGUIRE UNA SOLA VOLTA
	#IL FILE : OUTFILE.TXT
	##################################################

	import itertools;
	v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];

	import sys

	# esempio BASH : python filename.py 1



	#testmode=0
	testmode=int(sys.argv[1])

	target=100 #somma da raggiungere - test

	if testmode==1:
	n_rep=3

	else:
	n_rep=7

	############### CREAZIONE TUPLA
	combos = [a for a in itertools.product(v,repeat=n_rep) if sum(a)==target]
	#############################

	lista = []
	pop = []

	############### AGGIUNGE TUPLA ALLA LISTA
	for e in combos:
	# print e #indica la tupla
	lista.append(e)
	#########################################

	############### SCRIVE LISTA SU FILE OUTFILE.TXT
	import pickle
	f = file("outfile.txt", "w")
	pickle.dump(lista, f)
	#print lista
	##################################

	```
	##Notes

	####Chi square Goodness of fit: - python
	1. [Articolo di un modulo python per chi square GOF][4]
	2. [How to Test Your Discrete Distribution- Minitab] [5]
	3. [Difference between scipy.stats.mstats.chisquare and
	scipy.stats.mstats in Python][6]
	1. [p VALUE - INTERPRETATION][9]
	1. [How to generate numbers based on an arbitrary discrete distribution?][2]
	2. [ANOVA, Nonparametric Tests, and More Added to SVS via New Python Capabilities][3]

	####PYTHON CODE. SUM and permutation
	1. http://mathoverflow.net/questions/9477/uniquely-generate-all-permutations-of-three-digits-that-sum-to-a-particular-value
	2. http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum
	4. Utilizzo LIST in python: http://effbot.org/zone/python-list.htm
	5. [Sample size for pilot studies](http://www.statstodo.com/SSiz2Props_Pgm.php)


	####Algorithm
	1. http://web.eecs.utk.edu/~vose/Publications/random.pdf
	2. Bootstrap Python : https://github.com/cgevans/scikits-bootstrap
	3. [wilcoxon](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wicoxon.html#scipy.stats.wilcoxon)
	5. [mann withney](http://www.alglib.net/hypothesistesting/mannwhitneyu.php)


	----------


	[1]: https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/6_example.py?edit
	[2]: http://stats.stackexchange.com/questions/26858/how-to-generate-numbers-based-on-an-arbitrary-discrete-distribution
	[3]:[http://blog.goldenhelix.com/?p=595]
	[4]: (http://statsmodels.sourceforge.net/stable/stats.html#goodness-of-fit-tests-and-measures)
	[5]: http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution
	[6]:(http://stackoverflow.com/questions/13913572/difference-between-scipy-stats-mstats-chisquare-and-scipy-stats-mstats-in-python)
	[7]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test4.py?edit
	[8]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test1.py?edit
	[9]: (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)