2012-04-02T06:52:32Z||oprator berpengalaman telkomsel selain excel indosat saya mmbutuhkan operator marketing yang ckp handal berpengalaman | |
2012-04-02T07:12:42Z||rt pakai indosat internet broom bisa internetan cepat tanpa putus didukung dengan | |
2012-04-02T07:00:12Z||pakai indosat internet broom bisa internetan cepat tanpa putus didukung dengan jaringan 5g dijamin internetan wusshhhhh | |
2012-04-02T07:03:31Z||edit jalur akses internet indosat gunakan proxy ip add 195 189 142 132 port ip 80 yang lain biarkan seperti aslinya | |
2012-04-02T06:56:49Z||haha <makian> oprator berpengalaman telkomsel selain indosat bth operator marketing yang pengalaman | |
2012-04-02T07:22:10Z||rt pakai indosat internet broom bisa internetan cepat tanpa putus didukung dengan jaringan 5g dijamin internetan wusshhhhh | |
2012-04-02T07:32:05Z||di atmajaya abisnya mirip sangat aula indosat gambarnya tadi haha | |
2012-04-02T07:28:44Z||saya cinta karo indosat mergo terpaksa | |
2012-04-02T09:43:10Z||pan sarua indosat mnh haha dibawain ngan peje hela ngke hayu wk | |
2012-04-02T11:24:16Z||euw indosat should fix their bad connection | |
2012-04-02T12:57:58Z||gadeliv deliv acan eleuh eleuh indosat tahun meni geleuh | |
2012-04-02T12:54:53Z||rt indosat good cute ads reasonable price but can even pakai single call so hope they have fire insurance | |
2012-04-02T12:52:18Z||indosat good cute ads reasonable price but can even pakai single call so hope they have fire insurance | |
2012-04-02T15:07:35Z||2rts ktupat aya bsi ek aya 550 e63 hde kneh brow msi hp indosat wii hyong symbian pguh hp masuk kneh n0 | |
2012-04-02T15:16:57Z||senyumlicik penghianat yaps beralih ke indosat maybe it better than | |
2012-04-02T15:51:08Z||giliran lancar teman teman saya pada tidur terimakasih indosat | |
2012-04-02T16:18:37Z||disappointing with indosat internet connection slow it has been like this week | |
2012-04-02T08:01:02Z||pc laptop handphone barang impor operator selular indosat xl telkomsel milik asing qatar singapur malaysia | |
2012-04-02T12:58:14Z||pakai sarung tangan ngerakit kabel22 pasang petasan otw gedung indosat kedipin mata 2kali bom duarr tetap gdlv3 | |
2012-04-02T08:11:36Z||rt pakai indosat internet broom bisa internetan cepat tanpa putus didukung dengan jaringan 5g | |
2012-04-02T12:54:27Z||they selling unlimited that really limit our call hello where have ylki they still alive indosat good cute | |
2012-04-02T13:04:54Z||indosat good cute ads reasonable price but can even pakai single call so hope they have fire insurance | |
2012-04-02T14:41:12Z||reservation southeast asia official phone number 62 856 2121 666 indosat official blackberry | |
2012-04-02T12:49:19Z||indosat good cute ads reasonable price but can even pakai single call so hope they have fire insurance | |
2012-04-02T15:53:15Z||sama2 giliran lancar teman teman saya pada tidur terimakasih indosat | |
2012-04-02T13:00:46Z||ah jan sikak oq gra2 lali pngaturane njuk seg indosat ra keno gawe bka fb | |
2012-04-02T13:27:06Z||perbulannya berapa ini min pakai indosat internet broom bisa internetan cepat tanpa putus didukung dengan jaringan 5g | |
2012-04-02T14:09:59Z||guess indosat android worst combination former were slow til now latter eating enormous bytes | |
2012-04-02T15:15:07Z||penghianat yaps beralih ke indosat maybe it better than | |
2012-04-02T15:22:11Z||adele old friends why so shy me indosat why so bad | |
2012-04-02T11:25:13Z||walaupun hujan deras gini sinyal 3g indosat dirumah saya tetap kuat | |
2012-04-02T12:54:57Z||iklan indosat eneg tiru2 genkisudo huek | |
2012-04-02T14:11:55Z||tetap saja indosat abaaaaal haha | |
2012-04-02T13:53:53Z||rt apa definisi sukses menurut teman teman pakai indosat mobile | |
2012-04-02T14:09:17Z||haha tidak-ada kerjaan waktu ngerjain operator indosat | |
2012-04-02T17:22:28Z||dang saat mau buka koran ternyata ada iklan indosat haha suka shock begitu saya | |
2012-04-02T16:44:54Z||euweuh ka urg nte geus diaktifkeun can rhie zoel zul aya telepon ti indosat jang ngaaktfkeun kartu prabayar tea | |
2012-04-02T16:36:48Z||people who complain about indosat services like who complain about getting aids whore they knew had aids | |
2012-04-02T10:30:52Z||asik puas internetan pakai indosat internet broom gas pool ngebuut |
oprator=2.9444389791664403; yang=2.5649493574615367; saya=1.791759469228055; marketing=2.9444389791664403; selain=2.9444389791664403; telkomsel=2.5649493574615367; berpengalaman=5.8888779583328805; operator=2.1972245773362196; cepat=1.9459101490553132; broom=1.791759469228055; dengan=1.9459101490553132; didukung=1.9459101490553132; rt=1.9459101490553132; bisa=1.9459101490553132; pakai=1.0986122886681098; internetan=1.791759469228055; internet=1.3862943611198906; tanpa=1.9459101490553132; putus=1.9459101490553132; broom=1.791759469228055; dengan=1.9459101490553132; didukung=1.9459101490553132; wusshhhhh=2.9444389791664403; internetan=3.58351893845611; cepat=1.9459101490553132; 5g=2.1972245773362196; dijamin=2.9444389791664403; bisa=1.9459101490553132; pakai=1.0986122886681098; internet=1.3862943611198906; jaringan=2.1972245773362196; tanpa=1.9459101490553132; putus=1.9459101490553132; yang=2.5649493574615367; internet=1.3862943611198906; oprator=2.9444389791664403; yang=2.5649493574615367; marketing=2.9444389791664403; selain=2.9444389791664403; haha=1.791759469228055; telkomsel=2.5649493574615367; berpengalaman=2.9444389791664403; operator=2.1972245773362196; broom=1.791759469228055; dengan=1.9459101490553132; didukung=1.9459101490553132; wusshhhhh=2.9444389791664403; internetan=3.58351893845611; cepat=1.9459101490553132; 5g=2.1972245773362196; dijamin=2.9444389791664403; rt=1.9459101490553132; bisa=1.9459101490553132; pakai=1.0986122886681098; internet=1.3862943611198906; jaringan=2.1972245773362196; putus=1.9459101490553132; tanpa=1.9459101490553132; haha=1.791759469228055; saya=1.791759469228055; haha=1.791759469228055; connection=2.9444389791664403; bad=2.9444389791664403; insurance=2.1972245773362196; they=1.791759469228055; call=1.9459101490553132; but=2.1972245773362196; single=2.1972245773362196; can=1.9459101490553132; have=1.9459101490553132; so=1.9459101490553132; good=1.9459101490553132; cute=1.9459101490553132; fire=2.1972245773362196; 
reasonable=2.1972245773362196; price=2.1972245773362196; even=2.1972245773362196; rt=1.9459101490553132; pakai=1.0986122886681098; ads=2.1972245773362196; hope=2.1972245773362196; insurance=2.1972245773362196; they=1.791759469228055; call=1.9459101490553132; but=2.1972245773362196; single=2.1972245773362196; can=1.9459101490553132; have=1.9459101490553132; so=1.9459101490553132; good=1.9459101490553132; cute=1.9459101490553132; fire=2.1972245773362196; reasonable=2.1972245773362196; price=2.1972245773362196; even=2.1972245773362196; pakai=1.0986122886681098; ads=2.1972245773362196; hope=2.1972245773362196; aya=5.8888779583328805; penghianat=2.9444389791664403; it=2.5649493574615367; maybe=2.9444389791664403; ke=2.9444389791664403; yaps=2.9444389791664403; better=2.9444389791664403; beralih=2.9444389791664403; than=2.9444389791664403; lancar=2.9444389791664403; saya=1.791759469228055; tidur=2.9444389791664403; giliran=2.9444389791664403; teman=5.1298987149230735; terimakasih=2.9444389791664403; pada=2.9444389791664403; connection=2.9444389791664403; it=2.5649493574615367; slow=2.9444389791664403; like=2.9444389791664403; internet=1.3862943611198906; telkomsel=2.5649493574615367; operator=2.1972245773362196; pakai=1.0986122886681098; tetap=2.5649493574615367; broom=1.791759469228055; dengan=1.9459101490553132; didukung=1.9459101490553132; internetan=1.791759469228055; cepat=1.9459101490553132; 5g=2.1972245773362196; rt=1.9459101490553132; bisa=1.9459101490553132; pakai=1.0986122886681098; internet=1.3862943611198906; tanpa=1.9459101490553132; putus=1.9459101490553132; jaringan=2.1972245773362196; call=1.9459101490553132; they=3.58351893845611; have=1.9459101490553132; good=1.9459101490553132; cute=1.9459101490553132; insurance=2.1972245773362196; they=1.791759469228055; call=1.9459101490553132; but=2.1972245773362196; single=2.1972245773362196; can=1.9459101490553132; have=1.9459101490553132; so=1.9459101490553132; good=1.9459101490553132; cute=1.9459101490553132; 
fire=2.1972245773362196; reasonable=2.1972245773362196; price=2.1972245773362196; even=2.1972245773362196; pakai=1.0986122886681098; ads=2.1972245773362196; hope=2.1972245773362196; insurance=2.1972245773362196; they=1.791759469228055; call=1.9459101490553132; but=2.1972245773362196; single=2.1972245773362196; can=1.9459101490553132; have=1.9459101490553132; so=1.9459101490553132; good=1.9459101490553132; cute=1.9459101490553132; fire=2.1972245773362196; reasonable=2.1972245773362196; price=2.1972245773362196; even=2.1972245773362196; pakai=1.0986122886681098; ads=2.1972245773362196; hope=2.1972245773362196; lancar=2.9444389791664403; saya=1.791759469228055; tidur=2.9444389791664403; giliran=2.9444389791664403; teman=5.1298987149230735; terimakasih=2.9444389791664403; pada=2.9444389791664403; cepat=1.9459101490553132; 5g=2.1972245773362196; broom=1.791759469228055; dengan=1.9459101490553132; didukung=1.9459101490553132; bisa=1.9459101490553132; pakai=1.0986122886681098; internetan=1.791759469228055; internet=1.3862943611198906; putus=1.9459101490553132; tanpa=1.9459101490553132; jaringan=2.1972245773362196; slow=2.9444389791664403; penghianat=2.9444389791664403; it=2.5649493574615367; maybe=2.9444389791664403; ke=2.9444389791664403; yaps=2.9444389791664403; better=2.9444389791664403; beralih=2.9444389791664403; than=2.9444389791664403; so=3.8918202981106265; bad=2.9444389791664403; saya=1.791759469228055; tetap=2.5649493574615367; iklan=2.9444389791664403; haha=1.791759469228055; tetap=2.5649493574615367; rt=1.9459101490553132; teman=5.1298987149230735; pakai=1.0986122886681098; haha=1.791759469228055; operator=2.1972245773362196; iklan=2.9444389791664403; saya=1.791759469228055; haha=1.791759469228055; can=1.9459101490553132; aya=2.9444389791664403; they=1.791759469228055; like=2.9444389791664403; broom=1.791759469228055; internetan=1.791759469228055; pakai=1.0986122886681098; internet=1.3862943611198906; |
pakai=1.0986122886681096, df=12 | |
internet=1.3862943611198906, df=8 | |
broom=1.7917594692280547, df=6 | |
haha=1.7917594692280547, df=6 | |
saya=1.7917594692280547, df=6 | |
bisa=1.945910149055313, df=5 | |
call=1.945910149055313, df=5 | |
can=1.945910149055313, df=5 | |
cepat=1.945910149055313, df=5 | |
cute=1.945910149055313, df=5 | |
dengan=1.945910149055313, df=5 | |
didukung=1.945910149055313, df=5 | |
good=1.945910149055313, df=5 | |
have=1.945910149055313, df=5 | |
putus=1.945910149055313, df=5 | |
rt=1.945910149055313, df=5 | |
tanpa=1.945910149055313, df=5 | |
they=2.0903860474327307, df=6 | |
5g=2.1972245773362196, df=4 | |
ads=2.1972245773362196, df=4 | |
but=2.1972245773362196, df=4 | |
even=2.1972245773362196, df=4 | |
fire=2.1972245773362196, df=4 | |
hope=2.1972245773362196, df=4 | |
insurance=2.1972245773362196, df=4 | |
jaringan=2.1972245773362196, df=4 | |
operator=2.1972245773362196, df=4 | |
price=2.1972245773362196, df=4 | |
reasonable=2.1972245773362196, df=4 | |
single=2.1972245773362196, df=4 | |
so=2.335092178866376, df=5 | |
internetan=2.3890126256374065, df=6 | |
it=2.5649493574615367, df=3 | |
telkomsel=2.5649493574615367, df=3 | |
tetap=2.5649493574615367, df=3 | |
yang=2.5649493574615367, df=3 | |
bad=2.9444389791664403, df=2 | |
beralih=2.9444389791664403, df=2 | |
better=2.9444389791664403, df=2 | |
connection=2.9444389791664403, df=2 | |
dijamin=2.9444389791664403, df=2 | |
giliran=2.9444389791664403, df=2 | |
iklan=2.9444389791664403, df=2 | |
ke=2.9444389791664403, df=2 | |
lancar=2.9444389791664403, df=2 | |
like=2.9444389791664403, df=2 | |
marketing=2.9444389791664403, df=2 | |
maybe=2.9444389791664403, df=2 | |
oprator=2.9444389791664403, df=2 | |
pada=2.9444389791664403, df=2 | |
penghianat=2.9444389791664403, df=2 | |
selain=2.9444389791664403, df=2 | |
slow=2.9444389791664403, df=2 | |
terimakasih=2.9444389791664403, df=2 | |
than=2.9444389791664403, df=2 | |
tidur=2.9444389791664403, df=2 | |
wusshhhhh=2.9444389791664403, df=2 | |
yaps=2.9444389791664403, df=2 | |
aya=4.41665846874966, df=2 | |
berpengalaman=4.41665846874966, df=2 | |
teman=5.1298987149230735, df=3 |
package dataConvert;
import java.io.*;
import java.util.*;
import java.util.Map.Entry;
/**
 * Program that computes TF-IDF.
 *
 * @author frendhisaidodanaro
 */
public class procTFIDF {
    // List used to check for stop words.
    private ArrayList<String> alExtStopWords = new ArrayList<String>();
    // Sorts the entries of a map by value (ascending).
    // Ties return 1 instead of 0 so that entries with equal values are not dropped by the TreeSet.
    static <K,V extends Comparable<? super V>> SortedSet<Map.Entry<K,V>> entriesSortedByValues(Map<K,V> map) {
        SortedSet<Map.Entry<K,V>> sortedEntries = new TreeSet<Map.Entry<K,V>>(
            new Comparator<Map.Entry<K,V>>() {
                @Override public int compare(Map.Entry<K,V> e1, Map.Entry<K,V> e2) {
                    int res = e1.getValue().compareTo(e2.getValue());
                    return res != 0 ? res : 1;
                }
            }
        );
        sortedEntries.addAll(map.entrySet());
        return sortedEntries;
    }
    // Snippet from the program edu.upi.cs.tweetmining.TFIDF that loads the stop-word data into alExtStopWords.
    private void loadExtStopWords(String inputExtStopWords) {
        try {
            FileInputStream fstream = new FileInputStream(inputExtStopWords);
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            while ((strLine = br.readLine()) != null) {
                alExtStopWords.add(strLine); // Each line of the file becomes one stop-word entry.
            }
            br.close();
            in.close();
        }catch (Exception e) {
            System.out.println(e.toString());
        }
    }
    public void process(String fileInput, String extStopWord, boolean denganStat) {
        String namaFile = fileInput.substring(0, fileInput.indexOf("."));
        int totalTerms = 0;
        int totalDoc;
        // Load the stop words into alExtStopWords.
        loadExtStopWords(extStopWord);
        // arrTweets: term frequencies per tweet; arrTFIDF: TF*IDF values per tweet;
        // docFreq: document frequency per term; tfIDF: average TF*IDF per term.
        ArrayList<HashMap<String, Integer>> arrTweets = new ArrayList<HashMap<String, Integer>>();
        ArrayList<HashMap<String, Double>> arrTFIDF = new ArrayList<HashMap<String, Double>>();
        HashMap<String, Integer> docFreq = new HashMap<String, Integer>();
        TreeMap<String, Double> tfIDF = new TreeMap<String, Double>();
        try{
            FileInputStream fstream = new FileInputStream(fileInput);
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            System.out.println("Reading "+ fileInput);
            // COMPUTE TERM FREQUENCY
            // Read the input file
            // and count the tf of every term on each line.
            String strLine;
            Integer tfreq;
            while ((strLine = br.readLine()) != null) {
                HashMap<String, Integer> termFreq = new HashMap<String, Integer>();
                // Skip the 22-character prefix (timestamp plus "||") to get the tweet text.
                String docn = strLine.substring(22,strLine.length());
                Scanner sc = new Scanner(docn);
                while(sc.hasNext()) {
                    String term = sc.next();
                    if(!term.equalsIgnoreCase("indosat")){ // Skip the keyword "indosat" because it appears in every tweet.
                        tfreq = termFreq.get(term); // Current count, or null if the term is new.
                        termFreq.put(term, (tfreq == null) ? 1 : tfreq + 1); // First occurrence stores 1; otherwise increment.
                        totalTerms++;
                    }
                }
                sc.close();
                arrTweets.add(termFreq); // Store this tweet's termFreq.
            }
            br.close();
            // Finished reading the dataset.
            // arrTweets holds one termFreq HashMap per document/tweet, containing the tf of each of its terms.
            // COMPUTE DOCUMENT FREQUENCY
            // Iterate over arrTweets to count df,
            // i.e. the number of documents that contain each term.
            // docFreq.put("awan",7)
            // means the term "awan" was found in 7 documents/tweets.
            Iterator iterArray = arrTweets.iterator();
            while(iterArray.hasNext()){
                HashMap perTweet = (HashMap) iterArray.next();
                Iterator iterEach = perTweet.keySet().iterator();
                while(iterEach.hasNext()){
                    String eachW = (String) iterEach.next();
                    if(alExtStopWords.contains(eachW)){ // Stop words get DF = 0.
                        docFreq.put(eachW, 0);
                    }else{
                        Integer dfreq = docFreq.get(eachW);
                        docFreq.put(eachW,(dfreq == null)? 1 : dfreq +1 );
                    }
                }
            }
            // Finished computing the DF of every term.
            // The HashMap docFreq holds key = term, value = document frequency.
            // COMPUTE IDF and TF*IDF
            // arrTweets is iterated once more
            // to compute the IDF and, in the same pass, TF*IDF.
            // For each document the TF*IDF of every term is computed and stored in the HashMap valTFIDF,
            // and each valTFIDF is then collected in arrTFIDF.
            Iterator iterTF = arrTweets.iterator();
            Double idf,tfidf;
            totalDoc = arrTweets.size();
            while(iterTF.hasNext()){
                HashMap<String, Double> valTFIDF = new HashMap<String, Double>();
                HashMap perTweet = (HashMap) iterTF.next();
                Iterator iterEach = perTweet.keySet().iterator();
                while(iterEach.hasNext()){
                    String aTerm = (String) iterEach.next(); // Term to process.
                    Integer dfreq = docFreq.get(aTerm); // DF of that term.
                    if(dfreq>1){
                        Integer cfreq = (Integer) perTweet.get(aTerm); // tf of aTerm in this tweet.
                        // Cast to double: totalDoc/dfreq on two integers truncates before the logarithm
                        // (the sample outputs in this gist reflect that truncation, e.g. ln(39/12) came out as ln(3)).
                        idf = Math.log((double) totalDoc / dfreq);
                        tfidf = cfreq * idf;
                        valTFIDF.put(aTerm, tfidf);
                        //System.out.println("TFIDF("+aTerm+")= "+cfreq+" * "+"log("+totalDoc+"/"+dfreq+") = "+ tfidf+" , ");
                    }
                }
                arrTFIDF.add(valTFIDF); // Done with one perTweet; store its valTFIDF in arrTFIDF.
            }
            // Finished computing IDF and TF*IDF.
            // arrTFIDF holds, per document, the tfidf value of each term (valTFIDF).
            // Write the TF*IDF results to the output file namafile_tfidf.txt
            BufferedWriter writeTFIDF = new BufferedWriter(new FileWriter( (namaFile+"_tfidf.txt") ,true));
            Iterator iterValTFIDF = arrTFIDF.iterator();
            while(iterValTFIDF.hasNext()){
                HashMap perTweet = (HashMap) iterValTFIDF.next();
                //System.out.println(perTweet.toString());
                Iterator iterEach = perTweet.keySet().iterator();
                while(iterEach.hasNext()){
                    String aTerm = (String) iterEach.next();
                    Double valTFIDF = (Double) perTweet.get(aTerm);
                    writeTFIDF.write(aTerm+"="+valTFIDF+"; ");
                    //System.out.print(aTerm+"="+valTFIDF+"; ");
                }
                //System.out.println("__");
                //writeTFIDF.newLine();
            }
            writeTFIDF.close();
            // Compute the average TF*IDF weight per term when denganStat == true.
            if(denganStat){
                // COMPUTE the average TF*IDF of every term.
                for(String word : docFreq.keySet()){
                    Integer dfreq = docFreq.get(word);
                    if(dfreq>1){ // Only terms that appear in more than one document.
                        //System.out.println("Collecting term: "+word+" df= "+dfreq);
                        Double tfIDFstat = 0.0; // Accumulator for the term's TF*IDF values.
                        int cc=0; // Number of documents that contributed a value.
                        Iterator iterTFIDF = arrTFIDF.iterator();
                        while(iterTFIDF.hasNext()) {
                            HashMap val = (HashMap) iterTFIDF.next();
                            if(val.containsKey(word)){
                                cc++;
                                tfIDFstat = tfIDFstat + (Double) val.get(word); // Accumulate the term's tfidf across all documents.
                            }
                        }
                        //System.out.println("Counted="+cc+" tfIDFstats="+tfIDFstat);
                        Double tfIDFtot = tfIDFstat/cc; // COMPUTE THE AVERAGE
                        //System.out.println("tfidf("+word+")="+tfIDFtot);
                        tfIDF.put(word, tfIDFtot); // Store in the TreeMap tfIDF.
                    }
                }
                // Write the averages to the output file namafile_tfidf_stat.txt
                BufferedWriter writeStat = new BufferedWriter(new FileWriter( (namaFile+"_tfidf_stat.txt") ,true));
                for (Iterator<Entry<String, Double>> it = entriesSortedByValues(tfIDF).iterator(); it.hasNext();) {
                    Entry<String, Double> entry = it.next();
                    String oneWord = entry.getKey();
                    Double oneValue = entry.getValue();
                    Integer dfreq= docFreq.get(oneWord);
                    //System.out.println("tdidf("+oneWord+")= "+oneValue);
                    writeStat.write(oneWord+"="+oneValue+", df="+dfreq);
                    writeStat.newLine();
                }
                writeStat.close();
            }
        }catch(Exception e){
            System.out.println(e.toString());
        }
        System.out.println("unik: "+docFreq.size());
        System.out.println("Jumlah document:"+ arrTweets.size());
        System.out.println("Total term: "+totalTerms);
    }
    public static void main(String[] a) {
        procTFIDF pt = new procTFIDF();
        pt.process("negatif_2012.txt", "catatan_stopwords_ekstensif.txt", true);
    }
}
Excuse me, I'd like to ask: what is the code below used for, and is it required?
//Snippet from the program edu.upi.cs.tweetmining.TFIDF that loads the stop-word data into the array alExtStopWords
private void loadExtStopWords(String inputExtStopWords) {
    try {
        FileInputStream fstream = new FileInputStream(inputExtStopWords);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        int cc=0;
        while ((strLine = br.readLine()) != null) {
            alExtStopWords.add(strLine);
        }
        br.close();
        in.close();
    }catch (Exception e) {
        System.out.println(e.toString());
    }
}
Thanks in advance, an explanation would be much appreciated :)
That part of the code only fills the stop-word list (ArrayList<String> alExtStopWords) from a file.
If the stop words are simply hard-coded, that part of the code is probably not needed.
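The hard-coded alternative mentioned above could look roughly like the sketch below. The class name `HardcodedStopWords`, the helper `isStopWord`, and the five sample words are assumptions made for illustration; they are not taken from the gist or from the stop-word file it reads.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HardcodedStopWords {
    // Hypothetical sample entries; the gist reads its real list from a file.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("yang", "dengan", "di", "ke", "dan"));

    // Returns true when the term is in the hard-coded list.
    static boolean isStopWord(String term) {
        return STOP_WORDS.contains(term);
    }

    public static void main(String[] args) {
        System.out.println(isStopWord("yang"));   // a word from the sample list
        System.out.println(isStopWord("sinyal")); // a word outside the sample list
    }
}
```

A `HashSet` is used here instead of the gist's `ArrayList` only because membership checks on a set do not scan the whole list; either container would work for a short list.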
The stop-word list is used later when computing DF; stop words are excluded from the Document Frequency count:
https://gist.github.com/frendhisaido/3170455#file-proctfidf-java-L109
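As a minimal sketch of that step, assuming each document has already been reduced to its set of distinct terms: the class `DocFreqSketch`, the method `documentFrequency`, and the sample words are hypothetical, and only the "stop words get DF = 0" convention comes from the gist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DocFreqSketch {
    // Counts, for each term, how many documents contain it.
    // Stop words are recorded with DF = 0, mirroring the convention in the gist.
    static Map<String, Integer> documentFrequency(List<Set<String>> docs, Set<String> stopWords) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                if (stopWords.contains(term)) {
                    df.put(term, 0);
                } else {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("sinyal", "yang", "kuat")));
        docs.add(new HashSet<>(Arrays.asList("sinyal", "yang", "lambat")));
        Set<String> stopWords = new HashSet<>(Arrays.asList("yang"));
        System.out.println(documentFrequency(docs, stopWords));
    }
}
```

Because the later TF*IDF step only keeps terms whose DF is greater than 1, a DF of 0 is enough to drop a stop word from the output.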
Sorry, this was 8 years ago, so I don't remember the details exactly,
but as far as I recall, stop words did not need to be counted for TF-IDF because they (probably) carry no sentiment value.
So, to keep them from having much influence on the classification, stop words are skipped.
Reference from my lecturer's blog: https://yudiwbs.wordpress.com/2008/07/23/stop-words-untuk-bahasa-indonesia/
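For readers following the thread, the weight the program assigns to a term in a tweet is tf × ln(N / df), where N is the number of tweets and df is the number of tweets containing the term. The sketch below is only an illustration of that formula: the class `TfIdfSketch`, the method `tfIdf`, and the numbers in `main` are assumptions, not code from the gist.

```java
public class TfIdfSketch {
    // tf * ln(N / df), using floating-point division.
    static double tfIdf(int tf, int totalDocs, int df) {
        return tf * Math.log((double) totalDocs / df);
    }

    public static void main(String[] args) {
        int totalDocs = 39, df = 8, tf = 1; // illustrative numbers
        // Floating-point division: ln(39 / 8) = ln(4.875).
        System.out.println(tfIdf(tf, totalDocs, df));
        // Integer division truncates 39 / 8 to 4 before the logarithm is taken.
        System.out.println(tf * Math.log(totalDocs / df));
    }
}
```

The two printed values differ because `totalDocs / df` on two `int` operands discards the fractional part, which is worth keeping in mind when comparing results against a hand calculation.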