// Excerpt: emit one (key, lowercase token) pair per word in the input line.
KV<Metadata, String> element = c.element();
String line = element.getValue().trim();
if (!line.isEmpty()) {
    // Split on runs of non-letter characters; the -1 limit keeps trailing empty strings.
    String[] tokens = line.split("\\P{L}+", -1);
    for (String token : tokens) {
        // Skip empty and single-character tokens.
        if (token.length() > 1) {
            c.output(KV.of(element.getKey(), token.toLowerCase()));
        }
    }
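The tokenizing loop in the excerpt above can be tried outside a pipeline. The following is a minimal sketch, assuming only the split, length filter, and lowercasing shown in the excerpt; the class name `TokenizeSketch` and the method `tokenize` are hypothetical and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper; not part of the original pipeline code.
final class TokenizeSketch {

    // Splits a line on runs of non-letter characters, lowercases each token,
    // and drops tokens of length 0 or 1, mirroring the loop in the excerpt.
    static List<String> tokenize(String rawLine) {
        List<String> out = new ArrayList<>();
        String line = rawLine.trim();
        if (line.isEmpty()) {
            return out;
        }
        for (String token : line.split("\\P{L}+", -1)) {
            if (token.length() > 1) {
                out.add(token.toLowerCase());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "A", "I" and "d" are single letters, so they are filtered out.
        System.out.println(tokenize("A dream, I'd say!")); // prints [dream, say]
    }
}
```

Because the split pattern is `\P{L}+`, apostrophes and digits act as separators, so contractions such as "I'd" break into single-letter pieces that the length filter then discards.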
// Excerpt: CombineFn that folds Metadata inputs into an Index accumulator.
public class ReduceFn extends Combine.CombineFn<Metadata, Index, Index> {
    @Override
    public Index createAccumulator() {
        return new Index();
    }

    @Override
    public Index addInput(Index accumulator, Metadata input) {
        accumulator.add(input);
// Excerpt: unit test for StopWordRemoveFn built on DoFnTester.
public class StopWordRemoveFnTest {
    private static class Empty {}
    private static final Empty EMPTY = new Empty();

    @Test
    public void testDoFn() throws Exception {
        StopWordRemoveFn<Empty> doFn = new StopWordRemoveFn<>();
        DoFnTester<KV<Empty, String>, KV<Empty, String>> fnTester = DoFnTester.of(doFn);
// Excerpt: StopWordRemoveFnTest variant that builds its input with TestPipeline.
public class StopWordRemoveFnTest {
    @Rule
    public final transient TestPipeline pipeline = TestPipeline.create();

    @Test
    public void testDoFn_TestPipeline() throws Exception {
        PCollection<KV<Empty, String>> input = pipeline.apply(Create.of(
            KV.of(Empty.EMPTY, "be"), KV.of(Empty.EMPTY, "is"),
            KV.of(Empty.EMPTY, "night"), KV.of(Empty.EMPTY, "dream")
        ).withCoder(KvCoder.of(AvroCoder.of(Empty.class), StringUtf8Coder.of())));
-- Excerpt: BigQuery ML logistic regression trained on the raw feature columns.
CREATE MODEL `kaggle_talkingdata_adtracking.talkingdata_logreg_sample`
OPTIONS (
  model_type='logistic_reg',
  input_label_cols=['is_attributed']
) AS
SELECT ip,
       app,
       device,
       os,
       channel,
-- Excerpt: logistic regression with features cast to STRING and a sequential split on click_time.
CREATE MODEL `kaggle_talkingdata_adtracking.talkingdata_logreg_sample_0003`
OPTIONS (
  model_type='logistic_reg',
  input_label_cols=['is_attributed'],
  data_split_method='seq',
  data_split_col='click_time'
) AS
SELECT CAST(ip AS STRING) AS ip,
       CAST(app AS STRING) AS app,
       CAST(device AS STRING) AS device,
-- Excerpt: model talkingdata_logreg_0001, with STRING features and a sequential split on click_time.
CREATE MODEL `kaggle_talkingdata_adtracking.talkingdata_logreg_0001`
OPTIONS (
  model_type='logistic_reg',
  input_label_cols=['is_attributed'],
  data_split_method='seq',
  data_split_col='click_time'
) AS
SELECT CAST(ip AS STRING) AS ip,
       CAST(app AS STRING) AS app,
       CAST(device AS STRING) AS device,
-- Excerpt: store the ML.PREDICT output of model talkingdata_logreg_0001 in a new table.
CREATE TABLE `kaggle-adfraud.kaggle_talkingdata_adtracking.dataset_test_with_prediction_logreg_0001`
AS
SELECT * FROM ML.PREDICT(MODEL `kaggle-adfraud.kaggle_talkingdata_adtracking.talkingdata_logreg_0001`,
  (SELECT
     click_id,
     CAST(ip AS STRING) AS ip,
     CAST(app AS STRING) AS app,
     CAST(device AS STRING) AS device,
     CAST(os AS STRING) AS os,
     CAST(channel AS STRING) AS channel,
-- Build the submission table: keep the predicted probability for label 1 as is_attributed.
CREATE TABLE `kaggle_talkingdata_adtracking.dataset_test_submission_logreg_0001`
AS
SELECT click_id, prob AS is_attributed
FROM `kaggle_talkingdata_adtracking.dataset_test_with_prediction_logreg_0001`
JOIN UNNEST(predicted_is_attributed_probs)
WHERE label = 1
ORDER BY click_id;