import great_expectations as ge
import sys
import json
import os

def validate_data(file_path: str, expectation_suite_path: str):
    # Read the dataset into a Great Expectations DataFrame (legacy Pandas API)
    ge_df = ge.read_csv(file_path)
    # Detail level of the validation report; SUMMARY is an assumed choice
    result_format: dict = {"result_format": "SUMMARY"}
    # Load the expectation suite from disk and validate the data against it
    with open(expectation_suite_path) as suite_file:
        suite = json.load(suite_file)
    results = ge_df.validate(expectation_suite=suite, result_format=result_format)
    if not results["success"]:
        sys.exit(1)
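If the script is meant to be invoked directly (the argument order below is an assumption), a minimal entry point could be:

if __name__ == "__main__":
    # Hypothetical CLI: python validate_data.py <csv_path> <expectation_suite_path>
    validate_data(sys.argv[1], sys.argv[2])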
{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "expectations": [
    {
      "expectation_type": "expect_table_row_count_to_be_between",
      "kwargs": {
        "max_value": 50,
        "min_value": 1
      },
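A suite like this does not have to be written by hand. With the legacy Great Expectations Pandas API it can be produced by running expectations interactively and then saving them; the sketch below assumes that workflow, and the file paths and the discard_failed_expectations flag are illustrative choices, not taken from the original:

import great_expectations as ge

# Placeholder input path; point this at the CSV you want to profile
ge_df = ge.read_csv("data/sample.csv")

# Run an expectation interactively; it is recorded on the dataset object
ge_df.expect_table_row_count_to_be_between(min_value=1, max_value=50)

# Persist everything recorded so far as a JSON expectation suite
ge_df.save_expectation_suite(
    "expectations/default.json",
    discard_failed_expectations=False,
)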
import tqdm
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk
import pandas as pd

FILE_LOC = 'staging/TweetsElonMusk.csv'
INDEX_NAME = 'elonmusktweets'

df = pd.read_csv(FILE_LOC)
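The imports suggest a bulk ingest loop with a tqdm progress bar. A minimal sketch of that pattern, continuing from the variables above, could look like the following; the localhost URL and the per-row NaN handling are assumptions:

client = Elasticsearch(hosts='http://localhost:9200')

def generate_docs():
    # One bulk action per CSV row; drop NaNs so the JSON stays clean
    for _, row in df.iterrows():
        yield {"_index": INDEX_NAME, "_source": row.dropna().to_dict()}

progress = tqdm.tqdm(unit="docs", total=len(df))
successes = 0
for ok, _ in streaming_bulk(client=client, actions=generate_docs()):
    progress.update(1)
    successes += ok
print(f"Indexed {successes}/{len(df)} documents")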
from elasticsearch import Elasticsearch

# The 8.x Python client needs a full URL rather than a bare hostname
client = Elasticsearch(hosts='http://localhost:9200')
client.indices.create(index='elonmusktweets')
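Creating the index without a body leaves every field to dynamic mapping. If explicit field types are wanted instead, the 8.x client accepts a mappings argument; the field names below are illustrative, not taken from the dataset:

client.indices.create(
    index='elonmusktweets',
    mappings={
        "properties": {
            "tweet": {"type": "text"},      # full-text searchable
            "date": {"type": "date"},       # illustrative field names
            "nlikes": {"type": "integer"},
        }
    },
)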
version: '3.7'
services:
  # Elasticsearch Docker Images: https://www.docker.elastic.co/
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.4.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      # Disabling security/TLS is assumed here, and only sensible for local experiments
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split
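These imports set up a typical MLflow tracking workflow: train a scikit-learn model, log its parameters and metrics, and store the fitted model as an artifact. A minimal sketch under those assumptions follows; the dataset choice, run name, and hyperparameters are illustrative:

# Illustrative data: the built-in breast cancer dataset
X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log hyperparameters, evaluation metrics, and the fitted model
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.log_metric("roc_auc", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
    mlflow.sklearn.log_model(model, artifact_path="model")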
name := "WordCounter"
version := "0.1"
scalaVersion := "2.12.6"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.6"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"
// spark-mllib provides Tokenizer and StopWordsRemover used by the word counter
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.6"
package org.spark.learning

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lower, regexp_replace, col, explode, count, desc}
import org.apache.spark.ml.feature.{Tokenizer, StopWordsRemover}

object WikiContentWordCounter {
  def main(args: Array[String]): Unit = {
    // appName and master are assumed values for a local run
    val spark = SparkSession
      .builder()
      .appName("WikiContentWordCounter")
      .master("local[*]")
      .getOrCreate()
# Base code taken from the Spark samples shipped with the Spark installation
import sys
import os
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import lower, regexp_replace
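A minimal word-count pipeline built on these imports might look like the sketch below; the input path, the session name, and the cleaning regex are assumptions:

from pyspark.sql.functions import col, explode, split, desc

spark = SparkSession.builder.appName("WordCounter").getOrCreate()

# Each line of the (assumed) input file becomes one row with a single 'value' column
lines = spark.read.text("staging/wiki_content.txt")

# Lower-case, strip non-letter characters, split into words, one word per row
words = (
    lines.select(lower(col("value")).alias("line"))
         .select(regexp_replace(col("line"), "[^a-z\\s]", "").alias("line"))
         .select(explode(split(col("line"), "\\s+")).alias("word"))
         .filter(col("word") != "")
)

# Count and show the most frequent words
words.groupBy("word").count().orderBy(desc("count")).show(20)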