Anurag Chatterjee ace-racer

🎯
Focusing
  • London
  • 18:40 (UTC)
View GitHub Profile
@ace-racer
ace-racer / validate.py
Created February 24, 2024 03:13
Validate incoming data using generated expectations
import great_expectations as ge
import sys
import json
import os


def validate_data(file_path: str, expectation_suite_path: str):
    # read the dataset into a Great Expectations DataFrame
    ge_df = ge.read_csv(file_path)
    result_format: dict = {
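The gist is cut off after `result_format` is started. A minimal sketch of how such a validation helper might continue, assuming the legacy `great_expectations` API where `ge.read_csv` returns a dataset with a `.validate()` method; the `"SUMMARY"` result format and the `summarize_result` helper are assumptions for illustration, not the author's code:

```python
import json


def load_expectation_suite(expectation_suite_path: str) -> dict:
    """Read a previously generated expectation suite from disk."""
    with open(expectation_suite_path) as f:
        return json.load(f)


def summarize_result(validation_result: dict) -> dict:
    """Reduce a validation result dict to a small pass/fail summary."""
    results = validation_result.get("results", [])
    failed = [r["expectation_config"]["expectation_type"]
              for r in results if not r.get("success", False)]
    return {
        "success": validation_result.get("success", False),
        "evaluated": len(results),
        "failed": failed,
    }


def validate_file(file_path: str, expectation_suite_path: str) -> dict:
    """Sketch: validate a CSV against a saved suite (needs great_expectations installed)."""
    import great_expectations as ge  # imported here so the sketch stays importable without GE
    ge_df = ge.read_csv(file_path)
    suite = load_expectation_suite(expectation_suite_path)
    result = ge_df.validate(expectation_suite=suite, result_format="SUMMARY")
    # older GE versions return a plain dict, newer ones an object with to_json_dict()
    result_dict = result if isinstance(result, dict) else result.to_json_dict()
    return summarize_result(result_dict)
```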
@ace-racer
ace-racer / expectations.json
Created February 24, 2024 03:07
Generated expectations
{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "expectations": [
    {
      "expectation_type": "expect_table_row_count_to_be_between",
      "kwargs": {
        "max_value": 50,
        "min_value": 1
      },
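To make the generated suite concrete: an expectation entry is just a type name plus kwargs, and evaluating it reduces to a bounds check on an observed value. This is a toy re-implementation for illustration only, not Great Expectations' actual code:

```python
def expect_table_row_count_to_be_between(row_count, min_value=None, max_value=None):
    """Toy version of the row-count expectation: check observed rows against bounds."""
    ok = ((min_value is None or row_count >= min_value) and
          (max_value is None or row_count <= max_value))
    return {"success": ok, "result": {"observed_value": row_count}}
```

The `min_value`/`max_value` kwargs in the generated suite are presumably derived from whatever sample was profiled when the suite was created.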
@ace-racer
ace-racer / create_expectations.ipynb
Created February 24, 2024 03:01
Create expectations using GE
@ace-racer
ace-racer / 02_load_tweets_es.py
Created October 29, 2022 17:52
Load Tweets to Elasticsearch using Pandas and Python Elasticsearch client
import tqdm
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk
import pandas as pd
FILE_LOC = 'staging/TweetsElonMusk.csv'
INDEX_NAME = 'elonmusktweets'
df = pd.read_csv(FILE_LOC)
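The snippet ends right after the CSV is read. A sketch of how the rows might then be streamed into Elasticsearch with `streaming_bulk`, assuming one action per row and the row index as document id; these choices (and the progress-bar wiring) are illustrative, not necessarily the author's:

```python
def generate_actions(records, index_name):
    """Yield one bulk-index action per row dict."""
    for i, row in enumerate(records):
        yield {"_index": index_name, "_id": i, "_source": row}


def load_tweets(df, index_name, hosts="http://localhost:9200"):
    """Sketch: stream the DataFrame into Elasticsearch (requires a running cluster)."""
    import tqdm
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import streaming_bulk

    client = Elasticsearch(hosts=hosts)
    records = df.to_dict(orient="records")
    ok_count = 0
    # streaming_bulk yields one (ok, item) tuple per document as it is indexed
    for ok, _ in tqdm.tqdm(
            streaming_bulk(client, generate_actions(records, index_name)),
            total=len(records)):
        ok_count += ok
    return ok_count
```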
@ace-racer
ace-racer / 01_create_index.py
Last active October 29, 2022 14:29
Create index in elasticsearch running locally using Python
from elasticsearch import Elasticsearch

# the 8.x Python client requires a full URL including scheme and port
client = Elasticsearch(hosts='http://localhost:9200')
client.indices.create(index='elonmusktweets')
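The gist relies on dynamic mapping. A sketch of the same step with an explicit mapping and an exists-check so the script is safe to re-run; the field names here are assumptions, not taken from the actual tweets CSV:

```python
def build_index_body():
    """Hypothetical explicit mapping for the tweets index (field names assumed)."""
    return {
        "mappings": {
            "properties": {
                "tweet": {"type": "text"},
                "date": {"type": "date"},
            }
        }
    }


def create_index(client, index_name):
    """Sketch: create the index only if it does not already exist."""
    if not client.indices.exists(index=index_name):
        client.indices.create(index=index_name,
                              mappings=build_index_body()["mappings"])
```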
@ace-racer
ace-racer / docker-compose.yml
Created October 29, 2022 13:50
Elasticsearch with Kibana docker compose
version: '3.7'
services:
  # Elasticsearch Docker Images: https://www.docker.elastic.co/
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.4.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
@ace-racer
ace-racer / mlflow_operations.py
Last active January 23, 2021 10:44
IRIS classification with MLFlow
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split
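Only the imports survive in this snippet, but they suggest a train-then-log workflow. A sketch under that assumption: a plain scikit-learn training function, plus an MLflow wrapper that logs params, the metric, and the model. The parameter values, metric name, and artifact path are assumptions, not the author's:

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_iris(n_estimators=10, random_state=42):
    """Train a RandomForest on IRIS and return the model and its test accuracy."""
    X, y = datasets.load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   random_state=random_state)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


def train_and_log(n_estimators=10):
    """Sketch: record the run in MLflow (needs mlflow and a local or remote store)."""
    import mlflow
    import mlflow.sklearn
    with mlflow.start_run():
        model, acc = train_iris(n_estimators=n_estimators)
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_metric("accuracy", acc)
        mlflow.sklearn.log_model(model, "model")
    return acc
```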
@ace-racer
ace-racer / build.sbt
Created January 10, 2021 14:06
Build sbt for Word Counter Scala application
name := "WordCounter"
version := "0.1"
scalaVersion := "2.12.6"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.6"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"
@ace-racer
ace-racer / WikiContentWordCounter.scala
Created January 10, 2021 14:03
Word counter Spark job using scala
package org.spark.learning

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lower, regexp_replace, col, explode, count, desc}
import org.apache.spark.ml.feature.{Tokenizer, StopWordsRemover}

object WikiContentWordCounter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
@ace-racer
ace-racer / word_count_extended.py
Last active December 31, 2020 04:06
Spark job to count the occurrences of words after removing stop words
# Base code adapted from the samples shipped with the Spark installation
import sys
import os
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import lower, regexp_replace
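The snippet stops at the imports, but `lower` and `regexp_replace` suggest the text is normalized before stop words are removed. A sketch of that pipeline: a pure-Python `normalize` mirroring the column transforms, and a Spark function assuming line-oriented input in the default `value` column (both assumptions, since the original job body is not shown):

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip non-letter characters, mirroring lower + regexp_replace."""
    return re.sub(r"[^a-z\s]", "", text.lower())


def count_words(spark, file_path, text_column="value"):
    """Sketch: count word occurrences after stop-word removal (requires pyspark)."""
    from pyspark.sql.functions import (col, count, desc, explode, lower,
                                       regexp_replace, split)
    from pyspark.ml.feature import StopWordsRemover

    df = spark.read.text(file_path)
    # lowercase, drop punctuation/digits, then split each line into a word array
    cleaned = df.select(
        split(regexp_replace(lower(col(text_column)), r"[^a-z\s]", ""),
              r"\s+").alias("words"))
    remover = StopWordsRemover(inputCol="words", outputCol="filtered")
    return (remover.transform(cleaned)
            .select(explode(col("filtered")).alias("word"))
            .where(col("word") != "")
            .groupBy("word")
            .agg(count("*").alias("occurrences"))
            .orderBy(desc("occurrences")))
```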