Pradeep Singh mepsrajput


1. Processing A Line of Text

# Import the English language class
from spacy.lang.en import English

# Create the nlp object
nlp = English()
mepsrajput / big_data.md
Last active May 30, 2020 10:26
Big Data

Hadoop Vocabulary

Here is a list of some terms associated with Hadoop. You'll learn more about these terms and how they relate to Spark in the rest of the lesson.

  • Hadoop - an ecosystem of tools for big data storage and data analysis. Hadoop is an older system than Spark but is still used by many companies. The major difference between Spark and Hadoop is how they use memory. Hadoop writes intermediate results to disk whereas Spark tries to keep data in memory whenever possible. This makes Spark faster for many use cases.
  • Hadoop MapReduce - a system for processing and analyzing large data sets in parallel.
  • Hadoop YARN - a resource manager that schedules jobs across a cluster. The manager keeps track of what computer resources are available and then assigns those resources to specific tasks.
  • Hadoop Distributed File System (HDFS) - a big data storage system that splits data into chunks and stores the chunks across a cluster of computers.
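To make the MapReduce idea from the list above more concrete, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases for a word count. It only imitates the programming model and is not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    pairs = []
    # On a real cluster each chunk would be mapped on a different machine;
    # here the "chunks" are simply processed one after another.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical sample input
    docs = ["spark keeps data in memory", "hadoop writes data to disk"]
    print(word_count(docs))
```

In an actual Hadoop job the map and reduce steps run in parallel across the cluster and the shuffle moves data between machines; the sketch keeps only the data flow.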

As Hadoop matured, other tools were developed t

MongoDB Cheat Sheet

Show All Databases

show dbs

Show Current Database

mepsrajput / gcp.md
Last active September 12, 2020 15:53
GCP

sudo (SuperUser DO): run programs / commands with administrative privileges

apt-get

sudo apt-get update: The first command to run on a Debian-based Linux system (such as Debian or Ubuntu) after a fresh install. It refreshes the local package index so the system knows whether newer package versions are available. It does not install or upgrade anything by itself.

sudo apt-get upgrade: Upgrades all installed packages that have updates available.

mepsrajput / as.md
Last active May 5, 2020 11:39
assignment
  • ACCOUNT_TABLE;

    Data ACCOUNT_TABLE;
        infile DATALINES delimiter=',';
        INPUT FirstName $ LastName $ Age Gender $;
        DATALINES;
    x,y,23,Male
    z,w,45,Female
    a,b,64,Male

mepsrajput / dppm.md
Last active September 13, 2020 08:30
Data Pre-processing Master

Data Preprocessing

In any ML process, data preprocessing is the step in which raw data is transformed, or encoded, into a state that the machine can easily parse.
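As a small illustration of what "encoded" can mean here, the sketch below one-hot encodes a categorical column in plain Python. The helper name `one_hot_encode` and the sample values are hypothetical; in practice a library such as pandas or scikit-learn would usually handle this step.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 vector per value, ordered by sorted category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the position of this value's category
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    # Hypothetical categorical column
    categories, rows = one_hot_encode(["Male", "Female", "Male"])
    print(categories)
    print(rows)
```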

Feature

A feature is an individual measurable property or characteristic of a phenomenon being observed.

Types of Features

mepsrajput / sql.md
Last active November 11, 2020 05:25
SQL Notes

Operators in the where clause

  1. Equal (=)
  2. Greater Than (>)
  3. Less Than (<)
  4. Greater Than or Equal (>=)
  5. Less Than or Equal (<=)
  6. Not Equal (<>)
  7. BETWEEN : Within a given range (both endpoints included)
  8. LIKE : Match a pattern (% matches any sequence of characters, _ matches a single character)
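The operators listed above can be tried out quickly with Python's built-in `sqlite3` module. This is only a sketch: the `people` table, its rows, and the `run_examples` helper are invented for illustration, and other database engines may differ in details such as the case sensitivity of LIKE.

```python
import sqlite3


def run_examples():
    """Run one query per operator against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Asha", 23), ("Ben", 45), ("Carl", 64), ("Anita", 30)],  # hypothetical rows
    )
    queries = {
        "equal": "SELECT name FROM people WHERE age = 45",
        "greater_than": "SELECT name FROM people WHERE age > 45",
        "not_equal": "SELECT name FROM people WHERE age <> 23 ORDER BY name",
        "between": "SELECT name FROM people WHERE age BETWEEN 23 AND 45 ORDER BY name",
        "like": "SELECT name FROM people WHERE name LIKE 'A%' ORDER BY name",
    }
    results = {
        label: [row[0] for row in conn.execute(sql)]
        for label, sql in queries.items()
    }
    conn.close()
    return results


if __name__ == "__main__":
    for label, names in run_examples().items():
        print(label, names)
```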
mepsrajput / Dockerfile
Last active May 28, 2022 15:36
Apache Airflow / Cloud Composer
# This basically installs some dependencies, adds two SQL scripts and runs a provided SH script.
FROM apache/airflow:2.0.0-python3.7
USER root
# INSTALL TOOLS
# -y keeps the install non-interactive so the build does not abort at the confirmation prompt
RUN apt-get update \
    && apt-get install -y libaio-dev postgresql-client
RUN mkdir extra
USER airflow
mepsrajput / dsa.md
Last active November 23, 2021 12:25

DS Flow Chart

BIG-O Cheatsheet

BIG O'S

  • O(1) Constant - no loops
  • O(log n) Logarithmic - usually searching algorithms on sorted data (Binary Search)
  • O(n) Linear - for loops, while loops through n items
  • O(n log n) Log Linear - usually sorting operations
  • O(n^2) Quadratic - every element in a collection needs to be compared to every other element; two nested loops
  • O(2^n) Exponential - recursive algorithms that solve a problem of size n
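To see the gap between O(n) and O(log n) in practice, the following Python sketch counts how many comparisons a linear search and a binary search make on the same sorted list. The function names and the test data are illustrative assumptions, not part of the cheatsheet.

```python
def linear_search(items, target):
    """O(n): check items one by one. Returns (index, comparisons)."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """O(log n): halve the search range each step. Requires sorted input."""
    low, high = 0, len(sorted_items) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_items[mid] == target:
            return mid, comparisons
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, comparisons


if __name__ == "__main__":
    data = list(range(1024))  # already sorted
    print("linear:", linear_search(data, 1023))
    print("binary:", binary_search(data, 1023))
```

Searching for the last element forces the linear search to touch every item, while the binary search needs only about log2(n) steps.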

mepsrajput / adruino_uno.md
Last active December 2, 2021 19:44
Arduino Uno
// constants won't change. They're used here to set pin numbers:
const int buttonPin = 2;     // the number of the pushbutton pin
const int ledPin =  13;      // the number of the LED pin

// variables will change:
int buttonState = 0;         // variable for reading the pushbutton status

void setup() {
 // initialize the LED pin as an output: