Atul Kumar atulkumar2

  • Freelance
  • Bangalore
-------------------------------------------------------- Edit to Enlarge ----------------------------------------------
Apache Spark - Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; a lot of other components are involved. Collection, storage, exploration, ML and visualization are critical to the project's success.
Solr - Used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.
Big data is a combination of several subjects. It mainly requires programming, analysis, NLP, MLP and mathematics.
To see links, go to: http://www.quora.com/What-are-some-good-sources-to-learn-big-data
Here are a bunch of courses I came across:
Introduction to CS Course
Notes: Introduction to Computer Science Course that provides instructions on coding.
Online Resources:
Udacity - intro to CS course,
Coursera - Computer Science 101
atulkumar2 / replace_names.py
Created July 28, 2018 15:52
Rename all folders with group matching and wildcard matching
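# Search for files and folders whose names match the pattern 'M*S*'
# and rename them to names like 'M0*S*'
import glob
import os
import re

# glob patterns have no '^' anchor; 'M*S*' already matches from the start of the name
for item in glob.glob('M*S*'):
    print(item)
    # Insert a '0' before the first digit in the name
    os.rename(item, re.sub(r'(\d)', r'0\1', item, count=1))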
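The substitution above pads only the first digit found in each name. Here is a minimal sketch of that behavior on in-memory strings, so it can be checked without touching the file system. The helper name `pad_first_digit` and the sample names are hypothetical, not part of the original gist.

```python
import re


def pad_first_digit(name: str) -> str:
    """Insert a '0' before the first digit in name; later digits are untouched."""
    return re.sub(r'(\d)', r'0\1', name, count=1)


# Hypothetical sample names, chosen only to illustrate the substitution.
samples = ['M1S2', 'M3S10', 'Movie']
for sample in samples:
    print(sample, '->', pad_first_digit(sample))
```

Because the pattern matches whichever digit comes first, running it on an already padded name such as 'M01S2' adds another zero, so the rename is best run once per folder.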
atulkumar2 / pgdmlai_external_links.md
Last active July 28, 2018 16:29
External Links for PGDMLAI course from upgrad.com
atulkumar2 / aws-delete-config-rules.sh
Last active April 5, 2019 15:07
Delete config rules from AWS regions
# This will delete all AWS config rules from regions listed in the string regions below
# Run it on one region first to confirm
# To run, python 3.* with awscli package is needed
# For anaconda, use https://anaconda.org/conda-forge/awscli to install package
# List of AWS regions https://docs.aws.amazon.com/general/latest/gr/rande.html
# https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_manage-rules.html
# There is no dry run option for deleting config rules, so beware
#regions=(us-east-2 us-east-1 us-west-1 us-west-2 ap-south-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 ap-northeast-1 ca-central-1 cn-north-1 cn-northwest-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 eu-north-1 sa-east-1 us-gov-east-1 us-gov-west-1)
regions=(ap-southeast-2 ap-northeast-1 ap-northeast-2 ca-central-1 cn-north-1 cn-northwest-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 eu-north-1 sa-east-1)
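The rest of the shell script is not shown here. As a rough illustration of the per-region deletion it describes, the sketch below uses Python with boto3 in place of the AWS CLI, which is an assumption and not the author's method. The function name `delete_all_config_rules` and its `dry_run` flag are hypothetical; the flag is implemented locally in the helper and is not an AWS feature.

```python
def delete_all_config_rules(client, dry_run=True):
    """Collect every Config rule name visible to client, then optionally delete them.

    client is assumed to behave like a boto3 "config" client.
    Returns the list of rule names that were found.
    """
    names = []
    kwargs = {}
    # Page through describe_config_rules until no NextToken is returned.
    while True:
        page = client.describe_config_rules(**kwargs)
        names.extend(rule['ConfigRuleName'] for rule in page.get('ConfigRules', []))
        token = page.get('NextToken')
        if not token:
            break
        kwargs = {'NextToken': token}
    # Delete only after listing is complete, and only when explicitly requested.
    if not dry_run:
        for name in names:
            client.delete_config_rule(ConfigRuleName=name)
    return names


# Hypothetical usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   for region in ['ap-southeast-2', 'ap-northeast-1']:
#       client = boto3.client('config', region_name=region)
#       print(region, delete_all_config_rules(client, dry_run=True))
```

Defaulting `dry_run` to True means a first pass only lists the rules, which fits the advice to try one region before running against the rest.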
Install many data science packages at once (they bring a lot of dependencies with them)
conda install pandas seaborn notebook scikit-learn lxml requests
Update all conda environments at once in Linux (except base)
for env in $(conda env list | cut -d" " -f1 | tail -n+4); do conda update -y -n $env --all; done
For base, do as follows:
conda update -y --all -n base
Update all conda environments at once in Windows 10 (using Anaconda PowerShell prompt)
conda env list | %{$_.split(' ')[0];} | Select-String -notmatch '#' | %{conda update -y -n $_ --all}
import logging

IS_LOGGING_INITIALIZED = False


class LogFile:
    filename = 'mylogging.log'
    maxBytes = (1000*5000)
    backupCount = 10
    format = '%(asctime)s %(message)s'
    datefmt = '%m/%d/%Y %I:%M:%S %p'
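The attribute names `maxBytes` and `backupCount` match the parameters of the standard library's `RotatingFileHandler`, so one plausible reading is that these settings feed a rotating log file. Here is a minimal sketch under that assumption. The function name `init_logging` and the logger name `'mylogging'` are hypothetical and not taken from the original gist.

```python
import logging
from logging.handlers import RotatingFileHandler


def init_logging(filename='mylogging.log', max_bytes=1000 * 5000, backup_count=10):
    """Attach a rotating file handler to a named logger and return the logger."""
    logger = logging.getLogger('mylogging')  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            filename, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
        )
        logger.addHandler(handler)
    return logger


# Hypothetical usage:
#   logger = init_logging()
#   logger.info('application started')
```

Checking `logger.handlers` plays the role a flag such as `IS_LOGGING_INITIALIZED` might play: a second call returns the same logger without adding a second handler.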
atulkumar2 / continuous_screenshot.py
Last active April 9, 2021 18:05
Take screenshot of the whole screen at regular intervals
###############################################################################
# Takes screenshot at defined interval and saves at a defined path
# Minimize the program window for proper screenshot
# Install package pyautogui in a separate conda environment
# Prefer to use anaconda.org to create environment
# Created on 2021-04-09
###############################################################################
import os
import time
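The script body is cut off after its imports. The header comments describe a loop that captures the screen at a fixed interval and saves each image under a chosen path; the sketch below shows one way such a loop could look, and is not the author's code. The function name `capture_at_intervals`, the file naming scheme, and the injected `capture` callable are all hypothetical. Passing the capture function in keeps the loop runnable without a display.

```python
import os
import time


def capture_at_intervals(capture, out_dir, interval_seconds, count, sleep=time.sleep):
    """Call capture(path) count times, pausing interval_seconds between calls.

    Returns the list of file paths passed to capture.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index in range(count):
        stamp = time.strftime('%Y%m%d_%H%M%S')
        path = os.path.join(out_dir, f'screenshot_{stamp}_{index:04d}.png')
        capture(path)
        paths.append(path)
        if index < count - 1:  # no pause needed after the final capture
            sleep(interval_seconds)
    return paths


# Hypothetical usage, assuming pyautogui is installed and a display is available:
#   import pyautogui
#   capture_at_intervals(pyautogui.screenshot, 'screenshots', interval_seconds=60, count=10)
```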
atulkumar2 / Python-2spaces-fromat.md
Created April 27, 2022 06:15
Format Python files using 2 spaces in VSCode
atulkumar2 / date_datetime.py
Created April 29, 2022 08:39
Check some date and datetime functioning
from datetime import datetime, timedelta
date1 = datetime.now()
date2 = date1 + timedelta(days=1)
print(date1)
print(date2)
print(date1 > date2)