@bobquest33
bobquest33 / extract_email_currentjobs.py
Created June 4, 2017 06:55
Extracting email IDs from an HTML page using Beautiful Soup, html2text, and regular expressions: https://bigdatacv.com/currentjobs/
import requests
from bs4 import BeautifulSoup
r = requests.get("https://bigdatacv.com/currentjobs/")
content = r.text
soup = BeautifulSoup(content, 'html.parser')
print(soup.prettify())
# Drop non-text elements before extracting the page text
for tag in soup(['script', 'style', 'img']):
    tag.extract()
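The preview stops after stripping the `script`, `style`, and `img` tags. Below is a minimal sketch of the regular-expression step named in the description, assuming the page has already been reduced to plain text. The function name `extract_emails` and the deliberately simple pattern are illustrative assumptions, not the gist's own code.

```python
import re

# Deliberately simple pattern (an assumption, not taken from the gist):
# it matches common address shapes and is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return the unique email addresses in text, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

Calling `extract_emails(soup.get_text())` on the cleaned soup would be one way to connect this to the code above.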
@bobquest33
bobquest33 / script_34_file_handlers.py
Last active May 18, 2017 20:59
Contains the code for recovering the Sticky Notes Data from Windows 7
from rtf.Rtf2Markdown import getMarkdown
import olefile
import sys
import chardet
import json
# Gets the notes from the file path where the Sticky Notes backup is present
def get_notes(sticky_notes_file_path):
    notes = []
@bobquest33
bobquest33 / chat.py
Created May 18, 2017 14:34 — forked from gregvish/chat.py
Python 3.4 asyncio chat server example
from socket import socket, SO_REUSEADDR, SOL_SOCKET
from asyncio import Task, coroutine, get_event_loop
class Peer(object):
    def __init__(self, server, sock, name):
        self.loop = server.loop
        self.name = name
        self._sock = sock
        self._server = server
        Task(self._peer_handler())
@bobquest33
bobquest33 / masked_data.txt
Created May 12, 2017 13:06
output of masked data
"MT0000048877";"Street Address for MT0000048877";"Secondary Address for MT0000048877";"Postal Code for MT0000048877";"City for MT0000048877";"San Marino";"SM";"Zip Code for MT0000048877";"RVVGAT2B4XXXX";"Tel No for MT0000048877";"Email ID for MT0000048877";"Contact Person for MT0000048877";"Company Name MT0000048877";"Fax Num MT0000048877"
"UG0000055142";"Street Address for UG0000055142";"Secondary Address for UG0000055142";"Postal Code for UG0000055142";"City for UG0000055142";"Colombia";"CO";"Zip Code for UG0000055142";"BCOEESMM0XXXX";"Tel No for UG0000055142";"Email ID for UG0000055142";"Contact Person for UG0000055142";"Company Name UG0000055142";"Fax Num UG0000055142"
"AE0000060766";"Street Address for AE0000060766";"Secondary Address for AE0000060766";"Postal Code for AE0000060766";"City for AE0000060766";"Seychelles";"SC";"Zip Code for AE0000060766";"IMEXUA2XKXXXX";"Tel No for AE0000060766";"Email ID for AE0000060766";"Contact Person for AE0000060766";"Company Name AE0000060766";"Fax Num AE0000060766"
@bobquest33
bobquest33 / script_32_mask_data.py
Created May 12, 2017 12:59
For the purposes of masking the data, I created the script below. I worked on only 100 records because my system allocated just 1 GB of driver memory, which left too little heap space to process the data across multiple data frames. Hence one major issue I faced is that you not only need a lot of mem…
import os
import sys
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql import DataFrameReader
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
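The preview ends at the imports, so the masking logic itself is not shown. As a rough sketch of the idea in plain Python (not the PySpark UDFs the script presumably uses), the function below rebuilds one row in the shape seen in masked_data.txt. The names `mask_record` and `to_line` are hypothetical, and the choice of which fields stay readable is inferred from the sample output only.

```python
import csv
import io

def mask_record(record_id, country, country_code, bic):
    """Return one masked row; free-text fields become placeholders keyed by record_id."""
    # Assumption: country, country code, and BIC stay readable, as in the sample
    # output; everything else is replaced. Labels are copied from masked_data.txt.
    return [
        record_id,
        f"Street Address for {record_id}",
        f"Secondary Address for {record_id}",
        f"Postal Code for {record_id}",
        f"City for {record_id}",
        country,
        country_code,
        f"Zip Code for {record_id}",
        bic,
        f"Tel No for {record_id}",
        f"Email ID for {record_id}",
        f"Contact Person for {record_id}",
        f"Company Name {record_id}",
        f"Fax Num {record_id}",
    ]

def to_line(row):
    """Serialize a row as a semicolon-delimited, fully quoted line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="").writerow(row)
    return buf.getvalue()
```

In a Spark job, the same per-field placeholder logic could be wrapped in `udf(..., StringType())` and applied column by column.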
@bobquest33
bobquest33 / script_32_load_data.py
Last active March 9, 2020 05:07
The script below loads data into a database using PySpark. I used the following command to load the data, and it created a new table with appropriate data types in Postgres. This is a very good feature of PySpark data frames that I liked.
import os
import sys
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql import DataFrameReader
# Pass the configuration to the context so the conf object is actually used
conf = SparkConf().setMaster("local").setAppName('Simple App')
sc = SparkContext(conf=conf)
@bobquest33
bobquest33 / script_31_faker_data_gen.py
Created May 12, 2017 11:59
The code below uses the Faker library, which has many functions for generating random values for addresses, people, telephone numbers, and various other types of data. I used Faker and a randomized form of the BIC data to add to the test data.
from faker import Faker
from random import randint
import pycountry
import pandas as pd
fake = Faker()
df = pd.read_csv("total_bic.csv")
swift_bics=list(df["swift"])
val = {}
@bobquest33
bobquest33 / script_30_clean_bic_script.py
Created May 12, 2017 11:03
The following script reads BIC data from a pickle file, filters it, and saves the refined data to another pickle file.
import requests
import sys
from bs4 import BeautifulSoup
from tqdm import tqdm
import re
import time
import traceback
import os
import pickle
from fake_useragent import UserAgent
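The preview shows only imports, so the filtering rule is unknown. A minimal sketch of the read, filter, and save cycle the description mentions, assuming the pickle holds a list of records; the function name `filter_pickle` and the `keep` predicate are hypothetical.

```python
import pickle

def filter_pickle(src_path, dst_path, keep):
    """Load a pickled list, keep items where keep(item) is true, and pickle the result."""
    with open(src_path, "rb") as src:
        records = pickle.load(src)
    # The real filtering rule is not visible in the preview; keep() stands in for it.
    refined = [record for record in records if keep(record)]
    with open(dst_path, "wb") as dst:
        pickle.dump(refined, dst)
    return refined
```

For example, `filter_pickle("raw.pkl", "clean.pkl", lambda r: bool(r.get("swift")))` would drop records with an empty `swift` field, assuming each record is a dict with that key.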
@bobquest33
bobquest33 / script_29_extract_mic.py
Last active May 12, 2017 10:58
Extracting BIC addresses from http://www.bankswiftcode.org using a web scraper
import requests
import sys
from bs4 import BeautifulSoup
from tqdm import tqdm
import re
import time
import traceback
import os
import pickle
from fake_useragent import UserAgent
@bobquest33
bobquest33 / loans_rest_server.py
Created May 8, 2017 02:35
100 Scripts in 30 Days challenge: Script 28, Python Eve for RESTful endpoints for databases
''' Trivial Eve-SQLAlchemy example. '''
from eve import Eve
from sqlalchemy import Column, Integer, String, DateTime, Date,Boolean,ForeignKey,Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property,relationship,mapper
from eve_sqlalchemy import SQL
from eve_sqlalchemy.decorators import registerSchema
from eve_sqlalchemy.validation import ValidatorSQL
import datetime
from faker import Factory