vinovator’s gists

vinovator / forbes2kMiner.py

Last active March 2, 2018 22:46

Scrape a JS-rendered website using Selenium, PhantomJS and BeautifulSoup, and wrangle the data using pandas. Extract the Forbes Global 2000 list, process it and export it to a CSV file.

	# forbes2kMiner.py
	# Python 3.4


	"""
	Extracts the Forbes Global 2000 list of companies and exports it to a CSV file
	Since Forbes is a JS-rendered site, Selenium is used to mimic user actions
	BeautifulSoup is used to scrape the HTML content
	Since Selenium is used, Firefox is needed as the webdriver
	"""

vinovator / jsonToCsv2.py

Last active November 3, 2015 13:59

Scans a JSON file and extracts the key-value pairs to a CSV file

	# jsonToCSV.py
	# Python 2.7.6

	'''
	Place all the JSON payloads as separate text files in the base folder
	The program will extract each payload and generate a single CSV file
	The CSV file will have the keys and values in separate columns
	'''

	import json
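The preview stops at the import, so the extraction step is not shown. A minimal sketch of how key-value pairs could be pulled out of a payload and written as CSV rows follows. It targets Python 3 rather than the gist's 2.7.6, and the names `flatten` and `json_to_csv_text`, as well as the dotted-key convention for nested objects, are assumptions made for illustration and not taken from the gist.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Yield (key, value) pairs for every leaf, joining nested keys with dots."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


def json_to_csv_text(payload_text):
    """Return CSV text with a header row and one key,value row per leaf."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])
    for key, value in flatten(json.loads(payload_text)):
        writer.writerow([key, value])
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_to_csv_text('{"a": 1, "b": {"c": "x"}}'))
```

Writing to an in-memory buffer keeps the sketch independent of any folder layout; a real run would loop over the payload files and write to a file opened with `newline=""`.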

vinovator / timeZoneExplorer.py

Last active October 9, 2015 20:39

Simple query to fetch all common time zones and their current time

	# Python 2.7.6
	# timeZoneExplorer.py

	from pytz import timezone, common_timezones # import all_timezones for more exhaustive list
	from datetime import datetime
	import os

	# Log file will be created in the same folder as the python script
	my_path = "."
	log_path = os.path.join(my_path, "loc_log.txt")
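The query itself falls outside the preview. As a rough sketch of the idea, the code below formats one instant in several time zones. It uses the standard-library `zoneinfo` module (Python 3.9+) as a stand-in for `pytz`, so it is not the gist's implementation; the function name `current_times` and the output format are assumptions. Zones that the local time zone database does not provide are skipped.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def current_times(zone_names, now_utc=None):
    """Map each known zone name to the given instant formatted in that zone."""
    now_utc = now_utc or datetime.now(timezone.utc)
    result = {}
    for name in zone_names:
        try:
            zone = ZoneInfo(name)
        except ZoneInfoNotFoundError:
            # Unknown zone, or no time zone database installed: skip it
            continue
        result[name] = now_utc.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S %Z")
    return result


if __name__ == "__main__":
    for name, stamp in current_times(["UTC", "Asia/Kolkata", "Europe/London"]).items():
        print(name, stamp)
```

Taking the instant once and converting it per zone keeps every row consistent, which matters when the list is long.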

vinovator / portScanner.py

Created October 8, 2015 15:39

Simple Python socket program to scan TCP ports

	# Python 2.7.6
	# portScanner.py

	import socket
	from datetime import datetime
	import sys

	# Here we are scanning the local machine
	# Replace this with gethostbyname("host") to scan a remote host
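The scanning loop is cut off in the preview. One common way to check TCP ports with the `socket` module is `connect_ex`, which returns 0 when a connection succeeds. The sketch below assumes that approach and Python 3; the function name `scan_ports`, the timeout value, and the port range are illustrative choices, not details from the gist. Only scan hosts you are authorised to test.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


if __name__ == "__main__":
    # Scan a small range on the local machine
    print(scan_ports("127.0.0.1", range(1, 1025), timeout=0.2))
```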

vinovator / pdfTextMiner.py

Last active April 20, 2023 03:47

Sample code that uses the pdfminer module to extract text from PDF files

	# pdfTextMiner.py
	# Python 2.7.6
	# For Python 3.x use pdfminer3k module
	# This link has useful information on components of the program
	# https://euske.github.io/pdfminer/programming.html
	# http://denis.papathanasiou.org/posts/2010.08.04.post.html


	''' Important classes to remember
	PDFParser - fetches data from pdf file

vinovator / fileExplorer.py

Last active October 2, 2015 19:44

Loop through a folder path and extract all files and sub-folders. Get count of files by extension.

	# fileExplorer.py
	# python 2.7.6

	import os
	# defaultdict creates a key on first access if it doesn't exist, and updates it if it does
	from collections import defaultdict

	folder_count = 0
	file_count = 0
	loop_count = 0
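The traversal is not visible in the preview. A minimal sketch of counting files per extension with `os.walk` and `defaultdict` might look like the following. It assumes Python 3, and the function name `count_by_extension`, the lower-casing of extensions, and the `"(none)"` label for files without an extension are illustrative choices.

```python
import os
from collections import defaultdict


def count_by_extension(root):
    """Walk root and return (folder_count, file_count, counts per extension)."""
    folder_count = 0
    file_count = 0
    extension_counts = defaultdict(int)  # missing keys start at 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        for name in filenames:
            file_count += 1
            extension = os.path.splitext(name)[1].lower() or "(none)"
            extension_counts[extension] += 1
    return folder_count, file_count, dict(extension_counts)


if __name__ == "__main__":
    folders, files, counts = count_by_extension(".")
    print(folders, "folders,", files, "files")
    for extension, count in sorted(counts.items()):
        print(extension, count)
```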

vinovator / getHttpHeader.py

Last active October 2, 2015 14:24

Get the request headers and response headers from an HTTP request-response sequence. Assumes that the URL accepts digest authentication.

	# getHttpHeader.py
	# Python 2.7.6

	import requests
	from requests.auth import HTTPDigestAuth
	import getpass # To mask the password typed in

	# Replace with the correct URL
	url = "http://some_url"
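The request itself is not shown in the preview. The sketch below illustrates the same idea, reading both the headers that were sent and the headers that came back, but it swaps in the standard-library `urllib.request` with `HTTPDigestAuthHandler` instead of the `requests` calls the gist imports. It assumes Python 3, and the function name `fetch_headers` and its parameters are hypothetical.

```python
import urllib.request


def fetch_headers(url, username=None, password=None, timeout=5):
    """GET url and return (request headers, response headers, status code)."""
    handlers = []
    if username is not None:
        # Answer a digest challenge from the server with these credentials
        manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        manager.add_password(None, url, username, password)
        handlers.append(urllib.request.HTTPDigestAuthHandler(manager))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url)
    with opener.open(request, timeout=timeout) as response:
        sent = dict(request.header_items())
        received = dict(response.headers.items())
        return sent, received, response.status


if __name__ == "__main__":
    import getpass

    # "http://some_url" is the placeholder from the gist; replace it first
    sent, received, status = fetch_headers(
        "http://some_url", input("User: "), getpass.getpass("Password: ")
    )
    print(status, sent, received, sep="\n")
```

Passing the password through `getpass` keeps it off the screen, matching the intent of the import in the preview.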

vinovator / RestfulPostClient.py

Last active October 2, 2015 14:04

A sample RESTful client for the POST operation. Assumes digest authentication.

	# RestfulPostClient.py
	# Python 2.7.6

	import requests
	from requests.auth import HTTPDigestAuth
	# import json # Json module is not required as we are directly passing json to requests


	# Replace with the correct URL
	url = "http://api_url"

vinovator / RestfulGetClient.py

Last active January 24, 2025 18:52

Sample code to invoke the GET method of a RESTful API with digest authentication

	#Python 2.7.6
	#RestfulClient.py

	import requests
	from requests.auth import HTTPDigestAuth
	import json

	# Replace with the correct URL
	url = "http://api_url"

vinovator / pdfInvoiceMiner.py

Created July 26, 2015 19:03

From a set of invoice PDF files within a folder, extract the invoice number and client information and place them in an Excel file

	__author__ = 'Vinoth_Subramanian'
	# Python3
	# pdfInvoiceMiner.py

	# Program to extract the client info and invoice no from a bunch of invoice pdf files
	# pdfminer3k library is used to extract text from pdf
	# PyPDF2 library does not extract the text from pdf properly
	# place all the invoice pdf files within a folder named "INVOICE"
	# place an excel file named "invoice_info.xlsx" in the parent folder of "INVOICE"
	# First column - invoice no; Second column - client details
