# forbes2kMiner.py
# Python 3.4
"""
Extracts the Forbes Global 2000 list of companies and exports it to a CSV file.
Since Forbes is a JS-rendered site, selenium is used to mimic user actions.
BeautifulSoup is used to scrape the HTML content.
Since selenium is used, Firefox is needed as the webdriver.
"""
# jsonToCSV.py
# Python 2.7.6
'''
Place all the JSON payloads as separate text files in the base folder.
The program extracts each payload and generates a single CSV file.
The CSV file will have key-value pairs in separate columns.
'''
import json
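The docstring above describes merging several JSON payload files into one CSV. A minimal sketch of one reading of that idea follows, written for Python 3 rather than the 2.7.6 the script targets. It assumes each payload is a flat JSON object stored in a `.txt` file and that each key becomes a column; the function name `payloads_to_csv` and its parameters are hypothetical and do not come from the original script.

```python
import csv
import json
import os


def payloads_to_csv(base_folder, csv_path):
    """Read every .txt JSON payload in base_folder and write one CSV row per payload."""
    rows = []
    for name in sorted(os.listdir(base_folder)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(base_folder, name), encoding="utf-8") as handle:
            # Assumption: each payload is a flat JSON object (no nested flattening).
            rows.append(json.load(handle))

    # Build the header from the union of keys, preserving first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns that a given payload does not define.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `payloads_to_csv("payloads", "payloads.csv")` would write one row per payload file and return the number of payloads read; nested objects would need flattening first, which this sketch does not attempt.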
# Python 2.7.6
# timeZoneExplorer.py
from pytz import timezone, common_timezones  # import all_timezones for a more exhaustive list
from datetime import datetime
import os
# Log file will be created in the current working directory
# (the script's own folder when it is run from there)
my_path = "."
log_path = os.path.join(my_path, "loc_log.txt")
# Python 2.7.6
# portScanner.py
import socket
from datetime import datetime
import sys
# Here we are scanning the local machine
# Replace this with gethostbyname("host") to scan a remote host
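The comments above outline a TCP port scan of a host resolved with `gethostbyname`. The sketch below shows one common way to do that with the standard `socket` module, in Python 3. The function name `scan_ports`, its parameters, and the default timeout are assumptions for illustration and are not taken from the original script.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Return the ports in `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
        finally:
            sock.close()
    return open_ports
```

For example, `scan_ports(socket.gethostbyname("localhost"), range(1, 1025))` would check the well-known ports on the local machine. Only scan hosts you are authorized to test.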
# pdfTextMiner.py
# Python 2.7.6
# For Python 3.x use the pdfminer3k module
# These links have useful information on the components of the program
# https://euske.github.io/pdfminer/programming.html
# http://denis.papathanasiou.org/posts/2010.08.04.post.html
''' Important classes to remember
PDFParser - fetches data from pdf file
# fileExplorer.py
# Python 2.7.6
import os
# defaultdict creates a key on first access if it doesn't exist, so values can be appended directly
from collections import defaultdict
folder_count = 0
file_count = 0
loop_count = 0
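The counters and the `defaultdict` import above suggest a directory walk that tallies folders and files while grouping entries under a key. A minimal Python 3 sketch under that assumption follows; grouping by file extension, the function name `explore`, and its return shape are hypothetical choices, not the original script's logic.

```python
import os
from collections import defaultdict


def explore(root):
    """Walk `root` and return (folder_count, file_count, files_by_extension)."""
    folder_count = 0
    file_count = 0
    # Missing keys start as an empty list, so paths can be appended directly.
    files_by_extension = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        for name in filenames:
            file_count += 1
            extension = os.path.splitext(name)[1].lower()
            files_by_extension[extension].append(os.path.join(dirpath, name))
    return folder_count, file_count, files_by_extension
```

Files without an extension are grouped under the empty-string key, and the root folder itself is not counted.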
# getHttpHeader.py
# Python 2.7.6
import requests
from requests.auth import HTTPDigestAuth
import getpass  # To mask the password typed in
# Replace with the correct URL
url = "http://some_url"
# RestfulPostClient.py
# Python 2.7.6
import requests
from requests.auth import HTTPDigestAuth
# import json  # The json module is not required because the JSON is passed directly to requests
# Replace with the correct URL
url = "http://api_url"
# Python 2.7.6
# RestfulClient.py
import requests
from requests.auth import HTTPDigestAuth
import json
# Replace with the correct URL
url = "http://api_url"
__author__ = 'Vinoth_Subramanian'
# Python3
# pdfInvoiceMiner.py
# Program to extract the client info and invoice number from a batch of invoice PDF files
# The pdfminer3k library is used to extract text from the PDFs
# The PyPDF2 library does not extract the text from the PDFs properly
# Place all the invoice PDF files in a folder named "INVOICE"
# Place an Excel file named "invoice_info.xlsx" in the parent folder of "INVOICE"
# First column - invoice no; Second column - client details