


jjsantanna

@jjsantanna
jjsantanna / google_paid_translate
Created March 3, 2022 08:18
Google Translation paid API
# ATTENTION: PAID solution from Google
# https://cloud.google.com/translate/docs/setup
# https://cloud.google.com/translate/docs/basic/quickstart
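import os
# Path to your service-account key file (placeholder name)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "BLABLABLA.json"

def translate_text(target, text):
    import six
    from google.cloud import translate_v2 as translate
    # The client reads the credentials from GOOGLE_APPLICATION_CREDENTIALS
    translate_client = translate.Client()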
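The gist preview stops right after the client is created. A minimal sketch of how the function might continue, assuming the `translate_v2` client's `translate(text, target_language=...)` method, which returns a dict with `translatedText` and, when no source language is given, `detectedSourceLanguage`. The optional `client` parameter is a hypothetical addition so the function can be tried without credentials or billing.

```python
def translate_text(target, text, client=None):
    """Translate `text` into the ISO 639-1 language code `target`.

    `client` is a hypothetical injection point; when omitted, a real
    google.cloud.translate_v2 client is created (requires credentials).
    """
    if client is None:
        from google.cloud import translate_v2 as translate
        client = translate.Client()

    # Accept byte strings as well as str.
    if isinstance(text, bytes):
        text = text.decode("utf-8")

    # The result is a dict with 'input', 'translatedText' and, when the
    # source language was not specified, 'detectedSourceLanguage'.
    result = client.translate(text, target_language=target)
    return result["translatedText"], result.get("detectedSourceLanguage")


# Example (needs a billing-enabled Google Cloud project):
# translated, source_lang = translate_text("en", "Bom dia")
```

Returning the detected source language alongside the translation is a design choice for this sketch; the paid API reports it for free on every call.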
jjsantanna / colab_google_translate
Last active March 3, 2022 08:49
Google Colab translate
I've tried many googletrans versions and other libraries; none worked except this one! Libraries that I've tried:
- !pip install googletrans; from googletrans import Translator
- !pip install googletrans==3.1.0a0; from googletrans import Translator
- !pip install googletrans==4.0.0-rc1; from googletrans import Translator
- !pip install google_trans_new; from google_trans_new import google_translator; translator = google_translator()
- !pip install translate; from translate import Translator **WORKS TEMPORARILY**
- In Google Colab: googletrans==3.1.0a0 **WORKS**
from google.colab import files
# Opens a file picker; returns a dict {filename: file contents as bytes}
uploaded = files.upload()
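A minimal sketch of the combination reported as working in Colab: read a CSV uploaded with `files.upload()` and translate one column with `googletrans==3.1.0a0`. It assumes googletrans' `Translator().translate(text, dest=...)` returns an object with a `.text` attribute; the column name `text` and the `translator` parameter are hypothetical. googletrans is an unofficial client of Google's free web endpoint, so it may break without notice.

```python
import io

import pandas as pd


def load_uploaded_csv(uploaded, filename=None):
    """Read one file from the dict returned by google.colab.files.upload()."""
    if filename is None:
        filename = next(iter(uploaded))  # first uploaded file
    return pd.read_csv(io.BytesIO(uploaded[filename]))


def translate_column(df, column, dest="en", translator=None):
    """Return a copy of `df` with a new '<column>_<dest>' column."""
    if translator is None:
        # !pip install googletrans==3.1.0a0
        from googletrans import Translator
        translator = Translator()
    out = df.copy()
    out[f"{column}_{dest}"] = [
        translator.translate(str(value), dest=dest).text for value in out[column]
    ]
    return out


# In Colab:
# df = load_uploaded_csv(uploaded)
# df = translate_column(df, "text", dest="en")
```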
jjsantanna / color display pandas .py
Created December 21, 2021 16:13
color display pandas
# `df` is an existing DataFrame with a numeric 'score' column
df_test = df[df['score'] > 80]

def color_row(row):
    value = row.loc["score"]  # column the colour is based on
    if value < 130:
        color = 'pink'
    elif value > 130:
        color = 'lightgreen'
    else:  # value == 130
        color = 'black'
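The preview ends before `color_row` returns. A sketch of how such a function is typically wired into pandas' `Styler.apply(..., axis=1)`, which expects one CSS string per cell of the row. It assumes a background colour is wanted and a `score` column as in the snippet; `style_scores` is a hypothetical helper name.

```python
import pandas as pd


def color_row(row):
    """Return one CSS declaration per cell, based on the row's 'score'."""
    value = row.loc["score"]
    if value < 130:
        color = "pink"
    elif value > 130:
        color = "lightgreen"
    else:  # value == 130
        color = "black"
    return [f"background-color: {color}"] * len(row)


def style_scores(df):
    """Keep rows with score > 80 and colour each whole row with color_row."""
    df_test = df[df["score"] > 80]
    return df_test.style.apply(color_row, axis=1)  # Styler needs jinja2


# In a notebook cell, the returned Styler renders as a coloured table:
# style_scores(df)
```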
jjsantanna / pie chart matplotlib.py
Last active December 20, 2021 17:11
pie chart matplotlib
import pandas as pd
import matplotlib.pyplot as plt # http://matplotlib.org/gallery.html
plt.style.use('ggplot') # https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html
fig = plt.figure(figsize=(5, 5))
ax = plt.subplot2grid((1,1), (0,0))
# ax.set_title("Distribution of message 'level'")
# `data` is a pandas Series of counts per category
total = data.values.sum()
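The preview stops after computing `total`. A minimal sketch of a complete pie chart, assuming `data` is a pandas Series of counts per category (as `data.values.sum()` suggests); the sample counts and the label format are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def pie_chart(data, title=None):
    """Draw a pie chart of a Series of counts and return (fig, wedges)."""
    plt.style.use("ggplot")
    fig = plt.figure(figsize=(5, 5))
    ax = plt.subplot2grid((1, 1), (0, 0))
    total = data.values.sum()
    # Label each slice with its share and absolute count, e.g. "75.0% (30)".
    wedges, _texts, _autotexts = ax.pie(
        data.values,
        labels=data.index,
        autopct=lambda pct: f"{pct:.1f}% ({pct * total / 100:.0f})",
        startangle=90,
    )
    ax.axis("equal")  # keep the pie circular
    if title:
        ax.set_title(title)
    return fig, wedges


counts = pd.Series({"info": 30, "warning": 8, "error": 2})
fig, wedges = pie_chart(counts, title="Distribution of message 'level'")
```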
# !pip3 install OTXv2
from OTXv2 import OTXv2, IndicatorTypes
import pandas as pd
# conf_otx_key: your AlienVault OTX API key (defined elsewhere)
otx = OTXv2(conf_otx_key)
# https://otx.alienvault.com/indicator/file/c0202cf6aeab8437c638533d14563d35
md5hash2check = "c0202cf6aeab8437c638533d14563d35"
df_otx = pd.json_normalize(otx.get_indicator_details_full(IndicatorTypes.FILE_HASH_MD5,md5hash2check))
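`get_indicator_details_full` returns a nested dict with one entry per OTX section, and `pd.json_normalize` flattens it into a single row whose column names join the nested keys with dots. A small pandas-only sketch of pulling one section's columns out of that row; `section_columns` is a hypothetical helper and the sample dict is made up, not OTX's actual response schema.

```python
import pandas as pd


def section_columns(df, section):
    """Return the columns of a json_normalize'd result for one section,
    with the '<section>.' prefix stripped."""
    prefix = f"{section}."
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[cols].rename(columns=lambda c: c[len(prefix):])


# Hypothetical, heavily simplified stand-in for the OTX response
sample = {
    "general": {
        "indicator": "c0202cf6aeab8437c638533d14563d35",
        "pulse_info": {"count": 3},
    },
    "analysis": {"analysis": None},
}
df_otx = pd.json_normalize(sample)
general = section_columns(df_otx, "general")
```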
jjsantanna / crawler_TOR_selenium
Last active July 1, 2022 11:07
crawler TOR selenium
# From https://towardsdatascience.com/how-to-scrape-the-dark-web-53145add7033
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
tor_binary_path = "/Applications/Tor Browser.app/Contents/MacOS/firefox"
binary = FirefoxBinary(tor_binary_path)
!wget https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-macos.tar.gz
!tar -xzvf geckodriver-v0.29.1-macos.tar.gz
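A common companion step is to set Firefox's SOCKS proxy preferences explicitly so pages are fetched through a running Tor instance. A sketch, assuming Tor listens on 127.0.0.1:9150 (the Tor Browser default; a standalone `tor` daemon usually uses 9050) and the Selenium 4 `FirefoxOptions`/`Service` API rather than the older `FirefoxBinary` class above. `make_tor_driver` is a hypothetical helper and the `.onion` address is a placeholder.

```python
# Firefox preferences that route traffic, including DNS lookups, through a
# SOCKS5 proxy. The port is an assumption: 9150 for Tor Browser's own tor.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9150

TOR_PROXY_PREFS = {
    "network.proxy.type": 1,  # 1 = manual proxy configuration
    "network.proxy.socks": TOR_SOCKS_HOST,
    "network.proxy.socks_port": TOR_SOCKS_PORT,
    "network.proxy.socks_version": 5,
    "network.proxy.socks_remote_dns": True,  # resolve .onion names via Tor
}


def make_tor_driver(tor_binary_path, geckodriver_path="./geckodriver"):
    """Start Firefox/Tor Browser with the proxy preferences (Selenium 4)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    options = webdriver.FirefoxOptions()
    options.binary_location = tor_binary_path
    for name, value in TOR_PROXY_PREFS.items():
        options.set_preference(name, value)
    return webdriver.Firefox(service=Service(geckodriver_path), options=options)


# driver = make_tor_driver("/Applications/Tor Browser.app/Contents/MacOS/firefox")
# driver.get("http://example.onion")  # placeholder address
```

The default `geckodriver_path` matches the binary extracted by the `tar` command above into the current directory.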

Defining a platform to process large amounts of data

Requirements

  • [1. Transfer data -> 2. Store data -> 3. Parse data -> 4. Query data] -> 5. Use the result in a playbook environment (e.g., a Jupyter notebook)
  • Hot storage (data kept for no longer than 2 months)
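The hot-storage requirement amounts to a retention window on the stored data. A minimal pandas sketch of enforcing it, assuming records carry a timestamp column (the name `timestamp` is hypothetical) and approximating 2 months as 60 days:

```python
import pandas as pd

RETENTION = pd.Timedelta(days=60)  # "no longer than 2 months", approximated


def prune_hot_storage(df, now, ts_col="timestamp", retention=RETENTION):
    """Keep only rows whose timestamp falls within the retention window."""
    cutoff = pd.Timestamp(now) - retention
    return df[pd.to_datetime(df[ts_col]) >= cutoff]
```

In a real platform the same cutoff would more likely be applied by dropping whole date partitions than by filtering rows.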

Potential technologies

jjsantanna / ip timeseries matplotlib
Created February 19, 2020 05:49
IP timeseries matplotlib
user_name = '[email protected]'
################################################################################
# Data Processing
import pandas as pd
dataset = df[df['UserIds']==user_name][['CreationDate','ClientIP']]
dataset['ClientIP'] = dataset['ClientIP'].replace('','0.0.0.0').replace('<null>','0.0.0.0')
# Strip the port from 'a.b.c.d:port' IPv4 strings; leave everything else unchanged
dataset['ip'] = dataset['ClientIP'].astype(str).apply(lambda x: x.split(':')[0] if len(x.split('.'))==4 and len(x.split(':'))==2 else x)
dataset = dataset.set_index('CreationDate')
dataset['ip_id'] = dataset['ip'].astype('category').cat.codes
categories = dataset.groupby(['ip','ip_id']).size().reset_index()
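The preview ends after building the `ip` to `ip_id` lookup table. A sketch of the same preparation as reusable functions, plus a scatter plot of IP id over time, assuming the `UserIds`/`CreationDate`/`ClientIP` columns from the snippet. The function names, the added `to_datetime`/sort step and the plot style are illustrative, not the author's.

```python
import pandas as pd


def normalize_ip(value):
    """Map ''/'<null>' to 0.0.0.0 and strip the port from 'a.b.c.d:port'."""
    value = str(value)
    if value in ("", "<null>"):
        return "0.0.0.0"
    if len(value.split(".")) == 4 and len(value.split(":")) == 2:
        return value.split(":")[0]
    return value


def encode_ips(df, user_name):
    """Return the user's events indexed by time, with an integer id per IP."""
    data = df.loc[df["UserIds"] == user_name, ["CreationDate", "ClientIP"]].copy()
    data["ip"] = data["ClientIP"].map(normalize_ip)
    data["CreationDate"] = pd.to_datetime(data["CreationDate"])
    data = data.set_index("CreationDate").sort_index()
    data["ip_id"] = data["ip"].astype("category").cat.codes
    return data


def plot_ips(data):
    """Scatter IP ids over time, labelling the y axis with the IPs."""
    import matplotlib.pyplot as plt

    categories = data.groupby(["ip", "ip_id"]).size().reset_index()
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.scatter(data.index, data["ip_id"], marker="|")
    ax.set_yticks(categories["ip_id"])
    ax.set_yticklabels(categories["ip"])
    return fig, ax
```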
jjsantanna / figure matplotlib
Last active April 14, 2021 09:04
matplotlib plot figure template
import pandas as pd
import matplotlib.pyplot as plt # http://matplotlib.org/gallery.html
plt.style.use('ggplot') # https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html
%matplotlib inline
fig = plt.figure(figsize=(5, 15))
ax = plt.subplot2grid((1,1), (0,0))
ax.set_title('Title')
# ax.set_ylabel("€ Per Month")
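A sketch of the same template written with `plt.subplots`, which creates the figure and a single axes in one call (equivalent to `subplot2grid((1,1), (0,0))` here). The bar-chart choice, sample data and output file name are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter use %matplotlib inline instead
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")


def make_figure(series, title="Title", ylabel=None):
    """Plot a Series as a bar chart on a single-axes figure."""
    fig, ax = plt.subplots(figsize=(5, 15))
    series.plot(kind="bar", ax=ax)
    ax.set_title(title)
    if ylabel:
        ax.set_ylabel(ylabel)
    fig.tight_layout()
    return fig, ax


costs = pd.Series({"storage": 120.0, "compute": 340.0, "network": 45.0})
fig, ax = make_figure(costs, ylabel="€ Per Month")
# fig.savefig("figure.png", dpi=150, bbox_inches="tight")
```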
jjsantanna / ip2location
Created February 12, 2020 10:12
ip2location paid
# By João Ceron
import IP2Location

def ip2location_geo(pandaseries_ips):
    ip2location = IP2Location.IP2Location()
    ip2locationdb = "IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ISP.BIN"
    ip2location.open(ip2locationdb)
    # country
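The preview stops at the `# country` comment. A sketch of how the lookup might continue, assuming the IP2Location Python library's `get_all(ip)` method and its record attributes (`country_short`, `region`, `city`, `latitude`, `longitude`, `isp`). The `db` and `db_path` parameters are hypothetical additions so the function can be exercised without the paid `.BIN` file.

```python
import pandas as pd

GEO_FIELDS = ["country_short", "region", "city", "latitude", "longitude", "isp"]


def ip2location_geo(pandaseries_ips, db=None,
                    db_path="IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ISP.BIN"):
    """Return a DataFrame with one row of geo fields per IP in the Series."""
    if db is None:
        import IP2Location  # pip install IP2Location
        db = IP2Location.IP2Location()
        db.open(db_path)
    # Look each distinct IP up once, then map the results back onto the Series.
    cache = {}
    for ip in pandaseries_ips.dropna().unique():
        rec = db.get_all(ip)
        cache[ip] = {field: getattr(rec, field, None) for field in GEO_FIELDS}
    rows = [cache.get(ip, dict.fromkeys(GEO_FIELDS)) for ip in pandaseries_ips]
    return pd.DataFrame(rows, index=pandaseries_ips.index, columns=GEO_FIELDS)
```

Caching by distinct IP matters for log data, where the same address typically appears many times.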