@jjsantanna
jjsantanna / jp_json_bash
Last active July 21, 2019 22:23
Usage of jq on a .json.gz file
gzcat raw-daily-2019-07-17.json.gz | jq -c '{data, ip_str, port, location, asn}' > shodan_20190717_lessfields.json
cat shodan.json | jq -r '{ip: .ip_str, port: .port, cc: .location.country_code3, data: .data} | @json' > /tmp/shodan1.txt
-----------
shodan_raw_filename='shodan_output.json.gz'
import json
import gzip
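The preview of the Python part stops at the imports; a minimal sketch of how the same field reduction could be done in Python, assuming the gzipped export holds one JSON object per line (the keep_fields list is taken from the jq filter above, the output filename is made up):

import json
import gzip

shodan_raw_filename = 'shodan_output.json.gz'
keep_fields = ['data', 'ip_str', 'port', 'location', 'asn']  # same fields as the jq filter above

with gzip.open(shodan_raw_filename, 'rt') as f_in, open('shodan_lessfields.json', 'w') as f_out:
    for line in f_in:
        record = json.loads(line)
        f_out.write(json.dumps({k: record.get(k) for k in keep_fields}) + '\n')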
@jjsantanna
jjsantanna / .gitignore
Created March 29, 2019 07:55
List of file extensions to ignore (for a LaTeX repository)
.DS_Store
*.log
*.aux
*.dvi
*.lof
*.lot
*.bit
*.idx
*.glo
*.bbl
@jjsantanna
jjsantanna / merge_fingerprint_log_summaries.py
Created March 25, 2019 13:52
merge_ddos_fingerpring_log_summary.py
mypath ="/Users/santannajj/Desktop"
import os
import pandas as pd
import numpy as np
for file in os.listdir(mypath):
if file.endswith(".log"):
path_file = os.path.join(mypath, file)
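The preview stops inside the loop; a minimal sketch of how the merge could finish, assuming each .log file ends with a one-line CSV summary (the same assumption the shell gist below makes); the rows list and the all_summaries.csv output name are illustrative:

import os
import pandas as pd

mypath = "/Users/santannajj/Desktop"
rows = []

for file in os.listdir(mypath):
    if file.endswith(".log"):
        path_file = os.path.join(mypath, file)
        with open(path_file) as f:
            last_line = f.readlines()[-1].strip()  # assumed: the summary is the log's last line
        rows.append([file] + last_line.split(','))

pd.DataFrame(rows).to_csv('all_summaries.csv', index=False, header=False)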
@jjsantanna
jjsantanna / merge_summaries_ddosdissector.sh
Last active March 3, 2019 23:32
Get the last line of each multi-vector attack log (output of ddos_dissector) and collect them into a .csv
#!/bin/bash
rm -f all_summaries.csv; for file in *.log; do echo "$file"; tail -n 1 "$file" >> all_summaries.csv; done
@jjsantanna
jjsantanna / top_n_others.py
Created February 11, 2019 06:09
Calculate the top-N value counts of a dataframe series, grouping the remaining values as 'others'
import pandas as pd

def top_n_dataframe(n, dataframe_field):
    top_n = n
    field_name = dataframe_field.name
    # top-N most frequent values of the series
    top = dataframe_field.value_counts()[:top_n].to_frame().reset_index()
    top.columns = [field_name, 'hits']
    # everything outside the top N is grouped as a single 'others' row
    new_row = pd.DataFrame(data={
        'hits': [dataframe_field.value_counts()[top_n:].sum()],
        field_name: ['others'],
    })
    # assumed final step (the preview is truncated here): append 'others' and return
    return pd.concat([top, new_row], ignore_index=True, sort=False)
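A hedged usage example (the dataframe and the 'country' column are made up for illustration):

import pandas as pd

df = pd.DataFrame({'country': ['NL', 'NL', 'US', 'BR', 'US', 'NL', 'DE', 'FR']})
print(top_n_dataframe(2, df['country']))
# the two most frequent countries plus a single 'others' row with the remaining hits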
@jjsantanna
jjsantanna / readHdf5folderDataframe.py
Last active February 7, 2019 08:57
Function for loading several .hdf5 files from a folder into a single dataframe
import glob  # for listing all .hdf5 files in the folder
import pandas as pd

def read_NL_dataset(output_dir):
    all_files = glob.glob(output_dir + '/<FILE_NAMEish>*.hdf5')
    df_all = pd.DataFrame()
    print(all_files)
    for all_file in all_files:
        df_temp = pd.read_hdf(all_file, 'shodan')
        # assumed final step (the preview is truncated here): append each file and return the combined dataframe
        df_all = pd.concat([df_all, df_temp], ignore_index=True)
    return df_all
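A hedged usage example (the folder path is made up; '<FILE_NAMEish>' in the glob pattern still has to be replaced with the real filename prefix):

df_shodan = read_NL_dataset('/path/to/hdf5_folder')
print(df_shodan.shape)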
@jjsantanna
jjsantanna / wildcard_ips.py
Created October 20, 2018 09:27
Wildcard list of reserved IP addresses (scraped from Wikipedia)
### Crawl Wikipedia for the list of reserved IP address blocks
import cfscrape
from lxml import etree
import pandas as pd

url = "https://en.wikipedia.org/wiki/Reserved_IP_addresses"
scraper = cfscrape.create_scraper()
scraped_html = scraper.get(url).text      # decoded HTML so pandas can parse it
tables = pd.read_html(scraped_html)       # returns a list of all tables on the page
reserved_ipv4 = tables[0][0].drop(0)      # first column of the IPv4 table, header row dropped
reserved_ipv6 = tables[1][0].drop(0)      # first column of the IPv6 table, header row dropped
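The preview stops after the scrape; the gist title suggests the reserved blocks are then turned into wildcard-style prefixes. A minimal sketch of one way to do that with the standard library, reusing the reserved_ipv4 series from above (the helper name and output format are assumptions, and the result is only exact for /8, /16 and /24 blocks):

import ipaddress

def cidr_to_wildcard(cidr):
    # e.g. '10.0.0.0/8' -> '10.*', '192.168.0.0/16' -> '192.168.*'
    network = ipaddress.ip_network(cidr.split()[0], strict=False)
    kept = network.prefixlen // 8  # number of full octets fixed by the prefix
    octets = str(network.network_address).split('.')
    return '.'.join(octets[:kept] + ['*'])

wildcards_ipv4 = reserved_ipv4.apply(cidr_to_wildcard)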
@jjsantanna
jjsantanna / source_motor_price.md
Last active September 5, 2018 11:49
Motorcycle buying crawler and analyser

Notes on buying a motorcycle

Types of motorcycle:

  • Sport Touring
  • Adventure (ADV)
  • Standard
  • (Super)Sport
  • Touring
  • Cafe racer
  • Roadster
@jjsantanna
jjsantanna / plotly_example.py
Created July 17, 2018 08:45
Example plotly
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import plotly
plotly.offline.init_notebook_mode()
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)
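The preview ends before anything is drawn; a minimal sketch of how the sine data could be rendered with the offline plotly API imported above (the trace construction is an assumption, not the original gist's ending):

import numpy as np
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode()

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

# render the curve as an interactive plotly figure inside the notebook
plotly.offline.iplot([go.Scatter(x=t, y=s, mode='lines', name='1 + sin(2*pi*t)')])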
df['ip'] = df['column'].str.extract(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)')  # pull an IPv4 address out of a string column
##########################
########### IP to hostname
import socket
import pandas as pd

def ip2hostname(df_series_ip):
    rows = []
    for ip in df_series_ip:
        try:  # assumed continuation (the preview is truncated): reverse-resolve each IP, None on failure
            hostname = socket.gethostbyaddr(ip)[0]
        except (socket.herror, socket.gaierror):
            hostname = None
        rows.append({'ip': ip, 'hostname': hostname})
    df_output = pd.DataFrame(rows)
    return df_output
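A hedged usage example tying the two snippets together (the input dataframe is made up; 192.0.2.10 is a documentation address, so its lookup is expected to fail and return None):

import pandas as pd

df = pd.DataFrame({'column': ['host 192.0.2.10 responded', 'no address here']})
df['ip'] = df['column'].str.extract(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)')
print(ip2hostname(df['ip'].dropna()))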