@jjsantanna
jjsantanna / jp_json_bash
Last active July 21, 2019 22:23
Usage of jq on a .json.gz file
gzcat raw-daily-2019-07-17.json.gz | jq -c '{data, ip_str, port, location, asn}' > shodan_20190717_lessfields.json
cat shodan.json | jq -r '{ip: .ip_str, port: .port, cc: .location.country_code3, data: .data} | @json' > /tmp/shodan1.txt
-----------
shodan_raw_filename='shodan_output.json.gz'
import json
import gzip
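The preview of the Python part stops at the imports; a minimal sketch of how the same field reduction could be done in Python, assuming the gzipped export holds one JSON object per line (the keep_fields list is taken from the jq filter above, the output filename is made up):

import json
import gzip

shodan_raw_filename = 'shodan_output.json.gz'
keep_fields = ['data', 'ip_str', 'port', 'location', 'asn']  # same fields as the jq filter above

with gzip.open(shodan_raw_filename, 'rt') as f_in, open('shodan_lessfields.json', 'w') as f_out:
    for line in f_in:
        record = json.loads(line)
        f_out.write(json.dumps({k: record.get(k) for k in keep_fields}) + '\n')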
@jjsantanna
jjsantanna / .gitignore
Created March 29, 2019 07:55
List of file extensions to ignore (for a LaTeX repository)
.DS_Store
*.log
*.aux
*.dvi
*.lof
*.lot
*.bit
*.idx
*.glo
*.bbl
@jjsantanna
jjsantanna / merge_fingerprint_log_summaries.py
Created March 25, 2019 13:52
merge_ddos_fingerpring_log_summary.py
mypath ="/Users/santannajj/Desktop"
import os
import pandas as pd
import numpy as np
for file in os.listdir(mypath):
if file.endswith(".log"):
path_file = os.path.join(mypath, file)
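The preview stops inside the loop; a minimal sketch of how the merge could finish, assuming each .log file ends with a one-line CSV summary (the same assumption the shell gist below makes); the rows list and the all_summaries.csv output name are illustrative:

import os
import pandas as pd

mypath = "/Users/santannajj/Desktop"
rows = []

for file in os.listdir(mypath):
    if file.endswith(".log"):
        path_file = os.path.join(mypath, file)
        with open(path_file) as f:
            last_line = f.readlines()[-1].strip()  # assumed: the summary is the log's last line
        rows.append([file] + last_line.split(','))

pd.DataFrame(rows).to_csv('all_summaries.csv', index=False, header=False)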
@jjsantanna
jjsantanna / merge_summaries_ddosdissector.sh
Last active March 3, 2019 23:32
Get the last line of each multi-vector attack log (output of ddos_dissector) and collect them into a .csv
#!/bin/bash
rm -f all_summaries.csv; for file in *.log; do echo "$file"; tail -n 1 "$file" >> all_summaries.csv; done
@jjsantanna
jjsantanna / top_n_others.py
Created February 11, 2019 06:09
Calculate the top-N value counts of a dataframe series, grouping the remaining values as 'others'
import pandas as pd

def top_n_dataframe(n, dataframe_field):
    top_n = n
    field_name = dataframe_field.name
    # top-N most frequent values of the series
    top = dataframe_field.value_counts()[:top_n].to_frame().reset_index()
    top.columns = [field_name, 'hits']
    # everything outside the top N is grouped as a single 'others' row
    new_row = pd.DataFrame(data={
        'hits': [dataframe_field.value_counts()[top_n:].sum()],
        field_name: ['others'],
    })
    # assumed final step (the preview is truncated here): append 'others' and return
    return pd.concat([top, new_row], ignore_index=True, sort=False)
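A hedged usage example (the dataframe and the 'country' column are made up for illustration):

import pandas as pd

df = pd.DataFrame({'country': ['NL', 'NL', 'US', 'BR', 'US', 'NL', 'DE', 'FR']})
print(top_n_dataframe(2, df['country']))
# the two most frequent countries plus a single 'others' row with the remaining hits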
@jjsantanna
jjsantanna / readHdf5folderDataframe.py
Last active February 7, 2019 08:57
Function for loading several .hdf5 files from a folder into a single dataframe
import glob  # for listing all .hdf5 files in the folder
import pandas as pd

def read_NL_dataset(output_dir):
    all_files = glob.glob(output_dir + '/<FILE_NAMEish>*.hdf5')
    df_all = pd.DataFrame()
    print(all_files)
    for all_file in all_files:
        df_temp = pd.read_hdf(all_file, 'shodan')
        # assumed final step (the preview is truncated here): append each file and return the combined dataframe
        df_all = pd.concat([df_all, df_temp], ignore_index=True)
    return df_all
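A hedged usage example (the folder path is made up; '<FILE_NAMEish>' in the glob pattern still has to be replaced with the real filename prefix):

df_shodan = read_NL_dataset('/path/to/hdf5_folder')
print(df_shodan.shape)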
@jjsantanna
jjsantanna / wildcard_ips.py
Created October 20, 2018 09:27
Wildcard list of reserved IP addresses (scraped from Wikipedia)
### Crawl Wikipedia for the list of reserved IP address blocks
import cfscrape
from lxml import etree
import pandas as pd

url = "https://en.wikipedia.org/wiki/Reserved_IP_addresses"
scraper = cfscrape.create_scraper()
scraped_html = scraper.get(url).text      # decoded HTML so pandas can parse it
tables = pd.read_html(scraped_html)       # returns a list of all tables on the page
reserved_ipv4 = tables[0][0].drop(0)      # first column of the IPv4 table, header row dropped
reserved_ipv6 = tables[1][0].drop(0)      # first column of the IPv6 table, header row dropped
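The preview stops after the scrape; the gist title suggests the reserved blocks are then turned into wildcard-style prefixes. A minimal sketch of one way to do that with the standard library, reusing the reserved_ipv4 series from above (the helper name and output format are assumptions, and the result is only exact for /8, /16 and /24 blocks):

import ipaddress

def cidr_to_wildcard(cidr):
    # e.g. '10.0.0.0/8' -> '10.*', '192.168.0.0/16' -> '192.168.*'
    network = ipaddress.ip_network(cidr.split()[0], strict=False)
    kept = network.prefixlen // 8  # number of full octets fixed by the prefix
    octets = str(network.network_address).split('.')
    return '.'.join(octets[:kept] + ['*'])

wildcards_ipv4 = reserved_ipv4.apply(cidr_to_wildcard)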
@jjsantanna
jjsantanna / source_motor_price.md
Last active September 5, 2018 11:49
Motorcycle buying crawler and analyser

Notes on buying a motorcycle

Types of motorcycle:

  • Sport Touring
  • Adventure (ADV)
  • Standard
  • (Super)Sport
  • Touring
  • Cafe racer
  • Roadster
@jjsantanna
jjsantanna / plotly_example.py
Created July 17, 2018 08:45
Example plotly
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import plotly
plotly.offline.init_notebook_mode()
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)
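The preview ends before anything is drawn; a minimal sketch of how the sine data could be rendered with the offline plotly API imported above (the trace construction is an assumption, not the original gist's ending):

import numpy as np
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode()

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

# render the curve as an interactive plotly figure inside the notebook
plotly.offline.iplot([go.Scatter(x=t, y=s, mode='lines', name='1 + sin(2*pi*t)')])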
df['ip'] = df['column'].str.extract(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)')  # pull an IPv4 address out of a string column
##########################
########### IP to hostname
import socket
import pandas as pd

def ip2hostname(df_series_ip):
    rows = []
    for ip in df_series_ip:
        try:  # assumed continuation (the preview is truncated): reverse-resolve each IP, None on failure
            hostname = socket.gethostbyaddr(ip)[0]
        except (socket.herror, socket.gaierror):
            hostname = None
        rows.append({'ip': ip, 'hostname': hostname})
    df_output = pd.DataFrame(rows)
    return df_output
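A hedged usage example tying the two snippets together (the input dataframe is made up; 192.0.2.10 is a documentation address, so its lookup is expected to fail and return None):

import pandas as pd

df = pd.DataFrame({'column': ['host 192.0.2.10 responded', 'no address here']})
df['ip'] = df['column'].str.extract(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)')
print(ip2hostname(df['ip'].dropna()))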