Elias Dabbas (eliasdabbas)
https://github.com/eliasdabbas/langchain-advertools
@eliasdabbas
eliasdabbas / company_marketcap_interactive_scatter.py
Last active June 11, 2022 12:40
Interactive, emailable HTML chart of the top 500 companies. Users can select which countries to display.
import pandas as pd
import plotly.express as px
import requests

# Scrape the top 500 companies (5 pages, 100 rows each):
dflist = []
for i in range(1, 6):
    resp = requests.get(f'https://companiesmarketcap.com/page/{i}/')
    df = pd.read_html(resp.text)[0]
    dflist.append(df)

# Combine the five pages into one DataFrame:
companies = pd.concat(dflist, ignore_index=True)
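A minimal sketch of the charting step (assumed: the column names come from the scraped table, numeric parsing of the scraped strings is omitted, and plotly's built-in legend clicks provide the country selection):

fig = px.scatter(companies,
                 x='Price',        # assumed column name
                 y='Market Cap',   # assumed column name
                 color='Country',  # clicking legend items shows/hides countries
                 hover_name='Name')
fig.write_html('companies_marketcap.html')  # standalone, emailable HTML file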
import plotly.express as px

def treemap(traffic_df, metric='Users', path=['Medium', 'Source']):
    """Make an interactive treemap for two data dimensions/levels.

    Parameters:
    -----------
    traffic_df : A DataFrame containing two dimensions, and one or more metrics
    """
    # Assumed completion (gist truncated here): nest the `path` levels, sized by `metric`.
    return px.treemap(traffic_df, path=path, values=metric)
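Assumed usage with the default parameters:

fig = treemap(traffic_df, metric='Users', path=['Medium', 'Source'])
fig.show()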
@eliasdabbas
eliasdabbas / dress_serp_heatmap.py
Last active April 29, 2023 19:17
Dress SERP heat-map for the query templates "dress type styles" and "shop dress type": 40 dress types, 4 countries (US, UK, CA, AU).
import advertools as adv
import pandas as pd
import plotly
import plotly.graph_objects as go

pd.options.display.max_columns = None

# Google Programmable Search Engine credentials (placeholders):
cx = 'YOUR_CSE_ID'
key = 'YOUR_GOOGLE_DEV_KEY'
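A hedged sketch of the querying step (the dress types and exact query phrasing are assumptions; adv.serp_goog is advertools' wrapper for the Google Custom Search API, and list-valued parameters are expanded to all combinations):

dress_types = ['maxi', 'wrap', 'bodycon']  # illustrative subset of the 40 types
queries = ([f'{d} dress styles' for d in dress_types] +
           [f'shop {d} dress' for d in dress_types])
serp_df = adv.serp_goog(q=queries, cx=cx, key=key, gl=['us', 'uk', 'ca', 'au'])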
import advertools as adv
import pandas as pd

pd.options.display.max_columns = None

# Copied from https://en.wikipedia.org/wiki/List_of_cancer_types
cancers = {
    "Chondrosarcoma": "Bone and muscle sarcoma",
    "Ewing's sarcoma": "Bone and muscle sarcoma",
@eliasdabbas
eliasdabbas / serp_heatmap.py
Last active February 2, 2024 22:58
Create a heatmap of SERPs from a table with the columns "keyword", "rank", and "domain".
import pandas as pd
import plotly.graph_objects as go

def serp_heatmap(df, num_domains=10, select_domain=None):
    # Normalize column names to the Google CSE naming used below:
    df = df.rename(columns={'domain': 'displayLink',
                            'searchTerms': 'keyword'})
    # Keep only the most frequent domains, dropping empty values:
    top_domains = df['displayLink'].value_counts()[:num_domains].index.tolist()
    top_df = df[df['displayLink'].isin(top_domains) & df['displayLink'].ne('')]
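A minimal sketch of the plotting step (the gist's full implementation is truncated above; this pivots rank into a keyword-by-domain grid and draws it with go.Heatmap):

pivot = top_df.pivot_table(index='keyword', columns='displayLink', values='rank')
fig = go.Figure(go.Heatmap(z=pivot.values, x=pivot.columns, y=pivot.index,
                           reversescale=True))  # so rank 1 (best) stands out
fig.show()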
@eliasdabbas
eliasdabbas / crawl_multiple_sites.py
Last active April 27, 2022 08:56
Crawl multiple websites with one for loop, while saving the output, logs, and job status separately for each website. Resume crawling any time simply by re-running the same code.
from urllib.parse import urlsplit

import advertools as adv

sites = [
    'https://www.who.int',
    'https://www.nytimes.com',
    'https://www.washingtonpost.com',
]
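A hedged sketch of the loop the description refers to (the file-naming scheme is an assumption; LOG_FILE and JOBDIR are standard Scrapy settings that adv.crawl passes through, giving each site its own log file and resumable job state):

for site in sites:
    domain = urlsplit(site).netloc
    adv.crawl(site, output_file=f'{domain}.jl', follow_links=True,
              custom_settings={'LOG_FILE': f'{domain}.log',
                               'JOBDIR': f'{domain}_jobdir'})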
from unicodedata import lookup

def flag(cc):
    """Return the emoji flag for a two-letter country code, e.g. 'US'."""
    l1 = lookup(f'REGIONAL INDICATOR SYMBOL LETTER {cc[0]}')
    l2 = lookup(f'REGIONAL INDICATOR SYMBOL LETTER {cc[1]}')
    return l1 + l2
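For example:

flag('US')  # '🇺🇸'
flag('JP')  # '🇯🇵'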
import datetime

import advertools as adv
import pandas as pd

# Words to exclude when counting word frequencies:
stopwords = ['to', 'of', 'the', 'in', 'for', 'and', 'on', 'a', 'as', 'with',
             'from', 'over', 'is', 'at', '—', '-', 'be', '2022', '–', 'it', 'by',
             'we', 'why', 'but', 'my', 'how', 'not', 'an', 'are', 'no', 'go',
             'your', 'up', 'his']
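A hedged sketch of how such a list is typically used with advertools' word_frequency (the titles list is an assumption; rm_words removes the stop words before counting):

titles = ['Why we love data', 'How to crawl a website']  # illustrative
word_freq = adv.word_frequency(titles, rm_words=stopwords)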
@eliasdabbas
eliasdabbas / robots_sitemaps_urls_wordfreq.sh
Last active April 6, 2022 20:35
Fetch a robots.txt file, get the relevant XML sitemap, extract and split the URLs, and count the words in article titles. Watch this video for more details: https://bit.ly/3HMZC0A
# pip install advertools==0.14.0a7

# Get the robots.txt file and save it to a CSV:
advertools robots --url https://www.economist.com/robots.txt econ_robots.csv

# Find the lines that start with "sitemap" and save the URL to the variable sitemap_url:
sitemap_url=$(grep ^sitemap -i econ_robots.csv | cut -d , -f 2)

# Get the sitemap index file without downloading the sub-sitemaps (not recursive):
advertools sitemaps $sitemap_url econ_sitemap.csv --recursive 0
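The remaining steps from the description (extract and split URLs, count title words), sketched in Python since the shell version is cut off here; adv.sitemap_to_df, adv.url_to_df, and adv.word_frequency are the advertools functions involved, and the 'loc' and 'last_dir' column names follow their standard output:

import advertools as adv
import pandas as pd

sitemap_index = pd.read_csv('econ_sitemap.csv')
articles = adv.sitemap_to_df(sitemap_index['loc'][0])  # fetch one sub-sitemap
urls = adv.url_to_df(articles['loc'])                  # split URLs into components
titles = urls['last_dir'].str.replace('-', ' ')        # article slugs as rough titles
print(adv.word_frequency(titles).head(10))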