F W Saarah Hossain fwSara95h

HTML Heading Nesting Parser

This gist contains a Python tool for converting flat HTML documents, especially those whose nesting is indicated only by heading levels (<h1>, <h2>, etc.), into a structured JSON object that reflects the implicit hierarchy.

Originally inspired by this Stack Overflow question from @psychicesp, this script is useful for:

- Web scraping content-heavy pages (menus, outlines, legal documents)
- Making the structure implied by heading levels explicit
- Preserving content hierarchy when working with Markdown-to-HTML output
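The core idea can be sketched with a stack of open sections: each new heading closes every open section at the same or a deeper level, then becomes a child of whatever remains on top. This is a minimal sketch under stated assumptions, not the gist's own code. It uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and the names `HeadingNester`, `nest_headings`, and the `title`/`level`/`content`/`children` keys are hypothetical.

```python
import json
from html.parser import HTMLParser

# Map heading tags to their numeric level: h1 -> 1 ... h6 -> 6.
HEADINGS = {f"h{i}": i for i in range(1, 7)}


class HeadingNester(HTMLParser):
    """Hypothetical sketch: nest text under the most recent higher-level heading."""

    def __init__(self):
        super().__init__()
        # Level 0 root so that no real heading (level >= 1) ever pops it.
        self.root = {"title": None, "level": 0, "content": [], "children": []}
        self.stack = [self.root]   # currently open sections, outermost first
        self.in_heading = None     # level of the heading being read, if any
        self.heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.in_heading = HEADINGS[tag]
            self.heading_text = []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.in_heading is not None:
            level = self.in_heading
            # Close every open section at the same or a deeper level.
            while self.stack[-1]["level"] >= level:
                self.stack.pop()
            section = {"title": "".join(self.heading_text).strip(),
                       "level": level, "content": [], "children": []}
            self.stack[-1]["children"].append(section)
            self.stack.append(section)
            self.in_heading = None

    def handle_data(self, data):
        if self.in_heading is not None:
            self.heading_text.append(data)
        elif data.strip():
            # Non-heading text belongs to the innermost open section.
            self.stack[-1]["content"].append(data.strip())


def nest_headings(html: str) -> dict:
    """Parse flat HTML and return the nested section tree rooted at level 0."""
    parser = HeadingNester()
    parser.feed(html)
    parser.close()
    return parser.root


if __name__ == "__main__":
    page = ("<h1>Menu</h1><p>Intro</p><h2>Drinks</h2><p>Tea</p>"
            "<h2>Food</h2><p>Soup</p><h1>Hours</h1><p>9-5</p>")
    print(json.dumps(nest_headings(page), indent=2))
```

In this example "Drinks" and "Food" end up as children of "Menu", while "Hours" closes "Menu" and becomes its sibling under the root.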

Extract JavaScript variables from BeautifulSoup objects

INPUTS

inpX: must be a bs4 document/tag/ResultSet, a string, or a list of strings
- ( the target variable must be JSON and separated from other variables by ; )
varName: name of the target variable
- ( only the first variable found with the specified name will be returned )
selector: a CSS selector for finding the target script in the bs4 document/tag
- ( if inpX is a script tag, ResultSet, string, or list, selector is ignored )
prepFn: should be a univariate function that takes a string and returns a string
- ( for modifying the script string before searching for and parsing variable )
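For string inputs, the contract above (JSON values, statements separated by `;`, first match wins, optional `prepFn` applied first) can be sketched with only the standard library. This is a hypothetical illustration, not the gist's implementation: `extract_js_var` and its snake_case parameters are made-up names, bs4 inputs would first have to be reduced to script text (for example via a script tag's `.string`), and, like the documented contract, it will break if a `;` appears inside the JSON value.

```python
import json
import re
from typing import Any, Callable, Iterable, Optional, Union


def extract_js_var(
    inp: Union[str, Iterable[str]],
    var_name: str,
    prep_fn: Optional[Callable[[str], str]] = None,
) -> Optional[Any]:
    """Hypothetical sketch: return the parsed JSON value of the first
    assignment to var_name, or None if no such assignment is found."""
    scripts = [inp] if isinstance(inp, str) else list(inp)
    # Optional var/let/const keyword, the exact name, "=", then the value.
    pattern = re.compile(
        r"^\s*(?:(?:var|let|const)\s+)?" + re.escape(var_name) + r"\s*=\s*(.+)$",
        re.DOTALL,
    )
    for script in scripts:
        if prep_fn is not None:
            script = prep_fn(script)  # e.g. strip comments or normalize case
        # Per the contract above, variables are separated by ";".
        for statement in script.split(";"):
            match = pattern.match(statement)
            if match:
                return json.loads(match.group(1))  # first match wins
    return None
```

For example, `extract_js_var('var a = 1; var config = {"x": [1, 2]};', "config")` returns the dict `{"x": [1, 2]}`.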

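	from datetime import date, datetime, timedelta

	def get_midpoint(sd: date, ed: date, eod: bool = True, is_v: bool = True, toRet=None):
	    """Return the datetime halfway between sd and ed."""
	    # is_v and toRet are not used in this excerpt.
	    # With eod=True, start at 23:59 on sd and end at 00:01 on ed;
	    # otherwise use midnight on both days.
	    start_date = datetime(sd.year, sd.month, sd.day,
	                          23 if eod else 0, 59 if eod else 0)
	    end_date = datetime(ed.year, ed.month, ed.day, 0, 1 if eod else 0)
	    # Calculate the midpoint
	    total_duration = (end_date - start_date).total_seconds()
	    half_duration = total_duration // 2  # floor of half the span, in seconds
	    midpoint_dt = start_date + timedelta(seconds=half_duration)
	    return midpoint_dt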

	### Variables + zone & region

	YOUR_REGION = ""      # scenario: europe-west1
	YOUR_ZONE = ""        # scenario: europe-west1-b
	INSTANCE_NAME = ""    # task 1: nucleus-jumphost-640
	CLUSTER_NAME = ""     # task 2: nucleus-backend (not provided; name made up)
	APP_PORT_NUMBER = ""  # task 2: 8080
	FIREWALL_RULE = ""    # task 3: accept-tcp-rule-846