@joshisa
Created February 3, 2017 02:27
Parsing of URLs using bash sh scripting
#!/bin/bash
# Referenced and tweaked from http://stackoverflow.com/questions/6174220/parse-url-in-shell-script#6174447
proto="$(echo "$1" | grep :// | sed -e 's,^\(.*://\).*,\1,g')"
# remove the protocol
url="${1/$proto/}"
# extract the user (if any)
userpass="$(echo "$url" | grep @ | cut -d@ -f1)"
pass="$(echo "$userpass" | grep : | cut -d: -f2)"
if [ -n "$pass" ]; then
  user="$(echo "$userpass" | grep : | cut -d: -f1)"
else
  user="$userpass"
fi
# extract the host
host="$(echo "${url/$user@/}" | cut -d/ -f1)"
# by request - try to extract the port
port="$(echo "$host" | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo "$url" | grep / | cut -d/ -f2-)"
echo "url: $url"
echo " proto: $proto"
echo " user: $user"
echo " pass: $pass"
echo " host: $host"
echo " port: $port"
echo " path: $path"
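
One thing to watch in the port extraction above: when the host carries no port, the last sed expression still strips every non-digit, so a host such as host1.example.com yields port 1. The following is a minimal sketch of an alternative split using POSIX parameter expansion. The helper name split_hostport is hypothetical (it is not part of the gist), and IPv6 literals are not handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original gist.
# Splits "host[:port]" at the LAST ":" and leaves port empty when no ":" is present.
# Assumption: no IPv6 literals (their embedded colons would confuse this split).
split_hostport() {
  hostport=$1
  case $hostport in
    *:*)
      host=${hostport%:*}    # drop the shortest ":..." suffix
      port=${hostport##*:}   # drop everything up to the last ":"
      ;;
    *)
      host=$hostport
      port=
      ;;
  esac
}

split_hostport 'host1.example.com'
echo "host: $host port: $port"   # prints "host: host1.example.com port: "

split_hostport 'example.com:8080'
echo "host: $host port: $port"   # prints "host: example.com port: 8080"
```

The results are left in the global variables host and port to mirror the style of the script above.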

amitizle commented Apr 14, 2019

That's pretty cool, thanks for sharing!
Just a small note: it won't work with simple HTTP authentication when the username contains @ (e.g. https://[email protected]:[email protected]:443/p/a/t/h)
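
A minimal sketch of one way around this, assuming the protocol and path have already been removed: split the remaining "userinfo@host:port" string at the last @ instead of the first. The helper name split_authority and the sample values are hypothetical and do not come from the gist.

```shell
#!/bin/sh
# Hypothetical helper, not part of the gist.
# Splits "userinfo@host:port" at the LAST "@", so an "@" inside the
# user name or password stays in the userinfo part.
split_authority() {
  authority=$1
  case $authority in
    *@*)
      userinfo=${authority%@*}    # remove the shortest "@..." suffix (the final @host:port)
      hostport=${authority##*@}   # remove the longest "...@" prefix (everything up to the final @)
      ;;
    *)
      userinfo=
      hostport=$authority
      ;;
  esac
}

split_authority 'first@last:secret@example.org:443'
echo "userinfo: $userinfo"   # prints "userinfo: first@last:secret"
echo "hostport: $hostport"   # prints "hostport: example.org:443"
```

Only POSIX parameter expansion is used here, so no external commands are needed for this step.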


dereks commented Jan 21, 2021

Updated Version (2021-01-21T19:39:05Z):

  • Replaces most uses of sed -e with cut and rev (the protocol extraction still uses sed)
  • Clarifies variable $url to be $url_no_protocol
  • Removes $port number from $host
  • Applies @amirmasud fix
  • Fixes @amitizle bug report (now works with @ in the username or password)
  • Makes the $protocol be lower-case (for easy string compare)
# Inspired by: https://gist.github.com/joshisa/297b0bc1ec0dcdda0d1625029711fa24
# Referenced and tweaked from http://stackoverflow.com/questions/6174220/parse-url-in-shell-script#6174447
url="$1"

protocol=$(echo "$1" | grep "://" | sed -e's,^\(.*://\).*,\1,g')
# Remove the protocol
url_no_protocol="${1/$protocol/}"
# Use tr: Make the protocol lower-case for easy string compare
protocol=$(echo "$protocol" | tr '[:upper:]' '[:lower:]')

# Extract the user and password (if any)
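# cut 1: Remove the path part to prevent @ in the querystring from breaking the next cut
# grep: Continue only if the remaining user:pass@host:port part contains an @
#       (checked after cut 1, so an @ that appears only in the path is ignored)
# rev: Reverse string so cut -f1 takes the (reversed) rightmost field, and -f2- is what we want
# cut 2: Remove the host:port
# rev: Undo the first rev above
userpass=$(echo "$url_no_protocol" | cut -d"/" -f1 | grep "@" | rev | cut -d"@" -f2- | rev)
# -f2- keeps any further ":" characters as part of the password
pass=$(echo "$userpass" | grep ":" | cut -d":" -f2-)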
if [ -n "$pass" ]; then
  user=$(echo "$userpass" | grep ":" | cut -d":" -f1)
else
  user="$userpass"
fi

# Extract the host
# Quoting the pattern makes bash match $userpass literally instead of as a glob
hostport=$(echo "${url_no_protocol/"$userpass@"/}" | cut -d"/" -f1)
host=$(echo "$hostport" | cut -d":" -f1)
port=$(echo "$hostport" | grep ":" | cut -d":" -f2)
path=$(echo "$url_no_protocol" | grep "/" | cut -d"/" -f2-)

echo "url: $url"
echo "  protocol: $protocol"
echo "  userpass: $userpass"
echo "  user: $user"
echo "  pass: $pass"
echo "  host: $host"
echo "  port: $port"
echo "  path: $path"

Example:

url: sftp://[email protected]:mypass@home@192.168.2.162:1/home/odroid/dump::1.txt
  protocol: sftp://
  userpass: [email protected]:mypass@home
  user: [email protected]
  pass: mypass@home
  host: 192.168.2.162
  port: 1
  path: home/odroid/dump::1.txt
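
The rev | cut | rev idiom used for $userpass can also be tried in isolation. This is a minimal sketch, assuming rev (from util-linux) is installed; before_last is a hypothetical helper name, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the script above.
# Prints everything before the LAST occurrence of a one-character delimiter:
#   rev     reverses the string, so the last delimiter becomes the first
#   cut     -f2- drops the (reversed) rightmost field
#   rev     restores the original character order
before_last() {
  printf '%s\n' "$1" | rev | cut -d"$2" -f2- | rev
}

before_last 'a@b:c@host:22' '@'   # prints "a@b:c"
before_last 'host:22' '@'         # prints "host:22" (cut passes lines without the delimiter through)
```

The second call shows why a grep "@" guard is needed before relying on the result: without a delimiter in the input, cut returns the whole line unchanged.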
