Created February 3, 2017 02:27
Parsing URLs using bash shell scripting
#!/bin/bash
# Referenced and tweaked from http://stackoverflow.com/questions/6174220/parse-url-in-shell-script#6174447
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
userpass="$(echo $url | grep @ | cut -d@ -f1)"
pass="$(echo $userpass | grep : | cut -d: -f2)"
if [ -n "$pass" ]; then
user="$(echo $userpass | grep : | cut -d: -f1)"
else
user=$userpass
fi
# extract the host
host="$(echo ${url/$user@/} | cut -d/ -f1)"
# by request - try to extract the port
port="$(echo $host | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
echo "url: $url"
echo " proto: $proto"
echo " user: $user"
echo " pass: $pass"
echo " host: $host"
echo " port: $port"
echo " path: $path"
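For comparison, here is a minimal sketch of the same field extraction written with bash parameter expansion instead of echo | grep | sed | cut pipelines, so no subprocesses are spawned. The function name parse_url is hypothetical, and its behavior deliberately differs from the script above in a few places: it splits the scheme at the first ://, splits the credentials on the last @ before the path, strips the port from host, and keeps everything after the first : as the password. It assumes bash, since [[ ]] is not POSIX sh.

```bash
#!/bin/bash
# Sketch only: parse_url is a hypothetical helper that sets the globals
# proto, user, pass, host, port and path using parameter expansion.
parse_url() {
  local rest="$1" authority userpass
  proto="" user="" pass="" host="" port="" path=""
  # Scheme, kept with its "://" suffix like $proto above; split at the FIRST "://"
  if [[ "$rest" == *://* ]]; then
    proto="${rest%%://*}://"
    rest="${rest#*://}"
  fi
  # Everything before the first "/" is user:pass@host:port, the rest is the path
  authority="${rest%%/*}"
  if [[ "$rest" == */* ]]; then
    path="${rest#*/}"
  fi
  # Credentials end at the LAST "@", so a user name like me@mail.com survives
  if [[ "$authority" == *@* ]]; then
    userpass="${authority%@*}"
    authority="${authority##*@}"
    user="${userpass%%:*}"
    if [[ "$userpass" == *:* ]]; then
      pass="${userpass#*:}"
    fi
  fi
  # Host and optional port
  host="${authority%%:*}"
  if [[ "$authority" == *:* ]]; then
    port="${authority#*:}"
  fi
}
```

For example, parse_url "https://alice:s3cret@example.com:8443/a/b?q=1" sets proto to https://, user to alice, pass to s3cret, host to example.com, port to 8443 and path to a/b?q=1.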
That's pretty cool, thanks for sharing!
Just a small note: it won't work with simple HTTP authentication if the user name includes @ (e.g. https://[email protected]:[email protected]:443/p/a/t/h).
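The failure comes from cut -d@ -f1 keeping only the text before the first @. A minimal demonstration, assuming a hypothetical URL with the protocol already stripped, contrasts that with dropping the path and then splitting on the last @ via bash parameter expansion. It assumes the credentials contain no /, since the path is cut at the first /.

```bash
#!/bin/bash
# Hypothetical input with the protocol already removed; the user name contains "@"
url='me@mail.com:pw@host.example:443/p/a/t/h'

# The script takes the text before the FIRST "@", which truncates the user
userpass_first=$(echo "$url" | grep @ | cut -d@ -f1)

# Dropping the path and then splitting on the LAST "@" keeps the credentials whole
authority="${url%%/*}"
userpass_last="${authority%@*}"

echo "first @: $userpass_first"
echo "last @:  $userpass_last"
```

Here userpass_first is just me, while userpass_last is me@mail.com:pw.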
Updated Version (2021-01-21T19:39:05Z):
- Replaces most uses of sed -e with cut and rev (the protocol is still extracted with sed)
- Renames the protocol-stripped $url to $url_no_protocol for clarity
- Removes the $port number from $host
- Applies @amirmasud's fix
- Fixes @amitizle's bug report (now works with @ in the username or password)
- Makes $protocol lower-case (for easy string comparison)
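Lower-casing helps because URL schemes are case-insensitive, so HTTPS:// and https:// name the same protocol. A minimal sketch, using a hypothetical scheme_kind function, reuses the updated version's sed and tr steps and then needs only one lower-case pattern per scheme in a case statement.

```bash
#!/bin/bash
# Hypothetical helper: classify a URL by its scheme.
# The first two steps are the same sed and tr commands as the updated version.
scheme_kind() {
  local protocol
  protocol=$(echo "$1" | grep "://" | sed -e's,^\(.*://\).*,\1,g')
  protocol=$(echo "$protocol" | tr '[:upper:]' '[:lower:]')
  # After lower-casing, one lower-case pattern per scheme is enough
  case "$protocol" in
    http://|https://) echo "web" ;;
    ftp://|sftp://)   echo "file transfer" ;;
    *)                echo "other" ;;
  esac
}

scheme_kind "HTTPS://Example.com/index.html"
scheme_kind "sftp://host/file"
```

Without the tr step, the first call would fall through to other because HTTPS:// does not match https://.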
#!/bin/bash
# Inspired by: https://gist.github.com/joshisa/297b0bc1ec0dcdda0d1625029711fa24
# Referenced and tweaked from http://stackoverflow.com/questions/6174220/parse-url-in-shell-script#6174447
url="$1"
protocol=$(echo "$1" | grep "://" | sed -e's,^\(.*://\).*,\1,g')
# Remove the protocol
url_no_protocol="${1/"$protocol"/}"
# Use tr: Make the protocol lower-case for easy string compare
protocol=$(echo "$protocol" | tr '[:upper:]' '[:lower:]')
# Extract the user and password (if any)
# cut 1: Remove the path so an @ in the query string cannot break the next cut
# rev: Reverse the string so the host:port becomes the first @-separated field
# cut 2: Drop that field (the host:port); -s prints nothing when there is no @ at all
# rev: Undo the first rev
userpass=$(echo "$url_no_protocol" | grep "@" | cut -d"/" -f1 | rev | cut -s -d"@" -f2- | rev)
pass=$(echo "$userpass" | grep ":" | cut -d":" -f2)
if [ -n "$pass" ]; then
user=$(echo "$userpass" | grep ":" | cut -d":" -f1)
else
user="$userpass"
fi
# Extract the host (quoting $userpass makes any * ? [ in it match literally)
hostport=$(echo "${url_no_protocol/"$userpass@"/}" | cut -d"/" -f1)
host=$(echo "$hostport" | cut -d":" -f1)
port=$(echo "$hostport" | grep ":" | cut -d":" -f2)
path=$(echo "$url_no_protocol" | grep "/" | cut -d"/" -f2-)
echo "url: $url"
echo " protocol: $protocol"
echo " userpass: $userpass"
echo " user: $user"
echo " pass: $pass"
echo " host: $host"
echo " port: $port"
echo " path: $path"
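The rev | cut trick in the userpass line can be checked in isolation. A minimal sketch, assuming hypothetical inputs with the protocol already removed: the first input shows that an @ in the password and an @ in the query string are both handled. The second shows that when no @ appears before the first /, cut passes the line through unchanged because it contains no delimiter, so cut -s is needed to get an empty result.

```bash
#!/bin/bash
# Hypothetical input: "@" in the password and in the query string
with_creds='me@mail.com:pa@ss@host.example:22/dir/file?x=a@b'
# cut 1 keeps "me@mail.com:pa@ss@host.example:22"; after rev, cut -f2- drops
# the (reversed) host:port, and the second rev restores the order
userpass=$(echo "$with_creds" | grep "@" | cut -d"/" -f1 | rev | cut -d"@" -f2- | rev)
echo "userpass: $userpass"

# Hypothetical input: "@" only in the query string, so there are no credentials
no_creds='host.example/p?x=a@b'
# Without -s, cut prints a line that has no "@" unchanged: the host leaks through
plain=$(echo "$no_creds" | grep "@" | cut -d"/" -f1 | rev | cut -d"@" -f2- | rev)
# With -s, cut suppresses that line and the result is empty
strict=$(echo "$no_creds" | grep "@" | cut -d"/" -f1 | rev | cut -s -d"@" -f2- | rev)
echo "plain: '$plain' strict: '$strict'"
```

The first pipeline yields me@mail.com:pa@ss; the second yields host.example without -s and an empty string with it.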
Example:
url: sftp://[email protected]:mypass@home@192.168.2.162:1/home/odroid/dump::1.txt
protocol: sftp://
userpass: [email protected]:mypass@home
user: [email protected]
pass: mypass@home
host: 192.168.2.162
port: 1
path: home/odroid/dump::1.txt
For the sake of reference: https://stackoverflow.com/questions/6174220/parse-url-in-shell-script
Hey friend,
you should replace
host="$(echo ${url/$user@/} | cut -d/ -f1)"
with
host="$(echo ${url/$user:$pass@/} | cut -d/ -f1)"
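The suggestion fixes URLs that carry a password, but without a password the pattern becomes $user:@, which never matches. A minimal sketch with hypothetical values (protocol already removed) compares the original pattern, the suggested one, and removing $userpass@ as the updated version does.

```bash
#!/bin/bash
# Hypothetical URL with a password
url='alice:pw@example.com/path'; user='alice'; pass='pw'; userpass='alice:pw'
original=$(echo "${url/$user@/}" | cut -d/ -f1)          # "alice@" never matches
suggested=$(echo "${url/$user:$pass@/}" | cut -d/ -f1)   # "alice:pw@" is removed
up_with=$(echo "${url/$userpass@/}" | cut -d/ -f1)

# Hypothetical URL without a password
url2='alice@example.com/path'; user2='alice'; pass2=''; userpass2='alice'
suggested2=$(echo "${url2/$user2:$pass2@/}" | cut -d/ -f1)  # "alice:@" never matches
up_without=$(echo "${url2/$userpass2@/}" | cut -d/ -f1)

echo "original=$original suggested=$suggested suggested2=$suggested2"
echo "userpass: $up_with $up_without"
```

Only the $userpass@ form returns example.com in both cases.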
Also, I wonder whether the host should contain the port itself? 🤔
Why not remove the port from the host part after deriving the port?
host=${host/:$port}
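Combined with the original script's sed-based port extraction, ${host/:$port} deletes the first :$port from the host. When there is no port, $port is empty and the pattern is a lone :, which matches nothing in a port-less host. A minimal sketch with a hypothetical split_hostport function:

```bash
#!/bin/bash
# Hypothetical helper: derive the port with the original script's sed
# expressions, then remove ":$port" from the host.
split_hostport() {
  host="$1"
  port=$(echo "$host" | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')
  host=${host/:$port}
}

split_hostport 'example.com:8080'
echo "host=$host port=$port"
split_hostport 'example.com'
echo "host=$host port=$port"
```

The updated version reaches the same result by splitting $hostport with cut -d":" -f1 and -f2.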