@RavenHursT
Created February 23, 2015 22:58
Javascript extract root domain from URL
var extractRootDomain = function(url){
return url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i)[1].split('.').slice(-2).join('.');
};
@mdelcambre

extractRootDomain('http://www.google.co.uk/blah')
"co.uk"

@ozgurrgul

extractRootDomain("localhost")

Uncaught TypeError: Cannot read property '1' of null at extractRootDomain (<anonymous>:2:61) at <anonymous>:1:1

@RavenHursT
Author

Probably better to change this to parse the given url first:

const parsed = new URL(window.location.href)

Then just do string and array manipulation based on what kinds of domains you're working with (-2 for .com domains, -3 for .co.uk domains, etc.):

parsed.hostname.split('.').slice(-2).join('.')
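A minimal sketch of that suggestion, assuming the caller already knows how many labels to keep. The function name `lastLabels` and its `labelCount` parameter are hypothetical, not part of the gist. It also guards against input that `new URL()` cannot parse, such as the bare "localhost" reported above.

```javascript
// Hypothetical helper: keep the last `labelCount` labels of the hostname.
// `labelCount` is an assumption the caller must supply
// (2 for example.com, 3 for example.co.uk).
function lastLabels(url, labelCount = 2) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    // new URL() throws a TypeError on input without a scheme, e.g. "localhost"
    return null;
  }
  return hostname.split('.').slice(-labelCount).join('.');
}

console.log(lastLabels('http://www.google.co.uk/blah', 3)); // "google.co.uk"
console.log(lastLabels('http://www.google.com/blah'));      // "google.com"
console.log(lastLabels('localhost'));                       // null
```

This still leaves the hard part, choosing the right count for each domain, to the caller.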

@maorkavod

Does not work on domains like "http://google.co.jp".

@Martin-Luther

Does not work with https://www.mesdroitssociaux.gouv.fr/accueil/

@gaetanlegac

> Does not work with https://www.mesdroitssociaux.gouv.fr/accueil/

You can try this one: https://github.com/scrapingapi/get-root-domain

@Martin-Luther

Martin-Luther commented Jan 14, 2023

> > Does not work with https://www.mesdroitssociaux.gouv.fr/accueil/
>
> You can try this one: https://github.com/scrapingapi/get-root-domain

Thanks, but it's the same thing... It would be simpler and more reliable to use an SLD list as a base and apply a regex to the URL by parsing the SLD list.
We just need a repo with a list that is always kept up to date.
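A rough sketch of the list-based idea, under stated assumptions: `KNOWN_SUFFIXES` is a tiny hand-picked sample covering only the domains mentioned in this thread, and `rootDomainFromList` is a hypothetical name. It uses a `Set` lookup rather than a regex. A real implementation would load a maintained list (the Public Suffix List at publicsuffix.org is one such list) instead of hard-coding entries.

```javascript
// Illustrative subset only; a maintained suffix list is far larger.
const KNOWN_SUFFIXES = new Set(['co.uk', 'co.jp', 'gouv.fr', 'ac.at']);

// Hypothetical helper: returns the registrable domain for `url`,
// using `suffixes` to recognise multi-label endings such as "co.uk".
function rootDomainFromList(url, suffixes = KNOWN_SUFFIXES) {
  const labels = new URL(url).hostname.split('.');
  // Try the longest candidate suffix first, always leaving
  // at least one label in front of it for the domain name.
  for (let i = 1; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (suffixes.has(candidate)) {
      return labels.slice(i - 1).join('.');
    }
  }
  // No listed suffix matched: assume a single-label TLD.
  return labels.slice(-2).join('.');
}

console.log(rootDomainFromList('https://www.mesdroitssociaux.gouv.fr/accueil/')); // "mesdroitssociaux.gouv.fr"
console.log(rootDomainFromList('http://google.co.jp'));                           // "google.co.jp"
console.log(rootDomainFromList('https://www.youtube.com/'));                      // "youtube.com"
```

The result is only as good as the list: any suffix missing from it falls back to the last two labels.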

@innocentamadi

innocentamadi commented Mar 12, 2023

A one-liner:

const getRoot =  (url = "") => (new URL(url)).hostname.split('.').slice(-2).join('.')

@ComradeVanti

@innocentamadi that also does not work depending on how many ending segments the domain has. For www.fhstp.ac.at (my school domain), it would only give you ac.at

@DebapriyaSengupta28

this works -

function extractDomain(url) {
    // Remove protocol if it exists
    let domain = url.replace(/^https?:\/\//i, '');

    // Remove www. if it exists
    domain = domain.replace(/^www\./i, '');

    // Get the hostname from the URL
    try {
        domain = new URL('http://' + domain).hostname;
    } catch (error) {
        // If there's an error in URL parsing, return the stripped input as-is
        return domain;
    }

    // Decide how many labels to keep
    const parts = domain.split('.');
    if (parts.length > 2) {
        // Heuristic: if the last label has 3 characters or fewer, keep three labels.
        // This covers co.uk, com.au, etc., but also keeps one subdomain level
        // for .com hosts such as studio.youtube.com.
        if (parts[parts.length - 1].length <= 3) {
            domain = parts.slice(-3).join('.');
        } else {
            domain = parts.slice(-2).join('.');
        }
    }

    // Add the www. prefix back if the original URL contains it
    if (url.includes('www.')) {
        domain = 'www.' + domain;
    }

    return domain;
}

// Test cases
console.log(extractDomain("https://studio.youtube.com/channel/UCntj-iDUfMBvc8_peZWbQ4g/editing/sections")); // Output: studio.youtube.com
console.log(extractDomain("https://www.youtube.com/")); // Output: www.youtube.com
console.log(extractDomain("https://www.youtube.com/channel/UCntj-iDUfMBvc8_peZWbQ4g")); // Output: www.youtube.com

@pesseyjulien

thanks
