/^((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU))))$/
@rob-murray @mrbrianevans Another tweak would be the addition of (?!.{9}) to the beginning to limit the value to 8 characters:
/^(?!.{9})((AC...
@Kiwidave68 I'm not sure it's necessary to add that to the beginning, because it's already checking the full length by having the ^ at the start and $ at the end and specifying the allowed length of each segment. Do you have an example of a 9-character string that matches my regex? You can test it out here: https://regexr.com/734i1
I suppose it might depend on how you use the regex in your application, but if the logic is that the entire string must match the regex, then it should be fine without the (?!.{9}).
@mrbrianevans OK, I was testing it on https://regex101.com/ and with some C# unit tests, and 123456789 isn't rejected. It is rejected on your link however. I guess we can just do a length check in code :) I'm not a regex guru, so happy to go with your original one :)
Okay @Kiwidave68. Indeed, you are right: I tried it in JavaScript and it does accept the 9-digit number. Not sure why that is; I can't immediately see the flaw.
> /^\d{8}$/.test('123456789')
false
In theory it should work like the simplified example above, which rejects the 9-digit number.
Checking the length separately might be the best option though, because many people omit leading zeros in company numbers, so they need to be normalised to 8 characters before validation. E.g. 09226141 can be written 9226141.
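The normalisation step mentioned above could look something like this minimal sketch. The function name normaliseCompanyNumber is hypothetical, and padding only all-digit inputs is an assumption; prefixed values are just trimmed and upper-cased.

```javascript
// Hypothetical helper: normalise user input before validating it.
// Assumption: only purely numeric inputs have had their leading zeros dropped.
function normaliseCompanyNumber(input) {
  const value = String(input).trim().toUpperCase();
  // Pad all-digit values (e.g. "9226141") to 8 characters with leading zeros.
  return /^\d{1,8}$/.test(value) ? value.padStart(8, "0") : value;
}

console.log(normaliseCompanyNumber("9226141"));    // "09226141"
console.log(normaliseCompanyNumber(" sc123456 ")); // "SC123456"
```

The normalised value can then be passed to whichever regex from this thread is in use.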
@mrbrianevans I think it might be because it's doing a lazy check - matches on the first 8 so doesn't bother looking any further?
You need an extra set of brackets around the alternatives in the regex. Otherwise the terminating $ belongs only to the last branch of the or condition, so the first branch can match without it:
/^(((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU)))))$/
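To see the difference in practice, here is a small sketch that builds three variants from the same alternatives: the original, the version with the (?!.{9}) lookahead (my reading of the earlier suggestion), and the version with the extra brackets. The variable names and the test value "xxRS123456" are made up for illustration.

```javascript
// The four alternatives from the original pattern, without the outer ^ and $.
const body = String.raw`((AC|ZC|FC|GE|LP|OC|SE|SA|SZ|SF|GS|SL|SO|SC|ES|NA|NZ|NF|GN|NL|NC|R0|NI|EN|\d{2}|SG|FE)\d{5}(\d|C|R))|((RS|SO)\d{3}(\d{3}|\d{2}[WSRCZF]|\d(FI|RS|SA|IP|US|EN|AS)|CUS))|((NI|SL)\d{5}[\dA])|(OC(([\dP]{5}[CWERTB])|([\dP]{4}(OC|CU))))`;

// ^ binds only to the first branch and $ only to the last one.
const ungrouped = new RegExp("^" + body + "$");
// The lookahead sits inside the first branch, so it guards only that branch.
const withLookahead = new RegExp("^(?!.{9})" + body + "$");
// The extra brackets make ^ and $ apply to every branch.
const grouped = new RegExp("^(" + body + ")$");

for (const value of ["09226141", "123456789", "xxRS123456"]) {
  console.log(value, ungrouped.test(value), withLookahead.test(value), grouped.test(value));
}
// value        ungrouped  withLookahead  grouped
// 09226141     true       true           true
// 123456789    true       false          false
// xxRS123456   true       true           false
```

The last row is the reason the brackets are the more complete fix: the middle branches carry no anchors at all in the ungrouped forms, so they can match anywhere in a longer string.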
Thanks for this, totally solved the problem!
I think some companies may still be missing.
PC000001
PC000002
PC000003
PC000004
PC000005
PC000006
OE000001
OE000002
OE000003
OE000004
OE000005
OE000006
OE000007
OE000008
OE000009
OE000010
OE000011
OE000012
Search here.
Actually, many companies have the OE prefix.
Edit: there are more unmatched prefixes.
'SP', 'NO', 'PC', 'RC', 'CE', 'SR', 'IP', 'NP', 'IC', 'CS', 'OE', 'SI'
I would like to attach a list of unmatched IDs (around 75k), but GitHub says that such a file is not supported (it is a .txt file, ~700 KB in size).
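It isn't stated how the list of unmatched prefixes was produced, but one way to derive such a list is sketched below. The helper name unmatchedPrefixes is hypothetical, the pattern is a deliberately simplified stand-in rather than the full regex from this thread, and treating the first two characters as the prefix is an assumption.

```javascript
// Hypothetical helper: collect the two-character prefixes of rejected values.
function unmatchedPrefixes(companyNumbers, pattern) {
  const prefixes = new Set();
  for (const value of companyNumbers) {
    if (!pattern.test(value)) prefixes.add(value.slice(0, 2));
  }
  return [...prefixes].sort();
}

// Simplified stand-in pattern, NOT the full regex from this thread.
const standIn = /^((SC|NI|\d{2})\d{6})$/;
const sample = ["PC000001", "OE000001", "OE000002", "SC123456", "09226141"];

console.log(unmatchedPrefixes(sample, standIn)); // [ 'OE', 'PC' ]
```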
I've written this page with a list of the meanings of the various prefixes: https://chguide.co.uk/general/company-number.html . I have also found a quite comprehensive list in some old docs:
This page has a description of the PC prefix, but the provided regex does not include it.
In addition, the official documentation lists the SI prefix in the section on prefixes for companies with no available data (only the name is available).
Had another go at it:
/^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;
The rules around registered societies (numbers starting with an IP, SP or RS prefix) could probably be tightened up. This is a more permissive regex that matches all the company numbers in the bulk CSV file.
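As a quick sanity check, the permissive regex can be run against a few values that came up earlier in this thread. This is only a sketch; the variable names are made up.

```javascript
const permissive = /^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\d{2})\d{6})|((IP|SP|RS)[A-Z\d]{6})|(SL\d{5}[\dA]))$/;

// Values taken from earlier comments in this thread.
const samples = ["09226141", "PC000001", "OE000012", "123456789", "9226141"];

for (const value of samples) {
  console.log(value, permissive.test(value));
}
// 09226141  true
// PC000001  true
// OE000012  true
// 123456789 false (9 characters)
// 9226141   false (7 characters, leading zero omitted)
```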
Hi,
The published guide for URI is here:
https://assets.publishing.service.gov.uk/media/5d08c0f340f0b6094a379078/uniformResourceIdentifiersCustomerGuide.pdf
I gave the @mrbrianevans version a go.
I downloaded the Companies House public database: http://download.companieshouse.gov.uk/en_output.html
It's 2.6 GB or so extracted, so I created a new file containing only the company numbers (50 MB) and ran the regex against it.
I got only 1 failure: "RS007853Z"
I added that exception to the regex; this version passes against the complete dataset as of Feb 2025:
^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\\d{2})\\d{6})|((IP|SP|RS)[A-Z\\d]{6})|(SL\\d{5}[\\dA])|(RS007853Z))$
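The check described above could be reproduced in Node.js along these lines. This is a sketch, not the script that was actually used: the helper name findFailures and the file name are assumptions, and it expects a file with one company number per line. Note that the doubled backslashes are needed because the pattern is written as a string literal here.

```javascript
const fs = require("fs");

// Pattern from the comment above, passed to RegExp as a string literal.
const pattern = new RegExp(
  "^(((AC|CE|CS|FC|FE|GE|GS|IC|LP|NC|NF|NI|NL|NO|NP|OC|OE|PC|R0|RC|SA|SC|SE|SF|SG|SI|SL|SO|SR|SZ|ZC|\\d{2})\\d{6})|((IP|SP|RS)[A-Z\\d]{6})|(SL\\d{5}[\\dA])|(RS007853Z))$"
);

// Hypothetical helper: return every non-empty line that the regex rejects.
// It reads the whole file at once, which should be fine for roughly 50 MB.
function findFailures(filePath, regex) {
  return fs
    .readFileSync(filePath, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !regex.test(line));
}

// Usage (the file name is an assumption):
// console.log(findFailures("company-numbers.txt", pattern));
```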
@mrbrianevans Thanks for the speedy response :) We'll be using this for user input validation, so it's good to get it as right as possible, although I suspect 99% of our input will be 'normal' companies.