
@jakebathman
Last active October 12, 2023 09:31
ICD-9 and ICD-10 code regex

ICD code matching regex

The regex patterns below can only tell you when a value is not a valid ICD-10 or ICD-9 code. They do not ensure that the code actually exists; for that level of verification, check the value against a list of valid ICD codes.

ICD-10-CM list: https://gist.github.com/jakebathman/063c50cb9772e4bfc864a9e1ff4ccc8d

ICD-9 list: https://gist.github.com/jakebathman/f1ed0d473f12091a708243b0ddf03d82
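Because the patterns only check shape, a full check is a two-step process: reject values that fail the regex, then look up the survivors in a code list. The sketch below assumes you have already loaded one of the lists above into a set of uppercase codes written without decimal points; the helper name `validate_icd10` is hypothetical.

```python
import re
from typing import Set

# ICD-10-CM pattern from this gist; fullmatch below makes it cover the whole value.
ICD10_PATTERN = re.compile(r"([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)")


def validate_icd10(value: str, known_codes: Set[str]) -> bool:
    """Hypothetical helper: True only if `value` is ICD-10-shaped AND listed.

    `known_codes` is assumed to hold uppercase codes without decimal
    points (e.g. "S72001A"), loaded from a list of valid codes.
    """
    candidate = value.strip().upper()
    if ICD10_PATTERN.fullmatch(candidate) is None:
        return False  # wrong shape, so it cannot be a valid code
    # Shape is fine; confirm the code actually exists in the list.
    return candidate.replace(".", "") in known_codes
```

The regex step is cheap and catches obvious junk early; the set lookup is what actually confirms existence.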

ICD-10/ICD-9 combined regex

Useful for validating field values that could contain either ICD-9 or ICD-10 codes, with or without decimal points.

([V\d]\d{2}(\.?\d{0,2})?|E\d{3}(\.?\d)?|\d{2}(\.?\d{0,2})?)|([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)
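The pattern has no anchors, so how you apply it matters. A minimal Python sketch, assuming the goal is to validate an entire field value: `re.fullmatch` requires the whole string to match, whereas `re.search` would accept any value that merely contains something code-like. The helper name `looks_like_icd` is illustrative.

```python
import re

# Combined ICD-9 / ICD-10 pattern, copied verbatim from above.
COMBINED = re.compile(
    r"([V\d]\d{2}(\.?\d{0,2})?|E\d{3}(\.?\d)?|\d{2}(\.?\d{0,2})?)"
    r"|([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)"
)


def looks_like_icd(value: str) -> bool:
    # fullmatch anchors at both ends; re.search would also accept
    # values that merely contain a code, such as "xx250yy".
    return COMBINED.fullmatch(value) is not None


for value in ["250.00", "V45.81", "E850.1", "S72.001A", "25000", "U07.1", "hello"]:
    print(value, looks_like_icd(value))
```

Note that "U07.1" is rejected because U is excluded from the allowed first characters, and undotted values such as "25000" are accepted because the period is optional.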

ICD-10-CM regex

([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)
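Since this pattern accepts codes with or without the decimal point, it can help to normalize matches to one spelling before storing or comparing them. ICD-10-CM places the decimal point after the third character, so a sketch (hypothetical helper `normalize_icd10`) could look like this:

```python
import re
from typing import Optional

ICD10_PATTERN = re.compile(r"([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)")


def normalize_icd10(value: str) -> Optional[str]:
    """Hypothetical helper: return the dotted form (e.g. "S72.001A"),
    or None if the value fails the gist's ICD-10-CM pattern."""
    code = value.strip().upper()
    if ICD10_PATTERN.fullmatch(code) is None:
        return None
    code = code.replace(".", "")
    # Three-character category codes (e.g. "I10") have no decimal part;
    # otherwise the decimal point goes after the third character.
    return code if len(code) == 3 else f"{code[:3]}.{code[3:]}"
```

With this, "s72001a" and "S72.001A" both normalize to "S72.001A".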

ICD-9 regex

([V\d]\d{2}(\.?\d{0,2})?|E\d{3}(\.?\d)?|\d{2}(\.?\d{0,2})?)
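ICD-9 normalization needs one extra rule: numeric and V codes have three characters before the decimal point, while E codes have four (E plus three digits). A minimal sketch using the pattern above; the helper name `normalize_icd9` is hypothetical.

```python
import re
from typing import Optional

ICD9_PATTERN = re.compile(r"([V\d]\d{2}(\.?\d{0,2})?|E\d{3}(\.?\d)?|\d{2}(\.?\d{0,2})?)")


def normalize_icd9(value: str) -> Optional[str]:
    """Hypothetical helper: return the dotted form (e.g. "250.00", "E850.1"),
    or None if the value fails the gist's ICD-9 pattern."""
    code = value.strip().upper()
    if ICD9_PATTERN.fullmatch(code) is None:
        return None
    code = code.replace(".", "")
    # E codes: decimal after 4 characters; numeric and V codes: after 3.
    split_at = 4 if code.startswith("E") else 3
    return code if len(code) <= split_at else f"{code[:split_at]}.{code[split_at:]}"
```

Because the pattern includes a bare `\d{2}` alternative, two-digit inputs such as "12" also pass and come back unchanged.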
@mpjohnston

Do you have a CSV or text file with just the ICD-10-CM codes and descriptions?

@jakebathman
Author

@mpjohnston Sure, you can get that from the CDC's National Center for Health Statistics (NCHS) here: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD10CM/2017/

Specifically, you're probably looking for this file (~6 MB): icd10cm_codes_2017.txt
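To turn that file into a code-to-description lookup, a sketch like the following may work. It assumes each non-blank line holds a code written without a decimal point, then whitespace, then the description; check this against the actual file before relying on it. The function name `load_icd10cm_codes` is hypothetical.

```python
from typing import Dict, Iterable


def load_icd10cm_codes(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map each code to its description.

    Assumes lines look like "<code><whitespace><description>".
    """
    codes: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first whitespace run
        if not parts:
            continue  # skip blank lines
        code = parts[0]
        codes[code] = parts[1] if len(parts) > 1 else ""
    return codes


# Usage (file name from the comment above; path assumed):
# with open("icd10cm_codes_2017.txt") as fh:
#     codes = load_icd10cm_codes(fh)
```

Splitting on the first whitespace run avoids depending on the exact column width used to pad the codes.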

@allantokuda-zipnosis

allantokuda-zipnosis commented Feb 14, 2018

I think the ICD-10 regex can be simplified a bit: I removed the ? after the \. and changed the last quantifier to {1,4} (1 to 4 characters), since the trailing ? already makes the whole decimal part optional.

([A-TV-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?)
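A quick way to see what this change does is to run both patterns side by side with `re.fullmatch`. This comparison is an illustrative sketch; the sample values are chosen only to show the differences.

```python
import re

ORIGINAL = re.compile(r"([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)")
SIMPLIFIED = re.compile(r"([A-TV-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?)")

for value in ["E11.9", "E11.", "E119", "S72.001A", "S72001A"]:
    print(
        f"{value:10}",
        "original:", ORIGINAL.fullmatch(value) is not None,
        "simplified:", SIMPLIFIED.fullmatch(value) is not None,
    )
```

The simplified pattern rejects a dangling period ("E11."), but under `fullmatch` it also rejects undotted codes such as "E119" and "S72001A", so it no longer covers values that may or may not use decimals. With an unanchored `re.search`, it would instead match just the "E11" prefix of "E119".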

@ewanmellor

Just about everything about these regexes is wrong.

  • Purely numeric ICD-9 codes have three digits before the period, not two, so \d{2} should be \d{3}.
  • The periods aren't optional if they're followed by digits, so \.? should be \. everywhere.
  • A period followed by no digits is not valid either, so {0,2} and {0,4} should be {1,2} and {1,4}.
  • Arguably, excluding U from the ICD-10 prefixes is incorrect too (cf. https://en.wikipedia.org/wiki/ICD-10_Chapter_XXII:_Codes_for_special_purposes).

See also slide 7 of https://www.cms.gov/Medicare/Coding/ICD10/downloads/032310_ICD10_Slides.pdf
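Applying the points in the list above gives a stricter, dotted-input-only pair of patterns. This is a hypothetical sketch of those suggestions, not a tested production regex: it requires three digits for numeric ICD-9 codes, makes the period mandatory when digits follow, requires at least one character after a period, and allows any letter (including U) as the first ICD-10 character. The second ICD-10 character stays numeric, following slide 7.

```python
import re

# Hypothetical stricter patterns based on the suggestions above (dotted input only).
ICD9_STRICT = re.compile(r"V\d{2}(?:\.\d{1,2})?|E\d{3}(?:\.\d)?|\d{3}(?:\.\d{1,2})?")
# Any letter A-Z, including U, as the first ICD-10 character.
ICD10_STRICT = re.compile(r"[A-Z]\d[A-Z0-9](?:\.[A-Z0-9]{1,4})?")


def is_strict_icd(value: str) -> bool:
    # fullmatch requires the whole value to match one of the alternatives.
    return bool(ICD9_STRICT.fullmatch(value) or ICD10_STRICT.fullmatch(value))
```

Under these rules "250.00", "E850.1", and "U07.1" pass, while "25", "250.", and the undotted "25000" are rejected.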

@mattiasflodin

@ewanmellor Slide 7 does not seem consistent with slide 9. The former says that character 2 is always numeric and the latter says alphanumeric. Do you know which it is?
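One way to check this empirically for ICD-10-CM is to scan a complete code list, such as the icd10cm_codes_2017.txt file mentioned earlier, for codes whose second character is not a digit. A minimal sketch; it assumes the codes are already loaded as strings, the function name is hypothetical, and it makes no claim about what the scan will find.

```python
from typing import Iterable, List

ASCII_DIGITS = "0123456789"


def codes_with_non_numeric_second_char(codes: Iterable[str]) -> List[str]:
    """Hypothetical helper: return codes whose second character is not an ASCII digit."""
    return [code for code in codes if len(code) >= 2 and code[1] not in ASCII_DIGITS]
```

An empty result over the full list would support a numeric-only second character for that code set; any hits would show where slide 9's alphanumeric rule applies.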
