Created
November 5, 2013 16:21
class Address:
    '''
    Algo:
    Work backward. Start from the zip code, which will be near the end and in one of two known formats: XXXXX or XXXXX-XXXX.
    If it doesn't appear, you can assume the string ends with the city, state portion described below.
    The next thing, before the zip, is going to be the state, and it'll be either a two-letter abbreviation or spelled out as words.
    You know what these will be, too; there are more than 50 of them.
    Also, you could run Soundex on the words to help compensate for spelling errors.
    Before that is the city, and it's probably on the same line as the state.
    You could use a zip-code database to check the city and state based on the zip, or at least use it as a BS detector.
    The street address will generally be one or two lines. The second line will generally be the suite number if there is one,
    but it could also be a PO box.
    It's going to be near-impossible to reliably detect a name on the first or second line,
    though a line that isn't prefixed with a number
    (or that is prefixed with "attn:" or "attention to:") is a hint that it's a name rather than an address line.
    '''
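    def __init__(self, addr_string=None, **addr_parts):
        if addr_string is not None:
            # Parse a free-form address string into its components.
            parts = self.parse_address(addr_string)
            self.street = parts['street']
            self.city = parts['city']
            self.state = parts['state']
            self.zipCode = parts['zipCode']
        elif addr_parts:
            # Keyword parts are only echoed for now; they are not stored yet.
            print(addr_parts)

    def parse_address(self, addr_string):
        # Strip commas and periods so tokens like "Springfield," compare cleanly.
        exclude = set([',', '.'])
        s = ''.join(ch for ch in addr_string if ch not in exclude)
        # Reverse the tokens so the zip code comes first, then state, then city.
        # This assumes a one-word city and a one-word (or two-letter) state.
        parts_list = list(reversed(s.upper().split()))
        if len(parts_list) < 3:
            raise ValueError('expected at least city, state and zip: %r' % addr_string)
        parts = {}
        parts['zipCode'] = parts_list[0]
        parts['state'] = parts_list[1]
        parts['city'] = parts_list[2]
        # Everything left is the street; flip it back into reading order.
        parts['street'] = list(reversed(parts_list[3:]))
        return parts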
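
The docstring's first two steps (find the zip in one of its two formats, then look for a known state just before it) could be sketched roughly as follows. This is not part of the class above: `ZIP_RE`, `STATE_ABBREVIATIONS`, and `split_tail` are hypothetical names, and the state set is a deliberately tiny sample standing in for a full table.

```python
import re

# ZIP or ZIP+4 at the very end of the string: XXXXX or XXXXX-XXXX.
# The lookbehind keeps us from matching the tail of a longer digit run.
ZIP_RE = re.compile(r'(?<!\d)(\d{5})(?:-(\d{4}))?\s*$')

# Assumption: a small sample only; a real table would list every
# state and territory (and possibly spelled-out names).
STATE_ABBREVIATIONS = {'CA', 'IL', 'NY', 'TX', 'WA'}


def split_tail(addr_string):
    """Peel the zip code and state off the end of an address string.

    Returns (rest, state, zip_code). state or zip_code is None when
    that piece could not be recognized.
    """
    rest = addr_string.strip()

    # Step 1: the zip code, if present, is the last thing in the string.
    zip_code = None
    match = ZIP_RE.search(rest)
    if match:
        zip_code = match.group(0).strip()
        rest = rest[:match.start()].rstrip(' ,')

    # Step 2: the token before the zip should be a known state.
    state = None
    tokens = rest.split()
    if tokens:
        candidate = tokens[-1].strip(',.').upper()
        if candidate in STATE_ABBREVIATIONS:
            state = candidate
            rest = ' '.join(tokens[:-1]).rstrip(' ,')

    return rest, state, zip_code
```

When the zip is missing, the function still tries the last token as a state, which matches the docstring's fallback of assuming you are already in the city, state portion. An unrecognized state is left in `rest` rather than guessed at.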