@aambrioso1
Created February 1, 2020 16:42
RegexLesson.py
"""
The regular expression operations module (re) is part of the Python standard
library. It is a beautiful collection of operations for finding and manipulating
matched text.
This program uses the re module to search some text for email addresses and phone numbers. It copies
them to a list called matches and then prints them neatly in a standard format. By default the program searches
the string called text. To search a file instead, save some text in a file called regex.txt and uncomment the line that reads it. Comment out the code for the part you don't use.
The program is a slight modification of a program found at:
https://automatetheboringstuff.com/chapter7/
See also the documentation at Python.org for the re library (regular expression operations) here:
https://docs.python.org/3/library/re.html#re.ASCII
"""
import re
import os
# Use this code if you would like to save the text you want to search to a file before you run the program.
# with open('regex.txt') as f1: text=f1.read()
"""
Use the string called text for the program instead of reading a text file from the current working directory.
"""
text = 'This text file contains a few email addresses and phone numbers. Alex\'s home number is 813-123-4567. He has two email addresses [email protected] and [email protected]. Gregory has two phone numbers: (cell) (813) 685-1234 and work 253-7000 ext. 1234. His email address is [email protected].'
# Create a regex for matching emails.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)
# Create a regex for matching phone numbers and identify
# the parts of a phone number.
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                 # area code
    (\s|-|\.)?                         # separator
    (\d{3})                            # first 3 digits
    (\s|-|\.)                          # separator
    (\d{4})                            # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?    # extension (the dot in "ext." is escaped so it matches a literal period)
    )''', re.VERBOSE)
# Any text in the file which has a typical email or phone number will be added to the list matches.
matches = []
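for groups in phoneRegex.findall(text):
    # This code reformats the phone number in a standard way by using
    # the blocks identified when the number was matched to the regex.
    # The area code is optional, so it is dropped when empty, and any
    # parentheses around it are stripped.
    areaCode = groups[1].strip('()')
    phoneNum = '-'.join(part for part in (areaCode, groups[3], groups[5]) if part)
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)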
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Neatly print the phone numbers and email addresses
# found in the text.
if len(matches) > 0:
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
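
# --- Illustrative sketch (an addition, not part of the original program) ---
# The phone loop above reads fixed positions such as groups[1], groups[3] and
# groups[5]. That works because findall() returns one tuple per match when the
# pattern contains groups, ordered by the position of each opening parenthesis.
# Tuple index 0 is the outermost group (the whole number), index 1 is the area
# code, and so on. The names demoRegex and describeGroups are hypothetical and
# exist only for this sketch; the pattern is a shortened copy without the
# extension part.
import re

demoRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # tuple index 1: area code
    (\s|-|\.)?           # tuple index 2: separator
    (\d{3})              # tuple index 3: first 3 digits
    (\s|-|\.)            # tuple index 4: separator
    (\d{4})              # tuple index 5: last 4 digits
    )''', re.VERBOSE)


def describeGroups(sample):
    """Return (index, value) pairs for the first phone number found in sample."""
    found = demoRegex.findall(sample)
    if not found:
        return []
    return list(enumerate(found[0]))

# Example use: describeGroups('Call (813) 685-1234') pairs index 1 with '(813)'
# and index 5 with '1234'. Groups that did not take part in a match come back
# as empty strings, which is why the loop tests groups[8] != ''.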