Stephen Fordham StephenFordham

Interested in Bioinformatics, web scraping, data visualisation and ML. Tutorials posted via Medium, link below

StephenFordham / Using_the_search_function.py

Last active April 8, 2019 08:11

Getting started with re.search

	import re

	DNA = 'GAGCGCTAGCCAAA'
	match = re.search(pattern='AAA', string=DNA)
	# equivalent positional form: match = re.search('AAA', DNA)
	print(match)

	<re.Match object; span=(11, 14), match='AAA'>
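The printed match object carries more than its repr. A minimal sketch of reading its values with the standard `group()`, `start()` and `end()` methods, reusing the same sequence (the variable names `matched_text`, `start` and `end` are chosen here for illustration):

```python
import re

DNA = 'GAGCGCTAGCCAAA'
match = re.search('AAA', DNA)

# re.search returns None when nothing matches, so guard before using the result
if match:
    matched_text = match.group()  # the substring that matched
    start = match.start()         # index of the first matched character
    end = match.end()             # index one past the last matched character
    print(matched_text, start, end)  # AAA 11 14
```

Because `end` is exclusive, `DNA[start:end]` slices out exactly the matched text.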

StephenFordham / Regex_search_DNA.py

Last active April 8, 2019 08:12

Regex Example 2

StephenFordham / Extracting_match_object_values.py

Last active April 8, 2019 08:14

RegexExample3

StephenFordham / Alternation_example.py

Last active April 8, 2019 08:15

RegexExample4

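	import re

	DNA = 'ATCGACCGGGTTT'
	# Two separate searches, one per variant of the recognition site
	if re.search('CCGGG', DNA) or re.search('CCCGG', DNA):
	    print('Restriction enzyme found!')

	# The same test in a single pattern: (G|C) matches either base
	if re.search('CC(G|C)GG', DNA):
	    print('Restriction enzyme found!')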
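For single-character alternatives, a character class such as `[GC]` is a common substitute for `(G|C)`. The sketch below is an illustration rather than part of the original gist; it checks that both forms find the same site and shows what the capturing group records:

```python
import re

DNA = 'ATCGACCGGGTTT'

# CC(G|C)GG and CC[GC]GG accept the same strings: CCGGG or CCCGG
alternation = re.search('CC(G|C)GG', DNA)
char_class = re.search('CC[GC]GG', DNA)

site_span = alternation.span()                       # (start, end) of the site
same_result = alternation.span() == char_class.span()
which_base = alternation.group(1)                    # base captured by (G|C)
print(site_span, same_result, which_base)            # (5, 10) True G
```

The alternation form is still the one to reach for when the alternatives are longer than one character.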

StephenFordham / ORF.py

Last active April 8, 2019 08:16

RegexExample5

	open_reading_frame = 'AUG.*U(AA|AG|GA)'

StephenFordham / Inframe_ORF.py

Last active April 8, 2019 08:17

RegexExample6

	inframe_open_reading_frame = 'AUG(...)*U(AA|AG|GA)'
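The `(...)*` group consumes bases three at a time, so the stop codon (UAA, UAG or UGA) must sit in the same reading frame as the AUG start. A small sketch of the difference this makes, comparing it with a looser any-frame pattern `AUG.*U(AA|AG|GA)`; both test sequences are made up for this example:

```python
import re

any_frame_orf = 'AUG.*U(AA|AG|GA)'
inframe_orf = 'AUG(...)*U(AA|AG|GA)'

# Hypothetical sequences, invented for illustration only
in_frame_mrna = 'AUGGCCUAA'     # AUG GCC UAA: stop codon is in frame
out_of_frame_mrna = 'AUGGCUAA'  # AUG GC UAA: stop is shifted by one base

loose_hit = re.search(any_frame_orf, out_of_frame_mrna)   # matches
strict_miss = re.search(inframe_orf, out_of_frame_mrna)   # None
strict_hit = re.search(inframe_orf, in_frame_mrna)        # matches

print(loose_hit is not None, strict_miss is None, strict_hit.group())
# True True AUGGCCUAA
```

Both quantifiers are greedy, so on longer sequences a match may run past the first stop codon to a later one.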

StephenFordham / N_glycosylation_pattern.py

Created April 8, 2019 07:56

N_glycosylation_pattern

	N_glycosylation_pattern = 'N[^P][ST][^P]'

StephenFordham / Character_groups_example.py

Created April 8, 2019 08:00

Character_groups_example

	import re

	N_glycosylation_pattern = 'N[^P][ST][^P]'
	# putting a caret ^ at the start of the group will negate it
	# and match any character that is not in that group

	Protein_seq = 'YHWKYELIQNNSNEFC'

	if re.search(N_glycosylation_pattern, Protein_seq):
	    print("N-glycosylation site motif found")
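`re.search` only reports whether a motif exists. To see where it occurs, one option is `re.finditer`, sketched below on the same protein sequence (the `sites` list is a name introduced here):

```python
import re

N_glycosylation_pattern = 'N[^P][ST][^P]'
Protein_seq = 'YHWKYELIQNNSNEFC'

# finditer yields one match object per non-overlapping occurrence
sites = [(m.start(), m.group())
         for m in re.finditer(N_glycosylation_pattern, Protein_seq)]
print(sites)  # [(9, 'NNSN')]
```

Positions are zero-based, and overlapping motifs are not reported because each search resumes after the previous match.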

StephenFordham / htt_pattern.py

Created April 8, 2019 08:01

htt_pattern

	htt_pattern = '(CAG|CAA){18,}'

	# As with slice bounds, either limit of the {m,n} quantifier can be omitted;
	# {18,} matches the group 18 or more times in a row
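A short sketch of how the quantifier bounds behave, using `re.fullmatch` so that the whole string has to fit the pattern. The helper `fully_matches` and the short repeat counts are invented for this illustration:

```python
import re

def fully_matches(pattern, text):
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

two_to_three = fully_matches('(CAG){2,3}', 'CAG' * 3)  # within both bounds
too_few = fully_matches('(CAG){2,}', 'CAG')            # below the lower bound
at_most_two = fully_matches('(CAG){,2}', 'CAG' * 3)    # above the upper bound

print(two_to_three, too_few, at_most_two)  # True False False
```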

StephenFordham / RegexComined.py

Created April 8, 2019 08:03

RegexCombined

	import re

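	htt_pattern = '(CAG|CAA){18,}'
	# Read the whole FASTA file; the with block closes the file afterwards
	with open('C:/Users/apsciuser/Downloads/htt_gene.fasta') as fasta_file:
	    htt_mRNA = fasta_file.read()
	# findall returns one entry per non-overlapping match of the pattern
	match = re.findall(htt_pattern, htt_mRNA)
	print("The number of polyQ repeats found are: " + str(len(match)))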

	# Console output
	# The number of polyQ repeats found are: 1
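Two details can affect this count on a real FASTA file: the header line and the line breaks inside the sequence, which can split a repeat tract in two. The sketch below is one possible way to handle both, not the author's method. It uses a made-up in-memory record (`fasta_text`) in place of the file, and a non-capturing group `(?:...)` so that each match is the whole tract:

```python
import re

# Non-capturing group, so each match is the full repeat tract
htt_pattern = '(?:CAG|CAA){18,}'

# Hypothetical FASTA text standing in for the contents of htt_gene.fasta
fasta_text = ('>example_record made-up sequence\n'
              'ATGGCG' + 'CAG' * 10 + '\n'
              + 'CAG' * 9 + 'CAACCGCCA\n')

# Drop header lines (starting with '>') and join the sequence lines
sequence = ''.join(line.strip() for line in fasta_text.splitlines()
                   if not line.startswith('>'))

tracts = [m.group() for m in re.finditer(htt_pattern, sequence)]
repeat_counts = [len(tract) // 3 for tract in tracts]  # codons per tract
print(len(tracts), repeat_counts)  # 1 [20]
```

Searching the raw `fasta_text` instead finds nothing, because the newline interrupts the tract after ten codons.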