Created
November 24, 2012 21:39
searching strings in files with python 3.2
# To test this, create a file called poem.txt in the same folder as this Python file.
# Make sure poem.txt contains many occurrences of "more", like this:
poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""
# I assume you are using Python 3.2 here.
# Open the file as file object f.
f = open("poem.txt", "r")  # open in (r)ead mode (default)
# Get a big list of all the lines in the file.
lines = f.readlines()
# Close the file; keep only the lines list.
f.close()
# Define what you are looking for, the search string.
mysearchstring = "more"  # use double or single quotes, but do not mix them
# Iterate over all the lines.
linenumber = 0
counter = 0
currentline = ""
for line in lines:
    linenumber += 1  # shortcut for: linenumber = linenumber + 1
    startpos = 0
    endpos = 0
    for foundpos in range(line.count(mysearchstring)):
        counter += 1
        # find out at what position exactly
        startpos = line[endpos:].find(mysearchstring)  # see comments below
        endpos += startpos + len(mysearchstring)
        print("found '{}' in line:{} pos:{}:\n{}".format(mysearchstring, linenumber, endpos - len(mysearchstring), line))
    print("end of line {}".format(linenumber))  # this is after the end of the inner for loop
print("end of search")  # this line comes after the end of the outer for loop
# --- what I did here ---
# OK, this code could be more elegant.
# The first loop, the outer for loop, processes each line of the list lines,
# "iterating" over all the elements in that list. The variable linenumber
# counts the number of lines processed so far.
# At each line, startpos and endpos are reset to zero (0).
# Now the interesting bit: we don't know how many search strings are in
# the current line. Actually, we can ask Python, using the .count() method:
# line.count(mysearchstring). This returns an integer value.
# To loop that many times, I use the range function. range(3), for example,
# yields 3 items: 0, 1, 2. (To try it in Python's interactive mode, type
# list(range(3)) and see what happens.) A range is iterable with a for loop.
# The strange colons inside the square brackets are slicing commands.
# Say the line is "abcdef"; then line[2:5] returns "cde". Python
# numbers the characters 012345, starting with 0, so the start value 2 is the "c".
# The character at the stop value 5 (the "f") is not returned.
# By updating startpos and increasing endpos, I force Python
# to apply find not to the whole line but only to the not-yet-searched
# remainder of the line.
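The range, slicing, and remainder-search behaviour described in the comments above can be checked directly. This is only a minimal sketch; the names `sample`, `needle`, `offset`, `relative` and `positions` are illustrative and do not appear in the original script.

```python
# range(3) is iterable; list() makes its items visible
assert list(range(3)) == [0, 1, 2]

# slicing: the start index is included, the stop index is excluded
assert "abcdef"[2:5] == "cde"

# search only the not-yet-searched remainder of a line
sample = "gimme more, more, moreofit"
needle = "more"
offset = 0       # everything before this index has already been searched
positions = []
for _ in range(sample.count(needle)):
    relative = sample[offset:].find(needle)  # position inside the slice
    positions.append(offset + relative)      # position inside the whole line
    offset += relative + len(needle)         # continue right after this match

print(positions)  # [6, 12, 18]
```

Because find works on a slice here, its result is relative to the slice, so the offset has to be added back to get the position in the full line.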
there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit
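Instead of creating poem.txt by hand, the file could also be written from Python. The following is a sketch under assumptions: `write_poem` is a hypothetical helper, not part of the gist, and the demo writes into a temporary folder so that no existing file is overwritten.

```python
import os
import tempfile

POEM = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""


def write_poem(path):
    """Write the sample poem to path (hypothetical helper)."""
    with open(path, "w") as f:
        f.write(POEM)


# demo: write the file into a temporary folder and read it back
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "poem.txt")
    write_poem(target)
    with open(target) as f:
        lines_read = f.readlines()

print(len(lines_read))  # 5
```

To create the file next to the search script, call `write_poem("poem.txt")` from that folder.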
poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""
lines = poem.split("\n")
print(lines)
item = "more"
length = len(item)
counter = 0
for i, line in enumerate(lines):
    k = 0  # index at which the next search in this line starts
    for _ in range(line.count(item)):
        counter += 1
        j = line.find(item, k)  # search from index k onwards
        k = j + length          # continue after this match
        print("counter", counter, "line", i, "pos", j)
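The same idea can be wrapped in a function that returns the matches instead of printing them. This is one possible sketch, not the method used by either author: `find_occurrences` and `hits` are hypothetical names, line numbers are 1-based as in the gist, and a while loop on find replaces the separate count call. Like count, it reports non-overlapping, case-sensitive matches only.

```python
def find_occurrences(lines, item):
    """Return (line_number, position) pairs for every match of item."""
    if not item:
        # an empty search string would never advance the search position
        raise ValueError("item must not be empty")
    hits = []
    for number, line in enumerate(lines, start=1):
        pos = line.find(item)
        while pos != -1:                            # -1 means: no further match
            hits.append((number, pos))
            pos = line.find(item, pos + len(item))  # search after this match
    return hits


poem = """there is more to the world
than Demi Moore and Roger Mooore
It is a good Morning, but more so
a good day to every moron out there,
gimme more, more, moreofit"""

hits = find_occurrences(poem.split("\n"), "more")
print(hits)  # [(1, 9), (3, 26), (5, 6), (5, 12), (5, 18)]
```

"Moore", "Mooore", "Morning" and "moron" are not reported because none of them contains the exact lowercase string "more", while "moreofit" is reported because the search does not require whole words.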