@dandye
Last active August 29, 2015 14:06
Get Max Date from a big STORET text file.
{
"metadata": {
"name": "Get Max Date from a big STORET textf file."
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "Quick peek at the file."
},
{
"cell_type": "code",
"collapsed": false,
"input": "infile = \"./STORET_21FLORAN_20140917.txt\" # string with file path & name\nfh = open(infile) # file handler\nfor i in range(4): # iterate 4 times [0,1,2,4]\n print \"%s: %s\" % (i, fh.readline()) # print the counter (also the line #, starting with zero) and the line",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "0: \n\n1: Org ID|Station ID|Act Date|Act Time|Act Depth|Depth Units|Relative Depth|Act Type|Act Category|Characteristic|Value|Units|Analysis Date|Analysis Time|Procedure Name|Comment|Sample Fraction|MDL|MDL Units|PQL|VQ|Data Source|Medium|Matrix\n\n2: 21FLORAN|A22|03-21-2007|12:35:00|0.5|m|Surface |Sample|Routine Sample|Alkalinity, Total (total hydroxide+carbonate+bicarbonate)|36|mg/l |03-26-2007|15:31:00|2320 ||Total|2 |mg/l |8 ||FLASTORET|Water|Surface Water\n\n3: 21FLORAN|A22|03-21-2007|12:20:00|0.5|m|Surface |Sample|Routine Sample|Alkalinity, Total (total hydroxide+carbonate+bicarbonate)|30|mg/l |03-26-2007|15:31:00|2320 ||Total|2 |mg/l |8 ||FLASTORET|Water|Surface Water\n\n"
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We see there is a leading blank line, a header line, and then data, which is pipe/bar-delimited (the \"|\" char is called \"pipe\" or \"bar\")"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from datetime import datetime\n# Example of usage\n# date_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')\n\ninfile = \"./STORET_21FLORAN_20140917.txt\" # string with file path & name\n\ni = 0 # counter\nmax_date = datetime.strptime(\"02-03-1974\", '%M-%d-%Y') # initalize the date variable with a fairly old date\n\nwith open(infile) as fh: # while the file handler is still open. this allows iteration through each line of the file\n for aline in fh:\n i += 1 # increment the counter \n if i < 3: \n continue # skips lines 0 and 1, which are blank and have the header\n \n parts = aline.split(\"|\") # split the string that was read into a list of strings\n \n if len(parts) < 2: continue # if the line is blank, there isn't a part with a date so go to the next line\n \n if i % 125000==0: # gives some progress, so that you know it is still running\n print \"At line %s, max_date is %s\" % (i, max_date) # print with variable substitution\n \n adate = datetime.strptime(parts[2], '%M-%d-%Y') # convert part #3 (0,1,2) into a DateTime object\n max_date = max([adate,max_date]) # set the max_date variable to the max of the date just read or the max so far\n \n \nprint max_date # outside of the loop (the whole file has been read), print the max date encountered",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "At line 125000, max_date is 2013-01-31 00:07:00\nAt line 250000, max_date is 2013-01-31 00:07:00"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nAt line 375000, max_date is 2013-01-31 00:10:00"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nAt line 500000, max_date is 2013-01-31 00:10:00"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n2013-01-31 00:10:00"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n"
}
],
"prompt_number": 30
},
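{
"cell_type": "markdown",
"metadata": {},
"source": "A possible variation, sketched under assumptions rather than taken from the original notebook: wrapping the scan in a function makes it easy to try on a few in-memory lines before running it on the full file. The name `max_act_date`, its parameters, and the sample rows are hypothetical; the rows are shortened stand-ins for the real 24-column records. Lines whose date field does not parse (such as the header) are skipped instead of being counted by line number."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from datetime import datetime\n\n# Hypothetical helper (not part of the original notebook).\ndef max_act_date(lines, date_col=2, fmt='%m-%d-%Y'):\n    latest = None\n    for line in lines:\n        parts = line.split('|')\n        if len(parts) <= date_col:\n            continue  # blank or short line: no date field\n        try:\n            adate = datetime.strptime(parts[date_col].strip(), fmt)\n        except ValueError:\n            continue  # header row or unparseable date\n        if latest is None or adate > latest:\n            latest = adate\n    return latest  # None if no line had a parseable date\n\n# Made-up, shortened rows for illustration only.\nsample = [\n    '',\n    'Org ID|Station ID|Act Date|Act Time',\n    '21FLORAN|A22|03-21-2007|12:35:00',\n    '21FLORAN|A22|11-02-2006|09:10:00',\n]\nprint(max_act_date(sample))",
"language": "python",
"metadata": {},
"outputs": []
},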
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}