Last active
August 29, 2015 14:06
Get Max Date from a big STORET text file.
| { | |
| "metadata": { | |
| "name": "Get Max Date from a big STORET text file." | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "Quick peek at the file." | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "infile = \"./STORET_21FLORAN_20140917.txt\"  # file path and name\nwith open(infile) as fh:  # file handle, closed automatically when the block ends\n    for i in range(4):  # iterate 4 times: 0, 1, 2, 3\n        print(\"%s: %s\" % (i, fh.readline()))  # print the zero-based line number and the line", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "0: \n\n1: Org ID|Station ID|Act Date|Act Time|Act Depth|Depth Units|Relative Depth|Act Type|Act Category|Characteristic|Value|Units|Analysis Date|Analysis Time|Procedure Name|Comment|Sample Fraction|MDL|MDL Units|PQL|VQ|Data Source|Medium|Matrix\n\n2: 21FLORAN|A22|03-21-2007|12:35:00|0.5|m|Surface |Sample|Routine Sample|Alkalinity, Total (total hydroxide+carbonate+bicarbonate)|36|mg/l |03-26-2007|15:31:00|2320 ||Total|2 |mg/l |8 ||FLASTORET|Water|Surface Water\n\n3: 21FLORAN|A22|03-21-2007|12:20:00|0.5|m|Surface |Sample|Routine Sample|Alkalinity, Total (total hydroxide+carbonate+bicarbonate)|30|mg/l |03-26-2007|15:31:00|2320 ||Total|2 |mg/l |8 ||FLASTORET|Water|Surface Water\n\n" | |
| } | |
| ], | |
| "prompt_number": 31 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": "The file starts with a blank line, followed by a header line and then the data rows. Fields are pipe-delimited (the \"|\" character is called a \"pipe\" or \"bar\")." | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "from datetime import datetime\n# Example of usage:\n# date_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')\n\ninfile = \"./STORET_21FLORAN_20140917.txt\"  # file path and name\ndate_format = '%m-%d-%Y'  # '%m' is the month; '%M' would be parsed as minutes\n\ni = 0  # line counter\nmax_date = datetime.strptime(\"02-03-1974\", date_format)  # initialize with a fairly old date\n\nwith open(infile) as fh:  # the file is closed automatically when the block ends\n    for aline in fh:  # iterate through the file one line at a time\n        i += 1  # increment the counter\n        if i < 3:\n            continue  # skip the first two lines (the blank line and the header)\n\n        parts = aline.split(\"|\")  # split the line into a list of strings\n\n        if len(parts) < 3:\n            continue  # blank or short line: no date field, so go to the next line\n\n        if i % 125000 == 0:  # report progress so you know it is still running\n            print(\"At line %s, max_date is %s\" % (i, max_date))\n\n        adate = datetime.strptime(parts[2], date_format)  # convert the third field (index 2) into a datetime object\n        max_date = max(adate, max_date)  # keep the later of the date just read and the max so far\n\nprint(max_date)  # after the whole file has been read, print the max date encountered", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "At line 125000, max_date is 2013-01-31 00:07:00\nAt line 250000, max_date is 2013-01-31 00:07:00" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\nAt line 375000, max_date is 2013-01-31 00:10:00" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\nAt line 500000, max_date is 2013-01-31 00:10:00" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\n2013-01-31 00:10:00" | |
| }, | |
| { | |
| "output_type": "stream", | |
| "stream": "stdout", | |
| "text": "\n" | |
| } | |
| ], | |
| "prompt_number": 30 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": "", | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [] | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
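The date scan in the notebook can also be sketched as a small reusable function (Python 3). The name `max_activity_date` and its parameters are illustrative and not part of the original notebook. The sketch assumes the date is the third pipe-delimited field in month-day-year form, as in the sample rows printed in the first cell, and it skips the blank line and header by ignoring anything that does not parse as a date rather than by counting lines.

```python
from datetime import datetime
import io


def max_activity_date(lines, date_col=2, fmt="%m-%d-%Y"):
    """Return the latest date in column `date_col` of pipe-delimited lines.

    Hypothetical helper: returns None if no line holds a parseable date.
    """
    latest = None
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) <= date_col:
            continue  # blank or short line, no date field
        try:
            adate = datetime.strptime(parts[date_col].strip(), fmt)
        except ValueError:
            continue  # header row or malformed date
        if latest is None or adate > latest:
            latest = adate
    return latest


# Small in-memory stand-in for the STORET file (made-up subset of columns).
sample = io.StringIO(
    "\n"
    "Org ID|Station ID|Act Date|Act Time\n"
    "21FLORAN|A22|03-21-2007|12:35:00\n"
    "21FLORAN|A22|11-02-2006|12:20:00\n"
)
print(max_activity_date(sample))
```

To run it against the real file, pass an open file handle instead of the sample, for example `with open(infile) as fh: print(max_activity_date(fh))`. Iterating over the handle reads one line at a time, so the whole file never has to fit in memory.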