NC and PRT files (#BelieveInYourself #NeverGiveUp #Grep)

OMG, I was feeling so low about an hour ago:

I was looking at a super-slick shell script written by a pro (I'll call this person "Ace") using tools I hadn't seen before
The structure of the NC file was even more daunting at first blush than the PRT files—the PRT was massive, but it was flat
A basic idea like the values being the same between the PRT and NC files was eluding me

And I was about to email you this message:

So, here's my analysis of what's going on...

This shit is so far over my head!

I spent 2 hours just trying to build the damn software (which actually was its own reward) to mimic what's in extract.ksh, and after 10 minutes of running some commands and comparing things and stuff and I'm like, nope, I'm done!

And as I finished writing that, something got me... my pride, and I gotta say...

I think Marsellus Wallace is wrong: https://www.youtube.com/watch?v=ruhFmBrl4GM

Because I wasn't done.

So, here's the data from our darling PRT file:

# Manually searching around ANN2300.E212TomaSSP126aF40oQ40.PRT

                               GLOBAL    SH       -1    -9   -17   -25   -33   -41   -49   -57   -65   -73   -81   -90
DELTA CH4 BY AGR_CMIP6 sour    37.68    51.20      0     0     0     2    17    48    70    79   106    42    50    31

... 10,000 lines later ...

                               GLOBAL    SH       -1    -9   -17   -25   -33   -41   -49   -57   -65   -73   -81   -90
DELTA CH4 BY AGR_CMIP6 sour    37.68    24.17     33    18    49    35    37    12     0     0     0     0     0     0

And here's the data from the new kid on the block:

% ncdump -v chg_CH4_AGR_CMIP6_source extractedVars_from_ANN2300.E212TomaSSP126aF40oQ40.nc

 chg_CH4_AGR_CMIP6_source = 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0.2531714, 0.7903663, 0.6612674, 0.4493335, 2.9673, 7.769673, 
    11.83928, 11.86137, 19.03439, 40.70503, 31.95987, 36.87167, 46.99039, 
    38.87709, 35.2835, 34.81063, 56.70958, 65.43874, 55.85504, 48.74452, 
    44.42916, 26.63225, 20.12724, 17.96869, 28.76506, 20.3712, 23.14399, 
    33.16747, 30.74563, 20.22172, 20.98623, 33.25001, 49.99844, 78.40044, 
    81.16846, 65.16602, 42.02249, 34.59512, 49.58398, 85.00302, 106.3664, 
    93.41285, 91.4363, 98.76413, 78.74931, 69.16954, 78.91599, 56.61325, 
    70.49295, 62.56808, 60.39671, 52.80833, 47.90004, 50.08576, 66.70264, 
    35.97083, 16.81558, 10.34694, 5.601554, 3.612045, 2.121826, 1.530508, 
    0.8946707, 0.1281078, 0.03006012, 2.843758e-05, 5.982204e-05, 
    0.0007167484, 0.0007296685, 0.0007834428, 0, 0, 0

At first I was just looking at the data and visually trying to match up any sequence of numbers against the PRT columns, and it wasn't working. And it was give up and sad email time. But, I dunno, I just didn't want to admit that to you or me. So I went back to the script and reread some of Ace's comments:

#    North Atlantic overturning:
#    North Pacific overturning:
#    Antarctic Bottom Water production:
# e.g. these matched the numbers in the PRT:
# ncks -d zoce,3000.,3000. -d lato2,-52.,-52. -v sf_Glo ANN2277.ojlE212TomaSSP126aF40oQ40.nc
# ncks -v sf_Atl -d lato2,48.,48. -d zoce,900.,900. ANN2277.ojlE212TomaSSP126aF40oQ40.nc
# ncks -v sf_Pac -d lato2,48.,48. -d zoce,900.,900. ANN2277.ojlE212TomaSSP126aF40oQ40.nc

And so I ran the first command and got:

% ncks -d zoce,3000.,3000. -d lato2,-52.,-52. -v sf_Glo extractedVars_from_ANN2300.E212TomaSSP126aF40oQ40.nc

...

  data:
    lato2 = -52 ;

    sf_Glo = 
    3.775406 ;

    zoce = 3010 ;

And then I did something kinda dumb, I tried to search the PRT for the 3.77 in 3.775406. No help there.

And then I got lucky (kinda like I did the first time with the PRT file) and stumbled over something useful:

% grep -A12 'North Atlantic' ANN2300.E212TomaSSP126aF40oQ40.PRT
 North Atlantic overturning: 900m 48N:  19.44
 North Pacific overturning: 900m 48N:  -1.00
 Antarctic Bottom Water production: 3000m 52S:   3.78

Bam! 3000m ~= (Z ocean "down" of) 3010, 52S == -52, and 3.78 ~= 3.775406

So, Ace was right, and at least that data was equal between NC and PRT. But what about CH4 values... why were there so many? and why didn't any of the values line up?

On a whim, instead of looking at the values of a given variable, like asking NC file for values of the variable chg_CH4_AGR_CMIP6, I asked the NC for its structure (which is what makes NC files so hot... they're "self describing"):

% ncdump -c extractedVars_from_ANN2300.E212TomaSSP126aF40oQ40.nc 

...

 lat_budg = -90, -87, -85, -83, -81, -79, -77, -75, -73, -71, -69, -67, -65, 
    -63, -61, -59, -57, -55, -53, -51, -49, -47, -45, -43, -41, -39, -37, 
    -35, -33, -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, 
    -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 
    33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 
    69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 90 ;

And I was like "ooohhh", in addition to the values having more precision, there are also more values. That hg_CH4_AGR_CMIP6 variable in the PRT was only represented by 12 latitude measurements—in the NC it has 90... And then I looked at those PRT columns again:

GLOBAL    SH       -1    -9   -17   -25   -33   -41   -49   -57   -65   -73   -81   -90

and was like...

OOOOOHHHHHH SHIT! THAT'S IT!!!!!!!

In the PRT, the values are only for the southern hemisphere, and they're oriented from equator to pole (north-to-south). The NC file has values from pole-to-pole and south-to-north. The two values would never visually line up like I was trying to make them do when I was just eyeing them!

So I dropped into Python and combined the NC values with the NC columns (just copy-pasting the prinouts from the examples above):

>>> nc_values = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
...     0, 0.2531714, 0.7903663, 0.6612674, 0.4493335, 2.9673, 7.769673, 
...     11.83928, 11.86137, 19.03439, 40.70503, 31.95987, 36.87167, 46.99039, 
...     38.87709, 35.2835, 34.81063, 56.70958, 65.43874, 55.85504, 48.74452, 
...     44.42916, 26.63225, 20.12724, 17.96869, 28.76506, 20.3712, 23.14399, 
...     33.16747, 30.74563, 20.22172, 20.98623, 33.25001, 49.99844, 78.40044, 
...     81.16846, 65.16602, 42.02249, 34.59512, 49.58398, 85.00302, 106.3664, 
...     93.41285, 91.4363, 98.76413, 78.74931, 69.16954, 78.91599, 56.61325, 
...     70.49295, 62.56808, 60.39671, 52.80833, 47.90004, 50.08576, 66.70264, 
...     35.97083, 16.81558, 10.34694, 5.601554, 3.612045, 2.121826, 1.530508, 
...     0.8946707, 0.1281078, 0.03006012, 2.843758e-05, 5.982204e-05, 
...     0.0007167484, 0.0007296685, 0.0007834428, 0, 0, 0]

>>> nc_columns = [-90, -87, -85, -83, -81, -79, -77, -75, -73, -71, -69, -67, -65, 
...     -63, -61, -59, -57, -55, -53, -51, -49, -47, -45, -43, -41, -39, -37, 
...     -35, -33, -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, 
...     -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 
...     33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 
...     69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 90]


# zip "marries" elements from two lists in sequence: zip([a,b,c], [1,2,3]) -> [[a,1],[b,2],[c,3]]

>>> nc_cols_vals = list(zip(nc_columns, nc_values))
>>> nc_cols_vals

[... (-49, 0.4493335), (-47, 2.9673), (-45, 7.769673) ...]

Turned those pairs of columns-and-values into a dictionary so I could look up a value by its column (latitude):

>>> nc_c_v_dict = dict(nc_cols_vals)
>>> nc_c_v_dict

{... -49: 0.4493335, -47: 2.9673, -45: 7.769673 ...}

# plug in a column (latitude), get the NC value
>>> nc_c_v_dict[-49]

0.4493335

I then took the columns from the PRT:

prt_columns = [-1,    -9,   -17,   -25,   -33,   -41,   -49,   -57,   -65,   -73,   -81,   -90]

And used those as references to get NC values:

# for every prt_column in prt_columns, use prt_column to look up the nc value from nc_c_v_dict

>>> [nc_c_v_dict[prt_col] for prt_col in prt_columns]

[33.16747, 17.96869, 48.74452, 34.81063, 36.87167, 11.86137, 0.4493335, 0, 0, 0, 0, 0]

And look how those values from the NC data line up with PRT data:

                               GLOBAL    SH       -1       -9      -17      -25      -33      -41     -49       -57   -65   -73   -81   -90
DELTA CH4 BY AGR_CMIP6 sour    37.68    24.17     33       18       49       35       37       12       0         0     0     0     0     0

                                                  33.16747 17.96869 48.74452 34.81063 36.87167 11.86137 0.4493335 0     0     0     0     0

Moral of the story... don't give up! 'Cause if I had, I wouldn't have been able to write you another Gist, and I'm so much happier that I have now :)

zacharysyoung/README.md