This data is hosted in a Google Drive folder titled moho_SHcoeffs/: https://drive.google.com/drive/u/0/folders/1CdSN1fTXzfKlWtfIM-rbties3lJPDkZS (more specifically, we unzip Wieczorek's data set, zip the dichotomy/ folder, upload that archive to Google Drive, and then use the cloud-based zip extractor tool, since Google Drive does not handle uploading thousands of individual files over HTTP well).
There are a ton of raw data files here: 21,894 in total (4.76 GB, each about 229 KB). Rather than forcing the user to store everything locally, we let them selectively download only the models they need. To do this, we use a Google Drive "Apps Script" to enumerate all filenames and shareable links into a Google Sheet. The script is a modified version of the one from this thread: https://webapps.stackexchange.com/questions/88769/get-share-link-of-multiple-files-in-google-drive-to-put-in-spreadsheet.
Steps:
- Open the desired Google Drive folder, set its share settings to "anyone with the link", and copy its ID from the URL in the browser location bar: https://drive.google.com/drive/u/0/folders/<id>
- Open a new Google Sheet and navigate to Extensions > Apps Script.
- Copy/paste the script below, replace <id> with the folder ID (for this work, 1boul7wqRkN8rnIIm7OzzzFEsn1D4hMio), and run the code (make sure to click "Run" and not "Deploy").
function myFunction() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s = ss.getActiveSheet();
  var c = s.getActiveCell();

  // Folder containing the raw coefficient files.
  var fldr = DriveApp.getFolderById("1boul7wqRkN8rnIIm7OzzzFEsn1D4hMio");
  var files = fldr.getFiles();

  // Collect [filename, shareable link] pairs for every file in the folder.
  var data = [];
  while (files.hasNext()) {
    var f = files.next();
    var filename = f.getName();
    var shareLink = f.getUrl();
    data.push([filename, shareLink]);
  }

  // Write the results to the sheet, starting at the currently selected cell.
  s.getRange(c.getRow(), c.getColumn(), data.length, data[0].length).setValues(data);
}
- The Google Sheet should now contain one row per file, with its filename and shareable link (for this work, https://docs.google.com/spreadsheets/d/1dLblk88HpLZdk1Wg_eoV5V7MEvne6OE63DcXG7G68pw/edit#gid=0).
After this, we export the sheet as `moho_SHcoeffs_LINKS.csv` and generate sha256 hashes for all of the local files with the following code:
"""load `moho_SHcoeffs_LINKS.csv` into dictionary"""
file_path = r'C:\Users\Eris\Downloads\redplanet-data\Crust\1_raw\moho_SHcoeffs_LINKS.csv'
import csv
def load_csv_to_dictionary(file_path: str) -> dict:
dictionary = {}
with open(file_path, 'r') as csv_file:
reader = csv.reader(csv_file)
for row in reader:
# if len(row) >= 2: # Ensure the row has at least two columns
key = row[0]
value = row[1]
# dictionary[key]['download_link'] = value
dictionary[key] = value
return dictionary
dat = load_csv_to_dictionary(file_path)
"""pre-generate dictionary for accessing download link/hash based on model name"""
path_folder = r'C:\Users\Eris\Documents\sync_local\00_Local\mars\scripts\InSight-Crustal-Thickness-Archive\dichotomy'
rawdata_registry = {}
import os
import pooch
from redplanet import utils
i = 0
for filename in os.listdir(path_folder):
this_link = dat[filename]
this_hash = pooch.file_hash(fname=os.path.join(path_folder, filename), alg='sha256')
this_hash = f'sha256:{this_hash}'
model_name = filename[10: utils.indexOf(filename, '.sh')]
rawdata_registry[model_name] = {'link': this_link, 'hash': this_hash}
i+=1
if i % 100 == 0:
print(i)
"""save"""
import json
with open("rawdata_registry.json", "w") as file:
file.write(json.dumps(rawdata_registry))
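Finally, as a concrete illustration of the "selectively download the model they need" workflow described above, here is a minimal sketch of how the saved registry could be consumed at runtime. This is a hypothetical example, not the package's actual API: the get_model helper, the registry path, and the conversion of the Drive share link into a direct-download URL (https://drive.google.com/uc?export=download&id=<id>) are assumptions for illustration only.

"""hypothetical sketch: fetch a single model on demand using `rawdata_registry.json`"""
import json
import re

import pooch

def get_model(model_name: str, registry_path: str = 'rawdata_registry.json') -> str:
    """Download one model's file (if not already cached) and return its local path."""
    with open(registry_path, 'r') as file:
        registry = json.load(file)
    entry = registry[model_name]

    # Drive share links typically look like .../file/d/<id>/...; extract the file ID and
    # build a direct-download URL, since the share link itself serves an HTML preview page.
    file_id = re.search(r'/d/([^/]+)', entry['link']).group(1)
    url = f'https://drive.google.com/uc?export=download&id={file_id}'

    # pooch downloads the file to its local cache and verifies the sha256 hash.
    return pooch.retrieve(url=url, known_hash=entry['hash'])

# e.g. path_to_model = get_model('<model_name>')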