Skip to content

Instantly share code, notes, and snippets.

View Humboldt-Penguin's full-sized avatar
🐧
la creatura

Zain Eris Kamal Humboldt-Penguin

🐧
la creatura
  • Rutgers University
View GitHub Profile

Question:

I have a folder with nested subdirectories that contain many .npy files. How can I create a copy of this folder but the .npy files are converted to text files?


Answer:

To copy a directory structure and convert files of a certain type to another type, you can use Python's built-in libraries, such as os and shutil. Here is a step-by-step plan:

This data itself is hosted in a Google Drive folder titled moho_SHcoeffs/: https://drive.google.com/drive/u/0/folders/1CdSN1fTXzfKlWtfIM-rbties3lJPDkZS (more specifically, we unzip Weiczorek's data set, zip the dichotomy/ folder, upload it to Google Drive, and then use the cloud-based zip extractor tool since Google Drive does not play well with uploading a lot of files via HTTP).

There are a ton of raw data files here. In total, there's 21,894 files (4.76 GB, each is 229 KB). Rather than force the user to store everything locally, they can selectively download the model they need. In order to do this, we use a Google Drive "Apps Script" in order to enumerate all filenames + shareable links into a Google Sheet. This code is a modified version from this thread: https://webapps.stackexchange.com/questions/88769/get-share-link-of-multiple-files-in-google-drive-to-put-in-spreadsheet.

Steps:

  1. Open the desired Google Drive folder, set share settings to "anyone with the l
For background, I'm pursuing a second major in computer science, I've been using Python/Jupyter in my research for three years, I've written/published Python packages for scientific analysis, I'm a TA for introductory computer science courses, and I'm co-teaching a computational physics class for college freshman. Because of this, I feel like it'd be best for me to limit the scope of my feedback to the coding-related aspects of the program. Overall, I think the coding education had a lot of room to improve with respects to clarity, ease of comprehension, and learning potential:
(1) The code we were given lacked proper documentation and comments, which made it difficult to understand the purpose and functionality of many sections. Given the push in academia for more clear, documented, and reproducible code, I think this is a bad example to set. This leads into the next point...
(2) When we had time to work independently, that time was exclusively spent tinkering (usually aimlessly) with existing code rather
# supplement to my stackoverflow comment on this post: https://stackoverflow.com/questions/75427538/regulargridinterpolator-excruciatingly-slower-than-interp2d
# since `x_vals` and `y_vals` are decreasing rather than increasing, we do some trickery on top of `np.searchsorted()` to get the desired indices.
i_x = x_vals.shape[0] - np.searchsorted(np.flip(x_vals), x)
i_y = y_vals.shape[0] - np.searchsorted(np.flip(y_vals), y)
'''here's an option with list comprehension
points = [
(
x_vals[i_x-1+i],