
@chrisdaaz
Last active October 13, 2024 13:33
A reading list for librarians learning about Git and GitHub

Git and GitHub for Librarians: A Brief Bibliography

Each citation includes an abstract or annotation. Feel free to suggest an addition!

Libraries

Davis, Robin Camille. 2015. “Git and GitHub for Librarians.” Publications and Research, January. https://academicworks.cuny.edu/jj_pubs/34.

  • One of the fastest-growing professional social networks is GitHub, an online space to share code. GitHub is based on free and open-source software called Git, a version control system used in many digital projects, from library websites to government data portals to scientific research. For projects that involve developing code and collaborating with others, Git is an invaluable tool; it also creates a backup system and structured documentation. In this article, we examine version control, the particulars of Git, the burgeoning social network of GitHub, and how Git can be an archival tool.

“Why Librarians Should Care about Git? · Issue #7 · Data-Lessons/Library-Git-DEPRECATED.” n.d. GitHub. Accessed August 27, 2020. https://github.com/data-lessons/library-git-DEPRECATED/issues/7.

  • An interesting discussion about the relevance of Git to librarianship.

Eaton, Mark. 2018. “A Comparative Analysis of the Use of GitHub by Librarians and Non-Librarians.” Publications and Research, May. https://academicworks.cuny.edu/kb_pubs/134.

  • Objective: GitHub is a popular tool that allows software developers to collaborate and share their code on the web. Librarians have adopted GitHub to support their own work, sharing code in support of their libraries. This paper asks: How does librarians’ use of GitHub compare to that of other users? Methods: To retrieve quantitative data on GitHub users, we queried the GitHub APIs (application programming interfaces). By assembling data on librarians’ use of GitHub, as well as on a comparison group, we provided preliminary comparisons of these two samples. We analyzed and visualized this data across a number of variables to offer salient insights as to how librarians compare to randomly selected GitHub users. Results: Librarians regularly use a more diverse range of programming languages than the comparison group, hinting at a broad range of possible uses of code in libraries. While the librarians’ sample group did not demonstrate statistically significant differences from the comparison group on most measures of activity and popularity, they scored significantly higher in reach and productivity than the comparison group. This could be due to librarians’ greater longevity on GitHub, as well as their greater investment in GitHub as a tool for sharing. Conclusion: Our data suggest that librarians are actively building their libraries with code and sharing the results. While it was unclear whether librarians were more active or popular on GitHub than the comparison group, it was clear that they demonstrated statistically significant outperformance in terms of reach and productivity. To explain these findings, we hypothesized that librarians’ embrace of GitHub is in line with widely held values of “openness” in the library profession.
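Eaton's comparison relied on data pulled from the GitHub APIs. As a rough sketch of that kind of query (not the study's actual code), the Python below fetches one user's public repositories from the REST endpoint `https://api.github.com/users/{username}/repos` and tallies each repository's `language` field, a simple proxy for the "diversity of programming languages" measure. The single-page, unauthenticated request and the sample data are assumptions made to keep the example short.

```python
import json
import urllib.request
from collections import Counter


def fetch_public_repos(username: str) -> list[dict]:
    """Fetch up to 100 public repositories for one GitHub user.

    Unauthenticated and single-page: GitHub rate-limits anonymous requests,
    and accounts with more than 100 repositories will be truncated.
    """
    url = f"https://api.github.com/users/{username}/repos?per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def language_counts(repos: list[dict]) -> Counter:
    """Tally repositories by their primary `language`, skipping repos with none."""
    return Counter(repo["language"] for repo in repos if repo.get("language"))


if __name__ == "__main__":
    # Hypothetical offline sample shaped like the API response, so this runs without a network.
    sample = [
        {"name": "catalog-scripts", "language": "Python"},
        {"name": "library-site", "language": "HTML"},
        {"name": "marc-tools", "language": "Python"},
        {"name": "notes", "language": None},
    ]
    counts = language_counts(sample)
    print(counts)       # Counter({'Python': 2, 'HTML': 1})
    print(len(counts))  # 2 distinct languages
```

A fuller comparison would call `fetch_public_repos` for every account in each sample group and compare the number of distinct languages per account across the groups.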

Bouquin, Daina R. 2015. “GitHub.” Journal of the Medical Library Association : JMLA 103 (3): 166–67. https://doi.org/10.3163/1536-5050.103.3.019.

  • Medical librarians have unique opportunities to work with clinicians, researchers, and students to help them meet their goals and develop new knowledge. To fully commit to this work, librarians and information professionals working in biomedical environments need to acknowledge the level at which these fields are evolving in both speed and complexity. Librarians in biomedical and health-related settings are seeing firsthand how data-intensive the work of their patron communities has become and the extent to which their work has become geographically distributed and interdisciplinary. Whether the discussion focuses on bioinformatics, electronic health records, computational biomedicine, or outcomes research, librarians must provide resources to meet complex needs. Similarly, the needs of librarians themselves are quickly evolving as they learn to work in increasingly complex administrative and research roles that require them to regularly work with diverse groups of collaborators. A variety of tools are available to support geographically distributed, interdisciplinary, data-intensive work. One such tool is GitHub.

Diaz, Chris. 2018. “Using Static Site Generators for Scholarly Publications and Open Educational Resources.” The Code4Lib Journal, no. 42 (November). https://journal.code4lib.org/articles/13861.

  • Libraries that publish scholarly journals, conference proceedings, or open educational resources can use static site generators in their digital publishing workflows. Northwestern University Libraries is using Jekyll and Bookdown, two open source static site generators, for its digital publishing service. This article discusses motivations for experimenting with static site generators and walks through the process for using these technologies for two publications.
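To make the idea of a static site generator concrete: tools like Jekyll and Bookdown read plain-text source files and write out finished HTML files that any ordinary web server can host, with no database behind them. The Python below is a deliberately tiny, hypothetical stand-in, not how Jekyll or Bookdown work internally. It assumes a simple `key: value` front-matter header between `---` lines, splits it from the body, and fills an HTML template; real generators add Markdown rendering, layouts, and full YAML parsing.

```python
import html
from string import Template

# A minimal page template; real generators use layout files and a templating language.
PAGE = Template(
    "<!DOCTYPE html>\n<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$body</body></html>\n"
)


def parse_front_matter(source: str) -> tuple[dict, str]:
    """Split '---'-delimited 'key: value' front matter from the page body.

    Only flat string values are supported; this is not a YAML parser.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    raise ValueError("front matter is missing its closing '---' line")


def render_page(source: str) -> str:
    """Turn one source file into a complete HTML page (plain paragraphs only)."""
    meta, body = parse_front_matter(source)
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    body_html = "".join(f"<p>{html.escape(p)}</p>\n" for p in paragraphs)
    return PAGE.substitute(title=html.escape(meta.get("title", "Untitled")), body=body_html)


if __name__ == "__main__":
    article = "---\ntitle: Open Educational Resources\n---\nFirst paragraph.\n\nSecond paragraph.\n"
    print(render_page(article))
```

Because the output is just files, the whole site (sources and generated pages alike) can live in a Git repository, which is what makes these workflows a natural fit for Git and GitHub.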

Davis, Robin Camille. 2013. “Using Drupal and Git for a Library Website.” https://robincamille.com/2013-04-16-using-drupal-and-git-for-a-library-website/.

  • Brief blog post on the development workflow for a library website built with the Drupal content management system and managed by Git.

Becker, Devin, Jylisa Doney, Jessica Martinez, Evan Peter Williamson, Marco Seiferle-Valencia, and Olivia M. Wikle. “Using Static Web Technologies and Git-Based Workflows to Re-design and Maintain a Library Website (Quickly) with Non-technical Staff.” https://mla.hcommons.org/deposits/item/hc:41623/.

  • In 2018, a university-wide brand update prompted the University of Idaho Library to reexamine their website development practices and move toward a static web approach that leverages librarian skillsets and provides the library greater control over its systems and data. This case study describes the methodological reasons behind the decision to use the static site generator Jekyll over a Content Management System (CMS) and the practical steps taken to create a sustainable and agile development model. The article details the ways this static web approach (nicknamed “Lib-STATIC”) facilitates cross-departmental communication, collaboration, and innovative feature development for library staff members of varying technical abilities.

Engwall, Keith, and Mitchell Roe. “Git and GitLab in Library Website Change Management Workflows.” The Code4Lib Journal. https://journal.code4lib.org/articles/15250.

  • Library websites can benefit from a separate development environment and a robust change management workflow, especially when there are multiple authors. This article details how the Oakland University William Beaumont School of Medicine Library uses Git and GitLab in a change management workflow with a serverless development environment for their website development team. Git tracks changes to the code, allowing changes to be made and tested in a separate branch before being merged back into the website. GitLab adds features such as issue tracking and discussion threads to Git to facilitate communication and planning. Adoption of these tools and this workflow has dramatically improved the organization and efficiency of the OUWB Medical Library web development team, and it is the hope of the authors that by sharing our experience with them others may benefit as well.

Metadata

Crowe, Katherine, and Kevin Clair. 2015. “Developing a Tool for Publishing Linked Local Authority Data.” Journal of Library Metadata 15 (3–4): 227–40. https://doi.org/10.1080/19386389.2015.1099993.

  • Efforts are underway to publish archival authority information in linked, open, machine-readable data standards. However, these efforts have not yet scaled down to smaller institutions or to the long tail of authority records maintained on an institution-by-institution basis (often non–standards-based) about entities of local (rather than national or international) significance. This article documents the initial steps toward developing a shared authority tool for a collective of cultural institutions with shared collecting affinities, whether those affinities are geographic, thematic, or otherwise. This planned service will complement shared authority services such as the Virtual International Authority File (VIAF) and the Social Networks and Archival Context (SNAC) project, providing a means for institutions managing authority data about people, organizations, and families not typically found in authority files centered around bibliographic metadata to share that data with other institutions.

Washington, Anne M., and Andrew Weidner. 2017. “Collaborative Metadata Application Profile Development for DAMS Migration.” International Conference on Dublin Core and Metadata Applications 0 (December): 117–19.

  • In 2015, after an extensive review process, the University of Houston (UH) Libraries chose the open source systems Hyku (then known as the Hydra-in-a-Box project), Archivematica, and ArchivesSpace to form the Libraries’ digital collections access and preservation ecosystem. This suite of systems, along with locally developed tools, forms the Bayou City Digital Asset Management System (BCDAMS). In 2016, the BCDAMS Implementation Team began work on a multi-phase process to roll out the new systems to replace the current digital collections management system, CONTENTdm. Phase I of this process included developing fundamental models and principles as well as much of the local infrastructure and workflows. Phase II of the project will involve migrating existing digital collection metadata and files to the new digital asset management system (DAMS). This poster summarizes work done during Phase I of the project to prepare for the migration work in Phase II. This included working collaboratively to develop a Metadata Application Profile (MAP) and crosswalk, and an analysis of metadata remediation required to prepare for migration. It shares the UH Libraries’ unique experience in preparing for the migration of UH Digital Library (UHDL) data from CONTENTdm to a new system and offers some general considerations for DAMS migrations.
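A metadata crosswalk is, at its core, a field-to-field mapping from one schema to another, and a remediation analysis looks for values that do not survive that mapping cleanly. The sketch below is purely illustrative: the legacy field names, the `dcterms:` target properties, and the semicolon-delimited multi-value convention are hypothetical assumptions, not the UH Libraries' actual MAP or CONTENTdm configuration.

```python
# Hypothetical crosswalk: legacy field name -> target property.
CROSSWALK = {
    "Title": "dcterms:title",
    "Creator": "dcterms:creator",
    "Date": "dcterms:date",
    "Subject": "dcterms:subject",
}


def crosswalk_record(record: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map one legacy record onto target properties.

    Returns the mapped record plus the source fields that have no mapping,
    which would need review (remediation) before migration. Values are
    assumed to use ';' to separate multiple entries.
    """
    mapped: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
            continue
        values = [v.strip() for v in value.split(";") if v.strip()]
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


if __name__ == "__main__":
    legacy = {"Title": "Sample photograph", "Subject": "Harbors; Ships", "Format Extent": "1 item"}
    mapped, unmapped = crosswalk_record(legacy)
    print(mapped)    # {'dcterms:title': ['Sample photograph'], 'dcterms:subject': ['Harbors', 'Ships']}
    print(unmapped)  # ['Format Extent']
```

Keeping a mapping like this in a plain-text file under version control lets a team review and discuss each change to the crosswalk, which suits the collaborative MAP development the poster describes.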

Escamilla, Emily, Lamia Salsabil, Martin Klein, Jian Wu, Michele C. Weigle, and Michael L. Nelson. “It’s Not Just GitHub: Identifying Data and Software Sources Included in Publications.” https://link.springer.com/chapter/10.1007/978-3-031-43849-3_17.

  • Paper publications are no longer the only form of research product. Due to recent initiatives by publication venues and funding institutions, open access datasets and software products are increasingly considered research products and URIs to these products are growing more prevalent in scholarly publications. However, as with all URIs, resources found on the live Web are not permanent. Archivists and institutions including Software Heritage, Internet Archive, and Zenodo are working to preserve data and software products as valuable parts of reproducibility, a cornerstone of scientific research. While some hosting platforms are well-known and can be identified with regular expressions, there are a vast number of smaller, more niche hosting platforms utilized by researchers to host their data and software. If it is not feasible to manually identify all hosting platforms used by researchers, how can we identify URIs to open-access data and software (OADS) to aid in their preservation? We used a hybrid classifier to classify URIs as OADS URIs and non-OADS URIs. We found that URIs to Git hosting platforms (GHPs) including GitHub, GitLab, SourceForge, and Bitbucket accounted for 33% of OADS URIs. Non-GHP OADS URIs are distributed across almost 50,000 unique hostnames. We determined that using a hybrid classifier allows for the identification of OADS URIs in less common hosting platforms which can benefit discoverability for preserving datasets and software products as research products for reproducibility.
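The abstract notes that well-known hosting platforms can be identified with regular expressions, while the long tail of smaller hosts is what motivated the hybrid classifier. A minimal sketch of the regular-expression side only, assuming we care about just the four Git hosting platforms the authors name (GitHub, GitLab, SourceForge, and Bitbucket). The pattern and helper names are ours, not the authors', and the pattern deliberately misses self-hosted GitLab instances and other niche hosts.

```python
import re
from typing import Optional

# The four Git hosting platforms named in the paper; subdomains such as
# www.github.com or project.sourceforge.net are also accepted.
GHP_PATTERN = re.compile(
    r"^https?://(?:[\w-]+\.)*(github\.com|gitlab\.com|sourceforge\.net|bitbucket\.org)(?:[/:?#]|$)",
    re.IGNORECASE,
)


def ghp_host(uri: str) -> Optional[str]:
    """Return the matched hosting platform (lowercased), or None for other hosts."""
    match = GHP_PATTERN.match(uri.strip())
    return match.group(1).lower() if match else None


def ghp_share(uris: list[str]) -> float:
    """Fraction of URIs that point at one of the known Git hosting platforms."""
    if not uris:
        return 0.0
    return sum(ghp_host(u) is not None for u in uris) / len(uris)


if __name__ == "__main__":
    # Hypothetical URIs extracted from a publication.
    uris = [
        "https://github.com/example/analysis-code",
        "https://zenodo.org/records/123456",
        "http://example.sourceforge.net/",
        "https://data.example.edu/dataset/42",
    ]
    print([ghp_host(u) for u in uris])  # ['github.com', None, 'sourceforge.net', None]
    print(ghp_share(uris))              # 0.5
```

The host must be followed by `/`, `:`, `?`, `#`, or the end of the string, so look-alike hosts such as `github.com.example.org` are not counted.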

Data Analysis / Data Management / Reproducible Research

Bryan, Jennifer. 2017. “Excuse Me, Do You Have a Moment to Talk about Version Control?” e3159v2. PeerJ Inc. https://doi.org/10.7287/peerj.preprints.3159v2.

  • Data analysis, statistical research, and teaching statistics have at least one thing in common: these activities all produce many files! There are data files, source code, figures, tables, prepared reports, and much more. Most of these files evolve over the course of a project and often need to be shared with others, for reading or edits, as a project unfolds. Without explicit and structured management, project organization can easily descend into chaos, taking time away from the primary work and reducing the quality of the final product. This unhappy result can be avoided by repurposing tools and workflows from the software development world, namely, distributed version control. This article describes the use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows. Special attention is given to projects that use the statistical language R and, optionally, R Markdown documents. Supplementary materials include an annotated set of links to step-by-step tutorials, real world examples, and other useful learning resources.

Perez-Riverol, Yasset, Laurent Gatto, Rui Wang, Timo Sachsenberg, Julian Uszkoreit, Felipe da Veiga Leprevost, Christian Fufezan, et al. 2016. “Ten Simple Rules for Taking Advantage of Git and GitHub.” PLOS Computational Biology 12 (7): e1004947. https://doi.org/10.1371/journal.pcbi.1004947.

  • Bioinformatics is a broad discipline in which one common denominator is the need to produce and/or use software that can be applied to biological data in different contexts. To enable and ensure the replicability and traceability of scientific claims, it is essential that the scientific publication, the corresponding datasets, and the data analysis are made publicly available [1,2]. All software used for the analysis should be either carefully documented (e.g., for commercial software) or, better yet, openly shared and directly accessible to others [3,4]. The rise of openly available software and source code alongside concomitant collaborative development is facilitated by the existence of several code repository services such as SourceForge, Bitbucket, GitLab, and GitHub, among others. These resources are also essential for collaborative software projects because they enable the organization and sharing of programming tasks between different remote contributors. Here, we introduce the main features of GitHub, a popular web-based platform that offers a free and integrated environment for hosting the source code, documentation, and project-related web content for open-source projects. GitHub also offers paid plans for private repositories (see Box 1) for individuals and businesses as well as free plans including private repositories for research and educational use.

Blischak, John D., Emily R. Davenport, and Greg Wilson. 2016. “A Quick Introduction to Version Control with Git and GitHub.” PLOS Computational Biology 12 (1): e1004668. https://doi.org/10.1371/journal.pcbi.1004668.

  • Many scientists write code as part of their research. Just as experiments are logged in laboratory notebooks, it is important to document the code you use for analysis. However, a few key problems can arise when iteratively developing code that make it difficult to document and track which code version was used to create each result. First, you often need to experiment with new ideas, such as adding new features to a script or increasing the speed of a slow step, but you do not want to risk breaking the currently working code. One often-utilized solution is to make a copy of the script before making new edits. However, this can quickly become a problem because it clutters your file system with uninformative filenames, e.g., analysis.sh, analysis_02.sh, analysis_03.sh, etc. It is difficult to remember the differences between the versions of the files and, more importantly, which version you used to produce specific results, especially if you return to the code months later. Second, you will likely share your code with multiple lab mates or collaborators, and they may have suggestions on how to improve it. If you email the code to multiple people, you will have to manually incorporate all the changes each of them sends.

Ram, Karthik. 2013. “Git Can Facilitate Greater Reproducibility and Increased Transparency in Science.” Source Code for Biology and Medicine 8 (1): 7. https://doi.org/10.1186/1751-0473-8-7.

  • Reproducibility is the hallmark of good science. Maintaining a high degree of transparency in scientific reporting is essential not just for gaining trust and credibility within the scientific community but also for facilitating the development of new ideas. Sharing data and computer code associated with publications is becoming increasingly common, motivated partly in response to data deposition requirements from journals and mandates from funders. Despite this increase in transparency, it is still difficult to reproduce or build upon the findings of most scientific publications without access to a more complete workflow.

Weber, Nicholas, Sebastian Karcher, and James Myers. 2020. “Open Source Tools for Scaling Data Curation at QDR.” The Code4Lib Journal, no. 49 (August). https://journal.code4lib.org/articles/15436.

  • This paper describes the development of services and tools for scaling data curation services at the Qualitative Data Repository (QDR). Through a set of open-source tools, semi-automated workflows, and extensions to the Dataverse platform, our team has built services for curators to efficiently and effectively publish collections of qualitatively derived data. The contributions we seek to make in this paper are as follows: (1) we describe ‘human-in-the-loop’ curation and the tools that facilitate this model at QDR; (2) we provide an in-depth discussion of the design and implementation of these tools, including applications specific to the Dataverse software repository, as well as standalone archiving tools written in R; and (3) we highlight the role of providing a service layer for data discovery and accessibility of qualitative data. Keywords: Data curation; open-source; qualitative data.

Education

“GitHub for Academics: The Open-Source Way to Host, Create and Curate Knowledge.” 2013. Impact of Social Sciences (blog). June 4, 2013. https://blogs.lse.ac.uk/impactofsocialsciences/2013/06/04/github-for-academics/.

  • Though originally developed as a way to share and merge software code, any type of file can be part of a GitHub repository, making it a great collaborative tool for academics, finds Kris Shaffer. Since any open-licensed project can be hosted on GitHub for free, it can function as a publishing platform, a peer-review system, a learning management tool, and a locus for intra- and inter-institutional collaboration.

Zagalsky, Alexey. 2015. “Why You Should Use GitHub: Lessons for the Classroom and Newsroom.” Storybench (blog). September 29, 2015. https://www.storybench.org/use-github-lessons-classroom-newsroom/.

  • My fellow researchers and I studied how and why educators use GitHub. In the first phase, we searched for resources (such as blog posts and discussion groups) that described the personal experiences of educators using GitHub to support learning or teaching. Next, we interviewed 15 educators that have used GitHub, including one of the blog authors from the previous phase. We were able to thoroughly investigate the usefulness and potential of GitHub in education. We then proceeded to interview John Britton, a representative from GitHub, in order to gain insights into GitHub’s perspective.

Videos

Git for Humans – Alice Bartlett at UX Brighton 2016. 2017. https://www.youtube.com/watch?v=eWxxfttcMts.

  • One of the best explanations of Git out there.

Cox, Laura. 2020. Vicky Steeves & Sarah Nguyen - Commit-ment Issues with Git: Investigating & Archiving y’alls Work. https://www.youtube.com/watch?v=T6zp1k9RxbE.

  • Git and platforms like GitLab and GitHub have revolutionized how people track and share their work. This reality brings librarians to an interesting crossroads as they move to understand, inventory, archive, and preserve source code and its contextual ephemera, such as commit messages, merge requests, and issue discussions. This talk from a project team of research scientists and librarians will share how Git breaks barriers and promotes open research and scholarship, as well as identify ways it can be archived for long-term citation and reproducibility. Part environmental scan, part behavioral study, we will discuss the ins and outs of a researcher’s Git hosting platform experience and digital preservation tools that can be creatively used in Git preservation efforts. From csv,conf,v5, a community conference for data makers everywhere, featuring stories about data sharing and data analysis from science, journalism, government, and open source. Held online, May 13-14, 2020. https://csvconf.com/

Wikle, Olivia, Evan Williamson, Kate Thornhill, and Gabriele Hayden. “Learn-STATIC: Innovative Digital Humanities Pedagogy with Static Web Technologies.” Online Northwest 2022. https://pdxscholar.library.pdx.edu/onlinenorthwest/2022/schedule/7/.

  • Static web technologies offer an exciting opportunity for Digital Humanities instructors to incorporate transferable digital literacy skills into the classroom, while producing low-cost, low-maintenance web projects that are sustainable even for institutions with limited resources. These tools spur students to learn transferable data management and digital literacy skills, and their open nature strengthens instructors' control over classroom web projects. This presentation will introduce the NEH-funded Learn-STATIC initiative, which aims to make static web technologies more accessible for students and instructors alike through open-source learning sequences that contain reusable code stored in GitHub repositories, example lesson plans, and documentation.

GitHub Best Practices

Neuburg, Matt. 2021. “Picturing Git: Conceptions and Misconceptions.” Bite Interactive. https://www.biteinteractive.com/picturing-git-conceptions-and-misconceptions/.

  • Recommended by Timothy Watters (lightly edited): "[This article] addresses problems with the way developers conceptualize how Git works under the hood. For example, he points out that although people often think of their Git repository as including all the files in the folder for their project, technically only the hidden .git folder inside that project folder is the repository. This is important because having a hidden .git folder inside your project folder actually gives Git some control over your project files. He points out that if you type git init from your terminal in a top-level folder on a Mac, you could potentially set yourself up to wipe out a big part of your hard drive unintentionally if you accidentally type the wrong command. He further shows that after you add and commit a file, that file becomes part of the Git repository, and the file you now see in your project folder is actually a copy of that committed file, which you can now edit. He uses the metaphor that Git is lending you back the files you gave it so you can work on them in an area called the working tree. Another helpful way of thinking about commits is that they are a photographic snapshot of your entire working tree. He points out that this is a better way of thinking about commits than thinking of them as changes to just one part of the project. He uses the analogy of an annual family reunion where there is a group photo. If one person grows a beard between reunions, you don't take a picture of him by himself next time. You always do a group photo. Even though this article was written for developers, I think the metaphors and commands demonstrating what is going on under the hood were very useful for casual Git beginners like myself."
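The "group photo" idea can be made concrete with Git's content addressing. In a default (SHA-1) repository, Git identifies a file's contents, called a blob, by the SHA-1 hash of a short header (`blob`, a space, the byte length, and a NUL byte) followed by the bytes themselves, which is what `git hash-object` computes. The toy model below treats each commit as a complete snapshot mapping every path to its blob ID. It shows why a full snapshot is cheap: an unchanged file keeps the same ID, so its content only needs to be stored once. The `snapshot` dictionaries are a simplification of our own, not Git's real tree and commit object formats.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def snapshot(working_tree: dict[str, bytes]) -> dict[str, str]:
    """Toy 'commit': record a blob ID for EVERY file, changed or not."""
    return {path: blob_id(data) for path, data in sorted(working_tree.items())}


if __name__ == "__main__":
    tree = {"README.md": b"# Project\n", "analysis.R": b"x <- 1\n"}
    first = snapshot(tree)
    tree["analysis.R"] = b"x <- 2\n"  # edit one file in the working tree
    second = snapshot(tree)
    print(first["README.md"] == second["README.md"])    # True: same blob reused
    print(first["analysis.R"] == second["analysis.R"])  # False: new blob
    print(blob_id(b"hello\n"))  # same ID as `echo hello | git hash-object --stdin`
```

Both snapshots list every file, like the annual group photo, yet only the edited file produces a new blob.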

Stannard, Aaron. 2019. “How to Use Github Professionally: Best Practices for Working with Github in Team Settings.” Petabridge. https://petabridge.com/blog/use-github-professionally/.

  • This is a blog post by a professional software developer, so some of the material is probably more advanced than a typical library example, but these insights can be helpful if you're participating in a software development project at your institution or through an open source community.

Tutorials

“Library Carpentry: Introduction to Git.” n.d. Accessed August 27, 2020. https://librarycarpentry.org/lc-git/.

  • Begin to understand and use Git/GitHub. You will not be an expert by the end of the class. You will probably not even feel very comfortable using Git. This is okay. We want to make a start but, as with any skill, using Git takes practice.

McGlone, Jonathan. n.d. “A Guide to Creating and Hosting a Personal Website on GitHub.” Accessed August 30, 2020. http://jmcglone.com/guides/github-pages/.

  • This guide is meant to help Git and GitHub beginners get up and running with GitHub Pages and Jekyll in an afternoon. It assumes you know very little about version control, Git, and GitHub. It is helpful if you know the basics of HTML and CSS since we'll be working directly with these languages. We'll also be using a little bit of Markdown, but by no means do you need to be an expert with any of these languages. The idea is to learn by doing, so the code we'll be implementing in this tutorial is available in this guide or can be downloaded entirely at this GitHub repo.

Diaz, Chris. 2020. “Introduction to Static Site Generators.” Static Web Publishing for Digital Scholarship. https://chrisdaaz.github.io/static-web-scholcomm/tutorials/static-site-generators/.

  • Covers the basics of using a static site generator. We will be using Hugo to build our demonstration site. We’ll play the role of a scholarly communications librarian. We’ll be using a command line terminal to install software and run commands and a text editor to edit and save plain text files. This in-depth tutorial is estimated to take between three and four hours to complete.

Visconti, Amanda, Brandon Walsh, and Scholars’ Lab Community. 2020. “Running a Collaborative Research Website and Blog with Jekyll and GitHub.” Programming Historian, November. https://programminghistorian.org/en/lessons/collaborative-blog-with-jekyll-github.

  • In this lesson you will be introduced to the challenges and opportunities that Jekyll, a popular static site generator, offers for publishing collaborative, ongoing research online.

Visconti, Amanda. 2016. “Building a Static Website with Jekyll and GitHub Pages.” Programming Historian, April. https://programminghistorian.org/en/lessons/building-static-sites-with-jekyll-github-pages.

  • This lesson will help you create an entirely free, easy-to-maintain, preservation-friendly, secure website over which you have full control, such as a scholarly blog, project website, or online portfolio.

“Workshops | Vanderbilt University Library Github Repository.” n.d. Accessed December 5, 2020. https://heardlibrary.github.io/workshops/tech/2016/01/22/github-ed-tech.html.

  • My goal for this session is to provide you with a brief introduction to GitHub. You will be introduced to the concept of source control and tools such as Git and GitHub. By the end of the session you will be able to create your own repository from scratch and work with repositories created by others.