<?xml version='1.0' encoding='UTF-8'?>
<doi_batch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.crossref.org/schema/4.4.1" xsi:schemaLocation="http://www.crossref.org/schema/4.4.1 http://www.crossref.org/schema/deposit/crossref4.4.1.xsd" version="4.4.1">
  <head>
    <doi_batch_id>1646395517</doi_batch_id>
    <timestamp>1646395517</timestamp>
    <depositor>
      <depositor_name>Matt Post</depositor_name>
      <email_address>[email protected]</email_address>
    </depositor>
    <registrant>Association for Computational Linguistics</registrant>
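The deposit header above follows the Crossref 4.4.1 schema; in this sample, `doi_batch_id` and `timestamp` hold the same Unix time (1646395517). The script that produced it is not shown, so the following is only a minimal sketch of building the same `<head>` with Python's standard `xml.etree.ElementTree`. The names `make_head` and `NS` and the function's parameters are hypothetical.

```python
import time
import xml.etree.ElementTree as ET

# Crossref deposit schema namespace, copied from the sample header.
NS = "http://www.crossref.org/schema/4.4.1"


def qualified(tag):
    """Return a tag name in ElementTree's {namespace}tag notation."""
    return f"{{{NS}}}{tag}"


def make_head(depositor_name, email, registrant, now=None):
    """Build a Crossref <head> element (hypothetical helper).

    As in the sample, the batch ID and the timestamp are both the
    current Unix time, unless `now` is given explicitly."""
    stamp = str(int(time.time() if now is None else now))
    head = ET.Element(qualified("head"))
    ET.SubElement(head, qualified("doi_batch_id")).text = stamp
    ET.SubElement(head, qualified("timestamp")).text = stamp
    depositor = ET.SubElement(head, qualified("depositor"))
    ET.SubElement(depositor, qualified("depositor_name")).text = depositor_name
    ET.SubElement(depositor, qualified("email_address")).text = email
    ET.SubElement(head, qualified("registrant")).text = registrant
    return head
```

To serialize with the Crossref namespace as the default namespace, calling `ET.register_namespace("", NS)` before `ET.tostring(head, encoding="unicode")` should give unprefixed tags like those in the sample.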
#!/usr/bin/env python3
"""Uses the Semantic Scholar API to get citation counts for all papers in
an ACL volume. Assumes old-style IDs (e.g., P96-1).
Mad props to Semantic Scholar for making this so easy.
"""
import json
import os
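The full script is not shown, so here is a minimal sketch of the idea, assuming the Semantic Scholar Graph API's support for ACL Anthology IDs via the `ACL:` prefix. It also assumes that papers in an old-style volume such as `P96-1` are numbered `P96-1001`, `P96-1002`, and so on. The `num_papers` parameter and all function names are hypothetical; the original script may find the volume's papers differently.

```python
import json
import urllib.request

# Semantic Scholar Graph API; the "ACL:" prefix looks papers up by ACL Anthology ID.
API_URL = "https://api.semanticscholar.org/graph/v1/paper/ACL:{}?fields=title,citationCount"


def paper_ids(volume_id, num_papers):
    """Expand an old-style volume ID such as "P96-1" into paper IDs
    "P96-1001", "P96-1002", ... (paper numbers zero-padded to 3 digits)."""
    return [f"{volume_id}{n:03d}" for n in range(1, num_papers + 1)]


def citation_count(paper_id):
    """Fetch one paper's citation count (makes a network request;
    raises urllib.error.HTTPError if the paper is unknown)."""
    with urllib.request.urlopen(API_URL.format(paper_id)) as response:
        record = json.load(response)
    return record.get("citationCount")


def volume_citations(volume_id, num_papers):
    """Map each paper ID in the volume to its citation count."""
    return {pid: citation_count(pid) for pid in paper_ids(volume_id, num_papers)}
```

The API is rate-limited, so a real run over a large volume would likely need to pause between requests.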
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright 2019--2021 Matt Post <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#!/usr/bin/env python3
import sys

from sacremoses.normalize import MosesPunctNormalizer


def main(args):
    normalizer = MosesPunctNormalizer(lang=args.lang, penn=args.penn)
    for line in sys.stdin:
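The preview stops inside the loop and never shows where `args.lang` and `args.penn` come from. One plausible completion is sketched below, assuming an `argparse` front end. The flag names `--lang` and `--penn` and the helper `build_parser` are hypothetical. `MosesPunctNormalizer(lang=..., penn=...)` and its `normalize()` method are part of sacremoses.

```python
import argparse
import sys


def build_parser():
    """Hypothetical command-line interface supplying args.lang and args.penn."""
    parser = argparse.ArgumentParser(description="Moses punctuation normalization of STDIN")
    parser.add_argument("--lang", "-l", default="en", help="language code for the normalizer")
    parser.add_argument("--penn", action="store_true", help="apply Penn Treebank-style substitutions")
    return parser


def main(args):
    # Imported here so the parser can be built without sacremoses installed.
    from sacremoses.normalize import MosesPunctNormalizer

    normalizer = MosesPunctNormalizer(lang=args.lang, penn=args.penn)
    for line in sys.stdin:
        # Normalize each line without its newline, then print it back out.
        print(normalizer.normalize(line.rstrip("\n")))
```

A script would end with `main(build_parser().parse_args())`. Note that sacremoses itself defaults to `penn=True`, while this sketch makes Penn substitutions opt-in.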
#!/usr/bin/env python3
import sys

import sacremoses


def main(args):
    """Tokenizes, preserving tabs"""
    mt = sacremoses.MosesTokenizer(lang=args.lang)

    def tok(s):
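The preview ends at `def tok(s):`. The point of "preserving tabs" is that Moses-style tokenization normalizes whitespace, so a tab separating fields (for example, source and target in a parallel corpus) would otherwise turn into a space. A minimal sketch, assuming each tab-separated field is tokenized on its own, follows. `tokenize_preserving_tabs` is a hypothetical helper, and `escape=False` is my assumption, because sacremoses escapes characters such as `&` and `<` by default.

```python
import sys


def tokenize_preserving_tabs(line, tok):
    """Apply `tok` to each tab-separated field separately and rejoin
    the results with tabs, so the field structure survives."""
    return "\t".join(tok(field) for field in line.rstrip("\n").split("\t"))


def main(lang="en"):
    # Imported lazily; requires `pip install sacremoses`.
    import sacremoses

    mt = sacremoses.MosesTokenizer(lang=lang)

    def tok(s):
        # return_str=True gives a space-joined string rather than a token list.
        return mt.tokenize(s, return_str=True, escape=False)

    for line in sys.stdin:
        print(tokenize_preserving_tabs(line, tok))
```

Passing the tokenizer in as a function keeps the tab handling independent of sacremoses, which makes it easy to check with a stand-in tokenizer.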
#!/usr/bin/env python3
"""
This is code to take a trained Fairseq model and discard the Adam optimizer state,
which is not needed at test time. This can reduce the size of a model by ~70%.
Original author: Brian Thompson
"""
from fairseq import checkpoint_utils
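The rest of the script is not shown. A minimal sketch of the idea using plain PyTorch in place of fairseq's `checkpoint_utils` follows. It assumes fairseq's checkpoint layout, in which the optimizer's state (for Adam, the per-parameter moment estimates, which is why it is so large) is stored under the `"last_optimizer_state"` key. The helper names are hypothetical. Passing `weights_only=False` requires PyTorch 1.13 or newer.

```python
def strip_optimizer_state(state):
    """Return a shallow copy of a checkpoint dict without the optimizer state.

    Assumes fairseq's layout, where Adam's moment estimates live under
    "last_optimizer_state"; the model weights ("model") are kept."""
    return {key: value for key, value in state.items() if key != "last_optimizer_state"}


def strip_checkpoint(in_path, out_path):
    """Load a checkpoint on CPU, drop the optimizer state, and save the rest."""
    import torch  # imported lazily so the helper above has no dependencies

    # weights_only=False: fairseq checkpoints also pickle configuration
    # objects, which recent PyTorch versions refuse to load by default.
    state = torch.load(in_path, map_location="cpu", weights_only=False)
    torch.save(strip_optimizer_state(state), out_path)
```

A stripped checkpoint should still be usable for inference but not for resuming training, because the optimizer would have to start from scratch.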
# I can never remember the syntax for GNU parallel
## Treat STDIN as a pool of commands and run each one, with at most 10 (-j 10) running at a time
cat commands.txt | parallel -j 10
## Download a long list of files in parallel
cat files.txt | parallel -j 10 wget -q {}
## Start 10 parallel instances of COMMAND with FLAGS. Feed STDIN to them in 10 MB blocks (--block-size 10m). Assemble the outputs in input order (-k).
cat large_input.txt | parallel -j 10 --pipe -k --block-size 10m COMMAND FLAGS > output.txt
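The last recipe combines three ideas: split the input into blocks at line boundaries (`--pipe`), process the blocks concurrently (`-j`), and write the results back in input order (`-k`). The following is a rough Python analogy of that behavior, not a reimplementation of GNU parallel. `blocks`, `pipe_ordered`, and the default block size of roughly 10 MB are my own choices. `Executor.map` returns results in submission order, which plays the role of `-k`.

```python
from concurrent.futures import ThreadPoolExecutor


def blocks(lines, block_size):
    """Group lines into blocks of at least block_size bytes (except the last),
    splitting only at line boundaries, as --pipe does by default."""
    block, size = [], 0
    for line in lines:
        block.append(line)
        size += len(line.encode("utf-8"))
        if size >= block_size:
            yield "".join(block)
            block, size = [], 0
    if block:
        yield "".join(block)


def pipe_ordered(lines, func, jobs=10, block_size=10_000_000):
    """Run func on each block with up to `jobs` workers (like -j) and
    concatenate the results in input order (like -k)."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return "".join(pool.map(func, blocks(lines, block_size)))
```

To run an external COMMAND as parallel does, `func` could call `subprocess.run([...], input=block, capture_output=True, text=True).stdout`. Threads are enough there because the real work happens in the child processes. Note that `pool.map` submits every block up front, so unlike parallel this sketch does not bound memory use.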
#!/usr/bin/env python
"""
Looks at all the *.ics files in the current directory, removes the X- keys,
and generates a new UUID. This is used for restoring an accidentally deleted
calendar in Apple's Calendar program; it is a rewrite of the node.js version
that is linked to from here:
http://fokkezb.nl/2015/01/13/how-to-restore-a-deleted-icloud-calendar/
"""
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python *sucks* at UTF-8 (don't tell me "It's fixed in Python 3"; I don't care, plus no one uses Python 3)
# If you put this at the top of every Python script, however, it gets rid of most of the headaches dealing with STDIN
# and STDOUT (basically, akin to "perl -C31"). I don't know if it's all necessary; I just know that if I put it at
# the top of my scripts, most of the problems go away, and I can stop thinking about it.
import sys
import codecs
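The preview stops after the imports, and the header targets Python 2. For anyone who does end up on Python 3, here is a minimal sketch of the same "make the standard streams UTF-8" idea. It uses `TextIOWrapper.reconfigure()` (Python 3.7+) instead of `codecs` wrappers. The function names are hypothetical. Python 3.7+ also has a UTF-8 mode (`python3 -X utf8` or `PYTHONUTF8=1`) that has a similar effect without any code.

```python
import io
import sys


def use_utf8_stdio():
    """Switch stdin/stdout/stderr to UTF-8 in place (Python 3.7+).

    Call this before reading anything: the encoding of stdin cannot be
    changed once data has been read from it."""
    for stream in (sys.stdin, sys.stdout, sys.stderr):
        if hasattr(stream, "reconfigure"):  # skip replaced, non-standard streams
            stream.reconfigure(encoding="utf-8")


def utf8_reader(binary_stream):
    """Wrap a binary stream (e.g. sys.stdin.buffer) as UTF-8 text."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")
```

This is the rough counterpart of the Python 2 idiom of wrapping `sys.stdout` with `codecs.getwriter("utf-8")`. Because Python 3 streams are already text, they are reconfigured rather than wrapped.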