@ChristinaLK · Created November 14, 2024 19:53
#!/bin/bash
# extract.sh: run a line-extraction command (e.g. head or tail) on a data
# file, writing the result to <datafile>.sub
# usage: ./extract.sh head 5 sample.trim.sub.fastq
type=$1
num=$2
datafile=$3
$type -n "$num" "$datafile" > "$datafile.sub"
[christina.koch@ap40 automation]$ cat simple.def
Bootstrap: docker
From: hub.opensciencegrid.org/htc/rocky:9
%files
extract.sh /opt
%environment
export PATH=/opt/:$PATH
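With Apptainer, the image can be built from this definition file with something like apptainer build extract.sif simple.def and then uploaded to the OSDF. extract.py then submits one job per sample using that image: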
import htcondor
import classad

input_list = ["SRR2584866_2", "SRR2589044_1", "SRR2589044_2", "SRR2584866_1", "SRR2584863_1", "SRR2584863_2"]

itemdata = []
for samplename in input_list:
    itemdata.append({"sample": samplename})

function = 'head'
num = 5

upload_bucket = "osdf:///ospool/ap40/data/christina.koch/output-buckets"
singularity_image = "osdf:///ospool/ap40/data/christina.koch/singularity_imgs/extract.sif"
input_bucket = "osdf:///ospool/uc-shared/public/osg-training/tutorial-fastqc/data"

job = htcondor.Submit({
    "container_image": singularity_image,
    "universe": "container",
    "executable": "/opt/extract.sh",
    "transfer_executable": "False",
    "should_transfer_files": "True",
    "transfer_input_files": f"{input_bucket}/$(sample).trim.sub.fastq",
    "arguments": f"{function} {num} $(sample).trim.sub.fastq",
    "transfer_output_remaps": f'"$(sample).trim.sub.fastq.sub={upload_bucket}/$(ClusterID)/$(sample).trim.sub.fastqc.sub"',
    "output": "logs/extract-$(ProcId).out",  # output and error for each job, using the $(ProcId) macro
    "error": "logs/extract-$(ProcId).err",
    "log": "logs/extract.log",  # we still send all of the HTCondor logs for every job to the same file (not split up!)
    "request_cpus": "1",
    "request_memory": "1GB",
    "request_disk": "1GB",
})

schedd = htcondor.Schedd()
submit_result = schedd.submit(job, itemdata=iter(itemdata))  # submit one job for each item in the itemdata
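One way to keep an eye on the submitted jobs is to query the schedd using the ClusterId from the returned SubmitResult; a minimal sketch:

cluster_id = submit_result.cluster()  # the ClusterId assigned to this batch of jobs

# JobStatus is an integer code: 1 = Idle, 2 = Running, 4 = Completed
for ad in schedd.query(
    constraint=f"ClusterId == {cluster_id}",
    projection=["ProcId", "JobStatus"],
):
    print(ad["ProcId"], ad["JobStatus"])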
@ChristinaLK:

container.def and extract.sh are used to build a container, which can be added to the OSDF.

extract.py takes a list of input data and submits a job (using the container/script) for each item, putting the results back into an OSDF location.

@ChristinaLK:

Slack transcript:

collector = htcondor2.Collector("remote-pool-cm.host.tld")
location = collector.locate(htcondor2.DaemonType.Schedd, "name-of-schedd-if-more-than-one")
schedd = htcondor2.Schedd(location)
  • You do not need anything installed beyond the bindings, but you may need a Condor config file to set authentication-related settings (or set them directly in your Python script).
  • (... I don't remember what locate() returns if it fails to find anything; if it's None you may need to check for that explicitly to avoid accidentally submitting, or trying to submit, to the local schedd instead. A defensive check is sketched after this list.)
  • But if you have an IDToken in the usual location, that will work without any config files.
  • …or do the bindings complain if the default config file doesn’t exist?
  • The bindings may well complain, but I think they work anyway.
  • They spit out a warning if they don't see a config file.
  • A blank file in the right spot, or even pointing the environment at /dev/null, is enough to get it to not emit that warning.
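
Putting the transcript together, a minimal sketch of remote submission; the central manager hostname and schedd name are the placeholders from the snippet above, and it assumes a valid IDToken is already in the usual location:

import htcondor2

# The bindings warn if no config file is found; a blank file in the usual
# spot (or pointing CONDOR_CONFIG at /dev/null) silences the warning.

collector = htcondor2.Collector("remote-pool-cm.host.tld")
location = collector.locate(htcondor2.DaemonType.Schedd, "name-of-schedd-if-more-than-one")

# The transcript wasn't sure what locate() returns on failure; checking for
# an empty result guards against silently falling back to the local schedd.
if not location:
    raise RuntimeError("could not locate the remote schedd")

schedd = htcondor2.Schedd(location)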
