@tsibley
Last active November 12, 2024 18:29
"""
For Cécile, re: <https://bedfordlab.slack.com/archives/C0K3GS3J8/p1731108015913239>.

The rule of thumb is: if you need to read a file output by rule A in order to
determine what rule B will declare as input, then rule A needs to be a
checkpoint and rule B needs to use an input function that reads that file.

In this example, the intermediate rule is unaware of both the checkpoint and
the input function: it's a basic rule that uses two wildcards. The final rule
uses an input function to declare its input files. Those file names "pass
through" the global {scenario} wildcard, leaving it implicit to be inferred,
but give explicit values for the {clade} wildcard used by the intermediate rule.
"""

SCENARIOS = ["A", "B", "C"]

rule all:
    input:
        output = expand("final_output_{scenario}.tsv", scenario = SCENARIOS)

# Checkpoint: its output file is read later to decide another rule's inputs.
checkpoint create_clade_list:
    output:
        clade_list = "clade_list_{scenario}.txt"
    shell: r"""
        printf clade{wildcards.scenario:q}'%s\n' 1 2 3 > {output.clade_list}
    """

# Plain rule with two wildcards; it knows nothing about the checkpoint.
rule intermediate_output_scenario:
    output: "intermediate_output_scenario_{scenario}_{clade}.tsv"
    shell: r"""
        printf {wildcards.clade:q}'_intermediate%d\n' 1 2 3 > {output:q}
    """

# Input function: reads the checkpoint's clade list for this scenario and
# returns one intermediate file per clade. The doubled braces keep {scenario}
# as a wildcard rather than expanding it here.
def final_output_input(w):
    with checkpoints.create_clade_list.get(**w).output.clade_list.open() as f:
        clades = f.read().splitlines()
    return expand("intermediate_output_scenario_{{scenario}}_{clade}.tsv", clade=clades)

rule final_output:
    input: final_output_input
    output: "final_output_{scenario}.tsv"
    shell: r"""
        cat {input:q} > {output:q}
    """