
@brantfaircloth
Last active December 16, 2015 21:09
Steps used for data analysis in Faircloth et al. 2013 "A phylogenomic perspective on the radiation of ray-finned fishes based upon targeted sequencing of ultraconserved elements (UCEs)"

Assembly

  • download files from S3 using 3Hub

  • unzip respective files (started w/ Bin001.zip)

  • run process_reads.py (now part of https://github.com/faircloth-lab/illumiprocessor/):

    python ~/git/brant/seqcap/Assembly/process_reads.py
    
  • run velvetoptimiser:

    VelvetOptimiser --s 55 -e 71 --f '-short -fastq ../n-less/Hexanchus-griseus.fastq' \
        --v --t 4
    
  • deviations (a narrower hash-length range than normal was used for Hexanchus griseus):

    VelvetOptimiser --s 55 -e 71 --f '-short -fastq ../n-less/Hexanchus-griseus.fastq' \
        --v --t 4
    
  • after assembly, symlink all assemblies into fish-data/contigs and:

    python ~/Git/brant/seqcap/Assembly/match_contigs_to_probes.py \
        /Volumes/Data3/fish-seqcap/contig \
        /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.bufferTo180.probes.fa \
        /Volumes/Data3/fish-seqcap/lastz
    
  • finish assemblies by hand across all taxa in data set (this is now automated by assemblo.py)

Preparing Table Data

  • for all data in /Volumes/Data3/fish/fish-seqcap/Bin_00*/, compute contig coverage using AMOS:

    ~/Source/amos-3.0.0/src/Bank/bank-transact -m velvet_asm.afg -b velvet_asm.bnk -c
    ~/Source/amos-3.0.0/src/Validation/analyze-read-depth velvet_asm.bnk -d
    
  • get the number of non-duplicate contigs for each taxon from /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite:

    select sum(acanthurus_japonicus), sum(acipenser_fulvescens), sum(amia_calva),
    sum(anchoa_compressa), sum(antennarius_striatus), sum(astyanax_fasciatus), sum(carcharodon_carcharias),
    sum(danio_rerio), sum(diaphus_theta), sum(heterodontus_francisci), sum(hexanchus_griseus),
    sum(lampris_guttatus), sum(lepisosteus_sp), sum(megalops_sp), sum(myliobatis_californica),
    sum(osteoglossum_bicirrhosum), sum(pantodon_buchholzi), sum(polypterus_senegalus), sum(protopterus_annectens),
    sum(salvelinus_fontinalis), sum(sphyrna_mokarran), sum(strophidon_sathete), sum(taenianotus_triacanthus),
    sum(umbra_limi) from matches;
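
Rather than typing every taxon column by hand, the SUM query above can be generated from the table schema. This is a minimal sketch using Python's standard sqlite3 module; it assumes the matches table holds one key column (called uce here, a hypothetical name) plus one 0/1 column per taxon. The demo uses a tiny in-memory table, not the real database.

```python
import sqlite3


def taxon_columns(conn, table="matches", key="uce"):
    """Return all column names of `table` except the key column.

    The key column name ("uce") is an assumption about the schema.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows if row[1] != key]


def count_matches_per_taxon(conn, table="matches", key="uce"):
    """Sum every taxon's 0/1 column, like the hand-written SELECT above."""
    cols = taxon_columns(conn, table, key)
    sums = ", ".join(f'sum("{col}")' for col in cols)
    row = conn.execute(f"SELECT {sums} FROM {table}").fetchone()
    return dict(zip(cols, row))


if __name__ == "__main__":
    # Tiny in-memory stand-in for probe.matches.sqlite.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE matches (uce TEXT, amia_calva INT, danio_rerio INT)")
    demo.executemany(
        "INSERT INTO matches VALUES (?, ?, ?)",
        [("uce-1", 1, 1), ("uce-2", 1, 0), ("uce-3", 0, 1)],
    )
    print(count_matches_per_taxon(demo))
```

Against the real file this would be called as count_matches_per_taxon(sqlite3.connect("probe.matches.sqlite")), provided the schema assumption holds.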
    
  • write out the names of the contigs that matched UCE loci, which are used below to compute coverage across enriched contigs. In /Volumes/Data3/fish/fish-seqcap/lastz (repeat the query for each taxon):

    mkdir loci
    sqlite3 probe.matches.sqlite
    sqlite> .output ./loci/salvelinus_fontinalis.loci
    sqlite> select salvelinus_fontinalis from match_map where salvelinus_fontinalis is not null;
    sqlite> .output stdout
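
The interactive session above handles one taxon at a time. A sketch that writes a .loci file for every taxon column in one pass, assuming match_map has a key column (hypothetically named uce) plus one column per taxon whose value is a contig name or NULL:

```python
import os
import sqlite3
import tempfile


def write_loci_files(conn, outdir, table="match_map", key="uce"):
    """Write one <taxon>.loci file per taxon column of `table`.

    Each file lists the non-NULL values (assumed to be contig names) in that
    taxon's column. The key column name "uce" is an assumption.
    """
    os.makedirs(outdir, exist_ok=True)
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    taxa = [row[1] for row in info if row[1] != key]
    counts = {}
    for taxon in taxa:
        rows = conn.execute(
            f'SELECT "{taxon}" FROM {table} WHERE "{taxon}" IS NOT NULL'
        ).fetchall()
        with open(os.path.join(outdir, f"{taxon}.loci"), "w") as handle:
            for (contig,) in rows:
                handle.write(f"{contig}\n")
        counts[taxon] = len(rows)
    return counts


if __name__ == "__main__":
    # Demo on an in-memory stand-in for probe.matches.sqlite.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE match_map (uce TEXT, salvelinus_fontinalis TEXT)")
    demo.executemany(
        "INSERT INTO match_map VALUES (?, ?)", [("uce-1", "node_1"), ("uce-2", None)]
    )
    with tempfile.TemporaryDirectory() as outdir:
        print(write_loci_files(demo, outdir))
```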
    
  • in /Volumes/Data3/fish/fish-seqcap/:

    mkdir fish-uce-coverage
    mv /Volumes/Data3/fish/fish-seqcap/lastz/loci/* fish-uce-coverage
    
  • convert files to iid format:

    ~/Source/amos-3.0.0/src/Utils/cvgStat -b ../Bin_001/assembly/umbra-limi/velvet_asm.bnk > umbra_limi.cvg
    
    python get_iids_from_cvg_stat.py \
        umbra_limi.cvg umbra_limi.loci --output umbra-limi.iid
    
    ~/Source/amos-3.0.0/src/Validation/analyze-read-depth ../Bin_001/assembly/umbra-limi/velvet_asm.bnk -d -I umbra-limi.iid
    
  • get read stats for UCE contigs:

    python ~/git/brant/phyluce/bin/assembly/get_contig_lengths_for_all_uce_loci.py \
        /Volumes/Data3/fish/fish-seqcap/contig \
        /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
        /Users/bcf/Dropbox/Research/alfaro/lab/illumina-run/match-count.config \
        'wag plus'
    

Rebuild data sets for genome-enabled species

5 March 2012

  • lastz probes to themselves:

    python /Users/bcf/Git/brant/phyluce/bin/share/easy_lastz.py \
        --target /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.fa \
        --query /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.fa \
        --identity 85 --output /Users/bcf/Git/brant/seqcap/Non-repo/Manuscripts/Fish/Simulation/finalConserved.bufferTo180.probes.fa
    
  • generate lastz alignments of probes to genomes (lepOcu1, gadMor1, latCha1, and dicLab1, plus the remaining genome-enabled species):

    for i in lepOcu1 gadMor1 latCha1 dicLab1 danRer6 fr2 gasAcu1 hapBur1 neoBri1 oreNil1 oryLat2 punNye1 tetNig2; do
        mkdir -p /Volumes/Data/Genomes/Fish/$i/lastz/
        python /Users/bcf/Git/brant/phyluce/bin/share/run_lastz.py \
            --target /Volumes/Data/Genomes/Fish/$i/$i.2bit \
            --query /Users/bcf/Git/brant/seqcap/Non-repo/Manuscripts/Fish/Simulation/finalConserved.bufferTo180.probes.2bit \
            --output /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
            --nprocs 7 --identity=80 --coverage=83 --huge
    done
    
  • in /Users/bcf/Working/fish-phylogeny/fake:

    mkdir fasta contig lastz
    
  • generate "fake" velvet contigs for the genome-enabled organisms (lepOcu1, gadMor1, and latCha1, plus those we already had; dicLab1 is not included in this loop):

    for i in danRer6 fr2 gasAcu1 hapBur1 neoBri1 oreNil1 oryLat2 punNye1 tetNig2 lepOcu1 gadMor1 latCha1; do
    python /Users/bcf/Git/brant/phyluce/bin/share/get_fake_velvet_contigs_from_genomes.py  \
        /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --fasta /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-$i-500.fasta \
        --flank 500 --fish;
    done
    
  • test the same step on a single genome (gadMor1):

    for i in gadMor1; do
    python /Users/bcf/Git/brant/phyluce/bin/share/get_fake_velvet_contigs_from_genomes.py  \
        /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --fasta /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-$i-500.fasta \
        --flank 500 --fish;
    done
    
  • create a contig folder somewhere with symlinks to these fasta files, using appropriate taxon names. For instance:

    # /Users/bcf/Working/fish-phylogeny/
    
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-danRer6-500.fasta danio-rerio.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-fr2-500.fasta takifugu-rupribes.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-gasAcu1-500.fasta gasterosteus-aculeatus.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-hapBur1-500.fasta haplochromis-burtoni.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-neoBri1-500.fasta neolamprologus-brichardi.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-oreNil1-500.fasta oreochromis-niloticus.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-oryLat2-500.fasta oryzias-latipes.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-punNye1-500.fasta pundamilia-nyererei.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-tetNig2-500.fasta tetraodon-nigroviridis.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-lepOcu1-500.fasta lepisosteus-oculatus.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-gadMor1-500.fasta gadus-morhua.contigs.fasta
    ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-latCha1-500.fasta latimeria-chalumnae.contigs.fasta
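
The symlinks above follow a fixed pattern (UCSC assembly code to taxon name), so they can be generated from a mapping. A sketch using only the standard library; the mapping is copied from the ln -s commands, while the function name and the uniform .contigs.fasta suffix are illustrative choices (the first two links above omit .contigs, and the log below suggests either form yields the same taxon name, though that is an inference).

```python
import os
import tempfile

# UCSC assembly code -> taxon name used for the symlink. Names, including the
# "takifugu-rupribes" spelling that propagates into later column names, are
# copied from the ln -s commands above.
GENOMES = {
    "danRer6": "danio-rerio",
    "fr2": "takifugu-rupribes",
    "gasAcu1": "gasterosteus-aculeatus",
    "hapBur1": "haplochromis-burtoni",
    "neoBri1": "neolamprologus-brichardi",
    "oreNil1": "oreochromis-niloticus",
    "oryLat2": "oryzias-latipes",
    "punNye1": "pundamilia-nyererei",
    "tetNig2": "tetraodon-nigroviridis",
    "lepOcu1": "lepisosteus-oculatus",
    "gadMor1": "gadus-morhua",
    "latCha1": "latimeria-chalumnae",
}


def link_fake_contigs(fasta_dir, contig_dir, genomes=GENOMES, flank=500):
    """Symlink each fake-velvet fasta into contig_dir under its taxon name."""
    os.makedirs(contig_dir, exist_ok=True)
    links = []
    for code, taxon in genomes.items():
        src = os.path.join(fasta_dir, f"fake-velvet-finalConserved-{code}-{flank}.fasta")
        dst = os.path.join(contig_dir, f"{taxon}.contigs.fasta")
        if os.path.lexists(dst):
            os.remove(dst)  # replace a stale link instead of failing
        os.symlink(os.path.abspath(src), dst)
        links.append(dst)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        made = link_fake_contigs(os.path.join(tmp, "fasta"), os.path.join(tmp, "contig"))
        print(len(made))
```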
    
  • re-match the probes to the "fake" velvet contigs:

    python /Users/bcf/Git/brant/phyluce/bin/assembly/match_contigs_to_probes.py \
        /Users/bcf/Working/fish-phylogeny/fake/contig/ \
        /Users/bcf/Working/fish-phylogeny/finalConserved.fa \
        /Users/bcf/Working/fish-phylogeny/fake/lastz/ \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --identity=80 --coverage=67
    
    Processing:
     Getting dupes
     danio_rerio: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 2 dupe probe matches, 12 dupe node matches
     gadus_morhua: 373 (88.60%) uniques of 421 contigs, 0 dupe probes, 1 dupe probe matches, 10 dupe node matches
     gasterosteus_aculeatus: 468 (97.30%) uniques of 481 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
     haplochromis_burtoni: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     latimeria_chalumnae: 272 (95.44%) uniques of 285 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
     lepisosteus_oculatus: 410 (96.93%) uniques of 423 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
     neolamprologus_brichardi: 456 (97.02%) uniques of 470 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     oreochromis_niloticus: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     oryzias_latipes: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     pundamilia_nyererei: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     takifugu_rupribes: 469 (97.10%) uniques of 483 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
     tetraodon_nigroviridis: 471 (97.52%) uniques of 483 contigs, 0 dupe probes, 0 dupe probe matches, 12 dupe node matches
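
In each summary line, the percentage is the number of unique contigs divided by the total number of contigs (for example, 457 / 471 is about 97.03%). To tabulate these summaries, a small regex parser is enough. This sketch is written against the log format shown above and is not part of phyluce.

```python
import re

# One summary line per taxon, e.g.
#  danio_rerio: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 2 dupe probe matches, 12 dupe node matches
SUMMARY = re.compile(
    r"^\s*(?P<taxon>\w+): (?P<uniques>\d+) \((?P<pct>[\d.]+)%\) uniques of "
    r"(?P<contigs>\d+) contigs, (?P<dupe_probes>\d+) dupe probes, "
    r"(?P<dupe_probe_matches>\d+) dupe probe matches, "
    r"(?P<dupe_node_matches>\d+) dupe node matches"
)


def parse_match_summary(text):
    """Return {taxon: {field: value}} for every summary line in `text`."""
    table = {}
    for line in text.splitlines():
        m = SUMMARY.match(line)
        if not m:
            continue  # skips "Processing:" and "Getting dupes"
        fields = m.groupdict()
        taxon = fields.pop("taxon")
        pct = float(fields.pop("pct"))
        table[taxon] = {key: int(value) for key, value in fields.items()}
        table[taxon]["pct"] = pct
    return table


if __name__ == "__main__":
    log = """Processing:
 Getting dupes
 danio_rerio: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 2 dupe probe matches, 12 dupe node matches
 gadus_morhua: 373 (88.60%) uniques of 421 contigs, 0 dupe probes, 1 dupe probe matches, 10 dupe node matches"""
    for taxon, row in parse_match_summary(log).items():
        # The reported percentage is uniques / contigs, rounded to 2 decimals.
        print(taxon, row["uniques"], row["contigs"], round(100 * row["uniques"] / row["contigs"], 2))
```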
    
  • rename that file so the genome-derived matches can be passed separately (via --extend) when building data sets [TODO: add rename to code]:

    mv /Users/bcf/Working/fish-phylogeny/fake/lastz/probe.matches.sqlite \
        /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite
    
  • because we changed parameters in the matching script, re-run match_contigs_to_probes.py on the enriched contigs:

    python /Users/bcf/Git/brant/phyluce/bin/assembly/match_contigs_to_probes.py \
        /Volumes/Data3/fish/fish-seqcap/contig \
        /Users/bcf/Working/fish-phylogeny/finalConserved.fa \
        /Volumes/Data3/fish/fish-seqcap/lastz \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --identity=80 --coverage=67
    
    Processing:
     Getting dupes
     acanthurus_japonicus: 448 (73.08%) uniques of 613 contigs, 0 dupe probes, 4 dupe probe matches, 56 dupe node matches
     acipenser_fulvescens: 172 (29.81%) uniques of 577 contigs, 0 dupe probes, 2 dupe probe matches, 258 dupe node matches
     amia_calva: 360 (64.06%) uniques of 562 contigs, 0 dupe probes, 3 dupe probe matches, 72 dupe node matches
     anchoa_compressa: 292 (54.78%) uniques of 533 contigs, 0 dupe probes, 1 dupe probe matches, 83 dupe node matches
     antennarius_striatus: 415 (87.55%) uniques of 474 contigs, 0 dupe probes, 4 dupe probe matches, 22 dupe node matches
     astyanax_fasciatus: 355 (65.38%) uniques of 543 contigs, 0 dupe probes, 2 dupe probe matches, 61 dupe node matches
     carcharodon_carcharias: 148 (51.93%) uniques of 285 contigs, 0 dupe probes, 1 dupe probe matches, 20 dupe node matches
     danio_rerio: 381 (73.55%) uniques of 518 contigs, 0 dupe probes, 3 dupe probe matches, 53 dupe node matches
     diaphus_theta: 399 (68.32%) uniques of 584 contigs, 0 dupe probes, 3 dupe probe matches, 29 dupe node matches
     heterodontus_francisci: 137 (54.37%) uniques of 252 contigs, 0 dupe probes, 0 dupe probe matches, 18 dupe node matches
     hexanchus_griseus: 146 (24.75%) uniques of 590 contigs, 0 dupe probes, 1 dupe probe matches, 17 dupe node matches
     lampris_guttatus: 412 (84.77%) uniques of 486 contigs, 0 dupe probes, 2 dupe probe matches, 29 dupe node matches
     lepisosteus_sp: 375 (55.56%) uniques of 675 contigs, 0 dupe probes, 3 dupe probe matches, 88 dupe node matches
     megalops_sp: 243 (30.92%) uniques of 786 contigs, 0 dupe probes, 1 dupe probe matches, 373 dupe node matches
     myliobatis_californica: 148 (53.62%) uniques of 276 contigs, 0 dupe probes, 1 dupe probe matches, 24 dupe node matches
     osteoglossum_bicirrhosum: 270 (41.99%) uniques of 643 contigs, 0 dupe probes, 1 dupe probe matches, 250 dupe node matches
     pantodon_buchholzi: 272 (58.37%) uniques of 466 contigs, 0 dupe probes, 2 dupe probe matches, 124 dupe node matches
     polypterus_senegalus: 299 (51.91%) uniques of 576 contigs, 0 dupe probes, 5 dupe probe matches, 20 dupe node matches
     protopterus_annectens: 70 (9.40%) uniques of 745 contigs, 0 dupe probes, 1 dupe probe matches, 0 dupe node matches
     salvelinus_fontinalis: 159 (14.22%) uniques of 1118 contigs, 0 dupe probes, 1 dupe probe matches, 636 dupe node matches
     sphyrna_mokarran: 156 (40.00%) uniques of 390 contigs, 0 dupe probes, 1 dupe probe matches, 21 dupe node matches
     strophidon_sathete: 279 (27.71%) uniques of 1007 contigs, 0 dupe probes, 3 dupe probe matches, 158 dupe node matches
     taenianotus_triacanthus: 442 (62.08%) uniques of 712 contigs, 0 dupe probes, 3 dupe probe matches, 69 dupe node matches
     umbra_limi: 404 (36.43%) uniques of 1109 contigs, 0 dupe probes, 2 dupe probe matches, 76 dupe node matches
    

Create data sets

Missing data

  • then, build the data set that allows missing data:

    python ~/Git/brant/phyluce/bin/assembly/get_match_counts.py \
        /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
        /Users/bcf/Working/fish-phylogeny/match-count.config \
        --extend /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite \
        'group2' \
        --output /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.out \
        --notstrict
    
    # [group2] section of match-count.config: 27 taxa; * marks genome-derived taxa
    [group2]
    acanthurus_japonicus
    amia_calva
    diaphus_theta
    antennarius_striatus
    astyanax_fasciatus
    danio_rerio
    lampris_guttatus
    umbra_limi
    taenianotus_triacanthus
    acipenser_fulvescens
    pantodon_buchholzi
    polypterus_senegalus
    osteoglossum_bicirrhosum
    anchoa_compressa
    megalops_sp
    strophidon_sathete
    salvelinus_fontinalis
    takifugu_rupribes*
    gasterosteus_aculeatus*
    haplochromis_burtoni*
    neolamprologus_brichardi*
    oreochromis_niloticus*
    oryzias_latipes*
    pundamilia_nyererei*
    tetraodon_nigroviridis*
    lepisosteus_oculatus*
    gadus_morhua*
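
The [group2] section lists one taxon per line with no values, so Python's configparser can read it with allow_no_value=True. A sketch that splits the group into enriched and genome-derived taxa, assuming (as the "Getting ... reads" log below suggests) that a trailing * marks taxa drawn from the --extend genome database; the function name is hypothetical.

```python
import configparser


def read_taxon_group(config_text, section):
    """Split one taxon-group section into enriched and genome-derived taxa.

    Assumes one taxon per line (no values) and that a trailing '*' marks
    taxa taken from the genome database passed with --extend.
    """
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep names exactly as written (no lowercasing)
    parser.read_string(config_text)
    enriched, genome = [], []
    for name in parser.options(section):
        if name.endswith("*"):
            genome.append(name[:-1])
        else:
            enriched.append(name)
    return enriched, genome


if __name__ == "__main__":
    example = """
# excerpt of match-count.config
[group2]
acanthurus_japonicus
amia_calva
takifugu_rupribes*
gadus_morhua*
"""
    print(read_taxon_group(example, "group2"))
```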
    
  • turn those data into a giant fasta for alignment:

    python /Users/bcf/Git/brant/phyluce/bin/assembly/get_fastas_from_match_counts.py \
        /Volumes/Data3/fish/fish-seqcap/contig \
        /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
        /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.out \
        --extend-db /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite \
        --extend-dir /Users/bcf/Working/fish-phylogeny/fake/contig/ \
        --output /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.fasta \
        --notstrict /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.notstrict
    
    Getting acanthurus_japonicus reads...
        Replaced < 20 ambiguous bases in >29510_orylat2_chr20_19834163_19834343_acanthurus_japonicus
        Replaced < 20 ambiguous bases in >64849_orylat2_ultracontig117_111598_111777_acanthurus_japonicus
        Replaced < 20 ambiguous bases in >19592_orylat2_chr16_21174511_21174690_acanthurus_japonicus
        Replaced < 20 ambiguous bases in >34879_orylat2_chr22_16020600_16020780_acanthurus_japonicus
        Replaced < 20 ambiguous bases in >37317_orylat2_chr24_2744123_2744302_acanthurus_japonicus
        Replaced < 20 ambiguous bases in >28115_orylat2_chr20_11834906_11835085_acanthurus_japonicus
    Getting acipenser_fulvescens reads...
        Replaced < 20 ambiguous bases in >28407_orylat2_chr20_12253203_12253383_acipenser_fulvescens
        Replaced < 20 ambiguous bases in >51743_orylat2_chr6_22173752_22173931_acipenser_fulvescens
        Replaced < 20 ambiguous bases in >65920_orylat2_ultracontig257_170745_170924_acipenser_fulvescens
    Getting amia_calva reads...
        Replaced < 20 ambiguous bases in >59208_orylat2_chr9_15660075_15660255_amia_calva
        Replaced < 20 ambiguous bases in >31907_orylat2_chr21_24592457_24592637_amia_calva
    Getting anchoa_compressa reads...
    Getting antennarius_striatus reads...
        Replaced < 20 ambiguous bases in >36257_orylat2_chr23_8323776_8323955_antennarius_striatus
        Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_antennarius_striatus
        Replaced < 20 ambiguous bases in >31353_orylat2_chr21_23054326_23054505_antennarius_striatus
    Getting astyanax_fasciatus reads...
        Replaced < 20 ambiguous bases in >19564_orylat2_chr16_21073470_21073650_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >45039_orylat2_chr4_15269599_15269778_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >29060_orylat2_chr20_16776517_16776697_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >50797_orylat2_chr6_13673431_13673611_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >42411_orylat2_chr3_28384904_28385084_astyanax_fasciatus
        Replaced < 20 ambiguous bases in >40337_orylat2_chr3_18054080_18054259_astyanax_fasciatus
    Getting danio_rerio reads...
        Replaced < 20 ambiguous bases in >3037_orylat2_chr10_7436728_7436908_danio_rerio
        Replaced < 20 ambiguous bases in >8006_orylat2_chr12_12703194_12703373_danio_rerio
        Replaced < 20 ambiguous bases in >6581_orylat2_chr11_22357337_22357516_danio_rerio
    Getting diaphus_theta reads...
        Replaced < 20 ambiguous bases in >43998_orylat2_chr4_5445911_5446091_diaphus_theta
    Getting gadus_morhua* reads...
        Replaced < 20 ambiguous bases in >28133_orylat2_chr20_11859433_11859612_gadus_morhua
    Getting gasterosteus_aculeatus* reads...
    Getting haplochromis_burtoni* reads...
    Getting lampris_guttatus reads...
        Replaced < 20 ambiguous bases in >7602_orylat2_chr12_9462216_9462396_lampris_guttatus
        Replaced < 20 ambiguous bases in >15812_orylat2_chr15_20169029_20169209_lampris_guttatus
        Replaced < 20 ambiguous bases in >9168_orylat2_chr13_4200208_4200388_lampris_guttatus
        Replaced < 20 ambiguous bases in >40283_orylat2_chr3_18016128_18016307_lampris_guttatus
    Getting lepisosteus_oculatus* reads...
    Getting megalops_sp reads...
    Getting neolamprologus_brichardi* reads...
    Getting oreochromis_niloticus* reads...
    Getting oryzias_latipes* reads...
        Replaced < 20 ambiguous bases in >54500_orylat2_chr7_21324258_21324438_oryzias_latipes
        Replaced < 20 ambiguous bases in >24653_orylat2_chr19_8255095_8255274_oryzias_latipes
        Replaced < 20 ambiguous bases in >17869_orylat2_chr16_5174650_5174830_oryzias_latipes
        Replaced < 20 ambiguous bases in >17424_orylat2_chr16_1035320_1035500_oryzias_latipes
        Replaced < 20 ambiguous bases in >22542_orylat2_chr17_28477535_28477714_oryzias_latipes
        Replaced < 20 ambiguous bases in >59208_orylat2_chr9_15660075_15660255_oryzias_latipes
        Replaced < 20 ambiguous bases in >37000_orylat2_chr23_22404477_22404657_oryzias_latipes
        Replaced < 20 ambiguous bases in >52857_orylat2_chr7_9011611_9011791_oryzias_latipes
        Replaced < 20 ambiguous bases in >15812_orylat2_chr15_20169029_20169209_oryzias_latipes
        Replaced < 20 ambiguous bases in >44335_orylat2_chr4_8453372_8453552_oryzias_latipes
        Replaced < 20 ambiguous bases in >30325_orylat2_chr21_9456497_9456676_oryzias_latipes
        Replaced < 20 ambiguous bases in >33912_orylat2_chr22_10311301_10311481_oryzias_latipes
        Replaced < 20 ambiguous bases in >1199_orylat2_chr1_20869179_20869358_oryzias_latipes
        Replaced < 20 ambiguous bases in >65943_orylat2_ultracontig257_377659_377839_oryzias_latipes
        Replaced < 20 ambiguous bases in >44706_orylat2_chr4_12103391_12103571_oryzias_latipes
        Replaced < 20 ambiguous bases in >52576_orylat2_chr7_7401879_7402059_oryzias_latipes
        Replaced < 20 ambiguous bases in >12379_orylat2_chr14_14534981_14535161_oryzias_latipes
        Replaced < 20 ambiguous bases in >50797_orylat2_chr6_13673431_13673611_oryzias_latipes
        Replaced < 20 ambiguous bases in >36825_orylat2_chr23_19660644_19660823_oryzias_latipes
        Replaced < 20 ambiguous bases in >13680_orylat2_chr14_31599989_31600168_oryzias_latipes
        Replaced < 20 ambiguous bases in >30113_orylat2_chr21_8630533_8630712_oryzias_latipes
        Replaced < 20 ambiguous bases in >36609_orylat2_chr23_15322876_15323055_oryzias_latipes
        Replaced < 20 ambiguous bases in >59216_orylat2_chr9_15672674_15672854_oryzias_latipes
        Replaced < 20 ambiguous bases in >7561_orylat2_chr12_9045138_9045317_oryzias_latipes
        Replaced < 20 ambiguous bases in >51400_orylat2_chr6_16554444_16554623_oryzias_latipes
        Replaced < 20 ambiguous bases in >9051_orylat2_chr13_2884355_2884534_oryzias_latipes
        Replaced < 20 ambiguous bases in >34943_orylat2_chr22_16084127_16084307_oryzias_latipes
        Replaced < 20 ambiguous bases in >38735_orylat2_chr24_21857307_21857486_oryzias_latipes
        Replaced < 20 ambiguous bases in >51660_orylat2_chr6_21106729_21106909_oryzias_latipes
        Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_oryzias_latipes
        Replaced < 20 ambiguous bases in >30337_orylat2_chr21_9465044_9465224_oryzias_latipes
        Replaced < 20 ambiguous bases in >24740_orylat2_chr19_8765368_8765547_oryzias_latipes
        Replaced < 20 ambiguous bases in >31502_orylat2_chr21_23902474_23902654_oryzias_latipes
        Replaced < 20 ambiguous bases in >9168_orylat2_chr13_4200208_4200388_oryzias_latipes
        Replaced < 20 ambiguous bases in >9652_orylat2_chr13_9467290_9467470_oryzias_latipes
        Replaced < 20 ambiguous bases in >10434_orylat2_chr13_14895221_14895401_oryzias_latipes
        Replaced < 20 ambiguous bases in >43567_orylat2_chr3_36062742_36062922_oryzias_latipes
        Replaced < 20 ambiguous bases in >15831_orylat2_chr15_20209844_20210024_oryzias_latipes
        Replaced < 20 ambiguous bases in >37805_orylat2_chr24_6772475_6772654_oryzias_latipes
        Replaced < 20 ambiguous bases in >64587_orylat2_ultracontig115_101466_101645_oryzias_latipes
        Replaced < 20 ambiguous bases in >65522_orylat2_ultracontig182_2850109_2850289_oryzias_latipes
        Replaced < 20 ambiguous bases in >13793_orylat2_chr15_666103_666283_oryzias_latipes
        Replaced < 20 ambiguous bases in >15263_orylat2_chr15_18813667_18813846_oryzias_latipes
        Replaced < 20 ambiguous bases in >10268_orylat2_chr13_14227519_14227698_oryzias_latipes
        Replaced < 20 ambiguous bases in >22459_orylat2_chr17_28292288_28292468_oryzias_latipes
        Replaced < 20 ambiguous bases in >58935_orylat2_chr9_15194502_15194681_oryzias_latipes
        Replaced < 20 ambiguous bases in >13009_orylat2_chr14_19750656_19750836_oryzias_latipes
        Replaced < 20 ambiguous bases in >1320_orylat2_chr1_21206085_21206265_oryzias_latipes
        Replaced < 20 ambiguous bases in >12390_orylat2_chr14_14557181_14557360_oryzias_latipes
        Replaced < 20 ambiguous bases in >53011_orylat2_chr7_9680131_9680310_oryzias_latipes
        Replaced < 20 ambiguous bases in >48673_orylat2_chr5_28952087_28952266_oryzias_latipes
        Replaced < 20 ambiguous bases in >28658_orylat2_chr20_14628555_14628734_oryzias_latipes
        Replaced < 20 ambiguous bases in >65101_orylat2_ultracontig182_1561600_1561780_oryzias_latipes
        Replaced < 20 ambiguous bases in >53067_orylat2_chr7_9724225_9724404_oryzias_latipes
        Replaced < 20 ambiguous bases in >18131_orylat2_chr16_8737432_8737611_oryzias_latipes
    Getting osteoglossum_bicirrhosum reads...
    Getting pantodon_buchholzi reads...
        Replaced < 20 ambiguous bases in >53979_orylat2_chr7_15107874_15108054_pantodon_buchholzi
    Getting polypterus_senegalus reads...
    Getting pundamilia_nyererei* reads...
    Getting salvelinus_fontinalis reads...
    Getting strophidon_sathete reads...
    Getting taenianotus_triacanthus reads...
        Replaced < 20 ambiguous bases in >9652_orylat2_chr13_9467290_9467470_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >36029_orylat2_chr23_4526475_4526655_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >5240_orylat2_chr11_4468135_4468314_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >41131_orylat2_chr3_22104665_22104845_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >8573_orylat2_chr12_23455521_23455701_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >54861_orylat2_chr7_24902228_24902408_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >47078_orylat2_chr5_7437057_7437237_taenianotus_triacanthus
        Replaced < 20 ambiguous bases in >42136_orylat2_chr3_28115321_28115501_taenianotus_triacanthus
    Getting takifugu_rupribes* reads...
        Replaced < 20 ambiguous bases in >56271_orylat2_chr8_10125366_10125546_takifugu_rupribes
        Replaced < 20 ambiguous bases in >17466_orylat2_chr16_1253758_1253937_takifugu_rupribes
        Replaced < 20 ambiguous bases in >63754_orylat2_scaffold6244_2769_2948_takifugu_rupribes
        Replaced < 20 ambiguous bases in >51660_orylat2_chr6_21106729_21106909_takifugu_rupribes
        Replaced < 20 ambiguous bases in >5503_orylat2_chr11_6725896_6726076_takifugu_rupribes
        Replaced < 20 ambiguous bases in >1607_orylat2_chr1_22461490_22461669_takifugu_rupribes
        Replaced < 20 ambiguous bases in >51124_orylat2_chr6_15729949_15730129_takifugu_rupribes
    Getting tetraodon_nigroviridis* reads...
        Replaced < 20 ambiguous bases in >20115_orylat2_chr16_28692769_28692948_tetraodon_nigroviridis
        Replaced < 20 ambiguous bases in >7489_orylat2_chr12_8416389_8416569_tetraodon_nigroviridis
    

March 10, 2012

MAFFT as alignment method

  • add in MAFFT as alignment engine and re-align:

    python ~/Git/brant/phyluce/bin/align/seqcap_align_2.py \
        missing-data-plus-genome.fasta mafft/nexus 27 \
        --notstrict --aligner mafft --multiprocessing
    
  • add in gaps for missing taxa:

    python ~/Git/brant/phyluce/bin/align/add_gaps_for_missing_taxa.py \
        nexus nexus-with-gaps \
        ../missing-data-plus-genome.out \
        ../missing-data-plus-genome.notstrict
    
  • convert to fasta:

    python ~/Git/brant/phyluce/bin/align/convert_nexus_to_fasta.py \
        nexus-with-gaps fasta-with-gaps
    
  • check align stats:

    python ~/Git/brant/phyluce/bin/align/get_align_summary_data.py mafft/nexus
    
    Lengths
    -----
    Total length(aln)        149366
    Average length(aln)      304.207739308
    95 CI length(aln)        15.9533489321
    
    Taxa
    -----
    Average(taxa)            20.533604888
    95 CI(taxa)              0.373361274487
    min(taxa)                3
    max(taxa)                27
    Count(taxa:# alns)       {3: 2, 5: 1, 6: 2, 7: 3, 8: 2, 9: 5, 10: 8, 11: 3, 12: 4, 13: 5, 14: 11, 15: 11, 16: 17, 17: 14, 18: 20, 19: 34, 20: 44, 21: 55, 22: 62, 23: 70, 24: 66, 25: 32, 26: 17, 27: 3}
    
    Base composition
    -----
    Bases                    {'A': 700435, 'C': 566549, '-': 480056, 'T': 709156, 'G': 555093}
    Sum(all)                 3011289
    Sum(nucleotide only)     2531233
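
The summary figures are internally consistent, and the taxon-count histogram also gives the number of alignments: 491, which matches the 489 loci left after two loci are dropped further below. A quick arithmetic check in plain Python, using only numbers copied from the output above:

```python
# Numbers copied from the get_align_summary_data.py output above.
taxa_per_alignment = {
    3: 2, 5: 1, 6: 2, 7: 3, 8: 2, 9: 5, 10: 8, 11: 3, 12: 4, 13: 5, 14: 11,
    15: 11, 16: 17, 17: 14, 18: 20, 19: 34, 20: 44, 21: 55, 22: 62, 23: 70,
    24: 66, 25: 32, 26: 17, 27: 3,
}
bases = {"A": 700435, "C": 566549, "-": 480056, "T": 709156, "G": 555093}
total_length = 149366

n_alignments = sum(taxa_per_alignment.values())  # 491 alignments
mean_length = total_length / n_alignments  # about 304.21 bp per alignment
mean_taxa = sum(k * v for k, v in taxa_per_alignment.items()) / n_alignments  # about 20.53 taxa
sum_all = sum(bases.values())  # 3011289, gaps included
sum_nucleotides = sum_all - bases["-"]  # 2531233, gaps excluded

print(n_alignments, round(mean_length, 2), round(mean_taxa, 2), sum_all, sum_nucleotides)
```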
    

July 7, 2012

  • set up mafft alignments for raxml:

    python ~/Git/brant/phyluce/bin/align/nexus_to_concatenated_phylip.py \
        nexus-with-gaps \
        raxml/malfaro-fish-481-loci-missing.phylip
    
  • sync the files to @ketchup and run there:

    rsync -avz ./ [email protected]:/home/bcf/working/malfaro-fish/raxml/
    
  • run raxml on data w/ 500 bootreps:

    raxmlHPC-PTHREADS -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s malfaro-fish-481-loci-missing.phylip -n bstrap500 -T 12
    
  • run MrAIC to choose a substitution model for each locus (converting to phylip first):

    python ~/Git/brant/seqcap/Alignment/convert_nexus_to_phylip.py \
        --input nexus-with-gaps --output phylip-with-gaps --shorten-name --positions '-2,-1'
    
    python ~/Git/brant/seqcap/Phylo/run_mraic.py --input phylip --output aicc
    
  • convert to mrbayes format:

    python ~/Git/brant/bayes-phylo-tools/nexus_to_concat.py \
        --models output_models.txt --aligns nexus-with-gaps/ \
        --concat mrbayes/nexus-with-gaps.mrbayes.nexus \
        --mr-bayes --interleave --unlink
    
    Models were not estimated for:
    
    27931_orylat2_chr20_11066287_11066466
    2775_orylat2_chr10_5723437_5723616
    
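  • create a new folder containing all loci except those two:

    mkdir nexus-with-gaps-minus-two
    cp nexus-with-gaps/* nexus-with-gaps-minus-two/
    rm nexus-with-gaps-minus-two/27931_orylat2_chr20_11066287_11066466.*
    rm nexus-with-gaps-minus-two/2775_orylat2_chr10_5723437_5723616.*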
    
  • convert to concatenated phylip:

    python ~/Git/brant/phyluce/bin/align/nexus_to_concatenated_phylip.py \
        nexus-with-gaps-minus-two/ \
        raxml-minus-two/malfaro-fish-489-loci-missing-minus-two.phylip
    
  • sync to @ketchup:

    rsync -avz ./ [email protected]:/home/bcf/working/malfaro-fish/raxml-minus-two/
    
  • rerun raxml:

    raxmlHPC-PTHREADS -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s malfaro-fish-489-loci-missing-minus-two.phylip -n bstrap500 -T 12
    
  • run the equivalent data through MrBayes for 5 million generations

8 August 2012

Cloudforest

  • pare down the phylip data to include only those loci containing polypterus and only those loci longer than 50 bp (this yields 298 loci):

    python ~/git/phyluce/bin/align/get_all_locus_lengths.py \
        phylip-with-gaps \
        --input-format phylip \
        --containing-data-for pol_sen \
        --output phylip-with-gaps-polypterus \
        --min-length 50
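
The filtering criterion can be sketched as below. This is not the code of get_all_locus_lengths.py; it assumes sequential relaxed PHYLIP files (one "name sequence" record per line), treats '-', '?', and N as missing, and treats the minimum length as inclusive. Checking for non-missing characters, rather than just the presence of a name, matters because missing taxa were padded with gap-only sequences in the nexus-with-gaps step.

```python
MISSING = set("-?Nn")  # characters treated as "no data" (an assumption)


def read_phylip(text):
    """Parse a sequential, relaxed PHYLIP alignment into (nchar, {name: seq}).

    Assumes one 'name sequence' record per line (not interleaved).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    for line in lines[1 : 1 + ntax]:
        name, seq = line.split(None, 1)
        seqs[name] = seq.replace(" ", "")
    return nchar, seqs


def keep_locus(text, required_taxa, min_length=50):
    """True if the alignment is at least min_length long and every required
    taxon has at least one non-missing character."""
    nchar, seqs = read_phylip(text)
    if nchar < min_length:
        return False
    for taxon in required_taxa:
        seq = seqs.get(taxon, "")
        if not seq or set(seq) <= MISSING:
            return False  # absent, or padded with gaps only
    return True


if __name__ == "__main__":
    locus = "2 60\npol_sen " + "ACGT" * 15 + "\naci_ful " + "-" * 60 + "\n"
    print(keep_locus(locus, ["pol_sen"]), keep_locus(locus, ["pol_sen", "aci_ful"]))
```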
    
  • try cloudforest on fish data with incomplete matrix:

    python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
        phylip-with-gaps-polypterus \
        cloudforest-output \
        genetrees \
        $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
        --parallelism multiprocessing --cores 20
    
  • build a species tree from the gene trees:

    python ~/git/phyluce/bin/genetrees/phybase.py -i genetrees.tre -o pol_sen -g -c 7
    
  • generate 100 bootstrap replicates:

    python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
        phylip-with-gaps-polypterus \
        cloudforest-output \
        bootstraps \
        $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
        --parallelism multiprocessing \
        --cores 20 --bootreps 100 --genetrees cloudforest-output/genetrees.tre
    
  • build the species tree from the bootstrap replicates:

    python ~/git/phyluce/bin/genetrees/phybase.py -i 100-bootreps.tree -o pol_sen --bootstraps --sorted --cores 12
    

13 September 2012

  • pare down the phylip data to include only those loci containing both polypterus and acipenser and only those loci longer than 50 bp (this yields 298 loci):

    python ~/git/phyluce/bin/align/get_all_locus_lengths.py \
        ../../phylip-with-gaps \
        --input-format phylip \
        --containing-data-for pol_sen aci_ful \
        --output phylip-with-gaps-polypterus-and-acipenser \
        --min-length 50
    
  • run genetrees:

    python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
        phylip-with-gaps-polypterus-and-acipenser \
        phylip-with-gaps-polypterus-and-acipenser-output/ \
        genetrees $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
        --parallelism multiprocessing \
        --cores 20
    
  • rename:

    mv genetrees.tre phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre
    
  • run bootreps:

    python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
        phylip-with-gaps-polypterus-and-acipenser \
        phylip-with-gaps-polypterus-and-acipenser-output \
        bootstraps $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
        --parallelism multiprocessing \
        --cores 20 \
        --bootreps 100 \
        --genetrees phylip-with-gaps-polypterus-and-acipenser-output/genetrees.tre
    
  • rename:

    mv 100-bootreps.tree phylip-with-gaps-polypterus-and-acipenser-output-100-bootreps.tree
    
  • get STAR tree:

    python ~/git/phyluce/bin/genetrees/phybase.py \
        -i phylip-with-gaps-polypterus-and-acipenser-output-100-bootreps.tree \
        -o pol_sen \
        --bootstraps \
        --sorted \
        --cores 12
    

24 September 2012

  • run 1000 bootstrap reps against the files containing data for polypterus and acipenser:

    python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
        phylip-with-gaps-polypterus-and-acipenser \
        cloudforest-phylip-with-gaps-polypterus-and-acipenser-output \
        bootstraps $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
        --parallelism multiprocessing --cores 12 --bootreps 1000 \
        --genetrees cloudforest-phylip-with-gaps-polypterus-and-acipenser-output/phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre
    
  • convert mrbayes nexus file to phylip

  • manually alter sequence names to match raxml phylip files and individual phylip files

  • copy partition data from mrbayes file to new file and manually convert to raxml partition format

  • run raxml partitioned analysis:

    ~/git/brant/raxml/raxmlHPC-PTHREADS-SSE3 -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s nexus-with-gaps.mrbayes.phylip -q nexus-with-gaps.partitions -n bstrap500 -T 7
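
The manual conversion of partition data can be scripted. A sketch, assuming the MrBayes block defines one "charset name = start-end;" per line and that every locus is a DNA partition; RAxML's partition file expects one "DNA, name = start-end" line per partition, with multiple ranges separated by commas. The function name is hypothetical.

```python
import re

# NEXUS charset lines such as:  charset 2775_orylat2_chr10_5723437_5723616 = 1-304;
CHARSET = re.compile(r"^\s*charset\s+(\S+)\s*=\s*([^;]+);", re.IGNORECASE)


def charsets_to_raxml(nexus_text, model="DNA"):
    """Turn NEXUS/MrBayes charset lines into RAxML partition-file lines."""
    partitions = []
    for line in nexus_text.splitlines():
        match = CHARSET.match(line)
        if match:
            name, ranges = match.group(1), match.group(2).split()
            # NEXUS separates multiple ranges with spaces; RAxML uses commas.
            partitions.append(f"{model}, {name} = {', '.join(ranges)}")
    return "\n".join(partitions)


if __name__ == "__main__":
    block = """begin mrbayes;
    charset locus_a = 1-304;
    charset locus_b = 305-610;
end;"""
    print(charsets_to_raxml(block))
```

The output would be saved as the file passed to raxml with -q (nexus-with-gaps.partitions above).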
    

25 September 2012

  • run phybase (STAR) against the resulting bootreps and generate a consensus tree:

    python ~/git/phyluce/bin/genetrees/phybase.py -i 1000-bootreps.tree -o pol_sen --bootstraps --sorted --cores 12
    
  • run phybase against the existing gene trees:

    python ~/git/phyluce/bin/genetrees/phybase.py -i phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre -o pol_sen -g -c 12
    
  • map bootstrap support from the 1000 bootstrap replicates onto the STAR tree:

    /home/bcf/git/raxml/raxmlHPC-SSE3 -m GTRCAT -f b -t star-tree.tre -z 1000-bootreps.star.trees -n test
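
RAxML's -f b option draws bipartition (split) support from the trees given with -z onto the tree given with -t. The idea can be sketched in pure Python: for each non-trivial split in the reference tree, compute the percentage of replicate trees that contain it. This is a simplified illustration, assuming all trees share one taxon set and use plain Newick labels (no quoted names); it is not RAxML's implementation.

```python
import re


def _clades(newick):
    """Return (all leaf names, leaf sets of internal nodes) for one Newick tree."""
    stack, clades, prev = [[]], [], None
    for tok in re.findall(r"[(),]|[^(),;]+", newick):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            leaves = stack.pop()
            clades.append(frozenset(leaves))
            stack[-1].extend(leaves)
        elif tok != "," and prev != ")":  # tokens right after ")" are node labels
            name = tok.split(":")[0].strip()  # drop any branch length
            if name:
                stack[-1].append(name)
        prev = tok
    return frozenset(stack[0]), clades


def bipartitions(newick):
    """Non-trivial splits of the unrooted tree, each stored as the side that
    does not contain a fixed anchor taxon (so rooting does not matter)."""
    taxa, clades = _clades(newick)
    anchor = min(taxa)
    splits = set()
    for clade in clades:
        side = taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
    return splits


def split_support(reference, replicates):
    """Percent of replicate trees containing each split of the reference tree."""
    ref = bipartitions(reference)
    rep_splits = [bipartitions(tree) for tree in replicates]
    return {s: 100.0 * sum(s in reps for reps in rep_splits) / len(rep_splits) for s in ref}


if __name__ == "__main__":
    reps = ["((A,B),(C,D),E);", "((A,C),(B,D),E);", "(A,(B,(C,D)),E);"]
    for split, pct in split_support("((A,B),(C,D),E);", reps).items():
        print(sorted(split), round(pct, 1))
```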
    