download files from s3 using 3hub
unzip respective files (started w/ Bin001.zip)
run process_reads.py (now part of https://github.com/faircloth-lab/illumiprocessor/):
python ~/git/brant/seqcap/Assembly/process_reads.py
run velvetoptimiser:
VelvetOptimiser --s 55 -e 71 --f '-short -fastq ../n-less/Hexanchus-griseus.fastq' \
    --v --t 4
deviations:
VelvetOptimiser --s 55 -e 71 --f '-short -fastq ../n-less/Hexanchus-griseus.fastq' \
    --v --t 4
(went shorter range than normal)
after assembly, symlink all assemblies into fish-data/contigs and:
python ~/Git/brant/seqcap/Assembly/match_contigs_to_probes.py \
    /Volumes/Data3/fish-seqcap/contig \
    /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.bufferTo180.probes.fa \
    /Volumes/Data3/fish-seqcap/lastz
finish assemblies by hand across all taxa in data set (this is now automated by assemblo.py)
for all data in /Volumes/Data3/fish/fish-seqcap/Bin_00*/ compute the coverage for various contigs using Amos:
~/Source/amos-3.0.0/src/Bank/bank-transact -m velvet_asm.afg -b velvet_asm.bnk -c
~/Source/amos-3.0.0/src/Validation/analyze-read-depth velvet_asm.bnk -d
go ahead and get # of non-dupe contigs for each taxon from /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite:
select sum(acanthurus_japonicus), sum(acipenser_fulvescens), sum(amia_calva), sum(anchoa_compressa), sum(antennarius_striatus), sum(astyanax_fasciatus), sum(carcharodon_carcharias), sum(danio_rerio), sum(diaphus_theta), sum(heterodontus_francisci), sum(hexanchus_griseus), sum(lampris_guttatus), sum(lepisosteus_sp), sum(megalops_sp), sum(myliobatis_californica), sum(osteoglossum_bicirrhosum), sum(pantodon_buchholzi), sum(polypterus_senegalus), sum(protopterus_annectens), sum(salvelinus_fontinalis), sum(sphyrna_mokarran), sum(strophidon_sathete), sum(taenianotus_triacanthus), sum(umbra_limi) from matches;
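The query above names every taxon column by hand. A minimal sketch of building the same statement from a list of taxon names follows. `build_sum_query` is a hypothetical helper, not part of phyluce, and it assumes each taxon is a column of the `matches` table, as in the query above.

```python
def build_sum_query(taxa, table="matches"):
    """Return a SELECT that sums one column per taxon (hypothetical helper)."""
    for name in taxa:
        # Column names are interpolated into the SQL text, so only accept
        # plain identifiers made of letters, digits and underscores.
        if not name.replace("_", "").isalnum():
            raise ValueError("unexpected taxon name: %r" % name)
    columns = ", ".join("sum({0})".format(name) for name in taxa)
    return "select {0} from {1};".format(columns, table)


if __name__ == "__main__":
    print(build_sum_query(["acanthurus_japonicus", "amia_calva", "umbra_limi"]))
```

The resulting string can be pasted into the sqlite prompt or passed to `sqlite3.Connection.execute`.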
output data to compute coverage across enriched contigs. In /Volumes/Data3/fish/fish-seqcap/lastz:
mkdir loci
sqlite probe.matches.sqlite
sqlite> .output ./loci/salvelinus_fontinalis.loci
sqlite> select salvelinus_fontinalis from match_map where salvelinus_fontinalis is null;
sqlite> .output stdout
in /Volumes/Data3/fish/fish-seqcap/:
mkdir fish-uce-coverage
mv /Volumes/Data3/fish/fish-seqcap/lastz/loci/* fish-uce-coverage
convert files to iid format:
~/Source/amos-3.0.0/src/Utils/cvgStat -b ../Bin_001/assembly/umbra-limi/velvet_asm.bnk > umbra_limi.cvg
python get_iids_from_cvg_stat.py \
    umbra_limi.cvg umbra_limi.loci --output umbra-limi.iid
~/Source/amos-3.0.0/src/Validation/analyze-read-depth ../Bin_001/assembly/umbra-limi/velvet_asm.bnk -d -I umbra-limi.iid
get read stats for UCE contigs:
python ~/git/brant/phyluce/bin/assembly/get_contig_lengths_for_all_uce_loci.py \
    /Volumes/Data3/fish/fish-seqcap/contig \
    /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
    /Users/bcf/Dropbox/Research/alfaro/lab/illumina-run/match-count.config \
    'wag plus'
lastz probes to themselves:
python /Users/bcf/Git/brant/phyluce/bin/share/easy_lastz.py \
    --target /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.fa \
    --query /Users/bcf/Git/brant/seqcap/Fish/Data/Probes/finalConserved.fa \
    --identity 85 --output /Users/bcf/Git/brant/seqcap/Non-repo/Manuscripts/Fish/Simulation/finalConserved.bufferTo180.probes.fa
Generate lastz alignments of probes to genomes [ for lepOcu1, gadMor1, latCha1, dicLab1 ]:
for i in lepOcu1 gadMor1 latCha1 dicLab1;
    do mkdir -p /Volumes/Data/Genomes/Fish/$i/lastz/;
    python /Users/bcf/Git/brant/phyluce/bin/share/run_lastz.py \
        --target /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --query /Users/bcf/Git/brant/seqcap/Non-repo/Manuscripts/Fish/Simulation/finalConserved.bufferTo180.probes.2bit \
        --output /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        --nprocs 7 --identity=80 --coverage=83 --huge;
done

for i in danRer6 fr2 gasAcu1 hapBur1 neoBri1 oreNil1 oryLat2 punNye1 tetNig2;
    do mkdir -p /Volumes/Data/Genomes/Fish/$i/lastz/;
    python /Users/bcf/Git/brant/phyluce/bin/share/run_lastz.py \
        --target /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --query /Users/bcf/Git/brant/seqcap/Non-repo/Manuscripts/Fish/Simulation/finalConserved.bufferTo180.probes.2bit \
        --output /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        --nprocs 7 --identity=80 --coverage=83 --huge;
done
in /Users/bcf/Working/fish-phylogeny/fake:
mkdir fasta contig lastz
generate data for genome-enabled organisms [ for lepOcu1, gadMor1, latCha1, dicLab1 ], adding to those data we already have:
for i in danRer6 fr2 gasAcu1 hapBur1 neoBri1 oreNil1 oryLat2 punNye1 tetNig2 lepOcu1 gadMor1 latCha1;
    do python /Users/bcf/Git/brant/phyluce/bin/share/get_fake_velvet_contigs_from_genomes.py \
        /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --fasta /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-$i-500.fasta \
        --flank 500 --fish;
done
this line is for testing purposes:
for i in gadMor1;
    do python /Users/bcf/Git/brant/phyluce/bin/share/get_fake_velvet_contigs_from_genomes.py \
        /Volumes/Data/Genomes/Fish/$i/lastz/finalConserved-$i.lastz \
        /Volumes/Data/Genomes/Fish/$i/$i.2bit \
        --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
        --fasta /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-$i-500.fasta \
        --flank 500 --fish;
done
- create a contig folder somewhere w/ symlinks to these fasta files,
using appropriate names. for instance:
# /Users/bcf/Working/fish-phylogeny/
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-danRer6-500.fasta danio-rerio.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-fr2-500.fasta takifugu-rupribes.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-gasAcu1-500.fasta gasterosteus-aculeatus.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-hapBur1-500.fasta haplochromis-burtoni.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-neoBri1-500.fasta neolamprologus-brichardi.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-oreNil1-500.fasta oreochromis-niloticus.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-oryLat2-500.fasta oryzias-latipes.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-punNye1-500.fasta pundamilia-nyererei.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-tetNig2-500.fasta tetraodon-nigroviridis.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-lepOcu1-500.fasta lepisosteus-oculatus.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-gadMor1-500.fasta gadus-morhua.contigs.fasta
ln -s /Users/bcf/Working/fish-phylogeny/fake/fasta/fake-velvet-finalConserved-latCha1-500.fasta latimeria-chalumnae.contigs.fasta
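The symlinks above follow one pattern (genome build in the target name, species name in the link name), so they can be generated from a small table. This is only a sketch: `link_fake_contigs` and the `LINKS` table are hypothetical names, the table holds just three of the pairs from the commands above, and the two directories are passed in by the caller.

```python
import os

# Genome build -> link name, copied from three of the ln -s commands above.
LINKS = {
    "danRer6": "danio-rerio.fasta",
    "gasAcu1": "gasterosteus-aculeatus.contigs.fasta",
    "gadMor1": "gadus-morhua.contigs.fasta",
}


def link_fake_contigs(fasta_dir, contig_dir, links=LINKS, flank=500):
    """Create one symlink per genome build and return the link paths."""
    created = []
    for build, link_name in sorted(links.items()):
        target = os.path.join(
            fasta_dir,
            "fake-velvet-finalConserved-{0}-{1}.fasta".format(build, flank),
        )
        link = os.path.join(contig_dir, link_name)
        # lexists() is true for dangling links too, so re-running is safe.
        if not os.path.lexists(link):
            os.symlink(target, link)
        created.append(link)
    return created
```

Like `ln -s`, `os.symlink` does not check that the target exists, so the links can be made before the fasta files are written.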
re-match the probes to the "fake" velvet contigs:
python /Users/bcf/Git/brant/phyluce/bin/assembly/match_contigs_to_probes.py \
    /Users/bcf/Working/fish-phylogeny/fake/contig/ \
    /Users/bcf/Working/fish-phylogeny/finalConserved.fa \
    /Users/bcf/Working/fish-phylogeny/fake/lastz/ \
    --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
    --identity=80 --coverage=67

Processing:
Getting dupes
danio_rerio: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 2 dupe probe matches, 12 dupe node matches
gadus_morhua: 373 (88.60%) uniques of 421 contigs, 0 dupe probes, 1 dupe probe matches, 10 dupe node matches
gasterosteus_aculeatus: 468 (97.30%) uniques of 481 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
haplochromis_burtoni: 457 (97.03%) uniques of 471 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
latimeria_chalumnae: 272 (95.44%) uniques of 285 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
lepisosteus_oculatus: 410 (96.93%) uniques of 423 contigs, 0 dupe probes, 1 dupe probe matches, 12 dupe node matches
neolamprologus_brichardi: 456 (97.02%) uniques of 470 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
oreochromis_niloticus: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
oryzias_latipes: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
pundamilia_nyererei: 461 (97.05%) uniques of 475 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
takifugu_rupribes: 469 (97.10%) uniques of 483 contigs, 0 dupe probes, 0 dupe probe matches, 14 dupe node matches
tetraodon_nigroviridis: 471 (97.52%) uniques of 483 contigs, 0 dupe probes, 0 dupe probe matches, 12 dupe node matches
rename that file [TODO - add rename to code]:
mv /Users/bcf/Working/fish-phylogeny/fake/lastz/probe.matches.sqlite \
    /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite
because we changed parameters in the matching script, re-run match_contigs_to_probes.py on the enriched contigs:
python /Users/bcf/Git/brant/phyluce/bin/assembly/match_contigs_to_probes.py \
    /Volumes/Data3/fish/fish-seqcap/contig \
    /Users/bcf/Working/fish-phylogeny/finalConserved.fa \
    /Volumes/Data3/fish/fish-seqcap/lastz \
    --dupefile /Users/bcf/Working/fish-phylogeny/finalConserved-to-finalConserved.lastz \
    --identity=80 --coverage=67

Processing:
Getting dupes
acanthurus_japonicus: 448 (73.08%) uniques of 613 contigs, 0 dupe probes, 4 dupe probe matches, 56 dupe node matches
acipenser_fulvescens: 172 (29.81%) uniques of 577 contigs, 0 dupe probes, 2 dupe probe matches, 258 dupe node matches
amia_calva: 360 (64.06%) uniques of 562 contigs, 0 dupe probes, 3 dupe probe matches, 72 dupe node matches
anchoa_compressa: 292 (54.78%) uniques of 533 contigs, 0 dupe probes, 1 dupe probe matches, 83 dupe node matches
antennarius_striatus: 415 (87.55%) uniques of 474 contigs, 0 dupe probes, 4 dupe probe matches, 22 dupe node matches
astyanax_fasciatus: 355 (65.38%) uniques of 543 contigs, 0 dupe probes, 2 dupe probe matches, 61 dupe node matches
carcharodon_carcharias: 148 (51.93%) uniques of 285 contigs, 0 dupe probes, 1 dupe probe matches, 20 dupe node matches
danio_rerio: 381 (73.55%) uniques of 518 contigs, 0 dupe probes, 3 dupe probe matches, 53 dupe node matches
diaphus_theta: 399 (68.32%) uniques of 584 contigs, 0 dupe probes, 3 dupe probe matches, 29 dupe node matches
heterodontus_francisci: 137 (54.37%) uniques of 252 contigs, 0 dupe probes, 0 dupe probe matches, 18 dupe node matches
hexanchus_griseus: 146 (24.75%) uniques of 590 contigs, 0 dupe probes, 1 dupe probe matches, 17 dupe node matches
lampris_guttatus: 412 (84.77%) uniques of 486 contigs, 0 dupe probes, 2 dupe probe matches, 29 dupe node matches
lepisosteus_sp: 375 (55.56%) uniques of 675 contigs, 0 dupe probes, 3 dupe probe matches, 88 dupe node matches
megalops_sp: 243 (30.92%) uniques of 786 contigs, 0 dupe probes, 1 dupe probe matches, 373 dupe node matches
myliobatis_californica: 148 (53.62%) uniques of 276 contigs, 0 dupe probes, 1 dupe probe matches, 24 dupe node matches
osteoglossum_bicirrhosum: 270 (41.99%) uniques of 643 contigs, 0 dupe probes, 1 dupe probe matches, 250 dupe node matches
pantodon_buchholzi: 272 (58.37%) uniques of 466 contigs, 0 dupe probes, 2 dupe probe matches, 124 dupe node matches
polypterus_senegalus: 299 (51.91%) uniques of 576 contigs, 0 dupe probes, 5 dupe probe matches, 20 dupe node matches
protopterus_annectens: 70 (9.40%) uniques of 745 contigs, 0 dupe probes, 1 dupe probe matches, 0 dupe node matches
salvelinus_fontinalis: 159 (14.22%) uniques of 1118 contigs, 0 dupe probes, 1 dupe probe matches, 636 dupe node matches
sphyrna_mokarran: 156 (40.00%) uniques of 390 contigs, 0 dupe probes, 1 dupe probe matches, 21 dupe node matches
strophidon_sathete: 279 (27.71%) uniques of 1007 contigs, 0 dupe probes, 3 dupe probe matches, 158 dupe node matches
taenianotus_triacanthus: 442 (62.08%) uniques of 712 contigs, 0 dupe probes, 3 dupe probe matches, 69 dupe node matches
umbra_limi: 404 (36.43%) uniques of 1109 contigs, 0 dupe probes, 2 dupe probe matches, 76 dupe node matches
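The per-taxon summary printed by match_contigs_to_probes.py is regular enough to pull into a table for comparing runs. A minimal sketch, assuming the line layout shown above; `parse_match_summary` is a hypothetical helper and not part of phyluce.

```python
import re

# One record per taxon, following the layout of the output above.
SUMMARY = re.compile(
    r"(\w+): (\d+) \((\d+\.\d+)%\) uniques of (\d+) contigs, "
    r"(\d+) dupe probes, (\d+) dupe probe matches, (\d+) dupe node matches"
)


def parse_match_summary(text):
    """Return a list of dicts, one per taxon found in the text."""
    records = []
    # finditer() scans the whole string, so it does not matter whether the
    # records are separated by newlines or by spaces.
    for m in SUMMARY.finditer(text):
        records.append({
            "taxon": m.group(1),
            "uniques": int(m.group(2)),
            "percent": float(m.group(3)),
            "contigs": int(m.group(4)),
            "dupe_probes": int(m.group(5)),
            "dupe_probe_matches": int(m.group(6)),
            "dupe_node_matches": int(m.group(7)),
        })
    return records
```

The reported percentage is uniques divided by contigs, so `round(100.0 * r["uniques"] / r["contigs"], 2)` can be checked against `r["percent"]` for each record.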
then, with missing data:
python ~/Git/brant/phyluce/bin/assembly/get_match_counts.py \
    /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
    /Users/bcf/Working/fish-phylogeny/match-count.config \
    --extend /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite \
    'group2' \
    --output /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.out \
    --notstrict

# this is all but brook trout.
[group2]
acanthurus_japonicus
amia_calva
diaphus_theta
antennarius_striatus
astyanax_fasciatus
danio_rerio
lampris_guttatus
umbra_limi
taenianotus_triacanthus
acipenser_fulvescens
pantodon_buchholzi
polypterus_senegalus
osteoglossum_bicirrhosum
anchoa_compressa
megalops_sp
strophidon_sathete
salvelinus_fontinalis
takifugu_rupribes*
gasterosteus_aculeatus*
haplochromis_burtoni*
neolamprologus_brichardi*
oreochromis_niloticus*
oryzias_latipes*
pundamilia_nyererei*
tetraodon_nigroviridis*
lepisosteus_oculatus*
gadus_morhua*
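In the [group2] list, the starred names are the same taxa that appear in the genome-derived matching output. A small sketch of splitting such a list into the two sources follows. It assumes that a trailing `*` marks a taxon to be read from the `--extend` database, which is an inference from these notes, and `split_group` is a hypothetical helper.

```python
def split_group(names):
    """Split taxon names into (enriched, genome_derived) lists.

    Assumption: a trailing '*' marks a taxon taken from the --extend database.
    """
    enriched, genome_derived = [], []
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank entries
        if name.endswith("*"):
            genome_derived.append(name.rstrip("*"))
        else:
            enriched.append(name)
    return enriched, genome_derived


if __name__ == "__main__":
    print(split_group(["amia_calva", "umbra_limi", "gadus_morhua*"]))
```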
turn those data into a giant fasta for alignment:
python /Users/bcf/Git/brant/phyluce/bin/assembly/get_fastas_from_match_counts.py \
    /Volumes/Data3/fish/fish-seqcap/contig \
    /Volumes/Data3/fish/fish-seqcap/lastz/probe.matches.sqlite \
    /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.out \
    --extend-db /Users/bcf/Working/fish-phylogeny/fake/lastz/genome.matches.sqlite \
    --extend-dir /Users/bcf/Working/fish-phylogeny/fake/contig/ \
    --output /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.fasta \
    --notstrict /Users/bcf/Working/fish-phylogeny/taxon-groups/missing-data-plus-genome/missing-data-plus-genome.notstrict

Getting acanthurus_japonicus reads...
Replaced < 20 ambiguous bases in >29510_orylat2_chr20_19834163_19834343_acanthurus_japonicus
Replaced < 20 ambiguous bases in >64849_orylat2_ultracontig117_111598_111777_acanthurus_japonicus
Replaced < 20 ambiguous bases in >19592_orylat2_chr16_21174511_21174690_acanthurus_japonicus
Replaced < 20 ambiguous bases in >34879_orylat2_chr22_16020600_16020780_acanthurus_japonicus
Replaced < 20 ambiguous bases in >37317_orylat2_chr24_2744123_2744302_acanthurus_japonicus
Replaced < 20 ambiguous bases in >28115_orylat2_chr20_11834906_11835085_acanthurus_japonicus
Getting acipenser_fulvescens reads...
Replaced < 20 ambiguous bases in >28407_orylat2_chr20_12253203_12253383_acipenser_fulvescens
Replaced < 20 ambiguous bases in >51743_orylat2_chr6_22173752_22173931_acipenser_fulvescens
Replaced < 20 ambiguous bases in >65920_orylat2_ultracontig257_170745_170924_acipenser_fulvescens
Getting amia_calva reads...
Replaced < 20 ambiguous bases in >59208_orylat2_chr9_15660075_15660255_amia_calva
Replaced < 20 ambiguous bases in >31907_orylat2_chr21_24592457_24592637_amia_calva
Getting anchoa_compressa reads...
Getting antennarius_striatus reads...
Replaced < 20 ambiguous bases in >36257_orylat2_chr23_8323776_8323955_antennarius_striatus
Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_antennarius_striatus
Replaced < 20 ambiguous bases in >31353_orylat2_chr21_23054326_23054505_antennarius_striatus
Getting astyanax_fasciatus reads...
Replaced < 20 ambiguous bases in >19564_orylat2_chr16_21073470_21073650_astyanax_fasciatus
Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_astyanax_fasciatus
Replaced < 20 ambiguous bases in >45039_orylat2_chr4_15269599_15269778_astyanax_fasciatus
Replaced < 20 ambiguous bases in >29060_orylat2_chr20_16776517_16776697_astyanax_fasciatus
Replaced < 20 ambiguous bases in >50797_orylat2_chr6_13673431_13673611_astyanax_fasciatus
Replaced < 20 ambiguous bases in >42411_orylat2_chr3_28384904_28385084_astyanax_fasciatus
Replaced < 20 ambiguous bases in >40337_orylat2_chr3_18054080_18054259_astyanax_fasciatus
Getting danio_rerio reads...
Replaced < 20 ambiguous bases in >3037_orylat2_chr10_7436728_7436908_danio_rerio
Replaced < 20 ambiguous bases in >8006_orylat2_chr12_12703194_12703373_danio_rerio
Replaced < 20 ambiguous bases in >6581_orylat2_chr11_22357337_22357516_danio_rerio
Getting diaphus_theta reads...
Replaced < 20 ambiguous bases in >43998_orylat2_chr4_5445911_5446091_diaphus_theta
Getting gadus_morhua* reads...
Replaced < 20 ambiguous bases in >28133_orylat2_chr20_11859433_11859612_gadus_morhua
Getting gasterosteus_aculeatus* reads...
Getting haplochromis_burtoni* reads...
Getting lampris_guttatus reads...
Replaced < 20 ambiguous bases in >7602_orylat2_chr12_9462216_9462396_lampris_guttatus
Replaced < 20 ambiguous bases in >15812_orylat2_chr15_20169029_20169209_lampris_guttatus
Replaced < 20 ambiguous bases in >9168_orylat2_chr13_4200208_4200388_lampris_guttatus
Replaced < 20 ambiguous bases in >40283_orylat2_chr3_18016128_18016307_lampris_guttatus
Getting lepisosteus_oculatus* reads...
Getting megalops_sp reads...
Getting neolamprologus_brichardi* reads...
Getting oreochromis_niloticus* reads...
Getting oryzias_latipes* reads...
Replaced < 20 ambiguous bases in >54500_orylat2_chr7_21324258_21324438_oryzias_latipes
Replaced < 20 ambiguous bases in >24653_orylat2_chr19_8255095_8255274_oryzias_latipes
Replaced < 20 ambiguous bases in >17869_orylat2_chr16_5174650_5174830_oryzias_latipes
Replaced < 20 ambiguous bases in >17424_orylat2_chr16_1035320_1035500_oryzias_latipes
Replaced < 20 ambiguous bases in >22542_orylat2_chr17_28477535_28477714_oryzias_latipes
Replaced < 20 ambiguous bases in >59208_orylat2_chr9_15660075_15660255_oryzias_latipes
Replaced < 20 ambiguous bases in >37000_orylat2_chr23_22404477_22404657_oryzias_latipes
Replaced < 20 ambiguous bases in >52857_orylat2_chr7_9011611_9011791_oryzias_latipes
Replaced < 20 ambiguous bases in >15812_orylat2_chr15_20169029_20169209_oryzias_latipes
Replaced < 20 ambiguous bases in >44335_orylat2_chr4_8453372_8453552_oryzias_latipes
Replaced < 20 ambiguous bases in >30325_orylat2_chr21_9456497_9456676_oryzias_latipes
Replaced < 20 ambiguous bases in >33912_orylat2_chr22_10311301_10311481_oryzias_latipes
Replaced < 20 ambiguous bases in >1199_orylat2_chr1_20869179_20869358_oryzias_latipes
Replaced < 20 ambiguous bases in >65943_orylat2_ultracontig257_377659_377839_oryzias_latipes
Replaced < 20 ambiguous bases in >44706_orylat2_chr4_12103391_12103571_oryzias_latipes
Replaced < 20 ambiguous bases in >52576_orylat2_chr7_7401879_7402059_oryzias_latipes
Replaced < 20 ambiguous bases in >12379_orylat2_chr14_14534981_14535161_oryzias_latipes
Replaced < 20 ambiguous bases in >50797_orylat2_chr6_13673431_13673611_oryzias_latipes
Replaced < 20 ambiguous bases in >36825_orylat2_chr23_19660644_19660823_oryzias_latipes
Replaced < 20 ambiguous bases in >13680_orylat2_chr14_31599989_31600168_oryzias_latipes
Replaced < 20 ambiguous bases in >30113_orylat2_chr21_8630533_8630712_oryzias_latipes
Replaced < 20 ambiguous bases in >36609_orylat2_chr23_15322876_15323055_oryzias_latipes
Replaced < 20 ambiguous bases in >59216_orylat2_chr9_15672674_15672854_oryzias_latipes
Replaced < 20 ambiguous bases in >7561_orylat2_chr12_9045138_9045317_oryzias_latipes
Replaced < 20 ambiguous bases in >51400_orylat2_chr6_16554444_16554623_oryzias_latipes
Replaced < 20 ambiguous bases in >9051_orylat2_chr13_2884355_2884534_oryzias_latipes
Replaced < 20 ambiguous bases in >34943_orylat2_chr22_16084127_16084307_oryzias_latipes
Replaced < 20 ambiguous bases in >38735_orylat2_chr24_21857307_21857486_oryzias_latipes
Replaced < 20 ambiguous bases in >51660_orylat2_chr6_21106729_21106909_oryzias_latipes
Replaced < 20 ambiguous bases in >24999_orylat2_chr19_9348234_9348414_oryzias_latipes
Replaced < 20 ambiguous bases in >30337_orylat2_chr21_9465044_9465224_oryzias_latipes
Replaced < 20 ambiguous bases in >24740_orylat2_chr19_8765368_8765547_oryzias_latipes
Replaced < 20 ambiguous bases in >31502_orylat2_chr21_23902474_23902654_oryzias_latipes
Replaced < 20 ambiguous bases in >9168_orylat2_chr13_4200208_4200388_oryzias_latipes
Replaced < 20 ambiguous bases in >9652_orylat2_chr13_9467290_9467470_oryzias_latipes
Replaced < 20 ambiguous bases in >10434_orylat2_chr13_14895221_14895401_oryzias_latipes
Replaced < 20 ambiguous bases in >43567_orylat2_chr3_36062742_36062922_oryzias_latipes
Replaced < 20 ambiguous bases in >15831_orylat2_chr15_20209844_20210024_oryzias_latipes
Replaced < 20 ambiguous bases in >37805_orylat2_chr24_6772475_6772654_oryzias_latipes
Replaced < 20 ambiguous bases in >64587_orylat2_ultracontig115_101466_101645_oryzias_latipes
Replaced < 20 ambiguous bases in >65522_orylat2_ultracontig182_2850109_2850289_oryzias_latipes
Replaced < 20 ambiguous bases in >13793_orylat2_chr15_666103_666283_oryzias_latipes
Replaced < 20 ambiguous bases in >15263_orylat2_chr15_18813667_18813846_oryzias_latipes
Replaced < 20 ambiguous bases in >10268_orylat2_chr13_14227519_14227698_oryzias_latipes
Replaced < 20 ambiguous bases in >22459_orylat2_chr17_28292288_28292468_oryzias_latipes
Replaced < 20 ambiguous bases in >58935_orylat2_chr9_15194502_15194681_oryzias_latipes
Replaced < 20 ambiguous bases in >13009_orylat2_chr14_19750656_19750836_oryzias_latipes
Replaced < 20 ambiguous bases in >1320_orylat2_chr1_21206085_21206265_oryzias_latipes
Replaced < 20 ambiguous bases in >12390_orylat2_chr14_14557181_14557360_oryzias_latipes
Replaced < 20 ambiguous bases in >53011_orylat2_chr7_9680131_9680310_oryzias_latipes
Replaced < 20 ambiguous bases in >48673_orylat2_chr5_28952087_28952266_oryzias_latipes
Replaced < 20 ambiguous bases in >28658_orylat2_chr20_14628555_14628734_oryzias_latipes
Replaced < 20 ambiguous bases in >65101_orylat2_ultracontig182_1561600_1561780_oryzias_latipes
Replaced < 20 ambiguous bases in >53067_orylat2_chr7_9724225_9724404_oryzias_latipes
Replaced < 20 ambiguous bases in >18131_orylat2_chr16_8737432_8737611_oryzias_latipes
Getting osteoglossum_bicirrhosum reads...
Getting pantodon_buchholzi reads...
Replaced < 20 ambiguous bases in >53979_orylat2_chr7_15107874_15108054_pantodon_buchholzi
Getting polypterus_senegalus reads...
Getting pundamilia_nyererei* reads...
Getting salvelinus_fontinalis reads...
Getting strophidon_sathete reads...
Getting taenianotus_triacanthus reads...
Replaced < 20 ambiguous bases in >9652_orylat2_chr13_9467290_9467470_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >36029_orylat2_chr23_4526475_4526655_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >5240_orylat2_chr11_4468135_4468314_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >41131_orylat2_chr3_22104665_22104845_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >8573_orylat2_chr12_23455521_23455701_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >54861_orylat2_chr7_24902228_24902408_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >47078_orylat2_chr5_7437057_7437237_taenianotus_triacanthus
Replaced < 20 ambiguous bases in >42136_orylat2_chr3_28115321_28115501_taenianotus_triacanthus
Getting takifugu_rupribes* reads...
Replaced < 20 ambiguous bases in >56271_orylat2_chr8_10125366_10125546_takifugu_rupribes
Replaced < 20 ambiguous bases in >17466_orylat2_chr16_1253758_1253937_takifugu_rupribes
Replaced < 20 ambiguous bases in >63754_orylat2_scaffold6244_2769_2948_takifugu_rupribes
Replaced < 20 ambiguous bases in >51660_orylat2_chr6_21106729_21106909_takifugu_rupribes
Replaced < 20 ambiguous bases in >5503_orylat2_chr11_6725896_6726076_takifugu_rupribes
Replaced < 20 ambiguous bases in >1607_orylat2_chr1_22461490_22461669_takifugu_rupribes
Replaced < 20 ambiguous bases in >51124_orylat2_chr6_15729949_15730129_takifugu_rupribes
Getting tetraodon_nigroviridis* reads...
Replaced < 20 ambiguous bases in >20115_orylat2_chr16_28692769_28692948_tetraodon_nigroviridis
Replaced < 20 ambiguous bases in >7489_orylat2_chr12_8416389_8416569_tetraodon_nigroviridis
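To see at a glance which taxa had the most ambiguous-base replacements, the log above can be tallied per taxon. A minimal sketch, assuming the two message formats shown in the log ("Getting ... reads..." and "Replaced < 20 ambiguous bases in >..."); `count_replacements` is a hypothetical helper.

```python
import re
from collections import OrderedDict

# Either a "Getting <taxon> reads..." header (optional trailing '*')
# or one "Replaced ..." message belonging to the most recent header.
LOG_EVENT = re.compile(
    r"Getting (\S+?)\*? reads\.\.\.|Replaced < 20 ambiguous bases in >\S+"
)


def count_replacements(log_text):
    """Return an ordered mapping of taxon -> number of 'Replaced' messages."""
    counts = OrderedDict()
    current = None
    for m in LOG_EVENT.finditer(log_text):
        if m.group(1):
            # New taxon header; start its counter at zero.
            current = m.group(1)
            counts[current] = 0
        elif current is not None:
            counts[current] += 1
    return counts
```

Taxa with no replacements still appear in the result with a count of zero, which keeps the tally aligned with the taxon list.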
add in MAFFT as alignment engine and re-align:
python ~/Git/brant/phyluce/bin/align/seqcap_align_2.py \
    missing-data-plus-genome.fasta mafft/nexus 27 \
    --notstrict --aligner mafft --multiprocessing
add in gaps for missing taxa:
python ~/Git/brant/phyluce/bin/align/add_gaps_for_missing_taxa.py \
    nexus nexus-with-gaps \
    ../missing-data-plus-genome.out \
    ../missing-data-plus-genome.notstrict
convert to fasta:
python ~/Git/brant/phyluce/bin/align/convert_nexus_to_fasta.py \
    nexus-with-gaps fasta-with-gaps
check align stats:
python ~/Git/brant/phyluce/bin/align/get_align_summary_data.py mafft/nexus

Lengths
-----
Total length(aln)      149366
Average length(aln)    304.207739308
95 CI length(aln)      15.9533489321

Taxa
-----
Average(taxa)          20.533604888
95 CI(taxa)            0.373361274487
min(taxa)              3
max(taxa)              27
Count(taxa:# alns)     {3: 2, 5: 1, 6: 2, 7: 3, 8: 2, 9: 5, 10: 8, 11: 3, 12: 4, 13: 5, 14: 11, 15: 11, 16: 17, 17: 14, 18: 20, 19: 34, 20: 44, 21: 55, 22: 62, 23: 70, 24: 66, 25: 32, 26: 17, 27: 3}

Base composition
-----
Bases                  {'A': 700435, 'C': 566549, '-': 480056, 'T': 709156, 'G': 555093}
Sum(all)               3011289
Sum(nucleotide only)   2531233
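The summary above can be cross-checked from its own numbers: the taxa-count table gives the number of alignments, which in turn reproduces both reported averages. The sketch below only re-does that arithmetic with values copied from the output; `check_summary` is a hypothetical helper.

```python
# Values copied from the get_align_summary_data.py output above.
TAXA_PER_ALIGNMENT = {
    3: 2, 5: 1, 6: 2, 7: 3, 8: 2, 9: 5, 10: 8, 11: 3, 12: 4, 13: 5,
    14: 11, 15: 11, 16: 17, 17: 14, 18: 20, 19: 34, 20: 44, 21: 55,
    22: 62, 23: 70, 24: 66, 25: 32, 26: 17, 27: 3,
}
BASES = {"A": 700435, "C": 566549, "-": 480056, "T": 709156, "G": 555093}
TOTAL_LENGTH = 149366


def check_summary():
    """Recompute the derived statistics from the raw counts."""
    n_alignments = sum(TAXA_PER_ALIGNMENT.values())
    taxa_total = sum(k * v for k, v in TAXA_PER_ALIGNMENT.items())
    return {
        "alignments": n_alignments,
        "mean_length": float(TOTAL_LENGTH) / n_alignments,
        "mean_taxa": float(taxa_total) / n_alignments,
        "sum_all": sum(BASES.values()),
        "sum_nucleotides": sum(v for k, v in BASES.items() if k != "-"),
    }


if __name__ == "__main__":
    for key, value in sorted(check_summary().items()):
        print(key, value)
```

The table sums to 491 alignments, and dividing the total length and the total taxon count by 491 gives the reported averages (304.2077 and 20.5336).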
setup mafft alignments for raxml:
python ~/Git/brant/phyluce/bin/align/nexus_to_concatenated_phylip.py \
    nexus-with-gaps \
    raxml/malfaro-fish-481-loci-missing.phylip
run on @ketchup:
rsync -avz ./ [email protected]:/home/bcf/working/malfaro-fish/raxml/
run raxml on data w/ 500 bootreps:
raxmlHPC-PTHREADS -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s malfaro-fish-481-loci-missing.phylip -n bstrap500 -T 12
convert to phylip and run mraic to pick a substitution model for each locus:
python ~/Git/brant/seqcap/Alignment/convert_nexus_to_phylip.py \
    --input nexus-with-gaps --output phylip-with-gaps --shorten-name --positions '-2,-1'
python ~/Git/brant/seqcap/Phylo/run_mraic.py --input phylip --output aicc
convert to mrbayes format:
python ~/Git/brant/bayes-phylo-tools/nexus_to_concat.py \
    --models output_models.txt --aligns nexus-with-gaps/ \
    --concat mrbayes/nexus-with-gaps.mrbayes.nexus \
    --mr-bayes --interleave --unlink

Models were not estimated for:
27931_orylat2_chr20_11066287_11066466
2775_orylat2_chr10_5723437_5723616
create new folder without those two loci:
mkdir nexus-with-gaps-minus-two
rm 27931_orylat2_chr20_11066287_11066466.*
rm 2775_orylat2_chr10_5723437_5723616.*
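The commands above create the new folder and delete the two loci, but the copy step between them is not recorded. A minimal sketch of doing it in one pass follows, assuming the intent was to copy every alignment from nexus-with-gaps except the loci without model estimates, and that each file is named `<locus>.<extension>`. `copy_without_loci` is a hypothetical helper.

```python
import os
import shutil

EXCLUDED = {
    "27931_orylat2_chr20_11066287_11066466",
    "2775_orylat2_chr10_5723437_5723616",
}


def copy_without_loci(src_dir, dst_dir, excluded=EXCLUDED):
    """Copy files from src_dir to dst_dir, skipping the excluded loci.

    Returns the sorted list of file names that were copied.
    """
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    kept = []
    for name in sorted(os.listdir(src_dir)):
        # The locus name is everything before the first dot,
        # which mirrors the "locus.*" glob used with rm above.
        locus = name.split(".")[0]
        if locus in excluded:
            continue
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        kept.append(name)
    return kept
```

For example, `copy_without_loci("nexus-with-gaps", "nexus-with-gaps-minus-two")` leaves the source folder untouched, so the full set of alignments stays available.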
convert to concatenated phylip:
python ~/Git/brant/phyluce/bin/align/nexus_to_concatenated_phylip.py \
    nexus-with-gaps-minus-two/ \
    raxml-minus-two/malfaro-fish-489-loci-missing-minus-two.phylip
sync to @ketchup:
rsync -avz ./ [email protected]:/home/bcf/working/malfaro-fish/raxml-minus-two/
rerun raxml:
raxmlHPC-PTHREADS -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s malfaro-fish-489-loci-missing-minus-two.phylip -n bstrap500 -T 12
run the equivalent data through MrBayes for 5 M iterations
pare down phylip data to include only those loci containing polypterus and only those loci longer than 50 bp (this => 298 loci):
python ~/git/phyluce/bin/align/get_all_locus_lengths.py \
    phylip-with-gaps \
    --input-format phylip \
    --containing-data-for pol_sen \
    --output phylip-with-gaps-polypterus \
    --min-length 50
try cloudforest on fish data with incomplete matrix:
python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
    phylip-with-gaps-polypterus \
    cloudforest-output \
    genetrees \
    $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
    --parallelism multiprocessing --cores 20
gin up species tree:
python ~/git/phyluce/bin/genetrees/phybase.py -i genetrees.tre -o pol_sen -g -c 7
throw on some bootreps:
python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
    phylip-with-gaps-polypterus \
    cloudforest-output \
    bootstraps \
    $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
    --parallelism multiprocessing \
    --cores 20 --bootreps 100 --genetrees cloudforest-output/genetrees.tre
get bootrep tree:
python ~/git/phyluce/bin/genetrees/phybase.py -i 100-bootreps.tree -o pol_sen --bootstraps --sorted --cores 12
pare down phylip data to include only those loci containing polypterus and acipenser and only those loci longer than 50 bp (this => 298 loci):
python ~/git/phyluce/bin/align/get_all_locus_lengths.py \
    ../../phylip-with-gaps \
    --input-format phylip \
    --containing-data-for pol_sen aci_ful \
    --output phylip-with-gaps-polypterus-and-acipenser \
    --min-length 50
run genetrees:
python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
    phylip-with-gaps-polypterus-and-acipenser \
    phylip-with-gaps-polypterus-and-acipenser-output/ \
    genetrees $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
    --parallelism multiprocessing \
    --cores 20
rename:
mv genetrees.tree phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre
run bootreps:
python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
    phylip-with-gaps-polypterus-and-acipenser \
    phylip-with-gaps-polypterus-and-acipenser-output \
    bootstraps $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
    --parallelism multiprocessing \
    --cores 20 \
    --bootreps 100 \
    --genetrees phylip-with-gaps-polypterus-and-acipenser-output/genetrees.tre
rename:
mv 100-bootreps.tree phylip-with-gaps-polypterus-and-acipenser-output-100-bootreps.tree
get STAR tree:
python ~/git/phyluce/bin/genetrees/phybase.py \
    -i phylip-with-gaps-polypterus-and-acipenser-output-100-bootreps.tree \
    -o pol_sen \
    --bootstraps \
    --sorted \
    --cores 12
run 1000 bootstrap reps against the files containing data for polypterus and acipenser:
python ~/git/cloudforest/cloudforest/cloudforest_mpi.py \
    phylip-with-gaps-polypterus-and-acipenser \
    cloudforest-phylip-with-gaps-polypterus-and-acipenser-output \
    bootstraps $HOME/git/cloudforest/cloudforest/binaries/PhyML3linux64 \
    --parallelism multiprocessing --cores 12 --bootreps 1000 \
    --genetrees cloudforest-phylip-with-gaps-polypterus-and-acipenser-output/phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre
convert mrbayes nexus file to phylip
manually alter sequence names to match raxml phylip files and individual phylip files
copy partition data from mrbayes file to new file and manually convert to raxml partition format
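The manual partition conversion described above is mechanical when the MrBayes block uses one contiguous range per locus. A sketch under that assumption: it reads lines of the form `charset NAME = START-END;` and writes the RAxML partition form `DNA, NAME = START-END`. The exact layout written by nexus_to_concat.py is not shown in these notes, so the regular expression is a guess at it, and `charsets_to_raxml` is a hypothetical helper.

```python
import re

# Matches e.g. "charset locus1 = 1-300;" (case-insensitive, one per line).
CHARSET = re.compile(
    r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;",
    re.IGNORECASE | re.MULTILINE,
)


def charsets_to_raxml(nexus_text, datatype="DNA"):
    """Return RAxML partition lines for every contiguous charset found."""
    lines = []
    for name, start, end in CHARSET.findall(nexus_text):
        lines.append("{0}, {1} = {2}-{3}".format(datatype, name, start, end))
    return lines


if __name__ == "__main__":
    example = "charset locus1 = 1-300;\ncharset locus2 = 301-550;\n"
    print("\n".join(charsets_to_raxml(example)))
```

Charsets with several ranges or a step (codon positions, for instance) are not matched and would still need converting by hand.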
run raxml partitioned analysis:
~/git/brant/raxml/raxmlHPC-PTHREADS-SSE3 -f a -m GTRGAMMA -x 18802 -p 787588 -N 500 -s nexus-with-gaps.mrbayes.phylip -q nexus-with-gaps.partitions -n bstrap500 -T 7
run phybase (STAR) against the resulting bootreps and generate a consensus tree:
python ~/git/phyluce/bin/genetrees/phybase.py -i 1000-bootreps.tree -o pol_sen --bootstraps --sorted --cores 12
run phybase against the existing gene trees:
python ~/git/phyluce/bin/genetrees/phybase.py -i phylip-with-gaps-polypterus-and-acipenser-output-genetrees.tre -o pol_sen -g -c 12
generate trees from the STAR tree and the 1000 bootreps:
/home/bcf/git/raxml/raxmlHPC-SSE3 -m GTRCAT -f b -t star-tree.tre -z 1000-bootreps.star.trees -n test