James Taylor jxtx

Day 1: 25 June 2019

BioC2019: Where Software and Biology Connect (Martin)

Martin providing some "brief logistics"

Inference after prediction (Jeffrey "John" Leek)

aka "What do we do after we have machine learned everything"

Conference info: https://bioc2018.bioconductor.org/

My first Bioconductor meeting, and I'm not a BioC or R expert so these notes are probably going to be naïve!

Dealing with restriction fragment details

HiFive stores a fend file with information on the locations of restriction fragments in the genome. We need to get the locations of the RE sites into a BED
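Locating restriction enzyme sites and writing them as BED intervals can be sketched roughly as follows. This is a minimal sketch, not HiFive's own code: the function name `re_sites_to_bed` is hypothetical, the sequences are assumed to already be in memory, and the HindIII recognition site `AAGCTT` is used only as an example enzyme.

```python
import re

def re_sites_to_bed(sequences, site="AAGCTT"):
    """Yield BED lines (chrom, 0-based start, end) for each occurrence of `site`.

    `sequences` maps chromosome names to sequence strings (an assumption made
    for this sketch; a real genome would be streamed from FASTA).
    """
    # A lookahead lets the regex report overlapping occurrences as well.
    pattern = re.compile("(?=" + re.escape(site) + ")", re.IGNORECASE)
    for chrom, seq in sequences.items():
        for match in pattern.finditer(seq):
            start = match.start()
            yield f"{chrom}\t{start}\t{start + len(site)}"

# Toy input: one upper-case and one soft-masked (lower-case) site.
example = {"chr1": "GGAAGCTTCCaagcttTT"}
bed_lines = list(re_sites_to_bed(example))
print("\n".join(bed_lines))
```

BED coordinates are 0-based and half-open, so each interval spans exactly the recognition sequence.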

Keles -- Statistical Methods for profiling long range chromatin interactions from repetitive regions of the genome

Multi-mapping reads (multi-reads) are typically thrown out in many HTS analyses, including Hi-C
- Assays predominantly rely on short reads (50-150bp), so multi-reads are common
- Using ChIP-seq as an example, incorporating multi-reads finds peaks in regions where "uni-reads" do not
- e.g. Perm-seq using DHS + ChIP-seq data and multi-reads. 27.3% more peaks compared to the ENCODE uniform processing pipeline
How to combine this with Hi-C data?
- Hi-C read processing
  - Typical pipelines discard singletons, multi-mapping ends, low-mapping-quality reads, and unaligned reads
Evaluation of the impact of this using IMR90 and Plasmodium datasets
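The filtering step that typical pipelines apply to Hi-C read pairs can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ReadEnd` record, the category names, and the mapping-quality cutoff of 30 are not from the talk, and real pipelines work on SAM/BAM records rather than this toy structure.

```python
from dataclasses import dataclass

@dataclass
class ReadEnd:
    n_alignments: int  # number of reported alignment locations (0 = unaligned)
    mapq: int          # mapping quality of the best alignment

def classify_pair(end1, end2, min_mapq=30):
    """Assign a read pair to one category; only 'valid' pairs are usually kept."""
    ends = (end1, end2)
    aligned = [e.n_alignments > 0 for e in ends]
    if not any(aligned):
        return "unaligned"
    if not all(aligned):
        return "singleton"        # only one end of the pair aligned
    if any(e.n_alignments > 1 for e in ends):
        return "multi-mapping"    # the class a multi-read-aware method would rescue
    if any(e.mapq < min_mapq for e in ends):
        return "low-mapq"
    return "valid"

pairs = [
    (ReadEnd(1, 60), ReadEnd(1, 60)),
    (ReadEnd(1, 60), ReadEnd(3, 1)),
    (ReadEnd(1, 60), ReadEnd(0, 0)),
    (ReadEnd(0, 0), ReadEnd(0, 0)),
    (ReadEnd(1, 60), ReadEnd(1, 10)),
]
labels = [classify_pair(a, b) for a, b in pairs]
print(labels)
```

In this toy input only the first pair would survive a conventional pipeline; the second is the kind of pair that a multi-read-aware method aims to recover.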

Why is it called Galaxy

Once upon a time there was the Genome ALignment and Annotation database or GALA, which allowed for analysis of genomic elements alongside comparative genomic information. However, this tool supported only a few analyses. What-would-be-galaxy was born from the idea of being able to easily take any existing analysis tool and quickly integrate it into this platform. But what should we call this next direction? Bob Harris suggested the use of X/Y to represent this "next dimension" of analysis. GALA + XY ⟶ GALAXY ⟶ Galaxy.

Or at least this is how I remember it.

#usegalaxy

	# Mostly based on this:
	# https://github.com/Homebrew/linuxbrew/wiki/Standalone-Installation
	# But I started with nothing (no ruby, no gcc)

	# Ruby and GCC will go here
	mkdir bootstrap

	# Get GCC 4.4 and install under bootstrap
	# We also need libstdc++ when we get to building gcc-4.9 because somebody decided it was a good idea to start writing GCC in C++
	wget http://ftp1.scientificlinux.org/linux/scientific/55/x86_64/SL/gcc44-4.4.0-6.el5.x86_64.rpm

	/**
	* usage: node scrape_gs.js USERKEY
	*
	* Determine h-index for papers published AFTER each year found in a Google
	* scholar profile. The USERKEY is found in your Google scholar citations
	* page url.
	*/

	var request = require('request');
	var cheerio = require('cheerio');
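The calculation described in the header comment above can be sketched independently of the scraping step. This is a Python illustration, not the original script: the function names are hypothetical, the input is assumed to be a list of (year, citation count) pairs, and "after" is assumed to mean strictly later than the given year.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def h_index_after_each_year(papers):
    """Map each publication year to the h-index of papers published after it."""
    years = sorted({year for year, _ in papers})
    return {y: h_index([c for year, c in papers if year > y]) for y in years}

# Made-up (year, citations) pairs, for illustration only.
papers = [(2005, 50), (2005, 3), (2010, 10), (2012, 8), (2015, 2)]
result = h_index_after_each_year(papers)
print(result)
```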

	/*
	* BLAST - Search two DNA sequences for locally maximal segment pairs. The basic
	* command syntax is
	*
	* BLAST sequence1 sequence2
	*
	* where sequence1 and sequence2 name files containing DNA sequences. Lines
	* at the beginnings of the files that don't start with 'A', 'C', 'T' or 'G'
	* are discarded. Thus a typical sequence file might begin:
	*