Created September 14, 2012 00:55
ENCODE consensus segmentations
# 1. Get the ENCODE segmentations from EBI.
# consensus
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/gm12878.combined.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/h1hesc.combined.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/helas3.combined.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/hepg2.combined.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/huvec.combined.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/k562.combined.bb
# Segway (ahem; https://twitter.com/michaelhoffman/status/246679147164880897)
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/gm12878.segway.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/h1hesc.segway.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/helas3.segway.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/hepg2.segway.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/huvec.segway.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/k562.segway.bb
# ChromHMM
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/gm12878.ChromHMM.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/h1hesc.ChromHMM.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/helas3.ChromHMM.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/hepg2.ChromHMM.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/huvec.ChromHMM.bb
wget http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011/k562.ChromHMM.bb
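# The 18 downloads in step 1 follow one pattern (cell line x method), so they
# could also be generated in a loop. This is only a sketch: the helper name
# encode_urls is hypothetical, and the block just lists the URLs (a dry run)
# rather than downloading anything.

```shell
# Base URL copied from the wget commands in step 1.
base=http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/awgHub/byDataType/segmentations/jan2011

# encode_urls (hypothetical helper): print one URL per method and cell line.
encode_urls() {
  for method in combined segway ChromHMM; do
    for cell in gm12878 h1hesc helas3 hepg2 huvec k562; do
      printf '%s/%s.%s.bb\n' "$base" "$cell" "$method"
    done
  done
}

# Dry run: list the URLs without fetching them.
encode_urls

# To actually download, hand the list to wget one URL at a time:
# encode_urls | xargs -n 1 wget
```

# Keeping the list generation separate from the download makes it easy to
# check the URLs before anything is fetched.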
# 2. Make BEDGRAPHs of the ENCODE segmentation BigBeds.
# Only the first four columns (chrom, start, end, segment label) are kept.
for bigbed in *.bb
do
  bigBedToBed "$bigbed" stdout | cut -f 1-4 | bgzip > "$bigbed.bedg.gz"
done
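# To see what the `cut -f 1-4` in step 2 does, here is a small sketch. The
# record below is made up for illustration (it is not taken from the ENCODE
# files); it only assumes that bigBedToBed emits tab-separated BED lines with
# more than four columns.

```shell
# An illustrative, made-up BED9 record (tab-separated).
record=$(printf 'chr1\t10000\t10600\tR\t0\t.\t10000\t10600\t127,127,127')

# Same column selection as in step 2: keep chrom, start, end and label.
printf '%s\n' "$record" | cut -f 1-4
```

# The four remaining columns are what unionbedg consumes in step 3, with the
# segment label standing in for the usual numeric bedGraph value.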
# 3. Use bedtools to make the union of each ENCODE segmentation set.
# That is, make one file for the consensus segmentations including all 6 cell
# lines, another for Segway, and another for ChromHMM.
bedtools unionbedg -i gm12878.combined.bb.bedg.gz \
                      h1hesc.combined.bb.bedg.gz \
                      helas3.combined.bb.bedg.gz \
                      hepg2.combined.bb.bedg.gz \
                      huvec.combined.bb.bedg.gz \
                      k562.combined.bb.bedg.gz \
                   -header \
                   -names gm12878 \
                          h1hesc \
                          helas3 \
                          hepg2 \
                          huvec \
                          k562 \
                   -filler unknown \
    | bgzip \
    > encode.6celltypes.consensus.bedg.gz
bedtools unionbedg -i gm12878.segway.bb.bedg.gz \
                      h1hesc.segway.bb.bedg.gz \
                      helas3.segway.bb.bedg.gz \
                      hepg2.segway.bb.bedg.gz \
                      huvec.segway.bb.bedg.gz \
                      k562.segway.bb.bedg.gz \
                   -header \
                   -names gm12878 \
                          h1hesc \
                          helas3 \
                          hepg2 \
                          huvec \
                          k562 \
                   -filler unknown \
    | bgzip \
    > encode.6celltypes.segway.bedg.gz
bedtools unionbedg -i gm12878.ChromHMM.bb.bedg.gz \
                      h1hesc.ChromHMM.bb.bedg.gz \
                      helas3.ChromHMM.bb.bedg.gz \
                      hepg2.ChromHMM.bb.bedg.gz \
                      huvec.ChromHMM.bb.bedg.gz \
                      k562.ChromHMM.bb.bedg.gz \
                   -header \
                   -names gm12878 \
                          h1hesc \
                          helas3 \
                          hepg2 \
                          huvec \
                          k562 \
                   -filler unknown \
    | bgzip \
    > encode.6celltypes.ChromHMM.bedg.gz
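# The three unionbedg calls in step 3 differ only in the file-name suffix and
# the output label, so they could be produced from one template. A sketch,
# assuming the file names used above; the helper name union_cmd is
# hypothetical, and the block only prints the pipelines (a dry run).

```shell
cells="gm12878 h1hesc helas3 hepg2 huvec k562"

# union_cmd (hypothetical helper): print the unionbedg pipeline for one set.
#   $1 = suffix in the input file names (combined, segway, ChromHMM)
#   $2 = label in the output file name  (consensus, segway, ChromHMM)
union_cmd() {
  inputs=""
  for cell in $cells; do
    inputs="$inputs $cell.$1.bb.bedg.gz"
  done
  printf 'bedtools unionbedg -i%s -header -names %s -filler unknown | bgzip > encode.6celltypes.%s.bedg.gz\n' \
    "$inputs" "$cells" "$2"
}

# Dry run: print the three pipelines instead of executing them.
union_cmd combined consensus
union_cmd segway   segway
union_cmd ChromHMM ChromHMM

# To execute a printed pipeline, pass it to the shell, for example:
# union_cmd segway segway | sh
```

# Note that the "combined" inputs are written to the "consensus" output,
# which is why the helper takes the two names separately.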
# 4. Take a peek (gzcat is for OS X; use zcat elsewhere).
#
# Glossary for ENCODE chromatin segment predictions. Taken verbatim from Table 3
# of doi:10.1038/nature11247
# CTCF: CTCF-enriched element
# E: Predicted enhancer
# PF: Predicted promoter flanking region
# R: Predicted repressed or low-activity region
# TSS: Predicted promoter region including TSS
# T: Predicted transcribed region
# WE: Predicted weak enhancer or open chromatin cis-regulatory element
# unknown: added by us; hopefully self-explanatory
(gzcat encode.6celltypes.consensus.bedg.gz | head -1; gzcat encode.6celltypes.consensus.bedg.gz | \
    awk 'NR >= 100000 && NR <= 100005')
chrom start end gm12878 h1hesc helas3 hepg2 huvec k562
chr1 21710600 21710800 WE CTCF CTCF R R CTCF
chr1 21710800 21711000 WE R R R R R
chr1 21711000 21711200 WE unknown R R R R
chr1 21711200 21711298 WE unknown R R R unknown
chr1 21711298 21711400 WE unknown R R R WE
chr1 21711400 21711468 WE R R R R WE
(gzcat encode.6celltypes.segway.bedg.gz | head -1; gzcat encode.6celltypes.segway.bedg.gz | \
    awk 'NR >= 100000 && NR <= 100005')
chrom start end gm12878 h1hesc helas3 hepg2 huvec k562
chr1 5103241 5103270 Low5 Low7 Low3 Low3 Low5 Low2
chr1 5103270 5103289 Low5 Low7 Low3 Low3 Low5 Low6
chr1 5103289 5103299 Low5 Low7 Low5 Low3 Low5 Low6
chr1 5103299 5103381 Low5 Low7 Low5 Low3 Quies Low6
chr1 5103381 5103388 Low5 Low7 Low1 Low3 Quies Low6
chr1 5103388 5103389 Low5 Low1 Low1 Low3 Quies Low6
(gzcat encode.6celltypes.ChromHMM.bedg.gz | head -1; gzcat encode.6celltypes.ChromHMM.bedg.gz | \
    awk 'NR >= 100000 && NR <= 100005')
chr1 36245000 36245600 EnhF Quies Low Quies Quies Low
chr1 36245600 36245800 EnhWF Quies Low Quies Quies Low
chr1 36245800 36251000 Low Quies Low Quies Quies Low
chr1 36251000 36251800 Low Quies Low Low Quies Low
chr1 36251800 36252200 Low Quies Low Low Quies EnhWF
chr1 36252200 36252400 Low Quies H4K20 Low Quies EnhWF
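# The peeks in step 4 decompress each file twice and depend on gzcat vs. zcat.
# A possible alternative is sketched below: `gzip -dc` behaves the same on
# OS X and Linux, and a single awk pass prints the header plus the requested
# line range. The helper name peek is hypothetical.

```shell
# peek (hypothetical helper): print the header line plus lines $2..$3
# of the gzip/bgzip-compressed file $1.
peek() {
  gzip -dc "$1" | awk -v first="$2" -v last="$3" '
    NR == 1     { print; next }   # header line
    NR > last   { exit }          # stop reading once past the range
    NR >= first { print }'
}

# Same view as in step 4:
# peek encode.6celltypes.consensus.bedg.gz 100000 100005
```

# Exiting awk early avoids decompressing the rest of the file.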
# 5. tabix the 6-way segmentation maps for use within gemini.
tabix -p bed encode.6celltypes.consensus.bedg.gz
tabix -p bed encode.6celltypes.segway.bedg.gz
tabix -p bed encode.6celltypes.ChromHMM.bedg.gz
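# Once indexed, the files can be queried by region. BED coordinates are
# 0-based and half-open, while tabix region strings are 1-based and inclusive,
# so the start needs a +1 when converting. A small sketch; the helper name
# bed_to_region is hypothetical, and the tabix call is left commented out
# because it needs the index files built in step 5.

```shell
# bed_to_region (hypothetical helper): BED interval -> tabix region string.
#   $1 = chrom, $2 = BED start (0-based), $3 = BED end (exclusive)
bed_to_region() {
  printf '%s:%d-%d\n' "$1" "$(( $2 + 1 ))" "$3"
}

# Region string for the first consensus row shown in step 4.
bed_to_region chr1 21710600 21710800

# Query sketch (returns every interval overlapping the region):
# tabix encode.6celltypes.consensus.bedg.gz "$(bed_to_region chr1 21710600 21710800)"
```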