General:
| Tools | Description |
|---|---|
| flank | Create new intervals from the flanks of existing intervals. |
| slop | Adjust the size of intervals. |
| shift | Adjust the position of intervals. |
| subtract | Remove intervals based on overlaps b/w two files. |
| complement | Extract intervals not represented by an interval file. |
| closest | Find the closest, potentially non-overlapping interval. |
| intersect | Find overlapping intervals in various ways. |
| window | Find overlapping intervals within a window around an interval. |
| cluster | Cluster (but don't merge) overlapping/nearby intervals. |
| merge | Combine overlapping/nearby intervals into a single interval. |
| map | Apply a function to a column for each overlapping interval. |
| groupby | Group by common cols & summarize other cols (~ SQL "GROUP BY"). |
Formatting:
Notes: BED file format, GFF vs BED indexing
| Tools | Description |
|---|---|
| getfasta | Use intervals to extract sequences from a FASTA file. |
| maskfasta | Use intervals to mask sequences from a FASTA file. |
| sort | Order the intervals in a file. |
| bed12tobed6 | Breaks BED12 intervals into discrete BED6 intervals. |
| bamtofastq | Convert BAM records to FASTQ records. |
| bamtobed | Convert BAM alignments to BED (& other) formats. |
| bedpetobam | Convert BEDPE intervals to BAM records. |
| bedtobam | Convert intervals to BAM records. |
Statistics:
| Tools | Description |
|---|---|
| jaccard | Calculate the Jaccard statistic b/w two sets of intervals. |
| random | Generate random intervals in a genome. |
| reldist | Calculate the distribution of relative distances b/w two files. |
| shuffle | Randomly redistribute intervals in a genome. |
| makewindows | Makes adjacent or sliding windows across a genome or BED file. |
| nuc | Profile the nucleotide content of intervals in a FASTA file. |
Coverage:
| Tools | Description |
|---|---|
| annotate | Annotate coverage of features from multiple files. |
| coverage | Compute the coverage over defined intervals. |
| genomecov | Compute the coverage over an entire genome. |
| multicov | Counts coverage from multiple BAMs at specific intervals. |
| unionbedg | Combines coverage intervals from multiple BEDGRAPH files. |
Common options:
- -s, -S : Require same strandedness or opposite strandedness, respectively.
- -f, -F : Minimum overlap required as a fraction of A or as a fraction of B, respectively.
- -r, -e : Require that the minimum overlap be satisfied for A AND B, or for A OR B, respectively.
- -split : Treat "split" BAM or BED12 entries as distinct BED intervals.
- -abam : A is a BAM file.
Create new intervals from the flanks of existing intervals. (flank Docs)
Adjust the size of intervals. (slop Docs)
IN ▓▓▓▓▓ ▓▓▓
Flank ██ ██ ██ ██
Slop █████████ ███████
$ bedtools flank [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
$ bedtools slop [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
| OPTIONS | . |
|---|---|
| -b, -l, -r | Flank/extend regions by x bp on both sides, on the left, or on the right respectively. |
| -s | Define -l and -r based on strand. |
| -pct | Define -l and -r as a fraction of the feature's length. |
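
To make the difference between the two tools concrete, here is a minimal plain-Python sketch of the coordinate arithmetic. It is not how bedtools is implemented: the function names `slop` and `flank` and their parameters are hypothetical, intervals are 0-based half-open `(start, end)` pairs as in BED, and the edge handling (clamping to the chromosome, dropping empty flanks) is a simplifying assumption. Strand-aware (-s) and fractional (-pct) modes are not covered.

```python
def slop(start, end, chrom_len, left=0, right=0):
    """Grow an interval by `left`/`right` bp, clamped to [0, chrom_len]."""
    return max(0, start - left), min(chrom_len, end + right)


def flank(start, end, chrom_len, left=0, right=0):
    """Return only the new flanking intervals, clamped to [0, chrom_len]."""
    flanks = []
    if left > 0 and start > 0:
        flanks.append((max(0, start - left), start))
    if right > 0 and end < chrom_len:
        flanks.append((end, min(chrom_len, end + right)))
    return flanks


# "-b 5" corresponds to left=5, right=5 in this sketch.
print(slop(100, 200, 1000, left=5, right=5))   # (95, 205)
print(flank(100, 200, 1000, left=5, right=5))  # [(95, 100), (200, 205)]
```
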
Adjust the position of intervals, while respecting chromosome edges. (Docs)
IN ██ ██ ████
OUT ██ ██ ████
$ bedtools shift [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-s or (-m and -p)]
| OPTIONS | . |
|---|---|
| -s | Number of BPs to shift the features. |
| -m, -p | Number of BPs to shift the features on the - strand or + strand, respectively. |
| -pct | Define -s, -m and -p as a fraction of the feature's length. |
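
The shift arithmetic can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not bedtools' code: the function `shift` and its keyword names are hypothetical, and clamping both ends to `[0, chrom_len]` is a simplification of "respecting chromosome edges". The -pct mode is not covered.

```python
def shift(start, end, strand, chrom_len, s=0, m=None, p=None):
    """Move an interval by a signed number of bp.

    `s` applies to every feature; `m` / `p` override it for features on
    the "-" / "+" strand. Clamping to [0, chrom_len] is an assumption.
    """
    offset = s
    if strand == "-" and m is not None:
        offset = m
    elif strand == "+" and p is not None:
        offset = p
    new_start = min(max(0, start + offset), chrom_len)
    new_end = min(max(0, end + offset), chrom_len)
    return new_start, new_end


# Equivalent in spirit to "-p 2 -m -3".
print(shift(5, 100, "+", 1000, p=2, m=-3))    # (7, 102)
print(shift(800, 980, "-", 1000, p=2, m=-3))  # (797, 977)
```
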
Remove intervals based on overlaps b/w two files. (Docs)
A ▓▓▓▓▓▓▓▓▓▓ ▓▓▓ ▓▓▓▓▓▓
B ▓▓▓▓ ▓▓▓▓▓▓▓
A sub B ██ ████ ███ ███
$ bedtools subtract [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -A | Remove entire feature if any overlap. |
| common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e |
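
As a rough illustration of what subtraction does to a single A interval, the sketch below removes every overlapping B segment and returns the remaining pieces. It is plain Python written for this cheat sheet, not bedtools' implementation; the function name `subtract` and the `remove_whole` flag (standing in for -A) are hypothetical, and intervals are 0-based half-open `(start, end)` pairs.

```python
def subtract(a, b_intervals, remove_whole=False):
    """Return the pieces of `a` left after removing overlaps with B."""
    a_start, a_end = a
    pieces = []
    cursor = a_start  # left edge of the part of A not yet removed
    for b_start, b_end in sorted(b_intervals):
        if b_end <= a_start or b_start >= a_end:
            continue  # this B interval does not overlap A
        if remove_whole:
            return []  # any overlap removes the entire A feature
        if b_start > cursor:
            pieces.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < a_end:
        pieces.append((cursor, a_end))
    return pieces


print(subtract((0, 10), [(2, 6)]))                     # [(0, 2), (6, 10)]
print(subtract((0, 10), [(2, 6)], remove_whole=True))  # []
```
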
Extract intervals not represented by an interval file. (Docs)
IN ▓▓▓▓▓ ▓▓▓ ▓▓▓▓▓▓
▓▓▓▓ ▓▓▓
OUT █████ █████ ██
$ bedtools complement -i <BED/GFF/VCF> -g <GENOME>
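
The complement of a set of intervals on one chromosome is the list of gaps between them, including the stretches before the first and after the last interval. A minimal plain-Python sketch follows; the function name `complement` is hypothetical, the chromosome length plays the role of the -g genome file, and this is not bedtools' implementation.

```python
def complement(intervals, chrom_len):
    """Return the gaps on [0, chrom_len) not covered by any interval."""
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_len:
        gaps.append((cursor, chrom_len))
    return gaps


print(complement([(10, 20), (15, 30), (50, 60)], 100))
# [(0, 10), (30, 50), (60, 100)]
```
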
Find the closest, potentially non-overlapping interval. (Docs)
A █████ ✓
B ████ ███
$ bedtools closest [OPTIONS] -a <FILE> -b <FILE1, FILE2, ..., FILEN>
| OPTIONS | . |
|---|---|
| -d | Also report distance from A to the closest feature. |
| -k | Report the k closest hits. Default: 1. |
| -io | Ignore features in B that overlap A. |
| -iu, -id | Ignore features in B that are upstream or downstream, respectively, of features in A. |
| common | strandedness: -s, -S |
Find overlapping intervals in various ways. (Docs)
A ██████████
B ▓▓▓▓ ▓▓ ▓▓▓
A int B ▓▓ ▓▓
$ bedtools intersect [OPTIONS] -a <BAM/BED/GFF/VCF> -b <FILE1, FILE2, ..., FILEN>
| OPTIONS | . |
|---|---|
| -wa, -wb | Write the original entry in A/original entry in B, respectively, for each overlap. |
| -loj | For each feature in A report each overlap with B. Report a NULL feature for B if no overlap. |
| -wao | Report A and B features and no. of bp overlap between them. |
| -u | Only report each overlapping A feature once. |
| -c | For each entry in A, report count of overlapping B features. |
| -v | Only report features in A not overlapping B. |
| common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bam/bed12: -abam, -split |
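
The core of an intersection is a pairwise overlap test on half-open intervals. The sketch below shows that test, an optional minimum-overlap filter in the spirit of -f, and the "no overlap" report in the spirit of -v. The function names and the `min_frac_a` parameter are hypothetical, and the quadratic loop is for clarity only; it says nothing about how bedtools does this internally.

```python
def overlap(a, b):
    """Overlapping segment of two half-open intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def intersect(a_intervals, b_intervals, min_frac_a=0.0):
    """Report each A/B overlap covering at least `min_frac_a` of A."""
    hits = []
    for a in a_intervals:
        for b in b_intervals:
            seg = overlap(a, b)
            if seg and seg[1] - seg[0] >= min_frac_a * (a[1] - a[0]):
                hits.append(seg)
    return hits


def not_overlapping(a_intervals, b_intervals):
    """A intervals that overlap nothing in B (the idea behind -v)."""
    return [a for a in a_intervals
            if not any(overlap(a, b) for b in b_intervals)]


a = [(10, 30)]
b = [(5, 12), (18, 22), (28, 40)]
print(intersect(a, b))                  # [(10, 12), (18, 22), (28, 30)]
print(intersect(a, b, min_frac_a=0.2))  # [(18, 22)]
```
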
Find overlapping intervals within a window around an interval. (Docs)
A ┌────█████────┐
B ▓▓▓▓ ▓▓▓ ▓▓▓
A win B ▓▓▓▓ ▓▓▓
$ bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -w, -l, -r | Flank length of overlap window in each direction, upstream or downstream, respectively. |
| -sw | Define -l and -r based on strand. |
| -u | Only report each overlapping A feature once. |
| -c | For each entry in A, report count of overlapping B features. |
| -v | Only report features in A not overlapping B. |
| common | strandedness: -sm, -Sm; bam: -abam |
Cluster (but don't merge) overlapping/nearby intervals. (Docs)
BED ████ █████ ███
clustID └─#1─┘ └────#2────┘
$ bedtools cluster [OPTIONS] -i <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -d | Max distance between features in cluster. |
| common | strandedness: -s |
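
Clustering keeps every input interval and only attaches a cluster ID to it. A minimal plain-Python sketch of that labelling is shown here; the function name `cluster` and the `max_dist` parameter (standing in for -d) are hypothetical, strand is ignored, and this is not bedtools' implementation.

```python
def cluster(intervals, max_dist=0):
    """Label each interval with the ID of its overlap/nearby cluster."""
    labelled = []
    cluster_id = 0
    cluster_end = None  # right-most end seen in the current cluster
    for start, end in sorted(intervals):
        if cluster_end is None or start - cluster_end > max_dist:
            cluster_id += 1      # gap too large: open a new cluster
            cluster_end = end
        else:
            cluster_end = max(cluster_end, end)
        labelled.append((start, end, cluster_id))
    return labelled


print(cluster([(1, 5), (3, 8), (20, 25), (24, 30)]))
# [(1, 5, 1), (3, 8, 1), (20, 25, 2), (24, 30, 2)]
```
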
For merge, groupby, and map, the following* aggregation functions (specified by -o) can be applied to the column(s) specified by -c:
sum, count, count_distinct, min, max, mean, median, mode, antimode, stdev, sstdev, collapse, distinct, first, last
*Other functions are available.
Combine overlapping/nearby intervals into a single interval. (Docs)
IN ▓▓▓ ▓ ▓▓··d··▓▓▓
▓▓▓▓ ▓▓
OUT ██████ ███ ██████████
$ bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
| OPTIONS | . |
|---|---|
| -s | Require same strandedness. |
| -S | Force merge for one specific strand only. Options: <+/->. |
| -d | Maximum distance between features to be merged. |
| common | aggregation: -o, -c |
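
Merging is the classic sort-and-sweep over intervals. The sketch below is plain Python for illustration, not bedtools' code; the function name `merge` and the `max_dist` parameter (standing in for -d) are hypothetical. With `max_dist=0`, overlapping and book-ended intervals are combined.

```python
def merge(intervals, max_dist=0):
    """Combine intervals whose gap is at most `max_dist` bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_dist:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


ivs = [(1, 5), (3, 8), (8, 10), (20, 25)]
print(merge(ivs))               # [(1, 10), (20, 25)]
print(merge(ivs, max_dist=10))  # [(1, 25)]
```
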
Apply a function to a column for each overlapping interval. (Docs)
score = 3 1 5 4 6
B ▓▓▓ ▓ ▓▓▓▓▓ ▓▓▓▓▓▓ ▓▓▓▓
A ██████████ ███████
B map(mean) A ██████████ mean(3,1,5)=3 ███████ mean(4,6)=5
$ bedtools map [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| common | aggregation: -o, -c; strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bed12: -split |
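
The mapping step can be sketched as "collect the scores of all B features overlapping an A interval, then aggregate". The plain-Python version below uses the five scores from the diagram (3, 1, 5, 4, 6); the coordinates are made up for illustration, the function name `map_column` is hypothetical, and the `"."` placeholder for A intervals without overlaps mirrors the usual null value. It is not bedtools' implementation.

```python
from statistics import mean


def map_column(a_intervals, b_features, op=mean):
    """Apply `op` to the scores of B features overlapping each A interval."""
    results = []
    for a_start, a_end in a_intervals:
        scores = [score for b_start, b_end, score in b_features
                  if b_start < a_end and a_start < b_end]
        results.append((a_start, a_end, op(scores) if scores else "."))
    return results


a = [(0, 10), (20, 27)]
b = [(0, 3, 3), (4, 5, 1), (6, 11, 5), (18, 24, 4), (25, 29, 6)]
for row in map_column(a, b):
    print(row)  # the first A interval aggregates 3, 1, 5; the second 4, 6
```

Passing `op=max` or `op=sum` gives the other common aggregations in the same way.
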
Group by common cols & summarize other cols (~ SQL "GROUP BY"). (Docs)
$ bedtools groupby [OPTIONS] -i <BED> -g <groupby columns> -c <op. column> -o <operation>
| OPTIONS | . |
|---|---|
| common | aggregation: -o, -c |
| Column | e.g. | Definition |
|---|---|---|
| chrom | Sc112.1 | <STR> name of chromosome/scaffold |
| start | 2134 | <INT> start position of feature |
| end | 2565 | <INT> end position of feature |
| name | gene123 | <STR> name of feature |
| score | 544 | <NUM> score for the feature e.g. bit score |
| strand | + | <+/-/.> strand on which feature is located |
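| thickStart | 2235 | <INT> start of thick drawing (e.g. CDS start) |
| thickEnd | 2489 | <INT> end of thick drawing (e.g. CDS end) |
| itemRgb | 255,0,0 | <R,G,B> display color of the feature |
| blockCount | 2 | <INT> number of blocks (e.g. exons) |
| blockSizes | 150,80 | <INT,...> comma-separated block sizes |
| blockStarts | 0,351 | <INT,...> comma-separated block starts, relative to start |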
GFF ┌─1 2 3─┐ 4 ...
G---A---T C ...
BED └─0 1 2 └─3 ...
| . | gff -> bed | bed -> gff |
|---|---|---|
| new_start = | gff_start - 1 | bed_start + 1 |
| new_end = | gff_end | bed_end |
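
The conversion table translates directly into two one-line functions. GFF is 1-based and inclusive, BED is 0-based and half-open, so only the start changes. The function names below are hypothetical.

```python
def gff_to_bed(gff_start, gff_end):
    """1-based inclusive (GFF) -> 0-based half-open (BED)."""
    return gff_start - 1, gff_end


def bed_to_gff(bed_start, bed_end):
    """0-based half-open (BED) -> 1-based inclusive (GFF)."""
    return bed_start + 1, bed_end


# The first three bases (G, A, T in the diagram) are 1..3 in GFF and 0..3 in BED.
print(gff_to_bed(1, 3))  # (0, 3)
print(bed_to_gff(0, 3))  # (1, 3)
```

Either way the feature length is preserved: `gff_end - gff_start + 1` equals `bed_end - bed_start`.
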
Use intervals to extract sequences from a FASTA file. (Docs)
FASTA ACTGATCATGATACATGATACCATTAGGATACAATA
BED ████ █████ ████
OUTFA ATCA TGATA GGAT
$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -name | Use "name" column in BED file for FASTA headers in the output. |
| -s | Reverse complement features on "-" strand. Default: strand information ignored. |
| -split | Given BED12 input, concatenate the sequences from BED blocks (e.g., exons). |
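
Extracting a sequence for a BED interval is a half-open slice, optionally reverse-complemented for "-" strand features. The sketch below reuses the FASTA string from the diagram; the interval coordinates are assumptions chosen to reproduce the first output shown there, and the function name `getfasta` and its `use_strand` flag (standing in for -s) are hypothetical. It is an illustration, not bedtools' implementation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def getfasta(seq, start, end, strand="+", use_strand=False):
    """Slice seq[start:end]; reverse-complement "-" features if asked."""
    sub = seq[start:end]
    if use_strand and strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


seq = "ACTGATCATGATACATGATACCATTAGGATACAATA"
print(getfasta(seq, 4, 8))                        # ATCA
print(getfasta(seq, 4, 8, "-", use_strand=True))  # TGAT
```
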
Use intervals to mask sequences from a FASTA file. (Docs)
FASTA ACTGATCATGATACATGATACCATTAGGATACAATA
BED ████ █████ ████
FASTA' ACTGATNNNNATACATGNNNNNATTAGGNNNNAATA
$ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
| OPTIONS | . |
|---|---|
| -soft | Soft-mask (convert to lower-case bases) instead of masking with "N". |
| -mc | Specify masking character. |
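
Masking replaces the bases inside each interval, either with a masking character (hard mask) or with their lower-case form (soft mask). The plain-Python sketch below reproduces the masked sequence from the diagram; the interval coordinates are inferred from that output, and the function name `maskfasta` with its `soft` and `mask_char` parameters (standing in for -soft and -mc) is hypothetical.

```python
def maskfasta(seq, intervals, soft=False, mask_char="N"):
    """Mask every base covered by a half-open interval."""
    bases = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(bases))):
            bases[i] = bases[i].lower() if soft else mask_char
    return "".join(bases)


seq = "ACTGATCATGATACATGATACCATTAGGATACAATA"
bed = [(6, 10), (17, 22), (28, 32)]
print(maskfasta(seq, bed))             # ACTGATNNNNATACATGNNNNNATTAGGNNNNAATA
print(maskfasta(seq, bed, soft=True))  # ACTGATcatgATACATGataccATTAGGatacAATA
```
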
Order the intervals in a file. (Docs)
$ bedtools sort [OPTIONS] -i <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -sizeA | Sort by feature size (asc). |
| -sizeD | Sort by feature size (desc). |
| -chrThenSizeA | Sort by chromosome (asc), then by feature size (asc). |
| -chrThenSizeD | Sort by chromosome (asc), then by feature size (desc). |
| -chrThenScoreA | Sort by chromosome (asc), then by score (asc). |
| -chrThenScoreD | Sort by chromosome (asc), then by score (desc). |
Calculate the Jaccard statistic b/w two sets of intervals. (Docs)
A ███████████ 15bp
B ▓▓▓▓ 10bp ▓▓ 4bp ▓▓▓ 8bp
A int B ▓▓ 6bp ▓▓ 4bp
Jaccard(A,B) = (6+4)/((15+10+4+8)-(6+4)) = 0.37
$ bedtools jaccard [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bed12: -split |
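
The arithmetic in the diagram can be checked with a short plain-Python sketch: intersection bp divided by (total bp in A + total bp in B - intersection bp). The coordinates below are made up so that the lengths match the diagram (A: 15 bp; B: 10, 4 and 8 bp; overlaps of 6 and 4 bp). The function names are hypothetical, and the sketch assumes the intervals within each set do not overlap one another.

```python
def total_bp(intervals):
    """Sum of interval lengths (assumes no overlaps within the set)."""
    return sum(end - start for start, end in intervals)


def jaccard(a_intervals, b_intervals):
    """Intersection bp / union bp for two sets of half-open intervals."""
    intersection = sum(
        max(0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a_intervals
        for b_start, b_end in b_intervals
    )
    union = total_bp(a_intervals) + total_bp(b_intervals) - intersection
    return intersection / union


a = [(10, 25)]                     # 15 bp
b = [(6, 16), (18, 22), (30, 38)]  # 10 bp, 4 bp, 8 bp
print(round(jaccard(a, b), 2))     # 0.37
```
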
Generate random intervals in a genome. (Docs)
$ bedtools random [OPTIONS] -g <GENOME>
| OPTIONS | . |
|---|---|
| -l | The length of the intervals to generate. Default: 100 |
| -n | The number of intervals to generate. Default: 1,000,000 |
| -seed | Supply an integer seed for the shuffling. |
Calculate the distribution of relative distances b/w two files. (Docs)
───────r──────
A ▓▓▓▓▓▓ ▓▓▓▓
B ███
───d1─── ──d2──
reldist = min(d1,d2)/r
$ bedtools reldist [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
| OPTIONS | . |
|---|---|
| -detail | Instead of a summary, report relative distance for each region in A. |
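
The formula `reldist = min(d1,d2)/r` can be written out as a small function. Reducing each interval to a single position (for example its midpoint) is an assumption made here for illustration, and the function and parameter names are hypothetical; this is a sketch of the formula, not of bedtools' implementation.

```python
def relative_distance(position, left_neighbor, right_neighbor):
    """min(d1, d2) / r for a position lying between two neighbors."""
    d1 = position - left_neighbor          # distance to the left neighbor
    d2 = right_neighbor - position         # distance to the right neighbor
    r = right_neighbor - left_neighbor     # distance between the neighbors
    return min(d1, d2) / r


print(relative_distance(15, 10, 50))  # 0.125
print(relative_distance(30, 10, 50))  # 0.5
```

The value ranges from 0 (sitting on a neighbor) to 0.5 (exactly halfway between the two).
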
Randomly redistribute intervals in a genome. (Docs)
$ bedtools shuffle [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
| OPTIONS | . |
|---|---|
| -excl | BED file with regions into which features won't be shuffled. |
| -incl | BED file with regions into which features will be shuffled. |
| -chrom | Keep features on the same chromosome. |
| -chromFirst | Distribute features ~uniformly across chroms, not across total sequence. |
| -noOverlapping | Don't allow shuffled intervals to overlap. |
Annotate coverage of features from multiple files. (Docs)
$ bedtools annotate -i variants.bed -files genes.bed conserve.bed known_var.bed
chr1 100 200 nasty 1 - 0.500000 1.000000 0.300000
chr2 500 1000 ugly 2 + 0.000000 0.600000 1.000000
$ bedtools annotate [OPTIONS] -i <BED/GFF/VCF> -files FILE1 FILE2 FILE3 ... FILEn
| OPTIONS | . |
|---|---|
| -counts | Report count of features that overlap -i in each file. Default: report fraction of -i covered by each file. |
| -both | Report counts & fractions for each file. |
| common | strandedness: -s, -S |
Compute the coverage over defined intervals. (Docs)
BED FILE A ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓
BED File B ████ ████ ██ █████████
████████
Result [ N=3, 10/15 ] [ N=1, 2/15 ] [N=1,6/6]
$ bedtools coverage [OPTIONS] -a <BAM/BED/GFF/VCF> -b <FILE1, FILE2, ..., FILEN>
| OPTIONS | . |
|---|---|
| -d | Report the depth at each position in each A feature. |
| common | strandedness: -s, -S; overlap: -f, -F; overlap mode: -r, -e; bam/bed12: -abam, -split |
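
The per-interval result in the diagram (`N=3, 10/15`) combines two numbers: how many B features overlap the A interval, and how many of A's bases are covered by at least one of them. A minimal plain-Python sketch follows; the coordinates are made up to reproduce that first result, the function name `coverage` is hypothetical, and this is not bedtools' implementation.

```python
def coverage(a, b_intervals):
    """Return (overlap count, covered bp, length of A, covered fraction)."""
    a_start, a_end = a
    # Keep only overlapping B intervals, clipped to the bounds of A.
    clipped = sorted(
        (max(a_start, s), min(a_end, e))
        for s, e in b_intervals
        if s < a_end and a_start < e
    )
    covered = 0
    cursor = a_start  # right-most base already counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    length = a_end - a_start
    return len(clipped), covered, length, covered / length


count, covered, length, fraction = coverage((0, 15), [(0, 4), (6, 10), (8, 12), (40, 50)])
print(count, covered, length)  # 3 10 15
```
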