Route collector projects (RIPE RIS, Routeviews, ...) collect BGP data in MRT files, a binary file format that you can only process with specific programs. In this article, we describe an alternative way to work with this data.
When working with this data, analysts often parse the raw MRT data for each lookup they want to do: parse -> grep -> process. An alternative is to insert the data into a database and query that.
In this article, we show the basics of how you can index MRT data in ClickHouse and analyse it. This still requires infrastructure (a ClickHouse server), but we want to show it because it is flexible and may fit into your analysis stack. If you are more used to a data science approach, stay tuned: we plan to have a file format for that type of analysis ready at the RIPE meeting.
Indexing the data in a database makes querying more ergonomic. Some people do this already (e.g. https://anuragbhatia.com/post/2025/02/analysing-transit-free-networks/), but there have been few write-ups of how to do it. Below, you can see the CLI commands for working with a dump file. After that, we sketch the outline of how to get the data into ClickHouse and how to query it.
"Parse and Grep": When you know what RIB/bview file you need to search in, or you are investigating the updates for a short time period, it is very fast to parse the files and filter the lines that you need. The example below uses bgpreader (from bgpstream). But you can also use monocle (from bgpkit).
$ bgpreader -m -d singlefile -o rib-file=bview.20250902.1600.gz | grep '193.0.14.0/24'
WARN: No time window specified, defaulting to all available data
TABLE_DUMP2|1756828800|B|80.77.16.114|34549|193.0.14.0/24|34549 25152|IGP|80.77.16.114|0|0|34549:200 34549:10000|NAG||
TABLE_DUMP2|1756828800|B|165.16.221.66|37721|193.0.14.0/24|37721 25152|IGP|165.16.221.66|0|0|37721:2000 37721:6002 37721:6003 37721:6004 37721:11000 37721:11100 37721:11101|NAG||
TABLE_DUMP2|1756828800|B|185.210.224.254|49432|193.0.14.0/24|49432 25152|IGP|185.210.224.254|0|1000||NAG||
TABLE_DUMP2|1756828800|B|193.33.94.251|58057|193.0.14.0/24|58057 1836 25152|IGP|193.33.94.251|0|0|1836:20000 1836:110 1836:3000 1836:3020 58057:65010 25152:1016|NAG||
...
- Run a ClickHouse database and have the CLI working (out of scope here); see compose.yml for a compose file
- Download MRT RIB dumps (or updates)
- Create the table
- Import
- Query ✨
You can download the MRT archives from https://data.ris.ripe.net/. Each collector has a directory (e.g. https://data.ris.ripe.net/rrc00/) containing a directory per month. For example, one link for the rrc00 multi-hop collector is https://data.ris.ripe.net/rrc00/2025.09/bview.20250905.0000.gz.
When indexing into a database, I recommend starting with a small number of dumps.
There are also tools that automate the download, e.g. mrt-downloader:
$ uvx mrt-downloader ~/tmp/mrt/bviews \
2025-09-02T16:00 \
2025-09-02T16:00:01 \
--rib-only \
--project ris --project routeviews \
--create-target
Downloading updates from 2025-09-02 16:00:00 to 2025-09-02 16:00:01 to /home/user/tmp/bviews
Partitioning directories by collector and month
Gathering index for 77 collectors: RRC00, RRC01, RRC02, RRC03, RRC04, RRC05, RRC06, RRC07, RRC08, RRC09, RRC10, RRC11, RRC12, RRC13, RRC14, RRC15, RRC16, RRC18, RRC19, RRC20, ...
This downloads all files in the time range (the last second is exclusive). If you only want a specific collector, use --collector, for example --collector rrc13 for MSK-IX.
The tool that we will use here (monocle) has a pipe-separated output format. The table format may vary depending on what you want to do. For this example I use:
CREATE TABLE rib
(
`operation` String,
`timestamp` DateTime,
`peer_ip` String,
`peer_as` String,
`prefix` String,
`as_path` String,
`source` String,
`next_hop` String,
`_` String,
`__` String,
`communities` String,
`___` String,
`____` String,
`_____` String,
`______` String
)
ENGINE = MergeTree
ORDER BY (prefix, peer_ip, peer_as)

This is the simplest possible schema. You can use a more complicated schema, which may reduce storage, make querying easier, etc. At the end of this article, I show an improved schema. But for now, let's keep it simple :).
Importing data is "ugly" in this case:
- Use monocle (installation) to parse the file
- Connect to ClickHouse, set the CSV separator, and load data from STDIN.
This command assumes you are in the directory where you downloaded the RIB files. It parses each of them, configures ClickHouse to expect a | as the CSV delimiter, and pipes the data into ClickHouse:
for bview in $(find . -name '*.gz'); do
monocle parse "$bview" | clickhouse-client --host=localhost --database=mrt --user=default --password=clickhouse -q "SET format_csv_delimiter = '|'; INSERT INTO rib FORMAT CSV"
done
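After the import finishes, a quick sanity check is useful (a minimal sketch against the table defined above):
SELECT count(), min(timestamp), max(timestamp)
FROM rib
This should report the total number of rows and the time range of the dumps you imported.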
Routes for a specific prefix:
SELECT *
FROM rib
WHERE prefix = '193.0.14.0/24'
LIMIT 2
Query id: 7ea7a828-987b-496e-8fe4-d886177ca741
┌─operation─┬───────────timestamp─┬─peer_ip────────┬─origin_as─┬─prefix────────┬─as_path─────────────────────────────┬─source─┬─next_hop───────┬─_─┬─__─┬─communities──────────────────────────────────────────────────────────────────────────────────────────────┬─___───┬─____─┬─_____─┬─______─┐
1. │ A │ 2025-09-01 08:00:00 │ 86.104.125.108 │ 33823 │ 193.0.14.0/24 │ 33823 25152 │ IGP │ 86.104.125.150 │ 0 │ 0 │ 25152:1074 39107:25152 39107:650 39107:200 39107:500 25152:1074:39107 39107:39107:25152 │ false │ │ │ │
2. │ A │ 2025-09-01 08:00:00 │ 80.81.196.156 │ 34927 │ 193.0.14.0/24 │ 34927 59900 12297 201986 8932 25152 │ IGP │ 80.81.196.156 │ 0 │ 0 │ 34927:130 34927:163 25152:1070:57463 57463:64700:25152 65000:0:100 65000:0:200 65000:0:202 65000:0:12297 │ false │ │ │ │
└───────────┴─────────────────────┴────────────────┴───────────┴───────────────┴─────────────────────────────────────┴────────┴────────────────┴───┴────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────┴──────┴───────┴────────┘
2 rows in set. Elapsed: 0.015 sec. Processed 270.27 thousand rows, 42.65 MB (18.16 million rows/s., 2.87 GB/s.)
Peak memory usage: 72.72 MiB.
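As a variation on the same lookup, you can count how many peers see this prefix (a sketch using the same table):
SELECT countDistinct(peer_ip) AS peers
FROM rib
WHERE prefix = '193.0.14.0/24'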
Prefixes transiting an ASN:
SELECT
countDistinct(prefix),
multiIf(position(prefix, ':') > 0, 'ipv6', 'ipv4') AS afi
FROM rib
WHERE position(as_path, ' 1299 ') > 0
GROUP BY ALL
Query id: 5661cc1d-189d-4038-b536-d691ff0854e5
┌─countDistinct(prefix)─┬─afi──┐
1. │ 211018 │ ipv6 │
2. │ 965376 │ ipv4 │
└───────────────────────┴──────┘
2 rows in set. Elapsed: 1.022 sec. Processed 443.89 million rows, 24.11 GB (434.54 million rows/s., 23.60 GB/s.)
Peak memory usage: 2.62 GiB.
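Along the same lines, here is a sketch for prefixes originated by an ASN (using AS25152 from the earlier example). With this simple schema the origin is the last ASN in the as_path string, so we match on the suffix; note that paths ending in an AS-set will not match:
SELECT countDistinct(prefix)
FROM rib
WHERE endsWith(as_path, ' 25152') OR as_path = '25152'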
Et cetera
In this article I showed how you can import BGP dump files into ClickHouse. It is relatively simple and it works. Queries are very fast when they match the order in which the data is stored. There is also room for improvement: ClickHouse means running infrastructure, while the same queries could be possible on pre-processed files. And, because of the way the data is imported, it has little structure: AS paths are a single string, which is both easy (you can run a regular expression over them) and limiting (you cannot easily address individual hops).
In the future we will show more methods you can use to look at this data, including a workflow where we use joins to easily calculate statistics for multiple networks.
Earlier on I mentioned a number of possible improvements to the schema. Some are listed below; the ones marked "applied below" are included in the updated schema and commands. Be careful with every change you make: they only help if the database or your queries actually use them.
- An enum column for operation: operation Enum8('A' = 1, 'W' = 2) (applied below)
- LowCardinality columns, for dictionary encoding of columns with a low number of unique values (e.g. peer_ip) (applied below)
- Add secondary indices, e.g. INDEX prefixes prefix TYPE set(100) GRANULARITY 2
- Order by truncated time to optimise queries: ORDER BY (toStartOfFifteenMinutes(timestamp), prefix, peer_ip) (applied below, with a slightly different key)
- Partition the table: PARTITION BY toStartOfDay(timestamp) (applied below)
CREATE TABLE rib
(
`operation` Enum8('A' = 1, 'W' = 2),
`timestamp` DateTime,
`peer_ip` LowCardinality(String),
`peer_as` LowCardinality(String),
`prefix` String,
`as_path` Array(String), -- recall: can contain AS-sets
`source` LowCardinality(String),
`next_hop` String,
`_` String,
`__` String,
`communities` Array(String),
`___` String,
`____` String,
`_____` String,
`______` String
)
ENGINE = MergeTree
ORDER BY (toStartOfFifteenMinutes(timestamp), as_path[-1], prefix)
PARTITION BY toStartOfDay(timestamp)

The import command then splits the AS path and the communities on the fly while inserting:

for bview in $(find . -name '*.gz'); do
monocle parse "$bview" | clickhouse-client --host=localhost --database=mrt --user=default --password=clickhouse -q "
SET format_csv_delimiter = '|'; INSERT INTO rib
SELECT
operation, timestamp, peer_ip, peer_as, prefix,
splitByChar(' ', as_path_raw) as as_path,
source, next_hop, col9, col10,
splitByChar(' ', raw_communities) as communities,
col12, col13, col14, col15
FROM input('
operation String, timestamp DateTime, peer_ip String,
peer_as String, prefix String,
as_path_raw String,
source String, next_hop String, col9 String, col10 String,
raw_communities String, col12 String, col13 String, col14 String, col15 String
') FORMAT CSV"
done
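With as_path stored as an array, path queries no longer need string matching. As a sketch, here is the earlier transit query and an origin lookup against the improved schema (ClickHouse arrays support negative indexing, so as_path[-1] is the origin, and it is part of the ORDER BY key above):
-- prefixes with AS1299 anywhere in the path (unlike the earlier
-- ' 1299 ' string match, this also includes peer and origin positions)
SELECT countDistinct(prefix)
FROM rib
WHERE has(as_path, '1299')

-- prefixes originated by AS25152
SELECT countDistinct(prefix)
FROM rib
WHERE as_path[-1] = '25152'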