Please comment below if you have an update, e.g., with another networking-related dataset.
- Finding datasets: Google Dataset Search: https://datasetsearch.research.google.com/
- Mendeley Data: https://data.mendeley.com/datasets
- Kaggle Data: https://www.kaggle.com/datasets
- Google's M-Lab networking performance data sets: https://www.measurementlab.net/data/
- SNDZoo: https://sndzoo.github.io/
- Datasets from the University of Catalunya: http://knowledgedefinednetworking.org/
- Traffic volume over time (at minute, hour, or day granularity) from a private ISP, circa 2005. Contains only timestamps and traffic size, with no source/destination or other information: https://datamarket.com/data/list/?q=cat:ecd%20provider:tsdl
- Facebook traffic traces (access via FB group): https://research.fb.com/data-sharing-on-traffic-pattern-inside-facebooks-datacenter-network/
- Abilene (North American backbone network): data from 24 weeks of 5-minute averages (2004) for 12 routers (12x12 traffic matrices), ~160 MB: http://www.maths.adelaide.edu.au/matthew.roughan/project/traffic_matrix/
- SNDlib: Library with real-world network topologies (most popular: Abilene) and sometimes traffic/service demands: http://sndlib.zib.de/
  - Under "Library > Dynamic traffic", there are realistic traffic traces that change over time. Examples are a traffic matrix every 5 minutes over 6 months for the Abilene network, and one every 15 minutes for the larger GÉANT network.
- UMass Trace Repository
- CRAWDAD: real-world wireless data (124 datasets). Downloads are available only to registered users (registration is free): https://crawdad.org/about.html
- Internet Traffic Archive: useful real-world traces: http://ita.ee.lbl.gov/html/traces.html
- MAWI Working Group Traffic Archive: packet traces from the WIDE backbone
- Google cluster traces: large traces from 12,000+ machines measured over one month, ~41 GB compressed (smaller traces are also available): https://github.com/google/cluster-data
- IP network traffic flows labeled with 87 different applications: https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps/home
- Summarized network traffic where computers are gradually compromised by a botnet: https://www.kaggle.com/crawford/computer-network-traffic/home
- CAIDA packet-level traffic traces from 2016: https://www.caida.org/data/passive/passive_2016_dataset.xml
- tcpreplay: simple traffic traces (smallFlows.pcap and bigFlows.pcap) for traffic generation: https://tcpreplay.appneta.com/wiki/captures.html
- TRex generates stateful and stateless traffic and provides a Python API: https://github.com/cisco-system-traffic-generator/trex-core
- Intel's traffic testbed for data plane development kit (DPDK) (using Trex): https://software.intel.com/en-us/articles/build-your-own-dpdk-traffic-generator
- MoonGen
- How to generate NetFlow data from PCAP traces (the answer lists several tools): https://stackoverflow.com/a/34792376/2745116
- SNDlib: http://sndlib.zib.de/home.action
- TopologyZoo: http://topology-zoo.org/
- Historical (hourly) ping measurements for a large-scale network: https://wondernetwork.com/pings
- Alibaba traces with batch workloads on thousands of machines (from 2017 and 2018): https://github.com/alibaba/clusterdata
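The NetFlow entry in the list links to external tools. To illustrate what such a conversion does, here is a minimal Python sketch that uses only the standard library. It groups the packets of a classic libpcap capture, such as the tcpreplay sample traces, into NetFlow-style unidirectional 5-tuple flows and counts packets and bytes for each flow.

It assumes an Ethernet link type and IPv4. It skips VLAN-tagged and IPv6 frames and does not read pcapng files. It also leaves out NetFlow's active/inactive timeouts and export format. The names `read_flows`, `FlowRecord`, and `flows.py` are hypothetical.

```python
import ipaddress
import struct
import sys
from dataclasses import dataclass
from typing import BinaryIO, Dict, Tuple

# Flow key: (src IP, dst IP, src port, dst port, IP protocol number).
# Hypothetical helper types for this sketch, not part of any NetFlow tool.
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    packets: int = 0
    octets: int = 0      # sum of IPv4 total-length fields (layer-3 bytes)
    first: float = 0.0   # timestamp of the earliest packet (seconds)
    last: float = 0.0    # timestamp of the latest packet (seconds)


def read_flows(stream: BinaryIO) -> Dict[FlowKey, FlowRecord]:
    """Aggregate IPv4 packets of a classic libpcap file (Ethernet link type)
    into NetFlow-style unidirectional 5-tuple flows (no timeouts)."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order and the timestamp resolution.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (0xA1B2C3D4, 0xA1B23C4D):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    ts_scale = 1e-6 if magic == 0xA1B2C3D4 else 1e-9  # usec vs. nsec files
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) is handled here")

    flows: Dict[FlowKey, FlowRecord] = {}
    while True:
        rec = stream.read(16)
        if len(rec) < 16:
            break  # end of file
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        frame = stream.read(incl_len)
        ts = ts_sec + ts_frac * ts_scale
        # Ethernet: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType (0x0800 = IPv4).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # skip non-IPv4 and VLAN-tagged frames
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4                      # IPv4 header length in bytes
        total_len = struct.unpack("!H", ip[2:4])[0]   # IPv4 total length
        proto = ip[9]
        src = str(ipaddress.IPv4Address(ip[12:16]))
        dst = str(ipaddress.IPv4Address(ip[16:20]))
        sport = dport = 0
        if proto in (6, 17) and len(ip) >= ihl + 4:   # TCP or UDP ports
            sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        key = (src, dst, sport, dport, proto)
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = FlowRecord(first=ts, last=ts)
        flow.packets += 1
        flow.octets += total_len
        flow.first = min(flow.first, ts)
        flow.last = max(flow.last, ts)
    return flows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python flows.py smallFlows.pcap  -> print the five largest flows
    with open(sys.argv[1], "rb") as f:
        result = read_flows(f)
    top = sorted(result.items(), key=lambda kv: kv[1].octets, reverse=True)[:5]
    for key, rec in top:
        print(key, rec.packets, rec.octets, round(rec.last - rec.first, 3))
```

Real NetFlow exporters also split long-lived flows with timeouts and track TCP flags. For anything beyond a quick look at a trace, the tools linked from the NetFlow entry are the better choice.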