suyashcjoshi · December 7, 2022 20:07
diff --git a/gistfile1.txt b/gistfile1.txt
 Here is a list of various audio datasets; most of them are free to use under their respective licenses.

 1. AudioSet by Google Research : 10-second sound clips drawn from 2,084,320 YouTube videos and annotated with 527 labels. The clips cover a wide variety of everyday sounds.

   License : The dataset is under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, while the ontology is available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

   Link: https://research.google.com/audioset/index.html
   
 2. MAPS Database : A piano database for multipitch estimation and automatic transcription of music. 31 GB of CD-quality recordings in .wav format.

   License: Creative Commons Attribution-NonCommercial-ShareAlike 2.0 France (CC BY-NC-SA 2.0 FR)
   
   Link: https://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/
   
   
 3. Free Music Archive (FMA) Dataset : Full-length, high-quality audio for 106,574 tracks from 16,341 artists and 14,854 albums. It is arranged in a hierarchical taxonomy of 161 genres.

   License: Creative Commons Attribution 4.0 International License (CC BY 4.0)
   
   Link: https://github.com/mdeff/fma
   
 4. Freesound : It has more than 445k sounds contributed by its users. Checking it out and contributing your own sounds is highly encouraged.

   License: Varies per sound. Different Creative Commons licenses are used, such as Zero (CC0), Attribution (BY), and Attribution-NonCommercial (BY-NC).

   Link: https://freesound.org
   
 5. Urban Sound Dataset : It contains 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes.

   License: Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)
   
   Link : https://urbansounddataset.weebly.com
   
 6. The NSynth Dataset : The NSynth Dataset is an audio dataset containing ~300k musical notes, each with a unique pitch, timbre, and envelope. Each note is annotated with three additional pieces of information based on a combination of human evaluation and heuristic algorithms: Source, Family, and Qualities.

   License : Creative Commons Attribution 4.0 International (CC BY 4.0) license
   
   Link : https://magenta.tensorflow.org/datasets/nsynth
   
 7. Groove MIDI DataSet : The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming captured on a Roland TD-11 V-Drum electronic drum kit.

   License :  Creative Commons Attribution 4.0 International (CC BY 4.0) License
   
   Link : https://magenta.tensorflow.org/datasets/groove
   
 8. Common Voice by Mozilla : A multi-language dataset of various voices in .mp3 format. It is already used to train many speech recognition systems.

   License : CC-0

   Link: https://commonvoice.mozilla.org/en/datasets
   
 9. TED-LIUM : The TED-LIUM corpus consists of English-language TED talks with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech.
   
   License :  Creative Commons BY-NC-ND 3.0 
   
   Link : https://www.tensorflow.org/datasets/catalog/tedlium
   
 10. CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) : It is a data set of 7,442 original clips from 91 actors. In the clips, the actors speak from a selection of 12 sentences using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) and four different emotion levels (Low, Medium, High, and Unspecified).
   
   License :  Open Database License
   
   Link : https://github.com/CheyneyComputerScience/CREMA-D
   
 Apart from the above, if you have video recordings and want to export the audio from them, use the powerful open source tool 'ffmpeg' : https://stackoverflow.com/questions/9913032/how-can-i-extract-audio-from-video-with-ffmpeg
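
 As a rough illustration of the audio export step, here is a minimal Python sketch that calls ffmpeg through `subprocess`. The helper names (`build_ffmpeg_command`, `extract_audio`), the file names, and the choice of 16 kHz mono 16-bit WAV output are assumptions made for this example and are not taken from the linked answer; adjust them to whatever your project needs. It also assumes ffmpeg is installed and available on your PATH.

```python
import shutil
import subprocess


def build_ffmpeg_command(video_path, audio_path, sample_rate=16000, channels=1):
    """Return the ffmpeg argument list that writes the audio track as a WAV file.

    Hypothetical helper: the defaults (16 kHz, mono) are illustrative only.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", video_path,          # input video file
        "-vn",                     # ignore the video stream
        "-acodec", "pcm_s16le",    # encode audio as 16-bit little-endian PCM
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", str(channels),      # number of output audio channels
        audio_path,
    ]


def extract_audio(video_path, audio_path, **options):
    """Run ffmpeg to export the audio of video_path to audio_path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, audio_path, **options), check=True)


# Example call (hypothetical file names):
# extract_audio("recording.mp4", "recording.wav")
```

 Keeping the command construction separate from the call that runs it makes it easy to print and check the exact ffmpeg invocation before processing a large batch of recordings.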