```
from opensoundscape import Audio
Audio.from_file('/path/audio.wav')
```
Head to the Audio tutorial notebook
See also:
- docs for Audio class
```
from opensoundscape import Spectrogram, Audio
s = Spectrogram.from_audio(Audio.from_file('/path/audio.wav'))
s.plot()
```
Check out the Spectrogram tutorial notebook
See also:
- docs for Spectrogram class
Select your use case depending on where your pre-trained model will come from:
See the README of the bioacoustics model zoo for a tutorial.
In short:
List the models available in the GitHub repo bioacoustics-model-zoo:
```
import torch
torch.hub.list('kitzeslab/bioacoustics-model-zoo')
```
Get a ready-to-use model object by choosing one of the models listed by the previous command:
```
model = torch.hub.load('kitzeslab/bioacoustics-model-zoo','rana_sierrae_cnn')
```
`model` is an OpenSoundscape CNN object which you can use as normal. For instance, use the model to generate predictions on an audio file:
```
audio_file_path = './hydrophone_10s.wav'
scores = model.predict([audio_file_path], activation_layer='softmax')
scores
```
Select a use case from below:
- Consider how much automated detection and classification will help you versus how much effort and time it will take to develop: how much data do you have? What sort of information do you want from it? Currently, AI approaches for acoustic monitoring are useful for detecting specific sounds in large datasets, such as those recorded by ARUs. More complex tasks such as counting individual organisms, monitoring behavior, or recognizing individuals are, at best, very difficult for current AI methods.
- If you are trying to detect a relatively common bird species, check whether the [BirdNET](https://github.com/kahst/BirdNET-Analyzer) detector works well enough for your needs. Make sure to check false positive and false negative rates on your field data.
- If you are trying to detect bats, there may be an existing software tool that meets your needs. A quick Google search will get you started.
- Do you have (or can you acquire) many (i.e., tens to hundreds of) examples of the sound you are interested in? If not, consider signal processing approaches. In particular, if the sound you are interested in contains regular periodic structure in time, like the calls of many frogs, toads, and insects, check out [RIBBIT](https://github.com/kitzeslab/ribbit_manuscript_notebooks) and the [accelerating series detector](https://github.com/orgs/kitzeslab/repositories?type=all).
- If you do have labeled data, consider training a CNN using OpenSoundscape (see below).
Start with the notebook tutorial on preparing Raven annotations for training a CNN in OpenSoundscape, then proceed to CNN training below.
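The tutorials cover the full workflow; below is a minimal sketch of the training step, assuming you have already converted your Raven annotations into a clip-label table (for example, with the `BoxedAnnotations` class, as shown in the annotation tutorial). The class names, file paths, architecture, and hyperparameters here are placeholders, not recommendations.
```
# Minimal training sketch. The labels table format (rows indexed by
# file, start_time, end_time; one 0/1 column per class) matches what the
# annotation tutorial produces; everything specific here is a placeholder.
import pandas as pd
from opensoundscape import CNN

# hypothetical clip-level labels; in practice, build this table from your
# Raven selection tables as shown in the annotation tutorial
labels = pd.DataFrame(
    {"species_a": [1, 0], "species_b": [0, 1]},
    index=pd.MultiIndex.from_tuples(
        [("audio1.wav", 0.0, 3.0), ("audio2.wav", 0.0, 3.0)],
        names=["file", "start_time", "end_time"],
    ),
)

# split clips into training and validation sets however you prefer
train_df, validation_df = labels.iloc[:1], labels.iloc[1:]

# a CNN that classifies 3-second clips into the classes in the labels table
model = CNN("resnet18", classes=list(labels.columns), sample_duration=3.0)

# hyperparameters are placeholders; tune them for your dataset
model.train(
    train_df,
    validation_df,
    epochs=10,
    batch_size=64,
    num_workers=4,
    save_path="./model_training/",
)
```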
OpenSoundscape supports GPU and multi-GPU acceleration of deep learning models during training and inference (prediction). Here are some tips to get you started:
- The `nvidia-smi` command is usually the best way to monitor GPU usage and GPU memory usage.
  - Example where we check GPU usage once per second for 100 seconds: `timeout 100 nvidia-smi --query-gpu=timestamp,pci.bus_id,utilization.gpu,memory.used,memory.total --format=csv -l 1` (add `>> cuda_log.csv` at the end to log to a file instead of printing to your terminal)
- Install a version of PyTorch that is compatible with your CUDA version. This page will help you find the correct PyTorch version. Use `nvidia-smi` to check your CUDA version.
- Choose a `num_workers` argument >1 to parallelize preprocessing across CPUs when running `.train()` and `.predict()` (see the combined sketch after this list).
- for training, generally choose large batch sizes that are powers of 2 (at least 64, sometimes as large as 1024+) but if you get CUDA out-of-memory errors, lower your batch size
- The `CNN` class's `.device` attribute specifies where the network will run forward and backward passes.
  - OpenSoundscape will automatically try to find and use a CUDA device (`cuda:0`) by default.
  - You can specify a CUDA device by writing, for instance, `cnn.device='cuda:1'` where `cnn` is your `opensoundscape.CNN` object.
  - To parallelize over multiple GPU devices, wrap the CNN object's `.network` attribute (which is a PyTorch model object) in DataParallel like this: `model.network = torch.nn.DataParallel(model.network, device_ids=[0, 1]).cuda()`. The device IDs list should specify which CUDA devices to use. Use the command `nvidia-smi` to list all CUDA-compatible GPU devices on your machine. (See the combined sketch after this list.)
  - If you have an Apple Silicon (M1, M2, etc.) chip on your Mac laptop, you can use GPU acceleration by setting `cnn.device='mps'` where `cnn` is your `opensoundscape.CNN` object.
- Even when using GPUs, the preprocessing steps (loading audio, creating a spectrogram, converting to a tensor, etc.) happen on the CPU. This matters because preprocessing can become a speed bottleneck for the entire process. Typically you'll need 5-10 CPU tasks for each GPU to avoid a preprocessing bottleneck, but this depends on several things: the speed of the GPU, the size of the network (larger networks -> more work for the GPU), and the amount of preprocessing done on each sample (heavier preprocessing or larger audio samples -> more work for the CPUs).
- Even more so than preprocessing, I/O (reading data files like WAV audio files from a storage location) can limit the speed of training and prediction. Store your data on the fastest drive you can, and somewhere "close" to where your model is running. From fastest to slowest: internal NVMe drive > SSD > HDD > external HDD > internet connection. (Note that some advanced hard drive configurations allow you to read data in parallel from several drives). If your data is stored on HDD (a spinning disk drive) or on a networked device (accessed by your compute machine via an internet connection), I/O will severely bottleneck your training and prediction speeds.
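As a combined illustration of the tips above, here is a minimal sketch, assuming a machine with two CUDA GPUs; the model, audio file list, batch size, and worker count are all placeholders, and in practice you would use your own trained or downloaded CNN object.
```
# Minimal sketch combining the GPU tips above; all specifics are placeholders.
import torch
from opensoundscape import CNN

# placeholder model; substitute your own trained or downloaded CNN object
model = CNN("resnet18", classes=["species_a"], sample_duration=3.0)

# run on a specific GPU (use 'mps' instead on Apple Silicon Macs)
model.device = "cuda:0"

# optional: parallelize the network itself over two CUDA devices
model.network = torch.nn.DataParallel(model.network, device_ids=[0, 1]).cuda()

# use several CPU workers per GPU for preprocessing, and a large batch size
scores = model.predict(
    ["audio1.wav", "audio2.wav"],  # hypothetical list of audio files
    batch_size=256,
    num_workers=8,
)
```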
Look at the RIBBIT tutorial notebook. See also:
Work through an example of using signal processing to detect the accelerating wing drumming pattern of Ruffed Grouse
Ruffed grouse manuscript notebooks
See also: manuscript
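For orientation before opening the RIBBIT tutorial, here is a minimal sketch of scoring one file with the `ribbit` function. The frequency band, pulse rate range, and clip settings are placeholders for a hypothetical call type, and the argument names follow recent OpenSoundscape versions; check the tutorial for the details of your version.
```
# Minimal RIBBIT sketch; band, pulse rate, and paths are placeholders.
from opensoundscape import Audio, Spectrogram
from opensoundscape.ribbit import ribbit

spec = Spectrogram.from_audio(Audio.from_file('/path/audio.wav'))

scores = ribbit(
    spec,
    signal_band=[1000, 2000],    # frequency band containing the call (Hz)
    pulse_rate_range=[10, 20],   # expected pulse repetition rate (pulses/sec)
    clip_duration=5,             # score the recording in 5-second clips
    noise_bands=[[0, 200]],      # band(s) used to penalize broadband noise
    plot=False,
)
scores  # per-clip scores; higher values suggest the periodic signal is present
```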
Acoustic localization methods are available in OpenSoundscape as a "beta" feature, meaning that they are in active development and will continue to improve as we refine our software tools.
Head to the Localization tutorial notebook to see the tools in action.