Object detection models trained on very few classes (<5) with no background images tend to struggle with false positives. Such a model only learns what it needs to detect, and that is only half the problem.
The other half is that the model needs to learn what not to detect. Here's where background images come in.
The easiest way to get some background images is to record a video with a smartphone and chop it into individual frames.
I recommend using a phone that can record 1080p @ 60fps. The higher frame rate caps each frame's exposure time, so when it comes time to break the video into frames, the images have a lower chance of being blurry. To further reduce the chance of blurry images, move the camera around slowly.
Now just walk around with your camera recording. Make sure none of the objects in your list of classes appear in the recording: an unlabeled instance of a class in a background image teaches the model to ignore that class, which will hinder it from learning well.
- Install ffmpeg. Note that ffmpeg is a system binary, not a Python package, so install it with your OS package manager. On Ubuntu/Debian:
sudo apt install ffmpeg
- Split the video into frames. Make sure to specify the number of fps. If you don't, you will get every frame, which would mean lots of images that are almost exactly the same.
ffmpeg -i video_file_name.mp4 -vf fps=1 output_folder/frame%06d.png
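If you recorded several clips, the same command can be scripted per video. A minimal sketch, assuming ffmpeg is on your PATH; the clip names and the frames/ output folder here are hypothetical:

```python
import subprocess
from pathlib import Path

def ffmpeg_split_cmd(video, out_dir, fps=1):
    """Build the ffmpeg argument list that extracts `fps` frames per second as PNGs."""
    return ["ffmpeg", "-i", str(video),
            "-vf", f"fps={fps}",
            str(Path(out_dir) / "frame%06d.png")]

# One command per clip, each writing into its own frame folder.
clips = ["walk_1.mp4", "walk_2.mp4"]
cmds = [ffmpeg_split_cmd(c, f"frames/{Path(c).stem}") for c in clips]
# To actually run them (after creating the output folders):
# for cmd in cmds:
#     Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
#     subprocess.run(cmd, check=True)
```

Keeping the extraction as a small script makes it easy to re-run with a different fps if the frames come out too similar.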
I am trying to train a 320x320 model. To add some variability to my dataset I want the background images to have different dimensions. I'm shooting for an even split of 640x640, 320x320, and 256x256 images. You can modify the sizes for your model accordingly.
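The even three-way split is just a bit of arithmetic. A sketch of how to divide a frame count across the size buckets (the total of 1000 frames is only an example):

```python
def split_counts(total, buckets=3):
    """Divide `total` images as evenly as possible across `buckets` size groups."""
    base, rem = divmod(total, buckets)
    # The first `rem` buckets absorb the remainder, one image each.
    return [base + (1 if i < rem else 0) for i in range(buckets)]

sizes = [640, 320, 256]
counts = dict(zip(sizes, split_counts(1000)))
print(counts)  # {640: 334, 320: 333, 256: 333}
```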
- Clone this tool from GitHub (and thank the creator), then install the requirements.
git clone https://github.com/dvschultz/dataset-tools.git
cd dataset-tools
pip install -r requirements.txt
- Randomly crop regions from each image
python3 multicrop.py --input_folder /home/isaac/Desktop/background_dataset/working/bg_3/ --output_folder /home/isaac/Desktop/background_dataset/working/640_3 --min_size 640 --how_many 3
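That covers the 640 bucket; the same command can be repeated for the other two sizes. A sketch, assuming multicrop.py accepts identical flags for every --min_size value (the frames/ input and crops_* output folder names are placeholders for your own paths):

```python
import subprocess

SIZES = [640, 320, 256]  # target crop sizes from above

def multicrop_cmd(input_folder, output_folder, min_size, how_many=3):
    """Build one multicrop.py invocation, mirroring the flags used above."""
    return ["python3", "multicrop.py",
            "--input_folder", input_folder,
            "--output_folder", output_folder,
            "--min_size", str(min_size),
            "--how_many", str(how_many)]

cmds = [multicrop_cmd("frames/", f"crops_{s}", s) for s in SIZES]
# To actually run them from inside the dataset-tools checkout:
# for cmd in cmds:
#     subprocess.run(cmd, check=True)
```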