For object detection models with very few classes (<5) and no background images, most models will struggle with false positives. The model is only learning what the thing is that it needs to detect. This is only half the problem.
The other half is that the model needs to learn what not to detect. Heres where background images come in.
The easiest way to get some background images is to take a video with a smart phone and chop it into individual frames.