@Chris-hughes10 do I change anything on the code aside from the num_classes if I am using multi-class detection?
For example, I saw here that you have labels being 1:
@typedispatch
def predict(self, images: List):
    """
    For making predictions from images
    Args:
        images: a list of PIL images
    Returns: a tuple of lists containing bboxes, predicted_class_labels, predicted_class_confidences
    """
    image_sizes = [(image.size[1], image.size[0]) for image in images]
    images_tensor = torch.stack(
        [
            self.inference_tfms(
                image=np.array(image, dtype=np.float32),
                labels=np.ones(1),
                bboxes=np.array([[0, 0, 1, 1]]),
            )["image"]
            for image in images
        ]
    )
    return self._run_inference(images_tensor, image_sizes)
I am getting something like this
warnings.warn("Zero area box skipped: {}.".format(box_part))
/home/user/anaconda3/envs/yolov5/lib/python3.7/site-packages/ensemble_boxes/ensemble_boxes_wbf.py:88: UserWarning: Zero area box skipped: [0.6941577792167664, 1.0, 1.0, 1.0].
warnings.warn("Zero area box skipped: {}.".format(box_part))
/home/user/anaconda3/envs/yolov5/lib/python3.7/site-packages/ensemble_boxes/ensemble_boxes_wbf.py:88: UserWarning: Zero area box skipped: [0.8687818050384521, 1.0, 1.0, 1.0].
warnings.warn("Zero area box skipped: {}.".format(box_part))
/home/user/anaconda3/envs/yolov5/lib/python3.7/site-packages/ensemble_boxes/ensemble_boxes_wbf.py:88: UserWarning: Zero area box skipped: [0.9967997074127197, 1.0, 1.0, 1.0].
warnings.warn("Zero area box skipped: {}.".format(box_part))
/home/user/anaconda3/envs/yolov5/lib/python3.7/site-packages/ensemble_boxes/ensemble_boxes_wbf.py:88: UserWarning: Zero area box skipped: [0.9494722485542297, 0.0, 1.0, 0.0].
warnings.warn("Zero area box skipped: {}.".format(box_part))
/home/user/anaconda3/envs/yolov5/lib/python3.7/site-packages/ensemble_boxes/ensemble_boxes_wbf.py:88: UserWarning: Zero area box skipped: [0.9957159161567688, 1.0, 1.0, 1.0].
Hi @Chris-hughes10, you didn't use images without cars to train the model, right? How could we take them into account?
Do you know if it can handle empty images?
Hi @mikel-brostrom @lhkhiem28 ,
I believe that the way to handle this is by passing in an arbitrary bbox and using the class -1
(background), although this isn't something I've tried. If that doesn't work, I would try asking over at https://github.com/rwightman/efficientdet-pytorch
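A minimal sketch of what that could look like in a dataset adaptor (the helper names are hypothetical and the dummy-box/-1 convention is untested here):

import numpy as np

def get_image_and_labels_by_idx(self, index):
    image = self.load_image(index)            # hypothetical helper
    bboxes = self.load_pascal_bboxes(index)   # hypothetical helper, xyxy boxes
    labels = self.load_class_labels(index)    # hypothetical helper

    if len(bboxes) == 0:
        # No objects in this image: pass an arbitrary box with class -1 (background).
        bboxes = np.array([[0, 0, 1, 1]], dtype=np.float32)
        labels = np.array([-1])

    return image, bboxes, labels, index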
Hi @Chris-hughes10, what EfficientDet configuration was used to train this model? I don't see which EfficientDet configuration is used here.
Hi @Chris-hughes10, I want to use the efficientdet-D0 configuration to train on my custom dataset. Is that possible with this code?
Hi @bmblr497
I used a custom config here, using EfficientNetV2 as a backbone. To use a different config, you would have to pass in a different argument for architecture into the create model function. So for D0, just pass in efficientdet_d0.
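For instance, something along these lines (a sketch, assuming the model wrapper exposes a model_architecture argument as in the code further down this thread):

# Sketch only: 'efficientdet_d0' is one of the configs already registered in
# effdet's efficientdet_model_param_dict, so no custom backbone setup is needed.
model = EfficientDetModel(
    num_classes=1,
    img_size=512,
    model_architecture="efficientdet_d0",
)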
Hi @Chris-hughes10, I'm struggling to use the backbones from
timm.list_models('tf_efficientnetv2_*')
. When I use one of them, my results are really poor, while when I'm using a model from efficientdet_model_param_dict
my results are very good out of the box.
Hi @arekmula ,
Using the efficientnetv2 architecture was just for a bit of fun and to demonstrate how it could be done. Overall, the architecture is optimised for the original backbones, so I'm not surprised that they perform better.
@Chris-hughes10 do I change anything on the code aside from the num_classes if I am using multi-class detection? For example, I saw here that you have labels being 1 in the predict function.
Hi @sarmientoj24 ,
You shouldn't have to change anything else, as long as your data adaptor returns the correct labels. The labels and bboxes used in the predict function are dummy values which are not really used, so it doesn't make a difference what you set them to. Check out: https://medium.com/data-science-at-microsoft/training-efficientdet-on-custom-data-with-pytorch-lightning-using-an-efficientnetv2-backbone-1cdf3bd7921f for more details on this.
As for your error message, without seeing what you were trying to modify/execute the stack trace is not very helpful to me!
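As a rough illustration (hypothetical names, not from the gist), for a three-class problem the data adaptor would simply return integer class ids, which effdet is assumed to expect starting at 1:

import numpy as np

# Hypothetical mapping for a 3-class problem; ids start at 1 because 0/-1
# are treated as background by effdet (an assumption worth double-checking).
CLASS_TO_ID = {"car": 1, "truck": 2, "bus": 3}

def class_labels_for(annotation_class_names):
    """Convert string class names from an annotation file into integer ids."""
    return np.array([CLASS_TO_ID[name] for name in annotation_class_names])

# e.g. class_labels_for(["car", "bus"]) -> array([1, 3]), used with num_classes=3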
@Chris-hughes10
Thank you. For some reason, whenever I add more augmentations, it doesn't learn at all. My augmentations include the following:
return A.Compose(
    [
        A.RandomBrightnessContrast(
            brightness_limit=0.15, contrast_limit=0.15, p=0.75
        ),
        A.OneOf(
            [
                A.Blur(blur_limit=5, p=1.0),
                A.MedianBlur(blur_limit=5, p=1.0),
            ],
            p=0.25,
        ),
        A.GaussNoise(var_limit=(0.0001, 0.005), per_channel=False, p=0.3),
        A.VerticalFlip(p=0.5),
        A.HorizontalFlip(p=0.5),
        A.RandomRotate90(p=1),
        A.Transpose(p=1),
        A.ShiftScaleRotate(
            shift_limit=0.0625, scale_limit=0.2, rotate_limit=45, p=1
        ),
        A.Sharpen(p=0.25),
        A.Resize(height=512, width=512, p=1),
        ToTensorV2(p=1),
    ],
    p=1.0,
    bbox_params=A.BboxParams(
        format="pascal_voc", min_area=0.1, min_visibility=0.1, label_fields=["labels"]
    ),
)
When I say it doesn't learn, I mean the mAP gets stuck at 0 or -1.
I've checked my dataset module and it seems to be working fine with the augmentations (i.e. the boxes are also modified when augmenting).
Here is my code
DatasetModule
class EfficientDetDataset(Dataset):
    def __init__(self, dataset_adaptor, transforms=get_valid_transforms(), test=True):
        self.ds = dataset_adaptor
        self.transforms = transforms
        self.test = test
        self.num_imgs = len(self.ds)

    def __getitem__(self, index):
        (
            image,
            pascal_bboxes,
            class_labels,
            image_id,
        ) = self.ds.get_image_and_labels_by_idx(index)

        orig_sample = {
            "image": image,
            "bboxes": pascal_bboxes,
            "labels": class_labels,
        }

        if not self.test:
            for i in range(10):
                sample = self.transforms(**orig_sample)
                if len(sample["bboxes"]) > 0:
                    sample["bboxes"] = np.array(sample["bboxes"])
                    image = sample["image"]
                    pascal_bboxes = sample["bboxes"]
                    labels = sample["labels"]
                    _, new_h, new_w = image.shape
                    sample["bboxes"][:, [0, 1, 2, 3]] = sample["bboxes"][
                        :, [1, 0, 3, 2]
                    ]  # convert to yxyx
                    target = {
                        "bboxes": torch.as_tensor(sample["bboxes"], dtype=torch.float32),
                        "labels": torch.as_tensor(labels),
                        "image_id": torch.tensor([image_id]),
                        "img_size": (new_h, new_w),
                        "img_scale": torch.tensor([1.0]),
                    }
        else:
            sample = self.transforms(**orig_sample)
            sample["bboxes"] = np.array(sample["bboxes"])
            image = sample["image"]
            pascal_bboxes = sample["bboxes"]
            labels = sample["labels"]
            _, new_h, new_w = image.shape
            sample["bboxes"][:, [0, 1, 2, 3]] = sample["bboxes"][
                :, [1, 0, 3, 2]
            ]  # convert to yxyx
            target = {
                "bboxes": torch.as_tensor(sample["bboxes"], dtype=torch.float32),
                "labels": torch.as_tensor(labels),
                "image_id": torch.tensor([image_id]),
                "img_size": (new_h, new_w),
                "img_scale": torch.tensor([1.0]),
            }

        return image, target, image_id

    def __len__(self):
        return len(self.ds)
class EfficientDetDataModule(LightningDataModule):
    def __init__(
        self,
        train_dataset_adaptor,
        validation_dataset_adaptor,
        train_transforms=get_train_transforms(target_img_size=512),
        valid_transforms=get_valid_transforms(target_img_size=512),
        num_workers=8,
        batch_size=4,
    ):
        self.train_ds = train_dataset_adaptor
        self.valid_ds = validation_dataset_adaptor
        self.train_tfms = train_transforms
        self.valid_tfms = valid_transforms
        self.num_workers = num_workers
        self.batch_size = batch_size
        super().__init__()

    def train_dataset(self) -> EfficientDetDataset:
        return EfficientDetDataset(
            dataset_adaptor=self.train_ds, transforms=self.train_tfms, test=False
        )

    def train_dataloader(self) -> DataLoader:
        train_dataset = self.train_dataset()
        train_loader = torch.utils.data.DataLoader(
            train_dataset,
            batch_size=self.batch_size,
            shuffle=True,
            pin_memory=True,
            drop_last=False,
            num_workers=self.num_workers,
            collate_fn=self.collate_fn,
        )
        return train_loader

    def val_dataset(self) -> EfficientDetDataset:
        return EfficientDetDataset(
            dataset_adaptor=self.valid_ds, transforms=self.valid_tfms, test=True
        )

    def val_dataloader(self) -> DataLoader:
        valid_dataset = self.val_dataset()
        valid_loader = torch.utils.data.DataLoader(
            valid_dataset,
            batch_size=self.batch_size,
            shuffle=False,
            pin_memory=True,
            drop_last=False,
            num_workers=self.num_workers,
            collate_fn=self.collate_fn,
        )
        return valid_loader

    @staticmethod
    def collate_fn(batch):
        images, targets, image_ids = tuple(zip(*batch))
        images = torch.stack(images)
        images = images.float()

        boxes = [target["bboxes"].float() for target in targets]
        labels = [target["labels"].float() for target in targets]
        img_size = torch.tensor([target["img_size"] for target in targets]).float()
        img_scale = torch.tensor([target["img_scale"] for target in targets]).float()

        annotations = {
            "bbox": boxes,
            "cls": labels,
            "img_size": img_size,
            "img_scale": img_scale,
        }

        return images, annotations, targets, image_ids
Model
class EfficientDetModel(LightningModule):
    def __init__(
        self,
        num_classes=2,
        img_size=512,
        prediction_confidence_threshold=0.1,
        learning_rate=1e-3,
        wbf_iou_threshold=0.4,
        inference_transforms=get_valid_transforms(target_img_size=512),
        model_architecture="tf_efficientnetv2_b0",
        val_imgs=None,
    ):
        super().__init__()
        self.img_size = img_size
        self.model = create_model(
            num_classes, img_size, architecture=model_architecture
        )
        self.prediction_confidence_threshold = prediction_confidence_threshold
        self.lr = learning_rate
        self.wbf_iou_threshold = wbf_iou_threshold
        self.inference_tfms = inference_transforms
        self.val_imgs = val_imgs

    def forward(self, images, targets):
        return self.model(images, targets)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=self.lr)

    def training_step(self, batch, batch_idx):
        images, annotations, _, image_ids = batch
        losses = self.model(images, annotations)

        logging_losses = {
            "class_loss": losses["class_loss"].detach(),
            "box_loss": losses["box_loss"].detach(),
        }

        self.log(
            "train_loss",
            losses["loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )
        self.log(
            "train_class_loss",
            losses["class_loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )
        self.log(
            "train_box_loss",
            losses["box_loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )

        return losses["loss"]

    @torch.no_grad()
    def validation_step(self, batch, batch_idx):
        images, annotations, targets, image_ids = batch
        outputs = self.model(images, annotations)

        detections = outputs["detections"]

        batch_predictions = {
            "predictions": detections,
            "targets": targets,
            "image_ids": image_ids,
        }

        logging_losses = {
            "class_loss": outputs["class_loss"].detach(),
            "box_loss": outputs["box_loss"].detach(),
        }

        self.log(
            "valid_loss",
            outputs["loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )
        self.log(
            "valid_class_loss",
            logging_losses["class_loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )
        self.log(
            "valid_box_loss",
            logging_losses["box_loss"],
            on_step=True,
            on_epoch=True,
            prog_bar=True,
            logger=True,
        )

        # if batch_idx == 0:
        #     images = []
        #     for i in range(2):
        #         original_image = batch_images[i].permute(1, 2, 0).detach().cpu()
        #         reconstructed_image = reconstructed_images[i].permute(1, 2, 0).detach().cpu()
        #         image = torch.cat((original_image, reconstructed_image), dim=1)
        #         images.append(image.numpy())
        #     self.logger.log_image(key="reconstructions", images=images)

        return {"loss": outputs["loss"], "batch_predictions": batch_predictions}

    @typedispatch
    def predict(self, images: List):
        """
        For making predictions from images
        Args:
            images: a list of PIL images
        Returns: a tuple of lists containing bboxes, predicted_class_labels, predicted_class_confidences
        """
        image_sizes = [(image.size[1], image.size[0]) for image in images]
        images_tensor = torch.stack(
            [
                self.inference_tfms(
                    image=np.array(image, dtype=np.float32),
                    labels=np.ones(1),
                    bboxes=np.array([[0, 0, 1, 1]]),
                )["image"]
                for image in images
            ]
        )
        return self._run_inference(images_tensor, image_sizes)

    def validation_epoch_end(self, outputs):
        """Compute and log training loss and accuracy at the epoch level."""
        validation_loss_mean = torch.stack(
            [output["loss"] for output in outputs]
        ).mean()

        (
            predicted_class_labels,
            image_ids,
            predicted_bboxes,
            predicted_class_confidences,
            targets,
        ) = self.aggregate_prediction_outputs(outputs)

        # print('######outputs########')
        # print(outputs)
        # print(predicted_class_labels)
        # print('######image_ids########')
        # print(image_ids)
        # print('######predicted_bboxes########')
        # print(predicted_bboxes)
        # print('######predicted_class_confidences########')
        # print(predicted_class_confidences)
        # print('######targets########')
        # print(targets)

        truth_image_ids = [target["image_id"].detach().item() for target in targets]
        truth_boxes = [
            target["bboxes"].detach()[:, [1, 0, 3, 2]].tolist() for target in targets
        ]  # convert to xyxy for evaluation
        truth_labels = [target["labels"].detach().tolist() for target in targets]

        if self.val_imgs:
            validation_imgs = []
            img_samples = min(len(image_ids), 8)
            for i in range(img_samples):
                img_id = image_ids[i]
                img_path = self.val_imgs[img_id]
                image = cv2.imread(img_path, cv2.IMREAD_COLOR)
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

                pred_bbox = predicted_bboxes[i]
                pred_class = predicted_class_labels[i]

                # Predictions
                for j in range(len(pred_bbox)):
                    pt1 = (int(pred_bbox[j][0]), int(pred_bbox[j][1]))
                    pt2 = (int(pred_bbox[j][2]), int(pred_bbox[j][3]))
                    color = PREDS_CLASS_COLORS[int(pred_class[j])]
                    cv2.rectangle(image, pt1, pt2, color, thickness=1)

                # Targets
                bbox_targets = truth_boxes[i]
                labels = truth_labels[i]
                for j in range(len(bbox_targets)):
                    pt1 = (int(bbox_targets[j][0]), int(bbox_targets[j][1]))
                    pt2 = (int(bbox_targets[j][2]), int(bbox_targets[j][3]))
                    color = LABELS_CLASS_COLORS[int(labels[j])]
                    cv2.rectangle(image, pt1, pt2, color, thickness=2)

                validation_imgs.append(image)

            self.logger.log_image(key="validation_samples", images=validation_imgs)

        stats = get_coco_stats(
            prediction_image_ids=image_ids,
            predicted_class_confidences=predicted_class_confidences,
            predicted_bboxes=predicted_bboxes,
            predicted_class_labels=predicted_class_labels,
            target_image_ids=truth_image_ids,
            target_bboxes=truth_boxes,
            target_class_labels=truth_labels,
        )["All"]

        self.log("val_loss", validation_loss_mean, on_epoch=True, logger=True)
        self.log("mAP_0_50", stats["AP_all_IOU_0_50"], on_epoch=True, logger=True)
        self.log("mAP_0_50_95", stats["AP_all"], on_epoch=True, logger=True)

        return {"val_loss": validation_loss_mean, "metrics": stats}

    @typedispatch
    def predict(self, images_tensor: torch.Tensor):
        """
        For making predictions from tensors returned from the model's dataloader
        Args:
            images_tensor: the images tensor returned from the dataloader
        Returns: a tuple of lists containing bboxes, predicted_class_labels, predicted_class_confidences
        """
        if images_tensor.ndim == 3:
            images_tensor = images_tensor.unsqueeze(0)
        if (
            images_tensor.shape[-1] != self.img_size
            or images_tensor.shape[-2] != self.img_size
        ):
            raise ValueError(
                f"Input tensors must be of shape (N, 3, {self.img_size}, {self.img_size})"
            )

        num_images = images_tensor.shape[0]
        image_sizes = [(self.img_size, self.img_size)] * num_images

        return self._run_inference(images_tensor, image_sizes)

    def aggregate_prediction_outputs(self, outputs):
        detections = torch.cat(
            [output["batch_predictions"]["predictions"] for output in outputs]
        )

        image_ids = []
        targets = []
        for output in outputs:
            batch_predictions = output["batch_predictions"]
            image_ids.extend(batch_predictions["image_ids"])
            targets.extend(batch_predictions["targets"])

        (
            predicted_bboxes,
            predicted_class_confidences,
            predicted_class_labels,
        ) = self.post_process_detections(detections)

        return (
            predicted_class_labels,
            image_ids,
            predicted_bboxes,
            predicted_class_confidences,
            targets,
        )

    def _run_inference(self, images_tensor, image_sizes):
        dummy_targets = self._create_dummy_inference_targets(
            num_images=images_tensor.shape[0]
        )

        detections = self.model(images_tensor.to(self.device), dummy_targets)[
            "detections"
        ]
        (
            predicted_bboxes,
            predicted_class_confidences,
            predicted_class_labels,
        ) = self.post_process_detections(detections)

        scaled_bboxes = self.__rescale_bboxes(
            predicted_bboxes=predicted_bboxes, image_sizes=image_sizes
        )

        return scaled_bboxes, predicted_class_labels, predicted_class_confidences

    def _create_dummy_inference_targets(self, num_images):
        dummy_targets = {
            "bbox": [
                torch.tensor([[0.0, 0.0, 0.0, 0.0]], device=self.device)
                for i in range(num_images)
            ],
            "cls": [torch.tensor([1.0], device=self.device) for i in range(num_images)],
            "img_size": torch.tensor(
                [(self.img_size, self.img_size)] * num_images, device=self.device
            ).float(),
            "img_scale": torch.ones(num_images, device=self.device).float(),
        }
        return dummy_targets

    def post_process_detections(self, detections):
        predictions = []
        for i in range(detections.shape[0]):
            predictions.append(
                self._postprocess_single_prediction_detections(detections[i])
            )

        predicted_bboxes, predicted_class_confidences, predicted_class_labels = run_wbf(
            predictions, image_size=self.img_size, iou_thr=self.wbf_iou_threshold
        )

        return predicted_bboxes, predicted_class_confidences, predicted_class_labels

    def _postprocess_single_prediction_detections(self, detections):
        boxes = detections.detach().cpu().numpy()[:, :4]
        scores = detections.detach().cpu().numpy()[:, 4]
        classes = detections.detach().cpu().numpy()[:, 5]
        indexes = np.where(scores > self.prediction_confidence_threshold)[0]
        boxes = boxes[indexes]

        return {"boxes": boxes, "scores": scores[indexes], "classes": classes[indexes]}
    def __rescale_bboxes(self, predicted_bboxes, image_sizes):
        scaled_bboxes = []
        for bboxes, img_dims in zip(predicted_bboxes, image_sizes):
            im_h, im_w = img_dims
            if len(bboxes) > 0:
                scaled_bboxes.append(
                    (
                        np.array(bboxes)
                        * [
                            im_w / self.img_size,
                            im_h / self.img_size,
                            im_w / self.img_size,
                            im_h / self.img_size,
                        ]
                    ).tolist()
                )
            else:
                scaled_bboxes.append(bboxes)

        return scaled_bboxes
@sarmientoj24 I think you are not normalizing the image. But yes, even I am facing issues while using augmentations.
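For reference, normalisation could be slotted into the training pipeline roughly like this (a sketch only; the mean/std below are the usual ImageNet values and assume pixel values in the 0-255 range, which matches Albumentations' default max_pixel_value):

import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

def get_train_transforms(target_img_size=512):
    # Same structure as the pipeline quoted above, with A.Normalize added before ToTensorV2.
    return A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.Resize(height=target_img_size, width=target_img_size, p=1),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
            ToTensorV2(p=1),
        ],
        p=1.0,
        bbox_params=A.BboxParams(
            format="pascal_voc", min_area=0, min_visibility=0, label_fields=["labels"]
        ),
    )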
@trancenoid
I was able to fix the augmentations issue by replacing this part
target = {
    "bboxes": torch.as_tensor(pascal_bboxes, dtype=torch.float32),
    ...
}
with this
target = {
    "bboxes": torch.as_tensor(sample["bboxes"], dtype=torch.float32),
    ...
}
@sarmientoj24 I cannot find where in the original code target = { "bboxes": torch.as_tensor(pascal_bboxes, dtype=torch.float32), ... } was being used.
@trancenoid it's there, in the EfficientDet data module.
Thank you for this implementation @Chris-hughes10.
Question: When we set pre_trained_backbone=True while initialising EfficientDet, are we just freezing all the layers in the architecture, or does the backend code in effdet also add some linear fully-connected layers on top of them after freezing?
Hi @thekaranacharya, we are not freezing anything at all, pre_trained_backbone=True
just loads the pretrained weights into the backbone. In the create_model
function, you can see that we are creating a new classification head and adding this onto the backbone.
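For context, the create_model function from the article looks roughly like this (written from memory, so treat it as an approximation of the gist code rather than a verbatim copy):

from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain
from effdet.efficientdet import HeadNet
from effdet.config.model_config import efficientdet_model_param_dict

def create_model(num_classes=1, image_size=512, architecture="tf_efficientnetv2_l"):
    # Register the timm backbone with effdet if it is not already a built-in config.
    if architecture not in efficientdet_model_param_dict:
        efficientdet_model_param_dict[architecture] = dict(
            name=architecture,
            backbone_name=architecture,
            backbone_args=dict(drop_path_rate=0.2),
            num_classes=num_classes,
            url="",
        )

    config = get_efficientdet_config(architecture)
    config.update({"num_classes": num_classes})
    config.update({"image_size": (image_size, image_size)})

    # pretrained_backbone=True only loads pretrained backbone weights; nothing is frozen.
    net = EfficientDet(config, pretrained_backbone=True)
    # A fresh classification head sized for num_classes replaces the default one.
    net.class_net = HeadNet(config, num_outputs=config.num_classes)
    return DetBenchTrain(net, config)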
Is this a typo in your dataset?
def __getitem__(self, index):
    pascal_bboxes = sample["bboxes"]
    sample["bboxes"][:, [0, 1, 2, 3]] = sample["bboxes"][
        :, [1, 0, 3, 2]
    ]  # convert to yxyx
    target = {
        "bboxes": torch.as_tensor(pascal_bboxes, dtype=torch.float32),
        "labels": torch.as_tensor(labels),
        "image_id": torch.tensor([image_id]),
        "img_size": (new_h, new_w),
        "img_scale": torch.tensor([1.0]),
    }
    return image, target, image_id
In __getitem__ you return pascal_bboxes, which are xyxy, yet for some reason you convert sample["bboxes"] after you defined pascal_bboxes. So which is it? Should we return boxes as yxyx or xyxy? Should it be
"bboxes": torch.as_tensor(pascal_bboxes, dtype=torch.float32),
or
"bboxes": torch.as_tensor(sample["bboxes"], dtype=torch.float32),
?
Hi @azkalot1, You are correct, that is a typo and it has now been updated. However, in my experiments it seems that even if you provide boxes as xyxy the model adapts to this quite quickly.
def __rescale_bboxes(self, predicted_bboxes, image_sizes):
    scaled_bboxes = []
    for bboxes, img_dims in zip(predicted_bboxes, image_sizes):
        im_h, im_w = img_dims
        if len(bboxes) > 0:
            scaled_bboxes.append(
                (
                    np.array(bboxes)
                    * [
                        im_w / self.img_size,
                        im_h / self.img_size,
                        im_w / self.img_size,
                        im_h / self.img_size,
                    ]
                ).tolist()
            )
        else:
            scaled_bboxes.append(bboxes)

    return scaled_bboxes
This is wrong. If you use PIL images, img.size will give you (w, h). Look at your predict function:
image_sizes = [(image.size[1], image.size[0]) for image in images]
so image sizes are (h, w), right? And later you swap h and w. This is why your bbox has a weird shape in the first predicted image.
So you should use this in predict
image_sizes = [(image.size[1], image.size[0]) for image in images]
However, in my experiments it seems that even if you provide boxes as xyxy the model adapts to this quite quickly.
Meaning your experiments are wrong. The output of the model is xyxy, yes, but the model expects yxyx as input and converts it to xyxy later.
@azkalot1 I don't understand your comment, the snippets you have posted are the same.
As per the experiments, I tried using both xyxy and yxyx as input to see how the model behaves, so again, I don't understand how the experiments are wrong.
Additionally, it has been over a year since I first wrote this code, so I may be rusty on the details without diving into it again.
You have swapped the height and width of the images, so your __rescale_bboxes produces wrong bboxes.
@azkalot1 How so? Image sizes returns (h, w) tuples, therefore the assignment:
im_h, im_w = img_dims
is correct, right? The snippet that you suggested adding to the predict function is the same as what is already there.
You expect your image_sizes to be a list of (im_h, im_w) tuples - however, in the predict
function, you generate them like this
image_sizes = [(image.size[0], image.size[1]) for image in images]
and use them like this
im_h, im_w = img_dims
in __rescale_bboxes
However, this gives you a tuple of (im_w, im_h), because PIL.Image.size is (width, height), not (height, width) as with an np.ndarray. So basically you rescale using the wrong dimensions.
Just look at the first example with the car image you generated using predict
- this explains why the bbox is stretched and not a square
@azkalot1 I believe that you have misread the code, image sizes are already generated using:
image_sizes = [(image.size[1], image.size[0]) for image in images]
as you can see:
so the assignment is correct. I believe that the box is stretched as the prediction is not 100% correct.
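A quick check of the ordering (illustrative only, not from the gist):

import numpy as np
from PIL import Image

image = Image.new("RGB", (640, 480))  # PIL takes (width, height)
print(image.size)                     # (640, 480) -> (w, h)
print(np.array(image).shape)          # (480, 640, 3) -> (h, w, c)

# So (image.size[1], image.size[0]) yields (h, w), matching im_h, im_w = img_dims.
image_sizes = [(image.size[1], image.size[0])]
print(image_sizes)                    # [(480, 640)]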
One of the best articles on Effdet and PytorchLightning!
I had a doubt about the data normalization part: in another fantastic notebook, https://www.kaggle.com/shonenkov/training-efficientdet/notebook by Shonenkov, he didn't do any normalization, only divided by 255.
So I was just confused whether Effdet normalizes automatically when the input is fed (or it might have in a previous version).
So are you sure that the normalization part with Albumentations is required?
Hi, thank you for this implementation @Chris-hughes10 .
I'm using your code for multi-class detection. My loss converges well, so I do not understand why my model predicts no bbox in every single image (whereas every image contains a bbox). Do you have an idea where this could come from?
Thanks,
Hi,
I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [41], in <cell line: 4>()
1 from pytorch_lightning import Trainer
2 trainer = Trainer(gpus=[0], max_epochs=5, num_sanity_val_steps=1)
----> 4 trainer.fit(model, dm)
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:553, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, train_dataloader)
547 self.data_connector.attach_data(
548 model, train_dataloaders=train_dataloaders, val_dataloaders=val_dataloaders, datamodule=datamodule
549 )
551 self.checkpoint_connector.resume_start()
--> 553 self._run(model)
555 assert self.state.stopped
556 self.training = False
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:918, in Trainer._run(self, model)
915 self.checkpoint_connector.restore_training_state()
917 # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 918 self._dispatch()
920 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
921 self._post_dispatch()
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:986, in Trainer._dispatch(self)
984 self.accelerator.start_predicting(self)
985 else:
--> 986 self.accelerator.start_training(self)
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\accelerators\accelerator.py:92, in Accelerator.start_training(self, trainer)
91 def start_training(self, trainer: "pl.Trainer") -> None:
---> 92 self.training_type_plugin.start_training(trainer)
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py:161, in TrainingTypePlugin.start_training(self, trainer)
159 def start_training(self, trainer: "pl.Trainer") -> None:
160 # double dispatch to initiate the training loop
--> 161 self._results = trainer.run_stage()
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:996, in Trainer.run_stage(self)
994 if self.predicting:
995 return self._run_predict()
--> 996 return self._run_train()
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:1031, in Trainer._run_train(self)
1028 if not self.is_global_zero and self.progress_bar_callback is not None:
1029 self.progress_bar_callback.disable()
-> 1031 self._run_sanity_check(self.lightning_module)
1033 # enable train mode
1034 self.model.train()
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\trainer\trainer.py:1115, in Trainer._run_sanity_check(self, ref_model)
1113 # run eval step
1114 with torch.no_grad():
-> 1115 self._evaluation_loop.run()
1117 self.on_sanity_check_end()
1119 # reset validation metrics
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\loops\base.py:111, in Loop.run(self, *args, **kwargs)
109 try:
110 self.on_advance_start(*args, **kwargs)
--> 111 self.advance(*args, **kwargs)
112 self.on_advance_end()
113 self.iteration_count += 1
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py:110, in EvaluationLoop.advance(self, *args, **kwargs)
107 dataloader_iter = enumerate(dataloader)
108 dl_max_batches = self._max_batches[self.current_dataloader_idx]
--> 110 dl_outputs = self.epoch_loop.run(
111 dataloader_iter, self.current_dataloader_idx, dl_max_batches, self.num_dataloaders
112 )
114 # store batch level output per dataloader
115 if self.should_track_batch_outputs_for_epoch_end:
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\loops\base.py:111, in Loop.run(self, *args, **kwargs)
109 try:
110 self.on_advance_start(*args, **kwargs)
--> 111 self.advance(*args, **kwargs)
112 self.on_advance_end()
113 self.iteration_count += 1
File ~\Anaconda3\envs\SiT\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py:93, in EvaluationEpochLoop.advance(self, dataloader_iter, dataloader_idx, dl_max_batches, num_dataloaders)
80 """Calls the evaluation step with the corresponding hooks and updates the logger connector.
81
82 Args:
(...)
89 StopIteration: If the current batch is None
90 """
91 void(dl_max_batches, num_dataloaders)
---> 93 batch_idx, batch = next(dataloader_iter)
95 if batch is None:
96 raise StopIteration
File ~\Anaconda3\envs\SiT\lib\site-packages\torch\utils\data\dataloader.py:521, in _BaseDataLoaderIter.__next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \
524 self._IterableDataset_len_called is not None and \
525 self._num_yielded > self._IterableDataset_len_called:
File ~\Anaconda3\envs\SiT\lib\site-packages\torch\utils\data\dataloader.py:561, in _SingleProcessDataLoaderIter._next_data(self)
559 def _next_data(self):
560 index = self._next_index() # may raise StopIteration
--> 561 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
562 if self._pin_memory:
563 data = _utils.pin_memory.pin_memory(data)
File ~\Anaconda3\envs\SiT\lib\site-packages\torch\utils\data\_utils\fetch.py:49, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
47 def fetch(self, possibly_batched_index):
48 if self.auto_collation:
---> 49 data = [self.dataset[idx] for idx in possibly_batched_index]
50 else:
51 data = self.dataset[possibly_batched_index]
File ~\Anaconda3\envs\SiT\lib\site-packages\torch\utils\data\_utils\fetch.py:49, in <listcomp>(.0)
47 def fetch(self, possibly_batched_index):
48 if self.auto_collation:
---> 49 data = [self.dataset[idx] for idx in possibly_batched_index]
50 else:
51 data = self.dataset[possibly_batched_index]
Input In [36], in EfficientDetDataset.__getitem__(self, index)
43 (
44 image,
45 pascal_bboxes,
46 class_labels,
47 image_id,
48 ) = self.ds.get_image_and_labels_by_idx(index)
50 sample = {
51 "image": np.array(image, dtype=np.float32),
52 "bboxes": pascal_bboxes,
53 "labels": class_labels,
54 }
---> 56 sample = self.transforms(**sample)
57 sample["bboxes"] = np.array(sample["bboxes"])
58 image = sample["image"]
File ~\Anaconda3\envs\SiT\lib\site-packages\albumentations\core\composition.py:182, in Compose.__call__(self, force_apply, *args, **data)
179 for p in self.processors.values():
180 p.preprocess(data)
--> 182 data = t(force_apply=force_apply, **data)
184 if dual_start_end is not None and idx == dual_start_end[1]:
185 for p in self.processors.values():
File ~\Anaconda3\envs\SiT\lib\site-packages\albumentations\core\transforms_interface.py:90, in BasicTransform.__call__(self, force_apply, *args, **kwargs)
85 warn(
86 self.get_class_fullname() + " could work incorrectly in ReplayMode for other input data"
87 " because its' params depend on targets."
88 )
89 kwargs[self.save_key][id(self)] = deepcopy(params)
---> 90 return self.apply_with_params(params, **kwargs)
92 return kwargs
File ~\Anaconda3\envs\SiT\lib\site-packages\albumentations\core\transforms_interface.py:103, in BasicTransform.apply_with_params(self, params, force_apply, **kwargs)
101 target_function = self._get_target_function(key)
102 target_dependencies = {k: kwargs[k] for k in self.target_dependence.get(key, [])}
--> 103 res[key] = target_function(arg, **dict(params, **target_dependencies))
104 else:
105 res[key] = None
File ~\Anaconda3\envs\SiT\lib\site-packages\albumentations\augmentations\transforms.py:602, in Normalize.apply(self, image, **params)
601 def apply(self, image, **params):
--> 602 return F.normalize(image, self.mean, self.std, self.max_pixel_value)
File ~\Anaconda3\envs\SiT\lib\site-packages\albumentations\augmentations\functional.py:141, in normalize(img, mean, std, max_pixel_value)
138 denominator = np.reciprocal(std, dtype=np.float32)
140 img = img.astype(np.float32)
--> 141 img -= mean
142 img *= denominator
143 return img
ValueError: operands could not be broadcast together with shapes (512,512) (3,) (512,512)
Any idea how to solve this?
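For what it's worth, the (512, 512) vs (3,) shapes in the error suggest a single-channel (grayscale) image reaching the 3-channel Normalize transform; converting to RGB when loading, roughly as below, might be one thing to check (a guess based on the traceback, not verified against your data; image_path is hypothetical):

from PIL import Image
import numpy as np

# In the dataset adaptor, before the Albumentations transforms are applied:
image = Image.open(image_path).convert("RGB")  # force 3 channels
image = np.array(image, dtype=np.float32)       # shape (H, W, 3)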
Hi @Chris-hughes10. thank you for this great job.
I'm new to this field. I have a dataset in Pascal VOC format with XML annotations. How can I use it for training? Can you please help me?
Dear @Chris-hughes10, Thank you for this amazing work.
Have you done any work on converting the model saved with torch.save() to ONNX?
I am asking this question because I got stuck trying to convert the saved model. The conversion script that I am using is this:
import os
import io
import numpy as np
import pandas as pd
from functools import partial
from custom_utils import widerface_data_adaptor
from custom_utils import effdet_data_module
from custom_utils import effdet_model
import torch
import torch.onnx
from effdet import get_efficientdet_config, EfficientDet, DetBenchPredict
model_checkpoint_path = "/home/soroush.tabadkani/projects/efficientdet-pytorch/checkpoints/trained_effdet.pt"
device = torch.device('cuda')
input_shape = (1, 3, 512, 512)
dummy_input = torch.randn(input_shape, dtype=torch.float32, requires_grad=True).to(device)
net = effdet_model.EfficientDetModel(
    num_classes=1,
    img_size=512,
)
net.load_state_dict(torch.load(model_checkpoint_path))
net.eval()

dynamic_axes = {out: {0: 'batch_size'} for out in ['outputs']}
dynamic_axes.update({input: {0: 'batch_size'} for input in ['inputs']})

torch.onnx.export(net.cuda(),
                  dummy_input,
                  'efficientdet-d0.onnx',
                  input_names=['inputs'],
                  output_names=['outputs'],
                  verbose=True,
                  dynamic_axes=dynamic_axes,
                  opset_version=12)
and the error I get is:
torch.onnx.export(net.cuda(),
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/__init__.py", line 271, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/utils.py", line 88, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/utils.py", line 694, in _export
_model_to_graph(model, args, verbose, input_names,
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/utils.py", line 457, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args,
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/utils.py", line 420, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/onnx/utils.py", line 380, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/jit/_trace.py", line 1139, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 891, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/jit/_trace.py", line 125, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/jit/_trace.py", line 116, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 862, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/soroush.tabadkani/projects/efficientdet-pytorch/env_test/lib/python3.8/site-packages/pytorch_lightning/core/decorators.py", line 62, in auto_transfer_args
return fn(self, *args, **kwargs)
TypeError: forward() missing 1 required positional argument: 'targets'
No matter which approaches I tried to solve this problem, they all eventually resulted in the error above. Any help or guidance you can kindly provide is deeply appreciated.
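For what it's worth, one direction that might be worth trying (untested, and the DetBenchPredict constructor may differ between effdet versions) is to export an inference bench, whose forward does not require a targets argument, rather than the LightningModule itself:

import torch
from effdet import DetBenchPredict

# Hypothetical sketch: net.model is the DetBenchTrain wrapper created in the
# LightningModule, and net.model.model is the raw EfficientDet network; wrap the
# latter in DetBenchPredict so no 'targets' argument is needed during tracing.
bench = DetBenchPredict(net.model.model)
bench.eval().cuda()

dummy_input = torch.randn(1, 3, 512, 512, device="cuda")
torch.onnx.export(
    bench,
    dummy_input,
    "efficientdet-d0.onnx",
    input_names=["inputs"],
    output_names=["outputs"],
    opset_version=12,
)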
Hi @ramdhan1989, were you able to solve the operands broadcast issue? I am facing a similar error when training the model.
@Chris-hughes10
My model is predicting
The model I have trained has 15 classes. What could have possibly gone wrong?