Conventions

If you would like to adapt MMDetection for your own project, please check the following conventions.

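About the order of image shape

In OpenMMLab 2.0, to be consistent with the input arguments of OpenCV, the image-shape arguments of data transformation pipelines are always in (width, height) order. In contrast, for computational convenience, the fields that pass through the data pipeline and the model are in (height, width) order. Specifically, in the results processed by each data transform pipeline, the fields and the meaning of their values are as follows: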

img_shape: (height, width)
ori_shape: (height, width)
pad_shape: (height, width)
batch_input_shape: (height, width)

For example, the initialization arguments of Mosaic are as follows:

@TRANSFORMS.register_module()
class Mosaic(BaseTransform):
    def __init__(self,
                 img_scale: Tuple[int, int] = (640, 640),
                 center_ratio_range: Tuple[float, float] = (0.5, 1.5),
                 bbox_clip_border: bool = True,
                 pad_val: float = 114.0,
                 prob: float = 1.0) -> None:
        ...

        # img_scale order should be (width, height)
        self.img_scale = img_scale

    def transform(self, results: dict) -> dict:
        ...

        results['img'] = mosaic_img
        # (height, width)
        results['img_shape'] = mosaic_img.shape[:2]
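
To make the two orderings concrete, here is a minimal sketch in plain Python. The helper name `scale_to_shape` is hypothetical and not part of MMDetection; it only illustrates that a (width, height) transform argument corresponds to a (height, width) result field.

```python
from typing import Tuple


def scale_to_shape(img_scale: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a (width, height) transform argument to (height, width).

    Hypothetical helper for illustration only.
    """
    width, height = img_scale
    return height, width


# (width, height), the order used by transform arguments such as img_scale
img_scale = (1333, 800)

# (height, width), the order used by fields such as results['img_shape']
img_shape = scale_to_shape(img_scale)
print(img_shape)  # (800, 1333)
```

When writing a custom transform, it may help to leave a comment next to every shape-like value stating which of the two orders it uses, as the Mosaic excerpt does.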

Loss

In MMDetection, model(**data) returns a dict containing losses and metrics.

For example, in the bbox head,

class BBoxHead(nn.Module):
    ...
    def loss(self, ...):
        losses = dict()
        # classification loss
        losses['loss_cls'] = self.loss_cls(...)
        # classification accuracy
        losses['acc'] = accuracy(...)
        # bbox regression loss
        losses['loss_bbox'] = self.loss_bbox(...)
        return losses

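bbox_head.loss() is called during the model's forward pass. The returned dict contains 'loss_bbox', 'loss_cls', and 'acc'. Only 'loss_bbox' and 'loss_cls' are used during back-propagation; 'acc' is used only as a metric to monitor the training process.

By default, only values whose keys contain 'loss' are back-propagated. This behavior can be changed by modifying BaseDetector.train_step().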
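
The key-based filtering described above can be sketched as follows. This is a simplified illustration that uses plain floats instead of tensors; the function name `sum_loss_terms` is hypothetical and is not the MMDetection implementation.

```python
from typing import Dict, Tuple


def sum_loss_terms(outputs: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Sum every entry whose key contains 'loss'; keep the rest as metrics.

    Hypothetical helper: plain floats stand in for loss tensors.
    """
    # Only keys containing 'loss' contribute to the value that is optimized.
    total = sum(value for key, value in outputs.items() if 'loss' in key)
    # Everything, including metrics such as 'acc', is kept for logging.
    log_vars = dict(outputs)
    log_vars['loss'] = total
    return total, log_vars


outputs = {'loss_cls': 0.5, 'acc': 90.0, 'loss_bbox': 0.25}
total, log_vars = sum_loss_terms(outputs)
print(total)  # 0.75
```

Under this convention, a custom head that returns an auxiliary term should include 'loss' in the key if the term is meant to be optimized, and avoid it if the term is only a metric.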

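Empty Proposals

In MMDetection, we have added special handling and unit tests for empty proposals in two-stage detectors. Empty proposals must be handled both for the entire batch and for a single image. For example, in CascadeRoIHead,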

# simple_test method
...
# There is no proposal in the whole batch
if rois.shape[0] == 0:
    bbox_results = [[
        np.zeros((0, 5), dtype=np.float32)
        for _ in range(self.bbox_head[-1].num_classes)
    ]] * num_imgs
    if self.with_mask:
        mask_classes = self.mask_head[-1].num_classes
        segm_results = [[[] for _ in range(mask_classes)]
                        for _ in range(num_imgs)]
        results = list(zip(bbox_results, segm_results))
    else:
        results = bbox_results
    return results
...

# There is no proposal in the single image
for i in range(self.num_stages):
    ...
    if i < self.num_stages - 1:
        for j in range(num_imgs):
            # Handle empty proposal
            if rois[j].shape[0] > 0:
                bbox_label = cls_score[j][:, :-1].argmax(dim=1)
                refine_roi = self.bbox_head[i].regress_by_class(
                    rois[j], bbox_label, bbox_pred[j], img_metas[j])
                refine_roi_list.append(refine_roi)

If you have customized RoIHead, you can refer to the method above to handle empty proposals.
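
The two cases can be sketched without any framework dependency. The helper names below are hypothetical, and plain Python lists stand in for tensors and arrays, so this shows only the control flow, not the CascadeRoIHead implementation.

```python
from typing import Callable, List


def empty_batch_results(num_imgs: int, num_classes: int) -> List[List[list]]:
    """Case 1: no proposal in the whole batch.

    Returns one empty per-class result list for every image.
    """
    return [[[] for _ in range(num_classes)] for _ in range(num_imgs)]


def refine_non_empty(rois_per_img: List[list],
                     refine: Callable[[list], list]) -> List[list]:
    """Case 2: some images in the batch have no proposal.

    Applies `refine` only to images with at least one proposal and skips
    the others, mirroring the `rois[j].shape[0] > 0` check.
    """
    refined = []
    for rois in rois_per_img:
        if len(rois) > 0:
            refined.append(refine(rois))
    return refined


batch = [[1, 2], [], [3]]  # the second image has no proposal
out = refine_non_empty(batch, lambda rois: [r * 10 for r in rois])
print(out)  # [[10, 20], [30]]
```

A unit test for a custom RoIHead could cover both situations: a batch in which every image is empty, and a batch in which only some images are.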

COCO Panoptic Dataset

MMDetection supports the COCO Panoptic dataset. Here we clarify a few conventions about the implementation of CocoPanopticDataset.

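For mmdet<=2.16.0, the range of foreground and background labels in semantic segmentation differs from the default setting of MMDetection: label 0 stands for VOID, and category labels start from 1. Since mmdet=2.17.0, category labels of semantic segmentation start from 0 and label 255 stands for VOID, for consistency with the labels of bounding boxes. To achieve this, the Pad pipeline supports setting the padding value for seg.

In evaluation, the panoptic result is a map with the same shape as the original image. Each value in the result map has the format instance_id * INSTANCE_OFFSET + category_id.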
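
The encoding can be illustrated with a short sketch. The value `INSTANCE_OFFSET = 1000` and the helper names `encode_panoptic` and `decode_panoptic` are assumptions made for this example; check the constant defined in your installed version of MMDetection before relying on it.

```python
from typing import Tuple

# Assumed value for illustration; verify against your MMDetection version.
INSTANCE_OFFSET = 1000


def encode_panoptic(instance_id: int, category_id: int) -> int:
    """Pack an instance id and a category id into one map value."""
    # The format is only reversible if category_id < INSTANCE_OFFSET.
    assert 0 <= category_id < INSTANCE_OFFSET
    return instance_id * INSTANCE_OFFSET + category_id


def decode_panoptic(value: int) -> Tuple[int, int]:
    """Recover (instance_id, category_id) from one map value."""
    return divmod(value, INSTANCE_OFFSET)


value = encode_panoptic(instance_id=3, category_id=17)
print(value)                   # 3017
print(decode_panoptic(value))  # (3, 17)
```

The same arithmetic applies element-wise to a whole result map: integer division by the offset gives the instance ids, and the remainder gives the category ids.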