When matching prior boxes to ground truth boxes, SSD follows two rules:

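  • For each ground truth box in the image, find the prior box with the largest IoU; that prior box becomes a positive sample. A prior box that ends up matched to no ground truth is a negative sample.
  • For each remaining unmatched prior box, if its IoU with some ground truth exceeds a threshold (typically 0.5), that prior box is also matched to that ground truth.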
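
The two rules can be sketched in plain Python on a handful of boxes. This is a minimal illustration, not the SSD implementation: `iou` and `match_priors` are hypothetical helper names, boxes are assumed to be in corner form (xmin, ymin, xmax, ymax), and an IoU exactly equal to the threshold is treated as a match (an assumption).

```python
def iou(a, b):
    """IoU of two boxes in corner form (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_priors(truths, priors, threshold=0.5):
    """For each prior, return the index of its matched ground truth, or -1 (negative)."""
    assigned = [-1] * len(priors)
    # Rule 2: a prior matches its best ground truth if the IoU reaches the threshold.
    for p, prior in enumerate(priors):
        overlaps = [iou(t, prior) for t in truths]
        best = max(range(len(truths)), key=overlaps.__getitem__)
        if overlaps[best] >= threshold:
            assigned[p] = best
    # Rule 1: every ground truth claims its highest-IoU prior, whatever the IoU is.
    # Applied last so that it overrides any rule-2 assignment.
    for t, truth in enumerate(truths):
        overlaps = [iou(truth, prior) for prior in priors]
        best = max(range(len(priors)), key=overlaps.__getitem__)
        assigned[best] = t
    return assigned


truths = [(0, 0, 10, 10), (20, 20, 24, 24)]
priors = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
print(match_priors(truths, priors))  # -> [0, 0, 1, -1]
```

Here the third prior overlaps the small second ground truth with an IoU of only 0.16, yet rule 1 still makes it a positive sample, so that ground truth is not left without a prior. The last prior overlaps nothing and stays negative.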

The matching code makes this concrete:

def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then return the matched indices
    corresponding to both confidence and location preds.

    Args:
        threshold: (float) The overlap threshold used when matching boxes.
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (tensor) Variances corresponding to each prior coord,
            Shape: [num_priors, 4].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        idx: (int) current batch index
    Return:
        The matched indices corresponding to 1)location and 2)confidence preds.
    """
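    # Jaccard index: IoU between every ground truth box and every prior box
    overlaps = jaccard(
        truths,             # corner form (xmin, ymin, xmax, ymax)
        point_form(priors)  # priors are (cx, cy, w, h); convert to (xmin, ymin, xmax, ymax)
    )  # 2-D tensor, Shape: [num_objects, num_priors]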
    
    # (Bipartite Matching)
    # [num_objects,1] best prior for each ground truth; keepdim preserves the dimension
    # best_prior_idx holds prior box indices
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # [1,num_priors] best ground truth for each prior
    # best_truth_idx holds ground truth indices
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    # Typically num_priors > num_objects, so best_truth_idx has more entries than best_prior_idx

    best_truth_idx.squeeze_(0) # [num_priors]
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1) # [num_objects]
    best_prior_overlap.squeeze_(1)
    # ensure best prior: an overlap of 2 always passes the threshold test
    best_truth_overlap.index_fill_(0, best_prior_idx, 2)

    for j in range(best_prior_idx.size(0)): # 0 -> (num_objects-1)
        best_truth_idx[best_prior_idx[j]] = j
    # Index truths with best_truth_idx (length num_priors, holding ground truth indices):
    # row i of matches is the ground truth box assigned to prior i
    matches = truths[best_truth_idx]  # Shape: [num_priors,4]
    
    # conf holds the label of the ground truth matched to each prior;
    # +1 shifts the labels so that 0 is free for the background class
    conf = labels[best_truth_idx] + 1         # Shape: [num_priors]
    conf[best_truth_overlap < threshold] = 0  # label as background
    loc = encode(matches, priors, variances)
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
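
The `point_form` and `encode` helpers are called above but not shown. As a rough single-box sketch in plain Python, assuming the usual SSD offset parameterization and a two-element `variances` pair such as (0.1, 0.2) (both are assumptions, not taken from the code above), the conversion and encoding could look like this. `encode_one` is a hypothetical name, and this `point_form` is a simplified one-box stand-in for the tensor version.

```python
import math


def point_form(box):
    """Convert one box from (cx, cy, w, h) to (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def encode_one(matched, prior, variances=(0.1, 0.2)):
    """Encode one corner-form ground truth box against one center-form prior.

    Assumed parameterization: center offsets scaled by the prior size and
    variances[0], log size ratios scaled by variances[1].
    """
    xmin, ymin, xmax, ymax = matched
    pcx, pcy, pw, ph = prior
    # Offset between the box centers, relative to the prior's size
    g_cx = ((xmin + xmax) / 2 - pcx) / (variances[0] * pw)
    g_cy = ((ymin + ymax) / 2 - pcy) / (variances[0] * ph)
    # Log ratio of the box size to the prior size
    g_w = math.log((xmax - xmin) / pw) / variances[1]
    g_h = math.log((ymax - ymin) / ph) / variances[1]
    return (g_cx, g_cy, g_w, g_h)


prior = (0.5, 0.5, 0.25, 0.5)  # (cx, cy, w, h)
# A ground truth that coincides with the prior encodes to all-zero offsets.
print(encode_one(point_form(prior), prior))
```

Under this parameterization the regression targets are zero when a prior already sits exactly on its ground truth, and the variances only rescale the targets.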