Image Reading

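With its default settings, OpenCV cannot correctly read mask images used for semantic segmentation: when it reads such a PNG, cv.imread automatically converts the single-channel image to three channels, and the pixel values change as well (the class indices are replaced by colors).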

import cv2 as cv
import numpy as np
import PIL.Image as Image

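def countColors(img):
    # Print every distinct pixel value in img, in order of first appearance.
    m, n = img.shape[:2]
    colors = []
    for i in range(m):
        for j in range(n):
            # tolist() yields a plain int (single channel) or a list of ints
            # (multi-channel), so the printed result contains no NumPy scalar types.
            px = img[i, j].tolist()
            if px not in colors:
                colors.append(px)
    print(colors)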

path = "examples/imgs/Person/2008_002176.png"

# Converted pixel values (default cv.imread, three channels in BGR order)
img = cv.imread(path)
print(img.shape)     # (294, 500, 3)
countColors(img)     # [[0, 0, 0], [0, 0, 128], [128, 0, 0],
                     # [0, 128, 0], [0, 128, 128], [128, 0, 128]]

# True pixel values (read with Pillow)
img = np.array(Image.open(path))
print(img.shape)     # (294, 500)
img = cv.merge([img, img, img])
countColors(img)     # [[0, 0, 0], [1, 1, 1], [4, 4, 4],
                     # [2, 2, 2], [3, 3, 3], [5, 5, 5]]

# Single-channel (grayscale) read with OpenCV
img = cv.imread(path, cv.IMREAD_GRAYSCALE)
print(img.shape)     # (294, 500)
img = cv.merge([img, img, img])
countColors(img)     # [[0, 0, 0], [38, 38, 38], [14, 14, 14],
                     # [75, 75, 75], [113, 113, 113], [52, 52, 52]]

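As the output shows, the original mask is single-channel, and each pixel value is the label of the class that pixel belongs to. After OpenCV's conversion this meaning is lost, so data read this way cannot be used directly for deep-learning training. Even a single-channel read does not return the original values (it returns gray levels rather than labels), and recovering the labels from them is cumbersome.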
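To illustrate why recovering labels after the fact is cumbersome, here is a minimal sketch that maps a BGR color mask back to class indices with an explicit lookup table. The table pairs the colors and labels printed above by their order of first appearance, which is an assumption; bgr_to_index and the fill value 255 are hypothetical names and choices, not part of OpenCV.

```python
import numpy as np

# Assumed pairing of BGR colors to class indices, taken from the printed
# outputs above in order of first appearance (illustrative only).
BGR_TO_INDEX = {
    (0, 0, 0): 0,
    (0, 0, 128): 1,
    (0, 128, 0): 2,
    (0, 128, 128): 3,
    (128, 0, 0): 4,
    (128, 0, 128): 5,
}


def bgr_to_index(color_mask, table=BGR_TO_INDEX, unknown=255):
    """Convert an (H, W, 3) BGR mask to an (H, W) index mask (hypothetical helper)."""
    index_mask = np.full(color_mask.shape[:2], unknown, dtype=np.uint8)
    for bgr, index in table.items():
        # A pixel matches only if all three channels equal the table color.
        matches = np.all(color_mask == np.array(bgr, dtype=color_mask.dtype), axis=-1)
        index_mask[matches] = index
    return index_mask


color_mask = np.array([[[0, 0, 0], [0, 0, 128]],
                       [[128, 0, 0], [0, 128, 128]]], dtype=np.uint8)
print(bgr_to_index(color_mask).tolist())  # [[0, 1], [4, 3]]
```

Every dataset needs its own table, and any color missing from it ends up as the unknown value, which is why reading the indices directly is simpler.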

How to avoid the pitfall: read the image with the Pillow library.
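A minimal sketch of a Pillow-based mask loader follows. It assumes the mask is stored as a palette ("P") or grayscale ("L") PNG; load_mask is a hypothetical helper name, and the tiny in-memory image only stands in for a real mask file.

```python
import io

import numpy as np
from PIL import Image


def load_mask(source):
    """Read a label mask and return its class indices as a 2-D uint8 array."""
    mask = Image.open(source)
    # Modes "P" (palette) and "L" (grayscale) store one value per pixel;
    # any other mode suggests the file is not an index mask.
    if mask.mode not in ("P", "L"):
        raise ValueError(f"expected a single-channel mask, got mode {mask.mode!r}")
    return np.array(mask, dtype=np.uint8)


# Build a tiny palette PNG in memory so the example needs no file on disk.
labels = np.array([[0, 1, 1], [2, 0, 5]], dtype=np.uint8)
demo = Image.fromarray(labels)      # mode "L"
palette = [0] * 768                 # 256 RGB entries, all black by default
palette[3:6] = [128, 0, 0]          # color for index 1
palette[6:9] = [0, 128, 0]          # color for index 2
palette[15:18] = [128, 0, 128]      # color for index 5
demo.putpalette(palette)            # attaching a palette turns the image into mode "P"
buffer = io.BytesIO()
demo.save(buffer, format="PNG")
buffer.seek(0)

mask = load_mask(buffer)
print(mask.shape, np.unique(mask).tolist())
```

The returned array holds the class indices themselves, so it can be used as a training target without any color lookup.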


Image Resizing

OpenCV's cv::Size and the shape attribute used in the Python version order their dimensions differently.

Size is W x H (width first):

  • size = cv::Size(imageWidth, imageHeight)
  • img = cv.resize(img, (imageWidth, imageHeight))

shape is m x n (rows first, i.e. height x width):

  • rows, cols = img.shape[:2]
  • imageHeight, imageWidth = img.shape[:2]

How to avoid the pitfall:

imageHeight, imageWidth = img.shape[:2]
...  # processing
dst = cv.resize(dst, (imageWidth, imageHeight), interpolation=cv.INTER_CUBIC)