In image analysis, the distribution of the dataset can be modified to shorten the training time of a neural network and to help it generalize well at inference. Three techniques are used for this: normalization, zero centering and standardization. [1]
Normalization
In computer vision, pixel normalization is often used to speed up model training. Normalizing an image consists of dividing each of its pixel values by the maximum value a pixel can take (255 for an 8-bit image, 4095 for a 12-bit image, 65,535 for a 16-bit image).
CT images are mostly encoded as 12-bit grayscale images. To normalize a CT image, each pixel value is divided by 4095 to obtain a value in the range [0, 1].
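As a minimal sketch of the division described above, the following NumPy snippet normalizes a toy 12-bit array. The function name `normalize_by_bit_depth` and the sample values are assumptions made for illustration only.

```python
import numpy as np


def normalize_by_bit_depth(image: np.ndarray, bit_depth: int) -> np.ndarray:
    """Divide each pixel by the largest value representable with `bit_depth` bits."""
    # 255 for 8 bits, 4095 for 12 bits, 65535 for 16 bits
    max_value = 2 ** bit_depth - 1
    return image.astype(np.float32) / max_value


# Toy 12-bit "image" (hypothetical values): minimum, intermediate and maximum raw values
raw = np.array([[0, 2048], [4095, 1024]], dtype=np.uint16)
normalized = normalize_by_bit_depth(raw, bit_depth=12)
```

The result is a floating-point array whose values lie in [0, 1], whatever the bit depth of the input.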
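Here is an example of CT volume normalization with the TorchIO library [2]. Instead of dividing by 4095, it rescales the intensities to [0, 1] from the window [-1000, 1000] in Hounsfield units (HU), which spans the values of air and bone: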
import pandas as pd
from sklearn.model_selection import train_test_split
import torchio as tio
import matplotlib.pyplot as plt
import seaborn as sns

IMAGES_PATH = "./data/images_paths.csv"
MASKS_PATH = "./data/masks_paths.csv"


def trainset_io(X, y):
    """
    Create a 3D training dataset with the TorchIO library.
    Intensities are rescaled to [0, 1] from the CT air-to-bone HU window.
    """
    subjects_list = []
    # X and y are pandas Series holding the image and mask file paths
    for X_path, y_path in zip(X, y):
        subject = tio.Subject(
            image=tio.ScalarImage(X_path),
            label=tio.LabelMap(y_path),
        )
        subjects_list.append(subject)
    transform = tio.Compose([
        tio.transforms.ToCanonical(),
        tio.transforms.Resample("image"),
        # normalization between the CT air and bone HU values
        tio.transforms.RescaleIntensity(out_min_max=(0, 1), in_min_max=(-1000, 1000)),
    ])
    return tio.SubjectsDataset(subjects_list, transform=transform, load_getitem=True)


# data import
images = pd.read_csv(IMAGES_PATH)
masks = pd.read_csv(MASKS_PATH)
# both CSV files share the column name, hence the _x (images) and _y (masks) suffixes
patients = pd.merge(images, masks, left_index=True, right_index=True)

# train/validation split
patients_train, patients_val, masks_train, masks_val = train_test_split(
    patients["path_to_image_x"],
    patients["path_to_image_y"],
    test_size=0.3,
    random_state=42,
)
patients_train = patients_train.reset_index(drop=True)
patients_val = patients_val.reset_index(drop=True)
masks_train = masks_train.reset_index(drop=True)
masks_val = masks_val.reset_index(drop=True)

# dataset creation
train_set = trainset_io(patients_train, masks_train)

# pixel intensity histograms (voxel tensors are flattened to 1D before plotting)
original_sample = tio.ScalarImage(patients_train[0])
rescaled_sample = train_set[0]
fig, axes = plt.subplots(2, 1)
# sns.histplot replaces the deprecated sns.distplot(..., kde=False)
sns.histplot(original_sample.data.numpy().ravel(), ax=axes[0])
sns.histplot(rescaled_sample["image"].data.numpy().ravel(), ax=axes[1])
axes[0].set_title("Original histogram")
axes[1].set_title("Intensity rescaling")
axes[0].set_ylim((0, 1e6))
axes[1].set_ylim((0, 1e6))
plt.tight_layout()
plt.show()
Comparison of the pixel intensities distribution of a CT volume before and after normalization - credits: IMAIOS SAS
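The linear mapping from an HU window to [0, 1] used in the example above can also be sketched in plain NumPy. This is an illustration under stated assumptions, not TorchIO's implementation: the function name `rescale_hu_window` is hypothetical, and clipping the values that fall outside the window is a choice made here.

```python
import numpy as np


def rescale_hu_window(volume, hu_min=-1000.0, hu_max=1000.0):
    """Linearly map [hu_min, hu_max] to [0, 1]; values outside the window are clipped."""
    clipped = np.clip(np.asarray(volume, dtype=np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Hypothetical HU values: below the window, air, water, bone bound, above the window
hu = np.array([-2000.0, -1000.0, 0.0, 1000.0, 3000.0])
rescaled = rescale_hu_window(hu)
```

With this window, air (-1000 HU) maps to 0, water (0 HU) to 0.5, and the upper bound (1000 HU) to 1.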
Zero centering
Zero centering of the data can be done before or after normalization. The average value of a subsample of pixels is subtracted from each pixel value, so that the data is zero-centered, i.e. the average of the pixels is equal to zero. There are four types of zero centering, depending on how the pixel average is calculated:
- per image, where all the pixels of the image are included in the average,
- per channel, where all the pixels of one channel of the image are included in the average,
- per mini-batch, where all the pixels contained in a mini-batch are included in the average,
- per training dataset, where all the pixels of the dataset are included in the average.
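The four ways of computing the average listed above can be sketched with NumPy as follows. The tensor layout (B, C, H, W), the random data and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical mini-batch: 4 images, 3 channels, 8 x 8 pixels, layout (B, C, H, W)
batch = rng.uniform(0.0, 1.0, size=(4, 3, 8, 8))

# Per image: one mean per image, over all of its channels and pixels
per_image = batch - batch.mean(axis=(1, 2, 3), keepdims=True)

# Per channel: one mean per channel of each image
per_channel = batch - batch.mean(axis=(2, 3), keepdims=True)

# Per mini-batch: a single mean over every pixel of the mini-batch
per_batch = batch - batch.mean()

# Per training dataset: the mean is computed once on the training set,
# then reused for every batch (random stand-in for the training set here)
train_set = rng.uniform(0.0, 1.0, size=(20, 3, 8, 8))
train_mean = train_set.mean()
per_dataset = batch - train_mean
```

Only the first three variants give an exactly zero mean on the batch itself; with the dataset-level mean, a given batch is zero-centered only on average.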
Zero centering the data is preferable when using Sigmoid or ReLU activation functions, as it avoids gradient saturation. This phenomenon occurs when the gradients tend to zero during gradient descent. [3]
Standardization
In medical image analysis, and in particular with the CT modality, standardizing the histograms of the dataset is often a prerequisite before feeding the data to the neural network. Indeed, there are several acquisition and reconstruction protocols, depending on the clinical requirements of the examination, and several settings depending on the manufacturer of the scanner and the examination center. The result is a CT image dataset with very different radiomic characteristics. To ensure that the model generalizes well at inference, it is important that the training and validation subsets have similar distributions. [4]
Standardization consists of subtracting the mean from each value and dividing the result by the standard deviation. It yields a data sample with a mean equal to 0 and a standard deviation equal to 1. The mean and standard deviation are calculated either per image or per dataset. In the first case, we talk about "sample-wise" standardization and in the second case about "feature-wise" standardization.
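The difference between the two variants can be sketched with NumPy. The random array standing in for a CT dataset and the variable names are assumptions for illustration; "feature-wise" follows the definition given above, with statistics computed over the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for a dataset of 10 CT slices of 16 x 16 pixels
dataset = rng.normal(loc=40.0, scale=300.0, size=(10, 16, 16))

# Sample-wise: each image is standardized with its own mean and standard deviation
means = dataset.mean(axis=(1, 2), keepdims=True)
stds = dataset.std(axis=(1, 2), keepdims=True)
sample_wise = (dataset - means) / stds

# Feature-wise: a single mean and standard deviation computed over the whole dataset
feature_wise = (dataset - dataset.mean()) / dataset.std()
```

After sample-wise standardization every image has mean 0 and standard deviation 1. After feature-wise standardization this holds for the dataset as a whole, while individual images keep their relative offsets.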
Here is an example of "sample-wise" standardization of a CT volume with the TorchIO library [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
import torchio as tio
import matplotlib.pyplot as plt
import seaborn as sns

IMAGES_PATH = "./data/images_paths.csv"
MASKS_PATH = "./data/masks_paths.csv"


def trainset_io(X, y):
    """
    Create a 3D training dataset with the TorchIO library.
    Each volume is standardized with its own mean and standard deviation.
    """
    subjects_list = []
    # X and y are pandas Series holding the image and mask file paths
    for X_path, y_path in zip(X, y):
        subject = tio.Subject(
            image=tio.ScalarImage(X_path),
            label=tio.LabelMap(y_path),
        )
        subjects_list.append(subject)
    transform = tio.Compose([
        tio.transforms.ToCanonical(),
        tio.transforms.Resample("image"),
        # standardization by mean and standard deviation
        tio.transforms.ZNormalization(),
    ])
    return tio.SubjectsDataset(subjects_list, transform=transform, load_getitem=True)


# data import
images = pd.read_csv(IMAGES_PATH)
masks = pd.read_csv(MASKS_PATH)
# both CSV files share the column name, hence the _x (images) and _y (masks) suffixes
patients = pd.merge(images, masks, left_index=True, right_index=True)

# train/validation split
patients_train, patients_val, masks_train, masks_val = train_test_split(
    patients["path_to_image_x"],
    patients["path_to_image_y"],
    test_size=0.3,
    random_state=42,
)
patients_train = patients_train.reset_index(drop=True)
patients_val = patients_val.reset_index(drop=True)
masks_train = masks_train.reset_index(drop=True)
masks_val = masks_val.reset_index(drop=True)

# dataset creation
train_set = trainset_io(patients_train, masks_train)

# pixel intensity histograms (voxel tensors are flattened to 1D before plotting)
sample = tio.ScalarImage(patients_train[0])
rescaled_sample = train_set[0]
fig, axes = plt.subplots(2, 1)
# sns.histplot replaces the deprecated sns.distplot(..., kde=False)
sns.histplot(sample.data.numpy().ravel(), ax=axes[0])
sns.histplot(rescaled_sample["image"].data.numpy().ravel(), ax=axes[1])
axes[0].set_title("Original histogram")
axes[1].set_title("Intensity standardization")
axes[0].set_ylim((0, 1e6))
axes[1].set_ylim((0, 1e6))
plt.tight_layout()
plt.show()
Comparison of the pixel intensities distribution of a CT volume before and after standardization - credits: IMAIOS SAS
In addition to standardizing the input dataset of the network, it is common practice to apply a data standardization layer between the convolution layers of the network. There are four types of standardization [5]:
- batch-based,
- per layer,
- per instance,
- per group.
Credits: original publication [6]
The term "batch normalization" is, strictly speaking, inaccurate: the operation is a standardization by batch, performed along the axes (B, H, W) of the data tensor, where B is the batch size, H the height and W the width of the feature map or image. It has been observed that, when batch standardization is applied, the model's error increases as the batch size decreases. Consequently, this type of standardization is not recommended when the learning task requires small batches of data, as in segmentation and object detection on medical images.
Layer normalization is computed along the axes (C, H, W) of the data tensor, with C the number of channels, H the height of the feature/image map, W the width of the feature/image map.
Instance normalization is computed along the axes (H, W), where H is the height and W the width of the feature map or image, separately for each sample and each channel. It is equivalent to standardization by batch if the batch size is 1, and to layer normalization if there is only one channel.
Group normalization is independent of the batch size. The channels of the feature map or image are divided into groups, and the standardization is computed within each group of each sample.
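The axes involved in the four variants can be compared with a short NumPy sketch. It assumes a (B, C, H, W) tensor layout and leaves out the learnable scale and shift parameters that such layers usually carry. The function names and the `eps` constant are illustrative, not taken from a specific library.

```python
import numpy as np


def standardize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the standard deviation computed along `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x):
    return standardize(x, axes=(0, 2, 3))  # (B, H, W): one statistic per channel


def layer_norm(x):
    return standardize(x, axes=(1, 2, 3))  # (C, H, W): one statistic per sample


def instance_norm(x):
    return standardize(x, axes=(2, 3))  # (H, W): one statistic per sample and channel


def group_norm(x, num_groups):
    b, c, h, w = x.shape  # c must be divisible by num_groups
    grouped = x.reshape(b, num_groups, c // num_groups, h, w)
    # one statistic per sample and per group of channels
    return standardize(grouped, axes=(2, 3, 4)).reshape(b, c, h, w)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 5, 5))  # hypothetical feature maps: B=2, C=4, H=W=5

# Equivalences stated in the text
same_as_batch = np.allclose(instance_norm(x[:1]), batch_norm(x[:1]))      # batch size 1
same_as_layer = np.allclose(instance_norm(x[:, :1]), layer_norm(x[:, :1]))  # one channel
```

In this sketch, group normalization with a single group coincides with layer normalization, and with one group per channel it coincides with instance normalization.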
Cover image under CC BY 4.0 license from the University of Erlangen-Nuremberg Deep Learning lecture.
[2] Pérez-García, F., Sparks, R., & Ourselin, S. (2021). TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Computer Methods and Programs in Biomedicine, 106236.
[3] https://rohanvarma.me/inputnormalization/
[4] Selim, M., Zhang, J., Fei, B., Zhang, G. Q., & Chen, J. (2020). STAN-CT: Standardizing CT Image using Generative Adversarial Networks. In AMIA Annual Symposium Proceedings (Vol. 2020, p. 1100). American Medical Informatics Association.
[5] https://towardsdatascience.com/what-is-group-normalization-45fe27307be7
[6] Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European conference on computer vision (ECCV) (pp. 3-19).