Efficient deep learning-based approach for malaria detection using red blood cell smears | Scientific Reports – Nature.com

Posted: June 11, 2024 at 2:48 am



This section describes the proposed methodology and the pre-trained models employed in this study. Figure 1 shows the workflow of the proposed methodology. The details of each step are provided in the subsequent sections.

Workflow of the proposed methodology.

The dataset used in this study was obtained from a public data repository. It contains a total of 27,558 cell images: 13,779 parasitized images and 13,779 uninfected images. These images were obtained from 150 infected patients and 50 healthy individuals. Expert slide readers and pathologists manually annotated the whole dataset. Color variations in the red blood cell images are due to the different blood stains used during image acquisition. Figure 2 shows samples of parasitized and uninfected cell images.

Samples from the red blood cell image dataset, showing parasitized and uninfected cell images.

Preprocessing is a crucial first step in deep learning image classification tasks. The dataset contains 13,779 images of parasitized cells and 13,779 images of uninfected cells, so the two classes are balanced. The cell images vary in width and height, whereas the deep learning model requires fixed-size input. To ensure compatibility with the models and to test their robustness, we resized the images. After resizing, the next step is to split the cell images into two parts: training and testing. 80% of the data are used to train the deep learning models, and 20% are kept for testing model efficacy and performance. Table 1 shows the number of parasitized and uninfected images in the train and test sets after splitting.
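As a rough illustration of the 80/20 split described above, the following sketch shuffles the image indices of one class and cuts them at 80%. Splitting each class separately, the fixed seed, and the helper name `split_indices` are assumptions made for this example. The text does not say how the split was performed, so the counts printed here may differ from those in Table 1.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=42):
    """Shuffle indices 0..n_items-1 and split them into train and test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed is an assumption
    cut = int(n_items * train_fraction)
    return indices[:cut], indices[cut:]


# Class sizes reported for the dataset.
class_sizes = {"parasitized": 13779, "uninfected": 13779}

for label, size in class_sizes.items():
    train_idx, test_idx = split_indices(size)
    print(label, len(train_idx), len(test_idx))
```

Splitting per class keeps the train and test sets as balanced as the full dataset, which is one common way to handle a two-class problem like this one.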

Deep learning models learn complex patterns in data through multiple layers. Deep learning has demonstrated effectiveness in many image classification tasks in medical, engineering, and other applications. Deep learning models work well on large datasets; however, they consume substantial computational resources. Hyperparameter settings, the choice of loss function, and architectural adjustments help address this by speeding up training, reducing computational time, reducing the number of layers, and creating efficient deep models [30]. Transfer learning is a popular technique that reuses models pre-trained on large datasets such as ImageNet and produces better results on small datasets (Table 2).

Architecture of proposed deep learning model for malaria detection.

EfficientNet-B2 is an accurate and reliable CNN model that is mostly used for image classification problems. It is well suited to settings that call for few parameters and limited processing resources. The model improves classification accuracy by combining depth-wise separable convolutions (DWSC) with an efficient scaling approach. The main reason for using EfficientNet-B2 in disease detection is its combination of efficiency and accuracy, which follows from its small model size and low computing requirements. Figure 3 shows the architecture of the proposed model. The EfficientNet-B2 base is followed by a dropout layer, yielding an output of shape (5, 5, 1408). A flatten layer then converts this multi-dimensional output into a one-dimensional vector. After flattening, we use three dense layers, four batch normalization layers, and three activation layers. The first two dense layers use the Rectified Linear Unit (ReLU) activation function, which helps the network capture complicated patterns and can lower the chances of overfitting and generalization errors, improving the model overall. The last dense layer uses the sigmoid activation function, which is the usual choice for binary classification. Batch normalization is an essential component of deep learning architectures that improves accuracy while also speeding up training.
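To make the shapes concrete: flattening a feature map of shape (5, 5, 1408) gives a vector of 5 × 5 × 1408 = 35,200 values, and a sigmoid output is a single value between 0 and 1. The sketch below shows that arithmetic together with the two activation functions in plain Python. The 0.5 decision threshold, the choice of which class maps to label 1, and the helper names are assumptions for illustration, not details taken from the paper.

```python
import math


def flattened_size(shape):
    """Number of values after flattening a feature map of the given shape."""
    size = 1
    for dim in shape:
        size *= dim
    return size


def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_label(score, threshold=0.5):
    """Turn a raw output score into a binary label (threshold is assumed)."""
    return 1 if sigmoid(score) >= threshold else 0


print(flattened_size((5, 5, 1408)))  # 35200
print(relu(-2.0), relu(3.0))
print(predict_label(1.2), predict_label(-0.7))
```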

During training, batch normalization uses a mini-batch of data to calculate the mean and standard deviation of each feature. These statistics are then used to standardize the layer's input. This approach reduces internal covariate shift, that is, the change in the distribution of network activations caused by changes in the parameters during training. By standardizing the input, batch normalization makes optimization more efficient: the model can be trained more quickly and is less likely to suffer from vanishing or exploding gradients. Additionally, it acts as a regularizer, which reduces the need for other regularization methods.
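The standardization step can be sketched in a few lines of plain Python. This is a minimal illustration of the training-time computation only: in a real layer the scale `gamma` and shift `beta` are learned per feature and running averages are kept for inference, whereas here they are fixed scalars, and the small constant `eps` and the toy mini-batch are assumptions for the example.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) of a mini-batch, then scale and shift."""
    n = len(batch)
    n_features = len(batch[0])
    # Per-feature mean and (biased) variance over the mini-batch.
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(n_features)
        ]
        for row in batch
    ]


# Toy mini-batch: 4 samples, 2 features on very different scales.
mini_batch = [[1.0, 50.0], [2.0, 60.0], [3.0, 70.0], [4.0, 80.0]]
for row in batch_norm(mini_batch):
    print([round(v, 3) for v in row])
```

After normalization both features have roughly zero mean and unit variance, even though their original scales differed by more than an order of magnitude.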

Malaria can be detected by using deep learning models to analyze images of red blood cells for signs of infection. The proposed model is trained on blood cell images that have been labeled by experts. Once adequately trained, it can classify newly obtained blood cell samples as either infected or uninfected, giving medical personnel useful information for a faster and more precise diagnosis. Using deep learning-based malaria detection models in clinical settings offers several potential advantages. Such models can deliver precise and prompt diagnoses, particularly in regions where skilled microscopists are scarce. They can expedite the start of medication by enabling front-line healthcare professionals to identify the infection promptly. Consequently, the incidence and mortality rates linked to malaria can decline. Moreover, automated analysis can handle a large volume of samples efficiently, alleviating the workload of laboratory personnel, particularly during outbreaks or monitoring initiatives.

This study also employed fine-tuned deep learning models such as CNN, VGG-16, DenseNet-121, DenseNet-169, DenseNet-201, and Inception V3 for malaria detection. The pre-trained, fine-tuned deep learning models and their trainable parameters are given in Table 3.

A CNN is a type of neural network that consists of numerous layers and aims to identify patterns directly from image pixels. It requires minimal pre-processing [31]. The convolution layer, the pooling layer, and the fully connected layer are widely considered the three essential building blocks of a CNN. We used three convolution blocks, three max-pooling blocks, and three blocks each of batch normalization, ReLU activation, and dropout layers. The convolution layer, a fundamental component of a CNN, performs the majority of the computational work. It applies the convolution (filtering) operation to its input and passes the response to the next layer. We place a pooling layer between successive convolution layers to reduce the spatial size of the representation and the processing it requires. The pooling layer operates on each slice of its input, thereby reducing the computational workload of the subsequent convolution layer. After that, we flatten the output into a single dimension and add two dense layers with batch normalization and ReLU activation. A final fully connected layer with sigmoid activation generates the output, whose size corresponds to the number of classes [32]. The detailed architecture of the CNN is shown in Fig. 4.
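The two core operations described here, filtering and pooling, can be illustrated on a tiny single-channel image. The sketch below is plain Python with no padding and a stride of 1 for the convolution, and non-overlapping 2 × 2 windows for pooling. The kernel values, window size, and toy image are assumptions chosen for illustration and are not the settings used in the study. As in most deep learning frameworks, the "convolution" here does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # toy 2x2 filter, chosen only for illustration

features = conv2d_valid(image, kernel)  # 5x5 input -> 4x4 feature map
pooled = max_pool2d(features)           # 4x4 feature map -> 2x2
print(features)
print(pooled)
```

Pooling shrinks the 4 × 4 feature map to 2 × 2, which is the spatial reduction that lightens the work of the next convolution layer.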

Detailed CNN architecture.

VGG16 was among the top-performing entries in the 2014 ILSVRC (ImageNet) competition and remains a widely used vision model. The VGG-16 network was trained on the ImageNet database and consists of 16 weighted layers: 13 convolutional layers and 3 fully connected layers. Even when the target image dataset is small, a pre-trained VGG-16 network can deliver high accuracy because of its extensive training. It reaches 92.7% top-5 accuracy on ImageNet, where images are classified into 1000 categories, and it can be used for both object detection and classification. It is a widely used image classification model that is easy to apply through transfer learning. By adding new layers to the network and using batch normalization, the training process can be accelerated, making learning easier and the model more robust [33].

Inception V3 is a deep CNN architecture introduced in 2015 by Google researchers. It is the third version of the Inception family of models and is designed to be more efficient and accurate than its predecessors, Inception V1 and V2, while using a more expansive network. Training such a deep CNN from scratch on low-configuration computers is challenging and can take several days. Transfer learning addresses this issue by retaining the parameters of the earlier layers and updating only the last layer for the new categories. In practice, the final layer of the Inception V3 model is removed and replaced, so that the rest of the network is reused as is [34].

DenseNet121 is a CNN architecture that has been widely used in image classification tasks since its introduction in 2017. The architecture aims to increase the depth of deep learning networks while improving their training efficiency. This is achieved through short connections between layers: in DenseNet, each layer is connected to all layers that are deeper in the network, making it a densely connected CNN. The number 121 refers to the count of layers with trainable weights, excluding batch normalization layers. Of these, 116 are convolutional layers inside the dense blocks; the remaining 5 consist of the initial 7×7 convolutional layer, 3 transition layers, and a fully connected layer [35].
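The layer count can be checked with a little arithmetic. The sketch below assumes the standard block configurations from the original DenseNet design (for example 6, 12, 24, and 16 dense layers per block for DenseNet121), which are not stated in this text, and assumes that every dense layer contains one 1×1 and one 3×3 convolution. The function name is hypothetical.

```python
# Dense layers per block in the standard DenseNet variants (assumed here).
BLOCK_CONFIGS = {
    "DenseNet121": (6, 12, 24, 16),
    "DenseNet169": (6, 12, 32, 32),
    "DenseNet201": (6, 12, 48, 32),
}


def count_weighted_layers(block_config):
    """Count conv and fully connected layers in a DenseNet with bottlenecks."""
    # Each dense layer holds a 1x1 bottleneck conv and a 3x3 conv.
    dense_block_convs = 2 * sum(block_config)
    initial_conv = 1                     # the 7x7 stem convolution
    transitions = len(block_config) - 1  # one 1x1 conv between adjacent blocks
    classifier = 1                       # final fully connected layer
    return dense_block_convs + initial_conv + transitions + classifier


for name, config in BLOCK_CONFIGS.items():
    print(name, count_weighted_layers(config))
```

Under these assumptions the same formula reproduces the number in each model's name.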

DenseNet169 is a deep CNN architecture that is part of the DenseNet family of models, introduced in 2017 by researchers from Cornell University, Tsinghua University, and Facebook AI Research. DenseNet169 has 169 layers, which is more than DenseNet121 but fewer than DenseNet201. Like other DenseNet models, DenseNet169 uses dense connectivity to promote feature reuse and reduce the number of parameters needed to train the network. It also includes bottleneck layers to reduce the computational cost of convolutions. DenseNet169 has achieved strong performance on several benchmark datasets, making it a popular choice for image classification tasks requiring high accuracy [36].

DenseNet201 [37] is a deep CNN architecture. It uses a dense connectivity structure, where each layer is connected to every subsequent layer in a feed-forward fashion. This dense connectivity promotes feature reuse and reduces the number of parameters needed to train the network. DenseNet201 also includes bottleneck layers, which reduce the computational cost of convolutions by using 1×1 convolutions to reduce the dimensionality of the input. DenseNet201 has achieved strong performance on several benchmark datasets and is widely used in image classification tasks.
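A quick weight count shows when a bottleneck pays off. The sketch below compares a plain 3×3 convolution with a 1×1 convolution followed by a 3×3 convolution on the reduced feature map. The growth rate of 32 and the bottleneck width of four times the growth rate are the commonly used DenseNet settings and are assumptions here, as are the input channel counts and function names.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a convolution layer (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def dense_layer_weights(in_channels, growth_rate=32, bottleneck_factor=4):
    """Compare a plain 3x3 dense layer with its 1x1-bottleneck version."""
    plain = conv_weights(3, in_channels, growth_rate)
    mid_channels = bottleneck_factor * growth_rate
    bottleneck = (
        conv_weights(1, in_channels, mid_channels)    # 1x1 reduces channels
        + conv_weights(3, mid_channels, growth_rate)  # 3x3 on the reduced map
    )
    return plain, bottleneck


for in_channels in (64, 256, 512, 1024):
    plain, bottleneck = dense_layer_weights(in_channels)
    print(in_channels, plain, bottleneck)
```

Because dense connectivity concatenates the outputs of all earlier layers, the input to later layers becomes very wide, and that is where the 1×1 reduction saves the most. For narrow inputs the bottleneck can actually cost more weights than the plain layer.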

ResNet50 is a deep learning architecture introduced in 2015 by Microsoft researchers. It has found applications in a range of computer vision tasks, including the analysis of medical images. ResNet50 is designed to overcome the challenge of vanishing gradients by introducing shortcut connections that allow the network to learn residual representations. Using ResNet50, researchers have attained strong results in computer vision tasks including object detection, image classification, and medical image analysis [38].
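The effect of a shortcut connection on gradients can be written in one line. In the notation below, which is introduced here for illustration, x is the input to a residual block, F(x) is the transformation learned by its stacked layers, y is the block output, and I is the identity matrix:

$$\begin{aligned} y = F(x) + x, \qquad \frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + I \end{aligned}$$

The identity term gives the gradient a direct path through the block, so the signal reaching earlier layers does not depend solely on the derivative of F, even when that derivative is small.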

EfficientNet-B1 is a neural network architecture proposed by Google researchers in 2019. It is part of the EfficientNet family of models, which are designed to achieve high accuracy while minimizing computational resources. It has fewer parameters and floating-point operations (FLOPs) than the larger models in the family but still achieves competitive performance on various benchmark datasets. EfficientNet-B1 has been used in a range of computer vision tasks, including image classification, object detection, and segmentation [39]. Its efficient design makes it particularly suitable for mobile and embedded devices.
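The way the EfficientNet family trades accuracy against resources is usually described by compound scaling, in which network depth d, width w, and input resolution r are scaled together by a single coefficient. The formulation below follows the original EfficientNet design and is not taken from this text; the symbols are introduced here for illustration:

$$\begin{aligned} d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha, \beta, \gamma \ge 1 \end{aligned}$$

With the constants reported for the baseline network, approximately 1.2, 1.1, and 1.15 for depth, width, and resolution, the product is about 1.92, so each unit increase of the compound coefficient roughly doubles the FLOPs. The released B1 to B7 variants need not follow this formula exactly.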

EfficientNet-B7 is a powerful model that has shown promising results in a variety of computer vision tasks, including medical image analysis. It is the largest model in the EfficientNet family and has significantly more parameters and FLOPs than the smaller models in the family. EfficientNet-B7 [40] achieves strong performance on various benchmark datasets, including ImageNet, with significantly fewer computational resources than earlier top-performing models. However, due to its large size, EfficientNet-B7 may not be suitable for mobile and embedded devices with limited computational resources.

MobileNet is a family of neural network architectures designed to be efficient on mobile and embedded devices with limited computational resources. It was proposed by Google researchers in 2017 and has since become a popular choice for a range of computer vision tasks. MobileNet achieves its efficiency by using depth-wise separable convolutions, which handle spatial filtering and channel mixing in two separate steps and so reduce the number of parameters and computations. This design allows MobileNet to achieve high accuracy while requiring significantly fewer resources than larger models. MobileNet has been implemented in various frameworks and is widely used in real-world applications [41].
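The saving from separating the spatial and channel-wise steps is easy to quantify. The sketch below counts the weights of a standard convolution and of a depth-wise convolution followed by a 1×1 point-wise convolution. The layer sizes (3×3 kernel, 128 input channels, 256 output channels) are hypothetical values chosen for illustration, and bias terms are ignored.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per channel
    pointwise = in_channels * out_channels               # 1x1 channel mixing
    return depthwise + pointwise


k, c_in, c_out = 3, 128, 256  # hypothetical layer sizes for illustration
standard = standard_conv_weights(k, c_in, c_out)
separable = separable_conv_weights(k, c_in, c_out)
print(standard, separable, round(standard / separable, 2))
```

For a 3×3 kernel the reduction approaches a factor of nine as the number of output channels grows, since the point-wise step then dominates the separable cost.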

MobileNetV2 is a follow-up to the original MobileNet architecture, proposed by Google researchers in 2018. It further improves the efficiency and accuracy of the original architecture by introducing several novel features. One of the key improvements is a bottleneck block that expands and then contracts the number of channels, allowing for better feature extraction. MobileNetV2 also uses linear bottlenecks, which omit the non-linear activation after the final 1×1 projection of each block so that information is not lost in the low-dimensional representation. These innovations make MobileNetV2 one of the most efficient neural network architectures for mobile and embedded devices, while still achieving high accuracy on a range of computer vision tasks [39].

The performance of all models used in this study was evaluated using precision, recall, F1 score, and accuracy. After a model is trained, the test set is used to assess its efficiency and classification performance. Performance is also evaluated using the confusion matrix, which consists of the TP, TN, FP, and FN predictions.

TP: True positives are records that belong to the positive class and are predicted as positive by the model.

TN: True negatives are records that belong to the negative class and are correctly predicted as negative by the model.

FP: False positives are records that belong to the negative class but are classified as positive by the model.

FN: False negatives are records that belong to the positive class but are predicted as negative by the model.

Accuracy: The proportion of correctly classified predictions among all predictions the model makes, computed by dividing the sum of TP and TN predictions by the total number of predictions.

$$\begin{aligned} \text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$

(1)

Precision: The proportion of true positive predictions among all records the model classifies as positive, computed by dividing the TP predictions by the sum of TP and FP predictions.

$$\begin{aligned} \text{Precision} = \frac{TP}{TP+FP} \end{aligned}$$

(2)

Recall: The proportion of actual positive records that the model correctly identifies, computed by dividing the TP predictions by the sum of TP and FN predictions.

$$\begin{aligned} \text{Recall} = \frac{TP}{TP+FN} \end{aligned}$$

(3)

F1 score: An evaluation metric that summarizes model performance as the harmonic mean of precision and recall.

$$\begin{aligned} \text{F1 score} = 2\times \frac{\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}} \end{aligned}$$

(4)
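The four metrics follow directly from the confusion matrix counts. The sketch below is a minimal plain-Python version of Eqs. (1) to (4). The toy labels are made up for illustration and are not results from the study; treating label 1 as the parasitized (positive) class and returning 0.0 when a denominator is zero are assumptions of this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 computed from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (1 = parasitized, 0 = uninfected); not results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(*counts))
```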
