Identification of plant disease images via a squeeze-and-excitation MobileNet model and twice transfer learning

Crop diseases have a devastating effect on agricultural production, and serious diseases can lead to complete harvest failure. Recent developments in deep learning have greatly improved the accuracy of image identification. In this study, we investigated transfer learning of deep convolutional neural networks and modified the network structure to improve the learning capability for plant lesion characteristics. A MobileNet with a squeeze-and-excitation (SE) block was selected for our approach. Integrating the merits of both, the pre-trained MobileNet and the SE block were fused to form a new network, termed SE-MobileNet, which was used to identify plant diseases. In particular, transfer learning was performed twice to obtain the optimum model. The first phase trained the extended layers while the bottom convolutional layers were frozen with weights pre-trained on ImageNet; the second phase loaded the model trained in the first phase and retrained it on the target dataset. The proposed procedure provides a significant increase in efficiency relative to other state-of-the-art methods. It reaches an average accuracy of 99.78% on the public dataset with clear backdrops. Even under multiple classes and heterogeneous background conditions, the average accuracy reaches 99.33% for identifying rice disease types. The experimental findings show the feasibility and effectiveness of the proposed procedure.


INTRODUCTION
Food, clothing, housing, transportation, and education are the basic needs of every human being, and all five are essential for a healthy life. Among them, food security has the highest priority. To secure a reliable food supply, we must ensure that crops grow smoothly, and plant diseases are the main threat to an uninterrupted food supply [1]. Notably, staple crops such as rice, the primary staple for over half the earth's population, are critical for ensuring an adequate food supply and security. Timely prediction and warning play a critical role in the prevention and control of plant diseases: they can reduce both the losses caused by plant diseases and unnecessary pesticide use. Obtaining real-time plant disease information is therefore highly desired in modern agriculture, and machine learning-based image processing techniques alleviate the challenge of identifying plant diseases in time before large-scale outbreaks [2]. Until now, however, plant disease monitoring in many regions, especially in developing countries, has still relied primarily on conventional manual approaches. When a plant disease outbreak occurs, plant disease specialists or technicians from agricultural research institutions appointed by the government visit the site and advise the farmers. Nevertheless, compared with the number of farmers, there are not enough plant disease specialists or technicians in many areas; in particular, owing to social and economic conditions, few people have been engaged in plant protection. There is thus a strong need, and real practical significance, in seeking fast, readily available, and inexpensive tools to identify plant diseases automatically. Computer vision technology is an attractive alternative for automatic crop observation since it is low-cost, visual, and operates in a non-contact manner [3].
Much recent literature has addressed image recognition, employing a distinctive classifier to label plants as healthy or as having particular disease types [4][5][6][7][8][9]. Yao et al. [6] adopted the support vector machine (SVM) for the identification of rice diseases, classifying three types (bacterial leaf blight, rice sheath blight, and rice blast) with an accuracy of 97.2%. Kahar et al. [7] used an artificial neural network (ANN) to identify rice plant diseases such as rice leaf blast, rice leaf blight, and sheath blight, achieving an accuracy of 74.21%. Kai et al. [8] applied an SVM with a radial basis kernel function to identify maize diseases; in their study, 262 images were divided into training and test sets, and they identified three categories with a best accuracy of 89.6%. Zhang et al. [9] adopted a genetic algorithm (GA)-based SVM for the identification of maize diseases, with an average accuracy peaking at 90.25%. Recently, deep-learning-based convolutional neural networks (CNNs), which have greatly improved the accuracy of image identification, have become the leading methods for overcoming the challenges of digital image processing [10]. Mohanty et al. [11] proposed a deep CNN to identify 14 species and 26 plant diseases (or absence thereof); on a held-out test set, their trained model achieved an accuracy of 99.35%. Kawasaki et al. [12] introduced a CNN-based system to identify cucumber leaf diseases, reaching an accuracy of 94.9%. Zhang et al. [13] proposed an enhanced deep CNN to identify maize plant diseases, achieving an average accuracy of 98.8%. Though reasonably good findings have been published, the image databases used for research have had limited diversity.
Most photographic resources include images taken only in experimental (research lab) environments and rarely under practical wild agricultural scenarios [10]. Additionally, most studies identified plant diseases from leaf images rather than from various plant parts. In reality, images are usually taken under a variety of environments, and plant diseases may occur in any part of the plant, whether leaf, stem, grain, and so forth. Despite these limitations, recent research has clearly indicated the potential of deep-learning-based CNN algorithms for identifying plant diseases. In this study, we performed transfer learning on a deep CNN and modified the network structure to increase the learning capability for plant lesion characteristics. The SE block [14] was added behind the MobileNet [15] pre-trained on ImageNet [16], followed by an additional 3 × 3 × 512 convolutional layer for high-dimensional feature extraction. Then, the fully connected layer was replaced by a global average pooling layer, and a new fully connected ReLU layer with 1024 neurons was introduced into the network to enhance the learning ability. After that, a top layer comprising a fully connected Softmax layer with the actual number of categories was used for classification. Finally, considering the multi-classification task and the different loss weights of positive and negative samples, an enhanced Focal-loss function was used in the network instead of the traditional cross-entropy loss function. Thus, the pre-trained MobileNet paired with the SE block was fused to form a new network, namely SE-MobileNet, which was utilised to identify plant diseases.
In particular, transfer learning was performed twice for model training: in the first phase, only the parameters of the newly extended layers were learned from scratch while the weights of the convolutional layers pre-trained on ImageNet were frozen; the second phase loaded the model pre-trained in the first phase and retrained the weight parameters on the target dataset. The optimum model yielded in this way was used to identify crop diseases.
The rest of the study is organised as follows. Section 2 introduces the image dataset collection, gives a general overview, and discusses the methodology along with related studies. Section 3 presents experiments investigating the efficiency of the proposed procedure; multiple experiments are performed and the results are evaluated through comparative analysis. Section 4 concludes the study.

Image dataset
In this study, we have used two datasets, including the public PlantVillage [17] and our collected rice disease image dataset, to perform the experiments.
PlantVillage is an international general-purpose image dataset used for testing machine-learning algorithms for plant disease identification. It has 38 categories composed of 26 diseases and 12 healthy-plant classes across 14 plant species. PlantVillage contains 54,306 plant leaf images, captured under controlled conditions in both grayscale and colour. All images are photographed against simple backdrops with uniform lighting strengths. Note that the number of samples per category is not consistent. In addition, some images are taken of the same plant leaf from different directions.
Approximately 600 rice plant disease images were captured from real-life agricultural fields. These collected natural images were all tagged with labels based on the domain experts' knowledge, with significant help from the Fujian Institute of Subtropical Botany, and the plant images used in this study were all in JPG format. The background conditions of the rice plant images are complicated, and the lighting strengths are inconsistent. For later computation, Photoshop was first used to convert these images uniformly to the Red-Green-Blue (RGB) pattern and then to adjust their scale. The rice disease types mainly comprise rice stackburn, leaf smut, leaf scald, white tip, bacterial leaf streak, and so forth, and the key features of some rice diseases are described in Table 1. Partial sample images are also presented in Figure 1.

Overview
As shown in Figure 2, a brief description of our approach is as follows. First, the sample images of plant diseases are captured and tagged based on the domain experts' knowledge. Image pre-processing techniques such as sharpening, filtering, resizing, and edge-filling are applied to the collected images. Then, the image dataset is enriched using a data-augmentation scheme; both an enhanced generative adversarial network (GAN) [18] and traditional methods, including colour-jittering, rotation, translation, and scale transformation, are implemented to synthesise new images. Lastly, the pre-processed images are input into the SE-MobileNet to train the model, and the generated optimum model is employed to predict the types of plant diseases. In this way, the final identification results are obtained; the specifics of this phase are presented in subsequent sections.

Depthwise separable convolution (DWSC)
DWSC is a form of factorisable convolution that splits a standard convolution into two steps: a depthwise convolution (DWC) and a 1 × 1 convolution, namely a pointwise convolution (PWC) [19]. In the first step, each channel of the input is convolved with a single filter to produce an intermediate feature map, as depicted in Figure 3(a), where D_f is the input length and width and M is both the number of input channels and the number of convolution kernels; the length and width of the output are the same as those of the input. In the second step, the PWC applies a 1 × 1 convolution kernel to the result of the DWC, as shown in Figure 3(b), where M is the number of input channels, D_f is the input length and width, the convolution kernel size is 1 × 1, and N is the number of output channels. Mathematically, these two processes can be defined in Equations (1) and (2), respectively:

DWC(K, F)_(i,j) = Σ_{h=1}^{H} Σ_{w=1}^{W} K_(h,w) ⊙ F_(i+h, j+w)    (1)

PWC(K_d, F)_(i,j) = Σ_{d=1}^{D} K_d · F_(i,j,d)    (2)
where F represents an intermediate feature map tensor, K denotes the filter (convolutional kernel), K_d is a 1 × 1 convolution kernel, and (i, j) indexes the position of the feature map; H, W, and D are the height, width, and depth, respectively, of the input for layer l in the network. DWSC has been explicitly incorporated in MobileNets [15], a family of mobile-first models designed for the resource constraints of portable devices. Setting the size of the input feature map as D_f × D_f, the number of channels as M, the size of the convolutional kernel as D_k × D_k, and the number of output channels as N, the computational costs of the standard convolution (C_1) and the DWSC (C_2) can be expressed in Equations (3) and (4), respectively:

C_1 = D_k × D_k × M × N × D_f × D_f    (3)

C_2 = D_k × D_k × M × D_f × D_f + M × N × D_f × D_f    (4)
As a result, the ratio of the computational cost of the DWSC to that of the standard convolution is

C_2 / C_1 = 1/N + 1/D_k²    (5)

In practical application scenarios, a 3 × 3 convolution kernel is typically used in DWSC networks, and the number of output channels N is large. Therefore, the computational cost of a DWSC is approximately 1/9 that of the standard convolution network.
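As a numerical check of Equations (3)-(5), the sketch below computes both costs for one layer; the layer sizes are arbitrary example values, not taken from the paper.

```python
def standard_conv_cost(d_f, m, d_k, n):
    # Equation (3): C1 = D_k * D_k * M * N * D_f * D_f
    return d_k * d_k * m * n * d_f * d_f

def dwsc_cost(d_f, m, d_k, n):
    # Equation (4): depthwise term + pointwise (1 x 1) term
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f

# Arbitrary example layer: 112x112 input, 64 channels, 3x3 kernel, 256 outputs
d_f, m, d_k, n = 112, 64, 3, 256
c1 = standard_conv_cost(d_f, m, d_k, n)
c2 = dwsc_cost(d_f, m, d_k, n)
ratio = c2 / c1
print(round(ratio, 4))   # 0.115, i.e. 1/N + 1/D_k^2 = 1/256 + 1/9
```

With a 3 × 3 kernel and many output channels, the 1/N term is negligible, so the ratio is close to 1/9, as stated above.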

SE network
To improve the representational quality of a network by explicitly modelling the interdependencies between the channels of convolutional features, the SE network was first introduced by Hu et al. [14] in 2017, and it won the ImageNet challenge competition with outstanding performance. The core component of the SE network is the SE block, which primarily consists of two operations: squeeze and excitation. The squeeze operation shrinks the feature maps U ∈ R^(w×h×c) through the spatial dimensions (w × h) by using global average pooling to generate channel-wise statistics, as the F_sq depicted in Figure 4. Here, x denotes the input images, and U ∈ R^(w×h×c) is the feature maps produced by a given transformation, for example a standard convolution; F_sq is the squeeze operation, F_ex represents the excitation operation, and F_scale denotes a scale operation that weights the features of each channel. Each feature map is then represented by a statistic z as calculated in Equation (6):

z_c = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (6)
where z_c is the cth statistic value, u_c represents the cth feature map of the previous convolution operation, and W and H denote the width and height of u_c, respectively. The excitation operation fully captures the dependencies among channels by making use of the information aggregated in the squeeze, where a simple gating mechanism is adopted with a sigmoid activation:

s = F_ex(z, W) = σ(W_2 δ(W_1 z))    (7)

Here, δ refers to the rectified linear unit (ReLU) [20] function, σ is the sigmoid activation, W_1 ∈ R^(C/r×C) and W_2 ∈ R^(C×C/r), and r is a reduction ratio initialised with a constant. The parameters W_1 and W_2 are learned by two fully connected layers around the non-linearity. The output of the SE block is then obtained by rescaling u_c with the activations s:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (8)

where X̃ = [x̃_1, x̃_2, …, x̃_c] denotes the final output.
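As a minimal NumPy sketch (ours, not the authors' implementation), the squeeze, excitation, and scale operations of Equations (6)-(8) can be written as follows; the tensor sizes and random weights are illustrative only.

```python
import numpy as np

def se_block(u, w1, w2):
    """Squeeze-and-excitation forward pass on feature maps u of shape (H, W, C)."""
    # Squeeze (Equation (6)): global average pooling per channel
    z = u.mean(axis=(0, 1))                              # shape (C,)
    # Excitation (Equation (7)): s = sigmoid(W2 @ relu(W1 @ z))
    relu = lambda a: np.maximum(a, 0.0)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    s = sigmoid(w2 @ relu(w1 @ z))                       # shape (C,)
    # Scale (Equation (8)): rescale each channel u_c by its activation s_c
    return u * s[None, None, :]

rng = np.random.default_rng(0)
c, r = 8, 2                              # C channels, reduction ratio r
u = rng.standard_normal((4, 4, c))       # example feature maps
w1 = rng.standard_normal((c // r, c))    # W1 in R^{C/r x C}
w2 = rng.standard_normal((c, c // r))    # W2 in R^{C x C/r}
out = se_block(u, w1, w2)
print(out.shape)   # (4, 4, 8): same shape, channels recalibrated
```

Each channel is multiplied by a single gate in (0, 1), which is exactly the dynamic channel-wise recalibration described above.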

Transfer learning principle
Transfer learning is a technique by which knowledge gained during training on one type of problem is used for other related tasks or domains [21,22]. Given a source domain D_s with a corresponding source task T_s and a target domain D_t with a learning task T_t, the objective of transfer learning is to improve the performance of the predictive function f_T(•) for the learning task T_t by discovering and transferring latent knowledge from D_s and T_s, where D_s ≠ D_t or T_s ≠ T_t [22]. Transfer learning is particularly important for CNNs. Deep-learning algorithms require large amounts of labelled data to train their models, and collecting such data in a domain is undoubtedly expensive, time-consuming, and labour-intensive. The main challenge is that the training images are often insufficient to fit all the parameters without overfitting, and transfer learning is a natural way to address this type of problem.
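The twice-transfer idea described in the Introduction can be made concrete with a toy sketch (ours, not the paper's network): a linear model stands in for the CNN, a "backbone" weight is initialised near its pre-trained value, and a "head" weight is learned from scratch, trained in the same two phases.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ w_true

# "Backbone": first two weights, initialised close to pre-trained values;
# "head": last two weights, newly added and learned from scratch.
w_backbone = w_true[:2] + 0.1
w_head = np.zeros(2)

def mse():
    pred = x[:, :2] @ w_backbone + x[:, 2:] @ w_head
    return float(np.mean((pred - y) ** 2))

# Phase 1: backbone frozen, only the head is trained
for _ in range(500):
    resid = x[:, :2] @ w_backbone + x[:, 2:] @ w_head - y
    w_head = w_head - 0.1 * 2 * x[:, 2:].T @ resid / len(x)

# Phase 2: all weights fine-tuned together on the target data
for _ in range(500):
    resid = x[:, :2] @ w_backbone + x[:, 2:] @ w_head - y
    w_backbone = w_backbone - 0.1 * 2 * x[:, :2].T @ resid / len(x)
    w_head = w_head - 0.1 * 2 * x[:, 2:].T @ resid / len(x)

print(mse() < 1e-6)   # True: fine-tuning removes the residual pre-training error
```

Phase 1 cannot reach zero error because the slightly-off backbone is frozen; phase 2 fine-tunes everything and drives the loss to essentially zero, mirroring why the second phase is needed.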

SE-MobileNet network model
MobileNet is a variety of lightweight CNN based on depthwise separable convolution and has shown outstanding capability on both large-scale and small-scale problems [19,23]. Besides, the SE block makes good use of the channel relationship and can perform dynamic channel-wise feature recalibration. Inspired by this performance, the MobileNet paired with the SE block was selected for our approach, and the pre-trained MobileNet with the SE block was fused to generate a new network, termed SE-MobileNet, which was utilised to identify the plant disease types. The network structure of the traditional MobileNet was modified to enhance the capability of learning the tiny disease-spot features of plant disease images. The original classification layer at the tail of MobileNet was discarded, and the SE block was added behind the pre-trained MobileNet, followed by an additional 3 × 3 × 512 convolutional layer for high-dimensional feature extraction. Then, the fully connected layer was replaced by a global average pooling layer, and a new fully connected layer with 1024 neurons and a ReLU activation function was introduced to enhance the ability to identify tiny disease-spot features. Ultimately, a top layer comprising a fully connected Softmax layer with the actual number of categories was used for classification. More importantly, transfer learning was performed twice for model training. In the first step, only the parameters of the newly extended layers were learned from scratch while the bottom convolutional layers were kept frozen with the weights pre-trained on ImageNet. The second step fine-tuned all the weights on the target dataset by loading the model pre-trained in the first phase. Concretely, the model was obtained by the processes below.
1. The first step adapted the network. To address the new identification task, the adaptation was accomplished by freezing the weights of the pre-trained layers while the auxiliary layers were opened and trained with the images of the target dataset. In this process, Adam [24], a stochastic optimisation algorithm, was utilised to update the weights as defined in Equation (9):

w_k = w_k − η · b̂_k / (√ŝ_k + ε)    (9)
where η denotes the learning rate, w is the weight matrix, k represents the index of classes, b̂_k and ŝ_k are the bias-corrected first and second moment estimates of the gradient, respectively, and ε is a small constant for numerical stability.
2. The second step retrained (fine-tuned) the model. Using the target dataset, the model was retrained by loading the weights trained in the first step, and all the layers were trained with the new images. A stochastic gradient descent [25] optimiser was employed to update the weights as defined in Equation (10):

w = w − η · ∇_w L(w)    (10)
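The two update rules can be sketched as below (our generic versions: the Adam step uses the standard bias corrections with default β values and a small ε, none of which are specified in the paper).

```python
import numpy as np

def adam_step(w, grad, state, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Equation (9)): w <- w - eta * b_hat / (sqrt(s_hat) + eps)."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    b_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected 1st moment
    s_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected 2nd moment
    return w - eta * b_hat / (np.sqrt(s_hat) + eps)

def sgd_step(w, grad, eta=0.01):
    """One stochastic gradient descent update (Equation (10)): w <- w - eta * grad."""
    return w - eta * grad

# Illustration: minimise L(w) = w^2 (gradient 2w) with Adam
w = 5.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    w = adam_step(w, 2 * w, state, eta=0.05)
print(w)   # approaches the minimiser w = 0
```

The same loop with `sgd_step` would also converge here; Adam's per-parameter scaling mainly helps when gradient magnitudes differ widely across weights.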
where L(⋅) indicates the loss function. In particular, considering the tiny disease-spot symptoms of plant disease images and the multi-classification task, the traditional Focal-loss function [26] was enhanced and employed in the network as calculated in Equation (11), where γ ≥ 0 is the focusing parameter:

L = −Σ_{c=1}^{C} a_c (1 − p_c)^γ log(p_c)    (11)
where p_c represents the predicted probability for class c, C is the number of classes, and a_c denotes the weighting factor. Figure 5 shows the network architecture, and the related parameters are listed in Table 2.
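A per-sample sketch of such a focal loss (our illustrative NumPy version with the usual focusing parameter γ; the paper's enhanced form may differ in detail) is:

```python
import numpy as np

def focal_loss(p, alpha, gamma=2.0):
    """Focal loss over true-class probabilities p:
    L = -sum_c alpha_c * (1 - p_c)^gamma * log(p_c)."""
    p = np.clip(p, 1e-7, 1.0)   # guard against log(0)
    return float(-np.sum(alpha * (1.0 - p) ** gamma * np.log(p)))

# Predicted probability of the true class for three example samples
p_true = np.array([0.9, 0.6, 0.99])
alpha = np.ones_like(p_true)    # uniform class weights for the illustration

# With gamma = 0 the loss reduces to ordinary cross-entropy
ce = float(-np.sum(np.log(p_true)))
print(np.isclose(focal_loss(p_true, alpha, gamma=0.0), ce))   # True
# For gamma > 0, well-classified samples (p near 1) are down-weighted
print(focal_loss(p_true, alpha, gamma=2.0) < ce)              # True
```

The down-weighting of easy examples is what lets training focus on hard samples such as tiny, ambiguous lesion spots.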

EXPERIMENTAL RESULTS AND ANALYSIS
We primarily conducted the experiments using Anaconda3 (Python 3.6), with the TensorFlow, OpenCV-Python 3, and Keras [27] libraries, accelerated by a graphics processing unit (GPU). The hardware used to run the programs consisted of an Intel Xeon E5-2620 CPU, a GeForce RTX 2080 graphics card (CUDA 10.2) [28], and 64 GB of memory.

Experiment on a common dataset
As mentioned in Section 2.1, PlantVillage is a comprehensive global dataset for testing machine-learning algorithms that identify and classify crop diseases. To investigate the efficiency of the proposed procedure, we perform multiple experiments on this common dataset. The 38 categories of plant images, including apple, maize, grape, tomato, and potato, are downloaded from the PlantVillage database. Note that some images of the same plant leaf are taken in diverse orientations and the distributions of the samples are not balanced: some categories have many samples while others have few. The dimensions of all the images are uniformly adjusted to 256 × 256 pixels, and these sample images are taken against simple backdrops as displayed in Figure 6. Based on the method introduced in Section 2.3.4, we conduct the model training and testing on the plant disease dataset. To ensure full training and avoid overfitting as much as possible, the training and validation sets are split in the ratio 8:2, chosen following Too et al. [21]. Besides, one-hot encoding of the categorical variable is performed for model training. In particular, to understand how the proposed procedure performs on unseen samples, a certain proportion of the original images is reserved to verify the validity of the models (20% of the dataset, 10,861 images). We consider five influential CNNs, namely VGGNet (Visual Geometry Group Network) [29], Inception V3 [30], DenseNet [31], NASNetMobile (Neural Architecture Search Network Mobile) [32], and MobileNet V2 [33], for model comparison. Applying the transfer-learning approach, these networks are built and the pre-trained models are loaded from ImageNet [34]. The tail layers of the networks are discarded and replaced by a new fully connected Softmax layer with the actual number of categories.
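The 8:2 split and one-hot encoding mentioned above can be sketched as follows (the seed and array sizes are arbitrary illustration values):

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer class labels for model training."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def train_val_split(n_samples, val_ratio=0.2, seed=42):
    """Shuffle sample indices and split them 8:2 into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_val = int(n_samples * val_ratio)
    return idx[n_val:], idx[:n_val]

labels = np.array([0, 2, 1, 2])
y = one_hot(labels, num_classes=38)            # 38 PlantVillage categories
train_idx, val_idx = train_val_split(100)
print(y.shape, len(train_idx), len(val_idx))   # (4, 38) 80 20
```

In practice one would split per category (stratified) so that the imbalance noted above does not leave rare classes unrepresented in validation.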
Thus, these diverse CNN models are trained and multiple experiments are conducted on the common dataset. The training and validation accuracies of different methods are listed in Table 3.
From Table 3, it can be observed that the proposed procedure shows superior performance compared with the other commonly used methods on the experimental dataset. The key explanation is that the proposed procedure not only utilises the general image features extracted from the ImageNet dataset by MobileNet but also takes advantage of the specific features extracted from the target dataset by the newly extended layers. In particular, the model was trained in two phases, which allowed the optimum weights to be obtained for the network. In contrast, the other models did not achieve optimal results even though their weights were initialised with pre-trained parameters rather than learned from scratch. The pre-trained MobileNet fused with the SE block thus integrated the merits of both and achieved the best performance in the experiments on the common dataset. Moreover, the optimum model obtained by the proposed procedure was selected to identify the types of plant diseases, and the corresponding identification results are depicted in Figure 7. Taking into account the statistics of accurate identifications (true positives (TP)), misidentifications (false negatives (FN)), false positives (FP), and true negatives (TN), we can verify the efficiency of the models with metrics such as Specificity, Sensitivity (Recall), and Accuracy, as displayed in Equations (13)-(15).
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (13)

Sensitivity = TP / (TP + FN)    (14)

Specificity = TN / (FP + TN)    (15)

where TP denotes the number of correctly identified samples; FN represents the number of failed identifications; FP represents the number of false identifications; and TN denotes the number of samples that do not belong to a given disease and are accurately rejected by the classifier. Table 4 presents the analysis of the results assessed in the form of these indicators. From Figure 7(a), we can see that the receiver-operating characteristic (ROC) curve of each class is close to the upper-left corner except for class 10, and the areas under the ROC curves for most classes are even equal to 1, indicating the effectiveness of the proposed procedure. Besides, as seen in Figure 7(b) and Table 4, most of the samples in each class are correctly detected by the SE-MobileNet model, and the average prediction Accuracy reaches 99.78% on this common dataset, which indicates that the proposed procedure has an impressive ability to identify plant diseases in simple backdrop conditions. In particular, in addition to identifying whether a plant is healthy or diseased, the proposed approach also distinguishes the specific disease types, and the average Sensitivity and Specificity reach 98.83% and 99.88%, respectively.
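The per-class computation of Equations (13)-(15) from a confusion matrix can be sketched as follows; the small matrix is a made-up example, not the paper's results.

```python
import numpy as np

def per_class_metrics(cm, c):
    """Accuracy, Sensitivity (Recall), and Specificity for class c from a
    confusion matrix cm (rows: true class, columns: predicted class)."""
    tp = cm[c, c]
    fn = cm[c].sum() - tp          # true class c, predicted as something else
    fp = cm[:, c].sum() - tp       # predicted c, actually another class
    tn = cm.sum() - tp - fn - fp
    accuracy = (tp + tn) / cm.sum()      # Equation (13)
    sensitivity = tp / (tp + fn)         # Equation (14)
    specificity = tn / (fp + tn)         # Equation (15)
    return accuracy, sensitivity, specificity

cm = np.array([[13,  1, 0],
               [ 1, 18, 0],
               [ 0,  0, 4]])
acc, sen, spe = per_class_metrics(cm, 0)
print(round(acc, 4), round(sen, 4), round(spe, 4))   # 0.9459 0.9286 0.9565
```

Averaging these per-class values over all classes yields the macro-averaged figures reported in Table 4.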
Additionally, Table 5 presents the results of experiments conducted on the PlantVillage database in several studies since 2016. The highest accuracy rate and corresponding method for each study are listed in the table. Most of these studies do not use the entire PlantVillage dataset, so their numbers of classes are much smaller than the database's 38. As is well known, the smaller the number of categories, the easier it is to achieve high accuracy. This study considers all species and disease types in the experiments, giving 38 categories. Despite this more difficult classification, the proposed procedure achieves a higher accuracy rate, showing promising performance relative to other state-of-the-art methods.

Experiment on the collected images
Similar to Section 3.1, the proposed procedure was further tested on the collected rice plant images, which were captured from real-life agricultural fields with heterogeneous background conditions and inconsistent lighting intensities. For example, the backgrounds of some images are the surroundings of the field, while in others the backgrounds are grasses or soils of different colours. Weather conditions also varied, with images taken in sunny, cloudy, or overcast weather and so forth. In addition to reserving some images to verify the validity of the models, the original images were split into training and validation sets in the proportion 8:2. In particular, to avoid overfitting and diversify the images, a data-augmentation scheme was applied to ensure that no fewer than 200 sample images were present in every category of the dataset. The detailed procedures of data augmentation are described as follows.
1. Colour-jittering: By adjusting the saturation, intensity, and contrast of the images, different lighting conditions of the sample images were simulated.
2. Random rotation: By rotating the sample images by random angles in the interval (90°, 180°), five new sample images were produced from each original image.
3. Flipping and translation: The original images were flipped along the vertical axis and randomly moved a certain distance in the x or y direction (or both).
4. Cropping: Since the size of the original images is 256 × 256, a 224 × 224 window was used to randomly crop the images. A cropping (shear) transformation in the x-direction was also conducted: letting (1, −tanα; 0, 1) be the transformation matrix, a pixel point P(x_1, y_1), x_1, y_1 ∈ R, is transformed to P′(x_2, y_2) = P′(x_1 − y_1·tanα, y_1).
5. Scale transform: The original images were randomly zoomed in or out with the scale varied from 0.9 to 1.1.
6. GAN generation: The GAN was enhanced, with the input image dimensions assigned as 224 × 224 pixels, to generate new images ensuring the quantity and diversity of the dataset.
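The x-direction shear transformation used in the cropping step can be sketched as below (a small geometric illustration, not the authors' pipeline code).

```python
import numpy as np

def shear_x(points, alpha):
    """Apply the x-direction transformation matrix (1, -tan(alpha); 0, 1):
    P(x1, y1) -> P'(x1 - y1 * tan(alpha), y1)."""
    m = np.array([[1.0, -np.tan(alpha)],
                  [0.0, 1.0]])
    return points @ m.T

p = np.array([[10.0, 4.0]])          # pixel point P(x1, y1)
print(shear_x(p, np.pi / 4))         # tan(45 deg) = 1, so P' = (6, 4)
```

Applied to every pixel coordinate of an image (with interpolation), this slants the content horizontally while leaving row positions unchanged, giving another cheap source of geometric variety.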
Different from the commonly used input size of 64 × 64 pixels, we set the input image size for the GAN to 224 × 224 pixels and employed a styled convolutional block of 128 × 64 × 3 following a 32 × 3 convolutional block as the last layer of the generator module. Likewise, for the discriminator module, the input shape was set to 224 × 224 × 3, and the corresponding 32 × 3 convolutional block followed by a 64 × 128 × 3 convolutional block was used. In this way, applying the data-augmentation scheme, the size of the dataset is enlarged and the diversity of the training samples is improved, which improves the robustness of the model and helps avoid overfitting. Figure 8 displays partial augmented samples.
Based on the approach proposed in Section 2.3, model training and testing were conducted on the rice image dataset, and all the rice plant images were uniformly adjusted to a fixed size to fit the model. To learn how the approach performs on new unknown images, we reserved some images outside the modelling to validate the efficiency. Likewise, the five most-used CNNs, including Inception V3, VGGNet, DenseNet, NASNetMobile, and MobileNet V2, were selected for the comparative analysis. Using the transfer-learning approach, we established these networks and loaded the weights pre-trained on ImageNet. The initial top layers of the networks were replaced by a new fully connected layer with the actual number of categories and a Softmax function. In this way, the model training of these networks was performed and many experiments were implemented for the identification of rice plant diseases. Table 6 shows the results of model training; it reveals that the proposed SE-MobileNet outperforms the other state-of-the-art methods, even though each of those networks was retrained on the experimental dataset with its optimum classifier.
When trained for 10 and 30 epochs, the training accuracies of the proposed approach reach 98.58% and 98.93%, and the validation accuracies reach 90.45% and 91.07%, respectively. Moreover, the proposed approach is memory-efficient, exhibiting a fast running speed and substantial efficiency compared with other state-of-the-art methods. Further, applying the model trained by the introduced procedure, the new unseen images were used to identify rice plant diseases. Figure 9 displays the identified results, and the evaluation indicators are counted in Table 7.
From Figure 9(a), we can observe that the identification results of the proposed procedure show ideal operating points in the top-left corner of the ROC curve. Most identified rice disease types have a high true-positive rate and a low false-positive rate (FPR), which is also demonstrated by the confusion matrix in Figure 9(b). For example, 13 of 14 samples of "rice stackburn" are correctly identified, and 18 samples in the category "rice leaf smut" are accurately identified with only one misidentified sample. For "white leaf streak," all four samples are correctly identified by the proposed approach. The average Accuracy therefore reaches 99.33% and the average Sensitivity is no less than 87.87% for identifying the rice plant diseases, as displayed in Table 7. Conversely, there are also some misidentified samples, such as four failed identifications in the category "rice leaf scald," caused by diseases like "rice leaf scald" and "rice white tip" occurring in the same plant image. Moreover, the cluttered wild backgrounds and irregular lighting strengths may also affect the identification results. Figure 10 displays partial identified sample images.
As seen in Figure 10, the identified categories of most rice plant images are consistent with the true types of these samples, and the proposed procedure accurately identifies most rice plant diseases. By contrast, different types of rice disease occurring in the same plant may lead to individual misclassifications. Also, extremely cluttered field backdrops and irregular lighting strengths, which affect the feature extraction from plant lesion images, can cause some false identifications, as displayed in Figure 10(d). In spite of individual misidentifications, most rice plant diseases have been correctly identified by the proposed procedure. Based on the experimental analysis, it can therefore be concluded that the proposed procedure is successful in identifying rice plant diseases and can also be transferred to other related fields.

CONCLUSIONS
To ensure the production of agricultural products, timely and successful identification of plant diseases is crucial, and therefore the quest for a quick, automated, inexpensive, and reliable system for identifying crop diseases is of great practical importance. Deep-learning methods, especially CNNs, have successfully overcome many of the technological challenges associated with image identification and classification. Hence, a new deep-learning structure called SE-MobileNet for the identification of plant diseases is implemented in this study. The pre-trained MobileNet paired with the SE block was used in our approach. By modifying the network structure with new extension layers and an optimised loss function, the SE-MobileNet was created to enhance the learning capability for the tiny lesion characteristics of plant disease images. Moreover, transfer learning was performed twice for model training: the first phase learned parameters from scratch only for the newly extended layers while the convolutional layers were frozen with the parameters trained on ImageNet; the second phase retrained (fine-tuned) the network on the target dataset by loading the model pre-trained in the first phase. The optimum model yielded in this way was used to identify plant disease types. The experimental findings showed that the model has substantial efficacy on both the open dataset and our captured rice plant images. We plan to deploy it on portable devices to automatically track and identify a wide range of plant diseases in future production. In the meantime, this procedure can be applied to other similar fields such as online defect assessment, molecular cell recognition, position detection from dissimilar images, and more.