Machine Learning based Lung Cancer Diagnostic System using Optimized Feature Subset Selection

Ramya Perumal
Yogesh Kumaran S
I. Manimozhi
A.C. Kaladevi
C. Rohith Bhat

Abstract

The lungs are vital organs that play a major role in respiration, and no one can survive without breathing. They absorb oxygen and pass it into the blood, which the heart pumps through the blood vessels of the circulatory system to carry oxygen and other nutrients to every part of the body. Lung health is therefore essential. Of the various diseases that affect the lungs, lung cancer is a deadly one that occurs in countries all over the world. Early detection of lung cancer has been shown to improve the survival rate. Several resources are available to detect the disease well in advance, including low-dose CT scans, X-rays, blood-based screening, pathology slide reading, biopsy tests, and survey data (clinical datasets). Our proposed work uses two clinical datasets whose features indicate how likely a person is to be affected by the disease. Dataset 1 includes features such as age, gender, smoking, yellow fingers, anxiety, peer pressure, chronic disease, fatigue, allergy, wheezing, alcohol, coughing, shortness of breath, swallowing difficulty, and chest pain. Dataset 2 represents causes of lung cancer due to pesticide exposure. Our proposed diagnostic system considers all of these features, performs feature selection with the cuckoo search algorithm to extract an optimal feature subset, and then performs classification using three machine learning algorithms: Linear Support Vector Machine (Linear SVC), Logistic Regression (LR), and Random Forest. With the cuckoo search algorithm, on dataset 1 the LR classifier achieves an accuracy of 100%, a precision of 100%, a recall of 100%, and an F1-score of 100%.
On the same dataset, the Linear SVC classifier achieves an accuracy of 90%, a precision of 88%, a recall of 86%, and an F1-score of 87%, and the Random Forest classifier achieves an accuracy of 91%, a precision of 86%, a recall of 93%, and an F1-score of 90%. On dataset 2, the LR and Linear SVC classifiers both perform best, each with an accuracy of 100%, a precision of 100%, a recall of 100%, and an F1-score of 100%, whereas Random Forest achieves an accuracy of 97%, a precision of 97%, a recall of 96%, and an F1-score of 97%.
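The abstract does not specify how the cuckoo search is configured, so the following is only a minimal sketch of one common binary variant, not the authors' implementation. It keeps a population of continuous "nests", moves them with Lévy flights (Mantegna's algorithm), and thresholds each position into a 0/1 feature mask. All names and parameter values (`cuckoo_search`, `n_nests`, `pa`, `alpha`, `toy_fitness`, `INFORMATIVE`) are assumptions made for illustration. In a setting like the one described above, `fitness` would presumably return a classifier's validation accuracy on the selected columns; here a toy fitness function stands in so the example runs without any dataset.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma_u)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def to_mask(position):
    """Binarize a nest: feature j is kept when x_j > 0
    (equivalent to thresholding a sigmoid of x_j at 0.5)."""
    return tuple(1 if x > 0 else 0 for x in position)


def cuckoo_search(fitness, n_features, n_nests=15, n_iter=100, pa=0.25,
                  alpha=1.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks; returns (mask, score)."""
    rng = random.Random(seed)

    def new_nest():
        return [rng.uniform(-1, 1) for _ in range(n_features)]

    nests = [new_nest() for _ in range(n_nests)]
    scores = [fitness(to_mask(n)) for n in nests]
    # Never abandon every nest, so the best one always survives.
    n_abandon = min(int(pa * n_nests), n_nests - 1)

    for _ in range(n_iter):
        for i in range(n_nests):
            # A cuckoo leaves nest i on a Levy flight and lays an egg ...
            egg = [x + alpha * levy_step(rng) for x in nests[i]]
            egg_score = fitness(to_mask(egg))
            # ... in a randomly chosen nest j, replacing its egg if better.
            j = rng.randrange(n_nests)
            if egg_score > scores[j]:
                nests[j], scores[j] = egg, egg_score
        # Abandon the worst fraction pa of nests and rebuild them at random.
        worst = sorted(range(n_nests), key=scores.__getitem__)[:n_abandon]
        for k in worst:
            nests[k] = new_nest()
            scores[k] = fitness(to_mask(nests[k]))

    best = max(range(n_nests), key=scores.__getitem__)
    return to_mask(nests[best]), scores[best]


# Toy stand-in for classifier accuracy (hypothetical): reward selecting the
# "informative" feature indices and penalize every other selected feature.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[j] for j in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


mask, score = cuckoo_search(toy_fitness, n_features=8)
print(mask, score)
```

To use this with real data, `toy_fitness` could be replaced by a function that trains a classifier on the columns where the mask is 1 and returns its held-out accuracy. That function would also need to handle the all-zero mask, for example by returning the lowest possible score.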

Article Details

Section
Special Issue - Scalable Dew Computing for future generation IoT systems