Abstract

In today's world, breast cancer is one of the major health problems faced by women. Identifying cancer at its early (primitive) stage is still challenging, and the diagnosis and treatment of breast cancer have become urgent. Breast cancer is a widely seen tumor among Indian women. Early treatment of breast cancer is extremely important: it not only helps to cure the cancer but also helps to prevent its recurrence. Today, various data mining methods and techniques, together with processes such as knowledge discovery, have been developed for predicting breast cancer. In this study, we compare different classification and clustering algorithms used for breast cancer prediction. The results indicate that the classification algorithms are better predictors than the clustering algorithms.


Introduction

Nowadays, breast cancer is common in women, and its prognosis is as important as its treatment. Breast cancer is one of the leading causes of cancer death among women. If breast cancer is predicted at an early stage, better treatment can be provided, which improves the patient's chance of survival. The diagnosis and treatment of breast cancer have therefore become urgent tasks. Different data mining methods are used to retrieve valuable information from large databases in order to support decisions that provide better health services.

Breast cancer begins with the abnormal growth of some breast cells. These cells divide more rapidly than healthy cells and continue to accumulate, forming a lump or mass. They may spread through the breast to the lymph nodes or to other parts of the body. Breast cancer also varies with age: it is less common at a young age (for example, in women in their thirties), but younger women tend to have more aggressive breast cancers than older women.

In this paper we compare different classification and clustering algorithms for predicting breast cancer. A number of attributes are used in the comparison, and the algorithms are compared across these attributes to find the best classification algorithm.

Literature survey

In paper 1, three data mining classification methods are used to predict breast cancer, considering different parameters for the prediction. To identify the superior predictor, the study focuses on accuracy and computing time. After comparing all algorithms on these two criteria, it concludes that Naïve Bayes is superior to decision tree and k-nearest neighbor, because it takes the least time (0.02 seconds) while also providing the highest accuracy.
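Paper 1 does not say here how its Naïve Bayes model handles numeric attributes. A minimal sketch, assuming Gaussian class-conditional densities (a common choice for continuous features) and using hypothetical toy data rather than any breast cancer dataset, shows the classifier together with one way to measure combined training and prediction time with `time.perf_counter`. The function names `fit_gaussian_nb` and `predict_gaussian_nb` are illustrative, not from the paper.

```python
import math
import time
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the log prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n = len(X)
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # A tiny epsilon keeps the variance positive for constant features.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / n), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, stats) in model.items():
        score = log_prior
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two numeric attributes, labels "N" and "R".
X_train = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y_train = ["N", "N", "N", "R", "R", "R"]
X_test = [[1.1, 2.1], [3.1, 3.9]]
y_test = ["N", "R"]

start = time.perf_counter()
model = fit_gaussian_nb(X_train, y_train)
predictions = [predict_gaussian_nb(model, row) for row in X_test]
elapsed = time.perf_counter() - start

accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy, f"{elapsed:.6f} s")
```

Working in log space avoids underflow when many per-feature densities are multiplied; the absolute timing will of course depend on the machine and data size.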

In paper 2, the WPBC dataset is used to find an efficient algorithm for predicting whether the disease is recurrent or non-recurrent. This helps oncologists distinguish a good prognosis (non-recurrent) from a bad one (recurrent) and treat patients more effectively. Eight popular data mining methods are used: four clustering algorithms (K-means, EM, PAM and fuzzy c-means) and four classification algorithms (SVM, C5.0, KNN and Naïve Bayes). The results of these algorithms are clearly outlined in the paper. The classification algorithms C5.0 and SVM achieved 81% accuracy in classifying the recurrence of the disease, the best among all methods. Among the clustering algorithms, EM was the most promising, with an accuracy of 68%. The study shows that classification algorithms are better predictors than clustering algorithms. The impact of the various parameters responsible for predicting the recurrence or non-recurrence of the disease can be verified clinically. Further, the identified critical parameters should be verified on a larger medical dataset to predict the recurrence of the disease in future.
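Paper 2 reports accuracies for clustering algorithms even though clustering never sees the class labels during training. The summary does not say how clusters were matched to the N/R classes; one common approach, assumed here, is to label each cluster with the majority class of its members and then score it like a classifier. A minimal sketch with plain k-means (Lloyd's algorithm) on hypothetical one-dimensional data; `kmeans_1d` and `cluster_accuracy` are illustrative names.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on 1-D data; returns (assignments, centroids)."""
    ordered = sorted(values)
    # Deterministic start: k points spread evenly across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]

    def assign(cs):
        # Assignment step: index of the nearest centroid for every value.
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iterations):
        assignments = assign(centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(centroids), centroids


def cluster_accuracy(assignments, labels):
    """Map each cluster to its majority true label, then compute accuracy."""
    mapping = {}
    for cluster in set(assignments):
        members = [lab for a, lab in zip(assignments, labels) if a == cluster]
        mapping[cluster] = Counter(members).most_common(1)[0][0]
    correct = sum(mapping[a] == lab for a, lab in zip(assignments, labels))
    return correct / len(labels), mapping


# Hypothetical data: one numeric attribute per patient and its true N/R label.
values = [1.0, 1.1, 0.9, 1.2, 5.0, 5.2, 4.8, 1.05]
labels = ["N", "N", "N", "R", "R", "R", "R", "N"]
assignments, centroids = kmeans_1d(values, k=2)
acc, mapping = cluster_accuracy(assignments, labels)
print(mapping, acc)
```

Because the labels are only used after clustering, a cluster that mixes both classes (here the R case at 1.2) necessarily costs accuracy, which is one intuition for why the clustering methods trail the classifiers.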

In paper 3, the authors build a diagnostic model for breast cancer that searches for the relationship between breast cancer and its symptoms. A feature selection method, INTERACT, is applied to select relevant and important features in order to improve the accuracy of the diagnostic model, and SVM is used to build the classification model. To demonstrate the significance of feature selection, two diagnostic models are built, one with and one without feature selection. In the experiments, the model with feature selection is clearly more accurate than the model without it. Nine features are selected as the relevant factors for building the diagnostic model. The information found in this study can serve as supplementary information to help practitioners diagnose breast cancer better.
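As we understand it, INTERACT first ranks features by their symmetrical uncertainty (SU) with the class and then removes features whose consistency contribution falls below a threshold. The full method is beyond a short example, so the sketch below covers only the ranking step, SU(X, C) = 2 [H(X) + H(C) - H(X, C)] / [H(X) + H(C)], on hypothetical discretised symptom data (labels "M" malignant, "B" benign are assumptions for illustration). It is not INTERACT itself and does not build the SVM.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * (H(X) + H(C) - H(X, C)) / (H(X) + H(C)), a value in [0, 1]."""
    h_x, h_c = entropy(feature), entropy(labels)
    if h_x + h_c == 0:
        return 0.0
    # Joint entropy H(X, C) from (feature value, class) pairs.
    h_xc = entropy(list(zip(feature, labels)))
    return 2 * (h_x + h_c - h_xc) / (h_x + h_c)


# Hypothetical discretised features (one list per feature) and class labels.
features = {
    "lump": ["yes", "yes", "no", "no", "yes", "no"],
    "age_band": ["40s", "50s", "40s", "50s", "40s", "50s"],
}
labels = ["M", "M", "B", "B", "M", "B"]

# Rank features from most to least relevant to the class.
ranking = sorted(features, key=lambda f: symmetrical_uncertainty(features[f], labels), reverse=True)
for name in ranking:
    print(name, round(symmetrical_uncertainty(features[name], labels), 3))
```

SU is 1 when a feature determines the class completely (as `lump` does in this toy data) and close to 0 when it carries little information about it.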

Paper 4 focuses on the importance of feature selection in breast cancer prognosis. With a proper attribute selection technique, any classification algorithm can be improved significantly, because attributes that contribute little to the dataset often mislead the classifier and result in poor prediction. In this work, Support Vector Machine gave much better output both before and after attribute selection. The area under the ROC curve analysis also supported feature selection, with Naïve Bayes and Decision Tree showing much greater improvement after the feature selection method was applied. The paper considers only whether breast cancer is recurrent or not; as an extension of this work, the authors aim to predict the time of recurrence for cancers classified as recurrent.
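The area under the ROC curve used in paper 4 can be read as the probability that a randomly chosen positive (here, recurrent) case receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that pairwise computation on hypothetical scores; `roc_auc` is an illustrative name, and the scores do not come from the paper.

```python
def roc_auc(scores, labels, positive="R"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted recurrence probabilities and true labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = ["R", "N", "R", "N", "N", "N"]
print(roc_auc(scores, labels))
```

The double loop is quadratic in the number of cases, which is fine for small medical datasets; a rank-sum formulation gives the same value more efficiently.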

Paper 5 presents a survey of classification simulations that can be used for breast cancer detection with the WEKA tool. It discusses a variety of existing classification techniques and lists their performance accuracy, which helps decide which algorithm in WEKA is best for breast cancer detection. Comparing the algorithms, it finds that SVM has the highest accuracy and expectation maximization the lowest.
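Expectation maximization, mentioned here and in paper 2 as a clustering method, alternates between computing each point's membership probabilities (E-step) and re-estimating the component parameters from those probabilities (M-step). A minimal sketch for a two-component, one-dimensional Gaussian mixture follows; the initialisation, stopping tolerance and toy data are assumptions, and WEKA's own EM clusterer uses its own defaults, which are not reproduced here.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=200, tol=1e-9):
    """Fit a two-component 1-D Gaussian mixture by expectation maximization."""
    # Assumed initialisation: means at min and max, overall variance, equal weights.
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        log_likelihood = 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            log_likelihood += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            r_k = [r[k] for r in resp]
            n_k = sum(r_k)
            weights[k] = n_k / len(data)
            means[k] = sum(r * x for r, x in zip(r_k, data)) / n_k
            # A small floor stops a component's variance from collapsing to zero.
            variances[k] = sum(r * (x - means[k]) ** 2 for r, x in zip(r_k, data)) / n_k + 1e-6
        if log_likelihood - prev_ll < tol:
            break
        prev_ll = log_likelihood
    # Hard clusters from the last E-step: the component with the larger responsibility.
    clusters = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, clusters


# Hypothetical one-attribute data with two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, clusters = em_two_gaussians(data)
print([round(m, 2) for m in means], clusters)
```

Unlike k-means, EM gives each point a soft membership; the hard cluster labels at the end are only needed when the clusters are scored against true classes.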

Paper 6 also presents a survey of classification simulations that can be used for breast cancer detection with the WEKA tool. It discusses a variety of existing classification techniques, which helps decide which algorithm in WEKA is best for breast cancer detection.

 

Table: Comparison of classification and clustering algorithms (results from paper 2)

N = non-recurrent, R = recurrent. Each confusion matrix is given as two rows, actual N and actual R, listing the number of cases predicted as N and as R.

Classification algorithms

Algorithm       Actual N (pred. N, pred. R)   Actual R (pred. N, pred. R)   Accuracy
C5.0            47, 0                         11, 0                         0.8103
KNN             47, 0                         11, 0                         0.7068
Naïve Bayes     47, 0                         11, 0                         0.5344
SVM             47, 0                         11, 0                         0.8103

Clustering algorithms

Algorithm       Actual N (pred. N, pred. R)   Actual R (pred. N, pred. R)   Accuracy
K-Means         100, 48                       23, 23                        0.6340
EM              117, 31                       31, 15                        0.6804
PAM             64, 84                        29, 17                        0.4175
Fuzzy c-Means   50, 98                        24, 22                        0.3711
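In the comparison table, the row sums of every confusion matrix are the same for all algorithms of a given type (47 and 11 for classification, 148 and 46 for clustering), which indicates that rows are actual classes and columns are predicted classes. A minimal sketch of how such a matrix is tallied from actual and predicted labels; the label lists are hypothetical, constructed only to reproduce the C5.0/SVM matrix (47 N and 11 R cases, every one predicted as N).

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes=("N", "R")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical labels reproducing the C5.0/SVM matrix:
# 47 non-recurrent (N) and 11 recurrent (R) cases, all predicted as N.
actual = ["N"] * 47 + ["R"] * 11
predicted = ["N"] * 58
matrix = confusion_matrix(actual, predicted)
for cls, row in zip(("N", "R"), matrix):
    print(cls, row)
```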

 

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
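Applying the accuracy formula to the confusion matrices of the comparison table reproduces the listed accuracies for C5.0, SVM and all four clustering algorithms. The sketch treats R (recurrent) as the positive class and reads each matrix with rows as actual N/R and columns as predicted N/R; that orientation is inferred from the constant row sums, and the choice of positive class does not change accuracy.

```python
def accuracy(matrix):
    """(TP + TN) / (TP + TN + FP + FN) for a 2x2 matrix [[TN, FP], [FN, TP]].

    Rows are actual classes (N, R) and columns are predicted classes (N, R),
    so R ("recurrent") plays the role of the positive class.
    """
    (tn, fp), (fn, tp) = matrix
    return (tp + tn) / (tp + tn + fp + fn)


# Confusion matrices as printed in the comparison table (paper 2).
table = {
    "C5.0": [[47, 0], [11, 0]],
    "SVM": [[47, 0], [11, 0]],
    "K-Means": [[100, 48], [23, 23]],
    "EM": [[117, 31], [31, 15]],
    "PAM": [[64, 84], [29, 17]],
    "Fuzzy c-Means": [[50, 98], [24, 22]],
}
for name, matrix in table.items():
    print(f"{name}: {accuracy(matrix):.4f}")
```

The C5.0 and SVM matrices show that all 11 recurrent cases were predicted as non-recurrent, so their 0.8103 equals the share of non-recurrent cases in the test set (47/58). The KNN and Naïve Bayes rows list the same matrix, which would also give 0.8103, so the matrices printed for those two rows do not appear to be the ones behind their reported accuracies of 0.7068 and 0.5344.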

 

Conclusion

 

From the above comparison, we conclude that classification algorithms work better than clustering algorithms in predicting breast cancer. Among the classification algorithms, SVM and C5.0 gave the best performance. In this comparison, the best algorithm for predicting breast cancer is chosen purely on the basis of accuracy.

 

References

1 Chintan Shah; Anjali G. Jivani, "Comparison of data mining classification algorithms for breast cancer prediction"

2 Uma Ojha; Savita Goel, "A study on prediction of breast cancer recurrence using data mining techniques", 2017 7th International Conference on Cloud Computing, Data Science & Engineering – Confluence

3 Runjie Shen; Yuanyuan Yang; Fengfeng Shao, "Intelligent Breast Cancer Prediction Model Using Data Mining Techniques"

4 Ahmed Iqbal Pritom; Md. Ahadur Rahman Munshi; Shahed Anzarus Sabab; Shihabuzzaman Shihab, "Predicting breast cancer recurrence using effective classification and feature selection technique"

5 S. Padmapriya; M. Devika; V. Meena; S. B. Dheebikaa; Vinodhini, "Survey on Breast Cancer Detection Using Weka Tool"

6 Jahanvi Joshi; Rinal Doshi; Jigar Patel, Ph.D., "Diagnosis of Breast Cancer using Clustering Data Mining Approach"
