CSE 439 – Data Mining Classification, Part 1 Assist. Prof. Dr. Derya BİRANT

Outline What Is Classification? Classification Examples Classification Methods Decision Trees Bayesian Classification K-Nearest Neighbor Neural Network Genetic Algorithms Support Vector Machines (SVM) Fuzzy Set Approaches

What Is Classification? The construction of a model to classify data. When constructing the model, use the training set and its class labels; after the model has been constructed, use it to classify new data.

Classification (A Two-Step Process)
Process (1) Model construction: each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction is the training set. The model is represented as classification rules, trees, or mathematical formulae.
Process (2) Model usage (classifying future or unknown objects): estimate the accuracy rate of the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model. If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
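To make the two steps concrete, here is a minimal Python sketch (not from the slides) using scikit-learn: step 1 builds a classifier from a labeled training set, step 2 estimates its accuracy on a held-out test set and, if the accuracy is acceptable, classifies tuples whose labels are unknown. The data values and the acceptance threshold are invented for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy labeled tuples: each row is (age, income); y holds the class label attribute.
X = [[25, 30], [47, 70], [35, 40], [52, 90], [23, 20], [40, 60], [30, 35], [60, 95]]
y = ["no", "yes", "no", "yes", "no", "yes", "no", "yes"]

# Step 1: model construction from the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: estimate the accuracy rate on the test set ...
accuracy = accuracy_score(y_test, model.predict(X_test))
print("accuracy:", accuracy)

# ... and, if it is acceptable, classify tuples whose class labels are not known.
if accuracy >= 0.7:                      # acceptance threshold chosen arbitrarily
    print(model.predict([[28, 33]]))
```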

Classification (A Two-Step Process) (Diagram: training data is fed to the DM engine to build a mining model; the mining model, together with the data to predict, is fed to the DM engine again to produce the predicted data.)

Classification Example Process (1): Model Construction – training data is fed to a classification algorithm, which produces the classifier (model), e.g. IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’. Process (2): Using the Model in Prediction – the classifier is applied to testing data and then to unseen data, e.g. (Jeff, Professor, 4) → Tenured?

Classification Example Given old data about customers and their payments, predict a new applicant’s loan eligibility. (Diagram: previous customers, described by Age, Salary, Profession, Location and Customer type, are labeled Good Customers or Bad Customers; the classifier learns rules such as “Salary > 5 L” or “Prof. = Exec” → good/bad, which are then applied to the new applicant’s data.)

Classification Techniques Decision Trees Bayesian Classification K-Nearest Neighbor Neural Network Genetic Algorithms Support Vector Machines (SVM) Fuzzy Set Approaches

Classification Techniques Decision Trees Bayesian Classification K-Nearest Neighbor Neural Network Genetic Algorithms Support Vector Machines (SVM) Fuzzy Set Approaches …

Decision Trees A decision tree is a tree in which internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels. Decision trees are used for deciding between several courses of action. (Diagram: an example tree that first tests age? (<=30, 31..40, >40) and then tests student? or credit rating? on the branches, with Yes/No leaves.)

Decision Tree Applications Decision trees are used extensively in data mining. They have been applied to classify medical patients by disease, equipment malfunctions by cause, loan applicants by likelihood of payment, and so on. (Diagram: a small example hiring tree with tests such as Salary < 1 M, Age < 30, and Job = teacher leading to Good/Bad leaves.)

Decision Trees (Different Representation) (Diagram: the same classifier shown in two ways – as a decision tree splitting on Car Type {Minivan, Sports, Truck} and Age (<30, >=30), and as the corresponding rectangular regions of the Age / Car Type attribute space labeled YES and NO.)

Decision Tree Advantages / Disadvantages
Positives (+): reasonable training time; fast application; easy to interpret (can be re-represented as if-then-else rules); easy to implement; can handle a large number of features; does not require any prior knowledge of the data distribution.
Negatives (-): cannot handle complicated relationships between features; simple decision boundaries; problems with lots of missing data; the output attribute must be categorical; limited to one output attribute.

Rules Indicated by Decision Trees Write a rule for each path in the decision tree from the root to a leaf.
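As an illustration of this rule extraction (not part of the original slides), scikit-learn’s export_text prints a fitted tree path by path, and each root-to-leaf path reads directly as an if-then rule. The tiny tenured/years data set below is invented to echo the earlier example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented data echoing the tenured example: columns are (years, is_professor).
X = [[7, 0], [2, 1], [3, 0], [8, 1], [1, 0], [6, 1]]
y = ["yes", "yes", "no", "yes", "no", "yes"]            # tenured?

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Each printed root-to-leaf path corresponds to one if-then rule.
print(export_text(tree, feature_names=["years", "professor"]))
```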

Decision Tree Algorithms
ID3 – Quinlan (1981): tries to reduce the expected number of comparisons.
C4.5 – Quinlan (1993): an extension of ID3; just starting to be used in data mining applications; also used for rule induction.
CART – Breiman, Friedman, Olshen, and Stone (1984): Classification and Regression Trees.
CHAID – Kass (1980): the oldest decision tree algorithm; well established in the database marketing industry.
QUEST – Loh and Shih (1997)

Decision Tree Construction Which attribute is the best classifier? Calculate the information gain G(S, A) for each attribute A and select the attribute with the highest gain, where Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v) and Entropy(S) = −Σ_i p_i log2 p_i.

Decision Tree Construction Which attribute should be tested first? (Table: 14 training examples with attributes Outlook (Hava) {Sunny, Overcast, Rain}, Temperature (Sıcaklık) {Hot, Mild, Cool}, Humidity (Nem) {High, Normal}, Wind (Rüzgar) {Weak, Strong} and the class Play Tennis (Tenis) {Yes, No}.)

Decision Tree Construction For the same training table: Gain(S, Outlook) = 0.246, Gain(S, Temperature) = 0.029, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048. Outlook has the highest information gain, so it is chosen as the root.
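These gains can be reproduced with a short Python sketch. It assumes the standard 14-example play-tennis training set (Quinlan’s example), whose gains match the values on the slide up to rounding; if the course used a slightly different table, the numbers would differ.

```python
from collections import Counter
from math import log2

# The classic 14-example play-tennis training set (assumed here).
data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),        ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),  ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)           # class label is the last column
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr_index):
    g = entropy(rows)
    total = len(rows)
    for value in set(r[attr_index] for r in rows):  # subtract weighted entropy of each subset S_v
        subset = [r for r in rows if r[attr_index] == value]
        g -= len(subset) / total * entropy(subset)
    return g

for i, name in enumerate(attributes):
    print(f"Gain(S, {name}) = {gain(data, i):.3f}")
# Prints approx. 0.247, 0.029, 0.152, 0.048 (the slide truncates Outlook/Humidity to 0.246/0.151).
```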

Decision Tree Construction Which attribute is next? (Partial tree: Outlook at the root; the Overcast branch is already pure and labeled Yes; the Sunny and Rain branches still need to be split.)

Decision Tree Construction (Final tree, with the 14 training examples labeled R1..R14: Outlook at the root; Sunny branch → Humidity (High → No [R1, R2, R8], Normal → Yes [R9, R11]); Overcast → Yes [R3, R7, R12, R13]; Rain branch → Wind (Weak → Yes [R4, R5, R10], Strong → No [R6, R14]).)

Another Example At the weekend you can go shopping, watch a movie, play tennis, or just stay in. What you do depends on three things: the weather (windy, rainy, or sunny); how much money you have (rich or poor); and whether your parents are visiting.

Another Example

Classification Techniques Decision Trees Bayesian Classification K-Nearest Neighbor Neural Network Genetic Algorithms Support Vector Machines (SVM) Fuzzy Set Approaches …

Classification Techniques 2- Bayesian Classification A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities. Foundation: based on Bayes’ theorem. Given data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes’ theorem: P(H|X) = P(X|H) P(H) / P(X).

Classification Techniques 2- Bayesian Classification C1: buys_computer = ‘yes’, C2: buys_computer = ‘no’. Data sample X = (Age <= 30, Income = medium, Student = yes, Credit_rating = fair)

Classification Techniques 2- Bayesian Classification
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(C1): P(buys_computer = “yes”) = 9/14 = 0.643
P(C2): P(buys_computer = “no”) = 5/14 = 0.357
Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
P(X|C1): P(X | buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|C2): P(X | buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) * P(Ci):
P(X | buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X | buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class “buys_computer = yes”.
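A minimal naive Bayes sketch in Python that reproduces these numbers, assuming the standard 14-row buys_computer training table (the counts below yield exactly the conditional probabilities quoted on the slide):

```python
from collections import Counter

# Standard buys_computer table (assumed). Columns: age, income, student, credit_rating, buys.
rows = [
    ("<=30", "high", "no", "fair", "no"),        ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),     (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),        (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),       (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),    (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")            # the sample X from the slide

class_counts = Counter(r[-1] for r in rows)
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(rows)                      # prior P(Ci)
    for i, value in enumerate(x):                # times each conditional P(x_i | Ci)
        n_match = sum(1 for r in rows if r[-1] == c and r[i] == value)
        score *= n_match / n_c
    scores[c] = score

print(scores)                        # {'no': ~0.007, 'yes': ~0.028}
print(max(scores, key=scores.get))   # 'yes' -> X is classified as buys_computer = yes
```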

Classification Techniques Decision Trees Bayesian Classification K-Nearest Neighbor Neural Network Genetic Algorithms Support Vector Machines (SVM) Fuzzy Set Approaches …

K-Nearest Neighbor (k-NN) An object is classified by a majority vote of its neighbors (its k closest members). If k = 1, the object is simply assigned to the class of its nearest neighbor. The Euclidean distance measure is used to calculate how close each neighbor is.
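A minimal k-NN sketch (not from the slides): compute the Euclidean distance to every labeled example, keep the k closest, and take a majority vote. The 2-D points and labels are made up for illustration.

```python
from collections import Counter
from math import dist                 # Euclidean distance (Python 3.8+)

def knn_classify(query, examples, k=3):
    """Classify `query` by a majority vote of its k nearest labeled examples."""
    neighbors = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up 2-D points with class labels, purely for illustration.
examples = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
            ((6, 6), "B"), ((7, 5), "B"), ((6, 7), "B")]

print(knn_classify((2, 2), examples, k=3))  # -> "A"
print(knn_classify((6, 5), examples, k=1))  # k = 1: class of the single nearest neighbor -> "B"
```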

K-Nearest Neighbor (k-NN)

Classification Evaluation (Testing) (Diagram: a labeled data set with categorical and continuous attributes and a class column is split into a training set, from which the classifier model is learned, and a test set, on which the model is evaluated.)

Classification Accuracy The confusion matrix counts True Positives, False Negatives, False Positives, and True Negatives; accuracy = (TP + TN) / (TP + FN + FP + TN). Which classification model is better?
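As an illustration (not from the slides), the sketch below tabulates TP/FN/FP/TN for two hypothetical models on the same test set with scikit-learn. Here both models reach the same accuracy but make different kinds of errors, which is why the confusion matrix, not accuracy alone, is needed to decide which model is better.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true    = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
y_model_1 = ["yes", "yes", "no", "no", "no",  "no", "yes", "yes"]
y_model_2 = ["yes", "no",  "no", "no", "no",  "no", "yes", "no"]

for name, y_pred in [("model 1", y_model_1), ("model 2", y_model_2)]:
    # Rows = actual class, columns = predicted class; with labels=["yes", "no"]
    # the matrix reads [[TP, FN], [FP, TN]].
    print(name, confusion_matrix(y_true, y_pred, labels=["yes", "no"]))
    print(name, "accuracy =", accuracy_score(y_true, y_pred))
```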

Validation Techniques Simple Validation (a single split into a training set and a test set) Cross Validation n-Fold Cross Validation (the data is divided into n parts; each part serves once as the test set while the remaining parts form the training set) Bootstrap Method
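A brief n-fold cross-validation sketch with scikit-learn (illustrative only; the iris data set and n = 5 folds are arbitrary choices, not from the slides):

```python
from sklearn.model_selection import cross_val_score, KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)        # a standard toy dataset, for illustration only

# n-fold cross validation (here n = 5): each fold is used once as the test set
# while the remaining folds form the training set.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores)          # one accuracy estimate per fold
print(scores.mean())   # overall estimate of the accuracy rate
```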