Monday, September 21, 2009

A15 Probabilistic Classification

In A14, we used Minimum Distance Classification. In this activity, we implement another classification scheme: Linear Discriminant Analysis (LDA).

We have already done feature extraction in A14, so we only need to worry about LDA implementation.

In general, the criterion for probabilistic classification is to minimize the total error of classification. That is, it follows Bayes rule and assigns an object to the class with the highest conditional probability. So if, for example,

P(i|x) > P(j|x) for all classes j ≠ i,

then the object x is classified to class i, where P(i|x) is the probability that x belongs to class i given its measured feature vector x.
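As a toy illustration (the probabilities here are made up, not taken from the fruit data), the rule is simply an argmax over the conditional probabilities:

// hypothetical posterior probabilities P(i|x) for three candidate classes
posteriors = [0.2 0.7 0.1];
// max returns the largest value and the index where it occurs;
// that index is the assigned class (class 2 in this example)
[pmax, assigned_class] = max(posteriors);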

Most of the time, however, it is easier to find the probability that a class produces a particular feature vector, P(x|i), than the probability that a feature vector belongs to a class, P(i|x). These two quantities are related by Bayes' theorem:

P(i|x) = P(x|i) P(i) / P(x),

where P(i) is the prior probability, i.e., what is known about group i before making any measurement. In practice the priors are either assumed equal for all groups or set from the number of samples in each group.
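A quick numerical sketch (with made-up likelihoods, just to show the bookkeeping): with equal priors, the posterior of each class is its likelihood divided by the total evidence.

// hypothetical class-conditional probabilities P(x|i) for two classes
likelihood = [0.05 0.20];
prior = [0.5 0.5];                            // equal priors
evidence = sum(likelihood.*prior);            // P(x), the normalizing factor
posterior = likelihood.*prior / evidence;     // P(i|x) = [0.2 0.8] in this example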

In practice, however, estimating P(x|i) directly requires a lot of data. Instead, we assume a distribution and get the probability that an object belongs to a certain class from there. Assuming that each class follows a multivariate normal distribution and that all classes share the same covariance matrix C gives the Linear Discriminant Analysis formula

f_i = μ_i C⁻¹ xᵀ − ½ μ_i C⁻¹ μ_iᵀ + ln P(i),

where μ_i is the mean feature (row) vector of group i and x is the feature vector of the object being classified. The object is assigned to the group with the maximum f_i.

LDA assumes that the classes are linearly separable. (This is fine, since the plots from A14 show that the features are indeed linearly separable.) The classes can then be separated by a linear combination of the features that describe the objects: with only 2 features the separator is a line, with three features it is a plane, and with more than three features it is a hyperplane.
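To make the separator concrete: subtracting the two discriminant functions gives a single linear score, and the boundary is the set of points where that score is zero. A minimal sketch (it reuses mean1, mean2, invC, and p as they are computed in the code below):

// explicit separating hyperplane from f1 - f2 = x*w + b
w = invC*(mean1 - mean2)';                                           // normal vector of the hyperplane
b = -0.5*(mean1*invC*mean1' - mean2*invC*mean2') + log(p(1)/p(2));   // offset
// a feature (row) vector x is assigned to class 1 when x*w + b > 0, class 2 otherwise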

The actual implementation is as follows. We used the features extracted from A14.


n1=3;n2=3; //number of training samples for fruit1 and fruit2

////////the training set features in matrix form with objects as rows and features as columns
x1=[perim1(1:3,:) mr1(1:3,:) mb1(1:3,:)];
x2=[perim2(1:3,:) mr2(1:3,:) mb2(1:3,:)];

x_all=[x1;x2];

/////// known class labels of the training set (3 fruit1 objects, then 3 fruit2 objects)
y=[1;1;1;2;2;2];


////// mean features
mean1=mean(x1,1);

mean2=mean(x2,1);

/// global mean vector
mean_all=mean([x1;x2],1);


/////mean corrected data (data - global mean vector)
mean_corr1=x1-ones(n1,1)*mean_all;
mean_corr2=x2-ones(n2,1)*mean_all;


///covariance matrix of each group
c1=mean_corr1'*mean_corr1/n1;
c2=mean_corr2'*mean_corr2/n2;


//// Pooled covariance matrix (size-weighted average of the group covariance matrices)
C=(n1*c1 + n2*c2)/(n1+n2);

invC=inv(C);

////prior probability
p=[n1;n2]./(n1+n2);


//// Discriminant function

f1=mean1*invC*x_all' - 0.5*mean1*invC*mean1' + log(p(1));

f2=mean2*invC*x_all' - 0.5*mean2*invC*mean2' + log(p(2));

classify=1*((f1-f2)<0)+1;   // class 1 when f1 >= f2, class 2 when f1 < f2
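As a quick check (a small addition on top of the code above), the predicted training labels can be compared directly with the known labels y:

// number of misclassified training objects (should be 0 here)
training_errors = sum(classify' <> y);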


////////////////////////// Testing

/////// the testing set
//ordered
//x_predict=[perim1(4:6,:) mr1(4:6,:) mb1(4:6,:); perim2(4:6,:) mr2(4:6,:) mb2(4:6,:)];

//alternating
x_predict=[perim1(4,:) mr1(4,:) mb1(4,:);
           perim2(4,:) mr2(4,:) mb2(4,:);
           perim1(5,:) mr1(5,:) mb1(5,:);
           perim2(5,:) mr2(5,:) mb2(5,:);
           perim1(6,:) mr1(6,:) mb1(6,:);
           perim2(6,:) mr2(6,:) mb2(6,:)];

f1_predict=mean1*invC*x_predict' - 0.5*mean1*invC*mean1' + log(p(1));
f2_predict=mean2*invC*x_predict' - 0.5*mean2*invC*mean2' + log(p(2));
classify_predict=1*((f1_predict-f2_predict)<0)+1;   // class 1 when f1_predict >= f2_predict, class 2 otherwise
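To tally the test results programmatically (a small addition; it assumes the alternating row order used above, so the true labels alternate 1,2,1,2,1,2):

// true labels of the alternating test set and the resulting percent accuracy
y_test = [1;2;1;2;1;2];
accuracy = 100*sum(classify_predict' == y_test)/length(y_test);
disp(accuracy)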

The results are summarized below.

TRAINING



All the training objects are classified correctly: there are zero errors in training.
We can now apply the computed separator to the test objects that were not used in training.

TESTING

All test objects are likewise classified correctly, giving 100% classification accuracy. I give myself a grade of 10. =)

Reference:
http://people.revoledu.com/kardi/tutorial/LDA/LDA.html#LDA
