"So after googling for the past 2 hours you have somehow just fitted (or trained in layman terms) your model successfully and now you would like to know the accuracy score of your fit. Here are a couple of ways in which you can do that in python"
(How I fit my models every day. Source: Twitter/thesmartjokes)
Introduction
Before jumping right into the implementation, let's first see how to calculate accuracy. As per the definition given in the Udacity course Intro to Machine Learning: "Accuracy is defined as the number of test points that are classified correctly divided by the total number of test points."
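For instance, if a classifier gets 4 of 5 test points right, its accuracy is 4/5 = 0.8. The definition translates directly into code (a minimal sketch with hypothetical toy labels):

```python
# Accuracy per the definition: correctly classified test points / total test points
y_true = [1, 0, 1, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 0]  # hypothetical predicted labels

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 4 of 5 correct -> 0.8
```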
Now that we understand how to calculate accuracy we can create a toy dataset and work on it.
Implementation in Python
import random
random.seed(18)

# total number of data points
n_points = 100

# data points (features)
X = [(random.random(), random.random()) for ii in range(0, n_points)]

# data labels
y = [round(random.random()) for ii in range(0, n_points)]

# split into train/test sets
split = int(0.75 * n_points)  # Train:Test :: 0.75:0.25
X_train = X[0:split]  # features_train
X_test = X[split:]    # features_test
y_train = y[0:split]  # labels_train
y_test = y[split:]    # labels_test

# import Gaussian Naive Bayes (GaussianNB)
from sklearn.naive_bayes import GaussianNB

# define the classifier
clf = GaussianNB()

# fit the classifier on the training features and their labels
clf.fit(X_train, y_train)

# predict labels for the test set
pred = clf.predict(X_test)
1. Generate 100 data points, which are our features (values between 0 and 1): X
2. Generate their labels with respect to the features, also 100 in number (value 0 or 1): y
3. Split the data into 75% training and 25% testing: split
4. 75% of the features (X) for training: X_train
5. 25% of the features (X) for testing: X_test
6. 75% of the labels (y) for training: y_train
7. 25% of the labels (y) for testing: y_test
8. Use Gaussian Naive Bayes as our classifier: clf
9. Predict the labels for the features of the test set: pred
Note: In machine learning, the convention is a lowercase y for the label vector, in contrast with the capital X for the feature matrix.
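As a side note, the manual slicing in steps 3–7 can also be done with scikit-learn's own helper, sklearn.model_selection.train_test_split. A sketch (shuffle=False reproduces the plain ordered 75/25 slice used above; by default the helper shuffles before splitting):

```python
import random
from sklearn.model_selection import train_test_split

random.seed(18)
X = [(random.random(), random.random()) for _ in range(100)]
y = [round(random.random()) for _ in range(100)]

# shuffle=False keeps the original order, matching the manual slice split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)

print(len(X_train), len(X_test))  # 75 25
```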
len(y_test) == len(X_test)
>>> True
1. Without using any external library
In this approach, we first count the number of correctly predicted labels and then divide that count by the total number of labels (i.e., test data points).
count = len(['matched' for idx, label in enumerate(y_test) if label == pred[idx]])
print(float(count) / len(y_test))
>>> 0.68
2. Using numpy
We will use numpy's sum() method to count the correctly predicted labels instead of iterating over them ourselves.
import numpy as np
print(float(np.sum(pred == y_test) / len(y_test)))
>>> 0.68
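Since `pred == y_test` is a boolean array, its mean is exactly the fraction of True entries, so np.mean gives the accuracy in one call. A sketch with hypothetical toy arrays:

```python
import numpy as np

pred = np.array([1, 0, 1, 1, 0, 1])    # hypothetical predicted labels
y_test = np.array([1, 0, 0, 1, 0, 0])  # hypothetical true labels

# mean of a boolean array == fraction of matches == accuracy
accuracy = np.mean(pred == y_test)
print(accuracy)  # 4 of 6 correct -> ~0.667
```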
3. Using sklearn.metrics
We will use `sklearn.metrics.accuracy_score`, which is recommended over the previous approaches. ('Cause admit it, you are lazy.) Note that its signature is accuracy_score(y_true, y_pred), so the true labels go first.
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, pred))
>>> 0.68
4. Using the score method of the classifier
Most classifiers provide this method. It takes the test features and their labels as parameters.
print(clf.score(X_test, y_test))
>>> 0.68
Conclusion
In this article, we learned what accuracy is and how to calculate it, and we implemented it in Python in several ways.
References:
- https://twitter.com/thesmartjokes/status/843730749404545024
- https://in.udacity.com/course/intro-to-machine-learning--ud120-india
- http://scikit-learn.org/stable/modules/classes.html#module-sklearn.naive_bayes
- http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
- https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
- https://mahata.github.io/machine%20learning/2014/12/31/sklearn-accuracy_score/