Iris Classifier
Uses a k-nearest neighbors (kNN) model with scikit-learn, Python, and the classic Iris dataset to predict a flower's species from its measurements.
About:
This case study is part of phase 1 of my 100 Days of Machine Learning Code challenge.
This is a homework solution to a section of the Machine Learning Classification Bootcamp in Python.
Problem Statement:
Predict the species of an iris flower given four feature measurements:
- Sepal Length (cm)
- Sepal Width (cm)
- Petal Length (cm)
- Petal Width (cm)
Technology used:
Model(s):
- k-nearest neighbors (KNN) classifier
Dataset(s):
- The famous Iris dataset
Libraries:
- pandas
- seaborn
- matplotlib
- scikit-learn
- shap
Resources:
Contact:
If for any reason you would like to contact me, please do so at the following:
KNN Iris Classifier
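KNN is used as the classifier: to label a new flower, it finds the k most similar (nearest) samples in the training data and predicts the species that is most common among them.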
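To make the "compare to the most similar data points" idea concrete, here is a minimal from-scratch sketch of kNN prediction using Euclidean distance and a majority vote. It is not scikit-learn's implementation (which uses optimized search structures); the helper names `euclidean` and `knn_predict` and the toy petal measurements are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # sort training indices by distance to the query and keep the k closest
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    # count the neighbours' labels and return the most common one
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# hypothetical toy data: (petal length, petal width) in cm
train_X = [(1.4, 0.2), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (6.0, 2.5), (5.9, 2.1)]
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

print(knn_predict(train_X, train_y, (1.5, 0.3)))  # setosa
print(knn_predict(train_X, train_y, (4.6, 1.4)))  # versicolor
```

With k=3, the first query's neighbours are two setosa points and one versicolor point, so the vote goes to setosa.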
Import Libraries
In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Import & Explore Data
In [2]:
iris = pd.read_csv('../datasets/iris/iris.csv')
In [3]:
iris.head()
In [4]:
iris.tail()
In [29]:
sns.pairplot(iris, hue='Species',
             vars=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'])
In [30]:
sns.scatterplot(x='SepalLengthCm', y='PetalLengthCm', hue='Species', data=iris)
In [7]:
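# plot correlations between the numeric columns
# (numeric_only=True skips the text 'Species' column; pandas >= 2.0 raises an error without it)
plt.figure(figsize=(30, 20))
sns.heatmap(iris.corr(numeric_only=True), annot=True)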
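The numbers in the heatmap are Pearson correlation coefficients (the default `method='pearson'` of `DataFrame.corr`). As a sketch of what each cell means, the coefficient is the covariance of two columns divided by the product of their standard deviations, so it always lies between -1 and 1. The helper `pearson` and the sample values below are hypothetical, not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # the 1/n factors cancel, so plain sums of deviations are enough
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# hypothetical values, chosen only to illustrate a strong positive correlation
petal_length = [1.4, 1.5, 4.5, 4.7, 5.9, 6.0]
petal_width = [0.2, 0.3, 1.5, 1.4, 2.1, 2.5]
print(round(pearson(petal_length, petal_width), 3))
```

Values close to 1 indicate that two measurements rise together, which is why strongly correlated petal features separate the species well in the pair plot.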
Data Cleaning & Prep
In [8]:
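# features: every column except the label (and the 'Id' row counter, if the CSV has one,
# since it follows the species order and would leak the label into the distance)
X = iris.drop(columns=['Species', 'Id'], errors='ignore')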
In [9]:
X.shape
In [10]:
X.head()
In [11]:
y = iris['Species']
In [12]:
y.shape
In [13]:
y
In [14]:
# encode the three species names as integers (0, 1, 2)
from sklearn.preprocessing import LabelEncoder
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
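`LabelEncoder` assigns integer codes in sorted order of the distinct label strings; the fitted encoder keeps them in `labelencoder_y.classes_`, and `labelencoder_y.inverse_transform(y_pred)` maps predictions back to names. A minimal pure-Python sketch of the same mapping (the function `label_encode` and the example label spellings are hypothetical):

```python
def label_encode(labels):
    """Map each distinct label to an integer, in sorted order of the label strings."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[label] for label in labels]


classes, codes = label_encode(["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"])
print(classes)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(codes)    # [2, 0, 1, 0]
```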
In [15]:
y
In [31]:
# create a stratified train/test split (80% train, 20% test)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.20,
                                                    random_state=5,
                                                    stratify=y)
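Passing `stratify=y` keeps each species in the test set in the same proportion as in the full data. The sketch below shows the idea in plain Python: shuffle each class separately and take 20% of it for testing. It is a simplified stand-in, not scikit-learn's algorithm, so `seed=5` will not reproduce the rows chosen by `random_state=5`; the function `stratified_split` is hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=5):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    # group sample indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 50 + [1] * 50 + [2] * 50  # 150 samples, 50 per class, like Iris
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))            # 120 30
print(Counter(labels[i] for i in test_idx))     # 10 samples of each class
```

With 50 flowers per species and a 20% test size, each species contributes exactly 10 test samples.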
In [17]:
X_train.shape
In [18]:
X_test.shape
Train & Test Model
In [19]:
from sklearn.neighbors import KNeighborsClassifier
In [32]:
# k = 3 neighbors; Minkowski distance with p=2 is the ordinary Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
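The `metric='minkowski'` and `p` arguments select the distance used to find neighbors. The Minkowski distance is (sum of |a_i - b_i|^p)^(1/p): p=1 gives the Manhattan distance, p=2 the Euclidean distance, and large p approaches the largest single coordinate difference. A short sketch (the function name `minkowski` is hypothetical):

```python
def minkowski(a, b, p=2):
    """Minkowski distance: (sum |a_i - b_i|^p) ** (1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski(a, b, p=2))  # 5.0 (Euclidean)
```

Because the distance sums raw differences, features on larger scales dominate it; all four Iris features are in centimetres, so no rescaling is applied here.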
In [21]:
y_pred = classifier.predict(X_test)
In [22]:
print(y_pred)
print(y_test)
In [23]:
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred)
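In scikit-learn's `confusion_matrix`, entry [i][j] counts samples whose true class is i and whose predicted class is j, so correct predictions sit on the diagonal. A plain-Python sketch of that counting, using hypothetical encoded labels and a hypothetical helper name:

```python
def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# hypothetical encoded labels: one class-1 sample is misclassified as class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
for row in confusion_matrix_counts(y_true, y_pred, 3):
    print(row)
# [2, 0, 0]
# [0, 1, 1]
# [0, 0, 2]
```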
In [24]:
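# rows = true species, columns = predicted species
sns.heatmap(cm, annot=True, xticklabels=labelencoder_y.classes_, yticklabels=labelencoder_y.classes_)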
In [25]:
print(classification_report(y_test, y_pred))
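`classification_report` prints precision, recall, F1 and support for each class, treating that class as "positive" and the rest as "negative". The sketch below computes those per-class numbers by hand on hypothetical labels; it returns 0.0 when a ratio is undefined, which matches the value scikit-learn reports by default (scikit-learn also emits a warning in that case). The helper name is hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed samples of label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, tp + fn  # support = number of true samples


# hypothetical encoded labels
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
for label in range(3):
    p, r, f1, support = per_class_metrics(y_true, y_pred, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
# class 0: precision=1.00 recall=1.00 f1=1.00 support=2
# class 1: precision=1.00 recall=0.50 f1=0.67 support=2
# class 2: precision=0.67 recall=1.00 f1=0.80 support=2
```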
In [26]:
import shap
# print the JS visualization code to the notebook
shap.initjs()
In [27]:
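# explain all the predictions in the test set, using X_train as background data
# (KernelExplainer can be slow with large backgrounds; classifier.predict returns
# the encoded class label, so these SHAP values explain that label)
explainer = shap.KernelExplainer(classifier.predict, X_train)
shap_values = explainer.shap_values(X_test)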
In [28]:
shap.summary_plot(shap_values, X_test)
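`KernelExplainer` estimates Shapley values: each feature's average contribution to the model output over all orders in which features could be "switched on" relative to background data. `shap` approximates this with a weighted regression over sampled coalitions and averages over the background set; the sketch below instead computes exact Shapley values by enumerating every coalition, under the simplifying assumption of a single baseline point. That is feasible here because Iris has only four features. The model `toy_model` and the baseline values are hypothetical, not the fitted kNN or the real feature means.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical linear score over the four measurements."""
    return 2 * z[0] + 3 * z[1] - z[2] + 0.5 * z[3]


x = [5.1, 3.5, 1.4, 0.2]         # one flower's measurements (hypothetical)
baseline = [5.8, 3.0, 3.8, 1.2]  # hypothetical reference point
print(shapley_values(toy_model, x, baseline))
```

For a linear model each value reduces to weight × (x_i − baseline_i), and for any model the values sum to f(x) − f(baseline), which is the property the summary plot relies on when it stacks contributions.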