# Classification using Decision Tree Classifier in Scikit-Learn

In [None]:
# Import necessary libraries 
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

### Toy datasets

Scikit-learn ships with several built-in toy datasets, each of which can be loaded with a single API call. They are small enough to load instantly, which makes them convenient for trying out many kinds of machine learning algorithms. The full list is in the [toy datasets documentation](https://scikit-learn.org/stable/datasets/toy_dataset.html#toy-datasets). The following code cell shows how to load one of them, the Iris dataset.

In [None]:
# Load the Iris dataset
iris = load_iris()
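
The other toy datasets follow the same loader pattern. As a minimal sketch, the cell below loads the Wine dataset with `load_wine` and also shows the `return_X_y=True` option, which returns the feature matrix and target array directly instead of a container object. The variable names `wine`, `X_iris`, and `y_iris` are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris, load_wine

# Every toy-dataset loader is called the same way.
wine = load_wine()
print(wine.data.shape)  # (178, 13)

# return_X_y=True returns the two arrays directly as a tuple.
X_iris, y_iris = load_iris(return_X_y=True)
print(X_iris.shape, y_iris.shape)  # (150, 4) (150,)
```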

In [None]:
# Print the Iris dataset
print(iris)

In [None]:
# Print shape of the Iris data
iris.data.shape

What kind of object is `iris`? You can find out with Python's built-in `type()` function.

In [None]:
type(iris)

The output shows that `iris` is an object of type `Bunch`. Among others, it has the following attributes:

- `data`: the data matrix
- `target`: the classification target
- `feature_names`: the names of the dataset columns
- `target_names`: the names of the target classes

To access the NumPy arrays that contain the data matrix and the target values, use either dictionary-style or attribute-style access:

1. `X = iris["data"]` or `X = iris.data`
2. `y = iris["target"]` or `y = iris.target`

The same syntax works for `feature_names` or `target_names`.
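
A short sketch of why both forms work: `Bunch` is a subclass of `dict` that also exposes its keys as attributes, so the two access styles return the very same object. The checks below are illustrative and reload the dataset so the cell runs on its own.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch subclasses dict, so key access and attribute access are interchangeable.
print(isinstance(iris, dict))         # True
print(iris["data"] is iris.data)      # True
print(iris["target"] is iris.target)  # True

# The same two styles work for the name fields.
print(iris["feature_names"])
print(iris.target_names)
```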

You can read more about this data type in the [`Bunch` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html). Let's see this in action.

In [None]:
# Create feature matrix X and target vector y
X = iris.data
y = iris.target

In [None]:
# Print X
X

In [None]:
# Print y
y

In [None]:
# import the train_test_split utility from scikit-learn
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
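
To see what `stratify=y` does, one option is to count the class labels in each split. The sketch below repeats the split so that it runs on its own and uses `np.bincount` for the counts. Iris has 50 samples per class, so a stratified 70/30 split should keep the classes balanced in both parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# np.bincount counts how many samples carry each integer class label.
print(np.bincount(y_train))  # [35 35 35]
print(np.bincount(y_test))   # [15 15 15]
```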

In [None]:
# Import the decision tree classifier model from scikit-learn
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text

# Create the decision tree classifier (Gini impurity criterion, depth limited to 3)
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)


In [None]:
# Train the model: Fit the decision tree classifier to the feature matrix and target array
clf.fit(X_train, y_train)

In [None]:
# Evaluate the classifier
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Training Accuracy: {train_accuracy:.2f}")
print(f"Testing Accuracy: {test_accuracy:.2f}")
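
For a classifier, `score` reports mean accuracy, that is, the fraction of samples whose predicted label matches the true label. The sketch below recomputes that fraction by hand and compares it with `clf.score`. It rebuilds the split and the model so that it is self-contained, and `manual_accuracy` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Fraction of test samples where the prediction equals the true label.
manual_accuracy = np.mean(clf.predict(X_test) == y_test)
print(np.isclose(manual_accuracy, clf.score(X_test, y_test)))  # True
```

A training accuracy much higher than the test accuracy would suggest overfitting. Limiting `max_depth` is one common way to reduce that risk.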


In [None]:
# Visualize the decision tree
plt.figure(figsize=(15, 10))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True, rounded=True)
plt.title("Decision Tree Visualization")
plt.show()

In [None]:
# Print the textual representation of the decision tree
print("\nDecision Tree Rules:\n")
tree_rules = export_text(clf, feature_names=iris.feature_names)
print(tree_rules)
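
Besides the printed rules, a fitted tree can be summarized numerically. As a sketch, the cell below prints the tree's depth, its number of leaves, and the `feature_importances_` attribute, which indicates how much each feature contributed to the splits. It refits the same model so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Size of the fitted tree; the depth cannot exceed max_depth.
print("Depth:", clf.get_depth())
print("Leaves:", clf.get_n_leaves())

# Normalized importance of each feature; the values sum to 1.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```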

In [None]:
# Import the confusion matrix utility from scikit-learn 
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Compute and display the confusion matrix
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)
print("\nConfusion Matrix:")
print(cm)
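
In scikit-learn's confusion matrix, rows correspond to true classes and columns to predicted classes, so the diagonal holds the correctly classified samples. One way to read it is per-class recall: each diagonal entry divided by its row sum. The sketch below computes this and checks it against `recall_score`. It rebuilds the model so that it is self-contained, and `per_class_recall` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)

# Recall for a class = correctly predicted samples / all true samples of that class.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
for name, recall in zip(iris.target_names, per_class_recall):
    print(f"{name}: {recall:.2f}")

# Cross-check against scikit-learn's own per-class recall.
print(np.allclose(per_class_recall, recall_score(y_test, y_pred, average=None)))  # True
```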

In [None]:
# Visualize the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
disp.plot(cmap=plt.cm.Blues)
plt.title("Confusion Matrix")
plt.show()
