{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Classificaion using Decision Tree Classifier in Scikit-Learn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries \n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import load_iris"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Toy datasets\n",
    "\n",
    "Scikit learn provides some built in toy datasets. There is an easy API call to load these datasets. Scikit-learn's toy datasets make it easy to test out many kinds of machine learning algorithms.  The list of datasets is at this [link](https://scikit-learn.org/stable/datasets/toy_dataset.html#toy-datasets). The following code cell shows how to import a built-in toy dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the Iris dataset\n",
    "iris = load_iris()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the Iris dataset\n",
    "print(iris)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print shape of the Iris data\n",
    "iris.data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What kind of object is iris? You can find this out by using the `type()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(iris)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This means we are working with an object of type `Bunch`. The `Bunch` object iris has the following attributes:\n",
    "\n",
    "- `data`: the data matrix\n",
    "- `target`: the classification target\n",
    "- `feature_names`: the names of the dataset columns\n",
    "- `target_names`: the names o the target classes\n",
    "\n",
    "To access the numpy arrays that contain the data matrix and the target values you use these commands\n",
    "\n",
    "1. `X = iris[\"data\"]` or `iris.data`\n",
    "2. `y = iris[\"target\"]` or `iris.target`\n",
    "\n",
    "The same syntax works for `feature_names` or `target_names`.\n",
    "\n",
    "You can read more about this data type at this [link](https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html). Let's see this in action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create Feature matrix X and target matrix y\n",
    "X = iris.data\n",
    "y = iris.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print X\n",
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#print y\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the train_test_split utility from scikit-learn\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Split the dataset into training and testing sets\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the decision tree classifier model from scikit-learn\n",
    "from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text\n",
    "\n",
    "# Create and train the decision tree classifier\n",
    "clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train the model: Fit the decision tree classifier to the feature matrix and target array\n",
    "clf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluate the classifier\n",
    "train_accuracy = clf.score(X_train, y_train)\n",
    "test_accuracy = clf.score(X_test, y_test)\n",
    "print(f\"Training Accuracy: {train_accuracy:.2f}\")\n",
    "print(f\"Testing Accuracy: {test_accuracy:.2f}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the decision tree\n",
    "plt.figure(figsize=(15, 10))\n",
    "plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True, rounded=True)\n",
    "plt.title(\"Decision Tree Visualization\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the textual representation of the decision tree\n",
    "print(\"\\nDecision Tree Rules:\\n\")\n",
    "tree_rules = export_text(clf, feature_names=iris.feature_names)\n",
    "print(tree_rules)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the confusion matrix utility from scikit-learn \n",
    "from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n",
    "\n",
    "# Compute and display the confusion matrix\n",
    "y_pred = clf.predict(X_test)\n",
    "cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)\n",
    "print(\"\\nConfusion Matrix:\")\n",
    "print(cm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the confusion matrix\n",
    "disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)\n",
    "disp.plot(cmap=plt.cm.Blues)\n",
    "plt.title(\"Confusion Matrix\")\n",
    "plt.show()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}