{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification using a Decision Tree Classifier in Scikit-Learn" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the necessary libraries\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import load_iris" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Toy datasets\n", "\n", "Scikit-learn ships with several small, built-in toy datasets, each of which can be loaded with a single function call. This makes them convenient for quickly trying out many kinds of machine learning algorithms. The full list of datasets is available at this [link](https://scikit-learn.org/stable/datasets/toy_dataset.html#toy-datasets). The following code cell shows how to load one of these built-in datasets, the Iris dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the Iris dataset\n", "iris = load_iris()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print the Iris dataset\n", "print(iris)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show the shape of the Iris data matrix: (number of samples, number of features)\n", "iris.data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What kind of object is `iris`? You can find out with Python's built-in `type()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(iris)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output shows that `iris` is an object of type `Bunch`. 
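A `Bunch` is a dictionary subclass whose keys can also be read as attributes. As a minimal sketch (assuming a recent scikit-learn release; the names `X_by_key`, `X_by_attr`, and `toy` are purely illustrative), the following checks both access styles on `iris` and on a small hand-made `Bunch`:

```python
# Minimal sketch of scikit-learn's Bunch container (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.utils import Bunch

iris = load_iris()

# Bunch subclasses dict, so isinstance checks against both succeed
print(isinstance(iris, Bunch), isinstance(iris, dict))

# Key-style and attribute-style access return the very same NumPy array object
X_by_key = iris['data']
X_by_attr = iris.data
print(X_by_key is X_by_attr)

# 150 samples with 4 features each, and one label per sample
print(iris.data.shape, iris.target.shape)

# A hand-made Bunch: setting an attribute stores it under the matching key
toy = Bunch(name='example')
toy.answer = 42
print(toy['answer'], toy.name)
```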
Among other fields, the `Bunch` object `iris` has the following attributes:\n", "\n", "- `data`: the data matrix (one row per sample, one column per feature)\n", "- `target`: the class label of each sample\n", "- `feature_names`: the names of the dataset columns\n", "- `target_names`: the names of the target classes\n", "\n", "To access the NumPy arrays that contain the data matrix and the target values, use either the dictionary-style key or the attribute:\n", "\n", "1. `X = iris[\"data\"]` or `iris.data`\n", "2. `y = iris[\"target\"]` or `iris.target`\n", "\n", "The same syntax works for `feature_names` and `target_names`.\n", "\n", "You can read more about this data type at this [link](https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html). Let's see this in action." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the feature matrix X and the target array y\n", "X = iris.data\n", "y = iris.target" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print X\n", "X" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print y\n", "y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the train_test_split utility from scikit-learn\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Split the dataset into training (70%) and testing (30%) sets, preserving the class proportions\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the decision tree classifier model from scikit-learn\n", "from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text\n", "\n", "# Create the decision tree classifier (it is trained in the next cell)\n", "clf = 
DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train the model: Fit the
decision tree classifier to the training feature matrix and target array\n", "clf.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Evaluate the classifier: score() returns the mean accuracy\n", "train_accuracy = clf.score(X_train, y_train)\n", "test_accuracy = clf.score(X_test, y_test)\n", "print(f\"Training Accuracy: {train_accuracy:.2f}\")\n", "print(f\"Testing Accuracy: {test_accuracy:.2f}\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize the decision tree\n", "plt.figure(figsize=(15, 10))\n", "plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True, rounded=True)\n", "plt.title(\"Decision Tree Visualization\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print the textual representation of the decision tree\n", "print(\"\\nDecision Tree Rules:\\n\")\n", "tree_rules = export_text(clf, feature_names=iris.feature_names)\n", "print(tree_rules)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the confusion matrix utilities from scikit-learn\n", "from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n", "\n", "# Predict on the test set, then compute and print the confusion matrix\n", "y_pred = clf.predict(X_test)\n", "cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)\n", "print(\"\\nConfusion Matrix:\")\n", "print(cm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize the confusion matrix\n", "disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)\n", "disp.plot(cmap=plt.cm.Blues)\n", "plt.title(\"Confusion Matrix\")\n", "plt.show()\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { 
"codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 4 }