{ "cells": [ { "cell_type": "markdown", "id": "1bc8526e", "metadata": {}, "source": [ "# Scikit-Learn Homework\n", "We will work with the breast cancer dataset to compare three different classifiers:\n", "\n", "1. Logistic regression\n", "2. Random forest classifier\n", "3. K-nearest neighbors\n", "\n", "Add the appropriate lines of code to answer the questions in each cell.\n", "\n", "Please use the documentation on the [scikit-learn](https://scikit-learn.org/stable/) webpage.\n", "\n", "You may also consult the example notebook from the lecture [Logistic Regression Classifier](http://rcs.bu.edu/classes/MSSP/sp23/MachineLearning/)." ] }, { "cell_type": "code", "execution_count": null, "id": "17a1caaa", "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer" ] }, { "cell_type": "code", "execution_count": null, "id": "ce23580e", "metadata": {}, "outputs": [], "source": [ "data = load_breast_cancer(as_frame=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "75d4f926", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 1 ###\n", "##################\n", "# Import the train_test_split function\n", "# Split the dataset so that 75% of the dataset is for training and 25% is for testing\n", "# Set the random state to 55\n", "\n", "# Print the value of the 5th entry of y_test\n", "# Print the value of row 3 and column 4 of X_train\n", "# Remember that Python uses 0-indexing\n", "# 2 points" ] }, { "cell_type": "code", "execution_count": null, "id": "acbb6f0f", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 2 ###\n", "##################\n", "# Scale the data to have 0 mean and unit variance\n", "# Print the value of row 3 and column 4 of X_train\n", "# 1 point" ] }, { "cell_type": "code", "execution_count": 1, "id": "2bf11145", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 3 ###\n", "##################\n", "# Use the Scikit-Learn 
documentation and create 3 classifiers\n", "# 1. log_r is LogisticRegression, set the solver parameter to 'liblinear' and max_iter to 10\n", "# 2. rnd_forest is RandomForestClassifier, set the n_estimators parameter to 5 and the max_depth to 3\n", "# 3. k_near is K-Nearest Neighbor classifier, set the n_neighbors parameter to 4\n", "# Set the random_state parameter to 55 in the LogisticRegression and RandomForestClassifier classes\n", "# 3 points" ] }, { "cell_type": "code", "execution_count": null, "id": "ae42333c", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 4 ###\n", "##################\n", "# Report the accuracy of all three models\n", "# Print which model is the most accurate\n", "# 4 points" ] }, { "cell_type": "code", "execution_count": null, "id": "c7c3a0cd", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 5 ###\n", "##################\n", "# Compute the precision for each model\n", "# Print which model(s) have the best precision\n", "# 4 points" ] }, { "cell_type": "code", "execution_count": null, "id": "30d584db", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 6 ###\n", "##################\n", "# Compute the recall for each model\n", "# Print which model(s) have the best recall\n", "# 4 points" ] }, { "cell_type": "code", "execution_count": null, "id": "c56a2612", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 7 ###\n", "##################\n", "# Plot the confusion matrix for the model with the second-highest accuracy.\n", "# Ensure the title contains the accuracy of the model." 
] }, { "cell_type": "code", "execution_count": null, "id": "1735dad5", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 8 ###\n", "##################\n", "# Which model would you pick to predict whether a patient has breast cancer or not?\n", "# Why would you choose this model?\n", "# 2 points" ] }, { "cell_type": "code", "execution_count": null, "id": "144bc46b", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 9 ###\n", "##################\n", "# Compute and print the F1 score for the model you selected in Question 8.\n", "# 2 points" ] }, { "cell_type": "code", "execution_count": null, "id": "9735b05e", "metadata": {}, "outputs": [], "source": [ "##################\n", "### Question 10 ###\n", "##################\n", "# Plot the ROC curve for the model selected in Question 8.\n", "# Put the AUC score in the legend.\n", "# 3 points" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }