{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression: Predicting bicycle traffic" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the necessary libraries\n", "import pandas as pd\n", "\n", "# File path\n", "file_path = 'processed_data.csv'\n", "\n", "# Read the dataset\n", "data = pd.read_csv(file_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display the first few rows to understand the structure of the dataset\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print a concise summary of the DataFrame (column dtypes and non-null counts).\n", "data.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate simple descriptive statistics about the data.\n", "data.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "# Step 1: Prepare the data\n", "# Select features and target variable\n", "features = ['DayLightHrs', 'AvgTempInC', 'PRCP_IN', 'DryDay', 'holiday', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n", "X = data[features]\n", "y = data['TotalBikesCount']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the model from Scikit-Learn\n", "from sklearn.linear_model import LinearRegression\n", "\n", "# Step 2: Create the Linear Regression model (it is trained by fit() in the next cell)\n", "model = LinearRegression()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fit the model to the feature matrix X and the target values y\n", "model.fit(X, y)" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "# Step 3: Make predictions with the fitted model (on the same data it was trained on)\n", "y_pred = model.predict(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Step 4: Evaluate the model (in-sample metrics, since y_pred was computed on the training data)\n", "mse = mean_squared_error(y, y_pred)\n", "r2 = r2_score(y, y_pred)\n", "\n", "# Collect the fitted coefficients in a DataFrame, one row per feature\n", "coefficients = pd.DataFrame({\n", " 'Feature': features,\n", " 'Coefficient': model.coef_\n", "})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display the mean squared error\n", "mse" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display R-squared (the coefficient of determination)\n", "r2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display the coefficients DataFrame\n", "coefficients\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each coefficient represents the expected change in bicycle traffic count for a one-unit increase in the corresponding feature, while holding all other features constant.\n", "\n", "1. DayLightHrs (115.36)\n", "For each additional hour of daylight, the expected bicycle traffic increases by ~115 bikes.\n", "More daylight hours likely encourage more people to bike.\n", "\n", "2. AvgTempInC (400.78)\n", "For each 1°C increase in average temperature, the expected bike traffic increases by ~401 bikes.\n", "This suggests that warmer weather significantly increases biking activity.\n", "\n", "3. PRCP_IN (-175,428.20)\n", "For each additional inch of precipitation (rainfall), the expected bike traffic decreases by ~175,428 bikes.\n", "This very large negative coefficient suggests that rainfall strongly discourages biking.\n", "Because the coefficient is expressed per full inch, it may indicate that even small amounts of rain drastically reduce bike traffic.\n", "\n", "4. 
DryDay (516.04)\n", "On dry days (i.e., no precipitation), there are ~516 more bikes counted compared to non-dry days.\n", "This is consistent with people preferring to bike in dry weather.\n", "\n", "5. holiday (-1207.59)\n", "On holidays, the expected bike traffic decreases by ~1208 bikes.\n", "This suggests that fewer people commute by bike on holidays.\n", "\n", "6. Weekdays (Mon-Fri) have significantly higher bike traffic than weekends.\n", "Weekend days (Saturday and Sunday) see a major drop in biking activity, possibly because many people bike for commuting rather than leisure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# To summarize:\n", "\n", "* Weather matters: More daylight and higher temperatures increase bike traffic, while rain significantly reduces it.\n", "\n", "* Dry days encourage biking: Dry weather increases ridership, reinforcing the impact of precipitation.\n", "\n", "* Holidays reduce bike traffic: Fewer people bike on holidays, likely due to reduced commuting.\n", "\n", "* Weekdays have higher bike traffic: Likely due to work commutes, while weekends show a decline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add the predictions to the original dataset as a separate column\n", "data['predicted'] = model.predict(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot to compare the predicted daily bike count against the measured total bike count from the data\n", "data[['TotalBikesCount', 'predicted']].plot(alpha=0.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Step 5: Visualize actual vs. 
predicted values\n", "plt.figure(figsize=(8, 6))\n", "plt.scatter(y, y_pred, alpha=0.7, color=\"blue\")\n", "plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', linewidth=2)\n", "plt.title(\"Actual vs Predicted Bicycle Counts\")\n", "plt.xlabel(\"Actual Bicycle Count\")\n", "plt.ylabel(\"Predicted Bicycle Count\")\n", "plt.grid(True)\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 4 }