# Linear Regression: Predicting Bicycle Traffic

In [None]:
# Import pandas for loading and inspecting the data
import pandas as pd

# File path
file_path = 'processed_data.csv'

# Read the dataset
data = pd.read_csv(file_path)

In [None]:
# Display the first few rows to understand the structure of the dataset
data.head()

In [None]:
# Print a concise summary of the DataFrame (column types and non-null counts)
data.info()

In [None]:
# Generate simple descriptive statistics about the data.
data.describe()

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import numpy as np

# Step 1: Prepare the data
# Select features and target variable
features = [
    'DayLightHrs', 'AvgTempInC', 'PRCP_IN', 'DryDay', 'holiday',
    'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun',
]
X = data[features]
y = data['TotalBikesCount']

In [None]:
# Import the model from Scikit-Learn
from sklearn.linear_model import LinearRegression

# Step 2: Instantiate the Linear Regression model (it is fitted in the next cell)
model = LinearRegression()


In [None]:
# Fit the model to the feature matrix X and the target values y
model.fit(X, y)

In [None]:
# Step 3: Make predictions with the fitted model (in-sample: on the same rows used for fitting)
y_pred = model.predict(X)

In [None]:
# Step 4: Evaluate the model on the data it was fitted to
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

# Pair each feature name with its fitted coefficient
coefficients = pd.DataFrame({
    'Feature': features,
    'Coefficient': model.coef_
})

In [None]:
# Display the mean squared error (in squared bike counts)
mse
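
Because the MSE is expressed in squared bikes, its square root (the RMSE) is often easier to read, since it is in the same units as `TotalBikesCount`. The sketch below uses made-up counts rather than the dataset, and the names `y_true`, `y_hat`, `mse_demo`, and `rmse_demo` are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical daily counts and predictions, for illustration only
y_true = np.array([2500.0, 3100.0, 1800.0, 4200.0])
y_hat = np.array([2440.0, 3180.0, 1740.0, 4280.0])

mse_demo = mean_squared_error(y_true, y_hat)  # mean of squared errors (bikes^2)
rmse_demo = np.sqrt(mse_demo)                 # back in units of bikes

print(f"MSE:  {mse_demo:.1f}")
print(f"RMSE: {rmse_demo:.1f}")
```

In the notebook itself, the equivalent would be `np.sqrt(mse)`.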

In [None]:
# Display R-squared (the coefficient of determination)
r2
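
The metrics above are computed on the same rows the model was fitted on, so they may be optimistic about how well it predicts unseen days. One common check is to hold out part of the data with `train_test_split`, which is already imported. The sketch below uses synthetic data as a stand-in for `X` and `y`; the names `X_demo`, `y_demo`, and `demo_model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's X and y (assumption: not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = (
    1000
    + X_demo @ np.array([120.0, 400.0, -300.0])
    + rng.normal(scale=200, size=400)
)

# Hold out 20% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LinearRegression().fit(X_train, y_train)

# Compare the score on fitted rows with the score on held-out rows
r2_train = r2_score(y_train, demo_model.predict(X_train))
r2_test = r2_score(y_test, demo_model.predict(X_test))
print(f"R^2 on training rows: {r2_train:.3f}")
print(f"R^2 on held-out rows: {r2_test:.3f}")
```

For daily time series such as this one, a chronological split (for example `shuffle=False`) may be a better fit than a random one, since neighbouring days tend to resemble each other.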

In [None]:
# Display the coefficients DataFrame
coefficients


Each coefficient represents the expected change in bicycle traffic count for a one-unit increase in the corresponding feature, while holding all other factors constant.

1. DayLightHrs (115.36)
For each additional hour of daylight, the expected bicycle traffic increases by ~115 bikes.
More daylight hours likely encourage more people to bike.

2. AvgTempInC (400.78)
For each 1°C increase in average temperature, the expected bike traffic increases by ~401 bikes.
This suggests that warmer weather significantly increases biking activity.

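3. PRCP_IN (-175,428.20)
For each additional unit of PRCP_IN, the expected bike traffic decreases by ~175,428 bikes.
This coefficient is orders of magnitude larger than the others, which is hard to read literally as the effect of one extra inch of rain.
It more likely indicates that the values in this column are very small (for example, stored on a different scale than whole inches), so a one-unit increase lies far outside the observed range.
The negative sign is the dependable part: more precipitation is associated with less biking.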

4. DryDay (516.04)
On dry days (i.e., no precipitation), there are ~516 more bikes counted compared to non-dry days.
This is consistent with people preferring to bike in dry weather.

5. holiday (-1207.59)
On holidays, the expected bike traffic is ~1,208 bikes lower than on comparable non-holidays.
This suggests that fewer people commute by bike on holidays.

6. Mon-Sun (day-of-week indicators)
The coefficients for weekdays (Mon-Fri) are markedly higher than those for Saturday and Sunday.
The drop on weekends suggests that many people in this dataset bike to commute rather than for leisure.
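
The size of a coefficient depends on the units of its feature, which is worth keeping in mind when reading the very large `PRCP_IN` value. The sketch below uses synthetic data (not the bike dataset) to show that rescaling a feature rescales its coefficient by the inverse factor, while the predictions stay the same. All variable names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical rainfall measurements and a count that drops as rain increases
rain = rng.uniform(0, 20, size=300)
count = 3000 - 50 * rain + rng.normal(scale=100, size=300)

# Same information, expressed in a unit 1000 times larger
rain_rescaled = rain / 1000.0

m_small_unit = LinearRegression().fit(rain.reshape(-1, 1), count)
m_large_unit = LinearRegression().fit(rain_rescaled.reshape(-1, 1), count)

print("coefficient per original unit:", m_small_unit.coef_[0])
print("coefficient per rescaled unit:", m_large_unit.coef_[0])

# The fitted values are identical; only the unit of the coefficient changed
same_fit = np.allclose(
    m_small_unit.predict(rain.reshape(-1, 1)),
    m_large_unit.predict(rain_rescaled.reshape(-1, 1)),
)
print("same predictions:", same_fit)
```

Checking `data['PRCP_IN'].describe()` against the other columns is one way to see whether a one-unit change is realistic for that feature.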

# To summarize:

* Weather matters: More daylight and higher temperatures increase bike traffic, while rain significantly reduces it.

* Dry days encourage biking: Dry weather increases ridership, reinforcing the impact of precipitation.

* Holidays reduce bike traffic: Fewer people bike on holidays, likely due to reduced commuting.

* Weekdays have higher bike traffic: Likely due to work commutes, while weekends show a decline.
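
One caveat on the day-of-week comparison: the model includes all seven day indicators plus an intercept, so those columns are linearly dependent and the individual day coefficients are only determined up to a shared offset. The differences between days (weekday versus weekend) are still meaningful. The sketch below illustrates the dependence on a small hypothetical design matrix, and shows that dropping one day (here `Sun`, an arbitrary choice) removes it.

```python
import numpy as np
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Two weeks of hypothetical one-hot encoded weekdays (one row per day)
dummies = pd.get_dummies(pd.Series(days * 2)).astype(float)[days]

# Design matrix with an intercept column plus all seven day columns
with_all_days = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])

# Same, but with 'Sun' dropped so that Sunday acts as the reference day
with_reference = np.column_stack(
    [np.ones(len(dummies)), dummies.drop(columns='Sun').to_numpy()]
)

rank_all = np.linalg.matrix_rank(with_all_days)
rank_ref = np.linalg.matrix_rank(with_reference)
print("all days:      columns =", with_all_days.shape[1], "rank =", rank_all)
print("reference day: columns =", with_reference.shape[1], "rank =", rank_ref)
```

With a reference day, each remaining day coefficient reads as the expected difference from that day, other features held constant.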

In [None]:
# Add the predictions to the original dataset as a separate column
data['predicted'] = model.predict(X)

In [None]:
# Plot to compare the predicted daily bike count against the measured total bike count from the data
data[['TotalBikesCount', 'predicted']].plot(alpha=0.5)

In [None]:
# Step 5: Visualize actual vs. predicted values
plt.figure(figsize=(8, 6))
plt.scatter(y, y_pred, alpha=0.7, color="blue")
# Dashed red line marks perfect predictions (predicted == actual)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', linewidth=2)
plt.title("Actual vs Predicted Bicycle Counts")
plt.xlabel("Actual Bicycle Count")
plt.ylabel("Predicted Bicycle Count")
plt.grid(True)
plt.show()
