Predict House Pricing Data using Linear Regression and Python
"""
This script uses linear regression to predict house prices from several input features.
It one-hot encodes the location feature, splits the data into training and testing sets, trains a linear regression model,
and evaluates its performance using mean squared error (MSE) and R-squared.
"""
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the data
data = pd.read_csv('data/houseprices.csv')
# One-hot encode the location feature
encoder = OneHotEncoder()
location_encoded = encoder.fit_transform(data[['location']]).toarray()
# Combine encoded location with other features
X = np.hstack((data[['area', 'bedrooms', 'total_living_area', 'house_age_in_years']].to_numpy(), location_encoded))
y = data['price'].to_numpy()
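# Split the data into training and testing sets (80% train, 20% test);
# a fixed random_state makes the split, and therefore the reported metrics, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)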
# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
predictions = model.predict(X_test)
# Evaluate the predictions against the held-out prices
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print('Mean Squared Error:', mse)
print('R-squared:', r2)
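The script stops at evaluation, so here is a minimal sketch of how a fitted model might be used to price a house that is not in the dataset. It is not part of the original script. It assumes a scikit-learn `Pipeline` with a `ColumnTransformer` in place of the manual `np.hstack` step, so the same encoding is applied at prediction time. The names `SAMPLE_CSV`, `FEATURES`, `build_model`, and `new_house` are hypothetical. The inline rows are the first nine rows of the sample data, with the two columns the script does not use left out.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# First nine rows of the sample data (unused columns omitted), inlined so the
# sketch runs on its own; the script itself reads data/houseprices.csv.
SAMPLE_CSV = """area,bedrooms,location,total_living_area,house_age_in_years,price
1000,3,Suburban,850,7,250000
1200,4,Urban,1020,13,365000
1500,5,Rural,1275,14,415000
800,2,Suburban,680,5,190000
900,3,Urban,765,7,260000
1100,4,Rural,935,6,300000
1300,5,Suburban,1105,8,450000
600,2,Urban,510,13,180000
700,3,Rural,595,9,200000
"""

# Same input features the script uses (hypothetical constant name).
FEATURES = ['area', 'bedrooms', 'total_living_area', 'house_age_in_years', 'location']


def build_model():
    """Bundle the one-hot encoding and the regression into one estimator."""
    preprocess = ColumnTransformer(
        # handle_unknown='ignore' encodes an unseen location as all zeros
        # instead of raising an error at prediction time.
        [('location', OneHotEncoder(handle_unknown='ignore'), ['location'])],
        remainder='passthrough',  # numeric columns pass through unchanged
    )
    return Pipeline([('preprocess', preprocess), ('regressor', LinearRegression())])


data = pd.read_csv(io.StringIO(SAMPLE_CSV))
model = build_model()
model.fit(data[FEATURES], data['price'])

# A made-up house, used only to illustrate the call; it is not from the dataset.
new_house = pd.DataFrame([{
    'area': 950,
    'bedrooms': 3,
    'total_living_area': 800,
    'house_age_in_years': 8,
    'location': 'Urban',
}])
predicted_price = float(model.predict(new_house[FEATURES])[0])
print('Predicted price:', round(predicted_price))
```

With so few rows the estimate is only illustrative. The point is that the pipeline accepts a raw `DataFrame` row and applies the location encoding itself, so no separately stored encoder is needed.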
area,bedrooms,location,total_living_area,house_age_in_years,year_of_construction,proximity_to_amenities,price
1000,3,Suburban,850,7,2015,High,250000
1200,4,Urban,1020,13,2009,Medium,365000
1500,5,Rural,1275,14,2008,Low,415000
800,2,Suburban,680,5,2012,High,190000
900,3,Urban,765,7,2010,Medium,260000
1100,4,Rural,935,6,2014,Low,300000
1300,5,Suburban,1105,8,2011,High,450000
600,2,Urban,510,13,2007,Medium,180000
700,3,Rural,595,9,2013,Low,200000
1400,5,Suburban,1190,9,2010,High,465000
1200,3,Urban,1050,8,2011,Medium,320000
1500,4,Rural,1350,10,2005,Low,410000
800,3,Suburban,720,10,2012,High,210000
900,4,Urban,810,6,2009,Medium,290000
1100,5,Rural,980,7,2008,Low,310000
1300,4,Suburban,1150,9,2013,High,440000
650,2,Urban,550,8,2006,Medium,170000
750,3,Rural,650,6,2004,Low,220000
1450,5,Suburban,1250,11,2011,High,480000
1100,3,Urban,950,7,2009,Medium,310000
1500,4,Rural,1400,10,2004,Low,425000
750,2,Suburban,630,5,2008,High,185000
850,3,Urban,710,7,2013,Medium,250000
1050,4,Rural,900,6,2006,Low,280000
1250,5,Suburban,1050,8,2014,High,430000
550,2,Urban,450,13,2012,Medium,160000
650,3,Rural,550,9,2010,Low,195000
1300,5,Suburban,1100,9,2009,High,450000