Created December 3, 2023 11:24
Predict House Prices using Linear Regression and Python
"""
This script uses machine learning techniques to predict house prices from several input features.
It preprocesses the data, splits it into training and testing sets, trains a linear regression model,
and evaluates its performance using mean squared error (MSE) and R-squared.
"""
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the data
data = pd.read_csv('data/houseprices.csv')

# One-hot encode the categorical location feature
encoder = OneHotEncoder()
location_encoded = encoder.fit_transform(data[['location']]).toarray()

# Combine the numeric features with the encoded location columns
X = np.hstack((data[['area', 'bedrooms', 'total_living_area', 'house_age_in_years']].to_numpy(), location_encoded))
y = data['price'].to_numpy()

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the predictions against the actual test prices
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print('Mean Squared Error:', mse)
print('R-squared:', r2)
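The script reports two numbers, MSE and R-squared, without showing how they are computed. The following is a minimal sketch of both metrics in plain Python, assuming the standard definitions (MSE as the mean of squared errors, R-squared as one minus the ratio of residual to total sum of squares). The function names `mse_by_hand` and `r2_by_hand` and the three price pairs are hypothetical and serve only as illustration; they are not part of the script or the dataset.

```python
def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


def r2_by_hand(y_true, y_pred):
    """One minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted prices, for illustration only
actual = [200000, 300000, 400000]
predicted = [210000, 290000, 400000]

mse = mse_by_hand(actual, predicted)
r2 = r2_by_hand(actual, predicted)
print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', mse ** 0.5)
print('R-squared:', r2)
```

Because MSE is expressed in squared price units, its square root is often easier to read as a typical error in the same units as the price. An R-squared close to 1 means the predictions explain most of the variation in the actual prices.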
area | bedrooms | location | total_living_area | house_age_in_years | year_of_construction | proximity_to_amenities | price
---|---|---|---|---|---|---|---
1000 | 3 | Suburban | 850 | 7 | 2015 | High | 250000
1200 | 4 | Urban | 1020 | 13 | 2009 | Medium | 365000
1500 | 5 | Rural | 1275 | 14 | 2008 | Low | 415000
800 | 2 | Suburban | 680 | 5 | 2012 | High | 190000
900 | 3 | Urban | 765 | 7 | 2010 | Medium | 260000
1100 | 4 | Rural | 935 | 6 | 2014 | Low | 300000
1300 | 5 | Suburban | 1105 | 8 | 2011 | High | 450000
600 | 2 | Urban | 510 | 13 | 2007 | Medium | 180000
700 | 3 | Rural | 595 | 9 | 2013 | Low | 200000
1400 | 5 | Suburban | 1190 | 9 | 2010 | High | 465000
1200 | 3 | Urban | 1050 | 8 | 2011 | Medium | 320000
1500 | 4 | Rural | 1350 | 10 | 2005 | Low | 410000
800 | 3 | Suburban | 720 | 10 | 2012 | High | 210000
900 | 4 | Urban | 810 | 6 | 2009 | Medium | 290000
1100 | 5 | Rural | 980 | 7 | 2008 | Low | 310000
1300 | 4 | Suburban | 1150 | 9 | 2013 | High | 440000
650 | 2 | Urban | 550 | 8 | 2006 | Medium | 170000
750 | 3 | Rural | 650 | 6 | 2004 | Low | 220000
1450 | 5 | Suburban | 1250 | 11 | 2011 | High | 480000
1100 | 3 | Urban | 950 | 7 | 2009 | Medium | 310000
1500 | 4 | Rural | 1400 | 10 | 2004 | Low | 425000
750 | 2 | Suburban | 630 | 5 | 2008 | High | 185000
850 | 3 | Urban | 710 | 7 | 2013 | Medium | 250000
1050 | 4 | Rural | 900 | 6 | 2006 | Low | 280000
1250 | 5 | Suburban | 1050 | 8 | 2014 | High | 430000
550 | 2 | Urban | 450 | 13 | 2012 | Medium | 160000
650 | 3 | Rural | 550 | 9 | 2010 | Low | 195000
1300 | 5 | Suburban | 1100 | 9 | 2009 | High | 450000
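The `location` column holds text categories, which a linear regression cannot use directly, so the script one-hot encodes it before training. As a rough illustration of what that step produces, here is a minimal plain-Python sketch. The helper `one_hot_encode` is hypothetical and is not scikit-learn's implementation; it only mimics the idea of one 0/1 column per category, with the categories in sorted order (which, by default, is also how `OneHotEncoder` orders the categories it finds).

```python
def one_hot_encode(values):
    """Return the sorted unique categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


# Location values taken from the first four data rows
locations = ['Suburban', 'Urban', 'Rural', 'Suburban']
categories, encoded = one_hot_encode(locations)

print(categories)
for location, row in zip(locations, encoded):
    print(location, row)
```

Each row contains exactly one 1, so the encoded columns can be stacked next to the numeric features, as the script does with `np.hstack`.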