(Day 10) Simple linear regression to predict GCPA

Ivan Ivanov · January 11, 2024

Hello :) Today is Day 10!

A quick summary of today:

  • Comparing basic house price prediction models on a Kaggle dataset

Data


Most of the features were categorical, so I used pd.get_dummies() to one-hot encode them.
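For anyone new to `pd.get_dummies()`, here is a minimal sketch of what it does. The DataFrame and its column names (`price`, `area`, `mainroad`, `furnishingstatus`) are hypothetical toy data, not the actual Kaggle dataset, and `drop_first=True` is an optional choice assumed here, not necessarily what I used in the notebook.

```python
import pandas as pd

# Hypothetical toy data: a numeric target, one numeric feature and two
# categorical features (names are illustrative only).
df = pd.DataFrame({
    "price": [13.3, 12.2, 9.9, 4.1],
    "area": [7420, 8960, 9960, 3500],
    "mainroad": ["yes", "yes", "no", "yes"],
    "furnishingstatus": ["furnished", "semi-furnished", "unfurnished", "furnished"],
})

# One-hot encode every object-dtype column. drop_first=True drops one level
# per feature so the dummy columns are not perfectly collinear, and dtype=int
# gives 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Numeric columns pass through unchanged; each categorical column with k levels becomes k - 1 indicator columns.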

Some viz

  • Area and price correlation

  • Outliers

I used the z-score or the IQR method to handle them.
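A minimal sketch of both approaches, assuming outliers are simply dropped (rather than capped) and using the common default cutoffs of |z| > 3 and Tukey's 1.5 × IQR fences. The helper names and the toy `price` column are hypothetical.

```python
import pandas as pd


def remove_outliers_zscore(df: pd.DataFrame, column: str, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose `column` value lies within `threshold` standard deviations of the mean."""
    values = df[column]
    z_scores = (values - values.mean()) / values.std()
    return df[z_scores.abs() <= threshold]


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Hypothetical toy data: 20 "normal" prices plus one extreme value.
prices = pd.DataFrame({"price": [10, 11, 12, 13] * 5 + [100]})
print(len(remove_outliers_zscore(prices, "price")))  # → 20
print(len(remove_outliers_iqr(prices, "price")))     # → 20
```

Both drop the 100 here, but they behave differently: the z-score uses the mean and standard deviation, which the outlier itself inflates (on very small samples it may never exceed 3), while the IQR uses quartiles, which are much less sensitive to extreme values.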

Top features

I used the correlation matrix to find the top 10 features most correlated with price.
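This step can be sketched as below, assuming a numeric target column named `price` and ranking by absolute Pearson correlation (the post does not say whether negative correlations were counted). The function name and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def top_correlated_features(df: pd.DataFrame, target: str, n: int = 10) -> pd.Series:
    """Return the `n` numeric features with the largest absolute correlation to `target`."""
    # Note: boolean dummy columns (the pandas >= 2.0 default of get_dummies)
    # are not selected by include="number"; cast them to int first.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations are not missed.
    return corr.abs().sort_values(ascending=False).head(n)


# Hypothetical synthetic data: price depends strongly on area, weakly on bedrooms.
rng = np.random.default_rng(0)
area = rng.normal(150, 40, size=500)
bedrooms = rng.integers(1, 6, size=500)
noise_col = rng.normal(size=500)
price = 3.0 * area + 5.0 * bedrooms + rng.normal(0, 20, size=500)
df = pd.DataFrame({"price": price, "area": area, "bedrooms": bedrooms, "noise": noise_col})

print(top_correlated_features(df, "price", n=2))
```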

Model creation

I used LinearRegression, DecisionTreeRegressor, RandomForestRegressor, and XGBRegressor, since I wanted to compare their results.
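A minimal sketch of this kind of comparison with scikit-learn, assuming a simple 80/20 train/test split and default hyperparameters (the post does not say which split or settings were used). `compare_models` is a hypothetical helper, the data come from `make_regression` rather than the house price dataset, and XGBRegressor is only added if the `xgboost` package is installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, random_state=42):
    """Fit each model on the same split and return {name: (train_mse, test_mse)}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "LinearRegression": LinearRegression(),
        "DecisionTreeRegressor": DecisionTreeRegressor(random_state=random_state),
        "RandomForestRegressor": RandomForestRegressor(random_state=random_state),
    }
    try:  # optional dependency
        from xgboost import XGBRegressor
        models["XGBRegressor"] = XGBRegressor(random_state=random_state)
    except ImportError:
        pass

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(X_train))
        test_mse = mean_squared_error(y_test, model.predict(X_test))
        results[name] = (train_mse, test_mse)
    return results


# Hypothetical synthetic data standing in for the prepared house price features.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
for name, (train_mse, test_mse) in compare_models(X, y).items():
    print(f"{name} - Train MSE: {train_mse:.5f} Test MSE: {test_mse:.5f}")
```

Training error alone is not a fair comparison: an unpruned decision tree, for example, drives train MSE to zero on data like this while generalizing worse, which is why the held-out test MSE matters.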

Results

LinearRegression - Train MSE: 0.81978 Test MSE: 1.37829
DecisionTreeRegressor - Train MSE: 1.06803 Test MSE: 1.64172
RandomForestRegressor - Train MSE: 0.91908 Test MSE: 1.41498
XGBRegressor - Train MSE: 0.26813 Test MSE: 1.60890
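One way to read these numbers is to compare each model's test MSE with its train MSE. LinearRegression has the lowest test MSE, while XGBRegressor has by far the lowest train MSE but a much higher test MSE; a large gap like that is a common sign of overfitting. The short sketch below just recomputes those gaps from the values reported above.

```python
# Train/test MSE pairs copied from the results above.
results = {
    "LinearRegression": (0.81978, 1.37829),
    "DecisionTreeRegressor": (1.06803, 1.64172),
    "RandomForestRegressor": (0.91908, 1.41498),
    "XGBRegressor": (0.26813, 1.60890),
}

# Generalization gap: how much worse each model does on unseen data.
gaps = {name: test - train for name, (train, test) in results.items()}

best_test = min(results, key=lambda name: results[name][1])
largest_gap = max(gaps, key=gaps.get)

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: test - train = {gap:.5f}")
print("Lowest test MSE:", best_test)  # → LinearRegression
print("Largest gap:", largest_gap)    # → XGBRegressor
```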

Lastly, I tried adding some polynomial features, and the results are:

[image: results with polynomial features]
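A minimal sketch of adding polynomial features, assuming a degree-2 expansion inside a scikit-learn pipeline with scaling before the linear model. The degree, the pipeline layout, the `polynomial_regression` helper, and the synthetic quadratic data are assumptions for illustration, not the settings behind the results above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def polynomial_regression(degree: int = 2):
    """Linear regression on polynomial and interaction terms of the inputs."""
    return make_pipeline(
        # include_bias=False because LinearRegression fits its own intercept.
        PolynomialFeatures(degree=degree, include_bias=False),
        # Puts x, x^2, ... on comparable scales (matters mostly if you add regularization).
        StandardScaler(),
        LinearRegression(),
    )


# Hypothetical synthetic data with a curved relationship: y = 2x^2 - x + 1 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * X[:, 0] ** 2 - X[:, 0] + 1 + rng.normal(0, 0.1, size=200)

linear = LinearRegression().fit(X, y)
poly = polynomial_regression(degree=2).fit(X, y)
print(f"Linear R^2:     {linear.score(X, y):.3f}")
print(f"Polynomial R^2: {poly.score(X, y):.3f}")
```

These are training scores only; with many input features the number of polynomial terms grows quickly, so the test MSE is what shows whether the extra terms actually help or just overfit.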

I spilled boiling water on my finger, so writing this post was very hard :(

That is all for today!

See you tomorrow :)

Original post in Korean