Hello :) Today is Day 10!
A quick summary of today:
- Comparing basic house price prediction models on Kaggle
Data
Most of the features were categorical, so I used pd.get_dummies()
to one-hot encode them.
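Here is a minimal sketch of that encoding step. The toy DataFrame and its column names are made up for illustration; they are not the real Kaggle columns.

```python
import pandas as pd

# Toy data; the column names are illustrative assumptions, not the real dataset.
df = pd.DataFrame({
    "area": [1200, 850, 1500, 990],
    "neighborhood": ["A", "B", "A", "C"],
    "has_garage": ["yes", "no", "yes", "yes"],
    "price": [250000, 180000, 310000, 205000],
})

# Pick every non-numeric column and one-hot encode it.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
encoded = pd.get_dummies(df, columns=cat_cols)

print(encoded.columns.tolist())
```

Each category becomes its own indicator column, so the column count can grow quickly when a feature has many distinct values.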
Some viz
- Area and price correlation
- Outliers
I used z-score or IQR to deal with them.
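Both rules can be sketched in a few lines. The prices below are toy values, and the cutoffs (3 standard deviations, 1.5 times the IQR) are the common defaults, assumed here rather than taken from the actual notebook.

```python
import pandas as pd

# Toy prices: 19 ordinary values plus one obvious outlier (illustrative only).
prices = pd.Series(list(range(180, 218, 2)) + [1000], dtype=float)

# z-score rule: keep points within z_thresh standard deviations of the mean.
z_thresh = 3.0  # assumed cutoff
z = (prices - prices.mean()) / prices.std(ddof=0)
kept_z = prices[z.abs() <= z_thresh]

# IQR rule: keep points inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
kept_iqr = prices[prices.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(len(prices), len(kept_z), len(kept_iqr))
```

One thing to keep in mind: on very small samples the z-score rule can miss outliers, because a single extreme value inflates the standard deviation it is measured against.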
Top features
I used the correlation table to find the top 10 features most correlated with price.
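A rough sketch of that selection on synthetic data. The feature names are placeholders, and ranking by absolute correlation (so strong negative correlations also count) is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the encoded dataset; feat_* names are placeholders.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 15)), columns=[f"feat_{i}" for i in range(15)])
df["price"] = 3 * df["feat_0"] - 2 * df["feat_1"] + rng.normal(scale=0.5, size=n)

# Correlation of every feature with the target, excluding the target itself.
corr_with_price = df.corr()["price"].drop("price")

# Rank by absolute correlation and keep the 10 strongest.
top_features = corr_with_price.abs().sort_values(ascending=False).head(10).index.tolist()

print(top_features)
```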
Model creation
I used LinearRegression, DecisionTreeRegressor, RandomForestRegressor, and XGBRegressor (I wanted to compare the results).
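A comparison loop along these lines fits each model and reports train and test MSE. This is a sketch on synthetic data with assumed hyperparameters, so its output will not match the real scores. XGBoost is a separate package, so it is only added if it is installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data as a stand-in for the house price features.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters here are assumptions chosen for a quick demo.
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(max_depth=5, random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Add XGBRegressor only when the xgboost package is available.
try:
    from xgboost import XGBRegressor
    models["XGBRegressor"] = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name} - Train MSE: {train_mse:.5f} | Test MSE: {test_mse:.5f}")
```

A large gap between train and test MSE for a model is the usual sign of overfitting.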
Results
LinearRegression - Train MSE: 0.81978 | Test MSE: 1.37829
DecisionTreeRegressor - Train MSE: 1.06803 | Test MSE: 1.64172
RandomForestRegressor - Train MSE: 0.91908 | Test MSE: 1.41498
XGBRegressor - Train MSE: 0.26813 | Test MSE: 1.60890
Lastly, I tried using some polynomial features, and the results are:
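For reference, a polynomial step can be wired in front of a linear model with a scikit-learn pipeline. This is only a sketch: the synthetic data, degree=2, and the scaling step are all assumptions, so whatever it prints is not the result of this experiment.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data as a stand-in for the selected house price features.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale, expand to degree-2 terms (squares and pairwise products), then fit.
poly_model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),  # degree is an assumption
    LinearRegression(),
)
poly_model.fit(X_train, y_train)

# Number of columns the model sees after the polynomial expansion.
n_poly = poly_model.named_steps["polynomialfeatures"].n_output_features_
test_mse = mean_squared_error(y_test, poly_model.predict(X_test))
print(f"{n_poly} polynomial features | Test MSE: {test_mse:.5f}")
```

With 5 input features, degree 2 already gives 20 columns, so it is worth applying this to a small set of top features rather than the full one-hot encoded table.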
I spilled boiling water on my finger, so writing this post was very hard :(
That is all for today!
See you tomorrow :)