
RENTAL COSTS IN 5 BRAZILIAN CITIES

tuyetgiangst93

This was my very first project after taking courses in regression theory and R. The project used a dataset from Kaggle with 10,962 observations of 13 features. I used R to build a regression model that predicts the rent amount from 11 of those features.

I used ggplot2 to visualize and summarize the data, which guided the cleaning before I built the regression. Potentially wrong values were removed from the dataset so they would not distort the predictions. To identify wrong observations, I researched the cities covered by the dataset, including their economy, population, standard of living, and geography. I combined that information with a boxplot and a summary table of each feature to decide which observations to remove. For example, the observation with the highest floor, floor 301, was deleted because no building in Brazil has 301 stories. I applied similar procedures to the other features.
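The cleaning step pairs a boxplot rule with domain knowledge. Below is a minimal Python sketch of that idea. The original work was done in R, so Python here is only for illustration. The sketch makes three assumptions:

- Each row is a dict with a hypothetical `floor` key.
- Outliers are found with Tukey's 1.5 × IQR boxplot fences.
- A hypothetical ceiling of 60 floors stands in for whatever cutoff the city research would actually suggest.

Values above the ceiling are dropped. Values outside the fences that are still plausible are only set aside for manual review.

```python
import statistics


def boxplot_fences(values, k=1.5):
    """Tukey boxplot fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def screen_column(rows, column, max_plausible, k=1.5):
    """Split rows into (kept, review, removed) for one numeric column.

    removed: value exceeds a domain-knowledge ceiling (impossible value)
    review:  value is a boxplot outlier but still plausible
    kept:    everything else
    """
    low, high = boxplot_fences([r[column] for r in rows], k)
    kept, review, removed = [], [], []
    for r in rows:
        v = r[column]
        if v > max_plausible:
            removed.append(r)
        elif v < low or v > high:
            review.append(r)
        else:
            kept.append(r)
    return kept, review, removed


# Hypothetical 'floor' values; 60 is an assumed ceiling, not a researched figure
rows = [{"floor": f} for f in [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 45, 301]]
kept, review, removed = screen_column(rows, "floor", max_plausible=60)
print([r["floor"] for r in removed])  # [301]
print([r["floor"] for r in review])   # [45]
print(len(kept))                      # 10
```

Splitting "impossible" from "unusual" keeps the boxplot from deleting legitimate but rare properties on its own. The final call on the review group still rests on the kind of city research described above.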


Boxplots of the data before and after removing potentially wrong observations

After cleaning the dataset, I built the full model to predict the rent amount. This model had two violations: multicollinearity and non-normality. I detected the multicollinearity by checking pairwise correlations with R's cor function and double-checked it with the variance inflation factor (VIF). Centering the data helped resolve the multicollinearity in the quadratic terms, while transforming the response helped with the non-normality.
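The effect of centering on a quadratic term can be seen numerically. The sketch below is again in Python rather than the author's R. It computes the correlation between a predictor and its square, plus the corresponding VIF, both before and after centering.

For a model with exactly two predictors, VIF = 1 / (1 - r^2), where r is their correlation. With more predictors, VIF uses the R^2 from regressing each predictor on all the others; in R, `vif()` from the `car` package computes this. The area values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical property areas (m^2), evenly spaced from 30 to 200
area = list(range(30, 201, 10))
area_sq = [a ** 2 for a in area]

mean_area = sum(area) / len(area)
area_c = [a - mean_area for a in area]   # centered predictor
area_c_sq = [a ** 2 for a in area_c]     # square of the centered predictor

print(f"raw:      r = {pearson(area, area_sq):.3f}, "
      f"VIF = {vif_two_predictors(area, area_sq):.1f}")
print(f"centered: r = {pearson(area_c, area_c_sq):.3f}, "
      f"VIF = {vif_two_predictors(area_c, area_c_sq):.1f}")
```

For these values, the raw VIF lands well above the common rule-of-thumb threshold of 10. The centered VIF is essentially 1, because a symmetric, centered predictor is uncorrelated with its own square.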


Residual and QQ plots used to check model adequacy
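A QQ plot compares sorted residuals with the quantiles a normal distribution would produce. The Python sketch below does two things:

- It builds those point pairs using the plotting position (i - 0.5)/n. This is one common choice; R's `qqnorm` uses `ppoints`, which matches it for n > 10.
- It adds a moment-skewness check as a numeric companion to the plot.

The source does not say which transformation was applied to the response. A log transform, common for right-skewed prices such as rent, is assumed here, and all values are hypothetical.

```python
from math import log
from statistics import NormalDist, fmean, pstdev


def qq_points(residuals):
    """Pair each sorted residual with the standard normal quantile at
    plotting position (i - 0.5) / n: the points a normal QQ plot draws."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


def skewness(values):
    """Population (moment) skewness; near 0 for symmetric data."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


# Hypothetical residuals: points far off the diagonal suggest non-normality
residuals = [0.3, -1.2, 0.8, 0.1, -0.4]
for theo, obs in qq_points(residuals):
    print(f"{theo:+.3f}  {obs:+.2f}")

# Hypothetical right-skewed monthly rents, before and after a log transform
rents = [800, 900, 1000, 1100, 1200, 1500, 2000, 3000, 5000, 9000]
log_rents = [log(r) for r in rents]
print(f"skewness raw: {skewness(rents):.2f}, log: {skewness(log_rents):.2f}")
```

On these rents, the log transform noticeably reduces the right skew. In practice, the check that matters is the QQ plot of the refitted model's residuals.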

In the end, I obtained a second-degree (quadratic) regression model whose remaining violations were minor enough to ignore. Here are some conclusions:

  1. About 91.48% of the variability in the rent amount is explained by the predictors in the fitted model.

  2. Surprisingly, the area feature does not affect the rent amount. Similarly, the rent does not change when the owner switches from accepting to not accepting animals.

  3. Sao Paulo is Brazil's largest city and its economic center, so it makes sense that a tenant's rent increases when they move from Belo Horizonte to Sao Paulo.

  4. In this dataset, as the area, number of rooms, HOA fee, and insurance increase, the rent also rises, but only up to a point; beyond it, the rent stops increasing no matter how much these features grow.

  5. Property tax behaves differently: it appears to be inversely related to the rent amount.

  6. The regression model initially violated the normality assumption, which I addressed by transforming the response.
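Two of these conclusions can be illustrated with a single predictor: the second-degree model, and the point where rent stops rising. The Python sketch below does three things:

- It fits y = b0 + b1*x + b2*x^2 by ordinary least squares, solving the normal equations with Gaussian elimination.
- It computes R^2 = 1 - SSE/SST, the quantity behind the 91.48% figure.
- It finds the turning point x* = -b1 / (2*b2).

The author's model has many predictors and was fitted in R (for example with `lm`). This one-variable version on synthetic data is only an illustration.

Strictly speaking, a fitted parabola with b2 < 0 peaks at x* and then bends downward rather than levelling off. Also, if the predictor was centered, x* is in centered units and the mean must be added back.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):  # back-substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                # sums of x^k
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]            # X^T X
    return solve3(a, t)                                             # [b0, b1, b2]


def r_squared(x, y, coef):
    """Share of the variability in y explained by the fitted quadratic."""
    b0, b1, b2 = coef
    fitted = [b0 + b1 * xi + b2 * xi ** 2 for xi in x]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def turning_point(coef):
    """x where the fitted parabola peaks (b2 < 0) or bottoms out (b2 > 0)."""
    _, b1, b2 = coef
    return -b1 / (2 * b2)


# Synthetic data following y = 1 + 2x - 0.1x^2 exactly
xs = list(range(21))
ys = [1 + 2 * x - 0.1 * x ** 2 for x in xs]
coef = fit_quadratic(xs, ys)
print([round(c, 4) for c in coef])     # [1.0, 2.0, -0.1]
print(round(turning_point(coef), 4))   # 10.0

# Add small alternating noise so the fit is no longer perfect
noisy = [y + (0.5 if i % 2 else -0.5) for i, y in enumerate(ys)]
print(round(r_squared(xs, noisy, fit_quadratic(xs, noisy)), 3))
```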

The dataset is available on Kaggle. You can find it here

You can find my R code here


©2022 by Thi Giang.
