We all know that university housing is expensive. Very expensive. I remember that when I lived near the University of Texas at Austin campus, my rent went up by a hundred dollars or more each year.
Hypothesis: University towns have their mean housing prices less affected by recessions. Run a t-test to compare the ratio of the mean price of houses in university towns the quarter before the recession starts compared to the recession bottom.
A recession is defined as starting with two consecutive quarters of GDP decline and ending with two consecutive quarters of GDP growth. We will compare the ratio of the mean price of houses in university towns the quarter before the recession starts (first quarter of the year 2008) compared to the recession bottom. A recession bottom is the quarter within a recession that had the lowest GDP.
The data comes from three different sources. First, the housing data is from the Zillow research data site; the file lists median home sale prices for all homes at the city level. Second, the list of university towns in the United States comes from the Wikipedia page on college towns, which I copied and pasted into a .txt file. Lastly, the GDP of the United States over time, in current dollars and at quarterly intervals, comes from the Bureau of Economic Analysis, US Department of Commerce, in an .xls file. We will only look at GDP data from the first quarter of 2000 onward. All data sets can be found on my GitHub.
First, I imported the libraries needed for the analysis. The standard way to get access to everything in the pandas library is to import it, and it is common practice to use ‘pd’ as the alias. Following this convention whenever we import the entire pandas library makes it easier for others to read our code.
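As a minimal sketch, the imports for this kind of analysis might look like the following. Only pandas and scipy.stats are named in this write-up, so anything beyond those two would be an assumption.

```python
# Conventional alias for pandas; most pandas code in the wild uses "pd".
import pandas as pd

# scipy.stats provides ttest_ind, which is used for the final t-test.
from scipy import stats
```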
The list that I copied and pasted into a .txt file had a lot of irregularities, such as the names of the universities. For example, under Texas: “Austin ( the University of Texas at Austin, St. Edwards University, Huston-Tillotson University)”. The data frame we are interested in includes only the state and the region name. For “State”, I removed every character from “[” to the end. For “Region Name”, when applicable, I removed every character from “ (” to the end. After much cleaning, I ended up with the desired data frame.
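The two cleaning rules can be sketched in plain Python as follows. This is an illustration, not the code from the repository: the function name parse_university_towns is hypothetical, and it assumes that state lines in the copied text are marked with "[edit]" (which is how the Wikipedia page looked when copied), while every other non-empty line is a town.

```python
def parse_university_towns(lines):
    """Return (state, region_name) pairs from the raw copied text.

    Assumption: a line containing "[edit]" is a state header; any other
    non-empty line is a town belonging to the most recent state.
    """
    rows = []
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "[edit]" in line:
            # State: drop everything from the first "[" to the end.
            state = line.split("[", 1)[0].strip()
        else:
            # Region name: drop everything from the first " (" to the end.
            region = line.split(" (", 1)[0].strip()
            rows.append((state, region))
    return rows


# Made-up sample in the same shape as the copied list.
sample = [
    "Texas[edit]",
    "Austin (the University of Texas at Austin, St. Edwards University, Huston-Tillotson University)",
    "College Station (Texas A&M University)[2]",
]
towns = parse_university_towns(sample)
print(towns)
```

The resulting pairs can then be wrapped in a data frame, for example with pd.DataFrame(towns, columns=["State", "RegionName"]).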
The recession for this assignment starts in 2008 and ends in 2009, so we need to get the GDP values for that period from the US Department of Commerce .xls file. You can find the code for that on my GitHub.
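The recession definition used here can be turned into a small search over the quarterly GDP series. The sketch below is not the code from the repository. It assumes the start is the first of the two declining quarters and the end is the second of the two growing quarters, and it runs on made-up numbers with generic quarter labels rather than the real GDP figures. The name find_recession is hypothetical.

```python
def find_recession(gdp):
    """Find the first recession in a chronological list of (label, value).

    Returns (start, bottom, end) labels, or None if no complete recession
    is present. Assumptions: "start" is the first of two consecutive
    declining quarters, "end" is the second of two consecutive growing
    quarters, and "bottom" is the lowest-GDP quarter between them.
    """
    start = None
    for i in range(1, len(gdp) - 1):
        prev, cur, nxt = gdp[i - 1][1], gdp[i][1], gdp[i + 1][1]
        if start is None:
            # Two consecutive declines: quarter i and quarter i + 1.
            if cur < prev and nxt < cur:
                start = i
        elif cur > prev and nxt > cur:
            # Two consecutive rises: the recession ends at quarter i + 1.
            end = i + 1
            bottom = min(range(start, end + 1), key=lambda j: gdp[j][1])
            return gdp[start][0], gdp[bottom][0], gdp[end][0]
    return None


# Toy values for illustration only, not real GDP data.
toy_gdp = [("Q1", 100), ("Q2", 101), ("Q3", 99), ("Q4", 97),
           ("Q5", 96), ("Q6", 98), ("Q7", 99), ("Q8", 100)]
result = find_recession(toy_gdp)
print(result)
```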
Next, I converted the monthly housing data into quarters (three-month periods) and returned the quarterly means in a new data frame. The resulting data frame has a multi-index of [‘State’, ‘RegionName’].
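One way to do the monthly-to-quarterly conversion in pandas is sketched below. It assumes the monthly columns are labelled "YYYY-MM" and uses quarter labels of the form "2008q1"; both formats are assumptions for illustration, and the prices are made up.

```python
import pandas as pd

# Toy frame in the assumed layout: one column per month, one row per city.
housing = pd.DataFrame(
    {
        "State": ["Texas", "Texas"],
        "RegionName": ["Austin", "Dallas"],
        "2008-01": [100.0, 200.0],
        "2008-02": [110.0, 210.0],
        "2008-03": [120.0, 220.0],
        "2008-04": [130.0, 230.0],
        "2008-05": [140.0, 240.0],
        "2008-06": [150.0, 250.0],
    }
).set_index(["State", "RegionName"])

# Map each monthly column to a quarter label, e.g. "2008-02" -> "2008q1".
periods = pd.PeriodIndex(housing.columns, freq="M")
quarter_labels = pd.Index([f"{p.year}q{p.quarter}" for p in periods])

# Transpose so months become rows, average the months within each quarter,
# then transpose back so each row is again a (State, RegionName) pair.
quarterly = housing.T.groupby(quarter_labels).mean().T
print(quarterly)
```

Grouping on the transposed frame keeps the [‘State’, ‘RegionName’] multi-index intact on the rows of the result.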
Lastly, I measured the decline or growth of housing prices between the recession start and the recession bottom, then ran a t-test comparing the university town values to the non-university town values. The test tells us whether we can reject the null hypothesis (that the two groups have the same mean) and gives the p-value. The function returns the tuple (different, p, better), where different=True if the t-test is significant at p<0.01 (we reject the null hypothesis) and different=False otherwise (we cannot reject the null hypothesis). The variable p is the exact p-value returned from scipy.stats.ttest_ind(). The value of better is either “university town” or “non-university town”, depending on which has the lower mean price ratio (which is equivalent to a reduced market loss).
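The comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the code from the repository: the function name run_ttest, the column labels "q_before" and "q_bottom", and all prices are hypothetical, and the university towns are assumed to be given as (State, RegionName) pairs that match the index of the quarterly data frame exactly.

```python
import pandas as pd
from scipy import stats


def run_ttest(quarterly, university_towns, before_start, bottom):
    """Return (different, p, better) for the price-ratio comparison."""
    # Price ratio: quarter before the recession start over the bottom.
    # A lower ratio means prices fell less.
    ratio = quarterly[before_start] / quarterly[bottom]

    # Split the rows by whether the (State, RegionName) pair is in the list.
    is_univ = ratio.index.isin(university_towns)
    univ = ratio[is_univ].dropna()
    non_univ = ratio[~is_univ].dropna()

    _, p = stats.ttest_ind(univ, non_univ)
    different = bool(p < 0.01)
    if univ.mean() < non_univ.mean():
        better = "university town"
    else:
        better = "non-university town"
    return different, float(p), better


# Made-up prices for illustration only.
index = pd.MultiIndex.from_tuples(
    [("Texas", "Austin"), ("Texas", "College Station"), ("Texas", "Lubbock"),
     ("Texas", "Dallas"), ("Texas", "Houston"), ("Texas", "El Paso")],
    names=["State", "RegionName"],
)
toy = pd.DataFrame(
    {"q_before": [100.0, 200.0, 150.0, 100.0, 200.0, 150.0],
     "q_bottom": [95.0, 192.0, 141.0, 80.0, 150.0, 125.0]},
    index=index,
)
univ_towns = [("Texas", "Austin"), ("Texas", "College Station"), ("Texas", "Lubbock")]
different, p, better = run_ttest(toy, univ_towns, "q_before", "q_bottom")
print(different, p, better)
```

With the real data, the state and region labels in the two sources have to be in the same form before the isin check will match anything.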
The result of the t-test shows that we can confidently say that college towns were more resilient against the “Great Recession” starting in 2008. Our p-value fell well below the .01 threshold. This means that the change in mean house price from the start of the recession to its bottom differed significantly between college towns and other cities.