We have heard a lot in recent years about so-called “big data.” Having spent my career doing research and creating products using data, I was intrigued to learn more about how big data might be used for commercial real estate.
Some people seem to confuse “big data” with a big database. But big data means much more than large quantities of data. Examples of what we mean by big data are as follows:
- Documents found on websites
- News media
- Review sites
- Blogs, discussion forums, etc.
What differentiates this data from more traditional sources is that it is largely unstructured and does not fit neatly into traditional databases. Different techniques are needed to use this type of data for analysis and for predicting trends (predictive analytics).
One approach to this type of data is to perform text mining of documents. Text mining usually involves searching the documents for occurrences of keywords, often in conjunction with the mention of a location. For example, how often does the word “crime” appear in a document that also mentions a given location? Doing this for locations around the country or the world can result in an index that turns out to be highly correlated with government measures of crime rates.
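As a rough illustration of the keyword-and-location idea, here is a minimal Python sketch. The function name `keyword_location_index`, the sample documents, and the place names are all hypothetical, and matching locations by simple substring is an assumption made for brevity; a real pipeline would need proper place-name recognition.

```python
import re
from collections import Counter

def keyword_location_index(documents, keyword, locations):
    """For each location, return the share of documents mentioning it
    that also mention the keyword (hypothetical helper for illustration)."""
    keyword_re = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    mentions = Counter()     # documents that mention the location at all
    co_mentions = Counter()  # documents that mention location AND keyword
    for text in documents:
        has_keyword = bool(keyword_re.search(text))
        lowered = text.lower()
        for loc in locations:
            # Assumption: a plain substring match stands in for real geotagging.
            if loc.lower() in lowered:
                mentions[loc] += 1
                if has_keyword:
                    co_mentions[loc] += 1
    return {loc: co_mentions[loc] / mentions[loc]
            for loc in locations if mentions[loc]}

# Made-up documents and place names, purely for demonstration.
docs = [
    "Crime is up again in Springfield this year.",
    "Springfield opens a new park downtown.",
    "Riverton farmers market returns.",
    "Police report a crime wave in Springfield.",
]
index = keyword_location_index(docs, "crime", ["Springfield", "Riverton"])
print(index)
```

Dividing by the number of documents that mention each location keeps heavily covered places from scoring high simply because more is written about them.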
Another example is examining review sites for certain words that indicate either good or bad things about a property, a location or even a person. This has been used to identify investment opportunities. For instance, a number of people saying bad things about an apartment building might indicate it is currently under poor management and may be a turnaround investment opportunity.
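The review-site idea can be sketched in the same spirit. Everything below is an assumption for illustration: the `NEGATIVE_TERMS` word list, the `flag_properties` helper, its thresholds, and the sample reviews are invented, and a simple word list is a crude stand-in for real sentiment analysis.

```python
import re
from collections import defaultdict

# Hypothetical list of words taken to signal a poorly managed property.
NEGATIVE_TERMS = {"broken", "dirty", "rude", "unresponsive", "mold"}

def flag_properties(reviews, min_reviews=3, threshold=0.5):
    """Return properties where at least `threshold` of the reviews
    contain a negative term, given at least `min_reviews` reviews."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for prop, text in reviews:
        words = set(re.findall(r"[a-z']+", text.lower()))
        totals[prop] += 1
        if words & NEGATIVE_TERMS:
            negatives[prop] += 1
    return sorted(
        prop for prop in totals
        if totals[prop] >= min_reviews
        and negatives[prop] / totals[prop] >= threshold
    )

# Made-up reviews of made-up buildings.
reviews = [
    ("Elm Court", "Management is rude and unresponsive."),
    ("Elm Court", "Broken elevator for weeks."),
    ("Elm Court", "Great location near the train."),
    ("Oak Plaza", "Clean, quiet, and friendly staff."),
    ("Oak Plaza", "Love the rooftop."),
    ("Oak Plaza", "Hallways were dirty once."),
]
flagged = flag_properties(reviews)
print(flagged)
```

A flagged building is only a lead to investigate, not a conclusion: the complaints might reflect a structural problem rather than management that a new owner could fix.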
A second approach to creating variables from internet-related data is Google search frequency, that is, how many people are searching for a certain term or terms. For example, one could query searches for apartments in the US by state.
This can be broken down further, perhaps as searches for apartments in Chicago.
There are many other techniques and methodologies for dealing with big data. The amount of data is growing exponentially each year, and it seems clear that there are opportunities to apply it to commercial real estate. The caveat is that one must consider carefully whether the data being used has some theoretical basis and isn’t just correlated with historical trends of interest.
Take, for example, the Google searches that are most highly correlated with housing starts. Only one of them seems to make sense theoretically: “real estate exam.” In statistics courses, we learn that “correlation does not imply causation.” Many artificial intelligence (AI) researchers just throw a lot of data into a model to see what comes out. This may work at first, but the model may quickly break down because it relies on data that is correlated only by coincidence.
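To see how easily coincidental correlations arise, consider the small simulation below. It is only a sketch on synthetic data: the “housing starts” series and the 500 candidate series are independent random walks generated here, not real Google or government figures, and the series length and count are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_walk(rng, length):
    """Cumulative sum of independent standard normal steps."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

rng = random.Random(0)  # fixed seed so the run is repeatable
MONTHS = 24

# Synthetic stand-in for a housing-starts series (not real data).
target = random_walk(rng, MONTHS)

# 500 series that, by construction, have nothing to do with the target.
candidates = [random_walk(rng, MONTHS) for _ in range(500)]

best_abs_r = max(abs(pearson(target, c)) for c in candidates)
print(f"Strongest |correlation| among 500 unrelated series: {best_abs_r:.2f}")
```

Because every candidate is generated independently of the target, any strong correlation the screen turns up is pure coincidence. That is the trap of picking variables by correlation alone.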
As many startup companies promise to give you new insights by using “big data” or techniques such as artificial intelligence, it will be important to distinguish the companies that are just hype from those whose understanding of commercial real estate results in good models.