By Brian Blum; reposted with permission from AIM Group’s Classified Intelligence Report.
Deep Varma knows big data. As vice president of data engineering at U.S. real estate portal Trulia, the Silicon Valley veteran computer scientist oversees the management of 1.5 terabytes of data every single day.
The term “big data” is popping up everywhere these days, Varma told the AIM Group, but to understand what it means in a tangible way, one need only look at a site like Trulia.
Challenge lies in collecting huge scope of data
When Trulia recommends “similar properties,” that is big data at play.
Generating a specific set of suggestions involves combining what Trulia has tagged about a property — everything from the “easy” data like number of bedrooms and bathrooms down to the nitty-gritty tech specs such as the type of marble on the kitchen island or faucet design — with user behavior on the site.
Trulia tracks what it calls consumer “intent,” Varma said. In just a few minutes of engagement on trulia.com, a visitor will “generate an average of 18 to 20 events — or signals — about their intent.” This includes what images they’ve looked at — did they poke around the closets or inspect the size of the kitchen cabinets? — as well as data external to the property itself, such as neighborhood crime scores or local school ratings.
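For readers who want the idea in concrete terms, here is a minimal Python sketch of how property tags and a visitor's intent events might be combined into a ranking of similar listings. It is illustrative only: the event format, the tag names and the scoring rule are all assumptions made for this example, not a description of Trulia's actual system.

```python
from collections import Counter

def intent_weights(events):
    """Count how often each tag appears across a visitor's events."""
    return Counter(tag for event in events for tag in event["tags"])

def rank_similar(listings, events):
    """Rank listings by the summed weight of tags the visitor engaged with."""
    weights = intent_weights(events)
    scored = [
        (sum(weights[tag] for tag in listing["tags"]), listing["id"])
        for listing in listings
    ]
    # Highest score first; ties are broken by listing id for a stable order.
    return [lid for score, lid in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical events and listings, invented for this sketch.
events = [
    {"type": "image_view", "tags": ["kitchen", "marble_island"]},
    {"type": "image_view", "tags": ["kitchen"]},
    {"type": "score_view", "tags": ["good_schools"]},
]
listings = [
    {"id": "A", "tags": ["kitchen", "marble_island", "3_bed"]},
    {"id": "B", "tags": ["good_schools", "2_bed"]},
    {"id": "C", "tags": ["pool"]},
]
ranked = rank_similar(listings, events)
```

In this toy version, a listing scores higher the more its tags match what the visitor has looked at. A production system would presumably weigh signals far more carefully.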
An in-house ‘personalization hub’
Varma and his team of 50 Big Data specialists at Trulia have built an in-house “personalization hub” to serve up the right content to visitors.
“Personalization is when we show you similar properties in your price range in the particular neighborhood you’re looking at,” Varma said. “Individualization takes a broader look — if you’re looking for certain types of schools, we can show you similar properties but in different neighborhoods.”
All that sounds simple enough, but when you have millions of monthly visitors all generating dozens of “events,” coupled with the 4 million listings that Trulia processes every day plus another 10 million public records, it is easy to see why Big Data is big business.
External data, such as public records, are just as valuable
Data at Trulia falls into two core sets, Varma said, with separate teams processing the information (the Trulia and Zillow brands remain completely siloed under the Zillow Group corporate umbrella).
The first dataset comprises listings and public records. Varma called listings a “commodity item” — most are provided to Trulia via feeds from MLSs, brokers and agents.
Public records are trickier. These are the deeds, taxes and assessment data that give visitors to Trulia the historical perspective they need to understand a property’s true value.
There are more than 3,000 counties in the U.S., but no standards across counties or even between different types of public records. So data schema, format and accessibility can be wildly different.
Standardizing addresses is undoubtedly “the biggest problem we face,” Varma said.
One misassignment “can screw up all our unique insights.” Trulia built its own tools for address standardization — it’s definitely not something off-the-shelf, Varma said proudly. (For techies, it involves using an open standard format called JSON, derived from JavaScript.)
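To make the problem concrete, here is a small Python sketch of address standardization that emits JSON. It is a toy under stated assumptions: the abbreviation table, the field names and the parsing rule are hypothetical, and Trulia's in-house tools are not public.

```python
import json
import re

# Hypothetical abbreviation table; a real one would be far larger.
SUFFIXES = {"street": "ST", "st": "ST", "avenue": "AVE", "ave": "AVE",
            "road": "RD", "rd": "RD"}

def standardize(raw):
    """Turn a free-form street address into a canonical JSON string."""
    # Replace punctuation with spaces, then split on whitespace.
    words = re.sub(r"[^\w\s]", " ", raw).split()
    # Map known suffixes to one spelling and uppercase everything else.
    words = [SUFFIXES.get(w.lower(), w.upper()) for w in words]
    record = {"number": words[0], "street": " ".join(words[1:])}
    # sort_keys gives a stable serialization, so equal addresses compare equal.
    return json.dumps(record, sort_keys=True)

a = standardize("123 Main Street")
b = standardize("123  main st.")
```

Both spellings produce the same JSON string, which is what lets a listing, a deed and a tax record for the same house be matched to one another. Real addresses are much messier (unit numbers, directionals, "St" meaning "Saint"), which is why a single misassignment is such a concern.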
Into this mix of text listings and records, Trulia adds pictures (“we’ve built our own image recognition technology that can tag a kitchen, bathroom or front yard as such, as well as an object recognition technology that can do the same for dishwashers and stainless steel stoves,” Varma said) and “location aware data” from external sources, such as local amenities and school rankings.
Using the standardized addresses, everything is linked together before it’s merged, indexed and run through Trulia’s Data Service API, which makes the resulting content searchable by visitors to Trulia.com. And of course, it all has to happen in near real time.
On the consumer behavior side, Trulia uses “deep data science and machine learning” in order to build “a digital signature.” Providing the same experience whether a visitor is registered and logged in or not is critical to Varma.
“Our goal is to help consumers make the best decisions. We don’t need their names for that. In fact, the majority of our users are anonymous. We’re not on a path of pushing a person into a funnel towards an agent. It’s up to the consumer. They can do that when they’re ready,” Varma said.
The future is virtual reality and AI
What is the future for big data at Trulia? Varma expects virtual reality to catch on and become another even more valuable source of Big Data to be processed. If today Trulia can track user “events” generated by a click on an image of a refrigerator or the en-suite bathroom, imagine what happens when that same user straps on a pair of VR goggles at home and virtually walks around a home for sale. Every touch, every linger and glance can — and will — be added to the shopper’s digital signature, Varma said.
Before virtual reality goes mainstream, though, the term big data may fall out of use.
“Today, people think, oh my God, it must be a big thing, since the name starts with big. But in the next two to three years, it will become an integral part of every consumer business,” Varma said.
In the future, we may be talking more about the Internet of Things, which overlaps heavily with big data, or we may simply call it “artificial intelligence.”
Whatever name is used, big data is here to stay, and so is Deep Varma, Trulia’s big data daddy.
© 2016 Advanced Interactive Media Group LLC / Classified Intelligence, reprinted with permission