How an online real estate company optimized its Hadoop clusters
San Francisco-based online residential real estate company Trulia lives and dies by data. To compete successfully in today's housing market, tt must deliver the most up-to-date real estate information available to its customers. But until recently, doing so was a daily struggle.Acquired by online real estate database company Zillow in 2014 for $3.5 billion, Trulia is one of the largest online residential real estate marketplaces around, with more than 55 million unique site visitors each month.Hadoop at heart With so much data to store and process, the company adopted Hadoop in 2008 and it has since become the heart of Trulia's data infrastructure. The company has expanded usage of Hadoop to an entire data engineering department consisting of several teams using multiple clusters. This allows Trulia to deliver personalized recommendations to customers based on sophisticated data science models that analyze more than a terabyte of data daily. That data is drawn from new listings, public records and user behavior, all of which is then cross-referenced with search criteria to alert customers quickly when new properties become available.To read this article in full or to leave a comment, please click here