Inman

Getting a grip on real estate data scraping

Editor’s note: In this four-part series, Inman News looks at real estate security in the Internet era. It’s no longer just a matter of keeping clients’ home keys in a safe place. The Web has opened up vulnerabilities to data scraping, and new responsibilities for consumer privacy, MLS passwords and lockbox pass codes. (See Part 1: Real estate industry steps up MLS security; Part 2: Keep real estate clients’ private info private; and Part 4: Security gurus spread the word.)

These days, multiple listing services house a lot more than just home listings data. Some allow members to store all their past clients’ contact information, and some include home showing instructions with such clues as “Don’t show home between 3 p.m. and 5 p.m. – children home alone” or “Owner overseas until Oct. 12.”

Gregg Larson, CEO of Clareity Consulting, which performs security audits for real estate companies, pointed to a situation in New York in which someone had been using unauthorized access to an MLS to find homes in which to hold rave parties.

This scenario, coupled with recent legal opinions that say MLS executives and board members can be held liable for the misuse of their database information, serves as a loud wake-up call to real estate companies across the industry, experts say.

While the Internet has become an essential medium for real estate information exchange, the ease of access poses a security challenge for industry professionals. Real estate companies are now looking at ways to safeguard property listings and sensitive client information from a variety of vulnerabilities.

One problem that’s been causing angst for real estate brokers since listings data was unleashed on the Web is data scraping, which uses automated programs to strip information from public Web sites.

Data scraping of all kinds is common across the Web, and some companies even specialize in data extraction services or provide code for creating data-scraping programs. Data scraping in most cases is 100 percent legal, according to experts, and scrapers in many instances can remain anonymous.
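To illustrate why listings published as plain HTML are so easy to strip, here is a minimal sketch of an automated scraper built only on Python's standard library. The sample page, the class names "listing", "address" and "price", and the `ListingScraper` helper are all hypothetical; real listing pages are laid out differently.

```python
from html.parser import HTMLParser

# Hypothetical listing markup, assumed for illustration only.
SAMPLE_PAGE = """
<div class="listing">
  <span class="address">12 Example St.</span>
  <span class="price">$450,000</span>
</div>
<div class="listing">
  <span class="address">34 Sample Ave.</span>
  <span class="price">$615,000</span>
</div>
"""


class ListingScraper(HTMLParser):
    """Collects the text of <span> tags whose class is 'address' or 'price'."""

    FIELDS = {"address", "price"}

    def __init__(self):
        super().__init__()
        self.listings = []   # one dict per <div class="listing">
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "listing":
            self.listings.append({})
        elif tag == "span" and cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the tracked spans.
        if self._field and self.listings:
            self.listings[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Return every listing found in the page as a list of dicts."""
    parser = ListingScraper()
    parser.feed(html)
    return parser.listings
```

Calling `scrape(SAMPLE_PAGE)` returns one dictionary per listing with its address and price. Nothing in the sketch needs a password or a browser, which is the point the experts make about public pages.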

“You’re not going to stop the distribution of listings data on the Web – once it’s out there it’s going to be scraped,” said Ira Luntz, CEO of Threewide, which provides technology products to help MLSs export and track data.

Since the listings genie is already out of the bottle with data available for public view all over the Internet, Threewide has approached the problem by creating programs to identify the data in specific ways so companies will be able to find out where their data was exported.

“In one button, the association can track who got what data,” Luntz said.

In March, Threewide launched ListSecure, a product that pushes data from MLS systems to registered users using controlled, monitored and audited methods. The process includes end-user registration, data encryption, one-time-use data files, embedded data tagging and image watermarking unique to each export, Luntz said.

Threewide moves more than 80 million MLS listings and 30 million images through its pipelines each month, according to Luntz. The company’s ListAndSend product suite enables MLSs to move their data and images back and forth among a variety of sources, controlling who receives the data and when they will receive it.

Threewide does not track the MLS data, Luntz noted, but its software includes a data-tagging element that enables the MLS to track it. Data sent through Threewide also is encrypted for added security.
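The general idea of tagging each export so it can be traced later can be sketched as follows. This is not Threewide's implementation, which the article does not describe in detail; the secret key, the `export_tag` field and the function names are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held by the MLS and never shipped with the data.
SECRET_KEY = b"mls-demo-secret"


def export_tag(recipient_id, export_id):
    """Derive a short tag that is unique to one recipient and one export."""
    message = f"{recipient_id}:{export_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:12]


def tag_export(listings, recipient_id, export_id):
    """Return copies of the listings with the export tag embedded."""
    tag = export_tag(recipient_id, export_id)
    return [{**listing, "export_tag": tag} for listing in listings]


def trace_export(found_tag, export_log):
    """Find which logged (recipient, export) pair produced a tag seen in the wild."""
    for recipient_id, export_id in export_log:
        if hmac.compare_digest(export_tag(recipient_id, export_id), found_tag):
            return recipient_id, export_id
    return None
```

If a tagged record later turns up on an unauthorized site, `trace_export` can match its tag against the log of exports. A visible field like this one is easy to delete, so a real product would presumably hide its tags less obviously, for instance inside watermarked images.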

It’s not just MLSs and Realtor associations starting to ramp up data security efforts. Luntz said brokers also are feeling the heat to take these extra steps with their MLS data. “We’re getting inquiries from brokerage companies because they want the ListSecure product,” he said.

Brokers may view data security as an extra step they can take to show clients how they will handle their sensitive information. And some Realtor associations and MLSs are using security technologies as promotional offerings for members as well, so they know their data is being tagged, Luntz said.

The National Association of Realtors’ Center for Realtor Technology also has developed two technologies to help members protect their listings data from being scraped off the Web – NoScrape and reCaptcha. Both technologies are modeled after those used in the financial and ticket industries, according to the CRT.

NoScrape uses rendering, an approach that differentiates data from information, according to NAR’s Web site. Rendering generates an image containing the combined data and image so that “bots” – robots that extract data and images from Web sites – can’t easily strip the data from the HTML code.
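The rendering idea can be shown with a small sketch that draws a listing price as pixels and writes it out as a PNG image, so the digits never appear as text in the page source. This is not NAR's NoScrape code; the tiny 3-by-5 digit font and the function names are hypothetical, and a production setup would use an imaging library with real fonts.

```python
import struct
import zlib

# Hypothetical 3x5 pixel glyphs for digits; each string is one row, '1' = ink.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def rasterize(digits):
    """Turn a digit string into rows of 0/1 pixels, one blank column between glyphs."""
    rows = []
    for y in range(5):
        row = "0".join(FONT[ch][y] for ch in digits)
        rows.append([int(px) for px in row])
    return rows


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    body = kind + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))


def to_png(rows):
    """Encode 0/1 pixel rows as an 8-bit grayscale PNG (ink is black)."""
    height, width = len(rows), len(rows[0])
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(0 if px else 255 for px in row) for row in rows
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), defaults.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def render_price(price_digits):
    """Return PNG bytes showing the given digits, e.g. render_price('450000')."""
    return to_png(rasterize(price_digits))
```

A page would then show the image in place of the number. A bot reading the HTML finds only an image reference, though one equipped with character recognition could still recover the digits, which is why the approach makes stripping harder rather than impossible.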

ReCaptcha can identify a party trying to access a Web site as a human or a computer program by generating questions only a human can answer correctly. For example, reCaptcha will display a distorted image of a word and ask the site user to correctly type the word into a specified area.
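The server side of such a challenge can be sketched in a few lines. This is a generic illustration, not the reCaptcha service's interface; the `ChallengeStore` class and its methods are hypothetical, and the step that distorts the word into an image is left out.

```python
import hashlib
import hmac
import secrets


class ChallengeStore:
    """In-memory record of outstanding challenges (a real site would persist these)."""

    def __init__(self, words):
        self._words = list(words)
        self._pending = {}  # challenge_id -> SHA-256 digest of the expected answer

    def issue(self):
        """Pick a word and return (challenge_id, word).

        The caller would show the word only as a distorted image,
        never as text in the page.
        """
        word = secrets.choice(self._words)
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = self._digest(word)
        return challenge_id, word

    def verify(self, challenge_id, typed):
        """Check a typed answer. Each challenge is single-use, pass or fail."""
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, self._digest(typed))

    @staticmethod
    def _digest(text):
        # Ignore case and surrounding spaces so honest typists are not rejected.
        return hashlib.sha256(text.strip().lower().encode()).hexdigest()
```

Removing each challenge after one attempt keeps a program from guessing repeatedly against the same image.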

The Center for Realtor Technology also has created a management guide on protecting listings, and released a series of white papers in 2004 on how to defeat scraping.

Though technology can go a long way in tracking where data goes and keeping automated scraping programs from easily obtaining information, experts say company policies and education are also essential to data security.

Matt Cohen, chief technologist with Clareity Consulting, suggests real estate companies create a “terms of use” policy that specifies how the information on their Web sites can and cannot be used.

Also, “license agreements with (MLS) members should state specifically what use can be made of data,” Cohen said.

He also suggests that companies put coding practices in place to prevent coding errors that make it easy to scrape the site later, educate members on data use rules, implement processes to monitor data use compliance, encourage members to report suspected non-compliance and enforce data use rules.

“The biggest challenge to data scraping is that the data is often easily downloaded from the MLS and misused or put on agent or broker Web sites with little protection,” Cohen said. “The agent’s ‘Joe Webmaster’ may not have the resources to properly protect the content they are exposing on the Internet.”

***

Send tips or a Letter to the Editor to jessica@inman.com or call (510) 658-9252, ext. 133.