We had a problem. Kind of a big one.
At Faraday we pull in data from dozens of sources, but most of it starts with an address; that's our fuzzy, imperfect primary key.
Many of our customers come from the solar industry, and for those folks we pull in a unique data source: Lidar-derived roof structure information for 27 million households. Unfortunately, these records were provided with just a set of lat/lon coordinates. It's possible to match them to our other sources using spatial data alone, but an address is better: spatial matching has to contend with multi-unit dwellings, geocoding accuracy, proximity thresholds, and so on.
We were missing our primary key, and in order to get it we'd need to reverse geocode the whole batch.
After getting a few eye-popping quotes from service providers, we tried to do it in-house; there are a few available reverse geocoding servers that rely on linear referencing (estimating a house number from a point's position along a street segment) to do the job. But in an unacceptable number of cases they produced addresses that didn't exist or were positioned incorrectly.
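To see why linear referencing can yield addresses that don't exist, here is a minimal sketch of the general idea. It is not the code of any particular geocoding server: the function name, the planar coordinates, and the house-number range are all assumptions made up for illustration.

```python
def interpolate_house_number(seg_start, seg_end, from_number, to_number, point):
    """Estimate a house number by projecting a point onto a street segment.

    Assumes planar (x, y) coordinates and a number range that runs linearly
    from `from_number` at seg_start to `to_number` at seg_end, with both ends
    on the same side of the street (same parity).
    """
    (x1, y1), (x2, y2) = seg_start, seg_end
    px, py = point
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return from_number  # degenerate segment: nothing to interpolate

    # Fraction of the way along the segment, clamped to its endpoints.
    t = ((px - x1) * dx + (py - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))

    # Step in twos so the result keeps the parity of from_number.
    steps = round(t * (to_number - from_number) / 2)
    return from_number + 2 * steps


# Hypothetical block: the range says 100-198, but only four houses exist.
real_house_numbers = {100, 120, 160, 198}
estimate = interpolate_house_number((0, 0), (10, 0), 100, 198, (4, 1))
print(estimate, estimate in real_house_numbers)
```

The estimate is a perfectly plausible number for the block, yet nothing guarantees that a building actually carries it.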
That's when we turned to the OpenAddresses project, and to Mapzen's excellent API built on top of it. With some help from their engineers and a batch-reverse-geocoding library we wrote, we were able to tag more than 20 million of our rooftop records with valid, formatted, current addresses. We couldn't have done any better by driving a fleet of cars around with cameras and GPS units.
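The appeal of point-based address data is that every answer is an address somebody actually recorded. The sketch below illustrates that idea with a brute-force nearest-point lookup; it is neither our batch library nor Mapzen's API, and the function names, the 50-meter cutoff, and the sample addresses are assumptions invented for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_address(lat, lon, address_points, max_distance_m=50.0):
    """Return the closest known address within max_distance_m, or None.

    `address_points` is an iterable of (address, lat, lon) tuples. A linear
    scan is fine for a sketch; a real batch job would want a spatial index.
    """
    best, best_distance = None, max_distance_m
    for address, alat, alon in address_points:
        distance = haversine_m(lat, lon, alat, alon)
        if distance <= best_distance:
            best, best_distance = address, distance
    return best


# Made-up address points, for illustration only.
points = [
    ("12 Example St", 44.4760, -73.2120),
    ("14 Example St", 44.4762, -73.2120),
]
print(nearest_address(44.47601, -73.21201, points))
print(nearest_address(45.0, -73.0, points))
```

Returning nothing when no point falls within the cutoff is a deliberate choice here: a missing address is easier to deal with downstream than a wrong one.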
Now our roof structure data is matched to all our other providers with the highest level of accuracy. We owe a debt of gratitude to the citizens who fund open data projects like these, to the OpenAddresses crew for knitting them all together, and to Mapzen for turning the result into a brilliant API.