Deep dive: FIG v2 historical data and what it means for your models
FIG v2 adds historical timelines to consumer data, enabling far more accurate predictive models by training on pre-purchase profiles instead of post-purchase ones.


If you've been following our FIG v2 series, you already know about the overhauled data catalog and the broader announcement of what's coming. Today I want to go deep on what I think is the most technically significant upgrade in the entire release: historical data.
What "historical data" actually means
The Faraday Identity Graph has always given you an up-to-date snapshot of every US adult—what we know about them right now. FIG v2 changes that fundamentally. Instead of a single current value for each attribute, we have the full history of values going back many years, each one timestamped and annotated with precision and source metadata.
Think of it this way: rather than knowing that someone's income is $85,000, you now know that their income was $62,000 five years ago, $74,000 three years ago, and $85,000 today. Each of those data points has a timestamp, a confidence level, and information about where it came from. For you, this means every data attribute is no longer a single point-in-time number, but a timeline of historical observations.
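To make that concrete, here's a minimal sketch of an attribute as a timeline rather than a single value. The field names and figures here are illustrative assumptions, not Faraday's actual FIG v2 schema:

```python
from datetime import date

# Hypothetical illustration: one attribute as a list of timestamped
# observations. Field names are invented for this sketch.
income_history = [
    {"value": 62000, "observed_on": date(2020, 3, 1), "confidence": 0.82, "source": "source_a"},
    {"value": 74000, "observed_on": date(2022, 6, 15), "confidence": 0.88, "source": "source_b"},
    {"value": 85000, "observed_on": date(2025, 1, 10), "confidence": 0.91, "source": "source_a"},
]

def value_as_of(history, as_of):
    """Return the most recent observed value on or before `as_of`."""
    eligible = [obs for obs in history if obs["observed_on"] <= as_of]
    if not eligible:
        return None
    return max(eligible, key=lambda obs: obs["observed_on"])["value"]

print(value_as_of(income_history, date(2023, 1, 1)))  # → 74000
```

The key idea is that any past date can be answered with the value that was current at that time, which is exactly what the modeling discussion below relies on.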
This is possible because Faraday has been a careful steward of consumer data for over a decade. We've been collecting, normalizing, and preserving these observations the whole time. FIG v2 is the first time we've been able to surface all of that accumulated history in a structured, ML-ready format.
Why this matters most for modeling
The biggest immediate payoff from historical data is better predictive models. To understand why, it helps to think through how model training actually works.
Let's start with a propensity-to-buy model (i.e., a lead score model). When you build that model, you feed the algorithm historical examples: leads that converted and leads that didn't. The algorithm finds patterns—the characteristics that consistently differentiated the converters from the non-converters—and encodes those patterns into a model. Later, when new leads come in, the model applies those patterns to score each lead on how likely it is to buy.
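As a deliberately crude sketch of that loop, here's a toy "model" that learns the average profile of converters versus non-converters and scores new leads by which profile they resemble. The data and features are invented, and a real propensity model would use a far richer algorithm:

```python
# Toy propensity sketch: learn what separates converters from
# non-converters, then score new leads against those patterns.
# Each historical lead: (income, years_at_address). Data is invented.
converted = [(85000, 2), (91000, 1), (78000, 3)]
not_converted = [(52000, 9), (48000, 12), (60000, 8)]

def centroid(rows):
    """Average profile of a group of leads."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def score(lead, pos, neg):
    """Crude nearest-centroid score: closer to 1 means more converter-like."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return dist(lead, neg) / (dist(lead, pos) + dist(lead, neg))

pos, neg = centroid(converted), centroid(not_converted)
print(score((88000, 2), pos, neg))   # close to 1: resembles past converters
print(score((50000, 10), pos, neg))  # close to 0: resembles non-converters
```

The point of the sketch is the dependency it exposes: whatever the training examples look like is what the model will go looking for, which is exactly why the timing of those examples matters so much.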
Here's the problem with a model trained without historical data: it learns what a person looks like after they purchased, not at the actual time of purchase. Take a home remodeling brand trying to predict which prospects are likely to purchase new windows. To train that model, you feed it historical examples: people who bought windows and people who didn't. But here's the catch: a person looks very different in the data after buying windows than they did before. After the purchase, there's a contractor transaction on their record, a home equity line of credit may have appeared, and their home improvement spend has spiked. None of those signals were there when they were still a prospect. If your training data captures people as they look after converting rather than just before, the model learns the wrong patterns entirely, and the prospects it finds will be people who match a post-purchase profile, not a pre-purchase one.
Until now, this was a problem we could only partially solve. If a brand had leads going back three years, we could append current FIG data to those leads, but "current" data for a lead from three years ago is not the same as that person's data at the time they became a lead. A lot has changed in three years.
With FIG v2, we can press rewind and assemble a training dataset based on what they looked like just before they submitted the lead form and what they looked like after they purchased. The model learns what a genuine pre-purchase prospect actually looks like—and gets dramatically better at finding more of them.
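This "press rewind" step is what data scientists call a point-in-time (or "as-of") join. Here's a hedged sketch of how one could be assembled with pandas; the table layouts and column names are invented for illustration:

```python
import pandas as pd

# Hypothetical sketch: give each lead the attribute value that was
# current just before the lead was created, not today's value.
leads = pd.DataFrame({
    "person_id": [1, 2],
    "lead_date": pd.to_datetime(["2022-08-01", "2023-02-15"]),
})
income_history = pd.DataFrame({
    "person_id": [1, 1, 2, 2],
    "observed_on": pd.to_datetime(["2020-03-01", "2022-06-15", "2021-01-10", "2023-05-01"]),
    "income": [62000, 74000, 50000, 58000],
})

# merge_asof requires both frames sorted on their time keys.
training = pd.merge_asof(
    leads.sort_values("lead_date"),
    income_history.sort_values("observed_on"),
    left_on="lead_date",
    right_on="observed_on",
    by="person_id",
    direction="backward",  # latest observation at or before lead_date
)
print(training[["person_id", "lead_date", "income"]])
```

Note what happens to person 2: their $58,000 observation from May 2023 is excluded because it postdates the lead, so the model only ever sees what was knowable at lead time. That exclusion is precisely the leakage fix described above.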
A first-of-its-kind resource for data science teams
Better internal modeling is the most immediate benefit, but FIG v2's historical dataset also opens up something that hasn't really existed before commercially: a comprehensive, historical consumer dataset that's fully ML-ready out of the box.
Every historical observation comes with the kind of metadata that data scientists actually need—data types, statistical types, directionality guidance, coverage breakdowns, and more. If your team is building your own internal models and wants to incorporate rich consumer context that goes back in time, this is the dataset to do it with.
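To show how that metadata gets used in practice, here's a small sketch of a per-attribute metadata record driving a feature-handling decision. The field names and values are hypothetical, not Faraday's actual catalog schema:

```python
from dataclasses import dataclass

# Hypothetical per-attribute metadata record; fields are illustrative.
@dataclass
class AttributeMetadata:
    name: str
    data_type: str         # e.g. "integer", "categorical"
    statistical_type: str  # e.g. "continuous", "nominal"
    directionality: str    # guidance on how to read higher values
    coverage: float        # fraction of the population with a value

meta = AttributeMetadata(
    name="household_income",
    data_type="integer",
    statistical_type="continuous",
    directionality="higher values indicate higher income",
    coverage=0.93,
)

# Example use: pick a preprocessing strategy from the statistical type,
# rather than guessing it from the raw values.
encoding = "standardize" if meta.statistical_type == "continuous" else "one_hot"
print(encoding)  # → standardize
```

Having this information attached to each attribute is what makes a dataset "ML-ready": the preprocessing decisions can be made programmatically instead of by inspecting every column by hand.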
What's coming next: change detection
We're also excited about something that historical data makes possible further down the road: monitoring when a person's profile changes in a meaningful way.
The most valuable signal isn't always the value of an attribute—it's the fact that the attribute just changed. Someone who just became a new homeowner, or just had their first child, or just crossed an income threshold is a fundamentally different prospect than someone who has been in that state for years. Life events create moments of opportunity, and right now, most brands have no reliable way to know when those moments arrive.
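As a hedged sketch of what change detection over an attribute timeline could look like (the data, threshold, and function are invented for illustration, not a description of the capability we're building):

```python
from datetime import date

# Invented income timeline for one person: (observation date, value).
history = [
    (date(2021, 4, 1), 68000),
    (date(2023, 9, 1), 72000),
    (date(2025, 2, 1), 91000),
]

def detect_crossing(history, threshold):
    """Return the date the value first crossed the threshold, if ever."""
    prev = None
    for observed_on, value in sorted(history):
        if prev is not None and prev < threshold <= value:
            return observed_on
        prev = value
    return None

print(detect_crossing(history, 80000))  # → 2025-02-01
```

The interesting output isn't the current value, it's the transition and its date: that's the "moment of opportunity" the paragraph above describes.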
Having a full historical vector for every attribute makes this kind of change detection possible. We'll be sharing more about how we're building that capability in Q2.
Try it this month
FIG v2 is in beta now and releasing this month. If you want to get your hands on the historical data and see how it affects your model performance, reach out to your account management team. And keep an eye out—we have more deep dives coming before the full release.

Andy Rossmeissl
Andy Rossmeissl is Faraday’s CEO and leads the product team in building the world’s leading context platform. An expert in the application of data analysis and machine learning to difficult business challenges, Andy has been running technology startups for almost 20 years. He attended Middlebury College and lives with his wife in Vermont where he lifts weights, makes music, and plays Magic: the Gathering.