Preparing your data for seamless ingestion - a marketer's overview
To deploy predictive models effectively with Faraday, clients need to start with clean, well-structured data that aligns with their business objectives, ensuring smooth ingestion and accurate predictions. This blog serves as a high-level introduction to this process.


This post is part of a series called Getting started with Faraday that helps to familiarize Faraday users with the platform
When brands partner with Faraday, the goal is clear: deploy predictive models as quickly and effectively as possible to drive business ROI. But before any predictions can happen, the foundation has to be set, and that means getting the right data into the platform.
For marketers without a dedicated data scientist on their team, this step can feel like a hurdle. What data does Faraday need? In what format? How can you ensure a smooth process that avoids issues with matching and model performance?
These are big questions, and answering them correctly could be the difference between actionable predictions in a week or two versus a much slower and less efficient process.
Start with the big picture
Before diving into spreadsheets and data exports, take a step back. The first question we ask any new client is: What problem are you trying to solve? Your goal determines the data you’ll need. For example:
- Are you looking to improve conversion rates? You’ll need historical data on past conversions.
- Want to identify high-value customers? Transactional data will be key.
- Need to optimize outreach? Engagement metrics and response data should be included.
Knowing your objective up front ensures that the data you provide aligns with your business needs.
So what data do we need? The best answer is probably to give us access to *everything—*or at least, as much as is feasible for your company. While some data points are necessary to make the right predictions, all of them can help us to build a better and more holistic understanding of your existing customer base and the best potential customers you could reach within the US population.
So more IS better, but formatting matters
So we’ll just give Faraday all our data as is and let them figure it out?
So, the answer is both yes and no. While more data is generally better, it’s not just about volume—it’s about data quality. Brands without an established system to ensure data hygiene may encounter issues like malformed addresses (e.g., missing state or zip code), missing customer details (e.g. no address at all), or improperly formatted columns (e.g., first and last names in one field). While Faraday’s team can spot these issues early on, working with clean, well-structured data from the start will drastically speed up the process, leading to faster, more accurate predictive models.
If you're facing challenges with data hygiene, don’t hesitate to reach out. In addition to our standard data cleanup, we offer more robust professional services upon request, with a variety of packages designed to help you wrangle your datasets. Even if your data isn’t perfect, we’ll help you identify and fix any issues, setting you up for quicker results.
So what data should I include?
At a minimum, the following fields should be included to ensure proper matching and predictive accuracy:
- Customer identifiers: Name, email, phone, and address (as complete as possible)
- Leads: Name and the date that someone became a lead
- Conversions (transaction-level preferred): Dates of individual transactions, along with product details and amounts (if applicable)
- Engagement metrics: Website visits, email opens, and interactions
Customer-Level vs. Transaction-Level Data
While you can provide customer-level conversion data (e.g., first or last transaction date), we usually do not need it and cannot use it for a holdout test. Instead, transaction-level data—with a date for each transaction—is preferred, as it enables more accurate predictive modeling.
And if you're unsure where to start, focus on conversion rates. Defining how your brand measures conversion—whether through completed purchases or booked appointments—helps us create more accurate predictive models and simplifies the entire process.
Where does your data live?
Once you’ve identified the right data, the next step is understanding where it lives. Is it spread across multiple systems like a CRM, e-commerce platform, and customer support database? Do you have easy access to it and permissions to share it externally, or will you need internal approvals? For enterprise brands, pulling data may require specific requests within your organization. Understanding these logistics early on can prevent delays.
In the best case, your data lives in an organized data warehouse like Snowflake, Bigquery, or Redshift. This makes ingress easy, as Faraday is a fully composable platform that can integrate with existing data warehouses, functionally sitting on top of them as a decisioning layer, and then automatically routing predictions to channels like meta ads, Shopify, or many others. To learn more about our available integrations, take a look at our integrations landing page.
If your data isn’t stored in a warehouse, you can upload individual CSV files through our dashboard, but if you do make sure to pay attention to your data hygiene!
Data hygiene, what’s that?
Data hygiene is the practice of keeping your data clean, structured, and accurate to ensure the best possible outcomes when using predictive modeling and AI tools like Faraday. Poorly formatted, incomplete, or inconsistent data can lead to mismatches, errors, and weaker insights. Clean data doesn’t just make processing smoother—it directly impacts the quality of predictions and business decisions, and this is the single most important thing to pay attention to when importing your data into Faraday!
The first step to good data hygiene is ensuring consistency in formatting. Datapoints such as names, addresses, and dates should follow a structured format, ideally with first and last names in separate columns and dates consistently standardized in a format like YYYY-MM-DD (ISO 8601). Additionally, emails and phone numbers should be complete (no 7 digit phone numbers or filler values like “noemail@none.com”) to enable seamless enrichment with third-party attributes in the Faraday Identity Graph. Clean PII helps ensure the best possible enrichment rates for model performance.
Properly labeled and complete fields prevent confusion and make it easier to match records correctly whereas missing or incomplete records can significantly weaken predictive accuracy.
Despite the fact that data gaps can be somewhat mitigated by aggregated scoring, better results are always seen when providing a sound dataset from the get-go. If certain data points are unavailable, we can still make predictions, but providing as much information as possible—especially time-bound event data—will dramatically improve our ability to analyze your data quickly and accurately.
So in conclusion, taking the time to check, clean and structure your data before ingesting it into Faraday will save time, prevent headaches, and lead to stronger, more reliable predictions. If you have questions about how to prepare your data, don’t hesitate to reach out to your Strategic Account Manager or Faraday support.
And speaking of our supportive team…
Leveraging forward-deployed data science (FDS) for a faster start
Getting your data into Faraday doesn’t have to be a challenge—we offer multiple options to help you onboard seamlessly. Enterprise clients can access Forward-Deployed Data Science (FDS) for hands-on guidance, while all clients benefit from professional services, dedicated support, and self-serve resources to streamline data ingestion.
Our team helps eliminate bottlenecks by structuring and preparing data, optimizing predictive models, and providing strategic guidance. Whether you need help standardizing fields, refining lead suppression strategies, or aligning predictions with business goals, we ensure your models are built on a strong foundation.
For enterprise clients, FDS embeds our experts directly into your workflow, continuously refining models and adapting strategies to market shifts. No matter your level of support, we’re here to help you get started quickly and drive measurable impact from day one.
If you’d like to learn more about this available resource, take a look at our recent blog post detailing our decision to forward deploy our data science team.
Final checklist: Setting your team up for success
To make the handoff between your business and technical teams easier, follow this checklist:
✅ Clearly define your business objective
✅ Identify where relevant data lives
✅ Ensure core customer identifiers are included
✅ Provide date fields for key events
✅ Clarify how conversion rates are measured
✅ Work with FDS to streamline the process
By following these steps, you’ll avoid common pitfalls, speed up deployment, and get to actionable predictions faster—maximizing the impact of AI-driven insights on your business.
In conclusion
Getting your data into Faraday doesn’t have to be complex. With the right preparation and support, you can streamline the process, move quickly from data ingestion to predictive modeling, and unlock actionable insights faster.
If you’re ready to learn more or have any questions about preparing your data for Faraday, don't hesitate to reach out. Or if you're feeling prepared to go at it alone, get started for free today, and sign up here!
Ready for easy AI agents?
Skip the struggle and focus on your downstream application. We have built-in sample data so you can get started without sharing yours.