AI & ML

Is Your Data Actually Ready for AI? A Practical Checklist

Before you invest in AI, make sure your data foundation won't be the thing that holds you back. A no-nonsense readiness assessment from people who've been asked this question a lot.

BISTEC Data Engineers
January 2025 · Data Elevator
7 min read

Every week we talk to companies that want to "use AI". Sometimes that means a recommendation engine. Sometimes it means predictive churn modelling. Sometimes it means "something with ChatGPT". But in almost every conversation, before we can discuss what's possible with AI, we need to have a harder conversation about the data that would power it.

Because here's the inconvenient truth: most AI projects fail not because of the model, but because of the data. Bad inputs, missing data, inconsistent definitions, untracked pipelines: these are the real failure points.

"You can't build a skyscraper on sand. And you can't build a reliable AI system on unreliable data."

The Readiness Checklist

Go through this section by section. Be honest. The goal isn't to score perfectly; it's to know exactly where the gaps are before you start writing a brief for an AI vendor.

📂 Data Availability
✓

12–24 months of historical data for the target problem

Most ML models need substantial history to find meaningful patterns. With less than a year of data, a model never sees a full seasonal cycle, so anything seasonal rarely works.
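A quick way to test this is to measure how many complete months your data actually spans. The Python sketch below assumes you already have the event dates (order dates, in this hypothetical example) as `datetime.date` values; `months_of_history` and `covers_full_cycles` are illustrative helpers, not part of any library.

```python
from datetime import date


def months_of_history(dates: list[date]) -> int:
    """Number of complete calendar months between the earliest and latest date."""
    if not dates:
        return 0
    first, last = min(dates), max(dates)
    months = (last.year - first.year) * 12 + (last.month - first.month)
    # The final month only counts once its day-of-month has reached the first date's day.
    if last.day < first.day:
        months -= 1
    return months


def covers_full_cycles(dates: list[date], minimum_months: int = 12) -> bool:
    # 12 months is the low end of the checklist's 12-24 month range.
    return months_of_history(dates) >= minimum_months


# Hypothetical order dates pulled from a sales table.
order_dates = [date(2023, 3, 14), date(2023, 11, 2), date(2024, 2, 28)]
print(months_of_history(order_dates))   # 11
print(covers_full_cycles(order_dates))  # False: not quite a full year
```

A span check says nothing about gaps in the middle, so counting the distinct months that contain at least one record is a useful companion check.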

!

Data covers the full range of scenarios (including edge cases)

If your data only captures "normal" operations, your model will fail on outliers, which are often exactly the cases you care about most.

✓

Data is in a central warehouse, not scattered across source systems

You need a queryable, consolidated layer. Spreadsheets and API calls don't count.

🧪 Data Quality
✓

Key fields have low null rates (<5% for critical features)

High null rates mean the data was never captured or there's a systematic collection problem. Both are serious.
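Null rates are simple to measure once the data is in one place. This minimal Python sketch assumes rows arrive as dictionaries (for example, from a database cursor) and treats both `None` and empty strings as missing; the field names and helper functions are hypothetical. The 5% threshold comes from the checklist item above.

```python
def null_rates(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of rows in which each field is missing (None or an empty string)."""
    if not rows:
        raise ValueError("no rows to profile")
    return {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / len(rows)
        for field in fields
    }


def fields_failing(rows: list[dict], fields: list[str], threshold: float = 0.05) -> list[str]:
    # A critical feature passes only if its null rate is strictly below the threshold.
    return [field for field, rate in null_rates(rows, fields).items() if rate >= threshold]


customers = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2024-01-03"},
    {"customer_id": 2, "email": None, "signup_date": "2024-02-11"},
    {"customer_id": 3, "email": "c@example.com", "signup_date": ""},
    {"customer_id": 4, "email": "d@example.com", "signup_date": "2024-03-09"},
]
print(null_rates(customers, ["customer_id", "email", "signup_date"]))
# {'customer_id': 0.0, 'email': 0.25, 'signup_date': 0.25}
print(fields_failing(customers, ["customer_id", "email", "signup_date"]))
# ['email', 'signup_date']
```

If your sources also use NaN or placeholder strings such as "N/A" for missing values, those need to be added to the missing check, or the null rate will look better than it is.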

✗

No duplicate records in entity tables (customers, orders, products)

Duplicates inflate counts, distort aggregates and give repeated entities extra weight when a model is trained. This is the #1 quality issue we encounter.
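Exact duplicates are easy to catch with a primary-key check; the harder cases are records with different IDs that describe the same entity. A minimal Python sketch, assuming email address is a reasonable matching key for customers (an assumption; your natural key may differ) and using a hypothetical `find_duplicates` helper:

```python
from collections import defaultdict


def find_duplicates(rows: list[dict], key_fields: list[str]) -> dict[tuple, list[dict]]:
    """Group rows by a normalised business key; return only groups with more than one row."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        values = [row.get(field) for field in key_fields]
        if any(value in (None, "") for value in values):
            continue  # a missing key can't be matched reliably; review these rows separately
        # Normalise case and surrounding whitespace so "Ana@Example.com " matches "ana@example.com".
        key = tuple(str(value).strip().lower() for value in values)
        groups[key].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


customers = [
    {"customer_id": 101, "email": "ana@example.com"},
    {"customer_id": 102, "email": "Ana@Example.com "},
    {"customer_id": 103, "email": "raj@example.com"},
    {"customer_id": 104, "email": None},
]
for key, group in find_duplicates(customers, ["email"]).items():
    print(key, [row["customer_id"] for row in group])
# ('ana@example.com',) [101, 102]
```

Every `customer_id` here is unique, so a primary-key check alone would report no duplicates at all.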

!

Categorical fields use consistent values

No "UK", "United Kingdom" and "U.K." all meaning the same thing. Inconsistent categories require significant cleaning effort before modelling.
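Most of that cleaning effort goes into building and maintaining an explicit mapping from raw values to one canonical code. A minimal Python sketch, assuming ISO 3166-1 alpha-2 codes as the canonical form (the United Kingdom's code is `GB`); the alias table and function names are hypothetical. Unrecognised values come back as `None` so that someone reviews them instead of the code guessing.

```python
from typing import Optional

# Hypothetical alias table: lower-cased, whitespace-collapsed raw value -> canonical code.
COUNTRY_ALIASES = {
    "uk": "GB",
    "u.k.": "GB",
    "united kingdom": "GB",
    "great britain": "GB",
}


def normalise_country(raw: Optional[str]) -> Optional[str]:
    """Map a raw country value to its canonical code, or None if it isn't recognised."""
    if raw is None:
        return None
    cleaned = " ".join(raw.lower().split())  # "United  Kingdom " -> "united kingdom"
    return COUNTRY_ALIASES.get(cleaned)


def unmapped_values(values: list[Optional[str]]) -> list[str]:
    # Raw values that need a human decision before they can be added to the alias table.
    return sorted({v for v in values if v is not None and normalise_country(v) is None})


raw = ["UK", "United Kingdom", "U.K.", "united  kingdom ", "Untied Kingdom"]
print([normalise_country(value) for value in raw])
# ['GB', 'GB', 'GB', 'GB', None]
print(unmapped_values(raw))
# ['Untied Kingdom']
```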

๐Ÿ” Data Governance
โœ“

You know who owns each data source

AI models drift when the underlying data changes, so you need a named contact who will tell you when a source system is updated.

✗

Business definitions are documented and agreed

If your team debates what an "active customer" is, your model will be trained on an ambiguous target, and nobody will trust the output.
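One way to end the debate is to write the agreed definition down once, in code, and use that single function both for reporting and for building training labels. The 90-day window below is purely a hypothetical example of what a team might agree on, and `is_active` is an illustrative name:

```python
from datetime import date, timedelta

# Hypothetical agreed definition (keep the exact wording and sign-off next to the code):
# "An active customer has placed at least one order in the 90 days ending on the reference date."
ACTIVE_WINDOW_DAYS = 90


def is_active(order_dates: list[date], as_of: date) -> bool:
    """True if any order falls in the window (as_of - 90 days, as_of]."""
    window_start = as_of - timedelta(days=ACTIVE_WINDOW_DAYS)
    return any(window_start < order_date <= as_of for order_date in order_dates)


as_of = date(2025, 1, 31)
print(is_active([date(2024, 11, 3)], as_of))  # True: 89 days before as_of
print(is_active([date(2024, 11, 2)], as_of))  # False: exactly 90 days before, outside the window
```

Because the dashboard and the model's target both call the same function, a change to the definition shows up in both places at once.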

โš™๏ธ Infrastructure
โœ“

Data pipelines are automated with no manual steps

A model that needs fresh data is only as reliable as the pipeline that feeds it. Manual steps are failure points.
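Automation also makes it practical to check freshness before a model scores anything. A minimal Python sketch, assuming each pipeline records a timezone-aware `last_loaded_at` timestamp and has a known run interval; the function name and the one-hour grace period are hypothetical choices:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_loaded_at: datetime,
    expected_every: timedelta,
    now: Optional[datetime] = None,
    grace: timedelta = timedelta(hours=1),
) -> bool:
    """True if the last successful load is older than the expected interval plus a grace period."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at > expected_every + grace


last_load = datetime(2025, 1, 30, 2, 0, tzinfo=timezone.utc)  # hypothetical daily pipeline
print(is_stale(last_load, timedelta(days=1), now=datetime(2025, 1, 31, 2, 30, tzinfo=timezone.utc)))  # False
print(is_stale(last_load, timedelta(days=1), now=datetime(2025, 1, 31, 4, 0, tzinfo=timezone.utc)))   # True
```

A scoring job that refuses to run, or raises an alert, when `is_stale` returns `True` fails loudly instead of silently predicting from old data.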

!

You can version and store model outputs alongside input data

When a prediction goes wrong, you need to reconstruct exactly what data the model saw. This requires infrastructure, not just code.
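At minimum, each prediction can be stored with a copy of the exact inputs it was computed from, a fingerprint of those inputs and the model version. A minimal Python sketch using only the standard library; the record layout, the `model_version` string and the helper names are hypothetical, and in practice each record would be appended to write-once storage rather than kept in memory:

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(features: dict) -> str:
    """SHA-256 of the inputs in a canonical JSON form, so key order doesn't change the hash."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def prediction_record(model_version: str, features: dict, prediction: float) -> dict:
    return {
        "model_version": model_version,
        "input_hash": fingerprint(features),
        "inputs": dict(features),  # a copy of what the model saw, not a pointer to a live table
        "prediction": prediction,
        "scored_at": datetime.now(timezone.utc).isoformat(),
    }


features = {"customer_id": 101, "orders_last_90d": 3, "country": "GB"}
record = prediction_record("churn-model-2025-01", features, 0.18)
print(json.dumps(record, indent=2))
```

When a prediction is disputed, recomputing `fingerprint` on the inputs you believe were used and comparing it with `input_hash` tells you whether you are looking at the same data the model saw.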

✓ Ready · ! Needs work · ✗ Not in place. Assess your current state honestly before committing to an AI project.

How to Read Your Score

All ✓: Ready to start. Pick one well-scoped problem and begin a focused pilot.

Mixed: Proceed with caution. Resolve data quality and governance gaps first; they compound during model development.

Several ✗: Build foundation first. Six months of data infrastructure investment will make your AI project 10× more likely to succeed.

Your score is a compass, not a verdict: every gap is fixable with the right priorities.
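If you track the checklist in a spreadsheet or a script, this reading can be applied mechanically. A minimal Python sketch: "several" isn't pinned to a number, so treating two or more ✗ as several is an assumption, and the item names are shorthand for the checklist entries:

```python
from collections import Counter

READY, NEEDS_WORK, NOT_IN_PLACE = "✓", "!", "✗"


def verdict(statuses: dict[str, str], several: int = 2) -> str:
    """Map checklist statuses to the three readings; the 'several' threshold is an assumption."""
    if not statuses:
        raise ValueError("no checklist items scored")
    counts = Counter(statuses.values())
    if counts[NOT_IN_PLACE] >= several:
        return "Build foundation first"
    if counts[READY] == len(statuses):
        return "Ready to start"
    return "Proceed with caution"


# The example markers shown in the checklist above.
statuses = {
    "12-24 months of history": READY,
    "edge cases covered": NEEDS_WORK,
    "central warehouse": READY,
    "null rates under 5%": READY,
    "no duplicate records": NOT_IN_PLACE,
    "consistent categories": NEEDS_WORK,
    "data owners known": READY,
    "definitions agreed": NOT_IN_PLACE,
    "automated pipelines": READY,
    "outputs versioned with inputs": NEEDS_WORK,
}
print(verdict(statuses))             # Build foundation first
print(verdict(statuses, several=3))  # Proceed with caution
```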

The Question We Always Ask

Before a client commits to an AI project, we ask: "Can you currently answer your top five business questions reliably from your data?"

If the answer is no (reports take days to produce, numbers are disputed, there's no agreed source of truth), then AI is not the next step. Good analytics is.

The honest truth: A business that can reliably answer its operational questions with clean data will get more value from that capability than from a complex ML model built on shaky foundations. Get the basics right first.

What "Ready" Actually Looks Like

Ready for AI doesn't mean perfect data. It means:

That last point is more important than any technical criteria. The best model in the world produces no value if the organisation isn't set up to act on its predictions.


If you'd like to run through this checklist with your own data, we offer a half-day AI readiness workshop. Practical, honest, and no sales pitch.

AI & ML Data Readiness Data Quality Machine Learning Data Governance