Every week we talk to companies who want to "use AI". Sometimes that means a recommendation engine. Sometimes it means predictive churn modelling. Sometimes it means "something with ChatGPT". But in almost every conversation, before we can discuss what's possible with AI, we need to have a harder conversation about the data that would power it.
Because here's the inconvenient truth: most AI projects fail not because of the model, but because of the data. Bad inputs, missing data, inconsistent definitions, untracked pipelines: these are the real failure points.
"You can't build a skyscraper on sand. And you can't build a reliable AI system on unreliable data."
The Readiness Checklist
Go through this section by section. Be honest. The goal isn't to score perfectly; it's to know exactly where the gaps are before you start writing a brief for an AI vendor.
12-24 months of historical data for the target problem
Most ML models need substantial history to find meaningful patterns; less than a year rarely works for anything seasonal.
Data covers the full range of scenarios (including edge cases)
If your data only captures "normal" operations, your model will fail on outliers, which are often the exact cases you care about most.
Data is in a central warehouse, not scattered in source systems
You need a queryable, consolidated layer. Spreadsheets and API calls don't count.
Key fields have low null rates (<5% for critical features)
High null rates mean the data was never captured or there's a systematic collection problem. Both are serious.
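As a rough illustration, a null-rate audit is only a few lines of code. This sketch assumes the records are already loaded in memory as a list of dicts with illustrative field names; in practice you would run the equivalent query against your warehouse.

```python
from collections import Counter

def null_rates(rows):
    """Fraction of missing (None or empty-string) values per field."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
    return {field: missing[field] / len(rows) for field in rows[0]}

# Toy records for illustration only
customers = [
    {"id": 1, "email": "a@example.com", "region": "UK"},
    {"id": 2, "email": None,            "region": "UK"},
    {"id": 3, "email": "c@example.com", "region": ""},
    {"id": 4, "email": "d@example.com", "region": "FR"},
]

rates = null_rates(customers)
# email and region are each missing in 1 of 4 rows (25%),
# which already fails a 5% threshold for a critical feature
```

The same check against a warehouse is a `COUNT(*) FILTER (WHERE col IS NULL)` per column; the point is that it should be a routine report, not a one-off investigation.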
No duplicate records in entity tables (customers, orders, products)
Duplicates inflate counts, distort aggregates, and confuse models. This is the #1 quality issue we encounter.
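One quick way to surface duplicates, sketched here against the same kind of in-memory records (field names are illustrative), is to count rows per natural key:

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return natural-key values that appear on more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Toy customer table: the same person registered twice
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]

dupes = duplicate_keys(customers, ["email"])
# {('a@example.com',): 2} -> one email address backs two customer rows
```

Choosing the right natural key (email, company number, SKU) is the hard part; the counting itself is trivial once the key is agreed.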
Categorical fields use consistent values
No "UK", "United Kingdom", "U.K." for the same thing. Inconsistent categories require significant cleaning effort before modelling.
You know who owns each data source
AI models drift when underlying data changes. You need a contact when source systems are updated.
Business definitions are documented and agreed
If your team debates what an "active customer" is, your model will be trained on an ambiguous target, and nobody will trust the output.
Data pipelines are automated with no manual steps
A model needing fresh data is only as reliable as the pipeline feeding it. Manual steps are failure points.
You can version and store model outputs alongside input data
When a prediction goes wrong, you need to reconstruct exactly what data the model saw. This requires infrastructure, not just code.
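One lightweight pattern is to log every prediction together with a snapshot of its inputs, plus a hash of that snapshot so records can be compared cheaply. This is a sketch, not a production design: the in-memory list stands in for a predictions table, and the model name and feature names are invented for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_prediction(store, model_version, features, prediction):
    """Store a prediction next to the exact inputs the model saw."""
    snapshot = json.dumps(features, sort_keys=True)
    record = {
        "model_version": model_version,
        "features": features,  # full input snapshot for later replay
        "input_hash": hashlib.sha256(snapshot.encode()).hexdigest(),
        "prediction": prediction,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    store.append(record)
    return record

predictions = []  # stands in for a predictions table
log_prediction(predictions, "churn-model-v3",
               {"tenure_months": 7, "orders": 2}, 0.81)
```

With records like these, a disputed prediction can be replayed against the exact inputs it was made from, and the hash makes it easy to spot when the "same" entity was scored on different data.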
How to Read Your Score
The Question We Always Ask
Before a client commits to an AI project, we ask: "Can you currently answer your top five business questions reliably from your data?"
If the answer is no, if reports take days to produce, if numbers are disputed, if there's no agreed source of truth, then AI is not the next step. Good analytics is.
The honest truth: A business that can reliably answer its operational questions with clean data will get more value from that capability than from a complex ML model built on shaky foundations. Get the basics right first.
What "Ready" Actually Looks Like
Ready for AI doesn't mean perfect data. It means:
- You have enough clean historical data covering the problem space.
- The data is in a warehouse, automated, and reasonably well governed.
- You have a specific, well-scoped question you want the model to answer.
- There's a business owner who will act on the model's output.
That last point matters more than any technical criterion. The best model in the world produces no value if the organisation isn't set up to act on its predictions.
If you'd like to run through this checklist with your own data, we offer a half-day AI readiness workshop. Practical, honest, and no sales pitch.