AI Data Readiness Infrastructure

Is your data ready for AI?

AIDRIN is open-source infrastructure that quantitatively assesses your dataset's readiness for AI and machine learning, then helps you remediate what's holding it back.

Demo Get Started

How it works

Inspect. Remediate. Transform.

AIDRIN is more than an assessment tool. It closes the loop from inspection to an AI-ready dataset.

Inspect

Quantitatively assess readiness across six core dimensions and one cross-cutting, application-specific dimension.

Remediate

Apply built-in remedies to fix detected issues.

Transform

Export a cleaned, AI-ready dataset.
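
As a rough illustration of the inspect, remediate, transform loop, the sketch below runs it on a tiny in-memory table using only the Python standard library. It does not use AIDRIN: the function names, the median-imputation remedy, and the CSV export are assumptions chosen for illustration, not AIDRIN's built-in remedies or API.

```python
import csv
import io
from statistics import median


def inspect(rows, column):
    """Inspect: fraction of rows where `column` is present (not None)."""
    present = [r[column] for r in rows if r[column] is not None]
    return len(present) / len(rows)


def remediate(rows, column):
    """Remediate: fill missing values with the column median (illustrative remedy)."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def transform(rows):
    """Transform: serialize the cleaned rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hand-written toy data, not a real dataset.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]

before = inspect(rows, "age")      # completeness before the remedy
cleaned = remediate(rows, "age")   # missing age filled with the median
after = inspect(cleaned, "age")    # completeness after the remedy
exported = transform(cleaned)      # CSV text ready to write to disk
```

The point of the sketch is the shape of the loop: measure, apply a remedy, measure again, and only then export.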

Assessment

Seven readiness dimensions

Six core dimensions, color-coded across the readiness spectrum, plus AI Application-Specific readiness that cuts across them all.

Data Quality

Completeness, outliers, duplicates, and overall integrity.

Data Governance

Privacy, sensitivity, and responsible-use signals.

Understandability & Usability

Documentation, metadata, and ease of reuse.

Fairness & Bias

Class imbalance and representation across groups.

Impact on AI

Feature relevance and correlation that shape model outcomes.

Structure & Organization

Schema, formats, and structural consistency.

AI Application-Specific

Cross-cutting

Readiness judged against the needs of your specific AI application, cutting across all six dimensions rather than standing apart from them.
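
To make a few of these dimensions concrete, here is a minimal standard-library sketch of measures in the spirit of Data Quality (completeness, duplicates), Fairness & Bias (class imbalance), and Impact on AI (correlation with a target). The function names and exact definitions are assumptions for illustration; AIDRIN's own metrics may be defined differently.

```python
from collections import Counter
from math import sqrt


def completeness(rows, columns):
    """Per-column fraction of non-missing values (None or '' counts as missing)."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) not in (None, "")) / total
        for col in columns
    }


def duplicate_fraction(rows, columns):
    """Share of rows that exactly repeat an earlier row."""
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes / len(rows)


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def pearson(xs, ys):
    """Pearson correlation between a feature and a target; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Hand-written toy rows, not a real dataset.
rows = [
    {"age": 30, "label": "yes"},
    {"age": None, "label": "no"},
    {"age": 30, "label": "yes"},
    {"age": 41, "label": "yes"},
]
quality = completeness(rows, ["age", "label"])
dupes = duplicate_fraction(rows, ["age", "label"])
balance = imbalance_ratio([r["label"] for r in rows])
```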

Access

Use AIDRIN your way

One engine, six ways in, from a zero-setup browser app to an AI agent driving it for you.

Web Inspector

Upload and assess datasets in your browser. No setup.

Python Library

Install with pip, then score datasets in scripts and notebooks.

CLI

Agent-ready

Headless command-line evaluation, scriptable and CI-friendly.

MCP Server

Agent-ready

Expose AIDRIN to AI agents via the Model Context Protocol.

Globus Remote Compute

Run metrics on remote datasets without transferring files.

LLM Explanations

Generate plain-language explanations of metric results.

Built to extend

Agentic Evaluation

Let an AI agent inspect, remediate, and report autonomously via the CLI and MCP server.

Custom Metrics

Define your own metrics and remedies through an extensible framework.

OpenTelemetry

Emit traces and metrics for observability into evaluation runs with OpenTelemetry support.

APPFL

Assess data readiness inside privacy-preserving federated learning workflows.

Inputs

Bring your data

AIDRIN reads the formats scientific and ML datasets actually ship in.

CSV
Excel
JSON
Parquet
NumPy
HDF5

Get started

Up and running in minutes

Install the Python library, drive it from the CLI, run it from source, or self-host the full web app.

Python library

# pip install aidrin
from aidrin import calculate_completeness, calculate_outliers

# file_info = (path, name, type)
file_info = ("data/adult.csv", "adult.csv", ".csv")

calculate_completeness(file_info)
# {'Overall Completeness': 0.97, 'Completeness scores': {...}}

calculate_outliers(file_info)
# {'Outlier scores': {...}}
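
Building on the snippet above, the following sketch shows one way to prepare the `file_info` tuple and act on a completeness result. It does not import AIDRIN: `make_file_info` and `columns_below` are hypothetical helpers, and the assumption that 'Completeness scores' maps column names to fractions between 0 and 1 is inferred from the example output, not confirmed here.

```python
from pathlib import PurePosixPath


def make_file_info(path):
    """Build the (path, name, type) tuple in the layout shown above.

    Hypothetical helper: the type is taken to be the lowercased file extension.
    """
    p = PurePosixPath(path)
    return (str(path), p.name, p.suffix.lower())


def columns_below(result, threshold=0.9):
    """Return column names whose completeness falls below `threshold`.

    Assumes result["Completeness scores"] is a {column: fraction} mapping.
    """
    scores = result.get("Completeness scores", {})
    return sorted(col for col, score in scores.items() if score < threshold)


file_info = make_file_info("data/adult.csv")

# Hand-written stand-in for a completeness result, not real AIDRIN output.
sample_result = {
    "Overall Completeness": 0.92,
    "Completeness scores": {"age": 1.0, "workclass": 0.94, "occupation": 0.82},
}

needs_attention = columns_below(sample_result, threshold=0.9)
```

A check like this could gate a training job: proceed only when `needs_attention` is empty, otherwise remediate the listed columns first.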

CLI

aidrin list                       # available metrics
aidrin data-quality data.csv      # completeness, duplicity, outliers
aidrin run completeness data.csv  # a single metric
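
Because the CLI is headless, it can be driven from a script or CI job. The sketch below wraps the `aidrin run <metric> <file>` form shown above with `subprocess`. How AIDRIN formats its output and which exit codes it uses are not stated here, so the wrapper only returns the raw exit code and text; `aidrin_command`, `run_metric`, and the `runner` parameter are hypothetical names.

```python
import subprocess


def aidrin_command(metric, dataset):
    """Build the argv for a single-metric run, mirroring `aidrin run <metric> <file>`."""
    return ["aidrin", "run", metric, dataset]


def run_metric(metric, dataset, runner=subprocess.run):
    """Run one metric and return (exit_code, stdout).

    `runner` is injectable so the wrapper can be exercised without AIDRIN installed.
    No assumption is made about the output format or the meaning of exit codes.
    """
    proc = runner(
        aidrin_command(metric, dataset),
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout


# Example call (requires the aidrin CLI on PATH):
# code, output = run_metric("completeness", "data.csv")
```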

From source

git clone https://github.com/idtlab/AIDRIN.git
cd AIDRIN
conda create -n aidrin-env python=3.10 -y
conda activate aidrin-env
python -m pip install -e .

Self-hosted web app

# 1) Redis
redis-server --port 6379
# 2) Celery worker
PYTHONPATH=. celery -A worker.make_celery worker --beat --loglevel=info
# 3) Flask app  ->  http://127.0.0.1:5000
flask --app 'web:create_app()' run --debug

Read the full documentation

Research

Backed by peer-reviewed research

AIDRIN grows out of published work on data readiness for AI (2024–2025).

The bottom line

Stop guessing whether your data is ready.

One engine, every interface. Measure, fix, and ship an AI-ready dataset from the browser, your code, the command line, or an AI agent.

