AI/ML Due Diligence: Evaluating Machine Learning Systems in Acquisitions
AI and machine learning capabilities are increasingly cited as core value drivers in software acquisitions. Yet for every legitimate ML-powered platform, there are targets where "AI" is a marketing veneer over hardcoded business rules and manual data processing. For PE firms, distinguishing genuine ML moats from AI theater requires a specialized diligence framework that most traditional technical assessments lack.
This briefing provides acquirers with a structured methodology for evaluating the maturity, defensibility, and scalability of machine learning systems, covering the six vectors that determine whether an AI capability is a durable competitive advantage or an unsustainable cost center.
1. Model Governance & Documentation
Model governance determines whether an ML system is a managed asset or an uncontrolled experiment running in production.
- Model Registry: Does the target maintain a centralized model registry (MLflow, Weights & Biases, Neptune) that catalogs all production models with version history, training configurations, and performance benchmarks? Without a registry, the target cannot reproduce its own models—a catastrophic risk if key ML engineers depart post-close.
- Model Cards & Documentation: Request model cards for each production model documenting intended use, training data, performance metrics, limitations, and known biases. Google's Model Card framework is the industry standard. Absence of model documentation means the acquirer is purchasing black boxes.
- A/B Testing & Experiment Tracking: How does the target evaluate model improvements before production deployment? Mature ML organizations maintain rigorous experiment tracking with controlled rollouts. Teams that deploy model updates without A/B testing are flying blind.
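To make the registry requirement concrete, the sketch below shows the minimum record a diligence team should expect to find for each production model version. All names, fields, and the dataset path are hypothetical; real registries (MLflow, Weights & Biases) hold equivalent information, and the reproducibility check is the question an acquirer is really asking.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ModelRegistryEntry:
    """Minimal per-version record a production model registry should hold.

    A hypothetical sketch of what diligence should look for, not any
    specific registry's schema.
    """
    name: str
    version: str
    trained_on: date
    training_data_ref: str      # pointer to the exact training-data snapshot
    hyperparameters: dict       # full training configuration
    metrics: dict               # offline evaluation benchmarks
    model_card_url: Optional[str] = None

    def is_reproducible(self) -> bool:
        # Without a recorded data snapshot and training config, the target
        # cannot re-create this model if its authors leave post-close.
        return bool(self.training_data_ref) and bool(self.hyperparameters)

# Hypothetical example entry for illustration.
entry = ModelRegistryEntry(
    name="churn-classifier",
    version="2.3.0",
    trained_on=date(2024, 5, 1),
    training_data_ref="s3://example-bucket/snapshots/churn/2024-05-01",
    hyperparameters={"max_depth": 8, "n_estimators": 300},
    metrics={"auc": 0.91, "precision": 0.84},
)
```

A target that cannot populate a record like this for every production model, on request, does not have a managed model inventory.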
2. Training Data Provenance & Quality
In machine learning, the training data is the product. A model is only as defensible as the data it was trained on, and data provenance issues can create existential legal exposure.
- Data Lineage Documentation: Can the target trace every training dataset back to its original source with documented consent, licensing terms, and transformation history? The inability to demonstrate data lineage is a ticking time bomb in the current regulatory environment (EU AI Act, state-level AI legislation).
- Web Scraping Exposure: Was training data acquired through web scraping without explicit licensing? The legal landscape for scraped training data is rapidly evolving (New York Times v. OpenAI, Getty v. Stability AI). Acquirers must assess litigation exposure for training data obtained without clear commercial rights.
- Data Quality Pipeline: Does the target have automated data quality checks (schema validation, distribution drift detection, anomaly detection) in the training pipeline? Models trained on degraded data produce degraded predictions—and the degradation often goes undetected until customer impact is severe.
- PII in Training Data: Verify that Personally Identifiable Information has been properly anonymized or removed from training datasets. Models trained on PII can memorize and regurgitate personal information, creating GDPR and CCPA liability.
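A quick diligence spot-check for PII in training data can be as simple as pattern-scanning a sample of records. The sketch below is a minimal illustration with hypothetical regex patterns; a production pipeline would use a dedicated PII-detection library and locale-specific rules, but even this crude scan on a data sample often surfaces problems during diligence.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scan_record(text: str) -> list:
    """Return the PII categories detected in one training record."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

# Fabricated sample record for illustration.
sample = "Contact jane.doe@example.com or 555-867-5309, SSN 123-45-6789."
hits = scan_record(sample)
```

Any non-trivial hit rate on a random sample of training records is a finding that belongs in the diligence report, since models trained on raw PII can memorize and emit it.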
3. MLOps Maturity
MLOps—the discipline of operationalizing machine learning—determines whether the target can reliably retrain, deploy, and monitor models at scale.
- Automated Retraining Pipelines: Are model retraining pipelines fully automated (Kubeflow, SageMaker Pipelines, Vertex AI Pipelines) or does retraining require manual intervention from a data scientist? Manual retraining doesn't scale and creates key-person dependency on the ML team.
- Feature Store: Does the target maintain a feature store (Feast, Tecton, Hopsworks) that ensures consistent feature computation between training and inference? Training-serving skew—where features are calculated differently in training versus production—is a silent killer of model accuracy.
- Model Serving Infrastructure: How are models deployed for inference? Purpose-built serving infrastructure (TorchServe, TensorFlow Serving, Triton Inference Server) with proper scaling, batching, and health monitoring indicates operational maturity. Models running as Python scripts inside web application processes are not production-grade.
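Training-serving skew is easiest to see in code. The sketch below, with hypothetical feature and function names, shows the discipline a feature store enforces: one shared feature definition used by both the offline (training) and online (serving) paths, plus a parity check that diligence teams can ask whether the target runs.

```python
def feature_days_since_signup(signup_ts: float, now_ts: float) -> float:
    """Shared feature definition used by BOTH training and serving paths."""
    return (now_ts - signup_ts) / 86400.0

def training_features(rows, now_ts):
    # Offline/batch path (a warehouse job in a real system).
    return [feature_days_since_signup(r["signup_ts"], now_ts) for r in rows]

def serving_feature(row, now_ts):
    # Online path; reusing the same function prevents skew by construction.
    # Skew arises when this logic is re-implemented separately (e.g. in SQL
    # offline and in application code online) and the copies diverge.
    return feature_days_since_signup(row["signup_ts"], now_ts)

def parity_check(rows, now_ts, tol=1e-9) -> bool:
    """Verify the batch and online paths agree feature-by-feature."""
    offline = training_features(rows, now_ts)
    online = [serving_feature(r, now_ts) for r in rows]
    return all(abs(a - b) <= tol for a, b in zip(offline, online))

# Fabricated rows for illustration: signups 2 days and 1 day ago.
rows = [{"signup_ts": 0.0}, {"signup_ts": 86400.0}]
ok = parity_check(rows, now_ts=172800.0)
```

If the target cannot point to either a feature store or an equivalent shared-definition discipline, assume some degree of undetected training-serving skew.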
4. Model Drift Monitoring & Performance Tracking
ML models degrade over time as the real world drifts from the distribution the model was trained on. Without monitoring, degradation is invisible until customers complain.
- Data Drift Detection: Does the target monitor for statistical drift between the training data distribution and live inference data? Tools like Evidently AI, WhyLabs, or custom monitoring dashboards should track feature distributions and alert on significant shifts.
- Model Performance Monitoring: Are production model metrics (accuracy, precision, recall, latency) tracked in real time? What are the defined thresholds for triggering retraining? A model deployed without performance monitoring is a depreciating asset with no visibility into its rate of depreciation.
- Feedback Loop Implementation: How does the target incorporate real-world outcomes back into model improvement? Closed feedback loops—where production predictions are validated against actual outcomes and fed back into retraining—are the hallmark of mature ML systems.
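One widely used drift statistic the target's monitoring should compute, in some form, is the Population Stability Index (PSI) between the training-time distribution of a feature and its live distribution. A minimal sketch, with fabricated histograms for illustration:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 major drift warranting investigation or retraining.
    """
    e_total = float(sum(expected_counts))
    a_total = float(sum(actual_counts))
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

# Fabricated feature histograms: training-time baseline vs. live traffic.
baseline = [100, 200, 400, 200, 100]
no_drift = psi(baseline, baseline)              # identical distribution
drifted = psi(baseline, [300, 300, 200, 100, 100])
```

Off-the-shelf tools (Evidently AI, WhyLabs) compute this and related statistics per feature; the diligence question is whether anything like it runs against live traffic with alerting thresholds defined.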
5. Compute Cost Projection
ML compute costs can scale non-linearly with data volume and model complexity. Acquirers must model infrastructure costs as a function of growth to validate margin assumptions.
- Training Compute Costs: Request the GPU/TPU costs for a full model retraining cycle. How frequently does retraining occur? If the target retrains weekly on GPU clusters costing $50K per run, the annual training compute budget alone is $2.6M—a material expense that must be reflected in the financial model.
- Inference Cost per Prediction: Calculate the cost per inference (API call). As usage scales, does inference cost scale linearly, or has the target optimized with model distillation, quantization, or edge deployment? Linear cost scaling on a usage-based pricing model creates margin compression at scale.
- GPU Procurement Strategy: Is the target reliant on on-demand GPU instances (AWS p4d, GCP A100), or do they maintain reserved capacity or on-premise GPU clusters? On-demand GPU pricing is 3-5x more expensive than reserved commitments. The procurement strategy directly impacts gross margin sustainability.
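The cost questions above reduce to a simple projection the deal team can build directly into the financial model. A minimal sketch, using the weekly $50K retraining figure from the text and otherwise fabricated inputs; the linear-scaling assumption is the pessimistic base case absent distillation or quantization work.

```python
def annual_ml_compute_cost(
    retrain_cost_usd: float,
    retrains_per_year: int,
    cost_per_1k_inferences_usd: float,
    annual_inferences: int,
    reserved_discount: float = 0.0,  # e.g. 0.6 for ~60% off on-demand pricing
) -> dict:
    """Project annual training + inference spend under linear scaling."""
    factor = 1.0 - reserved_discount
    training = retrain_cost_usd * retrains_per_year * factor
    inference = cost_per_1k_inferences_usd * (annual_inferences / 1000) * factor
    return {"training": training, "inference": inference,
            "total": training + inference}

# Weekly $50K retrains (the example from the text) plus a hypothetical
# 200M predictions/year at $0.50 per 1K inferences, on-demand pricing.
on_demand = annual_ml_compute_cost(50_000, 52, 0.50, 200_000_000)

# Same workload with a hypothetical 60% reserved-capacity discount.
reserved = annual_ml_compute_cost(50_000, 52, 0.50, 200_000_000,
                                  reserved_discount=0.6)
```

Running both scenarios quantifies the procurement-strategy point: the same workload can differ by millions of dollars a year depending on how GPU capacity is purchased.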
6. IP Ownership of Training Data & Models
The intersection of AI and intellectual property law is rapidly evolving. Acquirers must assess IP defensibility with heightened scrutiny.
- Training Data Licensing: Does the target have clear commercial licenses for all third-party data used in model training? Data acquired under academic, non-commercial, or research licenses cannot legally underpin a commercial product. This is a representation and warranty issue that must be explicitly addressed in the purchase agreement.
- Model IP Ownership: If the target fine-tuned a foundation model (GPT, Llama, Claude), review the foundation model's license terms. Some licenses restrict commercial use of derivative models or require attribution. The target's "proprietary AI" may be legally constrained by upstream licensing terms.
- Employee IP Assignments: Verify that all ML engineers and data scientists have signed IP assignment agreements. Models and training methodologies developed by employees without clear IP assignments create ownership ambiguity that complicates post-close IP consolidation.
Separating AI Signal from AI Noise
In the current market, "AI-powered" has become a ubiquitous marketing claim that demands rigorous verification. Acquirers paying AI multiples for businesses running glorified if-else statements are overpaying for a narrative, not a technology moat.
badcop.tech cuts through AI theater by algorithmically interrogating engineering and ML leadership on model architecture, data provenance, MLOps practices, and compute economics. The platform generates a quantitative AI maturity score that enables acquirers to distinguish genuine ML capabilities from marketing narratives—and to price AI risk appropriately in the deal model.