Research February 23, 2026 • 18 min read

Towards a Commerce AI Readiness Framework: Four Dimensions for Measuring Agentic Accessibility

The first structured definition of commerce AI readiness: a four-dimension scoring model built from first principles while developing RetrieveAI, an AI retrieval and visibility audit platform.

Akshay Dahiya

Growth & MarTech Specialist

The emergence of AI agents as autonomous purchasers rather than passive research assistants creates an urgent and largely unmeasured infrastructure problem for e-commerce operators. While the AI visibility research community has developed sophisticated frameworks for measuring brand citation rates, no existing tool or framework addresses the upstream question: can an AI agent actually execute a transaction on a given website?

This paper presents the Commerce AI Readiness Framework (CARF), a four-dimension scoring model developed through the construction of RetrieveAI, an AI retrieval and visibility audit platform. The four dimensions (MCP Compatibility, API Readiness, Tool-Call Compatibility, and Inventory Simulation), each weighted at 25%, produce a composite Commerce AI Readiness Score (CARS) between 0 and 100.

This paper covers:

Why existing AI visibility frameworks are insufficient for agentic commerce
The four CARF dimensions and scoring logic
Weighting rationale: why an equal 25/25/25/25 distribution
Grade thresholds and benchmark scoring
How the framework is implemented inside RetrieveAI's audit pipeline
Competitive positioning vs 80+ existing tools

1. The Agentic Commerce Inflection

On November 25, 2024, Anthropic released the Model Context Protocol (MCP), an open standard for connecting AI systems to external data and tools.^[1] By December 2025, roughly a year later, MCP had accumulated over 10,000 active public servers, 97 million monthly SDK downloads, and adoption by OpenAI, Google DeepMind, Microsoft, and other major AI infrastructure providers.^[3]

On March 26, 2025, OpenAI's Sam Altman stated: "People love MCP and we are excited to add support across our products."^[4] Shopify shipped four MCP servers, making every one of its 5.6 million merchant stores natively queryable by AI agents.^[6] OpenAI and Stripe co-developed the Agentic Commerce Protocol (ACP), enabling purchase transactions directly inside ChatGPT.^[7] Google released the Universal Commerce Protocol (UCP) in partnership with Shopify, Etsy, Wayfair, Target, and Walmart.^[8]

$15T · B2B purchases mediated by AI agents by 2028 · Gartner, Oct 2025

$5T · Global agentic commerce by 2030 · McKinsey, Oct 2025

25% · Of US e-commerce via AI agents by 2030 · Bain & Co., Dec 2025

4,700% · YoY growth in AI-driven traffic to US retail sites · Adobe Analytics, Jul 2025

These are not projections about a distant future. During Cyber Week 2025, AI and agents drove $67 billion in sales, influencing 20% of all purchases.^[12] Nearly 60% of Americans now use generative AI for online shopping.^[17]

In this context, we identified a critical gap while building RetrieveAI: no metric existed to measure whether a website was actually accessible to an AI agent attempting a transaction. The entire AI visibility research community measures LLM outputs: what AI systems say about brands. Nobody measures the inputs: whether an AI agent can find, parse, understand, and act on a product.

2. Why Existing Frameworks Are Insufficient

The state of AI readiness measurement in e-commerce can be summarised as follows: frameworks exist either at the infrastructure layer (can AI agents technically connect?) or the visibility layer (does AI mention my brand?), but none bridge the two with a unified commerce-specific scoring model.

The GEO research community, including the foundational Princeton/IIT Delhi paper "GEO: Generative Engine Optimization",^[21] focuses on optimising for LLM citation frequency in response outputs. Tools like Profound track share of voice across 10+ AI platforms. AthenaHQ measures entity signal density. Otterly scores 25+ on-page factors. None contain a dimension for MCP compatibility, tool-call response structure, or inventory data completeness.

When building the commerce phase of RetrieveAI's 17-phase audit pipeline, we found no existing scoring rubric that answered the question an AI agent actually asks: "Is this product retrievable, parseable, priced, and purchasable through a machine interface?" The Commerce AI Readiness Framework was built to answer that question.

3. The Commerce AI Readiness Framework (CARF)

CARF comprises four dimensions, each contributing 25% to a composite Commerce AI Readiness Score (CARS) between 0 and 100. The equal weighting reflects a deliberate design choice: all four dimensions are necessary conditions for full agentic accessibility, and weakness in any single dimension substantially degrades agent performance regardless of the others.

CARF Dimension Weighting: Commerce AI Readiness Score (CARS)

D1 · MCP Compatibility 25%

D2 · API Readiness 25%

D3 · Tool-Call Compatibility 25%

D4 · Inventory Simulation 25%

CARS = (D1 + D2 + D3 + D4) / 4 · Each dimension scored 0–100 · Composite 0–100
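The composite formula above can be sketched in a few lines of Python. This is a minimal illustration, not RetrieveAI's actual code: the function name `cars_score` and the dictionary keys are assumptions introduced here for the example.

```python
# Hypothetical keys for the four CARF dimensions (D1 to D4 in the formula).
DIMENSIONS = ("mcp", "api", "tool_call", "inventory")


def cars_score(scores: dict[str, float]) -> float:
    """Return the equal-weight composite of four 0-100 dimension scores."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # CARS = (D1 + D2 + D3 + D4) / 4
    return sum(scores[name] for name in DIMENSIONS) / len(DIMENSIONS)


example = {"mcp": 100, "api": 80, "tool_call": 60, "inventory": 40}
print(cars_score(example))  # (100 + 80 + 60 + 40) / 4 = 70.0
```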

Dimension 1: MCP Compatibility (25%)

Measures whether the domain exposes a functional Model Context Protocol endpoint that AI agents can use to discover and retrieve structured product data. MCP is the emerging universal standard for AI-to-data connectivity, with 10,000+ public servers and adoption by every major AI platform as of December 2025.^[3]

High scores require: a discoverable MCP server endpoint, support for product listing and detail tool calls, valid JSON-RPC 2.0 response structure, and schema.org/Product entity alignment in tool responses. Shopify confirmed all 5.6M+ stores expose MCP endpoints by default since Summer 2025.^[6] Outside Shopify, MCP adoption in e-commerce remains near zero, making this dimension the single most discriminating signal between agentic-ready and agentic-blind operators.

Scoring signals

MCP endpoint present · Tool discovery valid · JSON-RPC 2.0 structure · Product tool callable · Schema.org alignment
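As a rough illustration of the tool discovery and JSON-RPC 2.0 signals, the sketch below inspects the raw response to an MCP `tools/list` request. It is not RetrieveAI's implementation: the function name, the signal keys, and the tool names `search_products` and `get_product` are hypothetical, and a real audit would also have to locate the endpoint and actually call the tools.

```python
import json


def check_tools_list_response(raw: str, required_tools: set[str]) -> dict[str, bool]:
    """Derive three illustrative signals from a `tools/list` response body."""
    signals = {"valid_json_rpc": False, "tool_discovery": False, "product_tools": False}
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return signals
    if not isinstance(msg, dict):
        return signals
    # JSON-RPC 2.0 responses carry the version string, an id, and exactly
    # one of "result" or "error".
    has_result, has_error = "result" in msg, "error" in msg
    signals["valid_json_rpc"] = (
        msg.get("jsonrpc") == "2.0" and "id" in msg and has_result != has_error
    )
    if not signals["valid_json_rpc"] or not has_result:
        return signals
    result = msg["result"]
    tools = result.get("tools") if isinstance(result, dict) else None
    if isinstance(tools, list) and tools:
        names = {t.get("name") for t in tools if isinstance(t, dict)}
        signals["tool_discovery"] = True
        # Assumption: the auditor supplies the product tool names it expects.
        signals["product_tools"] = required_tools <= names
    return signals
```

A response that lists both expected tools sets all three signals to true; a response with the wrong `jsonrpc` version fails the first check and the rest are skipped.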

Dimension 2: API Readiness (25%)

Measures whether the site exposes machine-readable product data endpoints that AI agents can programmatically query without MCP: the pre-MCP layer of agentic accessibility. A site without MCP can still partially serve AI agents through RESTful or GraphQL product APIs.

High scores require: at least one publicly accessible product data endpoint (e.g. /products.json, /api/products, /graphql), JSON response format with structured product objects, and endpoint density above a minimum threshold. OpenAPI/Swagger specification presence is scored as a strong positive signal. RetrieveAI's API detection phase (Phase 4a) extracts endpoints from JavaScript bundles, network request patterns, link-rel headers, and sitemap metadata.

Scoring signals

Product endpoints detected · JSON response format · OpenAPI spec present · Endpoint density score · GraphQL introspection
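The "JSON response format with structured product objects" requirement can be illustrated with a small check on an already-fetched response. This is a sketch under stated assumptions: the helper name and the minimum field set in `PRODUCT_KEYS` are invented for the example, and the fetching of candidate paths is left out so the check stays self-contained.

```python
import json

# Candidate paths taken from the examples in the text above.
CANDIDATE_PATHS = ("/products.json", "/api/products", "/graphql")
# Hypothetical minimum field set for a "structured product object".
PRODUCT_KEYS = {"id", "title"}


def looks_like_product_feed(content_type: str, body: str) -> bool:
    """Return True if the response looks like a JSON list of product objects."""
    if "json" not in content_type.lower():
        return False
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Accept either a bare list or an object wrapping a "products" list.
    items = data.get("products") if isinstance(data, dict) else data
    if not isinstance(items, list) or not items:
        return False
    return all(isinstance(p, dict) and PRODUCT_KEYS.issubset(p) for p in items)
```

An auditor could run this over the response from each path in `CANDIDATE_PATHS` and count the hits as one input to the endpoint signals.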

Dimension 3: Tool-Call Compatibility (25%)

Measures the structural quality of data returned in tool-call responses, the dimension most frequently overlooked by existing readiness frameworks. An AI agent that successfully calls a product endpoint but receives malformed, incomplete, or poorly typed data cannot make a reliable purchase decision.

High scores require: consistent parameter typing (string/number/boolean without mixed types), enumerated value sets for categorical fields, complete price fields with currency codes, product identifiers (SKU/GTIN/MPN) present and parseable, and response latency below the agent timeout threshold. A data.world benchmark found that LLMs grounded in knowledge graphs achieve 300% higher accuracy than those relying on unstructured text.^[20]

Scoring signals

Consistent param typing · Enumerated categoricals · Price + currency complete · SKU/GTIN present · Response latency
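Three of these signals (consistent typing, complete price and currency, identifiers present) lend themselves to a simple structural check over a batch of product records. The sketch below is illustrative only: the field names `price`, `currency`, `sku`, `gtin`, and `mpn` are assumed, and enumerated categoricals and latency are not covered.

```python
from collections import defaultdict


def json_type(value) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def tool_call_signals(products: list[dict]) -> dict[str, bool]:
    """Check typing consistency, price completeness, and identifiers."""
    if not products:
        return {"consistent_typing": False, "price_complete": False, "identifier_present": False}
    types_seen: dict[str, set[str]] = defaultdict(set)
    for product in products:
        for key, value in product.items():
            if value is not None:
                types_seen[key].add(json_type(value))
    return {
        # Every field uses a single type across all records.
        "consistent_typing": all(len(kinds) == 1 for kinds in types_seen.values()),
        # Numeric price plus a three-letter currency code on every record.
        "price_complete": all(
            json_type(p.get("price")) == "number"
            and isinstance(p.get("currency"), str)
            and len(p["currency"]) == 3
            for p in products
        ),
        # At least one non-empty identifier on every record.
        "identifier_present": all(
            any(p.get(k) for k in ("sku", "gtin", "mpn")) for p in products
        ),
    }
```

Treating `int` and `float` as one "number" type follows the string/number/boolean grouping in the text, so a feed that mixes `10` and `12.5` is not penalised, while one that mixes `"10.00"` and `12.5` is.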

Dimension 4: Inventory Simulation (25%)

Measures whether real-time inventory state is machine-accessible, the final link in the agentic purchase chain. An AI agent comparing products needs not just price and description, but current availability, variant-level stock counts, and estimated shipping windows. Without this, agent recommendations are potentially stale at transaction time.

High scores require: schema.org/Offer with availability property present and up-to-date, variant-level inventory data accessible, and a cart or checkout simulation endpoint that confirms actual purchase feasibility. Sites where availability signals are embedded only in JavaScript-rendered state (invisible to non-headless agents) receive significantly penalised scores. RetrieveAI's rendering gap audit (Phase 3.5) specifically detects this.

Scoring signals

Schema.org/Offer present · Availability property · Variant-level stock · Cart/checkout endpoint · JS-independent state
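To make the "JS-independent state" signal concrete, the following sketch reads schema.org availability values directly from raw HTML, without executing any JavaScript. It uses only the Python standard library; the class and function names are hypothetical, and it handles only top-level `Product` nodes in JSON-LD (not microdata or `@graph` wrappers).

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def raw_html_availability(html: str) -> list[str]:
    """Return Offer availability values visible without rendering JavaScript."""
    collector = JsonLdCollector()
    collector.feed(html)
    found: list[str] = []
    for block in collector.blocks:
        try:
            node = json.loads(block)
        except json.JSONDecodeError:
            continue
        for item in node if isinstance(node, list) else [node]:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers", [])
            offers = offers if isinstance(offers, list) else [offers]
            found += [o["availability"] for o in offers
                      if isinstance(o, dict) and "availability" in o]
    return found
```

An empty result on the raw HTML of a page that clearly shows stock status after rendering is the pattern the text describes as invisible to non-headless agents.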

4. Weighting Rationale: Why Equal Distribution?

The equal 25/25/25/25 weighting requires explicit justification, as alternative schemes are plausible: one might argue MCP compatibility should carry greater weight given its status as the emerging universal standard.

We arrived at equal weighting through three arguments tested against RetrieveAI's pipeline behaviour:

The complementarity argument: The four dimensions are not substitutes but complements. A score of 100 on MCP Compatibility provides no commercial benefit if Inventory Simulation scores 0: the agent can discover products it cannot verify as available. This interdependence argues against any single dimension dominating the composite.

The temporal stability argument: MCP is the current dominant standard, but OpenAI's ACP, Google's UCP, and Anthropic's MCP are all gaining commercial adoption simultaneously. An overweight on MCP today risks penalising sites that implement ACP or UCP equally well.

The pipeline argument: In RetrieveAI's implementation, each dimension corresponds to a distinct data collection phase. Equal weighting decouples the scoring model from temporal market conditions, making the framework more stable as a long-term benchmark.

5. Scoring Grades and Benchmark Thresholds

CARF produces a composite CARS between 0 and 100. The following grade thresholds are derived from the score distribution produced by RetrieveAI's commerce phase:

CARS Grade Thresholds

A · Excellent · 85–100: Full agentic accessibility. MCP functional, API complete, tool-call responses well-structured, inventory machine-readable. Ready for autonomous AI agent transactions.

B · Good · 70–84: Substantially agent-accessible. One or two dimension gaps, typically MCP absent or inventory partially JS-dependent. Agents can evaluate products but may face friction at transaction.

C · Adequate · 50–69: Partial agent accessibility. Functional APIs with poor data quality, or strong schema but no MCP or tool-call surface. Agents can retrieve information but cannot execute transactions reliably.

D · Poor · 25–49: Minimal agent accessibility. No MCP, no structured API, availability JS-dependent. Agents can read product names but cannot act programmatically.

F · Invisible · 0–24: Effectively agent-invisible. No structured surfaces, no machine-readable endpoints, heavily JS-dependent. AI agents retrieve near-zero actionable product data.
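The thresholds above translate into a simple lookup. The sketch below is one possible reading, with a hypothetical function name; it assumes that fractional scores fall into the band whose lower bound they have reached (so 84.9 is a B), which the published integer ranges do not state explicitly.

```python
# Lower bound of each grade band, highest first.
GRADE_FLOORS = ((85, "A"), (70, "B"), (50, "C"), (25, "D"), (0, "F"))


def cars_grade(score: float) -> str:
    """Map a 0-100 CARS value to its letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    raise AssertionError("unreachable: 0 is the lowest floor")
```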

6. What Existing Tools Miss

Mapping CARF against the most prominent AI visibility tools as of April 2026 reveals a consistent pattern: every existing tool either monitors LLM outputs or audits content quality, but none score all four dimensions of commerce agent accessibility.

Tool	MCP Score	API Ready	Tool-Call	Inventory
RetrieveAI CARF	✓ Full	✓ Full	✓ Full	✓ Full
Profound	✗	✗	✗	✗
AthenaHQ	✗	✗	~ Partial	✗
Conductor	~ Outbound only	~ Limited	✗	✗
LLMClicks.ai	✗	~ Checklist	✗	✗
Goodie AI	✗	✗	✗	~ Schema only

The comparison reveals that no existing commercial tool implements all four CARF dimensions. Conductor is the closest, having built MCP server infrastructure, but this enables Conductor's platform to connect to AI tools, not to score a website's readiness to serve AI agents. These are fundamentally different problems.

7. The Structured Data–Agent Performance Relationship

A central theoretical assumption of CARF is that structured, machine-readable product data improves AI agent task performance. This assumption has strong empirical support from multiple research streams.

Microsoft Research's "Table Meets LLM" study (WSDM '24, 91 citations) found that HTML-formatted data outperformed CSV/TSV by 6.76% on structured task benchmarks, demonstrating that format and structure meaningfully affect agent performance.^[24] Research from data.world found that LLMs grounded in knowledge graphs achieve 300% higher accuracy versus unstructured text.^[20]

Microsoft's Bing/Copilot team explicitly confirmed at SMX Munich in March 2025 that "schema markup helps Microsoft's LLMs understand content" and that freshness of structured data is specifically valued.^[16] AccuraCast's analysis of 9,000 AI citation sources found that 81% of AI-cited pages include schema markup.^[14]

An AI agent that can discover a product via MCP but cannot parse its pricing structure due to poor tool-call response formatting is no more useful than one that cannot connect at all. Commerce AI readiness is a chain; its strength is determined by the weakest link.

8. Implementation in RetrieveAI

The Commerce AI Readiness Framework is fully implemented in RetrieveAI's audit pipeline as Phase 19, following 16 prior phases that establish the content, entity, and structural context required for accurate commerce scoring.

Phase 3 (Headless Crawl): Playwright-based rendering that executes JavaScript to surface dynamically loaded product data, cart states, and API calls invisible to lightweight crawlers. Essential for scoring Dimension 4's JS-independence signal.
Phase 3.5 (Rendering Gap Audit): Explicit comparison between raw HTML and rendered content. Sites with high rendering gaps on product data receive Dimension 4 penalties regardless of schema quality.
Phase 4a (API Detection): Extraction of API endpoints from JavaScript bundles, network patterns, and sitemap metadata. Feeds Dimension 2 scoring.
Phase 4b (Commerce Data Extraction): Extraction of product schema (JSON-LD and microdata), offer data, pricing, availability, SKU, and variant structure. Feeds Dimensions 3 and 4.
Phase 19 (Commerce Audit): Assembly of the four CARF dimension scores into a weighted composite CARS, alongside dimension-level breakdowns and failed service flags.
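The rendering gap comparison in Phase 3.5 can be pictured with a toy metric. The source does not define how the gap is computed, so the definition below is purely an assumption for illustration: the share of product fields found in the rendered page that are absent from the raw HTML. The function name and field labels are hypothetical.

```python
def rendering_gap(raw_fields: set[str], rendered_fields: set[str]) -> float:
    """Fraction of rendered product fields that are missing from raw HTML."""
    if not rendered_fields:
        return 0.0  # nothing was rendered, so there is no gap to measure
    missing = rendered_fields - raw_fields
    return len(missing) / len(rendered_fields)


raw = {"name", "price"}
rendered = {"name", "price", "availability", "sku"}
print(rendering_gap(raw, rendered))  # 2 of 4 fields need JavaScript: 0.5
```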

Within RetrieveAI's overall AI Visibility Score, the CARS contributes 10% (alongside Entity Strength at 45% and Prompt Coverage at 45%). Commerce readiness is treated as a necessary but insufficient condition for AI visibility, reflecting the current reality that most AI visibility use cases remain informational rather than transactional. As agentic commerce grows, this weighting will be revisited.
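As a worked example of the 45/45/10 split, the sketch below combines three component scores into an overall value. The function and parameter names are assumptions, and it presumes all three components are on the same 0 to 100 scale, which the text implies but does not state.

```python
# Weights in whole percent, as given in the text.
WEIGHTS = {"entity_strength": 45, "prompt_coverage": 45, "cars": 10}


def ai_visibility_score(entity_strength: float, prompt_coverage: float, cars: float) -> float:
    """Weighted combination of the three components (each assumed 0-100)."""
    total = (
        WEIGHTS["entity_strength"] * entity_strength
        + WEIGHTS["prompt_coverage"] * prompt_coverage
        + WEIGHTS["cars"] * cars
    )
    return total / 100


# 0.45 * 80 + 0.45 * 60 + 0.10 * 50 = 36 + 27 + 5
print(ai_visibility_score(80, 60, 50))  # 68.0
```

Under this weighting, moving CARS from 0 to 100 shifts the overall score by at most 10 points.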

Key Takeaways

No existing AI visibility tool implements all four dimensions of commerce agent accessibility
MCP compatibility is the single most discriminating signal: adoption is near zero outside Shopify
Tool-call response quality is distinct from endpoint presence and has significant practical consequences
JavaScript-dependent inventory data is effectively invisible to non-headless AI agents
Equal 25/25/25/25 weighting reflects complementarity: weakness in any one dimension degrades the whole chain
McKinsey projects $3–5T global agentic commerce by 2030; CARF provides the first systematic tool for measuring readiness

Conclusion

The question is no longer whether AI agents will purchase on behalf of consumers. The question is whether your store can be found, understood, and transacted with when they do.

The Commerce AI Readiness Framework (CARF) presents the first structured scoring model for measuring e-commerce accessibility to autonomous AI agents, built through the implementation of RetrieveAI. The framework makes three original contributions: defining MCP compatibility as a first-class readiness dimension, distinguishing tool-call response quality from endpoint presence, and embedding commerce scoring within a multi-phase audit that accounts for JavaScript-dependent content invisible to non-headless agents.

A brand that scores an F on CARS in 2026 will be structurally excluded from the agentic commerce channel as it grows toward Bain's projected 25% of e-commerce. The cost of that exclusion compounds annually.

References

[1] Anthropic. "Introducing the Model Context Protocol." November 25, 2024. anthropic.com/news/model-context-protocol
[3] Anthropic. "Donating MCP and establishing the Agentic AI Foundation." December 9, 2025. anthropic.com
[4] Altman, S. (OpenAI). Quoted in TechCrunch, March 26, 2025.
[6] Shopify. "About Storefront MCP." shopify.dev/docs/apps/build/storefront-mcp
[7] Stripe / OpenAI. "Agentic Commerce Protocol." September 2025. stripe.com
[8] Google Developers Blog. "Universal Commerce Protocol (UCP)." developers.googleblog.com
[9] Gartner. "60% of Brands Will Use Agentic AI." January 15, 2026. gartner.com
[10] Gartner. "Top Predictions 2026." October 21, 2025. gartner.com
[11] Bain & Company (via Digital Commerce 360). "Agentic AI: 25% of US e-commerce by 2030." December 2025.
[12] Salesforce. "Cyber Week 2025." December 5, 2025. salesforce.com
[13] McKinsey. "The Agentic Commerce Opportunity." October 17, 2025. mckinsey.com
[14] AccuraCast. "Does Schema Markup Increase Generative Search Visibility?" December 2025. accuracast.com
[16] Canel, F. (Microsoft). SMX Munich, March 2025. searchengineland.com
[17] Omnisend Survey. "60% of Americans Use Gen AI for Shopping." July 2025. prnewswire.com
[20] data.world benchmark. "LLMs + knowledge graphs: 300% higher accuracy." 2023.
[21] Aggarwal et al. "GEO: Generative Engine Optimization." Princeton/IIT Delhi. arXiv:2311.09735.
[24] Sui et al. (Microsoft Research). "Table Meets LLM." WSDM '24. arXiv:2305.13062.

Akshay Dahiya

Growth & MarTech Specialist

Digital marketing professional with 7+ years of experience in SEO, analytics, and marketing automation. Currently building RetrieveAI, MarAI, and RankScan: tools that solve real problems I've run into working in growth and search.
