
Beyond SEO - Part 2

Chris Eaton & Eric Thanenthiran·24 May 2026·8 min read

Part 2 of 3 in our series, Beyond SEO: Making Your Products Legible to AI. The first post made the case for why AI shopping is a real channel that reads your business differently. This one gets specific about what "AI-ready data" looks like in practice, and why most catalogues fall short.

What "AI-ready data" actually looks like

AI shopping bypasses your page in two distinct ways. The first is direct feed ingestion: merchants push structured catalogue data to AI platforms. OpenAI accepts feeds delivered via SFTP in formats including Parquet (preferred), JSONL, CSV, and TSV, and Google's Shopping Graph ingests Merchant Center feeds. Your data is pushed to these systems and indexed there, so the AI never visits your page.
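For readers who have not built a feed before, here is a minimal sketch of what "pushing structured data" means in the JSONL case: one self-contained JSON object per line. The field names and values below are hypothetical placeholders, not OpenAI's or Google's published schema.

```python
import json

# Hypothetical catalogue rows. Field names are illustrative placeholders,
# not taken from any platform's feed specification.
rows = [
    {"id": "TEE-001-M", "title": "Organic Cotton Tee", "size": "M",
     "price": {"amount": 25.00, "currency": "GBP"},
     "availability": "in_stock", "gtin": "5012345678900"},
    {"id": "TEE-001-L", "title": "Organic Cotton Tee", "size": "L",
     "price": {"amount": 25.00, "currency": "GBP"},
     "availability": "out_of_stock", "gtin": "5012345678917"},
]


def to_jsonl(records):
    """Serialise records as JSON Lines: one compact JSON object per line."""
    return "".join(
        json.dumps(r, separators=(",", ":"), ensure_ascii=False) + "\n"
        for r in records
    )


feed = to_jsonl(rows)
print(feed, end="")
```

Because each line parses on its own, large feeds can be streamed and validated row by row.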

The second is structured live querying via MCP (Model Context Protocol). Shopify's Storefront MCP gives every store a public endpoint at https://{shop}.myshopify.com/api/mcp where AI agents can call tools like search_shop_catalog, get_cart, and update_cart to actually shop the store on a customer's behalf. This is not a feed. It is an API designed for agents to query and transact in real time.
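Because MCP runs over JSON-RPC 2.0, a tool call is just a small JSON document. The sketch below builds, but does not send, a tools/call request for search_shop_catalog. The store handle is hypothetical, and the argument names (query, context) are our assumption. An MCP client can confirm a tool's actual input schema by calling tools/list first.

```python
import json
import urllib.request


def build_tool_call(shop, tool, arguments, request_id=1):
    """Build, but do not send, an MCP 'tools/call' request (JSON-RPC 2.0)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        f"https://{shop}.myshopify.com/api/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "example-store" is a hypothetical shop handle; the argument names are assumed.
req = build_tool_call(
    "example-store",
    "search_shop_catalog",
    {"query": "waterproof jacket", "context": "Customer wants size M, under £100"},
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it. The sketch stops short so it stays offline.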

In both cases, what matters most is how much of your product data is structured. Descriptions still help, but they cannot compensate for missing or unstructured attributes.

Two mechanisms, one underlying need

It is worth being precise about what each platform expects. OpenAI's current ChatGPT feed schema is relatively minimal: title, description, price, availability, identifiers (GTIN, UPC, MPN), categories, media, and free-form variant options like colour or size. Google Merchant Center is far richer, with dozens of typed fields covering colour, size, size system, gender, age group, material, audience attributes, and category-specific requirements. For apparel, several of these are mandatory. Schema.org markup in JSON-LD on your product pages adds another layer, with dedicated types for sizing, nutrition, certifications, reviews, and arbitrary additional properties.

The principle across all of them is the same. The more of your catalogue that lives in typed fields rather than free-text descriptions, the better your products tend to match queries and rank against alternatives. Going beyond the bare minimum is where structured data genuinely rewards investment.

Copy still does work. Feed-ingesting platforms use your description to generate summaries, refine titles, and provide ranking signal. MCP tools like search_shop_catalog return the description alongside structured fields, so an agent can read it. But for the part of the work that determines whether your product even surfaces against a query (filtering and matching against constraints like size, price, material, and availability), structured fields are what the systems actually use. Copy is supporting evidence. It is not the primary filter.

Both channels reward the same underlying discipline: typed attributes, controlled vocabularies, and complete variant-level data.

A short note on terms. Typed attributes are values with a defined data type (a number, boolean, date, or quantity with units) rather than free text. "Weight: 240g" as a string is unfilterable. weight: { value: 240, unitCode: "GRM" }, where value is a number and GRM is the UN/CEFACT code for grams, can be sorted, compared, and converted by any system that understands the unit code.

Controlled vocabularies are finite, defined lists of allowed values. "Hi-Vis Lemon Zest" as a colour might be evocative marketing copy, but agents can struggle to match it reliably to a customer asking for "yellow." Pairing it with a colour family from a defined list (Yellow, Blue, Red, and so on) makes the field queryable.

The gold standard is both together. size: { name: "M", sizeSystem: "WearableSizeSystemUK", sizeGroup: "WearableSizeGroupRegular" } is typed and drawn from controlled vocabularies, making it unambiguous across stores, regions, and agents. Google Merchant Center, Schema.org, and Shopify's standard metafields all ship predefined controlled vocabularies for the most common attributes.
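To make the "sorted, compared, and converted" point concrete, here is a small sketch contrasting the two representations. The three weights are made up, and the conversion table covers only the two UN/CEFACT codes used here (GRM for grams, KGM for kilograms).

```python
# Made-up weights, as free text and as typed quantities.
as_text = ["Weight: 90g", "Weight: 1.2kg", "Weight: 240g"]
as_typed = [
    {"value": 90, "unitCode": "GRM"},
    {"value": 1.2, "unitCode": "KGM"},
    {"value": 240, "unitCode": "GRM"},
]

# Factors to a common base unit; only the codes used above are covered.
TO_GRAMS = {"GRM": 1, "KGM": 1000}


def grams(quantity):
    """Normalise a typed quantity to grams so values can be compared."""
    return quantity["value"] * TO_GRAMS[quantity["unitCode"]]


# Strings sort character by character, so the heaviest item lands first.
print(sorted(as_text))  # → ['Weight: 1.2kg', 'Weight: 240g', 'Weight: 90g']

# Typed values sort and filter by actual mass.
print([grams(q) for q in sorted(as_typed, key=grams)])
print([q for q in as_typed if grams(q) < 500])
```

The string version cannot answer "under 500 g" at all without first being parsed; the typed version answers it with one comparison.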

Where catalogues commonly fall short

The same patterns tend to appear across catalogues we look at.

Critical attributes live only in the product description. Material composition, dimensions, compatibility, care instructions, certifications, dietary information, and country of origin often sit in description bullet points or paragraphs, where they can be invisible to feeds and hard for agents to extract reliably.
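A quick way to size this problem is to scan descriptions for facts that have no structured home. The sketch below is deliberately naive: the field names and keyword patterns are hypothetical, and keyword matching will miss plenty. It illustrates why facts in copy are hard to rely on, and where backfilling might start.

```python
import re

# Hypothetical map from structured field name to description patterns that
# suggest the fact exists only in copy. Real audits need richer patterns.
SIGNALS = {
    "material": r"\b(cotton|polyester|wool|leather|linen)\b",
    "country_of_origin": r"\bmade in\b",
    "care_instructions": r"\b(machine wash|hand wash|dry clean)\b",
}


def description_only_attributes(product: dict) -> list:
    """Return fields whose facts appear in the description but not as structured data."""
    text = product.get("description", "").lower()
    structured = product.get("attributes", {})
    return [
        field
        for field, pattern in SIGNALS.items()
        if re.search(pattern, text) and not structured.get(field)
    ]


product = {
    "description": "Soft organic cotton tee. Made in Portugal. Machine wash at 30°C.",
    "attributes": {"material": "Cotton"},
}
print(description_only_attributes(product))
```

Here material is already structured, so only country of origin and care instructions are flagged as living in copy alone.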

Structured attributes are not properly typed. "Weight: 240g" as a free-text string is not filterable. weight: { value: 240, unitCode: "GRM" } is. The same applies to dimensions, capacity, wattage, and other physical properties. AI shopping systems increasingly expect typed numbers with units, not strings.
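When legacy data already exists as strings, a first backfill pass can parse it into typed quantities. This is a minimal sketch, assuming weights are written as a number followed by mg, g, or kg. Anything it cannot parse comes back as None for human review rather than being guessed.

```python
import re
from typing import Optional

# Free-text unit suffixes mapped to UN/CEFACT codes (a small, illustrative subset).
UNIT_CODES = {"mg": "MGM", "g": "GRM", "kg": "KGM"}

# A number (optionally decimal), optional whitespace, then a known unit.
WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|kg|g)\b", re.IGNORECASE)


def parse_weight(text: str) -> Optional[dict]:
    """Turn e.g. 'Weight: 240g' into a typed quantity, or None if unparseable."""
    match = WEIGHT_RE.search(text)
    if match is None:
        return None  # leave for manual review rather than guessing
    number, unit = match.groups()
    return {"value": float(number), "unitCode": UNIT_CODES[unit.lower()]}


for raw in ["Weight: 240g", "1.2 KG", "about a kilo"]:
    print(raw, "->", parse_weight(raw))
```

Returning None instead of a default value keeps unparseable rows visible, which matters more than coverage when the output feeds filters.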

Free-text where controlled vocabularies are needed. "Hi-Vis Lemon Zest" as a colour is not easily filterable. A primary colour family from a defined list, alongside the marketing name, is. The same applies to materials, sizes, sustainability claims, and other categorical attributes. If a field accepts anything anyone types, it is noise. If it has to be one of a defined set of values, it becomes a queryable facet.
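One way to enforce a controlled vocabulary in a data pipeline is to reject any value outside an enumerated set, while keeping the marketing name alongside it. The colour families and field names here are hypothetical; in practice you would adopt the list your target platform defines.

```python
from enum import Enum


class ColourFamily(Enum):
    """Hypothetical controlled vocabulary of colour families."""
    BLACK = "Black"
    WHITE = "White"
    RED = "Red"
    YELLOW = "Yellow"
    BLUE = "Blue"
    GREEN = "Green"


def colour_record(marketing_name: str, family: str) -> dict:
    """Keep the marketing name, but require the family to come from the vocabulary.

    ColourFamily(family) raises ValueError for any value outside the list.
    """
    return {"colourName": marketing_name, "colourFamily": ColourFamily(family).value}


catalogue = [
    colour_record("Hi-Vis Lemon Zest", "Yellow"),
    colour_record("Midnight Ocean", "Blue"),
]

# The facet query matches on the controlled value, not the copy.
yellow = [c["colourName"] for c in catalogue if c["colourFamily"] == "Yellow"]
print(yellow)
```

An attempt to store "Lemon" as a family fails at write time, which is exactly when a controlled vocabulary is cheapest to enforce.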

Per-variant data can be thin. Many catalogues have rich product-level data but much less at the variant level. Yet "in stock in size M" is a variant fact, not a product fact. AI agents need depth at the variant level too: sizes, GTINs, prices, availability, and sometimes per-variant images and dimensions.
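The sketch below shows why availability has to live on the variant. The products, SKUs, and prices are made up. The point is that a request for "size M, in stock, under £100" can only be answered by checking all three constraints against the same variant.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    sku: str
    size: str
    price: float
    in_stock: bool


@dataclass
class Product:
    title: str
    variants: list


catalogue = [
    Product("Rain Jacket", [Variant("RJ-S", "S", 89.0, True), Variant("RJ-M", "M", 89.0, False)]),
    Product("Trail Jacket", [Variant("TJ-M", "M", 95.0, True), Variant("TJ-L", "L", 95.0, True)]),
]


def matching_variants(products, size, max_price):
    """Return (title, sku) for variants that satisfy every constraint at once."""
    return [
        (p.title, v.sku)
        for p in products
        for v in p.variants
        if v.size == size and v.in_stock and v.price <= max_price
    ]


print(matching_variants(catalogue, "M", 100))
```

A product-level "in stock" flag would have surfaced the Rain Jacket too: it has stock, just not in M.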

Identifiers are missing or inconsistent. GTIN (Global Trade Item Number, the number encoded in retail barcodes such as UPC and EAN) is among the strongest signals that your product is the same as someone else's listing of it. A GTIN helps AI systems aggregate reviews, compare prices, and confidently match your product. Most stores have the field (Shopify exposes it as the variant "Barcode"), but coverage is often patchy. MPN (manufacturer part number) is rarer still. Both Google Merchant Center and OpenAI's feed accept these identifiers, and they materially affect product matching across platforms.
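Patchy identifier coverage often comes with invalid values, such as internal SKUs typed into the Barcode field. GTINs end in a GS1 mod-10 check digit, so a basic validity check is cheap to run across a whole catalogue. This sketch implements that standard check; it confirms a number is well-formed, not that it is registered to your product.

```python
def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN-8, -12, -13, or -14 using the GS1 mod-10 check digit."""
    code = code.strip()
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, 1... starting from the digit nearest the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


for barcode in ["5012345678900", "036000291452", "5012345678901", "SKU-1234"]:
    print(barcode, is_valid_gtin(barcode))
```

Running this over every variant's Barcode value gives a quick coverage-and-quality figure before any feed work begins.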

Absence claims are unsupported. Claims such as "no artificial sweeteners," "no leather," or "vegan-friendly" are easier for agents to verify when they are typed. Google Merchant Center's product_detail field can carry attribute-value pairs like Vegetarian:True, which is far more useful than the same claim in marketing copy. Without typed ingredient or material lists, agents can search for the presence of things but struggle to confirm their absence. This is one of the harder gaps to close.
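The difference between "no" and "unknown" is what makes absence claims hard. This sketch models typed claims loosely on product_detail's section, attribute name, and attribute value parts; the section and attribute names are hypothetical. A missing claim returns None rather than False, which is the honest answer when the data says nothing.

```python
from typing import Optional

# Hypothetical typed claims, loosely modelled on Merchant Center's
# product_detail (section name, attribute name, attribute value).
product_details = [
    {"section_name": "Dietary", "attribute_name": "Vegetarian", "attribute_value": "True"},
    {"section_name": "Dietary", "attribute_name": "Contains artificial sweeteners",
     "attribute_value": "False"},
]


def claim(details, attribute: str) -> Optional[bool]:
    """True/False when a typed claim exists; None means 'unknown', not 'no'."""
    for d in details:
        if d["attribute_name"] == attribute:
            return d["attribute_value"] == "True"
    return None


print(claim(product_details, "Vegetarian"))
print(claim(product_details, "Contains artificial sweeteners"))
print(claim(product_details, "Contains leather"))
```

An agent can confidently say "no artificial sweeteners" here, but must stay silent on leather, because nothing in the data rules it out.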

Content beyond products has the same problem. FAQ pages, shipping policies, returns information, sizing guides, and ingredient explainers are often pulled in by AI assistants answering customer questions, and they tend to be more quotable when structured (FAQ schema, clear dates, atomic claims). Yet many stores treat these as low-priority CMS pages.
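As an illustration of structured policy content, this sketch generates Schema.org FAQPage JSON-LD from question and answer pairs. FAQPage, Question, Answer, acceptedAnswer, mainEntity, and dateModified are standard Schema.org terms; the questions, answers, and date are invented placeholders.

```python
import json


def faq_jsonld(pairs, date_modified):
    """Build Schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "dateModified": date_modified,  # ISO 8601 date, so answers are dated
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder policy content for illustration only.
doc = faq_jsonld(
    [
        ("How long does UK delivery take?", "Standard UK delivery takes 2 to 4 working days."),
        ("Can I return sale items?", "Yes, within 30 days of delivery, unworn and with tags attached."),
    ],
    date_modified="2026-05-01",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

The printed JSON is what would sit inside a script tag of type application/ld+json on the FAQ page. Each answer is one atomic, quotable claim.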

The cumulative effect: a typical catalogue today cannot demonstrably satisfy some of the constraints in a realistic AI shopping query. Either the structured data is not there, or it is locked in copy the agent cannot reliably extract. Closing those gaps is the work.

Shopify setting the pace

Shopify deserves particular attention, both because so many ecommerce stores run on it and because the platform has been moving quickly to support agentic commerce.

Out of the box, Shopify's modern themes inject JSON-LD product schema using a built-in Liquid filter:

<script type="application/ld+json">
  {{ product | structured_data }}
</script>

That filter emits the basics: name, description, price, availability, brand, image, and URL. That is enough to qualify for Google rich results. But its output is fixed: even when richer data exists on your products, the filter does not render it. GTIN is not emitted in the default output, even though it lives in the variant Barcode field. MPN is absent because Shopify currently has no native MPN field. Reviews appear only via apps. Shipping details, return policy, materials, certifications, and dietary information are all missing from the default output. To include them, the snippet needs to be extended or replaced.
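What "extending" the snippet means is easiest to see in the output. This sketch, written in Python rather than Liquid, builds a richer Product JSON-LD object from a simplified, hypothetical product dictionary: one Offer per variant carrying its GTIN (taken from the Barcode value), plus an MPN sourced from somewhere other than a native field, such as a custom metafield. The input shape is our own assumption, not Shopify's product object.

```python
import json


def product_jsonld(product: dict) -> dict:
    """Sketch of richer Product JSON-LD: one Offer per variant, carrying its GTIN."""
    offers = []
    for v in product["variants"]:
        offer = {
            "@type": "Offer",
            "sku": v["sku"],
            "price": f'{v["price"]:.2f}',
            "priceCurrency": product["currency"],
            "availability": "https://schema.org/" + ("InStock" if v["available"] else "OutOfStock"),
        }
        if v.get("barcode"):  # the variant Barcode value, when it holds a GTIN
            offer["gtin"] = v["barcode"]
        offers.append(offer)
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["title"],
        "brand": {"@type": "Brand", "name": product["vendor"]},
        "offers": offers,
    }
    if product.get("mpn"):  # e.g. from a custom metafield (hypothetical source)
        doc["mpn"] = product["mpn"]
    return doc


# Hypothetical input; every value here is a placeholder.
example_product = {
    "title": "Organic Cotton Tee", "vendor": "Example Brand", "currency": "GBP",
    "mpn": "OCT-2026",
    "variants": [
        {"sku": "TEE-001-M", "price": 25, "available": True, "barcode": "5012345678900"},
        {"sku": "TEE-001-L", "price": 25, "available": False, "barcode": ""},
    ],
}
print(json.dumps(product_jsonld(example_product), indent=2))
```

Omitting gtin when the barcode is blank, rather than emitting an empty string, keeps invalid identifiers out of the markup.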

Then there is the data itself. Shopify supports metafields, a flexible system for storing custom typed attributes against products and variants. Used well, they function as a built-in PIM (Product Information Management system). Three things commonly go wrong.

First, the metafields are never defined, so the data does not exist anywhere in structured form.

Second, when metafields exist, they are not exposed. Metafields are not automatically rendered in JSON-LD, included in the Google Merchant Center feed, or made queryable via the Storefront API. Each channel needs configuration.

Third, the data is not backfilled at scale. Defining a "material composition" metafield is straightforward. Populating it accurately for hundreds of SKUs typically requires a pipeline (integrating supplier data, manufacturer feeds, or PIM exports) that most teams have not built.
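The pipeline does not have to be elaborate to start. This sketch maps a hypothetical supplier CSV to inputs for Shopify's Admin GraphQL metafieldsSet mutation and groups them into request bodies. The namespace, key, CSV layout, and product IDs are hypothetical; "json" is a standard metafield type, and the batch size of 25 reflects our understanding of the per-call limit, so check current documentation. Actually sending the bodies (endpoint, access token) is left out.

```python
import csv
import io
import json

# Hypothetical supplier export keyed by Shopify product GID.
SUPPLIER_CSV = """product_id,fibres
gid://shopify/Product/1001,organic cotton 95%; elastane 5%
gid://shopify/Product/1002,recycled polyester 100%
"""

METAFIELDS_SET = """
mutation metafieldsSet($metafields: [MetafieldsSetInput!]!) {
  metafieldsSet(metafields: $metafields) {
    metafields { id namespace key }
    userErrors { field message }
  }
}
"""

BATCH_SIZE = 25  # our understanding of the per-call limit; confirm in current docs


def parse_fibres(text):
    """Turn 'organic cotton 95%; elastane 5%' into typed fibre/percent pairs."""
    fibres = []
    for part in text.split(";"):
        name, pct = part.strip().rsplit(" ", 1)
        fibres.append({"fibre": name, "percent": int(pct.rstrip("%"))})
    if sum(f["percent"] for f in fibres) != 100:
        raise ValueError(f"composition does not sum to 100%: {text!r}")
    return fibres


def to_metafield_inputs(csv_text):
    """Map supplier rows to MetafieldsSetInput objects (namespace/key are hypothetical)."""
    return [
        {
            "ownerId": row["product_id"],
            "namespace": "custom",
            "key": "material_composition",
            "type": "json",
            "value": json.dumps(parse_fibres(row["fibres"])),  # values are sent as strings
        }
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def request_bodies(inputs, size=BATCH_SIZE):
    """Yield GraphQL request bodies, each carrying at most `size` metafields."""
    for start in range(0, len(inputs), size):
        yield {"query": METAFIELDS_SET, "variables": {"metafields": inputs[start:start + size]}}


inputs = to_metafield_inputs(SUPPLIER_CSV)
for body in request_bodies(inputs):
    print(json.dumps(body["variables"], indent=2))
```

Failing loudly when a composition does not add up is deliberate: an accurate but incomplete backfill is better than a complete but wrong one.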

Shopify itself has been investing heavily here. The platform now offers a public Storefront MCP server for every store, co-developed the Universal Commerce Protocol with Google, and ships Shopify Catalog as the structured data system that powers AI distribution. As of March 2026, eligible Shopify merchants are activated by default into Agentic Storefronts, which makes products discoverable through ChatGPT, Microsoft Copilot, AI Mode in Google Search, and the Gemini app. The consistent message from Shopify and from analysts looking at the data is that the bottleneck is rarely the integration. It is product data quality. If your attributes are thin, the integrations do not help.

A typical Shopify-AI gap looks like this: data missing entirely, or data present in metafields but invisible to the channels an AI consumer uses. Both are fixable, and the fix is engineering work rather than a marketing tweak.

The third post in this series covers the channels that matter and the actual engineering process to close the gaps.
