I’ve been having a blast vibecoding on top of Shopify UCP and the Catalog APIs. I’ve been building a bunch of little prototypes against both the global catalog and individual merchant catalogs, and it is honestly a little wild that these APIs exist and are this easy to build on.

That is part of why this edge case stuck with me. The surface is powerful enough to unlock a whole new class of commerce experiences, so the boring reliability details suddenly feel a lot more important. Agent-facing APIs invite automated clients by design, and today’s agents are pretty good at working through errors, but often in subtly incorrect ways. A muddy error can send an agent down the wrong path, cause it to make a bad assumption, or encourage it to paper over a backend failure instead of digging into what actually happened.

This came out of poking at Shopify’s newly announced UCP / agentic commerce surface. Shopify discussed the launch publicly, including in Ilya Grigorik’s announcement post on X.

Disclosure

Before publishing this note, I shared the behavior with Shopify in two ways:

  1. I reported it through Shopify’s security bug bounty program.
  2. I also disclosed it privately via DMs.

Shopify reviewed the bug bounty submission and closed it as not a valid security vulnerability.

That is fine. I am not publishing this as a confirmed vulnerability, exploit, or proof of practical denial of service. I’m publishing it as a small reliability and API-hardening observation around a new agentic commerce surface.

Minimal reproduction

Single request observed against a Shopify storefront UCP MCP endpoint:

curl -sS -w '\nHTTP_STATUS=%{http_code}\nTIME_TOTAL=%{time_total}\n' \
  -X POST 'https://fnova.myshopify.com/api/ucp/mcp' \
  -H 'content-type: application/json' \
  -H 'accept: application/json' \
  --data @fnova-dress-limit-200.json

Request body:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "id": "fnova-dress-limit-200-repro",
  "params": {
    "name": "search_catalog",
    "arguments": {
      "meta": {
        "ucp-agent": {
          "profile": "https://shopify.dev/ucp/agent-profiles/2026-04-08/valid-with-capabilities.json"
        }
      },
      "catalog": {
        "query": "dress",
        "context": {
          "address_country": "US",
          "currency": "USD",
          "intent": "Shopping this merchant's catalog for products matching the buyer query."
        },
        "pagination": {
          "limit": 200
        }
      }
    }
  }
}

Observed response:

HTTP 500
~3.07s total time

Internal error calling tool search_catalog:
3024: Query execution was interrupted, maximum statement execution time exceeded (trilogy_read_row)

Why it caught my attention

There are two separate issues here.

First, the API returns a raw-ish, infrastructure-flavored error. The trilogy_read_row fragment, which appears to come from the Trilogy MySQL client library, looks like an implementation detail escaping through the API boundary.

Second, a cheap client request appears to drive backend database work until a statement timeout interrupts it.

That does not automatically mean “DoS.” It does mean the API may be letting expensive query shapes get too far downstream before rejecting them.

Even if there is no security impact here, the baseline behavior is still surprising. A public UCP API can be made to return an improperly handled 500 with a simple request, and the error appears to expose database-layer implementation detail. For a newly promoted agentic commerce API, that is worth fixing even if it never becomes more than a reliability bug.

An agent-specific wrinkle

My coding agent did not immediately know what to do with this.

It saw a valid-looking UCP tool call return a JSON-RPC internal error and initially treated it like an ordinary tool failure to route around. The database timeout detail was not part of the UCP contract, and it took human steering to frame it correctly: Shopify was returning the 500, the timeout was server-side, it was reproducible, and a database implementation detail appeared to be leaking through the API boundary.

The practical issue was not just recognizing the error. I also had to steer the agent toward bisecting the request shape that triggered it. In this case, the failure was sensitive to the requested pagination limit, but the “safe” limit was not a fixed number across shops or queries. Some requests worked at higher limits, while others failed at lower ones depending on the merchant catalog and query.

That matters when you are prototyping on top of an agent-first API. A human can look at this and say, “back off the limit, binary search the working range, cache the result, and treat this as a server-side guardrail.” An agent may just see a transient tool failure, retry the same shape, or add brittle one-off handling.
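The "binary search the working range" step can be sketched roughly as follows. This is an illustration, not what my prototypes literally ran: `probe` is a hypothetical caller-supplied function that issues one search at a given limit and reports whether it succeeded, and the search assumes that if a limit works, every smaller limit works too, which the observed behavior suggests but does not prove.

```python
from typing import Callable


def largest_working_limit(probe: Callable[[int], bool],
                          low: int = 1, high: int = 200) -> int:
    """Binary-search the largest limit in [low, high] for which probe() succeeds.

    Assumes success is monotonic in the limit. Returns 0 if even `low` fails.
    """
    if not probe(low):
        return 0
    best = low
    lo, hi = low + 1, high
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best = mid       # mid works; look for something larger
            lo = mid + 1
        else:
            hi = mid - 1     # mid fails; the boundary is below it
    return best


# Stand-in probe for illustration only: pretend this shop/query fails above 73.
print(largest_working_limit(lambda limit: limit <= 73))
```

Because the working range differed across shops and queries, any cached result would need to be keyed by at least the shop and the query, not stored as one global number.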

In my prototypes, the workaround became programmatic: lower the requested limit when the Storefront UCP endpoint returned this timeout-shaped 500, retry with a smaller page size, and avoid assuming that one globally safe limit exists.
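A minimal sketch of that workaround, under some assumptions: `send` is a hypothetical function that POSTs a search_catalog call with the given limit and returns the HTTP status and body text, and the string match on the timeout message is a heuristic based on the one error I observed, not a documented contract.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (http_status, body_text)

# Text taken from the observed error; not part of any documented UCP contract.
TIMEOUT_MARKER = "maximum statement execution time exceeded"


def is_timeout_shaped(status: int, body: str) -> bool:
    """Heuristic match for the observed failure: HTTP 500 plus the timeout text."""
    return status == 500 and TIMEOUT_MARKER in body


def search_with_backoff(send: Callable[[int], Response],
                        limit: int = 200,
                        min_limit: int = 1) -> Tuple[int, Response]:
    """Halve the page size after each timeout-shaped 500.

    Returns (limit_used, response). Any other response, success or error,
    is returned immediately, since shrinking the page would not help.
    """
    while True:
        status, body = send(limit)
        if not is_timeout_shaped(status, body):
            return limit, (status, body)
        if limit <= min_limit:
            return limit, (status, body)  # give up and surface the failure
        limit = max(min_limit, limit // 2)
```

Halving is a deliberately blunt choice here. It converges in a handful of requests, at the cost of sometimes settling on a smaller page size than the endpoint could actually serve.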

That is a new-ish wrinkle for agent-facing APIs. Error payloads are not just for humans reading curl output anymore. Agents will consume them and decide whether to retry, fan out, back off, degrade gracefully, or report a bug. Muddy error boundaries can produce muddy agent behavior.
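One way to make that decision explicit on the client side is a small classifier that maps a response to a next action. The sketch below is an assumption-heavy illustration: the action names and status-code groupings are my own, and the timeout marker is copied from the single error observed above.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    RETRY_SMALLER = "retry_with_smaller_request"
    RETRY_LATER = "retry_after_delay"
    FIX_REQUEST = "fix_request"
    REPORT = "report_unexpected_failure"


# Copied from the observed error text; not a documented contract.
TIMEOUT_MARKER = "maximum statement execution time exceeded"


def classify(status: int, body: str) -> Action:
    """Map an HTTP status and body to a coarse next step for an automated client."""
    if 200 <= status < 300:
        return Action.PROCEED
    if status in (429, 503):
        return Action.RETRY_LATER      # throttled or temporarily unavailable
    if 400 <= status < 500:
        return Action.FIX_REQUEST      # the client sent something invalid
    if status >= 500 and TIMEOUT_MARKER in body:
        return Action.RETRY_SMALLER    # server-side timeout tied to request shape
    return Action.REPORT               # anything else is not safe to paper over
```

The important branch is the last one: an unrecognized failure is surfaced rather than retried blindly.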

Napkin math

The simplest way I know to reason about this is thread-seconds, not requests.

If a query reliably reaches MySQL and runs until a 3-second statement timeout, then each request can create up to 3 seconds of backend database wall time.

1 request * 3 seconds = 3 MySQL thread-seconds

That gives a rough saturation model:

rps_to_keep_busy ~= effective_expensive_query_concurrency / 3

| Effective expensive-query slots | Approx. RPS to keep occupied | Requests over 30 minutes | Backend thread-time over 30 minutes |
|---|---|---|---|
| 32 | 10.7 rps | 19,200 | 16 thread-hours |
| 64 | 21.3 rps | 38,400 | 32 thread-hours |
| 128 | 42.7 rps | 76,800 | 64 thread-hours |
| 256 | 85.3 rps | 153,600 | 128 thread-hours |
| 512 | 170.7 rps | 307,200 | 256 thread-hours |
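The arithmetic behind those rows is Little's law rearranged (concurrency = arrival rate × time in system, so rate = concurrency / time). A short sketch that reproduces the numbers, assuming every request burns the full 3 seconds, which is the worst case and not something I measured across many requests:

```python
TIMEOUT_S = 3.0        # statement timeout implied by the observed error
WINDOW_S = 30 * 60     # 30-minute window used in the table


def saturation_row(slots: int,
                   timeout_s: float = TIMEOUT_S,
                   window_s: float = WINDOW_S):
    """Return (rps, requests_in_window, thread_hours) for a given slot count."""
    rps = slots / timeout_s                    # rate needed to keep all slots busy
    requests = rps * window_s                  # total requests over the window
    thread_hours = slots * window_s / 3600     # slots occupied for the whole window
    return round(rps, 1), round(requests), thread_hours


for slots in (32, 64, 128, 256, 512):
    rps, requests, hours = saturation_row(slots)
    print(f"{slots:>4} slots  {rps:>6.1f} rps  {requests:>8,} requests  {hours:>5.0f} thread-hours")
```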

That table is intentionally crude. It does not mean this amount of work would all hit one database, one shard, one replica, or one resource pool. Shopify’s public engineering writing makes it clear this is not a small database setup. They have written about hundreds of MySQL shards, writers with five or more replicas, thousands of database VMs, KateSQL on GKE, and large MySQL instances. One KateSQL post mentions an 80GB InnoDB buffer pool during debugging.

It is also probably the wrong abstraction in several ways. Effective expensive-query concurrency is not the same thing as CPU cores, MySQL connections, service-level concurrency, or total database fleet capacity. The real limiting factor could be CPU, IO, a connection pool, per-tenant throttles, query admission control, or circuit breakers. I am also rusty on MySQL configuration and internals, so treat this as naive outside-in reasoning, not a confident model of Shopify’s infrastructure.

I also assume Shopify has other load shedding, circuit breakers, throttles, and isolation mechanisms that would prevent this from turning into a practical DoS. The interesting part, at least to me, is the operational hint exposed by the error: somewhere on this path, Shopify appears to have a roughly 3-second MySQL statement execution limit for this class of read query.

That kind of detail is useful to defenders, but it also gives curious outsiders a new thing to reason about. Is the limit specific to Storefront UCP? Is it shared across other catalog read paths? Does it apply per replica pool, per tenant, per service, or per query class? Are there query shapes that fail fast, and others that reliably burn almost the full timeout window?

I do not think the public evidence is enough to call this a security vulnerability. My read is simpler: this is not a smoking gun, but it is a weird edge in a shiny new API surface. Even a boring timeout value can become a small map of how the backend behaves under stress.

Agentic APIs invite automated clients by design. That makes boring reliability controls matter a lot: clean errors, bounded query shapes, early rejection, rate limiting, tenant isolation, and careful resource accounting.
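To make "bounded query shapes" and "early rejection" concrete, here is a purely hypothetical sketch of a pre-flight check that refuses an oversized pagination limit before any database work happens. Nothing here reflects how Shopify's stack is built: the cap of 50 is made up, the function name is mine, and the only real constant is -32602, the JSON-RPC 2.0 code for invalid params.

```python
from typing import Any, Dict, Optional

MAX_LIMIT = 50  # hypothetical cap; the real safe value is unknown and may vary per shop


def reject_oversized_limit(request: Dict[str, Any],
                           max_limit: int = MAX_LIMIT) -> Optional[Dict[str, Any]]:
    """Return a JSON-RPC error response if pagination.limit is out of bounds, else None."""
    args = request.get("params", {}).get("arguments", {})
    limit = args.get("catalog", {}).get("pagination", {}).get("limit")
    if limit is None:
        return None  # no explicit limit; leave it to the server default
    is_int = isinstance(limit, int) and not isinstance(limit, bool)
    if is_int and 1 <= limit <= max_limit:
        return None
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {
            "code": -32602,  # JSON-RPC 2.0 "Invalid params"
            "message": f"pagination.limit must be an integer between 1 and {max_limit}",
        },
    }
```

A response like this tells an automated client exactly which parameter to change, which is the opposite of a timeout-shaped 500.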

If anyone with better MySQL internals knowledge has a cleaner way to reason about this, or can point out where this model is completely wrong, I would be interested. The useful answer may just be “this is messy error handling and nothing more,” but I would like to understand the right mental model.

References