How Marqo Works

This page is for technical evaluators, AI/ML teams, and decision-makers who want to understand the architecture behind Marqo's performance. For a business-level overview, see the Introduction.

Marqo is built on a fundamentally different architecture from traditional ecommerce search. Where most platforms layer generic AI on top of a keyword search engine, Marqo is designed from the ground up to optimize for business outcomes - conversion rate, revenue per session, and gross margin - at every layer of the stack.

This page explains the technology, why it matters commercially, and how it compares to alternative approaches.

The Six-Layer Stack

Most ecommerce search platforms share a common architecture: a text embedding model (often generic and shared across all customers), session-level behavioural data, and a re-ranker trained on aggregated click data. Marqo operates differently at every layer.

1. Agentic Query Understanding

Before a query reaches the embedding or ranking layers, Marqo runs it through an LLM-based query understanding pipeline:

Intent detection - distinguishing brand queries, category queries, attribute queries, and use-case queries, and routing each appropriately
Query expansion - enriching the query with semantically related terms to improve recall across the catalogue
Automated faceting - dynamically determining which facets are relevant for a given query, without requiring merchandisers to configure rules manually

This pipeline handles the long tail of search queries - which in most catalogues accounts for the majority of search volume - without requiring manual synonym lists, spelling corrections, or query rewrite rules.
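To make the three steps above concrete, here is a minimal sketch of the kind of structured result a query understanding step could hand to the retrieval layers. It is illustrative only: the names `Intent`, `UnderstoodQuery`, `understand`, and the lookup tables are hypothetical, and simple rules stand in for the LLM. It is not Marqo's API or implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Intent(Enum):
    BRAND = "brand"
    CATEGORY = "category"
    ATTRIBUTE = "attribute"
    USE_CASE = "use_case"


@dataclass
class UnderstoodQuery:
    raw: str
    intent: Intent
    expansions: list = field(default_factory=list)  # related terms to improve recall
    facets: list = field(default_factory=list)      # facets worth showing for this query


# Hypothetical lookup tables standing in for the LLM's judgement.
KNOWN_BRANDS = {"nike", "adidas"}
ATTRIBUTE_WORDS = {"red", "waterproof", "leather"}
RELATED_TERMS = {"trainers": ["sneakers", "running shoes"]}
FACETS_BY_INTENT = {
    Intent.BRAND: ["category", "size", "price"],
    Intent.CATEGORY: ["brand", "size", "price"],
    Intent.ATTRIBUTE: ["category", "brand", "price"],
    Intent.USE_CASE: ["category", "brand", "rating"],
}


def understand(raw: str) -> UnderstoodQuery:
    """Toy rule-based stand-in for an LLM query understanding step."""
    words = raw.lower().split()
    if KNOWN_BRANDS & set(words):
        intent = Intent.BRAND
    elif "for" in words:  # e.g. "boots for hiking"
        intent = Intent.USE_CASE
    elif ATTRIBUTE_WORDS & set(words):
        intent = Intent.ATTRIBUTE
    else:
        intent = Intent.CATEGORY
    expansions = [term for w in words for term in RELATED_TERMS.get(w, [])]
    return UnderstoodQuery(raw, intent, expansions, FACETS_BY_INTENT[intent])


result = understand("nike trainers")
print(result.intent, result.expansions, result.facets)
```

The point of the sketch is the output shape: downstream layers receive an intent, expansion terms, and a facet list for each query, rather than a raw string plus hand-maintained rules.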

2. Multimodal Domain-Specific Base Models

Marqo's embedding models are multimodal from the ground up - they encode images and text into the same semantic space, so a query for "floral summer dress" retrieves products based on their visual appearance, not just their text descriptions. This matters because a significant portion of ecommerce discovery is visual: shoppers often recognise what they want before they can describe it precisely.
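The idea of a shared semantic space can be sketched with toy vectors. In the example below, the three-dimensional embeddings, the product IDs, and the choice to score a product by the better of its text and image similarity are all assumptions made for illustration; real models produce much higher-dimensional vectors, and Marqo's actual scoring is not described here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy axes, loosely: (floral, summery, formal). Values are made up.
query_vec = [0.9, 0.8, 0.0]  # stands in for the text query "floral summer dress"

products = {
    # The description never mentions "floral", but the photo shows a floral print.
    "dress-A": {"text": [0.1, 0.7, 0.1], "image": [0.9, 0.7, 0.0]},
    # A formal dress: neither the text nor the image is close to the query.
    "dress-B": {"text": [0.0, 0.1, 0.9], "image": [0.0, 0.2, 0.9]},
}


def score(product):
    # Assumption: take the better of the text match and the image match.
    return max(cosine(query_vec, product["text"]),
               cosine(query_vec, product["image"]))


ranked = sorted(products, key=lambda pid: score(products[pid]), reverse=True)
print(ranked)
```

Because the text query and the product image live in the same space, dress-A is matched through its image even though its description is a weaker match.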

Beyond multimodality, Marqo develops domain-specific models trained on ecommerce data for specific verticals. Marqo Fashion SigLIP, for example, is trained on fashion imagery and product data - it understands style, silhouette, colour, and pattern in ways a general-purpose model cannot. These domain-specific models form the foundation everything else builds on.

3. Per-Customer Embedding Model Finetuning, Aligned to Your Business Objectives

Generic models represent the average of their training data, not your customers or your business. Marqo finetunes its embedding models on a per-customer basis, using each retailer's own clickstream and purchase data - and critically, with that retailer's specific business objectives built into the training objective.

This means the model doesn't just learn what shoppers click - it learns what drives revenue, which products convert at the margins you care about, and which catalogue relationships matter for your specific business. If you optimize for gross margin over raw conversion, the model learns that. If certain product categories are strategic priorities, the model encodes that.
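One common way to build a business objective into training is to weight each training example by its business value. The sketch below assumes that approach purely for illustration: it computes a weighted softmax (InfoNCE-style) loss in which each purchase is weighted by a hypothetical margin figure. The function name, the weighting scheme, and the numbers are assumptions, not a description of Marqo's training objective.

```python
import math


def weighted_softmax_loss(examples):
    """Weighted mean of per-example softmax cross-entropy losses.

    Each example is (scores, positive_index, weight), where `scores` are the
    model's similarity scores between one query and its candidate products,
    `positive_index` marks the product that was purchased, and `weight` is a
    business value such as the margin earned on that purchase.
    """
    total, weight_sum = 0.0, 0.0
    for scores, positive_index, weight in examples:
        log_normalizer = math.log(sum(math.exp(s) for s in scores))
        total += weight * (log_normalizer - scores[positive_index])
        weight_sum += weight
    return total / weight_sum


# Query 1: the model already ranks the purchased product first (small loss).
# Query 2: the model ranks the purchased product last (large loss).
well_ranked = ([2.0, 0.0], 0)
badly_ranked = ([0.0, 2.0], 0)

unweighted = weighted_softmax_loss([(*well_ranked, 1.0), (*badly_ranked, 1.0)])
# Same data, but the badly ranked purchase carried three times the margin.
margin_weighted = weighted_softmax_loss([(*well_ranked, 1.0), (*badly_ranked, 3.0)])

print(unweighted, margin_weighted)
```

With margin weighting, the mistake on the high-margin purchase contributes more to the loss, so training pressure is directed at the errors that cost the business most.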

The platform manages periodic full retraining automatically. As your catalogue evolves, new products launch, and seasonal patterns shift, the models are retrained on updated data without requiring intervention from your team.

No other ecommerce search provider finetunes embedding models per-customer, let alone with customer-specific business objectives built into training. This is the deepest source of differentiation in the stack.

4. Per-Search Clickstream Tracking

Most platforms track clickstream at the session level - they record that a user clicked a product during a session but lose the context of which specific search produced that click. Marqo tracks at the individual search level: every click, add-to-cart, and purchase is attributed to the exact query that generated it.

This produces substantially higher-signal training data. The model learns not just "this product converts" but "this product converts for this specific query" - a far richer signal for both embedding finetuning and ranking optimization. This more granular data is also what enables the continuous optimization described below.
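The difference between session-level and search-level attribution can be shown with a small sketch. The `Event` fields, the identifiers, and the two aggregation functions below are hypothetical and chosen for illustration; they do not reflect the schema of the Marqo pixel.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str
    search_id: str   # identifies the exact search that produced the result
    query: str
    product_id: str
    action: str      # "click", "add_to_cart", or "purchase"


events = [
    Event("s1", "q1", "trail shoes", "p-100", "click"),
    Event("s1", "q2", "rain jacket", "p-200", "click"),
    Event("s1", "q2", "rain jacket", "p-200", "purchase"),
]


def purchases_by_session(events):
    """Session-level view: we know p-200 was bought, but not for which query."""
    counts = defaultdict(int)
    for e in events:
        if e.action == "purchase":
            counts[(e.session_id, e.product_id)] += 1
    return dict(counts)


def purchases_by_query(events):
    """Search-level view: each purchase is tied to the query that surfaced it."""
    counts = defaultdict(int)
    for e in events:
        if e.action == "purchase":
            counts[(e.query, e.product_id)] += 1
    return dict(counts)


print(purchases_by_session(events))
print(purchases_by_query(events))
```

In the session-level view, the purchase of p-200 could be credited to either search in the session. In the search-level view it becomes a (query, product) training pair.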

5. Multimodal Embedding-Based Personalization

Marqo's personalization layer is built on its multimodal embedding models rather than collaborative filtering or behavioural segments. Each user's interaction history is encoded as a multimodal embedding capturing their preferences across visual and semantic dimensions simultaneously: style, colour, category, price sensitivity, and brand affinity, all in a single representation.

This embedding steers search and collection results toward the individual shopper's preferences while preserving relevance to the query. A search for "trainers" returns different products for a shopper whose history suggests premium minimalist preferences versus one who consistently engages with brightly coloured performance gear - without sacrificing result quality.

Because personalization operates in the same multimodal embedding space as the search layer, it integrates into the retrieval process rather than being applied as an afterthought.
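A minimal sketch of steering retrieval with a user embedding follows. It assumes, for illustration only, that the query vector and the user vector are blended with a fixed weight before the nearest-product lookup; the toy axes, the `user_weight` parameter, and the product names are hypothetical, and the source does not specify how Marqo combines the two.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def blend(query_vec, user_vec, user_weight=0.3):
    """Mix the query with the user's preference vector (weight is an assumption)."""
    return [(1 - user_weight) * q + user_weight * u
            for q, u in zip(query_vec, user_vec)]


# Toy axes, loosely: (is a trainer, minimalist style, bright colours).
products = {
    "minimal-trainer": [0.9, 0.4, 0.0],
    "bright-trainer": [0.9, 0.0, 0.4],
    "minimal-loafer": [0.1, 0.9, 0.0],
}
query_vec = [1.0, 0.0, 0.0]        # stands in for the query "trainers"
minimalist_user = [0.0, 1.0, 0.0]  # history suggests premium minimalist taste
performance_user = [0.0, 0.0, 1.0] # history suggests bright performance gear


def personalized_ranking(user_vec):
    target = blend(query_vec, user_vec)
    return sorted(products, key=lambda pid: cosine(target, products[pid]),
                  reverse=True)


print(personalized_ranking(minimalist_user))
print(personalized_ranking(performance_user))
```

In this toy setup both shoppers still see trainers ahead of the loafer, so query relevance is preserved, but the order of the two trainers follows each shopper's history.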

6. Continuous Conversion-Optimized Re-Ranking

The final layer uses an LLM trained to maximize conversion, revenue, and margin - not semantic similarity. It operates on the candidate set retrieved by the embedding search and re-ranks it based on predicted business outcomes for the specific user, query, and business context.
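The difference between ranking by similarity and ranking by predicted business outcome can be sketched as follows. Here the purchase probabilities are hard-coded stand-ins for a model's predictions, and the `Candidate` fields and objective functions are hypothetical; the sketch shows only the re-ranking step, not how Marqo's model produces its predictions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    similarity: float   # score from the embedding search
    p_purchase: float   # stand-in for a predicted purchase probability
    price: float
    margin_rate: float  # fraction of the price kept as gross margin


candidates = [
    Candidate("A", similarity=0.95, p_purchase=0.05, price=40.0, margin_rate=0.2),
    Candidate("B", similarity=0.90, p_purchase=0.04, price=80.0, margin_rate=0.5),
    Candidate("C", similarity=0.85, p_purchase=0.06, price=20.0, margin_rate=0.3),
]


def by_similarity(c):
    return c.similarity


def expected_conversion(c):
    return c.p_purchase


def expected_revenue(c):
    return c.p_purchase * c.price


def expected_margin(c):
    return c.p_purchase * c.price * c.margin_rate


def rerank(cands, objective):
    """Order the retrieved candidates by the chosen objective, best first."""
    return [c.product_id for c in sorted(cands, key=objective, reverse=True)]


for objective in (by_similarity, expected_conversion, expected_revenue, expected_margin):
    print(objective.__name__, rerank(candidates, objective))
```

The same candidate set produces a different order under each objective, which is why the choice of objective is a business decision rather than a purely technical one.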

Critically, this layer updates continuously as new interaction data arrives from the Marqo pixel. There is no manual retraining cycle or scheduled batch job for ranking optimization - the system learns in near-real-time from every search and purchase, so rankings improve automatically as your business runs.

How the Layers Compound

Each layer makes the layers that build on it more effective:

Layer	Business Outcome
Agentic query understanding	Better candidate sets, long-tail coverage without manual rules
Domain-specific base models	Better baseline relevance for your product category
Per-customer finetuning with business objectives	Embeddings aligned to your revenue and margin goals, not a generic average
Per-search clickstream	Richer training signal for every subsequent layer
Multimodal personalization	Results pre-filtered for individual shopper preferences
Continuous re-ranking	Rankings that improve automatically with every interaction
Agentic interfaces	Conversational discovery and multi-turn shopping experiences beyond the search bar

The compounding effect is what separates Marqo from providers who match one or two layers: a system optimized end-to-end specifically for your business improves over time in ways that generic platforms cannot.

Validate With an A/B Test

Marqo supports controlled A/B testing so you can measure the impact directly against your existing solution - conversion rate, revenue per session, and other business metrics - using your real traffic, your real catalogue, and your real customers. Contact your Marqo representative to set one up.
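When reading the results of such a test, a standard check for conversion rate is a two-proportion z-test. The sketch below is a generic statistical example using only the Python standard library; the function name and the traffic and conversion counts are made up for illustration, and it does not represent Marqo's testing tooling.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Illustrative numbers only: control converts 300 of 10,000 sessions,
# the variant converts 360 of 10,000.
z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely to be noise. Revenue per session and margin need different tests because they are not simple proportions.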
