
Stop Retraining Your Recommendation System

January 26, 2026 · 6 min read
Also on Medium

In many recommendation platforms, the business strategy (e.g., upsell, cross-sell, bundling, margin optimization) is tightly coupled with model behavior. Changing the strategy often means re-training models, re-tuning pipelines, or redeploying logic-heavy services. This slows experimentation and makes AI systems expensive to evolve.

This article explores a different approach:

A policy-driven architecture where recommendation strategies are injected at runtime, and LLMs act as decision-layer controllers rather than just generators.

The Core Problem

Traditional recommender systems often entangle:

  • Model behavior
  • Business strategy
  • Ranking logic

For example, a system trained for cross-sell might not adapt well to an upsell objective without retraining or heavy rule modifications.

This creates three issues:

  1. Strategy changes are slow
  2. Business teams cannot experiment easily
  3. AI systems become rigid instead of adaptive

What we need instead is:

A system where strategy is a runtime concept, not something baked into model weights.

Key Idea: Separate Policy from Prediction

Think of the system in two planes:

| Plane         | Responsibility                       |
|---------------|--------------------------------------|
| Data Plane    | Candidate generation & scoring       |
| Control Plane | Decision policy & strategy reasoning |

LLMs operate in the control plane, defining how the system should behave, while retrieval systems operate in the data plane, executing those instructions.

This allows recommendation systems to shift from:

Model-driven → Policy-driven

High-Level Architecture
Layer 1 — Context Construction Layer

This layer transforms raw inputs into structured decision context.

It aggregates signals such as:

  • User profile data (if available)
  • Behavioral history
  • Current session state (e.g., cart contents, recent interactions)
  • Derived behavioral metrics (e.g., spending patterns, purchase frequency)
  • Dynamic budget or constraint signals
  • Strategy definition (e.g., upsell, cross-sell, bundle, etc.)
  • Intent of the interaction

The goal of this layer is not ranking — it is context shaping.

It produces a structured representation of: "Who is this user, what is happening now, and what constraints exist?"
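As a sketch, the output of this layer could be a typed context object like the one below. The field names are illustrative assumptions, not a fixed schema:

```python
# Sketch of the structured decision context this layer might emit.
# All field names here are illustrative, not a prescribed schema.
from dataclasses import dataclass, field


@dataclass
class DecisionContext:
    user_id: str
    strategy: str                # e.g. "upsell", "cross_sell", "bundle"
    intent: str                  # e.g. "browse", "checkout"
    session_state: dict = field(default_factory=dict)       # cart contents, recent views
    behavioral_metrics: dict = field(default_factory=dict)  # spend patterns, frequency
    constraints: dict = field(default_factory=dict)         # budget caps, exclusions


ctx = DecisionContext(
    user_id="u-123",
    strategy="cross_sell",
    intent="checkout",
    session_state={"cart": ["espresso-machine"]},
    behavioral_metrics={"avg_order_value": 64.0},
    constraints={"max_price": 50.0},
)
```

Downstream layers consume this object rather than raw events, which keeps the contract between layers explicit.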

Layer 2 — Policy Generation Agent (LLM)

This is the decision brain of the system.

Instead of directly recommending items, this agent:

  • Interprets the strategy definition
  • Infers user persona and affinities
  • Produces a runtime decision policy

This policy can include:

  • Weighting between different retrieval sources, such as vector search and the co-occurrence graph (detailed in Layer 3 below)
  • Filtering constraints for the vector database.
  • Optimization goals (e.g., relevance, diversity, budget alignment)
  • Instructions for downstream ranking (for Layer 4)
  • Semantic search guidance: the agent generates the vector query string, specifying key themes or attributes to focus on

Importantly:

This layer does not fetch items. It defines how retrieval and ranking should behave.

This turns strategy into a runtime-controlled variable, not a hardcoded rule set.
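A concrete way to picture this is a policy emitted as structured JSON. The keys below (`retrieval_weights`, `filters`, and so on) are hypothetical, shown only to make the idea of a runtime-controlled policy tangible:

```python
# Hypothetical runtime policy the Policy Generation Agent could emit.
# The schema is an assumption for illustration, not a standard format.
import json

policy_json = """
{
  "retrieval_weights": {"vector": 0.3, "co_occurrence": 0.7},
  "filters": {"max_price": 50.0, "in_stock": true},
  "optimization_goals": ["relevance", "budget_alignment"],
  "ranking_instructions": "Prefer complementary accessories for items in the cart.",
  "vector_query": "coffee accessories: grinders, tampers, milk frothers"
}
"""

policy = json.loads(policy_json)

# Sanity check: source weights should form a convex combination.
assert abs(sum(policy["retrieval_weights"].values()) - 1.0) < 1e-9
```

Because the policy is plain data, a strategy change is a new policy document, not a new model.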

Layer 3 — Hybrid Retrieval Layer

This layer executes the policy.

Instead of relying on a single signal, it combines multiple candidate sources, such as:

  • Semantic retrieval (e.g., vector similarity over product/content representations)
  • Behavioral retrieval (e.g., co-occurrence or interaction graphs)

Hybrid Retrieval: Combining Semantic and Behavioral Signals

A key part of this architecture is that candidate generation does not rely on a single signal. Instead, it blends semantic understanding with behavioral relationships.

This allows the system to move beyond “items that look similar” toward “items that make sense together.”

Two Complementary Data Sources

1. Semantic Knowledge Base (Vector Retrieval)

This source captures what an item is.

Products or content are represented as embeddings derived from attributes such as:

  • textual descriptions
  • metadata
  • features
  • category context

Vector search retrieves items that are semantically related to the user’s current context or intent.

This enables discovery of conceptually relevant candidates.
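A minimal sketch of this retrieval step, assuming toy embedding vectors (in practice these would come from an embedding model over descriptions, metadata, and category context):

```python
# Minimal vector-retrieval sketch: rank items by cosine similarity to a
# query embedding. The 3-dimensional vectors are toy values for illustration.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


item_embeddings = {
    "burr-grinder": [0.9, 0.1, 0.0],
    "milk-frother": [0.7, 0.3, 0.1],
    "garden-hose":  [0.0, 0.1, 0.9],
}

# Embedding of the policy's vector query string (hypothetical values).
query = [0.8, 0.2, 0.0]

ranked = sorted(
    item_embeddings,
    key=lambda i: cosine(query, item_embeddings[i]),
    reverse=True,
)
# Coffee-related items rank above the semantically unrelated garden hose.
```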

2. Behavioral Co-Occurrence Graph

This source captures how items are used together in real life.

A co-occurrence graph is constructed from historical interaction patterns where:

  • items frequently appear together
  • user behaviors reveal implicit associations
  • combinations reflect practical or complementary usage

Instead of relying purely on textual similarity, this graph models behavioral proximity, measured with Jaccard similarity over shared interactions.

It helps surface items that may not be textually similar but are contextually compatible.
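The Jaccard computation itself is simple: two items are behaviorally close when the sets of sessions (or baskets) they appear in overlap heavily. The session data below is illustrative:

```python
# Behavioral proximity via Jaccard similarity: |A ∩ B| / |A ∪ B| over the
# sets of sessions in which each item appeared. Session ids are toy data.

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# item -> set of session ids the item appeared in
sessions = {
    "espresso-machine": {1, 2, 3, 4},
    "burr-grinder":     {2, 3, 4, 5},
    "garden-hose":      {9},
}

score = jaccard(sessions["espresso-machine"], sessions["burr-grinder"])
# 3 shared sessions out of 5 distinct sessions -> 0.6
```

Note that the espresso machine and the grinder need share no text at all; the signal comes entirely from co-occurring behavior.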

Why Both Signals Matter

When used together, the system balances:

“What fits conceptually” with “What works in practice”

Strategy-Controlled Blending

The policy layer (LLM-driven) determines how these sources should be combined at runtime.

Depending on the strategy, the system may:

  • emphasize semantic similarity
  • favor behavioral relationships
  • enforce constraints or filters
  • balance exploration vs. proven combinations

This ensures that:

Retrieval behavior adapts to strategy, not just static scoring rules.

The policy from the previous layer determines:

  • How many candidates to fetch from each source
  • What filters to apply
  • How to weight different signals

This creates a strategy-aware candidate pool rather than a static one.
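Putting the two sources together, the blending step can be sketched as a weighted score merge, where the weights come from the runtime policy rather than being hardcoded. Source names and weights here are hypothetical:

```python
# Policy-controlled blending sketch: each source returns scored candidates,
# and the policy's runtime weights decide how they combine.

def blend(sources: dict, weights: dict) -> list:
    combined = {}
    for name, candidates in sources.items():
        w = weights.get(name, 0.0)
        for item, score in candidates.items():
            combined[item] = combined.get(item, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)


sources = {
    "vector":        {"milk-frother": 0.9, "burr-grinder": 0.6},
    "co_occurrence": {"burr-grinder": 0.8, "descaler": 0.7},
}

# A cross-sell policy leaning on behavioral evidence (illustrative values).
weights = {"vector": 0.3, "co_occurrence": 0.7}

ranked = blend(sources, weights)
# burr-grinder wins because it scores in both sources.
```

Swapping the weights (say, 0.7 vector for an exploratory strategy) changes the candidate pool without touching any model or retrieval code.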

Layer 4 — Persona-Aware Ranking Agent (LLM)

Once a candidate set is available, the final ranking is performed by a second reasoning layer.

This agent:

  • Considers user persona
  • Interprets the strategy objective
  • Aligns recommendations with contextual constraints
  • Applies higher-level reasoning (e.g., balance, complementarity, budget awareness)

This is not just similarity scoring — it is reasoning-based reordering.

Here, LLMs function as:

Decision optimizers, not content generators.

Final Validation Layer

Before results are returned, a deterministic validation stage can ensure:

  • Items are available/valid
  • Compliance or rule checks pass
  • Data integrity is preserved

This keeps the system safe and reliable.
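Because this stage is deterministic, it can be ordinary guard code. A minimal sketch, assuming illustrative catalog fields (`in_stock`, `price`, `restricted`):

```python
# Deterministic validation sketch: drop candidates that fail availability
# or rule checks before results are returned. Catalog fields are assumptions.

catalog = {
    "burr-grinder": {"in_stock": True,  "price": 45.0, "restricted": False},
    "descaler":     {"in_stock": False, "price": 12.0, "restricted": False},
}


def validate(items, catalog, max_price=None):
    approved = []
    for item in items:
        meta = catalog.get(item)
        if meta is None or not meta["in_stock"] or meta["restricted"]:
            continue  # unknown, unavailable, or rule-blocked items are dropped
        if max_price is not None and meta["price"] > max_price:
            continue  # enforce the policy's budget constraint
        approved.append(item)
    return approved


safe = validate(["burr-grinder", "descaler", "unknown-sku"], catalog, max_price=50.0)
```

Keeping this layer free of LLM involvement is what makes the overall system auditable: whatever the agents propose, only items passing hard checks reach the user.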

Why This Architecture Matters

This design enables:

  • Faster experimentation — Strategies can change without retraining models.
  • Business-controlled policies — Decision logic can be adjusted via configuration and prompts rather than code.
  • Decoupled AI systems — Model behavior is separated from business strategy.
  • Safer evolution — New objectives can be tested without destabilizing core pipelines.
  • Multi-objective optimization — Relevance, diversity, budget alignment, and business goals can be balanced dynamically.

The Big Shift

The most important conceptual shift is this:

LLMs are not only generators — they can act as runtime decision controllers.

In this architecture, LLMs operate as:

  • Policy interpreters
  • Strategy translators
  • Ranking reasoners

This moves recommendation systems toward a future where:

Intelligence lies not just in models, but in adaptive decision layers.

Closing Thoughts

As AI systems become central to product experiences, the ability to change what the system optimizes for becomes as important as model accuracy.

Runtime-pluggable strategy architectures offer a path toward:

  • More agile AI systems
  • Tighter alignment with business goals
  • Safer and faster innovation

The future of recommender systems may not just be better models — it may be better decision architectures.