
Stop Retraining Your Recommendation System

January 26, 2026 · 6 min read
Also on Medium

In many recommendation platforms, the business strategy (e.g., upsell, cross-sell, bundling, margin optimization) is tightly coupled with model behavior. Changing the strategy often means re-training models, re-tuning pipelines, or redeploying logic-heavy services. This slows experimentation and makes AI systems expensive to evolve.

This article explores a different approach:

A policy-driven architecture where recommendation strategies are injected at runtime, and LLMs act as decision-layer controllers rather than just generators.

The Core Problem

Traditional recommender systems often entangle:

  • Model behavior
  • Business strategy
  • Ranking logic

For example, a system trained for cross-sell might not adapt well to an upsell objective without retraining or heavy rule modifications.

This creates three issues:

  1. Strategy changes are slow
  2. Business teams cannot experiment easily
  3. AI systems become rigid instead of adaptive

What we need instead is:

A system where strategy is a runtime concept, not something baked into model weights.

Key Idea: Separate Policy from Prediction

Think of the system in two planes:

| Plane         | Responsibility                       |
|---------------|--------------------------------------|
| Data Plane    | Candidate generation & scoring       |
| Control Plane | Decision policy & strategy reasoning |

LLMs operate in the control plane, defining how the system should behave, while retrieval systems operate in the data plane, executing those instructions.

This allows recommendation systems to shift from:

Model-driven → Policy-driven

High-Level Architecture
Layer 1 — Context Construction Layer

This layer transforms raw inputs into structured decision context.

It aggregates signals such as:

  • User profile data (if available)
  • Behavioral history
  • Current session state (e.g., cart contents, recent interactions)
  • Derived behavioral metrics (e.g., spending patterns, purchase frequency)
  • Dynamic budget or constraint signals
  • Strategy definition (e.g., upsell, cross-sell, bundle, etc.)
  • Intent of the interaction

The goal of this layer is not ranking — it is context shaping.

It produces a structured representation of: "Who is this user, what is happening now, and what constraints exist?"
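As a sketch, the output of this layer could be a typed context object like the one below. The field names are illustrative assumptions, not a fixed schema:

```python
# Sketch of the structured decision context this layer might emit.
# All field names here are illustrative, not a prescribed schema.
from dataclasses import dataclass, field


@dataclass
class DecisionContext:
    user_id: str
    strategy: str                # e.g. "upsell", "cross_sell", "bundle"
    intent: str                  # e.g. "browse", "checkout"
    session_state: dict = field(default_factory=dict)       # cart contents, recent views
    behavioral_metrics: dict = field(default_factory=dict)  # spend patterns, frequency
    constraints: dict = field(default_factory=dict)         # budget caps, exclusions


ctx = DecisionContext(
    user_id="u-123",
    strategy="cross_sell",
    intent="checkout",
    session_state={"cart": ["espresso-machine"]},
    behavioral_metrics={"avg_order_value": 64.0},
    constraints={"max_price": 50.0},
)
```

Downstream layers consume this object rather than raw events, which keeps the contract between layers explicit.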

Layer 2 — Policy Generation Agent (LLM)

This is the decision brain of the system.

Instead of directly recommending items, this agent:

  • Interprets the strategy definition
  • Infers user persona and affinities
  • Produces a runtime decision policy

This policy can include:

  • Weighting between different retrieval sources, such as vector search and the co-occurrence graph (detailed in Layer 3 below)
  • Filtering constraints for the vector database.
  • Optimization goals (e.g., relevance, diversity, budget alignment)
  • Instructions for downstream ranking (for Layer 4)
  • Semantic search guidance: the agent generates the vector query string, specifying key themes or attributes to focus on

Importantly:

This layer does not fetch items. It defines how retrieval and ranking should behave.

This turns strategy into a runtime-controlled variable, not a hardcoded rule set.
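A concrete way to picture this is a policy emitted as structured JSON. The keys below (`retrieval_weights`, `filters`, and so on) are hypothetical, shown only to make the idea of a runtime-controlled policy tangible:

```python
# Hypothetical runtime policy the Policy Generation Agent could emit.
# The schema is an assumption for illustration, not a standard format.
import json

policy_json = """
{
  "retrieval_weights": {"vector": 0.3, "co_occurrence": 0.7},
  "filters": {"max_price": 50.0, "in_stock": true},
  "optimization_goals": ["relevance", "budget_alignment"],
  "ranking_instructions": "Prefer complementary accessories for items in the cart.",
  "vector_query": "coffee accessories: grinders, tampers, milk frothers"
}
"""

policy = json.loads(policy_json)

# Sanity check: source weights should form a convex combination.
assert abs(sum(policy["retrieval_weights"].values()) - 1.0) < 1e-9
```

Because the policy is plain data, a strategy change is a new policy document, not a new model.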

Layer 3 — Hybrid Retrieval Layer

This layer executes the policy.

Instead of relying on a single signal, it combines multiple candidate sources, such as:

  • Semantic retrieval (e.g., vector similarity over product/content representations)
  • Behavioral retrieval (e.g., co-occurrence or interaction graphs)

Hybrid Retrieval: Combining Semantic and Behavioral Signals

A key part of this architecture is that candidate generation does not rely on a single signal. Instead, it blends semantic understanding with behavioral relationships.

This allows the system to move beyond “items that look similar” toward “items that make sense together.”

Two Complementary Data Sources

1. Semantic Knowledge Base (Vector Retrieval)

This source captures what an item is.

Products or content are represented as embeddings derived from attributes such as:

  • textual descriptions
  • metadata
  • features
  • category context

Vector search retrieves items that are semantically related to the user’s current context or intent.

This enables discovery of conceptually relevant candidates.
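A minimal sketch of this retrieval step, assuming toy embedding vectors (in practice these would come from an embedding model over descriptions, metadata, and category context):

```python
# Minimal vector-retrieval sketch: rank items by cosine similarity to a
# query embedding. The 3-dimensional vectors are toy values for illustration.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


item_embeddings = {
    "burr-grinder": [0.9, 0.1, 0.0],
    "milk-frother": [0.7, 0.3, 0.1],
    "garden-hose":  [0.0, 0.1, 0.9],
}

# Embedding of the policy's vector query string (hypothetical values).
query = [0.8, 0.2, 0.0]

ranked = sorted(
    item_embeddings,
    key=lambda i: cosine(query, item_embeddings[i]),
    reverse=True,
)
# Coffee-related items rank above the semantically unrelated garden hose.
```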

2. Behavioral Co-Occurrence Graph

This source captures how items are used together in real life.

A co-occurrence graph is constructed from historical interaction patterns where:

  • items frequently appear together
  • user behaviors reveal implicit associations
  • combinations reflect practical or complementary usage

Instead of relying purely on textual similarity, this graph models behavioral proximity, measured with Jaccard similarity over shared interactions.

It helps surface items that may not be textually similar but are contextually compatible.
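The Jaccard computation itself is simple: two items are behaviorally close when the sets of sessions (or baskets) they appear in overlap heavily. The session data below is illustrative:

```python
# Behavioral proximity via Jaccard similarity: |A ∩ B| / |A ∪ B| over the
# sets of sessions in which each item appeared. Session ids are toy data.

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# item -> set of session ids the item appeared in
sessions = {
    "espresso-machine": {1, 2, 3, 4},
    "burr-grinder":     {2, 3, 4, 5},
    "garden-hose":      {9},
}

score = jaccard(sessions["espresso-machine"], sessions["burr-grinder"])
# 3 shared sessions out of 5 distinct sessions -> 0.6
```

Note that the espresso machine and the grinder need share no text at all; the signal comes entirely from co-occurring behavior.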

Why Both Signals Matter

When used together, the system balances:

“What fits conceptually” with “What works in practice”

Strategy-Controlled Blending

The policy layer (LLM-driven) determines how these sources should be combined at runtime.

Depending on the strategy, the system may:

  • emphasize semantic similarity
  • favor behavioral relationships
  • enforce constraints or filters
  • balance exploration vs. proven combinations

This ensures that:

Retrieval behavior adapts to strategy, not just static scoring rules.

The policy from the previous layer determines:

  • How many candidates to fetch from each source
  • What filters to apply
  • How to weight different signals

This creates a strategy-aware candidate pool rather than a static one.
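Putting the two sources together, the blending step can be sketched as a weighted score merge, where the weights come from the runtime policy rather than being hardcoded. Source names and weights here are hypothetical:

```python
# Policy-controlled blending sketch: each source returns scored candidates,
# and the policy's runtime weights decide how they combine.

def blend(sources: dict, weights: dict) -> list:
    combined = {}
    for name, candidates in sources.items():
        w = weights.get(name, 0.0)
        for item, score in candidates.items():
            combined[item] = combined.get(item, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)


sources = {
    "vector":        {"milk-frother": 0.9, "burr-grinder": 0.6},
    "co_occurrence": {"burr-grinder": 0.8, "descaler": 0.7},
}

# A cross-sell policy leaning on behavioral evidence (illustrative values).
weights = {"vector": 0.3, "co_occurrence": 0.7}

ranked = blend(sources, weights)
# burr-grinder wins because it scores in both sources.
```

Swapping the weights (say, 0.7 vector for an exploratory strategy) changes the candidate pool without touching any model or retrieval code.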

Layer 4 — Persona-Aware Ranking Agent (LLM)

Once a candidate set is available, the final ranking is performed by a second reasoning layer.

This agent:

  • Considers user persona
  • Interprets the strategy objective
  • Aligns recommendations with contextual constraints
  • Applies higher-level reasoning (e.g., balance, complementarity, budget awareness)

This is not just similarity scoring — it is reasoning-based reordering.

Here, LLMs function as:

Decision optimizers, not content generators.

Final Validation Layer

Before results are returned, a deterministic validation stage can ensure:

  • Items are available/valid
  • Compliance or rule checks pass
  • Data integrity is preserved

This keeps the system safe and reliable.
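Because this stage is deterministic, it can be ordinary guard code. A minimal sketch, assuming illustrative catalog fields (`in_stock`, `price`, `restricted`):

```python
# Deterministic validation sketch: drop candidates that fail availability
# or rule checks before results are returned. Catalog fields are assumptions.

catalog = {
    "burr-grinder": {"in_stock": True,  "price": 45.0, "restricted": False},
    "descaler":     {"in_stock": False, "price": 12.0, "restricted": False},
}


def validate(items, catalog, max_price=None):
    approved = []
    for item in items:
        meta = catalog.get(item)
        if meta is None or not meta["in_stock"] or meta["restricted"]:
            continue  # unknown, unavailable, or rule-blocked items are dropped
        if max_price is not None and meta["price"] > max_price:
            continue  # enforce the policy's budget constraint
        approved.append(item)
    return approved


safe = validate(["burr-grinder", "descaler", "unknown-sku"], catalog, max_price=50.0)
```

Keeping this layer free of LLM involvement is what makes the overall system auditable: whatever the agents propose, only items passing hard checks reach the user.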

Why This Architecture Matters

This design enables:

  • Faster experimentation — Strategies can change without retraining models.
  • Business-controlled policies — Decision logic can be adjusted via configuration and prompts rather than code.
  • Decoupled AI systems — Model behavior is separated from business strategy.
  • Safer evolution — New objectives can be tested without destabilizing core pipelines.
  • Multi-objective optimization — Relevance, diversity, budget alignment, and business goals can be balanced dynamically.

The Big Shift

The most important conceptual shift is this:

LLMs are not only generators — they can act as runtime decision controllers.

In this architecture, LLMs operate as:

  • Policy interpreters
  • Strategy translators
  • Ranking reasoners

This moves recommendation systems toward a future where:

Intelligence lies not just in models, but in adaptive decision layers.

Closing Thoughts

As AI systems become central to product experiences, the ability to change what the system optimizes for becomes as important as model accuracy.

Runtime-pluggable strategy architectures offer a path toward:

  • More agile AI systems
  • Tighter alignment with business goals
  • Safer and faster innovation

The future of recommender systems may not just be better models — it may be better decision architectures.