Intent Is Not a Point
Most retrieval systems treat a query as a snapshot. Reasoning doesn't work that way.
There’s a moment in a recording session (anyone who’s spent time in one knows it) where you realize a part that sounded right in isolation is wrong in context. Not wrong in pitch. Not wrong in rhythm. Wrong in direction. The note is fine. The momentum it creates is not.
A chord progression is a vector. It implies where things are going. Play something that ignores that direction, even something technically correct, and the music resists you. The listener doesn’t know why, but they feel it.
I’ve been working with GraphRAG-based retrieval systems for a while, and I kept hitting the same problem. The retrieval was technically correct. The wrong things kept surfacing anyway. The queries were right. The graph had the answers. But the system had no sense of where the reasoning was going.
It was playing the right notes in the wrong direction.
The photograph problem
Every retrieval system I’ve built treats a query the same way: as a photograph. A static capture of what the caller wants at a single instant. Embed it. Find the nearest neighbors. Return them. The next query is a new photograph. The system has no memory of the album.
This is fine for search. You type something into Google, you get results, you’re done. The session has no structure beyond the individual lookup.
Agentic systems are different. A crew agent running a company analysis doesn’t issue one query. It issues ten. Fifteen. Each one building on the last: narrowing, pivoting, drilling. The queries aren’t independent. They’re a line of reasoning traced through information space.
A sequence of queries from an agent session doesn't just describe what the agent wants. It describes where the agent is going.
The photograph metaphor breaks down here. What we actually have isn’t a photograph. It’s a trajectory.
In a real analysis run, the queries form a line. Each one narrows the frame. By the time the third or fourth query arrives, the system has accumulated context that shapes what any answer actually means. A question about competitive position asked after questions about funding and leadership is a different question than the same words asked cold. The words are identical. The intent is not.
A retrieval system that ignores that context is leaving signal on the table. Not because it lacks information, but because it’s treating each query as if no queries came before it.
What a trajectory actually is
An embedding is a point in a high-dimensional vector space. We use cosine similarity to measure how close two points are, how semantically related two pieces of text are to each other.
A sequence of query embeddings is a path through that space. And a path has properties that a point doesn’t: direction, velocity, coherence. The difference between consecutive points tells you something the points themselves don’t.
That difference is what I’m calling the intent trajectory. Not the current query. The weighted mean direction of recent queries, adjusted for how consistently they’ve been pointing the same way.
The second part is the real insight. A mean direction that comes from three queries all pointing roughly the same way should be trusted more than a mean direction that comes from three queries scattered across different topics. The trajectory needs to earn its influence.
I measure this with what I’m calling coherence: the mean cosine similarity between consecutive query pairs in the session window. If an agent is drilling into a specific topic, its consecutive queries point in similar directions and coherence is high. If the agent is doing broad exploratory work, coherence is low. The trajectory only steers retrieval when it has earned the right to do so.
coherence = mean( cos(q_i, q_{i+1}) ) over consecutive pairs in the window
retrieval_vector = α · current_query + β · trajectory
β = min_β + coherence · (max_β − min_β), with α = 1 − β
The trajectory influence scales with its own coherence. It earns its weight.
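When coherence is zero, as it is by definition on the first query of a session, β sits at its floor, min_β. With min_β set to zero, the retrieval vector equals the current query embedding. A scattered session pulls coherence down and β toward that same floor. The system degrades gracefully to standard behavior. No special cases. No fallback logic. Just math.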
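To make the three formulas concrete, here is a minimal sketch using toy 2-D vectors. The function names (`unit`, `coherence_of`, `blend`), the default `min_beta=0.0`, and re-normalizing the blended vector are assumptions for illustration. The sketch rescales mean cosine similarity from [-1, 1] to [0, 1], mirroring the mapping used in the implementation further down, and uses an unweighted mean for the trajectory to keep things short.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)

def coherence_of(embeddings):
    """Mean cosine similarity of consecutive pairs, rescaled to [0, 1]."""
    if len(embeddings) < 2:
        return 0.0  # a single query has no direction yet
    sims = [float(np.dot(a, b)) for a, b in zip(embeddings, embeddings[1:])]
    return (float(np.mean(sims)) + 1.0) / 2.0

def blend(current, trajectory, coherence, min_beta=0.0, max_beta=0.45):
    """Return (retrieval_vector, alpha, beta) for the blending rule."""
    beta = min_beta + coherence * (max_beta - min_beta)
    alpha = 1.0 - beta
    # Re-normalizing is an assumption, so cosine scores stay comparable.
    return unit(alpha * current + beta * trajectory), alpha, beta

# Three queries drifting slowly in one direction (hypothetical embeddings).
window = [unit([1.0, 0.0]), unit([0.9, 0.1]), unit([0.8, 0.2])]
trajectory = unit(np.mean(window, axis=0))  # unweighted mean, for simplicity
c = coherence_of(window)
vec, alpha, beta = blend(window[-1], trajectory, c)
```

With these toy vectors the session is highly coherent, so β lands close to its ceiling. Passing `coherence=0.0` instead returns the current query embedding unchanged.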
The implementation
The core class is IntentTrajectory. It maintains a rolling window of QueryMoment objects, each holding a query, its unit-normalized embedding, and a timestamp. The timestamp matters: older queries get less weight via exponential decay. A query from five minutes ago carries half the weight of the current one. This suits agentic workflows where tool calls happen in clusters.
```python
import time
from dataclasses import dataclass, field

import numpy as np


@dataclass
class QueryMoment:
    """A single point in a reasoning trajectory."""
    query: str
    embedding: np.ndarray  # unit-normalized
    timestamp: float = field(default_factory=time.time)

    def age(self) -> float:
        """Seconds since this query was recorded."""
        return time.time() - self.timestamp
```

The _compute_state method is where the trajectory is actually calculated. It applies recency weights, computes the weighted mean direction, measures coherence across consecutive pairs, and derives alpha and beta from that coherence score.
```python
def _compute_state(self) -> TrajectoryState:
    n = len(self._window)  # number of queries currently in the window

    # Recency weights: exponential decay by age
    weights = np.array([
        self._recency_weight(m.age()) for m in self._window
    ], dtype=np.float32)
    weights /= weights.sum()

    # Weighted mean embedding: the trajectory direction
    matrix = np.stack([m.embedding for m in self._window])
    trajectory = (weights[:, None] * matrix).sum(axis=0)
    trajectory = self._normalize(trajectory)

    # Coherence: mean cosine sim between consecutive pairs
    if n >= 2:
        sims = [
            np.clip(np.dot(self._window[i].embedding,
                           self._window[i + 1].embedding), -1.0, 1.0)
            for i in range(n - 1)
        ]
        # Map [-1, 1] → [0, 1]
        coherence = (float(np.mean(sims)) + 1.0) / 2.0
    else:
        coherence = 0.0  # single query: no direction yet

    beta = self.min_beta + coherence * (self.max_beta - self.min_beta)
    alpha = 1.0 - beta
    return TrajectoryState(
        trajectory_vector=trajectory,
        coherence=coherence, depth=n,
        alpha=alpha, beta=beta,
        dominant_direction=trajectory
    )
```

Notice the beta ceiling. Even at maximum coherence, trajectory influence is capped at max_beta=0.45. The current query always contributes at least 55% of the retrieval signal. The trajectory steers. It doesn’t take over.
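The _compute_state code calls two helpers that aren’t shown, _recency_weight and _normalize. Here is a minimal sketch of what they might look like, written as free functions. The 300-second half-life is inferred from the statement that a query from five minutes ago carries half the weight. The `half_life_s` parameter name and the zero-vector guard are assumptions, not the actual implementation.

```python
import numpy as np

def recency_weight(age_s: float, half_life_s: float = 300.0) -> float:
    """Exponential decay: the weight halves every `half_life_s` seconds."""
    return 0.5 ** (max(age_s, 0.0) / half_life_s)

def normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return v scaled to unit length; leave near-zero vectors untouched."""
    norm = float(np.linalg.norm(v))
    return v if norm < eps else v / norm
```

Under this assumed decay, a query recorded ten minutes ago would carry a quarter of the weight of a fresh one before the weights are renormalized to sum to one.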
Plugging it into retrieval
In the GraphRAG tool, the trajectory sits as a second pass after the CrossEncoder reranker. CrossEncoder handles query-result relevance for the current query. The trajectory handles momentum-based reordering across the session. They're doing different jobs and composing cleanly.
```python
# Record this query and get the blended retrieval vector
trajectory = self._get_trajectory(trajectory_session_id)
current_embedding = trajectory.embed_query(query)
trajectory.record(query, current_embedding)
blended_embedding, traj_state = trajectory.blend(current_embedding)

# ... run retrieval (graph + vector) ...
# ... run CrossEncoder reranking ...

# Trajectory reranking: second pass, only active once depth >= 2
if use_trajectory and traj_state.depth >= 2 and traj_state.beta > 0.0:
    graph_entities = trajectory.rerank(
        blended_embedding, graph_entities,
        score_fn=graphrag_trajectory_score_fn
    )
    graph_relationships = trajectory.rerank(
        blended_embedding, graph_relationships,
        score_fn=graphrag_trajectory_score_fn
    )
```

The session isolation is worth noting. Multiple concurrent crew runs share a single tool instance. Each passes a trajectory_session_id, a per-run identifier, and gets its own isolated trajectory. The trajectories don’t bleed into each other.
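One way to get that isolation is a registry keyed by session id, guarded by a lock so concurrent runs can’t race on creation. This is a sketch, not the tool’s actual _get_trajectory: `TrajectoryRegistry`, the `factory` argument, and `drop` are hypothetical names.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")

class TrajectoryRegistry(Generic[T]):
    """Hands out one trajectory object per session id."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory  # builds a fresh, empty trajectory
        self._by_session: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, session_id: str) -> T:
        """Return the trajectory for this session, creating it on first use."""
        with self._lock:
            if session_id not in self._by_session:
                self._by_session[session_id] = self._factory()
            return self._by_session[session_id]

    def drop(self, session_id: str) -> None:
        """Forget a finished session so its window can be garbage-collected."""
        with self._lock:
            self._by_session.pop(session_id, None)
```

Dropping a session when its run finishes keeps a long-lived tool instance from accumulating stale windows.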
The combined context that gets passed to the agent now includes a trajectory summary: session depth, coherence score, recent query progression. The agent knows not just what was retrieved, but why: what direction the retrieval was biased toward and how confident the system was in that bias.
The agent can now see its own reasoning momentum reflected back at it. That changes how it synthesizes results.
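Here is a sketch of how such a summary could be rendered into the agent’s context. The depth, coherence, and beta arguments mirror the TrajectoryState fields from _compute_state. The function name `format_trajectory_summary`, the wording of the output, and the three-query cap are assumptions.

```python
from typing import List

def format_trajectory_summary(depth: int, coherence: float, beta: float,
                              recent_queries: List[str],
                              max_queries: int = 3) -> str:
    """Render session depth, coherence, and recent queries as plain text."""
    lines = [
        "Trajectory summary:",
        f"- session depth: {depth} queries",
        f"- coherence: {coherence:.2f} (trajectory weight beta={beta:.2f})",
        "- recent progression: " + " -> ".join(recent_queries[-max_queries:]),
    ]
    return "\n".join(lines)
```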
What this is not
It’s worth being clear about the boundaries.
This is not memory. Memory systems store facts from prior sessions and inject them into context. The trajectory stores nothing durable. Clear the window and it resets completely. It’s session-scoped reasoning momentum, not persistent knowledge.
This is not chain-of-thought. CoT is about generation: getting a model to reason step-by-step toward an answer. The trajectory is about retrieval, shaping what information surfaces to support the reasoning that’s already happening.
This is not personalization. Personalization adapts to user preferences across sessions. The trajectory adapts to reasoning direction within a single workflow run. A different run of the same agent, asking the same queries in the same order and at the same pace, would produce the same trajectory. It’s not about the user. It’s about the shape of the reasoning.
The closest thing in the literature is probably contextual bandits applied to retrieval, but those optimize for click-through or satisfaction signals, not reasoning coherence. There’s also adjacent work in conversational dense retrieval (ColBERT-based), but nothing that explicitly models trajectory as a geometric object with self-regulating influence.
The real claim
The dominant framing in intent engineering right now is about the query. Write better prompts. Add more context. Parse the query more carefully. Extract entities. Understand what the user is asking.
That framing treats intent as something that lives in the current message. But in agentic systems, intent accumulates. It has a past. It has a direction. The query is a derivative of something larger: a reasoning process unfolding over time.
Retrieval systems that ignore this are measuring position when they should be measuring velocity. They’re taking photographs when they should be reading vectors.
A system that knows where the reasoning has been can make much better guesses about what will actually be useful. Not because it’s smarter. Because it has a frame of reference that a single-query system doesn’t.
The music analogy holds. The note that’s right depends on where the progression is going. Change the direction, and the same note becomes wrong. Context isn’t just what came before. It’s the momentum those prior moments created.
Intent is not a point. It’s a vector field. And retrieval systems that don’t model it that way are leaving real signal on the floor.
I did have AI help drafting this article and worked with AI to help me refine the statistics/math portion of this concept.
