Why Your AI Search Might Be Missing Obvious Answers (And How RRF Fixes It)
When your AI says "no relevant information found" while the answer is literally in your messages, the problem isn't the AI — it's how hybrid search combines different ranking methods.
We recently encountered a puzzling bug that perfectly illustrates why hybrid search systems can fail in surprisingly obvious ways. A user asked their AI assistant a simple question, and despite having the exact answer in their chat history, the AI confidently claimed no such information existed.
User: "What date was I supposed to put in the agreement?"
AI: "I checked your messages, and there's no mention anywhere of what date you were supposed to put in the agreement — none of the retrieved messages reference a signing date."
The root cause: Score scale mismatch
Our system uses hybrid search — combining two different search methods to get the best of both worlds:
- Vector Search (Semantic): Understands meaning and context. Great for "find messages about contract terms" even if the word "contract" isn't used.
- BM25 (Keyword): Classic text matching. Perfect for finding exact phrases like "1st February" or specific names.
The problem? These methods use completely different scoring scales:
Score Scale Mismatch
| Search Type | Message Found | Raw Score | Final Rank |
|---|---|---|---|
| Vector | "As per the agreement... Exclusivity in Nigeria" | 0.65 | #1 (wrong answer) |
| Vector | "Ok. Will counter sign and send back" | 0.58 | #2 (wrong answer) |
| BM25 | "1st February will be great" | 0.016 | #47 (correct, but buried!) |
The problem: BM25 scores (0.016) are 40x smaller than Vector scores (0.65). When sorted by raw score, keyword matches always lose — even when they're the correct answer!
When the user asked "What date should I put in the agreement?", BM25 correctly found the "1st February" message at rank #1 (it matched the exact keywords). But Vector search ranked it #47 because semantically, talking about a date doesn't strongly relate to general "agreement" discussions.
Since we were sorting by raw score, Vector's 0.65 beat BM25's 0.016 every time. The correct answer got buried at position #47, and only the top 10 results were sent to the AI.
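The failure mode is easy to reproduce. Here is a toy illustration with hypothetical scores mirroring the table above, showing how a raw-score sort always buries the BM25 match:

```javascript
// Merged results from both methods (scores are illustrative, not real output).
const merged = [
  { text: "As per the agreement... Exclusivity in Nigeria", source: "vector", score: 0.65 },
  { text: "Ok. Will counter sign and send back", source: "vector", score: 0.58 },
  { text: "1st February will be great", source: "bm25", score: 0.016 },
];

// Sorting by raw score compares values on two different scales,
// so BM25 matches always sink to the bottom.
merged.sort((a, b) => b.score - a.score);

console.log(merged.map((r) => r.text)); // the correct answer lands last
```

With only the top N results forwarded to the model, the correct answer never even reaches it.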
The fix: Reciprocal Rank Fusion (RRF)
The solution is elegant: instead of comparing raw scores, compare rankings. This is called Reciprocal Rank Fusion (RRF), and the formula is beautifully simple:

RRF_score(d) = Σ 1/(k + rank_i(d))

where rank_i(d) is document d's position in the i-th result list, and k is a constant (typically 60).
Why this works: Being ranked #1 in either search method contributes equally to the final score. The constant 60 (called "k") dampens the effect of lower rankings so that #1 is significantly better than #2, but #47 isn't much different from #48.
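The dampening is easy to check numerically. A quick sketch (assuming k = 60, as above) compares the gap between adjacent ranks near the top against the gap deep in the list:

```javascript
// Each rank contributes 1/(k + rank) to the RRF score.
const k = 60;
const contribution = (rank) => 1 / (k + rank);

const topGap = contribution(1) - contribution(2);    // 1/61 - 1/62
const deepGap = contribution(47) - contribution(48); // 1/107 - 1/108

// One step near the top of the list is worth roughly three steps deep down.
console.log(topGap / deepGap);
```

So moving from #2 to #1 changes the score about three times as much as moving from #48 to #47, which is exactly the behavior we want.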
Before vs After RRF
How RRF calculates the final score
Let's trace through the math for our "1st February" message:
| Search Method | Rank | RRF Contribution |
|---|---|---|
| BM25 (keyword) | #1 | 1/(60+1) = 0.0164 |
| Vector (semantic) | #47 | 1/(60+47) = 0.0093 |
| Total RRF Score | | 0.0257 |
Compare that to the "Exclusivity in Nigeria" message that was previously #1:
| Search Method | Rank | RRF Contribution |
|---|---|---|
| Vector (semantic) | #1 | 1/(60+1) = 0.0164 |
| BM25 (keyword) | Not found | 1/(60+maxRank) ≈ 0 |
| Total RRF Score | | ≈ 0.0164 |
The message found by both methods (0.0257) now outranks the message that was #1 in one search method but didn't appear in the other (0.0164). Fair is fair.
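The two totals can be recomputed in a couple of lines. This sketch models absence from a result list with a hypothetical maxRank of 1000; with any large maxRank the missing-method term is near zero, so the total approaches the table's 0.0164:

```javascript
// Sum 1/(k + rank) across result lists, with k = 60.
const k = 60;
const rrf = (...ranks) => ranks.reduce((sum, r) => sum + 1 / (k + r), 0);

// "1st February": #1 in BM25, #47 in Vector
const february = rrf(1, 47); // 1/61 + 1/107 ≈ 0.0257

// "Exclusivity in Nigeria": #1 in Vector, absent from BM25
// (absence approximated here by maxRank = 1000)
const exclusivity = rrf(1, 1000); // 1/61 + 1/1060 ≈ 0.0173

console.log(february > exclusivity); // the doubly-found message wins
```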
The key insight
"RRF doesn't care about the raw scores. It only cares about the order. Being #1 in keyword search carries the same weight as being #1 in semantic search. Results found by both methods rank highest."
What happens when a result only appears in one search?
For the search method where a result doesn't appear, we assign it a "max rank" (the total number of results + 1). This means:
- Results in both searches: Get contributions from both, rank highest
- Results in one search: Get contribution from one, plus minimal contribution from the other
- Same-method ties: Decided by their ranking in the other method
This elegantly handles the case where semantic search finds conceptually related messages that keyword search misses (and vice versa).
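Putting the pieces together, a fusion step along these lines could look as follows. This is a sketch, not our production code; helper names like `toRankMap` and `fuse` are hypothetical:

```javascript
// Build a Map from document id to its 1-based rank in one result list.
function toRankMap(results) {
  const ranks = new Map();
  results.forEach((r, i) => ranks.set(r.id, i + 1)); // best result gets rank 1
  return ranks;
}

// Fuse two ranked lists with RRF; ids missing from a list get maxRank.
function fuse(vectorResults, bm25Results, k = 60) {
  const vectorRanks = toRankMap(vectorResults);
  const bm25Ranks = toRankMap(bm25Results);
  const maxRank = Math.max(vectorResults.length, bm25Results.length) + 1;

  const ids = new Set([...vectorRanks.keys(), ...bm25Ranks.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      rrfScore:
        1 / (k + (vectorRanks.get(id) ?? maxRank)) +
        1 / (k + (bm25Ranks.get(id) ?? maxRank)),
    }))
    .sort((a, b) => b.rrfScore - a.rrfScore);
}
```

For example, `fuse([{id:"a"},{id:"b"}], [{id:"b"},{id:"c"}])` ranks "b" first, because it appears in both lists while "a" and "c" each appear in only one.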
Implementation details
The code change was straightforward. Instead of:
```javascript
// OLD: Sort by raw score (broken)
results.sort((a, b) => b.score - a.score);
```
We now:
```javascript
// NEW: Calculate RRF score from rankings
const RRF_K = 60;
const vectorRank = vectorRanks.get(id) ?? maxRank;
const bm25Rank = bm25Ranks.get(id) ?? maxRank;
const rrfScore = 1 / (RRF_K + vectorRank) + 1 / (RRF_K + bm25Rank);

// Sort by RRF score
results.sort((a, b) => b.rrfScore - a.rrfScore);
```
Why this matters for AI search
Modern search systems often combine multiple retrieval methods:
- Vector embeddings for semantic understanding
- BM25/TF-IDF for exact keyword matches
- Graph-based retrieval for relationship-aware search
- Temporal signals for recency
Each method has different score distributions. RRF provides a principled way to combine them without needing to normalize scores or tune complex weighting schemes.
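Nothing in the formula limits it to two lists: each retriever simply adds its own 1/(k + rank) term. A minimal sketch of N-way fusion (function name `rrfFuse` is hypothetical):

```javascript
// Fuse any number of ranked id lists with RRF.
function rrfFuse(rankedLists, k = 60) {
  // rankedLists: array of arrays of document ids, each sorted best-first.
  const maxRank = Math.max(...rankedLists.map((l) => l.length)) + 1;
  const scores = new Map();

  for (const id of new Set(rankedLists.flat())) {
    let score = 0;
    for (const list of rankedLists) {
      const idx = list.indexOf(id); // -1 if this retriever missed the doc
      const rank = idx === -1 ? maxRank : idx + 1;
      score += 1 / (k + rank);
    }
    scores.set(id, score);
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

A document returned by vector, keyword, and graph retrieval will outscore one found by a single method, with no per-method weight tuning.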
"The beauty of RRF is its simplicity. No hyperparameter tuning beyond the k value (60 works well in practice). No score normalization. Just ranks and a formula."
The result
After implementing RRF, the same query "What date should I put in the agreement?" now correctly retrieves the "1st February" message at the top of results. The AI finds the answer immediately.
Fixed: Messages with exact keyword matches now rank appropriately alongside semantically similar results. The correct answer bubbles to the top.
Key takeaways
- Don't compare raw scores across different search methods — they live on different scales
- RRF converts scores to ranks, making apples-to-apples comparison possible
- The k=60 constant is well-established in research and works across most use cases
- Results in multiple search methods naturally rank higher — which is usually correct
Sometimes the fix for a confusing AI behavior isn't in the AI itself — it's in how you feed it information.
Want smarter AI search for your conversations?
Querygen uses advanced hybrid search with RRF to ensure you never miss important information in your WhatsApp messages.
Try Querygen Free