Founder Story: The Tenant's Voice

0-to-1: Building & Scaling a RAG System for Social Good

The Context & Problem

Like many renters, I’d had negative experiences with landlords, but lacked the confidence to challenge unfair situations. That changed when a landlord tried to deduct £2,625 from my girlfriend's tenancy deposit. This time, we decided to contest it.

I turned to Google's NotebookLM, uploading official UK tenancy law documents to guide our response. Armed with a clear understanding of "fair wear and tear" and "betterment," we drafted a formal rebuttal. The result was a success: the Tenancy Deposit Scheme (TDS) returned £2,350 of the £2,625 claimed. This victory proved that with the right information, tenants can effectively defend their rights.

The Flaw in General-Purpose AI

I wanted to share what I'd learned, but quickly hit two roadblocks. First, people were hesitant to trust a general-purpose tool like ChatGPT for specific legal matters. Second, I found that even NotebookLM could pull information from incorrect sources. For something this important, **accuracy and trust are non-negotiable**.

This is why I built The Tenant's Voice: a dedicated, reliable platform made for tenants.

£2,350 Successfully Recovered

The personal victory that sparked the idea and validated the user need.

The Approach & Solution

The goal was never just to provide answers, but to differentiate from ChatGPT by making it easy for users to *take action*. The product had to help tenants "think less and do more."

Guiding Principles

  • **Trust & Accuracy:** Only use verified, official sources.
  • **Encourage Action:** Make the next steps clear and simple.
  • **Mobile First:** Design for users in real-world situations.

The Hands-On Build: A Production-Grade RAG

I made a conscious, senior-level product trade-off: sacrificing some response speed for guaranteed accuracy. I built a production-grade RAG (Retrieval-Augmented Generation) system from the ground up on a modern, scalable stack (Supabase Edge Functions, Postgres with pgvector, and Google's Gemini models). The core of the system is a vector database built exclusively from legitimate sources (gov.uk, Shelter, Citizens Advice, TDS).

To ensure accuracy, content was chunked using a recursive character text splitter, and we ingested the "last modified" dates for each document. This critical step prevents the model from sharing outdated advice—for example, citing a law from 1985 that was superseded in 2015—ensuring every piece of guidance is grounded in the most current, reliable facts.
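
As a concrete sketch of that offline ingestion step, assuming LangChain's RecursiveCharacterTextSplitter and a documents table with content, source_url, last_modified, keywords, and embedding columns (the production script and schema may differ):

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);
const embedder = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "text-embedding-004" });

// Chunk one source document and store every chunk with its provenance metadata,
// including the "last modified" date that guards against citing superseded law.
async function ingestDocument(text: string, sourceUrl: string, lastModified: string, keywords: string[]) {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 150 });
  const chunks = await splitter.splitText(text);

  for (const chunk of chunks) {
    const { embedding } = await embedder.embedContent(chunk);
    const { error } = await supabase.from("documents").insert({
      content: chunk,
      source_url: sourceUrl,       // e.g. a gov.uk, Shelter, Citizens Advice, or TDS page
      last_modified: lastModified, // used to prefer current guidance over superseded law
      keywords,                    // AI-generated keywords used later for pre-filtering
      embedding: embedding.values, // pgvector column
    });
    if (error) throw error;
  }
}
```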

Runtime RAG Pipeline

How user queries are processed in real-time:

User Query + Chat History → Query Transform (AI Keywords) → Embedding (text-embedding-004) → Vector Search (pgvector, Top 5) → LLM Generation (gemini-2.5-flash) → Structured JSON ({text, actions})

Data sources (gov.uk, Shelter, Citizens Advice) pre-processed offline with AI-generated keywords and embeddings.
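
A condensed, Deno-flavoured sketch of how an Edge Function could implement this flow; client setup, prompt wording, and the response shape are illustrative rather than the production index.ts:

```typescript
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(Deno.env.get("SUPABASE_URL")!, Deno.env.get("SUPABASE_ANON_KEY")!);
const genAI = new GoogleGenerativeAI(Deno.env.get("GEMINI_API_KEY")!);

async function answerQuery(userQuery: string, chatHistory: string) {
  // 1. Embed the query plus chat history with text-embedding-004.
  const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });
  const { embedding } = await embedder.embedContent(`${chatHistory}\n${userQuery}`);

  // 2. Retrieve the five most similar chunks from pgvector.
  const { data: documents, error } = await supabase.rpc("match_documents", {
    query_embedding: embedding.values,
    p_keywords: [], // keyword pre-filtering is covered in the deep dive below
    match_count: 5,
  });
  if (error) throw error;

  // 3. Generate a structured JSON answer grounded in the retrieved chunks.
  const chatModel = genAI.getGenerativeModel({
    model: "gemini-2.5-flash",
    generationConfig: { responseMimeType: "application/json" },
  });
  const context = (documents ?? []).map((d: { content: string }) => d.content).join("\n---\n");
  const result = await chatModel.generateContent(
    `Answer using only this context:\n${context}\n\n` +
      `Conversation so far:\n${chatHistory}\n\nQuestion: ${userQuery}\n\n` +
      `Respond as JSON: {"text": "...", "actions": ["..."]}`,
  );

  // 4. The client renders the answer text plus the suggested next actions.
  return JSON.parse(result.response.text()) as { text: string; actions: string[] };
}
```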

See It In Action

Real examples of how users interact with The Tenant's Voice and the actionable guidance they receive.

Technical Deep Dive & Optimizations

As the product manager, I tracked, diagnosed, and solved the issues that took the product from unstable to stable and user-centric, applying design thinking at every step. This log breaks down how I identified each problem, analyzed its user impact, and implemented the solution.

Part 1: RAG Optimization & Database Indexing (Performance)

The Problem: A Very Slow Query

The application felt slow. Profiling revealed that a single database call—the match_documents function—was responsible for 68.2% of all query time. This was the critical bottleneck:

  • Mean Time: 911 ms (nearly a full second on average)
  • Max Time: 2,620 ms (a very noticeable 2.6 seconds)

The end-to-end experience (6–8 seconds) involved more than just the database, but fixing this expensive call was clearly the critical first step.

What Was Wrong with the Initial Setup

Two primary issues in the initial database design caused the bottleneck:

  • Index Mismatch: The vector index was built with vector_l2_ops, which optimizes for L2 (Euclidean) distance. However, for semantic search with modern embedding models like Google's text-embedding-004, Cosine Similarity is the correct metric—it measures the angle between vectors, not the straight-line distance. This mismatch prevented the index from being used effectively, forcing a slow brute-force scan (illustrated after this list).
  • No Pre-filtering: The match_documents function performed a vector search across all 6,000 documents every single time. A keywords column existed but wasn't being used to narrow the search space before the expensive vector comparison.
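
To make the metric difference concrete, here is a small standalone illustration (not from the codebase): cosine similarity compares direction while L2 compares straight-line distance, and pgvector only uses the index when its operator class matches the operator the query runs.

```typescript
// Illustrative only: cosine similarity measures the angle between vectors,
// while L2 measures how far apart they sit in space.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Same direction, different magnitude: cosine says "identical", L2 says "far apart".
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // 1.0
console.log(l2Distance([1, 2, 3], [2, 4, 6]));       // ≈ 3.74
```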

The Fix: A Two-Stage Filtering Strategy

We implemented a series of coordinated changes to drastically reduce the search space and let indexes do their job:

  1. Corrected the Vector Index: Dropped the old index and created a new one using vector_cosine_ops. We tuned the lists parameter to 80 (based on √6000), which is a best practice for IVFFlat index optimization.
  2. Added a Keyword Index: Created a GIN index on the keywords column, which is incredibly fast for searching within array columns.
  3. Upgraded the SQL Function: Rewrote match_documents to accept a p_keywords array parameter. The new logic first uses the GIN index to filter documents (WHERE keywords @> p_keywords), then performs the vector search only on that much smaller, pre-filtered set.
  4. Updated the Edge Function: Modified index.ts to extract keywords from the user's query and pass them to the parameterized SQL function, enabling pre-filtering on every request (see the sketch after this list).
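
A hedged sketch of the Edge Function side of that change, calling the parameterized RPC so the keyword filter runs before the vector comparison (parameter names mirror the description above but are illustrative):

```typescript
import type { SupabaseClient } from "@supabase/supabase-js";

// Two-stage retrieval: the GIN-indexed keyword filter shrinks the candidate
// set first, then the cosine vector search runs on what is left.
async function retrieveDocuments(
  supabase: SupabaseClient,
  queryEmbedding: number[],
  keywords: string[],
) {
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: queryEmbedding, // compared via the vector_cosine_ops IVFFlat index
    p_keywords: keywords,            // applied as WHERE keywords @> p_keywords (GIN index)
    match_count: 5,
  });
  if (error) throw error;
  return data;
}
```

The matching SQL-side change lives inside match_documents itself, which applies the keyword filter before running the cosine search on the reduced set.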

The Results: Massive Performance Gains

Query logs after the changes show dramatic improvement:

  • Mean Time: Dropped ~24×, from 911 ms down to ~37 ms
  • Max Time: Dropped ~66×, from 2,620 ms down to ~40 ms
  • Consistency: The huge variability between minimum and maximum query times disappeared. The function now performs consistently fast, removing the database as the bottleneck.

Key Learnings for the Future

This work reinforced several critical best practices for building high-performance RAG systems:

  • Filter First, Search Second: Always reduce rows with relational filters (e.g., WHERE user_id = '...' or WHERE keywords @> '...') before expensive vector operations. This is the single most important optimization you can make.
  • Use the Right Index for the Job: A vector index is not one-size-fits-all. Ensure your index type (IVFFlat, HNSW) and distance metric (cosine_ops, l2_ops) match your embeddings and query patterns. For semantic search, cosine similarity is almost always correct.
  • Cache is Your Friend (and a Clue): The 100% cache hit rate told us the slowness was compute-bound, not disk-bound—use profiling signals like this when diagnosing.
  • Isolate the Bottleneck: Measure each layer. After fixing the database, timing logs on the Edge function revealed the next bottleneck: sequential AI API calls, which is where the remaining 6–8 seconds of latency lies.

Part 2: Solving the 36k-Byte Crash (Stability)

The Problem: Crashing on Long Conversations

I identified that the core function was crashing with a 400 Bad Request error. I realized that my most engaged users—those with long, detailed conversations—were the most likely to experience a total app failure, breaking trust and halting their journey.

What Was Wrong

I dove into the logs and saw that the text-embedding-004 model has a small 36,000-byte limit. My code was sending the entire chat history for vectorization, which was inefficient and, for long chats, fatal.

The Fix: A Dual-History Approach

I identified two different needs: RAG only needs recent context to find relevant documents, while the AI needs full context to understand the user's journey. I implemented a solution by creating two history variables (sketched after this list):

  1. recentHistoryText: A small, truncated history sent to the embedding model for efficient document retrieval.
  2. fullHistoryText: The complete history sent to the final chat model (gemini-2.5-flash) to maintain conversational context.
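
A condensed sketch of the dual-history split; the byte cap and message shape are illustrative rather than the production values:

```typescript
interface ChatMessage {
  role: "user" | "model";
  text: string;
}

// Keep the full transcript for gemini-2.5-flash, but only a recent, size-capped
// slice for the 36,000-byte embedding call to text-embedding-004.
function buildHistories(messages: ChatMessage[], maxEmbeddingBytes = 30_000) {
  const fullHistoryText = messages.map((m) => `${m.role}: ${m.text}`).join("\n");

  let recentHistoryText = "";
  // Walk backwards so the most recent turns are kept.
  for (let i = messages.length - 1; i >= 0; i--) {
    const line = `${messages[i].role}: ${messages[i].text}\n`;
    if (new TextEncoder().encode(recentHistoryText + line).length > maxEmbeddingBytes) break;
    recentHistoryText = line + recentHistoryText;
  }

  return { recentHistoryText, fullHistoryText };
}
```

Walking backwards keeps the most recent turns, which is all the retrieval step needs to find relevant documents.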

Part 3: Chatbot Optimization & Stability (Performance & Reliability)

The Problem: Two Critical Bottlenecks

The chatbot application faced two severe issues that directly impacted user trust and experience:

  • Stability Failures: The server function frequently crashed with a 500 error when the AI's response didn't match the expected JSON format. This occurred when the AI produced conversational plain text instead—often a symptom of poor RAG results.
  • Unacceptable Latency: End-to-end response times exceeded 40 seconds per query, making the tool feel broken and unreliable for real-time user interactions.

What Was Wrong with the Initial Architecture

Root cause analysis revealed two distinct issues:

  • No Error Handling: The JSON.parse() call had no try-catch wrapper. When the AI returned plain text (e.g., "Please pro..."), the parser crashed immediately, with no graceful fallback.
  • Redundant AI Pre-processing Step: The function spent 27.7 seconds on an initial AI call to generate a searchQuery and p_keywords for hybrid search. However, profiling showed the actual database query (with keywords and vectors) took only 250–1,100 ms. The expensive AI step was the true bottleneck, and it was unclear if it actually improved search quality.

The Fix: Error Handling + Vector-Only Search

We implemented a two-part solution:

  1. Wrapped JSON.parse() in Try-Catch: If parsing fails, the function logs the error and returns a valid JSON object with a user-friendly fallback message: "I'm sorry, I had trouble processing that request."
  2. Removed Redundant AI Pre-processing: We eliminated the 27.7-second Query & Keyword Generation step. The new logic (sketched after this list):
    • Combines the user's query and chat history into a single string
    • Generates one vector embedding from that combined string
    • Calls match_documents using only the vector embedding, passing an empty array [] for keywords
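
A hedged sketch combining both parts of the fix, with illustrative prompt wording and response shape:

```typescript
import type { SupabaseClient } from "@supabase/supabase-js";
import type { GenerativeModel } from "@google/generative-ai";

// Vector-only retrieval plus a guarded parse with a user-facing fallback.
async function answerWithFallback(
  supabase: SupabaseClient,
  embedder: GenerativeModel,
  chatModel: GenerativeModel,
  userQuery: string,
  fullHistoryText: string,
) {
  // One embedding over query + history; the runtime keyword-generation call is gone.
  const { embedding } = await embedder.embedContent(`${fullHistoryText}\n${userQuery}`);

  const { data: documents, error } = await supabase.rpc("match_documents", {
    query_embedding: embedding.values,
    p_keywords: [], // empty array: keyword pre-filter skipped entirely
    match_count: 5,
  });
  if (error) throw error;

  const context = (documents ?? []).map((d: { content: string }) => d.content).join("\n---\n");
  const raw = (
    await chatModel.generateContent(
      `Context:\n${context}\n\nQuestion: ${userQuery}\n\nRespond as JSON: {"text": "...", "actions": ["..."]}`,
    )
  ).response.text();

  // Never let a malformed AI response crash the function.
  try {
    return JSON.parse(raw) as { text: string; actions: string[] };
  } catch (err) {
    console.error("Failed to parse AI response as JSON:", err);
    return { text: "I'm sorry, I had trouble processing that request.", actions: [] as string[] };
  }
}
```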

The Results: Stability + Speed Breakthrough

The optimization validated our hypothesis that vector-only search would be fast and accurate:

  • Latency Reduction: Eliminated the 27.7-second bottleneck entirely. Total server response time dropped from ~41 seconds to ~3.8 seconds—a 90% reduction.
  • Database Efficiency: The new vector-only query remained fast and efficient at ~1.1 seconds (1,072 ms) across 6,000 documents.
  • Answer Quality (HITL Evaluation): Side-by-side comparison of hybrid vs. vector-only responses showed no significant quality difference. Both versions provided legally correct, relevant, and easy-to-understand answers. The vector-only method occasionally produced slightly more comprehensive results.
  • Reliability: Error handling prevented further crashes, ensuring graceful degradation when the AI produced unexpected output.

Key Learnings for the Future

This optimization reinforced critical lessons about building resilient AI products:

  • Always Have a Fallback: When parsing or processing external AI outputs, assume it can fail. Wrap critical calls in error handling and provide sensible user-facing fallbacks.
  • Question Every Expensive Step: The AI pre-processing step seemed reasonable in theory but consumed 67% of total latency. Always profile before optimizing; measure impact before adding complexity.
  • Simpler is Often Better: Removing a step entirely (instead of trying to optimize it) proved faster and just as effective. Occam's Razor applies to system architecture.
  • HITL Validation is Essential: We didn't assume vector-only search would be "good enough"—we tested it manually with real examples to confirm quality parity before deploying at scale.

Caveats & Trade-offs

While vector-only search proved effective, important caveats apply:

  • Domain-Specific: This optimization works because RAG is already filtering to the right domain (UK tenancy law). In broader search scenarios, the AI keyword extraction might still add value.
  • Query Complexity: Very complex or ambiguous queries might benefit from explicit keyword expansion. We monitor for this and can adjust if needed.
  • Not a Universal Pattern: Removing pre-processing works here because our embedding model (text-embedding-004) is strong enough to handle conversational queries directly. Other embedding models or use cases may require explicit keyword extraction.

Part 4: Fixing Accidental Submissions (Usability)

The Problem: User Frustration from Quickfire Questions

I noticed that users who were asked for details (e.g., "how long has the mould been present?") would try to type a multi-line answer. When they hit Enter for a new line, the UI submitted their partial, incomplete thought, confusing the AI and forcing it to ask the same questions again.

What Was Wrong

The UI was fighting the user's intent. A single-line <input> box that submitted on Enter was preventing users from providing the detailed, multi-line answers the AI needed.

The Fix: Aligning the UI with User Intent

My solution was to re-align the UI to match the user's natural workflow. I implemented this by:

  1. Replacing the single-line <input> with a multi-line, auto-resizing <textarea>.
  2. Changing the submit event from "Enter" to "Ctrl+Enter" (or "Cmd+Enter").
  3. Updating the placeholder text to teach this new, more deliberate interaction (see the sketch after this list).
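
A minimal React-style sketch of the new interaction, assuming a React front end (component and handler names are hypothetical):

```tsx
import { useState, type KeyboardEvent, type FormEvent } from "react";

// Multi-line input that grows with its content and only submits on
// Ctrl+Enter / Cmd+Enter, so a plain Enter adds a new line instead of
// firing off a partial answer.
function ChatInput({ onSend }: { onSend: (message: string) => void }) {
  const [value, setValue] = useState("");

  const submit = () => {
    if (value.trim()) {
      onSend(value.trim());
      setValue("");
    }
  };

  const handleKeyDown = (e: KeyboardEvent<HTMLTextAreaElement>) => {
    if (e.key === "Enter" && (e.ctrlKey || e.metaKey)) {
      e.preventDefault();
      submit();
    }
  };

  const autoResize = (e: FormEvent<HTMLTextAreaElement>) => {
    e.currentTarget.style.height = "auto";
    e.currentTarget.style.height = `${e.currentTarget.scrollHeight}px`;
  };

  return (
    <textarea
      value={value}
      rows={1}
      onChange={(e) => setValue(e.target.value)}
      onKeyDown={handleKeyDown}
      onInput={autoResize}
      placeholder="Describe your situation… press Ctrl+Enter (or Cmd+Enter) to send"
    />
  );
}
```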

Product Decision Framework

Every technical decision was driven by a product-first mindset. Here are the critical trade-offs I navigated as both founder and PM.

B2C vs B2B

Decision: Build for tenants (B2C), not landlords (B2B).

Rationale: Landlords already have access to resources and legal advice. Tenants are the underserved market facing an information gap. This aligns with the mission of empowerment over profit.

Stateless by Design

Decision: 100% stateless—no server-side conversation memory.

Rationale: Tenancy issues are sensitive. Guaranteeing anonymity builds trust. Chat history is passed client-side, giving users full control while maintaining context.

Accuracy Over Speed

Decision: Accept a few hundred milliseconds of latency for guaranteed accuracy.

Rationale: For legal guidance, incorrect answers are worse than slow answers. The RAG pipeline ensures every response is grounded in verified sources.

Integrated Architecture

Decision: Single Supabase Edge Function + pgvector vs. multi-agent workflow + Pinecone.

Rationale: Fewer services = lower latency, lower cost, and fewer points of failure. Simplicity enables faster iteration and debugging.

The Job to be Done

"When I have a problem with my tenancy, help me understand my rights and confidently take the correct, formal next step... without me having to pay for a lawyer or spend hours reading dense legal documents."

— The "Frustrated Renter" persona

The Results & Impact

The tool quickly found its audience. By sharing the exact advice I pulled from the tool, I became a 'star contributor' in several large tenant and landlord Facebook groups. This grassroots, community-led adoption is an early signal of product-market fit.

50+ Daily Active Users

Achieved steady daily usage through organic, community-led growth with zero marketing spend.

Community Recognition

Became a trusted voice in the target community, validating the tool's accuracy and usefulness.

Future Roadmap

The current tool is just the beginning. I have a clear, three-pronged vision for the future, including a high-value multimodal feature and potential monetization paths.

  • The Contract Analyzer (Multimodal)

    Allow users to upload tenancy agreements. The tool would use AI to highlight sketchy or unenforceable clauses, providing a critical service that would justify a monetization model to cover processing costs.

  • Solicitor Referral Network

    Use the tool as a referral point to recommend solicitors for complex cases, creating a revenue stream via referral fees.

  • Charity & Council Integration

    Partner with councils, support groups, and charities to integrate the tool into their websites, improving the user experience and helping tenants plan their next steps more effectively.