From Zero to Hero: Implementing RAG using PostgreSQL

Large Language Models (LLMs) are powerful, but they operate without real-time access to your data. They struggle with private, changing, or domain-specific information, and they may hallucinate when their training data is incomplete. Retrieval-Augmented Generation (RAG) addresses this by pulling in relevant context from an external data store at query time and feeding it into the model.
To enable fast and context-aware retrieval given a user’s query, most systems rely on semantic search over embeddings. You chunk your documents, embed them into a high-dimensional vector space, and do the same for the user’s query. Similarity search then finds the closest matches. However, semantic search alone has gaps. Rare terms, exact phrases or code snippets are often better handled through keyword-based search. This is why many production setups use a hybrid approach that combines both methods.
A common choice is to use managed vector databases like Pinecone, Weaviate, or Chroma. These work well, but they also add operational overhead: more infrastructure, more APIs to manage, and more cost. For many projects, there is a simpler option. PostgreSQL, together with the pgvector extension, can act as a vector store and handle semantic search directly—while also supporting keyword search via its built-in Full Text Search (FTS) capabilities. Since PostgreSQL is already the primary database in many backends, this approach makes it convenient to keep everything in one place and still support hybrid search efficiently.
This post is a hands-on walkthrough covering three parts:
- Vector search with pgvector
- Keyword search with Full Text Search
- Row-Level Security (RLS) to restrict access to private data
By the end, you will have a working PostgreSQL-based retrieval setup that can power a RAG system without adding another piece of infrastructure.
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Vector Search with pgvector
pgvector adds native vector types and nearest-neighbor search to PostgreSQL, so you can store embeddings next to your relational data and query them with SQL. To use pgvector, enable it once per database:
CREATE EXTENSION IF NOT EXISTS vector;
pgvector supports several vector types with different limits:
- VECTOR – up to 2,000 dims
- HALFVEC – up to 4,000 dims
- BIT – up to 64,000 dims
- SPARSEVEC – up to 1,000 non-zero elements
For any embedding model with dimensionality below 2,000 (e.g., OpenAI's text-embedding-3-small), VECTOR is the default choice. For models that exceed 2,000 dimensions (e.g., OpenAI's text-embedding-3-large with 3,072 dimensions), HALFVEC is a straightforward choice that also cuts storage roughly in half.
Setting Up a VECTOR Column for Embeddings
For any table, you can add one (or more) vector columns. If you know the embedding
dimensionality, it’s best to specify it, which enables early checks and better planning. For example:
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(1536)
);
PostgreSQL does not create embeddings for you. You must compute them using an embedding model and store them in the database (along with the existing data).
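As a minimal sketch of what storing an embedding looks like (a toy 3-dimensional column keeps the literal readable; real models produce hundreds to thousands of dimensions, and the demo table exists only for illustration):

```sql
-- toy table with a 3-dimensional vector column
CREATE TABLE demo (
    id BIGSERIAL PRIMARY KEY,
    embedding VECTOR(3)
);

-- embeddings are written as bracketed literals
INSERT INTO demo (embedding) VALUES ('[0.1, 0.2, 0.3]');
```

In practice, your application computes the vector with the embedding model's API and passes it as a query parameter instead of a hand-written literal.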
Executing a Vector Search
When querying the database, you first calculate an embedding for your query using the same model you used when storing document embeddings. This allows you to compare the query vector against the vectors in your database and retrieve the most relevant matches.
To measure how close two vectors are in this space, several similarity metrics are commonly
used:
- L2 distance: <->
- Inner product (negative, for ASC ordering): <#>
- Cosine distance: <=>
- L1 distance: <+>
- Bit vectors: Hamming <~> and Jaccard <%>
Cosine distance is a good default because it measures directional similarity rather than raw magnitude. Most embedding models produce vectors where direction encodes meaning, and many providers normalize vectors to unit length. Under normalization, cosine, dot product, and L2 distance produce identical rankings, but cosine and dot product are usually faster to compute and less sensitive to differences in vector scale.
Example query:
SELECT id, title
FROM documents
ORDER BY embedding <=> $1  -- $1 is a 1536-dim vector literal like '[...]'
LIMIT 10;
Indexing for performance
Exact vector search provides perfect recall but does not scale well. With large datasets, latency increases because every vector must be scanned. To reduce latency, add an approximate nearest neighbor (ANN) index. pgvector supports two ANN index types: IVFFlat (inverted file with flat compression) and HNSW (hierarchical navigable small world).
IVFFlat partitions the vector space into clusters (so-called lists). At query time, only the most similar lists are searched. It builds quickly and uses modest memory, but recall is lower than with HNSW. Build the index after loading representative data, choose the number of lists when creating the index, and tune ivfflat.probes – which controls how many of those clusters are searched – per query to balance recall and speed.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- per-query recall/speed trade-off
SET ivfflat.probes = 10;
Rules of thumb: start with lists = rows / 1000 up to 1M rows and sqrt(rows) above that; set probes ≈ sqrt(lists).
HNSW creates a multi-layer graph structure where each vector is connected to several neighbors. Queries navigate from coarse upper layers down to more detailed lower layers, which makes lookup fast while keeping recall high. It generally performs well for read-heavy workloads but uses more memory and takes longer to build than IVFFlat. It does not require a separate training step.
You control index quality and resource use with two build-time parameters:
- m: how many connections each node keeps. Higher values improve recall but increase memory.
- ef_construction: how broadly the index searches for neighbors during build. Higher values improve graph quality but slow indexing.
At query time, you tune:
- hnsw.ef_search: how many candidate neighbors are explored. Higher values increase recall but slow queries.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
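The query-time knob can then be adjusted per session or transaction; the sketch below assumes the documents table from above and pgvector's default ef_search of 40:

```sql
-- widen the candidate search for higher recall at some latency cost
SET hnsw.ef_search = 100;

SELECT id, title
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
```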
Which to choose? Use HNSW if you want the best recall and lowest latency and have memory to spare. Use IVFFlat if you need faster builds or tighter memory usage. With both index types, keep ORDER BY … LIMIT in your query so the planner can use the index.
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Keyword search with Full Text Search
With vector search in place, we can add keyword-based retrieval using PostgreSQL’s native Full Text Search (FTS). FTS is well-suited for exact terms, rare words, or code snippets— making it a useful complement to embeddings in a hybrid search setup.
Setting Up a TSVECTOR Column for Searchable Text
PostgreSQL stores searchable text in a TSVECTOR, which contains normalized lexemes and optional positional data. For example:
"Learn how to use tsvector and tsquery" → 'learn':1 'use':4 'tsvector':5 'tsquery':7
The functions to produce and query these vectors are built in. Precompute the TSVECTOR
and store it in the table to avoid recomputing on every query. A generated stored column
keeps it in sync.
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
    to_tsvector('simple', coalesce(title, '') || ' ' || coalesce(content, ''))
) STORED;
Tokenization configuration
The example above uses the 'simple' text search configuration (no stemming or stopword removal) to generate the TSVECTOR. PostgreSQL also offers language-specific configurations (english, german, etc.), which handle stemming and stopword removal for that language.
To list all supported languages execute:
SELECT cfgname FROM pg_ts_config;
For multilingual data, you can also store a per-row language and pass it as the configuration argument to to_tsvector(config, text).
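A sketch of this pattern, shown on a hypothetical posts table (to avoid clashing with the articles example above); the lang column name and its default are assumptions:

```sql
-- lang holds a per-row text search configuration
ALTER TABLE posts ADD COLUMN lang regconfig NOT NULL DEFAULT 'simple';

-- the generated column picks up each row's own configuration
ALTER TABLE posts
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
    to_tsvector(lang, coalesce(content, ''))
) STORED;
```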
Weighting
PostgreSQL also supports weighting fields differently (e.g., title > body) using weighted vectors:
ALTER TABLE films
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', description), 'B')
) STORED;
The default weight array used by PostgreSQL is {D=0.1, C=0.2, B=0.4, A=1.0}, but it can be overridden at query time.
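For instance, the ranking functions accept an optional weight array in {D, C, B, A} order, so a query can boost titles even further (the films table and the chosen weights are illustrative):

```sql
SELECT title,
       -- custom weights in {D, C, B, A} order: title matches count double
       ts_rank('{0.1, 0.2, 0.4, 2.0}', search_vector, query) AS score
FROM films, plainto_tsquery('english', 'space adventure') AS query
WHERE search_vector @@ query
ORDER BY score DESC;
```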
Indexing
PostgreSQL supports two index types for Full Text Search: GIN (Generalized Inverted Index) and GiST (Generalized Search Tree).
GIN stores a true inverted index: for each lexeme, it keeps a list of document references (and optional positions). This makes containment checks (which documents contain a given keyword) fast and exact. The trade-offs are a larger index on disk and slower writes, because updates must maintain these posting lists. This is usually acceptable for read-heavy workloads, and GIN remains the standard choice for most text search scenarios.
GiST works differently. Instead of storing full posting lists, it compacts each tsvector into a small bitmask called a signature. These signatures summarize which lexemes appear in a document, but they are not perfectly precise. Because different documents can produce the same bit pattern, GiST may return rows that only might match, and PostgreSQL must recheck the underlying tsvector. The benefit is a smaller index that is faster to build and cheaper to update. The cost is occasional false positives and slower complex queries.
Which to choose? Choose GIN when you want fast, accurate text search and can afford slightly slower index maintenance. Choose GiST when you have many writes, frequent updates, or limited storage and can tolerate a recheck step.
Example:
CREATE INDEX idx_products_search_vector_gin ON products USING GIN (search_vector);
CREATE INDEX idx_products_search_vector_gist ON products USING GIST (search_vector);
Executing a Full-Text Search
To run a full-text search in PostgreSQL, you first convert your raw text query into the TSQUERY format. A TSQUERY is the query representation used by PostgreSQL to match against TSVECTOR columns, like how an embedding vector is used to query a vector index. The TSQUERY defines the search terms and logic, and PostgreSQL uses it to check which documents match.
Converting input to a TSQUERY is more involved than vector search. For embeddings, you compute a vector once and search with it. For keyword search, you must decide how to turn a natural language query into a meaningful TSQUERY. Raw queries from users or from an LLM in a RAG system are usually not optimized for keyword search.
The simplest conversion is plainto_tsquery. It normalizes the text, removes stopwords, and stems terms, but it always produces an AND query: all terms must appear in the document. This is often too strict in practice.
SELECT plainto_tsquery('english', 'deep learning models'); -- 'deep' & 'learn' & 'model'
If you need broader recall, there are two alternatives:
- Pre-processing: Improve the query before calling FTS. You can apply your own heuristics or let an LLM shape the query (e.g., if search is exposed as a tool). PostgreSQL provides websearch_to_tsquery, which accepts a Google-style syntax and never errors on malformed input. This is useful when the query is already processed.
- Convert the AND query to OR: Generate the TSQUERY with plainto_tsquery (as shown above) to benefit from parsing and normalization, then replace & with |. This is a small “hack” to widen the match set with no extra latency, no LLM calls, and no custom heuristics, but should be used with caution. OR queries can become overly broad, especially if the input contains terms that are not removed as stopwords. In the worst case, almost all documents match, reducing the value of the filter step. It is a quick fix to increase recall, but it can weaken filtering significantly.
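Both options can be sketched directly in SQL; the replace()-based cast is the OR "hack" described above:

```sql
-- Google-style syntax: phrases, OR, negation; never errors on malformed input
SELECT websearch_to_tsquery('english', '"deep learning" models -tensorflow');

-- widen an AND query to OR by rewriting the operator
SELECT replace(plainto_tsquery('english', 'deep learning models')::text, '&', '|')::tsquery;
-- 'deep' | 'learn' | 'model'
```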
Given the computed TSQUERY, a full-text search runs in two steps. First, you filter rows using the TSQUERY and the @@ operator, which benefits from the FTS index and determines which documents match at all. Second, the matching rows are ranked by relevance. Ranking is slower than indexed filtering, so even though the filter step is technically optional, it is recommended to keep the match set small and minimize latency during ranking.
PostgreSQL provides two functions to compute a relevance score: ts_rank and ts_rank_cd. Both take a TSVECTOR (document) and a TSQUERY (query) and return a numeric score, where higher means a better match.
ts_rank scores documents based on term frequency and optional weights. More
occurrences of query terms increase the score, but with diminishing returns. Very frequent
terms are discounted, so they do not dominate.
ts_rank_cd includes everything from ts_rank but adds cover density. It considers how close the query terms appear to each other. Documents where terms occur in a tight span receive a higher score than those where terms are far apart. This usually improves ranking for multi-term queries, because clustering signals a more relevant context. The extra logic makes ts_rank_cd slower, as it must examine positional data. For short documents or one-word queries, the difference is small. For longer documents or multi-term queries, ts_rank_cd typically yields better ranking.
Both functions support a normalization bitmask to adjust raw scores. Normalization can reduce bias towards long documents and scale scores into comparable ranges. Bit 1 divides by 1 + the logarithm of the document length and bit 2 divides by the document length, both reducing length bias. Bit 8 divides by the number of unique words in the document, and bit 16 by 1 + the logarithm of that count. Bit 32 is common in practice and maps scores to score / (score + 1), producing values between 0 and 1.
To summarize, a typical full-text search combines filtering with @@ and ranking with ts_rank or ts_rank_cd. The filter uses the FTS index to select candidates quickly, and ranking then orders only those matches. An example might look as follows:
SELECT id, title, ts_rank_cd(search_vector, query, 32) AS score
FROM products, plainto_tsquery('english', 'deep learning models') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 10;
Hybrid Retrieval: Combining pgvector and FTS
Now that you have both vector search and keyword-based search in place, you can combine them to improve result quality. Hybrid search fuses semantic matches from pgvector with keyword matches from Full Text Search. You run two independent queries—one vector search and one FTS search—and then merge the results. Both queries can be executed in parallel.
Reciprocal Rank Fusion (RRF) combines the two result lists by assigning a score based on each document’s rank in the individual lists and summing them. It is popular in many retrieval setups because it is simple, rank-based, and robust to outliers.

RRF(d) = Σ_{r ∈ R_d} 1 / (k + r)

Where:
- r = rank position of document d in a result list (1 = best)
- R_d = set of ranks for d across all lists
- k = constant to smooth the contribution of lower-ranked results.
The value of k is commonly set to 60 based on empirical evaluation in the information retrieval literature, where it produced stable performance across benchmarks without tuning. Larger values of k compress the differences between ranks, giving lower-ranked results relatively more influence; smaller values let the top-ranked documents dominate the fused score.
Overall, RRF avoids the need to calibrate or normalize scores from different search methods. It is a practical default when combining vector and keyword search.
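The fusion can even run in a single SQL statement. The sketch below is one possible shape: it assumes a documents table carrying both the embedding and search_vector columns from the earlier sections, uses k = 60, and takes the query vector and query text as parameters $1 and $2:

```sql
WITH vec AS (  -- semantic candidates, ranked by cosine distance
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 20
),
kw AS (        -- keyword candidates, ranked by FTS relevance
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(search_vector, q, 32) DESC) AS rank
    FROM documents, plainto_tsquery('english', $2) AS q
    WHERE search_vector @@ q
    ORDER BY rank
    LIMIT 20
)
SELECT COALESCE(vec.id, kw.id) AS id,
       COALESCE(1.0 / (60 + vec.rank), 0) +
       COALESCE(1.0 / (60 + kw.rank), 0) AS rrf_score
FROM vec
FULL OUTER JOIN kw ON vec.id = kw.id
ORDER BY rrf_score DESC
LIMIT 10;
```

The FULL OUTER JOIN keeps documents that appear in only one of the two lists; a missing rank simply contributes zero to the fused score.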
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Row-Level Security (RLS)
When powering a RAG system from PostgreSQL, you often store user-specific data alongside shared content. A vector search alone does not guarantee isolation: without additional rules, embeddings and text rows from different users can surface in results. To prevent cross-user leakage and ensure each query only returns data the caller is allowed to see, you enforce access control at the table level.
Row-Level Security (RLS) is a feature in PostgreSQL that allows fine-grained control over which rows each user or role can access or modify. RLS enforces restrictions within a table, ensuring each query only sees the rows it’s permitted to see.
One of the primary advantages of RLS is that access rules are enforced directly in the database, rather than scattered throughout application code. This centralization means there’s one place (the database table policy) to define and maintain your security logic, reducing the chance of an application bug exposing data (for example, a missing filter in an API endpoint). If RLS is properly configured on a table, every query (SELECT, UPDATE, DELETE) executed by non-privileged roles will automatically include the security filter, so you cannot bypass it with a badly written query.
Enabling and Applying Row-Level Security Policies
RLS is implemented through policies defined on tables. By default, tables have no RLS policy, so if a user has SELECT access, they can see all rows. Once RLS is enabled, PostgreSQL requires each row to pass a policy check before it can be returned or modified. If RLS is enabled and no policy exists, PostgreSQL defaults to “deny all,” meaning no rows are visible or editable until a policy is added.
Enabling RLS on a table is done with the following command:
ALTER TABLE my_table ENABLE ROW LEVEL SECURITY;
After enabling it, you create one or more policies using CREATE POLICY. A policy defines a boolean expression that filters rows for specific operations (SELECT, INSERT, UPDATE, DELETE) and can apply to particular roles or to PUBLIC (all roles). For example, to restrict selects on a documents table to rows where owner_id matches the current user’s ID, you might write:
CREATE POLICY docs_select_own ON documents
FOR SELECT
USING (owner_id = current_setting('app.user_id')::int);
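The app.user_id referenced here is a custom setting that the application must provide per connection or transaction; the value 42 below is purely illustrative:

```sql
BEGIN;
-- scoped to this transaction; the policy reads it via current_setting()
SET LOCAL app.user_id = '42';
SELECT id, title FROM documents;  -- only rows with owner_id = 42 are returned
COMMIT;
```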
For write operations, consider WITH CHECK clauses on INSERT and UPDATE policies to ensure users cannot create rows that would later be invisible to them.
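A sketch of such an INSERT policy, reusing the same setting as the SELECT policy above:

```sql
CREATE POLICY docs_insert_own ON documents
FOR INSERT
WITH CHECK (owner_id = current_setting('app.user_id')::int);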
Important: RLS works alongside PostgreSQL’s regular permission system. It does not replace it. A user must still have the appropriate table privileges (granted via GRANT) to perform operations. RLS then adds an additional filtering layer on top. For example, you might GRANT SELECT ON documents TO app_user_role, giving that role general read access, but RLS policies will determine which rows are actually visible. If a user has no SELECT privilege, RLS does not grant it; they simply cannot query the table.
Privilege Behavior Under RLS
Superusers and roles with the BYPASSRLS attribute skip row-level security checks entirely. The owner of a table is also exempt by default, since owners are typically administrative roles. As a result, if your application connects as the table owner or as a superuser, RLS policies will not be enforced for that session.
To ensure RLS applies, use a dedicated non-owner role for your application to connect. For example, create an app_user role with limited privileges and have the application authenticate as that role. Grant access to the required tables, but do not grant ownership. In that configuration, RLS policies will be evaluated as intended.
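A minimal sketch of that setup (the app_user role name is illustrative):

```sql
-- dedicated application role: not the table owner, no BYPASSRLS
CREATE ROLE app_user LOGIN;
GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO app_user;

-- optionally enforce RLS even for the table owner
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
```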
Permissive vs. Restrictive Policies
PostgreSQL policies are permissive by default. Multiple permissive policies on a table are combined with logical OR — if any policy allows access, the row is accessible. This makes permissive policies suitable for broad, general rules.
For example, you might define a permissive policy covering all operations (FOR ALL) that allows each employee to read and update documents within their own organization:
CREATE POLICY docs_org_policy ON documents
FOR ALL
USING (org_id = current_setting('app.org_id')::int);
In some cases, you need stricter rules for certain operations. Restrictive policies address this. Restrictive policies are combined with logical AND — all restrictive conditions must pass, in addition to any permissive policy.
For example, to ensure only the document owner can delete their own document, you can define a restrictive policy for DELETE:
CREATE POLICY docs_delete_own ON documents
AS RESTRICTIVE
FOR DELETE
USING (owner_id = current_setting('app.user_id')::int);
This combination allows layered access control: permissive policies define the baseline, while restrictive policies tighten constraints on sensitive operations.
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Conclusion
PostgreSQL can serve as a practical vector store for RAG systems by combining pgvector for semantic retrieval with Full Text Search for keywords. Simple fusion methods such as Reciprocal Rank Fusion improve result quality without complex score calibration, and both searches can run in parallel. Row-Level Security then prevents cross-user leakage by enforcing access rules inside the database engine. Together, these features let you build a retrieval pipeline on familiar infrastructure with tunable performance, centralized authorization, and minimal operational overhead.
Author

Arne Grobrügge
Data Scientist at scieneers GmbH
arne.grobrügge@scieneers.de