Introducing ParadeDB

Written by Ming Ying on September 1, 2023

We’re excited to announce ParadeDB: a PostgreSQL database optimized for search. ParadeDB is the first Postgres database built to be an Elasticsearch alternative, engineered for lightning-fast full text, semantic, and hybrid search over Postgres tables.

Why We Built ParadeDB

For many organizations, search remains an unsolved problem. Despite the existence of giants like Elasticsearch, most developers who have worked with Elasticsearch know how incredibly painful it is to run, tune, and manage. While alternative search engines exist, gluing these external services on top of an existing database introduces the headache and costs of reindexing and duplicating data.

Many developers seeking a unified source of truth and search engine turn towards Postgres, which offers basic full text search via tsvector and semantic search via pgvector. These tools may work for simple use cases and medium-sized datasets, but break down when tables get large or queries become complex:

Slow ranking and phrase search over large tables
No support for BM25 calculations
No support for hybrid search, a technique that combines vector search with full-text search
No real-time search: data must be manually re-indexed or re-embedded
Limited support for complex queries like faceting or relevance tuning

By now, we’ve witnessed dozens of engineering teams who have begrudgingly stitched Elasticsearch on top of Postgres, only to ditch it later because it was too bloated, expensive, or convoluted. We asked ourselves: what if Postgres itself was built for Elastic-quality search? What if developers didn’t need to choose between one unified Postgres database with limited search capabilities or two separate services, one as the source of truth and one as the search engine?

Who ParadeDB is For

Elasticsearch has many use cases, and we aren’t trying to tackle all of them — at least not yet. Instead, we’re focused on nailing a core set of use cases for Postgres users that want to search over their database. ParadeDB is a good fit for you if:

You want a single, Postgres-based source of truth and hate duplicating data across multiple services
You want to perform full-text search over large volumes of documents stored in Postgres without compromising on performance or scalability
You want to combine ANN/similarity search with full text search for improved semantic matching

The Product

ParadeDB is a fully managed Postgres database that can index and search Postgres tables in ways not offered by any other Postgres provider:

Feature	Description
BM25 full text search	Full text search with support for boolean, fuzzy, boosted, and keyword queries. Results are scored using the BM25 algorithm.
Faceted search	Postgres columns can be defined as facets for easy bucketing and collection of metrics.
Hybrid search	Results can be given a score that combines semantic relevance (using vector search) and full text relevance (using BM25).
Distributed search	Tables can be sharded for fast, parallelized queries.
Generative search	Postgres columns can be fed into large language models (LLMs) for automatic summarization, classification, or text generation.
Real-time search	Text indexes and vector columns are automatically kept in sync with the underlying data.
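To make the hybrid search row concrete, here is a minimal Python sketch of one common way to combine the two signals: min-max normalize each score set, then take a weighted sum. This is an illustration only. The function names, the weight parameter, and the normalization choice are assumptions and do not describe how ParadeDB itself computes hybrid scores.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so no document stands out on this signal.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(bm25, similarity, bm25_weight=0.5):
    """Weighted sum of normalized full-text (BM25) and vector-similarity scores.

    Documents missing from one result set contribute 0 for that signal.
    """
    bm25_n, sim_n = min_max(bm25), min_max(similarity)
    doc_ids = set(bm25) | set(similarity)
    return {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + (1 - bm25_weight) * sim_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Hypothetical scores for four documents from the two retrieval paths.
bm25 = {"a": 4.0, "b": 1.0, "c": 2.5}
similarity = {"a": 0.2, "b": 0.9, "d": 0.55}

combined = hybrid_scores(bm25, similarity, bm25_weight=0.7)
ranked = sorted(combined, key=combined.get, reverse=True)
```

With the full-text weight set to 0.7, document "a" ranks first because it has the strongest keyword match, even though its vector similarity is the lowest. Lowering the weight shifts the ranking toward semantically similar documents.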

Unlike managed services like AWS RDS, ParadeDB requires zero setup and supports the entire Postgres extension ecosystem, making it a fully customizable database. For developers who require a self-managed database, ParadeDB is open source and provides a straightforward Docker Compose stack.

How ParadeDB Is Built

The core of ParadeDB is a vanilla Postgres database with custom extensions, written in Rust, that introduce enhanced search capabilities.

ParadeDB’s search engine is built on Tantivy, an open-source, Rust-based search library heavily inspired by Apache Lucene. Search indexes are stored in Postgres as native Postgres indexes. This design removes the need to pipe data out of Postgres and duplicate it in foreign services, and it guarantees transaction safety.

ParadeDB introduces a new extension to the Postgres ecosystem: pg_search. pg_search implements Rust-based full text search in Postgres using the BM25 scoring algorithm. This extension comes pre-installed with ParadeDB.
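For readers unfamiliar with BM25, the following Python sketch shows roughly what the scoring algorithm computes. It is a simplified, self-contained illustration and not pg_search's or Tantivy's implementation. The function name, the whitespace tokenization, the sample documents, and the parameter values k1=1.2 and b=0.75 (commonly used defaults) are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Lucene-style IDF, which stays non-negative even for common terms.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus, tokenized by whitespace for simplicity.
docs = [
    "postgres full text search".split(),
    "elasticsearch cluster tuning guide".split(),
    "search search search postgres".split(),
]
scores = bm25_scores("postgres search".split(), docs)
```

In this example the second document shares no terms with the query and scores zero, while the third document outranks the first because it repeats "search" at the same document length. The gain from repetition is bounded, since the k1 term makes each additional occurrence count for less.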

What’s Next

We are currently building a cloud version of ParadeDB, and already offer a commercial self-hosted version with support and enterprise features. If you would like to request access to the ParadeDB commercial offerings, we invite you to join our waitlist.

The focus of the core team is on developing the open-source version of ParadeDB, which we will be launching in winter 2023.

We’re building in public and are excited to share ParadeDB with the community. Stay tuned for updates — in future blog posts, we’ll be covering some of the interesting technical challenges behind ParadeDB.
