We’re excited to announce ParadeDB: a PostgreSQL database optimized for search. ParadeDB is the first Postgres database built to be an Elasticsearch alternative, engineered for lightning-fast full text, semantic, and hybrid search over Postgres tables.
Why We Built ParadeDB
For many organizations, search remains an unsolved problem. Despite the existence of giants like Elasticsearch, most developers who have worked with Elasticsearch know how incredibly painful it is to run, tune, and manage. While alternative search engines exist, gluing these external services on top of an existing database introduces the headache and costs of reindexing and duplicating data.
Many developers seeking a unified source of truth and search engine turn towards Postgres, which offers basic full text search via tsvector and semantic search via pgvector. These tools may work for simple use cases and medium-sized datasets, but break down when tables get large or queries become complex:
- Slow ranking and phrase search over large tables
- No support for BM25 calculations
- No support for hybrid search, a technique that combines vector search with full-text search
- No real-time search: data must be manually re-indexed or re-embedded
- Limited support for complex queries like faceting or relevance tuning
By now, we’ve witnessed dozens of engineering teams who have begrudgingly stitched Elasticsearch on top of Postgres, only to ditch it later because it was too bloated, expensive, or convoluted. We asked ourselves: what if Postgres itself were built for Elastic-quality search? What if developers didn’t need to choose between one unified Postgres database with limited search capabilities and two separate services, one as the source of truth and one as the search engine?
Who ParadeDB is For
Elasticsearch has many use cases, and we aren’t trying to tackle all of them — at least not yet. Instead, we’re focused on nailing a core set of use cases for Postgres users that want to search over their database. ParadeDB is a good fit for you if:
- You want a single, Postgres-based source of truth and hate duplicating data across multiple services
- You want to perform full-text search over large volumes of documents stored in Postgres without compromising on performance or scalability
- You want to combine ANN/similarity search with full text search for improved semantic matching
ParadeDB is a fully managed Postgres database with indexing and search capabilities not found in any other Postgres provider:
- BM25 full text search with support for boolean, fuzzy, boosted, and keyword queries. Results are scored using the BM25 algorithm.
- Postgres columns can be defined as facets for easy bucketing and collection of metrics.
- Results can be given a score that combines semantic relevance (using vector search) and full text relevance (using BM25).
- Tables can be sharded for fast, parallelized queries.
- Postgres columns can be fed into large language models (LLMs) for automatic summarization, classification, or text generation.
- Text indexes and vector columns are automatically kept in sync with the underlying data.
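For readers unfamiliar with the scoring terms above, here is a minimal Python sketch of BM25 scoring and of one common way to blend it with vector similarity. It is illustrative only and is not ParadeDB's implementation, which is written in Rust on top of Tantivy. The function names, the whitespace tokenizer, the defaults `k1=1.2` and `b=0.75`, and the min-max weighted sum are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real engine also handles stemming, stop words, etc.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` using Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def min_max(values):
    # Rescale to [0, 1] so scores on different scales can be combined.
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(bm25, similarity, weight=0.5):
    """Weighted sum of normalized BM25 and vector-similarity scores.

    `weight` is the share given to BM25; the remainder goes to similarity.
    """
    norm_bm25, norm_sim = min_max(bm25), min_max(similarity)
    return [weight * x + (1 - weight) * y for x, y in zip(norm_bm25, norm_sim)]


if __name__ == "__main__":
    docs = [
        "postgres full text search",
        "elasticsearch cluster tuning",
        "postgres vector search with pgvector",
    ]
    bm25 = bm25_scores("postgres search", docs)
    # Made-up cosine similarities standing in for a real vector search.
    similarity = [0.1, 0.9, 0.5]
    for doc, score in zip(docs, hybrid_scores(bm25, similarity, weight=0.7)):
        print(f"{score:.3f}  {doc}")
```

Normalizing before combining matters because raw BM25 scores are unbounded while cosine similarity is not; other blending schemes, such as reciprocal rank fusion, are equally valid choices.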
Unlike managed services such as AWS RDS, ParadeDB requires zero setup and supports the entire Postgres extension ecosystem, making it a fully customizable database. For developers who require a self-managed database, ParadeDB is open source and provides a straightforward Docker Compose stack.
How ParadeDB Is Built
The core of ParadeDB is a vanilla Postgres database with custom extensions, written in Rust, that introduce enhanced search capabilities.
ParadeDB’s search engine is built on Tantivy, an open-source, Rust-based search library heavily inspired by Apache Lucene. Search indexes are stored in Postgres as native Postgres indexes, which guarantees transaction safety and obviates the need to pipe data out of Postgres and duplicate it in foreign services.
ParadeDB introduces a new extension to the Postgres ecosystem:
Rust-based full text search in Postgres using the BM25 scoring algorithm. This extension comes pre-installed
The managed cloud version of ParadeDB is currently in private beta. Our goal is to publicly launch a self-serve cloud in early 2024. If you would like access to the private beta in the meantime, we invite you to join our waitlist.
The focus of the core team is on developing the open-source version of ParadeDB, which we will be launching in winter 2023.
We’re building in public and are excited to share ParadeDB with the community. Stay tuned for updates — in future blog posts, we’ll be covering some of the interesting technical challenges behind ParadeDB.