DocumentDB: Open-Source Announcement

We are excited to announce the official release of DocumentDB—an open-source document database platform and the engine powering the vCore-based Azure Cosmos DB for MongoDB, built on PostgreSQL.

NoSQL databases have historically been offered as cloud-specific solutions, with no common standard for interoperability. This has led to growing demand for a document data store that is interoperable, portable, fully supported, and production-ready as a local instance. A shared standard for NoSQL databases would also give developers more flexibility in choosing a database and in switching between databases. Meanwhile, the last decade has seen an explosion in the popularity of PostgreSQL within the developer community. To meet the community’s NoSQL needs and build on PostgreSQL’s broad adoption, we launched DocumentDB: a fully permissive, open-source platform for document data stores built on the PostgreSQL engine.

Our mission 

Visibility 

The mission for DocumentDB is to provide the developer community with a NoSQL datastore implemented on PostgreSQL, with complete visibility into the architecture and implementation of the engine. All core components of the database engine, from CRUD (Create, Read, Update, Delete) operations to indexing and vector search functionality, are public. PostgreSQL has also seen a sharp rise in popularity, helped by its continuously evolving feature set and rich ecosystem of extensions. DocumentDB builds on that: it is a fully open-source platform powered by PostgreSQL on which an end-to-end document database experience can be built.

Licensing 

To uphold the true spirit of open source, the project uses the permissive MIT license, which places no restrictions on incorporating the project into new or existing solutions. There are no commercial licensing fees, no usage or distribution restrictions, and no gimmicks. Contributions to the project are always welcome and encouraged, but users are not required to contribute their customizations and enhancements back to the project. The MIT license leaves you free to fork, use, and distribute the code with no obligation to contribute back.

Open-source standard 

DocumentDB is the first implementation of the project’s more ambitious mission to create a standard for open-source document databases, much like the ANSI (American National Standards Institute) SQL standard for relational databases. The creation of a NoSQL standard will heighten the compatibility and interoperability of NoSQL engines in the future. The motivation behind the standard is to minimize differences in public-facing APIs (Application Programming Interfaces) and engine fundamentals between NoSQL database implementations. Overall, this will facilitate an improved developer experience when onboarding document databases and more importantly—when switching from one database to another.

Our architecture 

The project consists of two primary components, which work together to support document operations: 

  • pg_documentdb_core — A custom PostgreSQL extension optimizing for BSON (Binary JavaScript Object Notation) datatype support in Postgres. 
  • pg_documentdb_api — The data plane implementing CRUD operations, query functionality, and index management.

For contributors 

At the bottom of the stack is the pg_documentdb_core layer—a fully homegrown and customized Postgres extension to optimize support for the BSON data type. This extension provides the following capabilities: 

  • The ability to parse and manipulate BSON documents in the Postgres layer of the database engine, at all levels of nesting within the BSON document. 
  • The ability to index fields in the BSON document—including single field indexes, multi-key indexes, compound indexes to optimize query filtering criteria on multiple fields, text indexes as well as geospatial indexes leveraging the capabilities of the PostGIS extension. 
  • The ability to perform vector search queries powered by the pgvector Postgres extension. Common use cases include: 
    • Generative AI applications, chatbots, AI agents 
    • Fraud and anomaly detection use cases in financial services 
    • Similarity search for product recommendation systems in retail applications
    • Natural language processing
    • Content filtering 
    • RAG (Retrieval-Augmented Generation) patterns for contextually relevant search responses 
  • A fully functional authentication mechanism including SCRAM (Salted Challenge Response Authentication Mechanism) authentication. 

These features at the bottom of the stack will serve as the foundation for building an end-to-end NoSQL database user experience. A protocol translation layer can be built at the top of this stack to convert the inbound NoSQL database protocol of choice into the underlying Postgres protocol. 

For users 

Users looking for a ready-to-use NoSQL database can use FerretDB, an existing solution that runs on DocumentDB as its backing engine. FerretDB is a popular open-source document datastore, and its most recent release, FerretDB 2.0, is built on DocumentDB. While users can interact with DocumentDB directly through Postgres, FerretDB 2.0 provides an interface that speaks a document database protocol. FerretDB is released under the similarly permissive Apache license and has a significant presence in both the Postgres and NoSQL communities. 

Engaging with the creators of the project 

We want to encourage direct engagement between the project’s creators and the open-source community as we embark on our mission to make Postgres the most flexible NoSQL database platform. Join us in creating the first implementation standard for document databases. Check us out on GitHub for the most up-to-date information on the roadmap and vision of the project. Join our Discord channel, contribute to our broader vision, and share feedback on design discussions. If you are a document database user or are familiar with the PostgreSQL ecosystem, we would love to hear from you.

Getting started 

Install Docker 

Follow the steps here to install Docker:

Clone the DocumentDB repository 

git clone https://github.com/microsoft/documentdb.git

Navigate to the cloned repository and create the Docker image 

docker build . -f .devcontainer/Dockerfile -t documentdb

Run the Docker image as a container 

docker run -v $(pwd):/home/documentdb/code -it documentdb /bin/bash 

Build and deploy the binaries 

cd code 

make 

sudo make install 

Initialize the DocumentDB server and manage dependencies 

./scripts/start_oss_server.sh -t documentdb 

Connect to the psql shell 

psql -p 9712 -h localhost -d postgres 

After following the steps above, you are now all set to use DocumentDB locally. 
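
If you script these steps, it can help to confirm that the server is accepting connections before opening psql. The following is a minimal sketch, not part of the DocumentDB scripts: wait_for_server is a hypothetical helper introduced here for illustration, and ATTEMPTS and DELAY are made-up tuning variables. It retries whatever check command you pass it until that command succeeds or the attempts run out.

```shell
# Hypothetical helper (not shipped with DocumentDB): retry a check command
# until it succeeds, or give up after a fixed number of attempts.
# Usage: wait_for_server <check-command> [args...]
wait_for_server() {
  attempts=${ATTEMPTS:-10}   # how many times to try (assumed default)
  delay=${DELAY:-1}          # seconds to wait between tries (assumed default)
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the caller-supplied check; discard its output.
    if "$@" >/dev/null 2>&1; then
      echo "server is ready"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server did not become ready" >&2
  return 1
}
```

One possible check, assuming pg_isready from the PostgreSQL client tools is installed in the container, is: wait_for_server pg_isready -h localhost -p 9712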

A few simple CRUD samples after installation 

To create a ‘patient’ collection in the ‘documentdb’ database:

SELECT documentdb_api.create_collection('documentdb','patient'); 

To insert a document into the patient collection:

select documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P001", "name": "Alice Smith", "age": 30, "phone_number": "555-0123", "registration_year": "2022", "conditions": ["Diabetes", "Hypertension"]}'); 

To insert additional documents: 

select 1 from documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P002", "name": "Bob Johnson", "age": 45, "phone_number": "555-0456", "registration_year": "2022", "conditions": ["Asthma"]}'); 

select 1 from documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P003", "name": "Charlie Brown", "age": 29, "phone_number": "555-0789", "registration_year": "2023", "conditions": ["Allergy", "Anemia"]}'); 
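
To read the inserted documents back, a query along the following lines may work. This is a sketch under assumptions: it relies on the documentdb_api.collection function described in the project's README being available in your build, and read_all_sql, DB_HOST, and DB_PORT are hypothetical names introduced here for illustration. The host, port, and database match the psql step above.

```shell
# Connection settings from the "Connect to the psql shell" step.
DB_HOST=localhost
DB_PORT=9712

# Hypothetical helper: build a query that returns every document in a
# collection. Assumes documentdb_api.collection(database, collection)
# exists as described in the project README.
read_all_sql() {
  printf "SELECT document FROM documentdb_api.collection('%s','%s');" "$1" "$2"
}

# Run the query only when psql is on PATH; report a failed connection
# instead of aborting.
if command -v psql >/dev/null 2>&1; then
  psql -p "$DB_PORT" -h "$DB_HOST" -d postgres \
    -c "$(read_all_sql documentdb patient)" \
    || echo "query failed; is the DocumentDB server running?"
fi
```

The same statement can also be typed directly at the psql prompt; the shell wrapper is only there so the query can be reused for other databases and collections.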

Join our community

Ready to shape the future of document databases? Clone our repository on GitHub, connect with us on Discord, and start experimenting with DocumentDB now.

Microsoft Open Source
