#apache-arrow #mongo-db #read #write #connector #collection #batch

mongodb-arrow-connector

MongoDB connector that reads data into, and writes data from, Apache Arrow

7 releases (breaking)

0.7.0 Apr 21, 2022
0.6.0 Mar 25, 2022
0.5.0 Dec 29, 2021
0.4.0 Oct 14, 2021
0.1.0 Apr 5, 2020

#2710 in Database interfaces

Apache-2.0

33KB
645 lines

MongoDB Apache Arrow Connector

A Rust library for reading and writing Apache Arrow batches from and to MongoDB.

Licensed under the Apache 2.0 license.

Motivation

We are currently writing this library because we need to read MongoDB data into dataframes.

Features

  • Read from a collection to batches
  • Write from batches to a collection
  • Infer collection schema
  • Projection push-down
  • Filter predicate push-down
  • Data types
    • Primitive types that MongoDB supports
    • List types
    • Nested structs (bson::Document)
    • Arbitrary binary data

lib.rs:

MongoDB to Apache Arrow Connector

This crate reads and writes MongoDB data in the Apache Arrow format. Data is read from a MongoDB database as RecordBatches using the aggregation framework. Apache Arrow RecordBatches are written to a MongoDB collection using insert_many.
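Since reads go through the aggregation framework, filter and projection push-down can be pictured as `$match` and `$project` stages placed at the front of the pipeline. The sketch below only illustrates that idea using the standard library. `build_pipeline` is a hypothetical helper and not part of this crate's API, and rendering stages as JSON text is an assumption made to keep the example dependency-free.

```rust
/// Hypothetical helper (not part of the crate's API): renders aggregation
/// pipeline stages as JSON text.
///
/// `filter_json` is an already-serialized filter document, for example
/// `{"qty": {"$gt": 5}}`. `projected_fields` lists the fields to keep.
/// Field names are assumed to need no JSON escaping in this sketch.
pub fn build_pipeline(filter_json: Option<&str>, projected_fields: &[&str]) -> Vec<String> {
    let mut stages = Vec::new();

    // Filter push-down: a `$match` stage lets the server discard
    // non-matching documents before they are sent to the client.
    if let Some(filter) = filter_json {
        stages.push(format!("{{\"$match\": {}}}", filter));
    }

    // Projection push-down: a `$project` stage asks the server to return
    // only the requested fields.
    if !projected_fields.is_empty() {
        let fields: Vec<String> = projected_fields
            .iter()
            .map(|name| format!("\"{}\": 1", name))
            .collect();
        stages.push(format!("{{\"$project\": {{{}}}}}", fields.join(", ")));
    }

    stages
}
```

Calling `build_pipeline(Some("{\"qty\": {\"$gt\": 5}}"), &["name", "qty"])` yields two stages, a `$match` followed by a `$project`. Placing `$match` first means later stages process fewer documents. Note that an inclusion-style `$project` still returns `_id` unless it is explicitly excluded.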

Dependencies

~28–41MB
~756K SLoC