Notes On MongoDB

Table of Contents

References

Quick Start

Synopsis

$ mkdir -p /data/db # Default data dir
$ mongod

$ mongo
  MongoDB shell version v5.0.6
  connecting to: mongodb://127.0.0.1:27017/
  MongoDB server version: 5.0.6
  ================
  Warning: the "mongo" shell has been superseded by "mongosh",

> db
  test

>  show databases
   admin   0.000GB
   config  0.000GB
   dev     0.462GB
   local   0.000GB

> use local

> show tables      # Tables and Collections are synonymous
> show collections
  startup_log

> use newdb       # will create database. No separate command to create db
> show databases  # will not show empty databases.

> db.posts.insert({
     title: 'Post One',
     body: 'Body of post one',
     category: 'News',
     tags: ['news', 'events'],
     user: {
       name: 'John Doe',
       status: 'author'
     },
     date: Date()   // Note: Date() returns a string; use new Date() to store a Date object
   })

 > db.posts.insertMany([
     {
       title: 'Post Two',
       body: 'Body of post two',
       category: 'Technology',
       date: Date()
     },
     {
       title: 'Post Three',
       body: 'Body of post three',
       category: 'News',
       date: Date()
     },
     {
       title: 'Post Four',
       body: 'Body of post three',
       category: 'Entertainment',
       date: Date()
     }
   ])

# Collections are created on-the-fly.
> db.myNewCollection.insert( { "name" : "Thava", "msg" : "Hello!" })
WriteResult({ "nInserted" : 1 })

# You can create capped collection using createCollection method:
> db.createCollection("mySecondCollection", {
    capped : true,   // Fixed-size collection with insertion-order retrieval
    size : 2,        // Max size in bytes (MongoDB rounds this up to at least 4096)
    max : 2})        // Max document count

  # Note: If you insert more documents than max, the oldest documents are
  #       removed to make room -- a capped collection behaves like a ring buffer.
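Conceptually, a max-bounded capped collection behaves like a fixed-size ring buffer. A plain-JavaScript sketch (illustration only, not driver code) of the eviction behaviour:

```javascript
// Sketch of capped-collection eviction with max = 2: inserting a third
// document removes the oldest one. (Real capped collections also enforce
// the size limit in bytes.)
function cappedInsert(coll, doc, max) {
  coll.push(doc);
  if (coll.length > max) coll.shift();   // evict the oldest document
  return coll;
}

const coll = [];
cappedInsert(coll, { n: 1 }, 2);
cappedInsert(coll, { n: 2 }, 2);
cappedInsert(coll, { n: 3 }, 2);
console.log(coll);   // → [ { n: 2 }, { n: 3 } ]
```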


> db.myNewCollection.insertMany( [ { json1 }, {json2} ... ])
# List documents ...
> db.myNewCollection.insert( { "name" : "Thava2", "msg" : "Hello2!" })
> db.myNewCollection.find()

   { "_id" : ObjectId("6229c5d2f7b6acf25e81c6aa"), "name" : "Thava", "msg" : "Hello!" }
   { "_id" : ObjectId("6229c922f7b6acf25e81c6ab"), "name" : "Thava2", "msg" : "Hello2!" }

> db.myNewCollection.find().pretty()   # Pretty print.

# Filter conditions with operators such as $eq, $ne, $gt, $lt, $lte, $gte have this syntax...
> db.myNewCollection.find( {  name: { $eq : 'Thava' } })

> db.myNewCollection.distinct( 'name' )
  [ "Thava", "Thava2" ]
> db.myNewCollection.find({name: /ava/} )
    ... both 2 documents displayed ...
> db.myNewCollection.distinct( 'name' ).length    # distinct() returns a plain array
  2

# To update by specifying entire new set of attributes, do this:
> db.myNewCollection.update({name : 'Thava'}, {newName: 'New Me', planet: 'earth'});
# Note: The existing attribute name is gone after the update!!!

# To update specific field(s), use following form:
> db.myNewCollection.update({name : 'Thava'}, {$set: {age: 20}})

# Update does nothing if there is no match. However if you specify {upsert:1},
# then it inserts a new document when there is no match!
# Following creates a new record with { name: 'Thava', age: 30 } !!!
> db.myNewCollection.update({name : 'Thava'}, {$set: {age: 20}}, {upsert:1})

   WriteResult({
       "nMatched" : 0,
       "nUpserted" : 1,
       "nModified" : 0,
       "_id" : ObjectId("622e0cce6abf50ff9f577040")
   })

> db.myNewCollection.find()
   { "_id" : ObjectId("622a436dc78724a63a99fe95"), "name" : "Thava2", "msg" : "Second Hello!" }
   { "_id" : ObjectId("622a4ae8c78724a63a99fe96"), "name" : "Thava5", "msg" : "Hello5!" }
   { "_id" : ObjectId("622e0cce6abf50ff9f577040"), "name" : "Thava", "age" : 20 }

# Following creates new record { age: 20 } only!
> db.myNewCollection.update({name : /Nothing/}, {$set: {age: 20}}, {upsert:1})

# Delete a field from document ...
# Note that 1 in {age:1} is a dummy value which is ignored.
> db.myNewCollection.update({name : 'Thava'}, {$unset: {age:1}})

# If you want the unset to apply to all matching documents ...
> db.myNewCollection.update({name : 'Thava'},       // Filter condition
                            {$unset: {age:1}},      // Which field(s) to unset
                            {multi: true});         // Update all matching documents, not just the first

> db.myNewCollection.insert( { "name" : "Thava2", "msg" : "Duplicate Hello22!", age: 22 })
> db.myNewCollection.insert( { "name" : "Thava2", "msg" : "Duplicate Hello25!", age: 25 })
> db.myNewCollection.aggregate([{$group: {_id: "$name", total_by_name: {$sum:1} }}])

{ "_id" : "Thava2", "total_by_name" : 3 }
{ "_id" : "Thava5", "total_by_name" : 1 }

> db.myNewCollection.aggregate([{$group: {_id: "$name", ages: {$push: "$age"}} }])
{ "_id" : "Thava5", "ages" : [ ] }
{ "_id" : "Thava2", "ages" : [ 22, 25 ] }

> db.myNewCollection.aggregate([{$group: {_id: "$name", min_age: {$min: "$age"}} }])
{ "_id" : "Thava5", "min_age" : null }
{ "_id" : "Thava2", "min_age" : 22 }

# Other Aggregate operations: $first (pickup first value), $last, $avg, $max, $addToSet, etc.
# Pipeline operations: $project, $match, $group, $sort, $skip, $limit, $unwind
# The $unwind operation is a reverse group-by: it takes an array field and emits one document per array element.
#
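The $unwind behaviour can be sketched in plain JavaScript (illustration only, not driver code):

```javascript
// Plain-JavaScript illustration of the $unwind stage: one output document
// per array element, with the array field replaced by a single element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const posts = [{ title: 'Post One', tags: ['news', 'events'] }];
console.log(unwind(posts, 'tags'));
// → [ { title: 'Post One', tags: 'news' },
//     { title: 'Post One', tags: 'events' } ]
```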

# Text Search!
# A collection can only have one text search index, but that index can cover multiple fields.
# Atlas Search supports multiple text indexes per collection.
db.posts.find({
   $text: {
     $search: "\"Post O\""
     }
 })

# Remove document ...
> db.myNewCollection.remove({name: "Thava"});
> db.myNewCollection.find({});

# Drop all of collection and documents and indexes.
> db.myNewCollection.drop();

# To drop database itself ...
> db.dropDatabase()

MongoDB Cheatsheet

Show All Databases

::
> show dbs

Show Current Database

::
> db

Create Or Switch Database

::
use acme

Drop

::
db.dropDatabase()

Create Collection

db.createCollection('posts')

Show Collections

show collections

Insert Row

db.posts.insert({
  title: 'Post One',
  body: 'Body of post one',
  category: 'News',
  tags: ['news', 'events'],
  user: {
    name: 'John Doe',
    status: 'author'
  },
  date: Date()
})

Insert Multiple Rows

db.posts.insertMany([
  {
    title: 'Post Two',
    body: 'Body of post two',
    category: 'Technology',
    date: Date()
  },
  {
    title: 'Post Three',
    body: 'Body of post three',
    category: 'News',
    date: Date()
  },
  {
    title: 'Post Four',
    body: 'Body of post three',
    category: 'Entertainment',
    date: Date()
  }
])

Get All Rows

db.posts.find()
db.posts.find().pretty()

Find Rows

db.posts.find({ category: 'News' })

Sort Rows

# asc
db.posts.find().sort({ title: 1 }).pretty()
# desc
db.posts.find().sort({ title: -1 }).pretty()

Count Rows

db.posts.find().count()
db.posts.find({ category: 'news' }).count()
db.posts.find({ category: 'News' }).count()    // Filter is case sensitive
db.posts.find({ category: /^New/ }).count()    // An anchored regex can reuse an existing index on category.

Limit Rows

db.posts.find().limit(2).pretty()

Chaining

db.posts.find().limit(2).sort({ title: 1 }).pretty()

Foreach

db.posts.find().forEach(function(doc) {
  print("Blog Post: " + doc.title)
})

Find One Row

db.posts.findOne({ category: 'News' })

Find Specific Fields i.e. Project Fields

db.posts.find({ title: 'Post One' }, {
  title: 1,
  author: 1
})

Update Row

db.posts.update({ title: 'Post Two' },
{
  title: 'Post Two',
  body: 'New body for post 2',
  date: Date()
},
{
  upsert: true,      // Insert if not found
})

Update Specific Field

db.posts.update({ title: 'Post Two' },
{
  $set: {
    body: 'Body for post 2',
    category: 'Technology'
  }
})

Increment Field ($inc)

db.posts.update({ title: 'Post Two' },
{
  $inc: {
    likes: 5
  }
})

Rename Field

::
db.posts.update({ title: 'Post Two' }, { $rename: { likes: 'views' } })

Delete Row

db.posts.remove({ title: 'Post Four' })

Sub-Documents

db.posts.update({ title: 'Post One' },
{
  $set: {
    comments: [
      {
        body: 'Comment One',
        user: 'Mary Williams',
        date: Date()
      },
      {
        body: 'Comment Two',
        user: 'Harry White',
        date: Date()
      }
    ]
  }
})

Find By Element in Array ($elemMatch)

::
db.posts.find({
  comments: {
    $elemMatch: {
      user: 'Mary Williams'
    }
  }
})

Add Index

# Create an ascending index on the string field "title"
db.posts.createIndex({ title: 1})

# Create a text index ...
db.posts.createIndex({ title: 'text' }, {name: "title_idx", default_language: "english"})

# Create a Compound text index on title and body...
db.posts.createIndex({ title: 'text', body : 'text' }, {sparse: true})

# The second arg supports options such as:
    unique: true,
    sparse: true,
    expireAfterSeconds: 3600,
    hidden: true    // If true, the index is hidden from the query planner

When you create a regular index on a string field, it indexes the entire string value. This is mostly useful for single-word strings (like a username for logins) where you can match exactly.

A text index on the other hand will tokenize and stem the content of the field.
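A very rough plain-JavaScript idea of that tokenization (real text indexes also apply language-specific stemming, e.g. "running" becomes "run"; the stop-word list here is made up for illustration):

```javascript
// Crude sketch of text-index tokenization: lowercase, split on
// non-word characters, drop stop words. Real MongoDB text indexes
// additionally stem each remaining token.
const STOP_WORDS = new Set(['the', 'a', 'of', 'and', 'is']);

function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter(t => t.length > 0 && !STOP_WORDS.has(t));
}

console.log(tokenize('Body of post one'));   // → [ 'body', 'post', 'one' ]
```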

db.posts.find({
  $text: {
    $search: "body post", // Will search compound index for either word match.
  }
})

# Full text indexes are by default case and diacritics insensitive.
db.posts.find({
  $text: {
    $search: "\"Post O\""        // Quotes inside the string mean exact phrase search.
    // $caseSensitive: true      // Makes the text search case sensitive. Default is false.
    // $diacriticSensitive: true // Default is false
  }
})

// Find documents containing either word "body" or "post", but not containing "News"!
db.posts.find( { $text: { $search: "body post -News" } } )

// Sort results by search score

db.posts.find(
   { $text: { $search: "body post " } },     // Search either one word
   { score: { $meta: "textScore" } }         // Project score field. id is auto included.
).sort( { score: { $meta: "textScore" } } )  // {score:-1} means descending.
                                             // Here score to be treated as special meta type.

Greater & Less Than

::
db.posts.find({ views: { $gt: 2 } })
db.posts.find({ views: { $gte: 7 } })
db.posts.find({ views: { $lt: 7 } })
db.posts.find({ views: { $lte: 7 } })

Case Insensitive Index

When you create the index, specify a collation with strength 1 or 2 (3, the default, is case sensitive). Regex matching is slow and not collation aware, so it cannot perform an index-assisted case insensitive search.

::
db.cities.createIndex(
  { city: 1 },
  { collation: { locale: 'en',
                 strength: 2 } }   // 1 - compare base chars only; 2 - also compare diacritics;
                                   // 3 - case sensitive (default)
);

db.cities.find({ city: "new york" }).collation({ locale: 'en', strength: 2 })  // Query must specify the index's collation

An inefficient way (which may not even work every time) for case insensitive search is:

db.cities.find({ city: /^new york$/i })

The above does a full collection scan and cannot use the index.

MongoDB Advantages

  • Document Oriented Storage − Data is stored in the form of JSON style documents.
  • Index on any attribute // How ?
  • Replication and high availability
  • Auto-sharding // Can we control ?
  • Rich queries
  • Fast in-place updates // How ?
  • Professional support by MongoDB

Best Fit for ...

  • Big Data
  • User Data Management
  • Mobile and Social Infrastructure

Question: How to use index and Join ?

Best Practice Guidelines

  • Prefer Embedding by default. If it is unbounded data, use DBRef instead.
  • If the other document must frequently be accessed on its own, do not embed it.
  • In one-many relationship, to implement dbref, include array of child obj_ids in parent object.
  • In a one-to-very-many "containing" relationship, have the child point to the parent. Note: No need to keep a large array of references to children in the parent. Example: Log file entries for a host.
  • Any Arrays should not grow unbounded. Need reverse reference in child.
  • In Many-Many relationship, parent has array of child reference ids and child also has array of parent reference ids. Note: There is no need to have a separate collection to track this.
  • You can have outlier patterns -- some kind of special handling only for some records. For example, Follow one-many design pattern by default and when the total number of children exceeds some threshold, have a flag like { is_too_many_children:1 } and only for those records do not include the entire array of references in parent record.
  • Explain your cursors. Profile slow queries.
  • Denormalise data as needed.
  • Do Joins while Write, not on read
  • Optimize your schema for most frequent use cases.
  • Do complex aggregation in the schema ???

What is MongoDB Cluster ?

MongoDB can run as a single instance, but it is typically run as a cluster.

In MongoDB, clusters can refer to two different architectures:

  • a replica set, or
  • a sharded cluster

A replica set involves a single primary node (used for writes) and one or more secondary nodes (usable for reads).

Sharded Cluster involves mongos process as the router which uses external configuration servers to understand the cluster and routes the query to different replica-sets (running set of mongod processes).

Replication

Usual stuff ...

  • Replica set (typically 3 nodes); One is primary others are secondaries.
  • Auto re-election during network partitioning
  • Automatic failover and recovery
  • Replication on write is asynchronous ???
  • Note that the primary role is fixed for that server and single mongod process does not act as primary for some data (collection) and secondary for other data. Such replication sets exist in other architectures where writes are balanced across replication set. However for mongodb, the replication set means one fixed primary.
  • Replication is mainly for data durability and not for performance (though useful to speed up reads through secondaries). Sharding is mainly for scaling and performance.

Synopsis:

$ mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0 # Invoke Primary Node
$ mongo
 > rs.initiate() 
 > rs.add("mongod1.net:27017");    # Add another secondary member
 > db.isMaster()                   # Is this DB instance master or secondary ?

Sharding

Sharding is another, higher-level artifact on top of replication. In a sharded cluster there are two or more shards, and each shard is a replica set.

Other usual stuff ...

  • The entry point to the cluster is the mongos router (query router). Client requests come to this mongos instance.
  • The (typically 3) config servers provide metadata so that the router can target the right shard nodes.
  • Each shard is a replica set. (passive data instances)
  • Sharding is about fragmenting the data. Replication is to add redundancy.
Application ----- mongos (sharding router) ---1:N--- mongod (single server or replica set)

A replica set has a primary (master) and read-only secondaries.

       Replica Primary (mongod) ===1:3====== Read-Only Secondaries (mongod)
  • If you want to scale writes, you must shard.
  • Using only replica-set, you can only achieve better read throughput if you read from slaves.
  • What is config server ... etc

Config Servers

  • Config servers store the metadata for a sharded cluster.
  • Start mongos with the config servers specified either in a configuration file or via command-line parameters. The mongos query router needs the config info; it can come from a static config file, but better, it comes from config servers. Typically you need 3 config servers to handle network partitioning properly.
  • The config servers also store Authentication configuration information such as Role-Based Access Control or internal authentication settings for the cluster. MongoDB also uses the config servers to manage distributed locks. Each sharded cluster must have its own config servers.

Indexing

Using Javascript code :

const { MongoClient } = require("mongodb");

// "mongodb+srv://sample-hostname:27017/?maxPoolSize=20&w=majority";
const uri = "mongodb://localhost:27017";

// Create a new MongoClient
const client = new MongoClient(uri);

async function run() {
  try {
    // Connect the client to the server
    await client.connect();
    // Establish and verify connection
    await client.db("admin").command({ ping: 1 });
    console.log("Connected successfully to server");
  } finally {
    // Ensures that the client will close when you finish/error
    await client.close();
  }
}

run().catch(console.dir);

const database = client.db("sample_mflix");
const movies = database.collection("movies");
// Create an ascending index on the "type" and "genre" fields
// in the "movies" collection.

// *** The order of keys matters! JavaScript objects preserve the insertion order of string keys, and that order becomes the index key order. ***
const result = await movies.createIndex({ type: 1});  // Simple Index type "Ascending"
const result = await movies.createIndex({ type: 1, genre: 1 }); // Compound String Index
console.log(`Index created: ${result}`);

// To create text index ...
const result = await movies.createIndex({ type: "text"});  // Simple Text Index
// Wildcard text index will include all string fields in the index ...
const result = await movies.createIndex({ "$**": "text"});  // Wildcard Text Index
// Compound Index involving Ascending Key and Text Index Key ...
const result = await movies.createIndex({genre:1, "$**": "text"});  // Compound: ascending key + wildcard text index

// The following is an example of a query that would be covered by the index created above.

const query = { type: "movie", genre: "Drama" };
const sort = { type: 1, genre: 1 };
const projection = { _id: 0, type: 1, genre: 1 };  // Exclude _id so the index alone can cover the query
const cursor = movies
  .find(query)
  .sort(sort)
  .project(projection);

Creating Text indexes:

const database = client.db("sample_mflix");
const movies = database.collection("movies");
// Create a text index on the "fullplot" field in the
// "movies" collection.
const result = await movies.createIndex({ fullplot: "text" }, { default_language: "english" });
console.log(`Index created: ${result}`);

The following is an example of a query that would be covered by the index created above. Note that the sort is omitted because text indexes do not contain sort order:

const query = { $text: { $search: "java coffee shop" } };
const projection = { fullplot: 1 };
const cursor = movies
  .find(query)
  .project(projection);

Note: The normal field names can not start with $, to avoid conflict with operations like $text, etc.

Using mongo client:

# The _id field is automatically indexed. To index other fields, issue the following command.
> db.myNewCollection.createIndex({KEY:1});
> 
> db.myNewCollection.createIndex({"name":1});  # Create asc index on top level "name" field.
> db.myNewCollection.createIndex({"name":1, "age": -1});  # name asc, age desc index.
> 

You can create index on Array fields and sub-document fields as well:

> db.users.createIndex({"tags":1}, {name: "tags_idx"})    // tags is an array field.

> db.users.find({tags:"cricket"})
> db.users.find({tags:"cricket"}).explain()

// Indexing Sub-Document Fields
> db.users.createIndex({"address.city":1,"address.state":1,"address.pincode":1})
> db.users.find({"address.city":"Los Angeles"})
> db.users.find({"address.city":"Los Angeles","address.state":"California"})
> db.users.find({"address.city":"Los Angeles","address.state":"California",
   "address.pincode":"123"})

> db.users.getIndexes()
> db.users.dropIndex({"tags":1})       // or by name: db.users.dropIndex("tags_idx")

Compound and multi-key Indexes

Index Limitations

Indexes can't be used (or are used inefficiently) in queries which use:

  • Non-anchored regular expressions, or negation operators like $nin, $not, etc.
  • Arithmetic operators like $mod, etc.
  • $where clause
  • A collection cannot have more than 64 indexes.
  • The length of the index name cannot be longer than 125 characters.
  • A compound index can have maximum 31 fields indexed.

ObjectId

An ObjectId is a 12-byte BSON type having the following structure:

  • The first 4 bytes represent the seconds since the unix epoch
  • The next 5 bytes are a random value (unique per machine and process)
  • The last 3 bytes are an incrementing counter, initialized to a random value

Note: BSON is an extension of JSON with additional (binary) data types.

To generate a new ObjectId use the following code:

// Generate new object Id
> newObjectId = ObjectId()
   ObjectId("5349b4ddd2781d08c09890f3")

// You can also provide a 12-byte id −
> myObjectId = ObjectId("5349b4ddd2781d08c09890f4")

> ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
   ISODate("2014-04-12T21:49:17Z")
> newObjectId.str
   5349b4ddd2781d08c09890f3
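Since the first 4 bytes are the creation time, the timestamp can be recovered from the hex string alone. A plain-JavaScript equivalent of getTimestamp() (illustration, not the driver's implementation):

```javascript
// The first 8 hex characters (4 bytes) of an ObjectId are the seconds
// since the Unix epoch; decoding them reproduces getTimestamp().
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("5349b4ddd2781d08c09890f4").toISOString());
// → 2014-04-12T21:49:17.000Z
```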

Schema Validation

By default, collections can hold heterogeneous documents: each document can have different fields, and the same field can hold different data types across documents. From version 3.2, you can enforce schema validation. (May benefit from some sort of controlled schema rules.)

Synopsis

db.createCollection("students", {
  validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "name", "year", "major", "address" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            major: {
               enum: [ "Math", "English", "Computer Science", "History", null ],
               description: "can only be one of the enum values and is required"
            }, ...
         }
      }
   }
})

See Also:

  • db.createCollection() with validator option
  • Validation for an existing collection: collMod command with validator option.

Transactions

Writing to a single document is always atomic. However, writes spanning multiple documents need the transactions API to be atomic.

// For a replica set, include the replica set name and a seedlist of the members in the URI string; e.g.
// const uri = 'mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017/?replicaSet=myRepl'
// For a sharded cluster, connect to the mongos instances; e.g.
// const uri = 'mongodb://mongos0.example.com:27017,mongos1.example.com:27017/'

const client = new MongoClient(uri);
await client.connect();

// Prereq: Create collections.

await client
  .db('mydb1')
  .collection('foo')
  .insertOne({ abc: 0 }, { writeConcern: { w: 'majority' } });

await client
  .db('mydb2')
  .collection('bar')
  .insertOne({ xyz: 0 }, { writeConcern: { w: 'majority' } });

// Step 1: Start a Client Session
const session = client.startSession();

// Step 2: Optional. Define options to use for the transaction
const transactionOptions = {
  readPreference: 'primary',
  readConcern: { level: 'local' },
  writeConcern: { w: 'majority' }
};

// Step 3: Use withTransaction to start a transaction, execute the callback, and commit (or abort on error)
// Note: The callback for withTransaction MUST be async and/or return a Promise.
try {
  await session.withTransaction(async () => {
    const coll1 = client.db('mydb1').collection('foo');
    const coll2 = client.db('mydb2').collection('bar');

    // Important:: You must pass the session to the operations

    await coll1.insertOne({ abc: 1 }, { session });
    await coll2.insertOne({ xyz: 999 }, { session });
  }, transactionOptions);
} finally {
  await session.endSession();
  await client.close();
}

Aggregation Pipeline

  • There are aggregation stages and operators, aka expressions.

  • Aggregations stages examples:

    • $match
    • $group
    • $project
    • $count (equivalent to a $group stage with a $sum expression)
    • $accumulator
    • $function
  • For an unsharded collection, you can use helper methods like distinct directly::

    db.coll.distinct("x")

  • For a sharded collection, use the aggregation pipeline with the $group stage instead::

    db.coll.aggregate([
      { $group: { _id: null, distinctValues: { $addToSet: "$x" } } },
      { $project: { _id: 0 } }
    ])

  • The pipeline returns a cursor to a document:: { "distinctValues" : [ 2, 3, 1 ] }

  • The aggregation javascript object is of class AggregationCursor:

    cursor = db.coll.aggregate([]);
    console.dir(cursor)          // Can inspect the object
    typeof(cursor)               // 'object'
    cursor.constructor           // [class AggregationCursor extends AbstractCursor]
    cursor.constructor.name      // 'AggregationCursor'
    
  • To get results from aggregate cursor, do one of the following:

    - cursor.forEach( (doc) => console.log(doc), (err) => console.log('Error: ', err ));
    - let results = cursor.toArray()
    - var myCursor = collection.aggregate(pipeline);
      while (myCursor.hasNext()) { print(tojson(myCursor.next())); }
    
  • Map reduce is deprecated. Use $accumulator and $function instead. Note that Atlas free cluster does not have security.javascriptEnabled parameter enabled. This is server side configuration and can not be changed through any API calls.

  • Conversion operators: $convert, $toBool, $toDate, $dateFromString, ... Example:

    { $convert:
       {
          input: "$price",
          to: "decimal",
          onError: "Error",
          onNull: Decimal128("0")
       } 
    }
    
  • Write output to new collection: { $out: "<output-collection>" } Also: { $out: { db: "<output-db>", coll: "<output-collection>" } }

Change Streams

const collection = db.collection('inventory');
const changeStream = collection.watch();
changeStream.on('change', next => {
  // process next document
});

// Or use iterator style ...
const changeStreamIterator = collection.watch();
const next = await changeStreamIterator.next();

Map Reduce

> db.collection.mapReduce(
   function() {emit(key,value);},  //map function
   function(key,values) {return reduceFunction}, {   //reduce function
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

Backup and Restore

$ mongodump   # Backs up all databases into the ./dump/ directory by default.
$ mongodump --host <hostname> --port <port> --out <backup_dir> --db <db_name> --collection <collection>

$ mongorestore

Monitoring

::
$ mongostat
$ mongotop

Modeling Relationships

  • Could be Embedded or Referenced

Consider one person having multiple addresses:

Person <---1:N---- Address

::
> db.myNewCollection.insert( { "name" : "Thava10",
     "address": [ { line1: "Bharathi Street",  pin: 627001 },
                  { line1: "Bharathi2 Street", pin: 627002 } ] });

> db.myNewCollection.find({"name": "Thava10"});

# The second arg in find selects a subset of fields; "name" does not appear in the result.
> db.myNewCollection.find({"name": "Thava10"}, {"address":1});
  { "_id" : ObjectId("622adea7ad8473e2f3e91c9d"),   // "name" field missing in output
    "address" : [ ... ] }

Embedded

Embed array of addresses in Person record like below:

{
 "_id":ObjectId("52ffc33cd85242f436000001"),
 "name": "John Smith",
 "address": [ {
       "line1": "22 A, Indiana Apt",
       "pin": 123456
    }, ...
 ]
} 

Referenced

We can use either:

  • Manual references (storing the document id of another document in current document) OR
  • Use MongoDB DBRefs that can store references across different collections.

Manual References

{
   "_id":ObjectId("52ffc33cd85242f436000001"),
   "name": "John Smith",
   "address_ids": [
         ObjectId("..."),
         ObjectId("...")
   ]
} 

> db.myNewCollection.update({name : 'Thava10'},
                            {$set: {address_ids: [100, 200, 300]}},
                            {multi: true});
> let result = db.myNewCollection.findOne({"name": /Thava10/},{"address_ids":1})

> db.myAddress.insertMany([{_id: 100, line1 : 'Addr100 Line1', pin: 100001},
                           {_id: 200, line1 : 'Addr200 Line1', pin: 200001},
                           {_id: 300, line1 : 'Addr300 Line1', pin: 300001}
                           ]);
  { "acknowledged" : true, "insertedIds" : [ 100, 200, 300 ] }

> let addresses = db.myAddress.find({"_id":{"$in":result["address_ids"]}})
    { "_id" : 100, "line1" : "Addr100 Line1", "pin" : 100001 }
    { "_id" : 200, "line1" : "Addr200 Line1", "pin" : 200001 }
    { "_id" : 300, "line1" : "Addr300 Line1", "pin" : 300001 }

Using DBRefs

We use DBRefs when the references may point to different collections. e.g. an address may point to entries in a home_address collection or an office_address collection. The referenced collection name therefore cannot be static; it is stored as a dynamic value alongside the reference.

There are three fields in DBRefs:

* $ref − This field specifies the collection of the referenced document
* $id − This field specifies the _id field of the referenced document
* $db − This optional field points to the database of the referenced document

Consider a sample user document having DBRef field address as shown in the code snippet:

{
   "_id":ObjectId("53402597d852426020000002"),
   "address": {
   "$ref": "address_home",
   "$id": ObjectId("534009e4d852427820000002"),
   "$db": "tutorialspoint"},
   "contact": "987654321",
   "dob": "01-01-1991",
   "name": "Tom Benzamin"
}

To look up the references:

> let user = db.users.findOne({"name":"Tom Benzamin"})
> let dbRef = user.address
> db[dbRef.$ref].findOne({"_id":(dbRef.$id)})

Note: $ref, $id and $db are not special keywords in MongoDB; they are a convention used to implement the DBRef concept.

Covered Queries

Covered queries can look up data from the index alone, without even reading the collection data on disk.

  • The indexes are in RAM, so this is fast.
  • Compound indexes are typically used for these.
  • You can exclude the _id field from the result by specifying { _id: 0 } in the projection. This may allow an index to serve as a covering index for the query.

Example:

> db.users.createIndex({gender:1,user_name:1})
> db.users.find({gender:"M"},{user_name:1,_id:0})   // Covered query
> db.users.find({gender:"M"},{user_name:1})         // Not covered: _id is projected by default but is not in the index.

Explain Query

> db.users.createIndex({gender:1,user_name:1})
> db.users.find({gender:"M"},{user_name:1,_id:0}).explain()

{
   "cursor" : "BtreeCursor gender_1_user_name_1",  // For TableScan, it is like "BasicCursor *"

   "isMultiKey" : false,
   "n" : 1,                     // Total docs matched!
   "nscannedObjects" : 0,       // Total docs really scanned.
   "nscanned" : 1,              // Total docs OR index looked up.
   "nscannedObjectsAllPlans" : 0,
   "nscannedAllPlans" : 1, "scanAndOrder" : false,

   "indexOnly" : true,      // <===== Note this.

   "nYields" : 0, "nChunkSkips" : 0, "millis" : 0,
   "indexBounds" : {
      "gender" : [ [ "M", "M" ] ],
      "user_name" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ]
   }
}

Query optimizer Hints

> db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1})
> db.users.find({gender:"M"},{user_name:1,_id:0}).hint({gender:1,user_name:1}).explain()

Also See: https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/ (discusses, among other things, what to do when selectivity is not at least 90%).

MongoDB Materialized Views

  • https://docs.mongodb.com/manual/core/materialized-views/

  • https://towardsdatascience.com/on-demand-materialized-views-a-scalable-solution-for-graphs-analysis-or-machine-learning-w-d3816af28f1

  • Views:

    db.createView(
      "<viewName>",
      "<source>",
      [<pipeline>],
      { "collation" : { <collation> } }
    )

  • Technically, a materialized view is a collection, usually created by a $merge stage at the end of a pipeline.

  • On-Demand Materialized View:

    updateMonthlySales = function(startDate) {
    db.bakesales.aggregate( [
       { $match: { date: { $gte: startDate } } },
       { $group: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
                   sales_quantity: { $sum: "$quantity" },
                   sales_amount: { $sum: "$amount" } } },
       { $merge: { into: "monthlybakesales", whenMatched: "replace" } }
     ] );
    };
    
    updateMonthlySales(new ISODate("1970-01-01"));
    
    After the initial run, the monthlybakesales contains the following documents;
      i.e. db.monthlybakesales.find().sort( { _id: 1 } ) returns the following:
    
       { "_id" : "2018-12", "sales_quantity" : 41, "sales_amount" : NumberDecimal("506") }
       { "_id" : "2019-01", "sales_quantity" : 86, "sales_amount" : NumberDecimal("896") }
    
       db.bakesales.insertMany( [
          { date: new ISODate("2019-01-28"), item: "Cake - Chocolate", quantity: 3, amount: new NumberDecimal("90") },
          ...
          { date: new ISODate("2019-02-03"), item: "Cake - Red Velvet", quantity: 5, amount: new NumberDecimal("100") }
       ] )
    
       updateMonthlySales(new ISODate("2019-01-01"));
       // Note Jan month aggregation updated (replaced) and Feb added.
    
       { "_id" : "2018-12", "sales_quantity" : 41, "sales_amount" : NumberDecimal("506") }
       { "_id" : "2019-01", "sales_quantity" : 102, "sales_amount" : NumberDecimal("1142") }
       { "_id" : "2019-02", "sales_quantity" : 15, "sales_amount" : NumberDecimal("284") }
    
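To see what the $group stage above computes, here is a plain-JavaScript simulation of the same monthly rollup (illustration only; it runs in-memory rather than server-side, and amounts are plain numbers rather than NumberDecimal):

```javascript
// Simulates: { $group: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
//                        sales_quantity: { $sum: "$quantity" },
//                        sales_amount:   { $sum: "$amount" } } }
function groupMonthlySales(sales) {
  const byMonth = {};
  for (const s of sales) {
    const key = s.date.toISOString().slice(0, 7);  // "%Y-%m" bucket, e.g. "2019-01"
    if (!byMonth[key]) byMonth[key] = { _id: key, sales_quantity: 0, sales_amount: 0 };
    byMonth[key].sales_quantity += s.quantity;     // $sum: "$quantity"
    byMonth[key].sales_amount   += s.amount;       // $sum: "$amount"
  }
  return Object.values(byMonth).sort((a, b) => a._id.localeCompare(b._id));
}

const sales = [
  { date: new Date("2018-12-02"), quantity: 41, amount: 506 },
  { date: new Date("2019-01-10"), quantity: 86, amount: 896 },
  { date: new Date("2019-01-28"), quantity: 3,  amount: 90 },
];
console.log(groupMonthlySales(sales));
// → [ { _id: '2018-12', sales_quantity: 41, sales_amount: 506 },
//     { _id: '2019-01', sales_quantity: 89, sales_amount: 986 } ]
```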

The $merge stage:

  • Can output to a collection in the same or different database.
  • Creates a new collection if the output collection does not already exist.
  • Can incorporate results (insert new documents, merge documents, replace documents, keep existing documents, fail the operation, process documents with a custom update pipeline) into an existing collection.
  • Can output to a sharded collection. Input collection can also be sharded.
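The whenMatched modes listed above can be sketched as a plain-JavaScript simulation over an in-memory map keyed by _id (an illustration of the semantics, not the server implementation; the custom-update-pipeline mode is omitted):

```javascript
// Simulates $merge's matching behavior: results land in `target`,
// and whenMatched decides what happens when an _id already exists.
function mergeInto(target, results, whenMatched) {
  for (const doc of results) {
    const existing = target.get(doc._id);
    if (existing === undefined) {
      target.set(doc._id, doc);                        // whenNotMatched: "insert" (default)
    } else if (whenMatched === "replace") {
      target.set(doc._id, doc);                        // replace the matched document
    } else if (whenMatched === "keepExisting") {
      /* leave the existing document untouched */
    } else if (whenMatched === "merge") {
      target.set(doc._id, { ...existing, ...doc });    // shallow field merge (default mode)
    } else if (whenMatched === "fail") {
      throw new Error("duplicate _id: " + doc._id);    // fail the operation
    }
  }
  return target;
}

const coll = new Map([["2019-01", { _id: "2019-01", sales_quantity: 86 }]]);
mergeInto(coll, [
  { _id: "2019-01", sales_quantity: 102 },   // matched: replaced
  { _id: "2019-02", sales_quantity: 15 },    // not matched: inserted
], "replace");
```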

$match vs $expr

In the simplest case, with a single condition, you can use plain query syntax:

db.bakesales.aggregate( [ { $match: { date: { $gte: startDate } } } ] )
db.inventory.find( { status: "A", qty: { $lt: 30 } } )   // Implicit And Condition

db.inventory.find( {    // Combination of And and OR
  status: "A",
  $or: [ { qty: { $lt: 30 } }, { item: /^p/ } ]
} )

For complex conditions, such as comparing one field against another field or a computed expression, you need $expr:

db.testexample.find({$expr: {$eq: ["$dateFieldYear", "$$dateFieldYear"]}})  // $expr as a query operator
db.testexample.aggregate([{$match: {$expr: {$eq: ["$dateFieldYear", {"$year": "$dateField"}]}}}])

The $match query syntax is identical to the read operation query syntax; i.e. $match does not accept raw aggregation expressions. To include aggregation expression in $match, use a $expr query expression.
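The classic use of $expr is comparing two fields of the same document, something plain query operators cannot express (they only compare a field against a constant). A plain-JavaScript sketch of what { $expr: { $gt: ["$spent", "$budget"] } } evaluates per document, over hypothetical monthlyBudget data:

```javascript
// Each document is checked against an expression over its own fields,
// like $expr does server-side.
const monthlyBudget = [
  { _id: 1, category: "food",    budget: 400, spent: 450 },
  { _id: 2, category: "drinks",  budget: 100, spent: 150 },
  { _id: 3, category: "clothes", budget: 100, spent: 50 },
];

// Equivalent of: db.monthlyBudget.find({ $expr: { $gt: ["$spent", "$budget"] } })
const overBudget = monthlyBudget.filter(d => d.spent > d.budget);
console.log(overBudget.map(d => d._id)); // → [ 1, 2 ]
```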

"$<field>" is equivalent to "$$CURRENT.<field>" where the CURRENT is a system variable that defaults to the root of the current object, unless stated otherwise in specific stages. $$CURRENT and $$ROOT is same by default.

An operator acts as an update operator when it appears in an update operation:

db.scores.updateOne( { _id: 1 }, { $min: { lowScore: 150 } } )

Note that here $min is not an aggregation operator but an update operator (like $set): it sets lowScore to min(lowScore, 150), i.e. it writes 150 only if 150 is less than the current value.
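The conditional write that $min performs can be sketched in plain JavaScript (a simulation of the documented semantics, not a driver call):

```javascript
// Simulates the $min update operator: the field is written only if the
// candidate value is lower than the current value (or the field is missing,
// in which case $min sets it).
function applyMinUpdate(doc, field, candidate) {
  if (!(field in doc) || candidate < doc[field]) {
    doc[field] = candidate;
  }
  return doc;
}

const score = { _id: 1, highScore: 800, lowScore: 200 };
applyMinUpdate(score, "lowScore", 150);  // 150 < 200, so lowScore becomes 150
applyMinUpdate(score, "lowScore", 175);  // 175 > 150, so no change
console.log(score.lowScore); // → 150
```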

A pipeline stage is different from an operator expression. E.g. $match is a pipeline stage, $and is a query operator, and $sum is an aggregation expression. But note that $count is not an aggregation expression but a pipeline stage; i.e. { $count: 'myTotal' } at the top level of a pipeline creates a new field holding the total document count.

Examples of operators that exist as aggregation operators but not as query operators:

$first, $addToSet, $concat, etc.

Examples where the same operator exists both as a query operator and as an aggregation operator:

$and (query Operator): db.inventory.find( { $and: [ { price: { $ne: 1.99 } },
                                                { price: { $exists: true } } ] } )
$and (aggregation) : 
       db.inventory.aggregate(
         [
           {
             $project:
                {
                  item: 1,
                  result: { $and: [ { $gt: [ "$qty", 100 ] }, { $lt: [ "$qty", 250 ] } ] }
                }
           }
         ]
      )

Test Data

  • Restore some test data: git@github.com:SouthbankSoftware/dbkoda-data.git
  • Has sakila test database.

BI Connector

BI Connector is used to translate SQL syntax into MongoDB queries. To see how it maps JSON arrays, subdocuments, and arrays of subdocuments to new tables, see this link: https://www.mongodb.com/blog/post/bi-connector-mapping-document-structures-to-relational-schema-part-1

Mapping Synopsis

Consider a collection called bands which contains entries like:

{
   "_id": ObjectId("5bfabde76f280102ddf27969"),
   "band": "Slayer",
   "formation": { "year": ISODate("1982-01-01T00:00:00Z"), "city": "Los Angeles" },
   "popular_albums": [ "Show No Mercy!", "Seasons in the Abyss" ],
   "members": [ { "name": "Tom Araya", "instrument": "Bass/Vocals" }, ... ]
}

mysql> show tables;
 +-----------------------------------------+
 | Tables_in_sql                           |
 +-----------------------------------------+
 | bands                                   |
 | bands_members                           |
 | bands_popular_albums                    |
 +-----------------------------------------+
Note: Both arrays became new tables. Subdocuments are just flattened.
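A hypothetical plain-JavaScript sketch of this flattening (column and child-table names are assumptions modeled on the example above; the real connector derives its relational schema automatically):

```javascript
// Flattens one bands document into relational-style rows: the subdocument
// becomes prefixed columns on the parent row, and each array becomes a
// child table with the parent _id as a foreign key.
function flattenBand(doc) {
  const bands = {
    _id: doc._id,
    band: doc.band,
    formation_year: doc.formation.year,   // subdocument flattened with prefix
    formation_city: doc.formation.city,
  };
  const bands_popular_albums = doc.popular_albums.map((album, idx) => ({
    _id: doc._id, popular_albums_idx: idx, popular_albums: album,
  }));
  const bands_members = doc.members.map((m, idx) => ({
    _id: doc._id, members_idx: idx, name: m.name, instrument: m.instrument,
  }));
  return { bands, bands_members, bands_popular_albums };
}

const tables = flattenBand({
  _id: "5bfabde76f280102ddf27969",
  band: "Slayer",
  formation: { year: "1982-01-01", city: "Los Angeles" },
  popular_albums: ["Show No Mercy!", "Seasons in the Abyss"],
  members: [{ name: "Tom Araya", instrument: "Bass/Vocals" }],
});
console.log(Object.keys(tables)); // → the three "tables" shown above
```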

MongoDB Atlas Search vs MongoDB Index

  • MongoDB Atlas is the cloud offering of MongoDB

  • Atlas Search is an enterprise feature

  • Regular MongoDB supports only one text index per collection, but Atlas Search supports multiple search indexes.

  • You can search by phrase or by words. The multi-word condition is neither a strict "AND" nor an "OR": all terms are taken into account and the output is sorted by relevance.

  • It does auto stemming of the keywords.

  • Note that $search is a pipeline stage (enterprise Atlas feature) and $text is a query operator (MongoDB indexing feature).

  • For partial word matches you can use a wildcard query:

    db.collection.aggregate([
      { $search: {
          wildcard: {
            query: '*tex*',
            path: { 'wildcard': '*' },
            allowAnalyzedField: true } } }
    ]);
    
  • Phrase search involves quoting inside the string, e.g. '"My String"'

  • When you combine phrase search and word search, you should use compound operator:

    db.collection.aggregate([
     {
         $search: {
             'compound': {
                 'should': [
                     {
                         'phrase': {
                             'path': '<field_path>',
                             'query': [<first_phrase>, <second_phrase>]
                         }
                     },
                     {
                         'text': {
                             'path': '<field_path>',
                             'query': [<first_word>, <second_word>]
                         }
                     }
                 ]
             }
         }
     }
    ])
    

Atlas Search is MongoDB's full-text search feature built on Apache Lucene. It allows you to perform advanced text search operations on your MongoDB data without needing to use a separate search engine. Key features of Atlas Search:

  • Built directly into MongoDB Atlas (MongoDB's cloud database service)
  • Supports complex text queries, fuzzy matching, and faceted search
  • Provides language-aware text analysis with analyzers for many languages
  • Enables relevance-based scoring and ranking

Compared to MongoDB's traditional text indexes:

Functionality:

  • Atlas Search offers more advanced search capabilities like fuzzy matching, highlighting, autocomplete, and synonyms
  • Traditional text indexes provide basic text search but lack advanced linguistic features

Performance:

  • Atlas Search is generally faster for complex text search queries
  • Traditional text indexes can be sufficient for simple text matching but don't scale as well for complex search patterns

Setup and maintenance:

  • Atlas Search requires using MongoDB Atlas (cloud)
  • Traditional text indexes can be used in any MongoDB deployment (including self-hosted)

Query syntax:

  • Atlas Search uses a more powerful but complex query syntax with aggregation pipeline stages
  • Text indexes use simpler $text query operators

Language support:

  • Atlas Search has more robust language analysis capabilities
  • Text indexes support multiple languages but with more limited analysis

Overall, Atlas Search is the better choice when you need sophisticated search functionality, like e-commerce product search or content management systems, while traditional text indexes might be sufficient for simpler use cases or when you're not using Atlas.