Concept of Compound Index in MongoDB for interview

Compound index means when we are creating an index on more than one keys. The best way to check how compound index affect the searching and sorting time is directly test it in the Mongo shell. Let’s create one collection of student with keys ‘student_id’,’student_name’ and ‘age’:-

for (i=0;i<=1000000;i++) 
{ db.student.inserOne( 
   {"student_id" : "id"+i, 
     "student_name" : "name"+i, 
     "age" : Math.floor(Math.random()*120)});
}

Now we will try to sort this output and will check what will happen in the execution stat. We will sort it by (‘student_id’, ‘age’):-

>> db.student.find().sort({"student_id" : 1, "age" : 1});
Error: error: {
        "ok" : 0,
        "errmsg" : "Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
        "code" : 96,
        "codeName" : "OperationFailed"
}

Add expected, here sorting is happening in memory before showing the result, and it has thrown the error because it is a huge collection and the result exceeds 32 MB of size.

If we check the explain plan of this above execution, then we will see the below result:-

>> db.student.find().sort({"student_id" : 1, "age" : 1}).explain("executionStats");



{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "newdb.student",
                "indexFilterSet" : false,
                "parsedQuery" : {

                },
                "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "student_id" : 1,
                                "age" : 1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "direction" : "forward"
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : false,
                "errorMessage" : "Exec error resulting in state FAILURE :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
                "errorCode" : 96,
                "nReturned" : 0,
                "executionTimeMillis" : 485,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 348213,
                "executionStages" : {
                        "stage" : "SORT",
                        "nReturned" : 0,
                        "executionTimeMillisEstimate" : 6,
                        "works" : 348216,
                        "advanced" : 0,
                        "needTime" : 348215,
                        "needYield" : 0,
                        "saveState" : 2720,
                        "restoreState" : 2720,
                        "isEOF" : 0,
                        "sortPattern" : {
                                "student_id" : 1,
                                "age" : 1
                        },
                        "memUsage" : 33554441,
                        "memLimit" : 33554432,
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "nReturned" : 348213,
                                "executionTimeMillisEstimate" : 6,
                                "works" : 348215,
                                "advanced" : 348213,
                                "needTime" : 2,
                                "needYield" : 0,
                                "saveState" : 2720,
                                "restoreState" : 2720,
                                "isEOF" : 0,
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "nReturned" : 348213,
                                        "executionTimeMillisEstimate" : 0,
                                        "works" : 348214,
                                        "advanced" : 348213,
                                        "needTime" : 1,
                                        "needYield" : 0,
                                        "saveState" : 2720,
                                        "restoreState" : 2720,
                                        "isEOF" : 0,
                                        "direction" : "forward",
                                        "docsExamined" : 348213
                                }
                        }
                }
        },
        "serverInfo" : {
                "host" : "DESKTOP-NH8I41E",
                "port" : 27017,
                "version" : "4.2.5",
                "gitVersion" : "2261279b51ea13df08ae708ff278f0679c59dc32"
        },
        "ok" : 1
}

Here we will check the “winningPlan” section of this above code. where the stage is “COLSCAN”, which is really a bad for long table scan.

To overcome this error at first we will create compound index in the same order in which we are trying to sort the result:-

db.student.createIndex({"student_id":1,"age":1});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

Now we will run the sorting query and will check the result of the execution stat:-

db.student.find().sort({"student_id" : 1, "age" : 1}).explain("executionStats");
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "newdb.student",
                "indexFilterSet" : false,
                "parsedQuery" : {

                },
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "student_id" : 1,
                                        "age" : 1
                                },
                                "indexName" : "student_id_1_age_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "student_id" : [ ],
                                        "age" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "student_id" : [
                                                "[MinKey, MaxKey]"
                                        ],
                                        "age" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1000001,
                "executionTimeMillis" : 1153,
                "totalKeysExamined" : 1000001,
                "totalDocsExamined" : 1000001,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 1000001,
                        "executionTimeMillisEstimate" : 13,
                        "works" : 1000002,
                        "advanced" : 1000001,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 7812,
                        "restoreState" : 7812,
                        "isEOF" : 1,
                        "docsExamined" : 1000001,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 1000001,
                                "executionTimeMillisEstimate" : 5,
                                "works" : 1000002,
                                "advanced" : 1000001,
                                "needTime" : 0,
                                "needYield" : 0,
                                "saveState" : 7812,
                                "restoreState" : 7812,
                                "isEOF" : 1,
                                "keyPattern" : {
                                        "student_id" : 1,
                                        "age" : 1
                                },
                                "indexName" : "student_id_1_age_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "student_id" : [ ],
                                        "age" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "student_id" : [
                                                "[MinKey, MaxKey]"
                                        ],
                                        "age" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                },
                                "keysExamined" : 1000001,
                                "seeks" : 1,
                                "dupsTested" : 0,
                                "dupsDropped" : 0
                        }
                }
        },
        "serverInfo" : {
                "host" : "DESKTOP-NH8I41E",
                "port" : 27017,
                "version" : "4.2.5",
                "gitVersion" : "2261279b51ea13df08ae708ff278f0679c59dc32"
        },
        "ok" : 1
}

In the above case if you look at the “winningplan” setion , you can find the index is used here and here sorting is not made in memory. So it is sorted successfully. Now here compound index is used in a specific order where “student_id” present in the first position.

db.student.createIndex({"student_id":1,"age":1});

Now, what do you think, if I run the below code, then what will happen?

db.student.find().sort({"age" : 1}) 
or
db.student.find().sort({"age" : -1})
ERROR:---
Error: error: {
        "ok" : 0,
        "errmsg" : "Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
        "code" : 96,
        "codeName" : "OperationFailed"
}

Yes, it will through an error because here the 1st key will be considered for solo sorting, and for others positional keys in memory sorting will take place for compound indexes. The error is same as the 32 MB space limitation and if the size is less than 32 MB then it will show us the output after the in memory sorting but it will show very a poor performance.

The below code will not through an error although sorting is done based on the “age” key, because the output of the query is less than the 32 MB, so here the in memory sorting will not create any problem. because here we are narrowing down the range of age.

db.student.find({"age" : {$gte: 15 , $lte:16}}).sort({"age" : 1})

If we look the execution stat , then we will see the below result:-

db.student.find({"age" : {$gte: 15 , $lte:16}}).sort({"age" : 1}).explain("executionStats")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "newdb.student",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "age" : {
                                                "$lte" : 16
                                        }
                                },
                                {
                                        "age" : {
                                                "$gte" : 15
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "age" : 1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "filter" : {
                                                "$and" : [
                                                        {
                                                                "age" : {
                                                                        "$lte" : 16
                                                                }
                                                        },
                                                        {
                                                                "age" : {
                                                                        "$gte" : 15
                                                                }
                                                        }
                                                ]
                                        },
                                        "direction" : "forward"
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 16607,
                "executionTimeMillis" : 569,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 1000001,
                "executionStages" : {
                        "stage" : "SORT",
                        "nReturned" : 16607,
                        "executionTimeMillisEstimate" : 7,
                        "works" : 1016612,
                        "advanced" : 16607,
                        "needTime" : 1000004,
                        "needYield" : 0,
                        "saveState" : 7942,
                        "restoreState" : 7942,
                        "isEOF" : 1,
                        "sortPattern" : {
                                "age" : 1
                        },
                        "memUsage" : 1607223,
                        "memLimit" : 33554432,
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "nReturned" : 16607,
                                "executionTimeMillisEstimate" : 2,
                                "works" : 1000004,
                                "advanced" : 16607,
                                "needTime" : 983396,
                                "needYield" : 0,
                                "saveState" : 7942,
                                "restoreState" : 7942,
                                "isEOF" : 1,
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "filter" : {
                                                "$and" : [
                                                        {
                                                                "age" : {
                                                                        "$lte" : 16
                                                                }
                                                        },
                                                        {
                                                                "age" : {
                                                                        "$gte" : 15
                                                                }
                                                        }
                                                ]
                                        },
                                        "nReturned" : 16607,
                                        "executionTimeMillisEstimate" : 2,
                                        "works" : 1000003,
                                        "advanced" : 16607,
                                        "needTime" : 983395,
                                        "needYield" : 0,
                                        "saveState" : 7942,
                                        "restoreState" : 7942,
                                        "isEOF" : 1,
                                        "direction" : "forward",
                                        "docsExamined" : 1000001
                                }
                        }
                }
        },
        "serverInfo" : {
                "host" : "DESKTOP-NH8I41E",
                "port" : 27017,
                "version" : "4.2.5",
                "gitVersion" : "2261279b51ea13df08ae708ff278f0679c59dc32"
        },
        "ok" : 1
}

The important part is if we look the winning plan section, then we can see here no index scan is used. So it’s an inefficient query.

So, during creation of the compound index we need to have a clear idea about the orders of the keys and how will you use those keys during searching and sorting operations.

Now , we will create another index to validate our understanding. Now the order will be reversed for the keys and old index also will remain there.

> db.student.createIndex({ "age":1,"student_id":1});
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}

Now we will run the same sorting query based on the age of the student and will check the execution stats and the execution time:-

db.student.find({"age" : {$gte: 15 , $lte:16}}).sort({"age" : 1}).explain("executionStats")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "newdb.student",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "age" : {
                                                "$lte" : 16
                                        }
                                },
                                {
                                        "age" : {
                                                "$gte" : 15
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "age" : 1,
                                        "student_id" : 1
                                },
                                "indexName" : "age_1_student_id_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "age" : [ ],
                                        "student_id" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "age" : [
                                                "[15.0, 16.0]"
                                        ],
                                        "student_id" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 16607,
                "executionTimeMillis" : 30,
                "totalKeysExamined" : 16607,
                "totalDocsExamined" : 16607,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 16607,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 16608,
                        "advanced" : 16607,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 129,
                        "restoreState" : 129,
                        "isEOF" : 1,
                        "docsExamined" : 16607,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 16607,
                                "executionTimeMillisEstimate" : 0,
                                "works" : 16608,
                                "advanced" : 16607,
                                "needTime" : 0,
                                "needYield" : 0,
                                "saveState" : 129,
                                "restoreState" : 129,
                                "isEOF" : 1,
                                "keyPattern" : {
                                        "age" : 1,
                                        "student_id" : 1
                                },
                                "indexName" : "age_1_student_id_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "age" : [ ],
                                        "student_id" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "age" : [
                                                "[15.0, 16.0]"
                                        ],
                                        "student_id" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                },
                                "keysExamined" : 16607,
                                "seeks" : 1,
                                "dupsTested" : 0,
                                "dupsDropped" : 0
                        }
                }
        },
        "serverInfo" : {
                "host" : "DESKTOP-NH8I41E",
                "port" : 27017,
                "version" : "4.2.5",
                "gitVersion" : "2261279b51ea13df08ae708ff278f0679c59dc32"
        },
        "ok" : 1
}

Now compare the execution stats between the two execution stats. The above stat, is using the index under the winning plan and also the totaldocsexamined is very less, which is equal to the numbers of rows returned in the above case. and the execution time is 30ms only compare to 569 ms in the above case. So always use the index wisely and thus you can save a lot of money.