MongoDB - Hybrid Schema

I recently ran into a situation where in we have a session and each Session produces a lot of messages (sync) and each message has an associated timestamp and order. This is similar to a blog post (session) and its comments (messages).

Since the messages are unbounded (not sure how many would be produced). In my initial design - I had a separate collection for Messages. Each Session would have one corresponding MessageW (wrapper) document and each MesssageW document would have an array of messages (acutal messages).

With all the available data I had - the size of the MessageW document never went above 1MB - so we were all good. But unfortunately, we had a new use case which broke this assumption 😟. The session was running for hours and there was tons of messages getting generated and yes - we hit the 16MB ceiling 😓.

Obvious solution was to create to create a new one doucment for every new message. So, that we never hit the size limit (16MB) of a mongoDB document. But, since we are looking at unbounded message - there could huge (million message docs) number of documents.

So, we choose to go with the Hybrid Approach

In the hybrid approach, we go with our initial option - where we have a MessageW (wrapper) document - which has an array of Messages. Then we add a constrain on the document i.e. the size of the array should not be more than a given arbitary number. For every new message we increment the count. If the number count crossed the constraint - then create a new MessageW document, inc the page number and also update the Session (NumberOfMsgPages). The reader would get the NumberOfMsgPages and read backwards - until he hit page 0. Voila!! 😍

Important Items to note:

  • Session Object : will have a NumberOfMsgPages property
  • MessageW Object: would have a Page and Count property.

Operators:

Since: A code is worth a thousand words. Here is the code 😊

    public void InsertMessage(Message message, ObjectId parentId)
    {
        var builder = Builders<Message>.Filter;

        // Find MessageObject: given parentId and CurrentPageNo
        // And: Constraint: Number of messages is less than 1000. 
        var filter = builder.Eq("parentId", parentId) 
            & builder.Lt("Count", 1000)
            & builder.Eq("Page", CurrentMessagePageCount);

        // Try and Insert into existing Message Document array And Increment the Count.
        var update = Builders<Message>.Update
            .AddToSet(x => x.Messages, message)
            .Inc("Count", 1);

        // If the constraint with "Count" matches.
        // ModifiedCount would be greater than Zero
        var result = MsgCollection.UpdateOne(filter, update);

        if (result.IsAcknowledged)
        {
            if(result.ModifiedCount== 0)
            {
                // Increment the Page count
                CurrentMessagePageCount += 1;

                // Update: NumberOfMsgPages in Session Collection
                var sessionFilter = Builders<MySession>.Filter.Eq("_id", parentId);
                var updateSession = Builders<MySession>.Update
                    .Inc("NumberOfMsgPages", 1);
                SessionsCollection.UpdateOne(sessionFilter, updateSession);

                // Find MessageObject: given parentId and CurrentPageNo-which is obviously not found. 
                // So, we will use "IsUpsert=true" to add a new document.
                // And: Number of messages is less than 20. 
                var filter2 = builder.Eq("parentId", parentId)
                    & builder.Eq("Page", CurrentMessagePageCount);

                // Try: Insert into existing Message And Increment the Count
                var update2 = Builders<Message>.Update
                    .AddToSet(x => x.Messages, message)
                    .Set("Page", CurrentMessagePageCount)
                    .Inc("Count", 1);

                 MsgCollection.UpdateOne(filter2, update2, new UpdateOptions() { IsUpsert = true });
            }
        }
    }

A obvious reaction after reading the code is - Ok, whats the big deal here?

Lets take a code walk through: (think how would you achive this in your traditional SQL DB’s)

  1. Code never check if the MessageW document exits before Update.

    • How did it work: “$IsUpsert=true” is the magic word.
    • $IsUpsert=True: MongoDB would internally insert a new document - if the document doesn’t exists.
  2. Updating counter is happening inside MongoDB:

    • $inc mongoDB operator - increments a specific field - in our case count
    • Hence, Client doesn’t have to read the document to know the actual count and then increment/update.
  3. Constraint Check:

    • $lt MongoDB operator would handle the constraints.
    • Update goes through if the constraint is met.
    • As part of the result, we could check if any doc was modified
    • Cool part - the second half of the code runs only once in 1000 inserts

How did we come up with the arbitary constraint number? well, I dont think we have a thumb rule here. Since we don’t track the actual document size - you would have to look at your data and come up with the sizing.

BTW, I am new to MongoDB and learning. If there are better approaches - feel free to write back to me. Thanks.

Reference: MongoDB Applied Design Patterns, by Rick Copeland

Jacob Aloysious
Jacob Aloysious
Software Enthusiast

35yr old coder, father and spouse - my interests include Software Architecture, CI/CD, TDD, Clean Code.

Related