MongoDB - Hybrid Schema
I recently ran into a situation where in we have a session and each Session produces a lot of messages (sync) and each message has an associated timestamp and order. This is similar to a blog post (session) and its comments (messages).
Since the messages are unbounded (not sure how many would be produced). In my initial design - I had a separate collection for Messages. Each Session would have one corresponding MessageW (wrapper) document and each MesssageW document would have an array of messages (acutal messages).
With all the available data I had - the size of the MessageW document never went above 1MB - so we were all good. But unfortunately, we had a new use case which broke this assumption 😟. The session was running for hours and there was tons of messages getting generated and yes - we hit the 16MB ceiling 😓.
Obvious solution was to create to create a new one doucment for every new message. So, that we never hit the size limit (16MB) of a mongoDB document. But, since we are looking at unbounded message - there could huge (million message docs) number of documents.
So, we choose to go with the Hybrid Approach
In the hybrid approach, we go with our initial option - where we have a MessageW (wrapper) document - which has an array of Messages. Then we add a constrain on the document i.e. the size of the array should not be more than a given arbitary number. For every new message we increment the count. If the number count crossed the constraint - then create a new MessageW document, inc the page number and also update the Session (NumberOfMsgPages). The reader would get the NumberOfMsgPages and read backwards - until he hit page 0. Voila!! 😍
Important Items to note:
- Session Object : will have a NumberOfMsgPages property
- MessageW Object: would have a Page and Count property.
Operators:
Since: A code is worth a thousand words. Here is the code 😊
public void InsertMessage(Message message, ObjectId parentId)
{
var builder = Builders<Message>.Filter;
// Find MessageObject: given parentId and CurrentPageNo
// And: Constraint: Number of messages is less than 1000.
var filter = builder.Eq("parentId", parentId)
& builder.Lt("Count", 1000)
& builder.Eq("Page", CurrentMessagePageCount);
// Try and Insert into existing Message Document array And Increment the Count.
var update = Builders<Message>.Update
.AddToSet(x => x.Messages, message)
.Inc("Count", 1);
// If the constraint with "Count" matches.
// ModifiedCount would be greater than Zero
var result = MsgCollection.UpdateOne(filter, update);
if (result.IsAcknowledged)
{
if(result.ModifiedCount== 0)
{
// Increment the Page count
CurrentMessagePageCount += 1;
// Update: NumberOfMsgPages in Session Collection
var sessionFilter = Builders<MySession>.Filter.Eq("_id", parentId);
var updateSession = Builders<MySession>.Update
.Inc("NumberOfMsgPages", 1);
SessionsCollection.UpdateOne(sessionFilter, updateSession);
// Find MessageObject: given parentId and CurrentPageNo-which is obviously not found.
// So, we will use "IsUpsert=true" to add a new document.
// And: Number of messages is less than 20.
var filter2 = builder.Eq("parentId", parentId)
& builder.Eq("Page", CurrentMessagePageCount);
// Try: Insert into existing Message And Increment the Count
var update2 = Builders<Message>.Update
.AddToSet(x => x.Messages, message)
.Set("Page", CurrentMessagePageCount)
.Inc("Count", 1);
MsgCollection.UpdateOne(filter2, update2, new UpdateOptions() { IsUpsert = true });
}
}
}
A obvious reaction after reading the code is - Ok, whats the big deal here?
Lets take a code walk through: (think how would you achive this in your traditional SQL DB’s)
Code never check if the MessageW document exits before Update.
- How did it work: “$IsUpsert=true” is the magic word.
- $IsUpsert=True: MongoDB would internally insert a new document - if the document doesn’t exists.
Updating counter is happening inside MongoDB:
- $inc mongoDB operator - increments a specific field - in our case count
- Hence, Client doesn’t have to read the document to know the actual count and then increment/update.
Constraint Check:
- $lt MongoDB operator would handle the constraints.
- Update goes through if the constraint is met.
- As part of the result, we could check if any doc was modified
- Cool part - the second half of the code runs only once in 1000 inserts
How did we come up with the arbitary constraint number? well, I dont think we have a thumb rule here. Since we don’t track the actual document size - you would have to look at your data and come up with the sizing.
BTW, I am new to MongoDB and learning. If there are better approaches - feel free to write back to me. Thanks.
Reference: MongoDB Applied Design Patterns, by Rick Copeland