MongoDB Tit Bits

Jacob Aloysious

Mar 7, 2021


I am getting started with MongoDB for a new project. Here are a few quick notes, based on my reading so far:

One Key Takeaway:

Data accessed together should be stored together

WiredTiger:

WiredTiger is the default storage engine for MongoDB
It stores documents and indexes on disk
An in-memory cache holds the working set - the frequently used documents and indexes
- Default cache size: the larger of 50% of (RAM - 1 GB) or 256 MB
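
To make the cache sizing rule concrete, here is a minimal Python sketch. The helper name `default_wiredtiger_cache_bytes` is my own and not a MongoDB API. It only encodes the "larger of 50% of (RAM - 1 GB) or 256 MB" rule from the note above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Estimate the default WiredTiger cache size for a given amount of RAM.

    Hypothetical helper (not part of any MongoDB driver). It applies the rule
    "the larger of 50% of (RAM - 1 GB) or 256 MB".
    """
    half_of_ram_minus_1gb = (total_ram_bytes - GIB) // 2
    return max(half_of_ram_minus_1gb, 256 * MIB)


# Print the estimate for a few machine sizes.
for ram_gib in (1, 4, 16):
    cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM -> {cache / MIB:.0f} MiB cache")
```

On small machines the 256 MB floor wins. On larger ones the cache grows to roughly half of RAM, which is why the working set should ideally fit in that half.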

Massive Arrays:

Max document size is 16 MB
Index performance on arrays decreases as array size increases
Extended Reference pattern: duplicate only the frequently accessed fields of a related document, not all of its data
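
Here is a small sketch of what the Extended Reference pattern could look like, using plain Python dicts in place of MongoDB documents. The `customer` and `order` documents, their field names, and the `extended_reference` helper are all made up for illustration.

```python
def extended_reference(source_doc, fields):
    """Copy the _id plus a chosen subset of fields from source_doc.

    Hypothetical helper: the result is meant to be embedded in another
    document, so reads of that document do not need a join.
    """
    ref = {"_id": source_doc["_id"]}
    for name in fields:
        if name in source_doc:
            ref[name] = source_doc[name]
    return ref


# Full customer document (illustrative data only).
customer = {
    "_id": 101,
    "name": "Jacob",
    "city": "Kochi",
    "email": "jacob@example.com",
    "loyalty_points": 4200,
}

# The order duplicates only the fields it needs when it is displayed.
order = {
    "_id": 5001,
    "items": ["keyboard", "mouse"],
    "customer": extended_reference(customer, ["name", "city"]),
}
print(order["customer"])
```

The trade-off is that the duplicated fields have to be updated in two places if they change, so the pattern suits fields that change rarely.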

Index:

Each index is at least 8 KB
Indexes take up storage - one file for each collection and one file for each index (WiredTiger implementation)
Write performance suffers, as every index needs to be updated on writes
Rule of thumb: limit each collection to 50 indexes at most
Do: Add indexes to support frequent queries - improves read performance
Don’t: Create unnecessary indexes - they reduce write performance and take up space
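
One common source of unnecessary indexes is an index whose keys are a leading prefix of a compound index, since the compound index can usually serve the same queries. Below is a rough Python sketch that flags such candidates. The index names and keys are invented, and the check is deliberately conservative: it ignores unique, sparse, partial, and collation-specific indexes, which may still be needed even when they are a prefix.

```python
def is_prefix(candidate, other):
    """True if `candidate` is a strict leading prefix of `other`."""
    return len(candidate) < len(other) and other[: len(candidate)] == candidate


def possibly_redundant(indexes):
    """Return names of indexes whose keys are a prefix of another index.

    `indexes` maps an index name to its ordered list of (field, direction)
    pairs. This is a hypothetical helper, not a MongoDB API.
    """
    redundant = []
    for name, keys in indexes.items():
        others = (k for other_name, k in indexes.items() if other_name != name)
        if any(is_prefix(keys, other) for other in others):
            redundant.append(name)
    return redundant


# Illustrative index definitions for one collection.
indexes = {
    "last_name_1": [("last_name", 1)],
    "last_name_1_first_name_1": [("last_name", 1), ("first_name", 1)],
    "email_1": [("email", 1)],
}
print(possibly_redundant(indexes))  # ['last_name_1']
```

Treat the output as a list of candidates to review, not a list of indexes to drop blindly.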

Bloated Documents:

Do: Data accessed together should be stored together
Don’t: Bloat your documents with related data that is not accessed together
Data that is related does NOT necessarily need to be stored together
Remove bloat from frequently used documents - so that more of them fit in the in-memory WiredTiger cache
Data duplication is OK (depends!)
- Summary document and details document - referenced via links
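
As a sketch of the summary and details split, the snippet below breaks one large document into a small summary document and a separate details document that points back to it. The `product` data, the `summary_id` link field, and the helper name are my own assumptions for illustration.

```python
def split_summary_details(doc, summary_fields):
    """Split `doc` into a small summary document and a details document.

    Hypothetical helper: the summary keeps the original _id, and the details
    document links back to it through a `summary_id` field.
    """
    summary = {"_id": doc["_id"]}
    details = {"summary_id": doc["_id"]}
    for key, value in doc.items():
        if key == "_id":
            continue
        if key in summary_fields:
            summary[key] = value
        else:
            details[key] = value
    return summary, details


# Illustrative bloated document: the list page only needs name and price.
product = {
    "_id": 7,
    "name": "Mechanical keyboard",
    "price": 79.0,
    "long_description": "A very long description that is rarely read",
    "reviews": ["Great keys", "A bit loud"],
}

summary, details = split_summary_details(product, ["name", "price"])
print(summary)
print(details)
```

The frequently read summary stays small, while the details are fetched only when someone actually opens the full page.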

Case Insensitive Query:

$regex queries with the i option are case insensitive, but not performant
Queries without $regex are case sensitive by default
Collation:
- Language specific rules that MongoDB uses for string comparison
- Strength ranges from 1-5
- Strength 1 or 2 gives you case insensitive comparison
Examples:
- Query: {"first_name": {$regex: /Jacob/i}} - case insensitive
- Query: {"first_name": "Jacob"} - case sensitive
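
For reference, here is how the same filters might be written as Python dicts, together with a collation document. This is only a sketch: the `case_insensitive_collation` helper and the "en" locale are my assumptions, and nothing here talks to a database. With a driver such as PyMongo I would expect to pass these to a `find` call (for example through its `collation` option), but check the driver documentation before relying on that.

```python
# Plain equality filter: case sensitive by default.
case_sensitive_filter = {"first_name": "Jacob"}

# $regex with the "i" option: case insensitive, but typically slow.
regex_filter = {"first_name": {"$regex": "Jacob", "$options": "i"}}


def case_insensitive_collation(locale="en", strength=2):
    """Build a collation document that compares strings ignoring case.

    Hypothetical helper. Only strengths 1 and 2 ignore case, so anything
    else is rejected here.
    """
    if strength not in (1, 2):
        raise ValueError("strength must be 1 or 2 for case insensitive comparison")
    return {"locale": locale, "strength": strength}


# The equality filter combined with a collation gives a case insensitive match.
query = {
    "filter": case_sensitive_filter,
    "collation": case_insensitive_collation(),
}
print(query)
```

A collation based query can use an index that was created with the same collation, which is the main reason to prefer it over $regex for this purpose.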

Accessing Separate Data Together:

$lookup:
- Is used to join data from more than one collection
- Great for rarely used queries or analytical queries (batch run overnight)
- Can be very slow and resource intensive, especially on large collections
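
A $lookup is one stage in an aggregation pipeline. The sketch below builds such a pipeline as plain Python data. The `orders` and `customers` collections and their fields are hypothetical, and `lookup_stage` is just a small helper of mine that fills in the four standard $lookup fields.

```python
def lookup_stage(from_collection, local_field, foreign_field, as_field):
    """Build a $lookup stage that joins documents from another collection."""
    return {
        "$lookup": {
            "from": from_collection,        # collection to join with
            "localField": local_field,      # field in the input documents
            "foreignField": foreign_field,  # field in the joined collection
            "as": as_field,                 # output array field
        }
    }


# Pipeline intended for a hypothetical `orders` collection:
# filter first, then join, so that fewer documents reach the $lookup.
pipeline = [
    {"$match": {"status": "shipped"}},
    lookup_stage("customers", "customer_id", "_id", "customer"),
]
print(pipeline)
```

Putting the $match before the $lookup keeps the join small, which matters given how expensive the stage can be.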

Upsert:

Upsert is a combination of update and insert. An upsert does one of two things:
- Updates the data if there is a matching document.
- Inserts a new document if no document matches the query criteria.
Personally: I found this very interesting, as for my use cases I don’t need to track whether the document already exists.
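
To show the update-or-insert behaviour without a running database, here is an in-memory Python sketch that mimics an upsert for simple equality queries. It is not MongoDB code: the list stands in for a collection and `upsert_one` is my own helper. With PyMongo, the real equivalent would be along the lines of `collection.update_one(query, {"$set": fields}, upsert=True)`.

```python
def upsert_one(collection, query, set_fields):
    """Mimic an update with upsert enabled on a list of dicts.

    Only simple equality queries are supported in this sketch.
    Returns "updated" or "inserted" to show which path was taken.
    """
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            doc.update(set_fields)  # matching document found: update it
            return "updated"
    # No match: insert a new document built from the query and the new fields.
    collection.append({**query, **set_fields})
    return "inserted"


users = []
first = upsert_one(users, {"email": "jacob@example.com"}, {"name": "Jacob"})
second = upsert_one(users, {"email": "jacob@example.com"}, {"name": "Jacob A"})
print(first, second, users)
```

The caller runs the same call both times and never has to check first whether the document exists.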

I understand the blog looks a bit naive (read: nothing interesting), sorry, I am just getting started. BTW, I plan to write a follow-up blog - a deep dive on my specific use cases and data modelling.
