MongoDB Tidbits

I am getting started with MongoDB for my new project. Here are a few quick notes, based on my reading so far:
One Key Takeaway:
Data accessed together should be stored together
WiredTiger:
- WiredTiger is the default storage engine for MongoDB
- It stores documents and indexes on disk
- Its in-memory cache holds the working set: the frequently used documents and indexes
- Default cache size: the larger of 50% of (RAM - 1 GB) or 256 MB
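To make the cache sizing rule concrete, here is a small Python sketch of the arithmetic. The function name `default_wiredtiger_cache_bytes` is my own invention for illustration, not a MongoDB API; it only applies the "larger of 50% of (RAM - 1 GB) or 256 MB" rule from the note above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Apply the default rule: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    half_of_rest = (total_ram_bytes - GIB) // 2
    return max(half_of_rest, 256 * MIB)


# A machine with 16 GB of RAM: 50% of (16 - 1) GB
print(default_wiredtiger_cache_bytes(16 * GIB) / GIB)  # 7.5

# A machine with 1 GB of RAM falls back to the 256 MB floor
print(default_wiredtiger_cache_bytes(1 * GIB) // MIB)  # 256
```

So on small machines the 256 MB floor applies, and on anything larger roughly half of the RAM goes to the cache.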
Massive Arrays:
- Max Document size is 16MB
- Index performance on arrays decreases as array size increases
- Extended Reference pattern: instead of embedding a whole related document (or a huge array), we duplicate only the fields we frequently need, not all of the data
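Here is how I picture the Extended Reference pattern, as a minimal Python sketch using plain dicts. The documents, the field names, and the helper `extended_reference` are all hypothetical; nothing here talks to a real database.

```python
# Hypothetical full customer document, stored in its own collection.
customer = {
    "_id": 101,
    "name": "Jacob",
    "email": "jacob@example.com",
    "shipping_address": {"city": "Springfield", "zip": "00000"},
    "order_history": ["o-1", "o-2", "o-3"],  # could grow into a massive array
}


def extended_reference(doc, fields):
    """Copy the _id plus only the listed fields from the referenced document."""
    ref = {"_id": doc["_id"]}
    for field in fields:
        ref[field] = doc[field]
    return ref


# The order duplicates just the customer fields it needs at read time.
order = {
    "_id": "o-3",
    "items": [{"sku": "book-42", "qty": 1}],
    "customer": extended_reference(customer, ["name", "shipping_address"]),
}

print(order["customer"])
```

The order can be displayed without a second read, and the `_id` is still there when the full customer document is needed.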
Index:
- Each index takes at least 8 KB
- Indexes take up storage: one file for each collection and one file for each index (WiredTiger implementation)
- Write performance suffers, as every index on the collection needs to be updated on a write
- Limit each collection to 50 indexes at most
- Do: Add indexes for frequently run queries; this improves read performance
- Don’t: Create unnecessary indexes; they reduce write performance and take up space
Bloated Documents:
- Do: Data accessed together should be stored together
- Don’t: Bloat your documents with related data that is not accessed together
- Data that is related does NOT necessarily need to be stored together
- Remove bloat from frequently used documents so that more of them fit in the in-memory WiredTiger cache
- Data duplication is OK (depends!)
- Split into a summary document and a details document, linked to each other by reference
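A rough Python sketch of the summary/details split, again with plain dicts. The helper `split_document`, the link field names (`details_id`, `summary_id`), and the product document are assumptions made up for this example.

```python
def split_document(doc, summary_fields):
    """Split one bloated document into a summary and a details document.

    The summary keeps the frequently read fields plus a link (details_id);
    the details document keeps everything else and links back (summary_id).
    """
    details_id = f"{doc['_id']}-details"

    summary = {k: v for k, v in doc.items() if k == "_id" or k in summary_fields}
    summary["details_id"] = details_id

    details = {k: v for k, v in doc.items() if k != "_id" and k not in summary_fields}
    details["_id"] = details_id
    details["summary_id"] = doc["_id"]
    return summary, details


product = {
    "_id": "p-1",
    "title": "Hypothetical Book",
    "price": 12.5,
    "full_description": "A very long text ...",
    "reviews": ["review 1", "review 2"],
}

summary, details = split_document(product, {"title", "price"})
print(summary)
print(details)
```

The small summary documents are the ones read all the time, so they are the ones worth keeping in the cache; the details are fetched only when someone opens the full page.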
Case Insensitive Query:
- $regex queries with the i option are case insensitive but not performant (without the i option, $regex is case sensitive too)
- Non-$regex queries are case sensitive by default
- Collation:
  - Language-specific rules that MongoDB uses for string comparison
  - Strength ranges from 1 to 5
  - Strength 1 or 2 gives you case-insensitive comparison
- Query: {"first_name": {$regex: /Jacob/i}} - case insensitive
- Query: {"first_name": "Jacob"} - case sensitive
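To convince myself of the difference, here is a pure-Python emulation of the queries above. The filters are written as Python dicts (`"$options": "i"` is the dict form of the `/i` flag); the helpers `match_regex` and `match_equal` are my own stand-ins, and `casefold()` is only a rough approximation of what a strength 1 or 2 collation does (real collations follow language-specific rules).

```python
import re

people = [{"first_name": "Jacob"}, {"first_name": "jacob"}, {"first_name": "JACOB"}]

# The two filters from the notes, written as Python dicts.
regex_filter = {"first_name": {"$regex": "Jacob", "$options": "i"}}
exact_filter = {"first_name": "Jacob"}
# Collation options that make plain equality case insensitive.
collation = {"locale": "en", "strength": 2}


def match_regex(docs, field, pattern, options=""):
    """Emulate a $regex filter; only the "i" option is handled."""
    flags = re.IGNORECASE if "i" in options else 0
    return [d for d in docs if re.search(pattern, d[field], flags)]


def match_equal(docs, field, value, strength=None):
    """Emulate equality; strength 1-2 ignores case (approximated via casefold)."""
    if strength is not None and strength <= 2:
        return [d for d in docs if d[field].casefold() == value.casefold()]
    return [d for d in docs if d[field] == value]


spec = regex_filter["first_name"]
print(len(match_regex(people, "first_name", spec["$regex"], spec["$options"])))  # 3
print(len(match_equal(people, "first_name", exact_filter["first_name"])))  # 1
print(len(match_equal(people, "first_name", "jacob", collation["strength"])))  # 3
```

The regex version has to test every document's string, while an equality match under a collation can be served by an index created with the same collation, which is why collation is the more performant route.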
Separating Data That Is Accessed Together:
- $lookup:
  - Is used to join data from more than one collection
  - Great for rarely used queries or analytical queries (batch runs overnight)
  - Can be very slow and resource intensive
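Below is a sketch of what a $lookup stage looks like and, roughly, what it computes. The pipeline uses the real stage fields (`from`, `localField`, `foreignField`, `as`), but the collection names, field names, and the `lookup` helper are hypothetical, and the helper is a naive in-memory emulation rather than how the server actually executes the join.

```python
# Aggregation pipeline as plain data; collection and field names are hypothetical.
pipeline = [
    {
        "$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "_id",
            "as": "customer",
        }
    }
]


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Naive emulation of an equality-match $lookup (a left outer join)."""
    joined = []
    for doc in local_docs:
        # Every local document scans the foreign collection for matches.
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "customer_id": 101}, {"_id": 2, "customer_id": 999}]
customers = [{"_id": 101, "name": "Jacob"}]

stage = pipeline[0]["$lookup"]
result = lookup(orders, customers, stage["localField"], stage["foreignField"], stage["as"])
print(result)
```

Every order keeps its place in the output, and orders without a matching customer get an empty array. The nested scan in the emulation also hints at why joining on every read gets expensive, and why storing together what is accessed together avoids the cost.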
Upsert:
- Upsert is a combination of update and insert. Upsert performs two functions:
  - Update the data if there is a matching document.
  - Insert a new document if no document matches the query criteria.
- Personally: I found this very interesting, as for my use cases I don't need to track whether the document already exists.
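The two behaviours can be emulated in a few lines of Python over a list of dicts. The `upsert` helper below is my own simplified stand-in (equality filters and a plain field update only); with a real driver such as PyMongo, the equivalent call would be something like `collection.update_one(query, {"$set": changes}, upsert=True)`.

```python
def upsert(collection, query, changes):
    """Emulate an update with upsert on a list of dicts.

    Updates the first document matching every query field; if none matches,
    inserts a new document built from the query fields plus the changes.
    """
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            doc.update(changes)
            return "updated"
    collection.append({**query, **changes})
    return "inserted"


users = []
print(upsert(users, {"email": "jacob@example.com"}, {"first_name": "Jacob"}))  # inserted
print(upsert(users, {"email": "jacob@example.com"}, {"first_name": "Jake"}))   # updated
print(users)
```

The caller runs the same call both times and never has to check first whether the document exists.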
I understand the blog looks a bit naive (read: nothing interesting); sorry, I am just getting started. BTW, I plan to write a follow-up blog: a deep dive on my specific use cases and data modelling.