MongoDB Titbits

I am getting started with MongoDB for my new project. Here are a few quick notes, based on my reading so far:

One Key Takeaway:

Data accessed together should be stored together

WiredTiger:

  • WiredTiger is the default storage engine for MongoDB
  • It stores documents and indexes on disk
  • Its in-memory cache holds the working set: the frequently used documents and indexes
    • Default cache size: the larger of 50% of (RAM - 1 GB) or 256 MB
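To check the sizing rule above on a few machine sizes, here is a minimal sketch. The helper name `defaultWiredTigerCacheBytes` is my own (hypothetical), not a MongoDB API; it only does the arithmetic from the bullet above.

```javascript
// Hypothetical helper (not a MongoDB API): computes the default WiredTiger
// internal cache size from total RAM, using the rule quoted above.
const GB = 1024 ** 3;
const MB = 1024 ** 2;

function defaultWiredTigerCacheBytes(totalRamBytes) {
  const halfOfRamMinusOneGb = 0.5 * (totalRamBytes - 1 * GB);
  // Whichever is larger: 50% of (RAM - 1 GB) or 256 MB.
  return Math.max(halfOfRamMinusOneGb, 256 * MB);
}

console.log(defaultWiredTigerCacheBytes(16 * GB) / GB); // 7.5
console.log(defaultWiredTigerCacheBytes(1 * GB) / MB);  // 256
```

So a 16 GB machine gets a 7.5 GB cache by default, while a 1 GB machine falls back to the 256 MB floor.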

Massive Arrays:

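  • Max document size is 16 MB, so an array that keeps growing can eventually hit this limit
  • Index performance on arrays decreases as array size increases
  • Extended Reference pattern: duplicate only the frequently accessed fields of the related document, not all of its data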
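To make the Extended Reference pattern a bit more concrete, here is a small sketch with made-up data. The `customer` and `order` shapes, the field names, and the helper `toExtendedReference` are all assumptions for illustration, not something taken from the MongoDB docs.

```javascript
// Hypothetical documents; all field names are illustrative only.
const customer = {
  _id: "cust-1",
  name: "Jacob",
  shippingCity: "Springfield",
  email: "jacob@example.com",
  loyaltyPoints: 1200,
};

// Copy only the fields that are read together with an order;
// everything else stays in the customer document.
function toExtendedReference(cust) {
  return { _id: cust._id, name: cust.name, shippingCity: cust.shippingCity };
}

const order = {
  _id: "order-1",
  items: [{ sku: "A-100", qty: 2 }],
  customer: toExtendedReference(customer),
};

console.log(order.customer);
```

The order can now be displayed without a second read, and the `_id` is still there when the full customer document is needed.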

Index:

  • Each index is at least 8 KB
  • Indexes take up storage: one file for each collection and one file for each index (WiredTiger implementation)
  • Indexes slow down writes, as every affected index needs to be updated on each write
  • Limit each collection to 50 indexes at most
  • Do: Add indexes for frequently run queries - improves read performance
  • Don’t: Create unnecessary indexes - they reduce write performance and take up space
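As a rough illustration of the "Do" above, one compound index can serve more than one frequent query, which helps keep the index count down. In this sketch `collection` is assumed to be a mongosh or Node.js driver collection handle, and the field names and the helper `ensureNameIndex` are hypothetical.

```javascript
// Sketch: a single compound index instead of several single-purpose ones.
// `collection` is assumed to be a mongosh or Node.js driver collection handle.
function ensureNameIndex(collection) {
  // Can serve queries that filter on last_name alone,
  // or on last_name together with first_name (index prefix rule).
  return collection.createIndex({ last_name: 1, first_name: 1 });
}
```

In mongosh this could be called as `ensureNameIndex(db.people)`, where `people` is a made-up collection name.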

Bloated Documents:

  • Do: Data accessed together should be stored together
  • Don’t: Bloat your documents with related data that is not accessed together
  • Data that is related to each other need NOT necessarily be stored together
  • Remove bloat from frequently used documents - so that more of them fit in the in-memory WiredTiger cache
  • Data duplication is OK (depends!)
    • E.g. a summary document and a details document, linked by reference
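Here is one way the summary/details split could look, as a sketch only. The `product` document, its field names, the `summary_id` link field, and the helper `splitSummaryAndDetails` are all made up for this example.

```javascript
// Hypothetical "bloated" document; all field names are illustrative.
const product = {
  _id: "prod-1",
  name: "Road bike",
  price: 899,
  description: "Long marketing text...",
  reviews: [{ user: "a", text: "Great" }, { user: "b", text: "OK" }],
};

// Split into a small summary document (read often) and a details document
// (read rarely). The details document points back via summary_id.
function splitSummaryAndDetails(doc, summaryFields) {
  const summary = { _id: doc._id };
  const details = { summary_id: doc._id };
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") continue;
    if (summaryFields.includes(field)) summary[field] = value;
    else details[field] = value;
  }
  return { summary, details };
}

const { summary, details } = splitSummaryAndDetails(product, ["name", "price"]);
console.log(Object.keys(summary)); // [ '_id', 'name', 'price' ]
console.log(Object.keys(details)); // [ 'summary_id', 'description', 'reviews' ]
```

The two results would go into separate collections, so the frequently read summaries stay small.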

Case Insensitive Query:

  • $regex queries with the i option are case insensitive, but not performant
    • Query: {"first_name": {$regex: /Jacob/i }} - case insensitive
  • Non-$regex queries are case sensitive by default
    • Query: {"first_name": "Jacob"} - case sensitive
  • Collation:
    • Language-specific rules that MongoDB uses for string comparison
    • Strength ranges from 1-5
    • Strength 1-2 will give you case-insensitive comparison
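A collation-based alternative to the $regex query could look roughly like this. It is a sketch under assumptions: `collection` is a mongosh or Node.js driver collection handle, the "en" locale is my pick, and the helper names are hypothetical.

```javascript
// Collation spec: strength 2 compares base characters and accents,
// but ignores case. The "en" locale is an assumption for this example.
const caseInsensitive = { locale: "en", strength: 2 };

// `collection` is assumed to be a mongosh or Node.js driver collection handle.
function createCaseInsensitiveIndex(collection) {
  return collection.createIndex({ first_name: 1 }, { collation: caseInsensitive });
}

// The query has to specify the same collation to be able to use that index.
function findByFirstName(collection, name) {
  return collection.find({ first_name: name }).collation(caseInsensitive);
}
```

With this in place, `findByFirstName(db.people, "jacob")` should also match "Jacob" (`people` being a made-up collection name), without the cost of a regex scan.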

Accessing Separate Data Together:

  • $lookup:
    • Is used to join data from more than one collection
    • Great for rarely used queries or analytical queries (batch run overnight)
    • Can be slow and resource intensive
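For reference, a basic $lookup stage has four parts. The sketch below builds one as a plain pipeline array; the collection names (`orders`, `customers`), the field names, and the function name are hypothetical.

```javascript
// Builds an aggregation pipeline that joins orders to customers.
// Collection and field names are hypothetical.
function ordersWithCustomerPipeline() {
  return [
    {
      $lookup: {
        from: "customers",         // collection to join with
        localField: "customer_id", // field in the orders documents
        foreignField: "_id",       // field in the customers documents
        as: "customer",            // output array field on each order
      },
    },
  ];
}
```

In mongosh it would be run as `db.orders.aggregate(ordersWithCustomerPipeline())`. Each order then carries a `customer` array with the matching customer documents.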

Upsert:

  • Upsert is a combination of update and insert. Upsert performs two functions:
    • Update data if there is a matching document.
    • Insert a new document if no document matches the query criteria.
  • Personally: I found this very interesting, as for my use cases I don’t need to track whether the document already exists.
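A minimal upsert sketch, assuming `collection` is a mongosh or Node.js driver collection handle. The `email` key and the helper name `saveUser` are made up for illustration.

```javascript
// `collection` is assumed to be a mongosh or Node.js driver collection handle.
function saveUser(collection, email, fields) {
  return collection.updateOne(
    { email },         // match on a key that identifies the document
    { $set: fields },  // applied whether the document is updated or inserted
    { upsert: true }   // insert a new document if nothing matches
  );
}
```

Calling `saveUser(db.users, "jacob@example.com", { first_name: "Jacob" })` twice should leave a single document: the first call inserts it, the second one updates it.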

I understand the blog looks a bit naive (read: nothing interesting); sorry, I am just getting started. BTW, I plan to write a follow-up blog post - a deep dive on my specific use cases and data modelling.

Jacob Aloysious
Software Enthusiast

35yr old coder, father and spouse - my interests include Software Architecture, CI/CD, TDD, Clean Code.