Bytes & Beyond

The Backpack vs. The Library

The guide to Embedding vs. Referencing in MongoDB

Sooner or later, every MongoDB developer faces the question: “Should I embed this data or reference it?”

Think of it like this: carrying a backpack with everything you need, or making trips to a library to get information. Both approaches work, but each has trade-offs. If you choose poorly, your app might slow down, documents can get bloated, or your data may become hard to keep in sync.

This article will help you build a clear mental model and give you practical rules you can use right away.


Why the Choice Matters (One Big Idea)

MongoDB gives you document-level atomicity. If related data lives in the same document, a single write updates all of it atomically, which is safe and simple. But documents have a limit: a single BSON document can’t be larger than 16 MiB (often written as 16 MB).

So the real trade-off is:

Embed for fewer queries and atomicity
Reference for scalability and flexibility


The Backpack Approach (Embedding)

Imagine you’re going on a day trip. Do you pack your laptop, charger, snacks, water, and notebook in your backpack? Yes. Everything you need is right there.

// User document with embedded profile and preferences
{
  _id: ObjectId("..."),
  email: "mehedi@example.com",
  username: "mehedi_dev",

  // EMBEDDED: Profile data comes with the user
  profile: {
    firstName: "Mehedi",
    lastName: "Hasan",
    bio: "Full-stack developer who loves teaching",
    avatar: "https://example.com/avatar.jpg",
    location: { city: "Dhaka", country: "Bangladesh" }
  },

  // EMBEDDED: Preferences
  preferences: {
    theme: "dark",
    notifications: { email: true, push: false, sms: false },
    language: "en"
  },

  createdAt: ISODate("2024-01-15"),
  lastLogin: ISODate("2025-12-31")
}

One query. One document. Everything you need.

const user = await User.findOne({ email: "mehedi@example.com" });
console.log(user.profile.firstName); // "Mehedi"
console.log(user.preferences.theme); // "dark"

Embed when it’s “One-to-Few” and bounded

Good examples:

  • Profile and preferences
  • Order and line items (as an immutable snapshot)
  • Post and a small, capped number of comments

The Library Approach (Referencing)

Now imagine you are writing a research paper. Do you carry 47 textbooks in your backpack? No. You go to the library and fetch what you need.

// Users collection (small)
// Note: ids such as "user123" are shortened placeholders; a real ObjectId is 24 hex characters
{
  _id: ObjectId("user123"),
  email: "mehedi@example.com",
  username: "mehedi_dev",
  createdAt: ISODate("2024-01-15")
}

// Posts collection (many)
{
  _id: ObjectId("post456"),
  title: "MongoDB Performance Tips",
  content: "Here are 10 ways to make MongoDB faster...",
  author: ObjectId("user123"), // REFERENCE to user
  publishedAt: ISODate("2025-12-30")
}

Now you have options:

Option A: Two queries (simple and clear)

const user = await User.findOne({ email: "mehedi@example.com" });
const posts = await Post.find({ author: user._id });

Option B: $lookup in aggregation (server-side join)

This is useful for analytics, admin pages, or pipelines. Avoid using it for high-traffic, user-facing features, because the join work is repeated on every request.

// Example: Join posts with their authors
const postsWithAuthors = await Post.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "author",
      foreignField: "_id",
      as: "authorInfo"
    }
  },
  { $unwind: "$authorInfo" }
]);

Embedding vs Referencing Decision

Embed for one fast query, or reference so your document doesn’t explode past the 16 MiB limit. That’s the core trade-off.
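
The rule of thumb above can be written down as a small helper. This is only a sketch: the function name `chooseModel`, its input fields, and the 1 MiB per-document budget are assumptions made for illustration, not MongoDB recommendations.

```javascript
// Hypothetical helper: encodes the article's rule of thumb, nothing more.
function chooseModel({ bounded, maxItems = 0, avgItemBytes = 0, budgetBytes = 1024 * 1024 }) {
  // Data that can grow without limit (likes, followers, logs) is never embedded.
  if (!bounded) return "reference";

  // Worst-case size of the embedded data; the budget is an assumed safety
  // margin chosen to stay far below the hard document size limit.
  if (maxItems * avgItemBytes > budgetBytes) return "reference";

  return "embed";
}

console.log(chooseModel({ bounded: true, maxItems: 50, avgItemBytes: 300 }));     // embed
console.log(chooseModel({ bounded: false }));                                     // reference
console.log(chooseModel({ bounded: true, maxItems: 100000, avgItemBytes: 300 })); // reference
```

Real decisions also depend on how the data is read and updated, so treat the output as a starting point rather than a verdict.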


The Most Common Mistake: The Unbounded Array

If something can grow without limit (likes, followers, comments, logs, events), don’t embed it in a single document.
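
Some rough arithmetic shows how an unbounded array runs into the document size limit. The 300-byte average comment size is an assumed figure, and JSON length is only an approximation of BSON size.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Rough size estimate: JSON byte length only approximates the BSON size.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// How many embedded items of a given average size fit before the limit?
function maxEmbeddedItems(avgItemBytes, baseDocBytes = 0) {
  return Math.floor((MAX_BSON_BYTES - baseDocBytes) / avgItemBytes);
}

const sampleComment = { author: "mehedi_dev", content: "Great explanation!" };
console.log(approxBytes(sampleComment)); // approximate size of one small comment
console.log(maxEmbeddedItems(300));      // 55924
```

A popular post can collect that many comments, and reads and writes get heavier long before the hard limit because the whole document travels with every fetch.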

Better approach:

  • Post stores basic fields
  • Comments are in a separate collection
  • Query comments by postId with pagination
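
A minimal sketch of the pagination step, assuming a `Comment` Mongoose model with a `postId` field and an index that covers `postId` and `_id` (the model, field, and helper names are assumptions; the article does not define the schema). It pages by `_id` instead of `skip`, relying on ObjectIds being roughly ordered by creation time.

```javascript
// Build the filter for one page of comments, newest first.
// `lastSeenId` is the _id of the last comment on the previous page, or null.
function buildCommentPageFilter(postId, lastSeenId = null) {
  const filter = { postId };
  if (lastSeenId !== null) {
    filter._id = { $lt: lastSeenId }; // only comments older than the cursor
  }
  return filter;
}

// `Comment` is an assumed Mongoose model, passed in as a parameter.
async function fetchCommentPage(Comment, postId, lastSeenId = null, pageSize = 20) {
  return Comment.find(buildCommentPageFilter(postId, lastSeenId))
    .sort({ _id: -1 })
    .limit(pageSize)
    .lean();
}
```

The caller passes the `_id` of the last comment it received to get the next page.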

Patterns

In practice, data models rarely fit into tidy boxes. You’ll face cases that don’t cleanly match “embed” or “reference.” For those in-between moments, MongoDB offers a few flexible patterns. Here are the ones that matter most:

The Subset Pattern (Backpack + Library hybrid)

Sometimes you want the best of both worlds. With the Subset Pattern, you embed just the most-used items—say, the latest 3 comments—right in the main document. The rest live in their own collection. This keeps your main document small and fast to read, but you can still fetch the full set when you need it.
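
One way to keep the embedded subset capped is `$push` with the `$each` and `$slice` modifiers. A sketch, assuming a `recentComments` array and a `commentCount` field on the post, plus `Post` and `Comment` Mongoose models (all of these names are illustrative):

```javascript
// Build an update that appends a comment and keeps only the newest `limit` entries.
function buildRecentCommentsUpdate(comment, limit = 3) {
  return {
    $push: {
      recentComments: {
        $each: [comment],
        $slice: -limit // a negative slice keeps the last `limit` elements
      }
    },
    $inc: { commentCount: 1 }
  };
}

// The full comment goes to its own collection, a small snapshot goes on the post.
// `Post` and `Comment` are assumed Mongoose models, passed in as parameters.
async function addComment(Post, Comment, postId, commentData) {
  const comment = await Comment.create({ postId, ...commentData });
  await Post.updateOne(
    { _id: postId },
    buildRecentCommentsUpdate({ _id: comment._id, content: comment.content })
  );
  return comment;
}
```

The two writes are not atomic together. If that matters, wrap them in a transaction or accept that the embedded subset can briefly lag behind.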

The Extended Reference Pattern (fewer joins, some duplication)

Ever get tired of joining just to show a username or avatar? The Extended Reference Pattern lets you store a snapshot of those fields right alongside the reference. It’s a little duplication for a lot less joining.

For example:

// Comment stores both a reference and a cached display snapshot
{
  _id: ObjectId("c1"),
  postId: ObjectId("p1"),
  author: { _id: ObjectId("u1"), username: "mehedi_dev", avatar: "avatar.jpg" },
  content: "Great explanation!"
}

The catch? If the user updates their avatar or username, you’ll need to sync those changes everywhere you’ve cached them.

Syncing cached fields with Change Streams

So how do you keep those cached fields up to date? MongoDB’s Change Streams let you listen for changes and update duplicates as needed. Since version 6.0, a change event can also include the document as it looked before and after the change, provided pre- and post-images are enabled on the collection, which makes it easier to keep everything in sync.
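
A sketch of such a sync loop, assuming Mongoose models `User` and `Comment` and the comment shape from the Extended Reference example. The helper names and the list of cached fields are hypothetical, and change streams require a replica set or sharded cluster.

```javascript
// Fields that are duplicated into comments (assumed for this example).
const CACHED_FIELDS = ["username", "avatar"];

// Turn a change event on the users collection into arguments for
// Comment.updateMany(). Returns null when no cached field changed.
function buildSyncUpdate(change) {
  if (change.operationType !== "update") return null;

  const updated = change.updateDescription.updatedFields || {};
  const $set = {};
  for (const field of CACHED_FIELDS) {
    if (field in updated) $set[`author.${field}`] = updated[field];
  }
  if (Object.keys($set).length === 0) return null;

  return { filter: { "author._id": change.documentKey._id }, update: { $set } };
}

// Wiring: listen for user changes and propagate them to cached copies.
function startAuthorSync(User, Comment) {
  const stream = User.watch();
  stream.on("change", async (change) => {
    const sync = buildSyncUpdate(change);
    if (!sync) return;
    try {
      await Comment.updateMany(sync.filter, sync.update);
    } catch (err) {
      console.error("author sync failed", err);
    }
  });
  return stream;
}
```

This version only handles `update` events. Full-document replacements and resuming after a restart would need extra handling.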


Mongoose Tip: populate() isn’t magic

Many people think populate() is a single query or a database-side join. In reality, Mongoose sends a separate query to the other collection for each populated path and merges the results in your application.
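
Roughly, populating `author` on a list of posts behaves like the fetch-and-stitch below. This is an illustrative sketch, not Mongoose's actual implementation; `findUsersByIds` is a hypothetical function standing in for something like `User.find({ _id: { $in: ids } })`.

```javascript
async function populateAuthors(posts, findUsersByIds) {
  // Step 1: collect the distinct author ids from the posts already loaded.
  const ids = [...new Set(posts.map((p) => String(p.author)))];

  // Step 2: one extra query against the other collection.
  const users = await findUsersByIds(ids);
  const byId = new Map(users.map((u) => [String(u._id), u]));

  // Step 3: stitch the results together in application memory.
  return posts.map((p) => ({ ...p, author: byId.get(String(p.author)) ?? null }));
}
```

The extra round trip is the cost to keep in mind when populating on a hot path.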

Also, in Mongoose 6 and later, Document#populate() returns a promise, so it isn’t chainable. To populate multiple paths, use an array.

// populate multiple paths safely
await doc.populate(['author', 'comments.author']);

If you want user.posts without storing a posts: [] array in the user document, use a virtual populate:

// User schema
userSchema.virtual('posts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author'
});

// Usage
const user = await User.findById(userId).populate('posts');