Background

To ensure there are no duplicate documents, we can create a unique index.
However, index creation fails with an error when the collection already contains duplicate entries:

MongoDB cannot create a unique index on the specified index field(s) if the collection already contains data that would violate the unique constraint for the index.
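
When the index build is attempted anyway, it can help to tell this specific failure apart from other errors. The following is a minimal sketch, not part of the original code: `isDuplicateKeyError` is a hypothetical helper, and it assumes the driver exposes the server's duplicate key error code (11000) as a numeric `code` property on the thrown error.

```typescript
// Server error code for duplicate key violations (E11000).
const DUPLICATE_KEY_ERROR_CODE = 11000;

// Hypothetical helper: returns true when `err` looks like a duplicate key error.
// Assumption: the thrown error object carries a numeric `code` property.
function isDuplicateKeyError(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === DUPLICATE_KEY_ERROR_CODE
  );
}

// Possible usage around the index creation:
//
//   try {
//     await collection.createIndex({ key1: 1, key2: 1 }, { unique: true });
//   } catch (err) {
//     if (!isDuplicateKeyError(err)) throw err;
//     // Duplicates exist: clean them up first, then retry.
//   }
```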

Remove the duplicates by keys

const collection = db.collection('MyCollection');
const operations: Promise<any>[] = [];
console.time('aggregation');

await collection
  .aggregate([
    {
      // Group documents that share the same (key1, key2) pair
      // and collect the _id of every document in the group.
      $group: {
        _id: {
          key1: '$key1',
          key2: '$key2',
        },
        dups: {
          $push: '$_id',
        },
        count: {
          $sum: 1,
        },
      },
    },
    {
      // Keep only the groups that contain more than one document.
      $match: {
        _id: {
          $ne: null,
        },
        count: {
          $gt: 1,
        },
      },
    },
  ])
  .forEach((doc) => {
    console.log(doc);
    // Keep the first document of each group and delete the rest.
    doc.dups.slice(1).forEach((duplicateId: string) => {
      operations.push(collection.deleteOne({ _id: duplicateId }));
    });
  });

console.timeEnd('aggregation');

// Wait for all pending deletions to finish.
console.time('remove duplicate');
await Promise.all(operations);
console.timeEnd('remove duplicate');
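
With many duplicates, issuing one `deleteOne` per document can mean a large number of round trips. A possible alternative, sketched here under assumptions, is to gather the ids first and delete them in batches with `deleteMany` and `$in`. The names `DuplicateGroup`, `idsToDelete`, `chunk`, `DeleteManyCapable`, and `deleteDuplicates`, as well as the default batch size of 1000, are illustrative and not from the original code. `DeleteManyCapable` is a stand-in for the small part of the collection API used here, so a real driver collection may need a cast.

```typescript
// Shape of each document produced by the $group/$match pipeline above.
interface DuplicateGroup<TId> {
  _id: unknown;
  dups: TId[];
  count: number;
}

// Keep the first _id of every group and return the remaining ones for deletion.
function idsToDelete<TId>(groups: DuplicateGroup<TId>[]): TId[] {
  const ids: TId[] = [];
  for (const group of groups) {
    for (const id of group.dups.slice(1)) {
      ids.push(id);
    }
  }
  return ids;
}

// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumed minimal interface: only the deleteMany call used below.
interface DeleteManyCapable<TId> {
  deleteMany(filter: { _id: { $in: TId[] } }): Promise<{ deletedCount?: number }>;
}

// Delete the duplicates batch by batch and return how many were removed.
async function deleteDuplicates<TId>(
  collection: DeleteManyCapable<TId>,
  groups: DuplicateGroup<TId>[],
  batchSize: number = 1000, // arbitrary default chosen for illustration
): Promise<number> {
  let deleted = 0;
  for (const batch of chunk(idsToDelete(groups), batchSize)) {
    const result = await collection.deleteMany({ _id: { $in: batch } });
    deleted += result.deletedCount !== undefined ? result.deletedCount : 0;
  }
  return deleted;
}
```

To use it, the aggregation results would be collected into an array (for example with the cursor's `toArray()`) instead of being consumed with `forEach`, then passed to `deleteDuplicates`. This trades the memory needed to hold the duplicate groups for fewer, larger delete commands.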

Create the unique index

await collection.createIndex({ key1: 1, key2: 1 }, { unique: true, name: 'unique_index' });