MongoDB Recovery

This section describes the strategy for restoring backups after data loss events.

Single MongoDB, Central or Site Recovery

If recovery from a snapshot fails, follow these steps:

  1. If needed, perform the MongoDB Secondary IaC install.

  2. Wait for the node to re-join the replica set and complete an initial sync. This should happen automatically.

  3. Do the Post-Restore Checks.

The IaC install should follow the procedure for Secondary Node installation, even if the machine to be recovered was initially configured as the primary node. This is because MongoDB replica sets elect another primary when a member becomes unavailable.
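One way to confirm that the recovered node has re-joined the replica set and finished its initial sync is to inspect the output of rs.status(). The following is a minimal sketch, not part of the product: the function name describeMemberSync and the host name are hypothetical, and it assumes the standard member states reported by MongoDB (a member shows STARTUP2 while an initial sync is running).

```javascript
// Hypothetical helper: report whether a recovered member has finished its
// initial sync, based on the document returned by rs.status().
function describeMemberSync(status, hostName) {
  const member = (status.members || []).find((m) => m.name === hostName);
  if (!member) {
    // The node has not (yet) re-joined the replica set.
    return { found: false, done: false, state: "NOT_IN_REPLICA_SET" };
  }
  // stateStr is STARTUP2 while an initial sync is running. The member is
  // treated as synced once it reports SECONDARY (or PRIMARY after an election).
  const done = member.stateStr === "SECONDARY" || member.stateStr === "PRIMARY";
  return { found: true, done: done, state: member.stateStr };
}

// Runs only inside mongo/mongosh, where rs is defined.
// The host name below is a placeholder, not a real node.
if (typeof rs !== "undefined") {
  printjson(describeMemberSync(rs.status(), "mongo-node-2.example.local:27017"));
}
```

Run it from a shell connected to any healthy member and repeat until done is true. A large data set can take a long time to sync, so a STARTUP2 state on its own is not an error.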

Central MongoDB Full Recovery

If recovery from a snapshot fails, follow these steps:

  1. If needed, use the IaC procedure to install MongoDB on each required node.

  2. Restore the replica set from MongoDB backups (see Restoring a replica set from MongoDB backups).

  3. Do the Post-Restore Checks.

Local MongoDB Full Recovery

If recovery from a snapshot fails, follow these steps:

  1. If needed, use the IaC procedure to install MongoDB.

  2. If no backup is available, transfer the data from the Central MongoDB into Local MongoDB (see Transferring data from MongoDB replica set).

  3. Do the Post-Restore Checks.

AspenTech Inmation data in MongoDB

  • For VQT data, the s:i objectid is encoded into every MongoDB document’s ObjectID.

  • For event data, the s:i objectid is stored within the MongoDB document.

In these recovery scenarios, object IDs are identical in the source and destination locations because all objects are under the same Master Core. The data therefore does not require remapping with esi-migrate, although esi-migrate can also be used to move VQT data. Using mongodump and mongorestore is preferred over mongoexport and mongoimport for performance reasons, and is described in Transferring data from MongoDB replica set.

esi-migrate is intended for moving objects and data between different environments, such as from Dev to Production. Moving data this way is generally not preferred; where possible, the data should be backfilled from the data source.

Restoring MongoDB backup files

Restoring a full MongoDB backup consists of copying the offline MongoDB data files to the online location. To do so:

  1. Ensure that the running MongoDB instance is configured the same way as the instance that produced the offline backup, including folder structure, security settings, and binary version.

  2. Stop the running MongoDB service.

  3. Rename the data directory.

  4. Copy the data directory from the offline backup.

  5. Start the MongoDB service.

  6. If the restore was successful, delete the old data directory retained in Step 3.
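Steps 3 and 4 above can be sketched as a small Node.js script. This is an illustration under stated assumptions rather than a supported tool: the function name swapInBackup is hypothetical, the directory paths are supplied by the caller, fs.cpSync requires Node.js 16.7 or later, and stopping and starting the MongoDB service (steps 2 and 5) is left to the operator.

```javascript
const fs = require("fs");

// Hypothetical helper covering steps 3 and 4: keep the current data directory
// under a new name, then copy the offline backup into its place.
// The MongoDB service must already be stopped (step 2) before calling this.
function swapInBackup(dataDir, backupDir, suffix) {
  if (!fs.existsSync(backupDir)) {
    throw new Error("Backup directory not found: " + backupDir);
  }
  const retainedDir = dataDir + "." + suffix;
  if (fs.existsSync(retainedDir)) {
    throw new Error("Refusing to overwrite existing directory: " + retainedDir);
  }
  fs.renameSync(dataDir, retainedDir);                // step 3: rename
  fs.cpSync(backupDir, dataDir, { recursive: true }); // step 4: copy backup
  return retainedDir; // delete in step 6, once the restore is verified
}

// Example call (placeholder paths, adjust to the actual installation):
// swapInBackup("D:\\mongodb\\data", "E:\\backup\\data", "old");
```

Renaming rather than deleting the old directory keeps a way back: if the service fails to start on the restored files, the retained directory can be renamed back into place.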

Restoring MongoDB replica set

There are two scenarios for restoring MongoDB, shown on the right branch of the diagram below.

In the first scenario, MongoDB is already working properly after the VM snapshots have been restored. No further action is needed. This is shown as taking the yes branch of the Available? decision in the diagram.

In the second scenario, MongoDB must be restored from a backup of the MongoDB data directory. This is shown as taking the no branch of the Available? decision in the diagram.

Figure: Backup Restore Plan
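The Available? decision can be approximated from a shell with the hello command. The sketch below is only one possible check and is not taken from the product documentation: the function name isReplicaSetAvailable is hypothetical, and it treats the replica set as available when the queried member reports a set name and knows of a current primary. Older servers that lack the hello command would need the legacy isMaster command instead.

```javascript
// Hypothetical helper: decide the "Available?" branch from a hello reply.
function isReplicaSetAvailable(hello) {
  return hello.ok === 1 &&
    typeof hello.setName === "string" &&                  // member belongs to a replica set
    typeof hello.primary === "string" && hello.primary.length > 0; // a primary is known
}

// Runs only inside mongo/mongosh, where db is defined.
if (typeof db !== "undefined") {
  const reply = db.runCommand({ hello: 1 });
  print(isReplicaSetAvailable(reply) ? "available: no further action" : "not available: restore from backup");
}
```

A positive result only shows that a primary has been elected. The Post-Restore Checks are still needed to confirm that the data itself is as expected.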

Transferring data from MongoDB replica set

  1. Run mongodump on the source replica set.

     Useful command line options:

       • -d=database

       • -c=collection

       • -q='json query' (requires -c)

       • --archive=a.zip (writes a single archive file in mongodump's own format; despite the extension in this example, it is not a ZIP file)

     The database name matches the name of the Custom Data Store. The collection depends on the type of data:

       • Time-series data: rawdata

       • Event data: Prod.Sync.EventData.UnCut

       • Batch data: multiple collections whose names start with BatchProductionRecord

     Expected result: The data is exported successfully as binary data files.

  2. If needed, transfer the file to a machine suitable for importing to the destination replica set, usually one of the replica set members.

     Expected result: The file is transferred successfully.

  3. Run mongorestore to import the data to the replica set. Use -d, -c, or --nsFrom together with --nsTo to remap collections if needed.

     Expected result: The data is imported successfully.
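The transfer steps above can be sketched as two functions that assemble the command lines for mongodump and mongorestore. This is an illustrative sketch only: the function names and the option object are hypothetical, the connection strings and file name are placeholders, and the URI is assumed not to name a database itself (otherwise it conflicts with --db). The flags used (--uri, --db, --collection, --query, --archive, --nsFrom, --nsTo) are standard options of the MongoDB Database Tools.

```javascript
// Build the argument list for mongodump (step 1).
function mongodumpArgs(opts) {
  const args = ["--uri=" + opts.uri, "--db=" + opts.db];
  if (opts.collection) args.push("--collection=" + opts.collection);
  if (opts.query) {
    // mongodump only accepts --query together with --collection.
    if (!opts.collection) throw new Error("query requires a collection");
    args.push("--query=" + JSON.stringify(opts.query));
  }
  args.push("--archive=" + opts.archive);
  return args;
}

// Build the argument list for mongorestore (step 3), optionally remapping
// namespaces with the paired --nsFrom/--nsTo options.
function mongorestoreArgs(opts) {
  const args = ["--uri=" + opts.uri, "--archive=" + opts.archive];
  if (opts.nsFrom && opts.nsTo) {
    args.push("--nsFrom=" + opts.nsFrom, "--nsTo=" + opts.nsTo);
  }
  return args;
}

// Example with placeholder values: dump the rawdata collection of a
// Custom Data Store database named "MyStore" (hypothetical name).
console.log(["mongodump"].concat(mongodumpArgs({
  uri: "mongodb://central-1.example.local:27017",
  db: "MyStore",
  collection: "rawdata",
  archive: "rawdata.archive",
})).join(" "));
```

In Node.js, the resulting arrays can be passed to child_process.execFileSync("mongodump", args), which avoids shell quoting problems with the JSON query. Credentials are deliberately left out of the sketch.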

Restoring a replica set from MongoDB backups

To restore data from a MongoDB backup, follow these steps:

Warning: Resuming the script at the wrong time will yield undefined behavior.

  1. Verify that the configuration is correct.

     Expected result: On the primary node, Setup.xml has BackupPath set to the backup directory.

  2. Run backuprestore.ps1 (located in the installation directory) on each MongoDB node.

     Expected result: The script starts to run on all nodes. On the secondaries, the script stops the service, deletes the MongoDB data files, and pauses. On the primary, the script prompts for the MongoDB password.

  3. Enter the password on the primary.

     Expected result: The script continues to run on the primary. It stops the service, deletes the files, copies the data from the backup, runs a temporary standalone instance, deletes the local database, stops the standalone instance, and pauses.

  4. Wait until the script pauses on all nodes.

     Expected result: The script is paused on all nodes.

  5. Resume the script on the secondary MongoDB nodes.

     Expected result: The script finishes running and the MongoDB service starts on both secondaries.

     The primary MongoDB must still be stopped when the secondaries start, in order to prevent an initial sync.

  6. Resume the script on the primary.

     Expected result: The script finishes running. On the primary, the MongoDB service starts, re-initializes the replica set, and re-adds the secondary members.

     The secondary MongoDB nodes must be cleared before the primary starts, in order to prevent an initial sync.
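Because the order in which the script is resumed matters, the rule in steps 4 to 6 can be written down as a small decision function. This is a hypothetical model for illustration and is not part of backuprestore.ps1: the function name, the node descriptions, and the three state labels are assumptions chosen for the sketch.

```javascript
// Hypothetical model of the resume order. Each node is described as
// { name, role: "primary" | "secondary", state: "running" | "paused" | "finished" },
// where "running" means the script is currently executing on that node.
function nodesSafeToResume(nodes) {
  // Steps 4 and 5: do nothing while the script is still executing anywhere.
  if (nodes.some((n) => n.state === "running")) return [];

  // Step 5: once every node is paused, resume the secondaries first.
  const pausedSecondaries = nodes.filter(
    (n) => n.role === "secondary" && n.state === "paused");
  if (pausedSecondaries.length > 0) return pausedSecondaries.map((n) => n.name);

  // Step 6: resume the primary only after all secondaries have finished.
  return nodes
    .filter((n) => n.role === "primary" && n.state === "paused")
    .map((n) => n.name);
}

// Example: every node has paused, so only the secondaries may be resumed.
console.log(nodesSafeToResume([
  { name: "primary", role: "primary", state: "paused" },
  { name: "secondary-1", role: "secondary", state: "paused" },
  { name: "secondary-2", role: "secondary", state: "paused" },
]));
```

The function returns an empty list whenever the safe action is to wait, which mirrors the warning about resuming at the wrong time.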

Post-Restore Checks

  1. Open a mongo or mongosh shell and connect to the restored node. Open another shell to a known working node or another restored node.

     Expected result: Both shells connect successfully.

  2. Run rs.status() on the two different replica set nodes.

     Expected result: Both outputs are consistent with each other, show no issues (such as status messages indicating unreachable nodes), and include {"ok":1}.

  3. Show all databases and collections using the following command. If connected to a secondary, first run rs.secondaryOk() (or rs.slaveOk() in older shells).

     db.getMongo().getDBNames().forEach(v => print(v + '\n\t' + db.getSiblingDB(v).getCollectionNames().join('\n\t')))

     Expected result: Both outputs are the same and show the expected collections and databases.

  4. With DataStudio, connect to a Core that uses the restored replica set.

     Expected result: The connection is successful.

  5. Check relevant system objects, such as the MongoDB connector and system objects that depend on the replica set.

     Expected result: Objects appear green and do not show any issues.

  6. Check the health monitoring tag for the MongoDB connector, such as /System/Core/Health Monitoring/MongoDB Monitoring/rs0/status/ok.

     Expected result: It has a value of 1.

  7. Check the logs.

     Expected result: New data entering the system is being stored without errors in the logs.

  8. Check an object’s historical data and compare it to the data in another location, such as the master core.

     Expected result: The data is consistent across both sources.
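The shell-based checks (steps 2 and 3) can be made less error-prone with a few helper functions. The sketch below is an assumption-laden illustration, not a product feature: all three function names are hypothetical, and the checks rely only on the ok, members, name, health, and stateStr fields of rs.status() and on the same shell calls used in step 3.

```javascript
// Step 2: list problems found in one rs.status() document.
function statusIssues(status) {
  const issues = [];
  if (status.ok !== 1) issues.push("rs.status() did not return ok: 1");
  const members = status.members || [];
  for (const m of members) {
    if (m.health !== 1) issues.push(m.name + " is unreachable or unhealthy");
  }
  if (!members.some((m) => m.stateStr === "PRIMARY")) {
    issues.push("no member is PRIMARY");
  }
  return issues;
}

// Step 2: check that two nodes report the same state for every member.
function sameMemberStates(statusA, statusB) {
  const view = (s) => (s.members || [])
    .map((m) => m.name + "=" + m.stateStr).sort().join(",");
  return view(statusA) === view(statusB);
}

// Step 3: produce a sorted database/collection listing as one string, so
// that the output of two shells can be compared directly.
function collectionListing(database) {
  const listing = {};
  for (const name of database.getMongo().getDBNames().sort()) {
    listing[name] = database.getSiblingDB(name).getCollectionNames().sort();
  }
  return JSON.stringify(listing);
}

// Runs only inside mongo/mongosh, where rs and db are defined.
if (typeof rs !== "undefined" && typeof db !== "undefined") {
  printjson(statusIssues(rs.status()));
  print(collectionListing(db));
}
```

An empty list from statusIssues on both nodes, together with identical collectionListing strings, corresponds to the expected results of steps 2 and 3. On a secondary, reads must be enabled first, as described in step 3.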