Firebase Cloud Firestore Export and Import Data (Backup and Restore)

Firebase comes with a managed data export and import service for Cloud Firestore that allows us to back up the Firestore database to Cloud Storage and restore it from a previously exported backup. This is an important capability for several reasons:

  • Backing up your data at regular intervals is good practice.
  • Backups help recover from accidental deletion or accidental manipulation of data.
  • If you want to process and/or store large amounts of data frequently, working on separately exported data makes more sense than putting a massive load on the Cloud Firestore instance.

A few aspects of this managed export and import service to keep in mind:

  • We can export documents from all collections or just some of them.
  • We can import all data from exported data or just specific collections.
  • Data exported from one Firestore DB can be imported into another Cloud Firestore database. This will generally be cross-projects because currently Firebase allows only one Firestore DB per (Firebase/GCP) project.
  • Cloud Firestore exports can also be loaded into BigQuery.
  • To use this service, your project must be on the Blaze plan (not Spark), i.e., billing must be enabled. You can see the plan you are on in Firebase’s “Project Overview” page.

Next up, let’s see what we need to set up before we can back up or restore data.

Setup

There are two important things to do before we’re able to back up and restore data:

  • The first is to create a Cloud Storage bucket where the exported data will land. Make sure you create this bucket in a location near your Cloud Firestore location. You can go to the “Firestore Database” page in your Firebase console to find the current Cloud Firestore location.
  • The second is to make sure that the right set of permissions (IAM) for the Cloud Storage bucket and Firestore resources has been assigned to the principal (a service account in this case) that will perform the import and export operations.

Now let’s see how to do these.

Cloud Storage Bucket Setup

  1. Go to https://console.cloud.google.com/storage/browser
  2. Click CREATE BUCKET
  3. Name your bucket whatever you want to, for instance, PROJECT_ID-firestore-backup
  4. Choose the storage location (preferably near your Cloud Firestore location), storage class, access control settings, object protection tools and data encryption method.
  5. Click CREATE

Add Permissions to the Principal

Every operation in Firebase (or GCP) is run by an account (principal) that tries to access one or more resources (like a Cloud Storage bucket, Cloud Firestore, etc.). Whoever runs the export or import jobs (in this case a service account) must have the permissions granted by the following roles:

  • Cloud Datastore Import Export Admin – Grants import and export permissions. The Owner and Cloud Datastore Owner roles also grant the same permissions and more. Basically an account will require the permissions granted by these roles to perform import and export operations on Cloud Firestore.
  • Storage Admin – Grants administrative permissions on Cloud Storage (create, delete, list, etc. buckets). The Owner role also grants the same set of permissions and more. Basically an account will require these permissions to be able to export data to Cloud Storage and import data from the same.

Generally, if the export and import operations are performed from the Google Cloud Console (UI) or the gcloud utility (we will see both methods below), a default Firestore service agent (Google-managed service account) is used, which already has the required permissions. So you need not do anything here.

But if you are using the Client SDK (we will see how below), make sure that the service account whose credentials you’re passing (or the one attached to the environment the code runs in) has the above roles, or the permissions mapped to those roles, assigned to it.
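As a rough illustration of the rule above, the sketch below checks a list of role IDs against the two capabilities an export/import principal needs. The function name missingCapabilities and the capability labels are hypothetical, and the list of granted roles is assumed to be something you looked up yourself (for example in the IAM page); the code does not call any Google API.

```javascript
// Role IDs corresponding to the roles named above.
// Any one role in each list is enough for that capability.
const IMPORT_EXPORT_ROLES = [
  'roles/datastore.importExportAdmin', // Cloud Datastore Import Export Admin
  'roles/datastore.owner',             // Cloud Datastore Owner
  'roles/owner',                       // Owner
];
const STORAGE_ROLES = [
  'roles/storage.admin', // Storage Admin
  'roles/owner',         // Owner
];

// Hypothetical helper: returns the capabilities that the given
// roles do NOT cover. An empty array means nothing is missing.
function missingCapabilities(grantedRoles) {
  const granted = new Set(grantedRoles);
  const missing = [];
  if (!IMPORT_EXPORT_ROLES.some((role) => granted.has(role))) {
    missing.push('Firestore import/export');
  }
  if (!STORAGE_ROLES.some((role) => granted.has(role))) {
    missing.push('Cloud Storage access');
  }
  return missing;
}

// A service account with only Storage Admin still lacks import/export rights
console.log(missingCapabilities(['roles/storage.admin']));
```

This only compares role IDs; a custom role carrying the same permissions would not be recognised by such a check.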

Note: If you are exporting data to or importing data from a bucket in a different project, make sure you give the relevant service account in the current project (default one or the one you are using via the client SDK) access to that bucket in the other project. Following are the steps to do that:

  1. Go to https://console.cloud.google.com/storage/browser (in the relevant/different Project)
  2. Click on the Bucket > PERMISSIONS > ADD
  3. Type in the name of the service account that will run the export or import (the one from the other project, not the project that owns the bucket).
  4. Select role Storage Admin
  5. Click SAVE

Export and Import

Let’s look at the various ways in which we can export (backup) and import (restore) data.

Google Cloud Console

  1. Go to https://console.cloud.google.com/firestore/import-export
  2. Click EXPORT
  3. Select if you want to export the entire database or specific collections.
  4. Choose the Cloud Storage bucket destination where you want the data to be exported. The destination specified here should be the same as the bucket you created in the Setup step above.
  5. Click EXPORT

As you can see in the console page, the import/export job will be run as a specific service agent (Google-managed service account) of the following format:

service-[project_number]@gcp-sa-firestore.iam.gserviceaccount.com

By default this service account has appropriate permissions on the Cloud Storage buckets and Firestore to perform the import and export operations.

Now if you want to import the exported data, follow these steps:

  1. Go to https://console.cloud.google.com/firestore/import-export
  2. Click IMPORT
  3. In the Filename field click BROWSE and find the bucket where you exported data previously. In that bucket, find the file whose name ends with .overall_export_metadata and select it.
  4. Click IMPORT

gcloud

If you are using the gcloud CLI utility, you can export and import data with the following commands:

# First make sure you've selected the right project
$ gcloud config set project [PROJECT_ID]

# Export command
$ gcloud firestore export gs://[BUCKET_NAME]

# or to export specific collection IDs (groups)
$ gcloud firestore export gs://[BUCKET_NAME] \
--collection-ids=[COLLECTION_ID_1],[COLLECTION_ID_2],[SUBCOLLECTION_ID_1]

# Import command
# [EXPORT_PREFIX] should look like "2022-05-29T11:19:49_21853"
# if created from an export command before
$ gcloud firestore import gs://[BUCKET_NAME]/[EXPORT_PREFIX]

# or to import specific collection IDs (groups)
$ gcloud firestore import gs://[BUCKET_NAME]/[EXPORT_PREFIX] \
--collection-ids=[COLLECTION_ID_1],[COLLECTION_ID_2],[SUBCOLLECTION_ID_1]

Note:

  • BUCKET_NAME is the destination for the export operation and source for the import operation which should be the same as what you used in the Setup process (above).
  • The --collection-ids flag exports all collections and subcollections at any path with the specified collection IDs. Hence, effectively these collection IDs refer to collection groups.
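To make the collection group behaviour concrete, here is a small sketch that mimics which documents a given list of collection IDs would select. It is purely illustrative: selectForExport and collectionIdOf are hypothetical helpers working on plain document path strings, not part of gcloud or any SDK.

```javascript
// Returns the ID of the collection that directly contains the document.
// A document path alternates collection/document segments,
// e.g. "users/alice/posts/p2" lives in the collection with ID "posts".
function collectionIdOf(documentPath) {
  const segments = documentPath.split('/').filter(Boolean);
  if (segments.length < 2 || segments.length % 2 !== 0) {
    throw new Error(`Not a document path: ${documentPath}`);
  }
  return segments[segments.length - 2];
}

// Mimics --collection-ids: a document is selected when its containing
// collection has one of the given IDs, at any depth in the hierarchy.
// An empty list selects everything, like an export without the flag.
function selectForExport(documentPaths, collectionIds) {
  if (collectionIds.length === 0) return documentPaths;
  const wanted = new Set(collectionIds);
  return documentPaths.filter((path) => wanted.has(collectionIdOf(path)));
}

const paths = [
  'users/alice',
  'posts/p1',
  'users/alice/posts/p2',
  'users/alice/likes/l1',
];
// Both the top-level "posts" collection and the "posts" subcollection match
console.log(selectForExport(paths, ['posts']));
```

Note that selecting users here does not pull in the subcollections under each user document; those have their own collection IDs (posts, likes) and must be listed explicitly.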

This operation will run as the default Firestore service agent (Google-managed service account) which has the appropriate permissions to export/import data from/to Firestore and the Cloud Storage buckets as well. If you want to see the exact service account name, then head over to the export/import console page.

Client SDK

We can export/import our Firestore DB with the Firestore Admin SDKs as well. Let’s first see what the code (in Node.js) will look like to export the database:

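// Named firestoreAdmin because the client is created from
// firestoreAdmin.v1.FirestoreAdminClient below
const firestoreAdmin = require('firebase-admin/firestore');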

// Destination bucket
const bucket = 'gs://[PROJECT_ID]-firestore-backup';
// Service account credentials downloaded from
// Firebase console or GCP console
//
// Note: This need not be passed if the code is running
// directly in firebase functions which already has the
// required privileges to access different resources via
// the default service account attached
const config = {
  keyFilename: 'keys.json'
};
const client = new firestoreAdmin.v1.FirestoreAdminClient(config);
const databaseName = client.databasePath('PROJECT_ID', '(default)');

// Export call!
const exportPromise = client.exportDocuments({
  name: databaseName,
  outputUriPrefix: bucket,
  // Leave collectionIds empty to export all collections
  // or set a list of collection IDs to export
  collectionIds: ['users', 'posts', 'comments'],
});
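exportPromise.then((res) => {
  console.log(res);
}).catch((err) => {
  // Surface permission or bucket errors instead of an unhandled rejection
  console.error(err);
});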

Do note that if you deploy the above code as a Firebase function then you won’t have to pass the service account credentials. Next, let’s look at the code to import the exported data:

// IMPORTANT: Borrow the `client` creation code from
// the export snippet above

// Note how the bucket value this time is different
// from the export code above. This time it must be
// the folder (EXPORT_PREFIX) where the export operation put all
// the data. It can be found in the response
// of `client.exportDocuments` under the `outputUriPrefix` field
const bucket = 'gs://[PROJECT_ID]-firestore-backup/2022-05-29T17:06:52_49725';
const databaseName = client.databasePath('PROJECT_ID', '(default)');

const importPromise = client.importDocuments({
  name: databaseName,
  // This time it's inputUriPrefix and not outputUriPrefix
  inputUriPrefix: bucket,
  // Empty array to import all collections
  collectionIds: ['users', 'posts', 'comments'],
});
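importPromise.then((res) => {
  console.log(res);
}).catch((err) => {
  // Surface permission or path errors instead of an unhandled rejection
  console.error(err);
});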
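Since the export and import requests differ only in the URI field and in what that URI must point to, it may help to build them with small helpers. The following is a minimal sketch under that assumption: buildExportRequest, buildImportRequest and databasePath are hypothetical names (the latter mirrors the string format shown by client.databasePath), and the check on the import URI is only a sanity check, not something the SDK requires.

```javascript
// Same format as the database name used above:
// projects/PROJECT_ID/databases/(default)
function databasePath(projectId, databaseId = '(default)') {
  return `projects/${projectId}/databases/${databaseId}`;
}

// Export writes into the bucket itself; Firestore creates the
// timestamped folder (EXPORT_PREFIX) on its own.
function buildExportRequest(projectId, bucketName, collectionIds = []) {
  return {
    name: databasePath(projectId),
    outputUriPrefix: `gs://${bucketName}`,
    collectionIds, // empty array = all collections
  };
}

// Import must point at the export folder, not just the bucket,
// so reject URIs without a path after the bucket name.
function buildImportRequest(projectId, exportUri, collectionIds = []) {
  if (!/^gs:\/\/[^/]+\/.+$/.test(exportUri)) {
    throw new Error(`Expected gs://BUCKET/EXPORT_PREFIX, got: ${exportUri}`);
  }
  return {
    name: databasePath(projectId),
    inputUriPrefix: exportUri,
    collectionIds, // empty array = all collections
  };
}

console.log(buildExportRequest('my-project', 'my-project-firestore-backup', ['users']));
console.log(buildImportRequest(
  'my-project',
  'gs://my-project-firestore-backup/2022-05-29T17:06:52_49725'
));
```

The resulting objects can be passed to client.exportDocuments and client.importDocuments in place of the inline object literals; my-project is a placeholder project ID.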

Export/Import Operation Commands

Once you start an export or import operation, a unique name is assigned to it that you can use to check the status of the operation, cancel it and even delete it from the list. For example, when we trigger an export via the gcloud utility, we see something like this:

$ gcloud firestore export gs://[BUCKET_NAME]
Waiting for [projects/PROJECT_ID/databases/(default)/operations/ASA2MzU0MTIwMjEzChp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKLRI] to finish...

In the example above, the operation name is ASA2MzU0MTIwMjEzChp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKLRI, which is prefixed with projects/PROJECT_ID/databases/(default)/operations/. If you triggered the operation from the Google Cloud Console (UI), then run the following command to get the operation name:

# List all running and recently completed export and import operations
$ gcloud firestore operations list
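If you handle these names in a script, the prefixed form can be split into its parts with a small parser. This is a sketch only: parseOperationName is a hypothetical helper, and it assumes the exact projects/.../databases/.../operations/... layout shown above.

```javascript
// Splits a fully qualified operation name into its components.
// Returns null when the string does not follow the expected layout.
function parseOperationName(fullName) {
  const match =
    /^projects\/([^/]+)\/databases\/([^/]+)\/operations\/([^/]+)$/.exec(fullName);
  if (!match) return null;
  return {
    projectId: match[1],
    databaseId: match[2],
    operationId: match[3],
  };
}

const parsed = parseOperationName(
  'projects/my-project/databases/(default)/operations/' +
  'ASA2MzU0MTIwMjEzChp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKLRI'
);
console.log(parsed.operationId);
```

Here my-project stands in for PROJECT_ID; the operation ID is the one from the example output above.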

Now using the operation name, we can perform the following actions:

# Describe an operation
$ gcloud firestore operations describe [OPERATION_NAME]

# Cancel an operation
# Note: Cancel will not undo an operation, i.e., a cancelled export
# operation will leave exported documents in Cloud storage as is and
# a cancelled import operation will not undo the updates already made
# to the Cloud Firestore database
#
# Also after an export or import operation is triggered via gcloud
# closing the terminal will not cancel the operation :D
$ gcloud firestore operations cancel [OPERATION_NAME]

# Delete an operation
# Note: Deleting only deletes the operation from the list of
# recent operations
$ gcloud firestore operations delete [OPERATION_NAME]

Let’s look at the output of describe in a bit more detail:

$ gcloud firestore operations describe [OPERATION_NAME]
metadata:
  '@type': type.googleapis.com/google.firestore.admin.v1.ExportDocumentsMetadata
  operationState: PROCESSING
  outputUriPrefix: gs://[BUCKET_NAME]/2022-05-29T11:19:49_21853
  progressBytes:
    completedWork: '162199629'
    estimatedWork: '151708310'
  progressDocuments:
    completedWork: '156243'
    estimatedWork: '161500'
  startTime: '2022-05-29T11:19:49.930201Z'

I just want to talk a bit about the progressBytes and progressDocuments fields:

  • estimatedWork is an estimate of the total number of bytes or documents the operation will process.
  • completedWork is the number of bytes or documents processed so far. By the end of the operation, the completedWork numbers may be higher than the estimatedWork numbers.

If you want to estimate the progress, dividing completedWork by estimatedWork gives an approximation, not an exact figure.
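A minimal sketch of that division, using the numbers from the describe output above. The helper name estimateProgress is hypothetical, and capping the ratio at 1 is a design choice made here because completedWork can overshoot estimatedWork; the values arrive as strings, so they are converted first.

```javascript
// Returns an approximate progress ratio between 0 and 1,
// or null when the estimate is missing or not usable.
function estimateProgress(progress) {
  const completed = Number(progress.completedWork);
  const estimated = Number(progress.estimatedWork);
  if (!Number.isFinite(completed) || !Number.isFinite(estimated) || estimated <= 0) {
    return null;
  }
  // completedWork may exceed estimatedWork, so cap at 100%
  return Math.min(completed / estimated, 1);
}

// Values copied from the describe output above
const metadata = {
  progressBytes: { completedWork: '162199629', estimatedWork: '151708310' },
  progressDocuments: { completedWork: '156243', estimatedWork: '161500' },
};

console.log(estimateProgress(metadata.progressBytes));     // 1 (capped)
console.log(estimateProgress(metadata.progressDocuments)); // roughly 0.967
```

In this snapshot the byte count has already overshot its estimate while the document count has not, which is why neither number should be read as exact.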

Other Points

Some additional points to know/keep in mind:

  • The output of the exported data is in LevelDB log format.
  • Export and Import operations are billed or charged at the rates listed in Cloud Firestore pricing.
  • If you want to copy the exported data to another machine or to your local machine, you can use whatever method you prefer to download the exported files from Cloud Storage. For instance, gsutil cp -r [src_url] [dest_url] can do the job here.

If you’d like to learn how to schedule automated backups and restores at regular intervals, then read the next article.
