Use Globus to synchronize data between your campus and ACCESS

Some research projects—particularly those with team members at multiple campuses—need to maintain copies of the project's data on a campus system and on an ACCESS system. Keeping these copies synchronized is important, but it shouldn't be time-consuming. You can automate this synchronization using Globus. The steps to set this up are as follows.

Perform an initial transfer between your campus and ACCESS

To synchronize data between campus and ACCESS storage, you first need to be able to transfer data between your project's campus storage and its ACCESS storage. ACCESS uses Globus for data transfers, and many research campuses also have Globus set up for research use. If your campus doesn't already have Globus set up for your use, you can set it up yourself.

Follow the instructions for transferring data with Globus, and make a copy of your project's data on your project's ACCESS storage (or on your campus storage, depending on where the data starts). Once you have a copy in both places, you're ready to set up synchronization.

Determine the synchronization direction(s)

The manner in which your research project produces new data will determine how the data should be synchronized between campus and ACCESS.

  • If new data is produced only in one place (either on campus or on ACCESS) and you need it to automatically appear in the other place, you'll need synchronization in only one direction: either from campus to ACCESS, or from ACCESS to campus.

  • If new data is produced both on campus and on ACCESS, you'll need bi-directional synchronization.

To synchronize in a single direction, you'll create a repeating task that synchronizes new data from the source to the destination. Your source is where new data is created, and your destination is where the copy is needed. For example, you can create a task that copies new data from your project's campus storage to your project's ACCESS storage every hour. The task will only transfer data that isn't already on the ACCESS storage.

For bi-directional synchronization, you'll create two repeating tasks: one in each direction. Each task will only transfer data that appears on the source and isn't already on the destination.
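The difference between the two cases can be sketched in a few lines of Python. This is only a conceptual model, not Globus code: the file names are hypothetical, files are compared by name alone, and deletions or modified files are not modeled.

```python
def files_to_copy(source: set[str], destination: set[str]) -> set[str]:
    """Return the files present on the source but missing from the destination."""
    return source - destination


def run_one_way(source: set[str], destination: set[str]) -> set[str]:
    """Simulate one run of a one-way task and return the updated destination."""
    return destination | files_to_copy(source, destination)


# Hypothetical file listings, for illustration only.
campus = {"run1.fastq", "run2.fastq", "run3.fastq"}
access = {"run1.fastq", "results.csv"}

# Single direction (campus -> ACCESS): only the missing files are copied.
to_access = files_to_copy(campus, access)

# Bi-directional: a second task covers the opposite direction.
to_campus = files_to_copy(access, campus)

# After both tasks have run once, the two locations hold the same files.
access_after = run_one_way(campus, access)
campus_after = run_one_way(access, campus)
```

In this model, a single one-way task leaves "results.csv" only on ACCESS; it takes the second task to bring it back to campus.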

Set up synchronization

Assuming you've already used Globus to make a copy of your data, setting up a repeating synchronization task is straightforward. The first step is exactly the same as for the first data transfer you performed: log in to the Globus web app and locate the data source and the destination. (The source is where new data will appear, and the destination is where the new data will be copied to.) Figure 15 shows the setup for synchronizing a folder called "sequencer-data" on campus storage to ACCESS's Darwin resource at the University of Delaware.

Figure 15. Setting up synchronization between campus and ACCESS. In this example, the folder to be synchronized is called "sequencer-data," and it will be synchronized from campus storage to ACCESS storage.

After you've located the data source and destination and clicked to select the folder to be synchronized, click Transfer & Timer Options between the two Start buttons. (The place to click is circled in red in Figure 15.) This will display the options for synchronization, task start time, and repeating, shown in Figures 16 and 17. Always check the "sync" box! (If you don't check this box, every file will be transferred every time the task runs, even if it's already on the destination.)

Figure 16. Setting synchronization options to use file modification time

In the example in Figure 16, we set options to transfer new files and files with a newer modification time on the source system, and to copy the modification time along with the file's contents. (You could instead transfer only new files, or new files plus files that have changed size, or new files plus files whose checksums at source and destination do not match.) We also set an option to terminate the transfer if a quota error is detected on the destination (that is, the destination has run out of available storage). Without that option, Globus will continue attempting to transfer data until the quota is increased, you manually cancel the task, or several days pass without any improvement in the error.
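To make the four choices concrete, here is a minimal Python sketch of the decision each one implies for a single file. It is a simplified model of the rules as described above, not Globus's implementation; the `FileInfo` class, the `needs_transfer` function, and the labels "exists", "size", "mtime", and "checksum" are names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    size: int       # size in bytes
    mtime: float    # modification time, seconds since the epoch
    checksum: str   # any content digest


def needs_transfer(src: FileInfo, dst: Optional[FileInfo], rule: str) -> bool:
    """Decide whether one file should be copied under the given rule."""
    if dst is None:
        return True                         # new files are always copied
    if rule == "exists":
        return False                        # only new files
    if rule == "size":
        return src.size != dst.size         # new files plus changed size
    if rule == "mtime":
        return src.mtime > dst.mtime        # new files plus newer on source
    if rule == "checksum":
        return src.checksum != dst.checksum # new files plus differing content
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical example: same size and content, but the source copy is newer.
source_file = FileInfo(size=100, mtime=2000.0, checksum="abc")
dest_file = FileInfo(size=100, mtime=1000.0, checksum="abc")

decisions = {
    rule: needs_transfer(source_file, dest_file, rule)
    for rule in ("exists", "size", "mtime", "checksum")
}
```

In this example only the modification-time rule would copy the file again, which illustrates why the choice of rule matters when files are rewritten without their size or content changing.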

Figure 17. Setting options to repeat the synchronization every two hours until 11pm on November 19, 2022

In Figure 17, we show how to set options to repeat the synchronization every two hours, ending at 11pm on November 19, 2022. (That's when this particular research project's ACCESS allocation ends.) You could instead specify a number of times to repeat the task (e.g., once per day for 60 days) or make it continue indefinitely until you manually delete the timer. You may also set the time of the first synchronization if you don't want it to begin immediately.
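If it helps to see what a given schedule will produce, the run times can be worked out with a short Python sketch. This is only arithmetic on dates, not a call to Globus: the `run_times` function and the 9am start time are hypothetical, and the sketch assumes a run that falls exactly on the end time still takes place.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def run_times(start: datetime, interval: timedelta,
              end: Optional[datetime] = None,
              count: Optional[int] = None) -> List[datetime]:
    """List scheduled runs, stopping at an end time or after a number of runs."""
    if end is None and count is None:
        raise ValueError("give an end time or a run count; "
                         "open-ended schedules are not modeled here")
    times: List[datetime] = []
    current = start
    while (end is None or current <= end) and (count is None or len(times) < count):
        times.append(current)
        current += interval
    return times


# Hypothetical first run at 9am on the final day of the allocation.
start = datetime(2022, 11, 19, 9, 0)
end = datetime(2022, 11, 19, 23, 0)

# Every two hours until 11pm on November 19, 2022.
every_two_hours = run_times(start, timedelta(hours=2), end=end)

# Alternative: once per day for 60 days.
daily = run_times(start, timedelta(days=1), count=60)
```

Listing the runs this way is a quick check that the interval and end time you plan to enter cover the period you intend.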

When you're finished setting your synchronization and repeating options, scroll up to the top of the File Manager window and click the Start button. Figure 18 shows how Globus will let you know that your task has started.

Figure 18. After clicking the Start button, Globus will confirm that the timer task was submitted successfully.

If you've determined that your data only needs to be synchronized in one direction, you're finished! You can move on to the next section on monitoring your synchronization.

If you've determined that you need bi-directional synchronization, you can repeat this process and create a second synchronization task in the other direction. The easiest way to do that is to remain on the File Manager screen, double-check that the Transfer & Timer Options are still the way you need them, and click the other Start button (the one with the arrow in the other direction). This will create the exact same synchronization task with the Source and Destination swapped.

Monitor synchronization

Your synchronization task will run according to the schedule you set. You'll receive an email notification each time the task runs. Each notification contains a summary of what was transferred. You can also view the task history in the Globus web app. As shown in Figure 19, click the Activity icon on the left side of the Globus web app, and you'll see recent activity.

Figure 19. Click Activity to see recent Globus activity, including repeating synchronization tasks. Click the arrow to the right of any entry to see details.

Click the Timers tab at the top of the display, as shown in Figure 20, to view your active timers. Here, you can cancel a timer by clicking the trash can on the right side of the timer entry.

Figure 20. View any timers you've created using the Timers tab on the Activity page.

Click the arrow on the right side of any entry in the Timers list to see details, as shown in Figure 21.

Figure 21. The details page shows everything about a repeating task, including synchronization options.

The Timer Log tab on the details page displays a list of every time the task has run so far, as shown in Figure 22. For each task execution, you can view a summary of what was transferred.

Figure 22. The Timer Log tab displays a list of each time the task has run. Click "view task" to see details of a run.

Conclusion

It's easy to maintain a synchronized copy of your project's data in two or more locations. This can enable collaboration with research partners, facilitate automated data processing, or gather data from sources at multiple campuses. If you can transfer data between two locations—such as your campus and ACCESS—you can also keep the copies synchronized with a repeating synchronization task. Once set up, the synchronization will happen automatically until your repeating schedule ends or you cancel the task.