Working with a large number of files on the web can be tedious, especially if you want to download, upload, or set annotations and provenance. The command line, Python client and R client have convenience functions for bulk upload and download. Uploading require a tab delimited manifest where each file to be uploaded and, optionally, annotations to be applied, are specified as a row in the file. Downloading in bulk requires identifying a container (
View) that contains the files of interest. In this article we will cover how to:
Files to be uploaded are specified in a tab separated (
.tsv) manifest. The manifest has columns that contain information about each file to be uploaded along with annotations that will be associated with the file in Synapse.
The required columns in the manifest are:
path: the directory of the file to be uploaded
parent: the Synapse ID of the Folder to upload to
It is optional to link provenance - the
used column can indicate files that were used to create the one being uploaded, and the
executed column can indicate code (in Synapse or on the web) that was used to generate the file. Here is an example manifest that uploads a single file:
The above manifest describes a “file.csv” that will be uploaded to the Synapse folder
syn123 and named “Tardar Sauce”. The manifest describes the provenance of the file indicating that it was generated using code deposited in GitHub (https://github.com/your/code/repo) from the data in
syn654. Additionally, the file has been annotated with
emotion: grumpy and
species: cat. Additional annotations could be associated with the file by adding more columns.
Download the template.
The format of the manifest file (called ‘filesToUpload.tsv’ in this example) can be validated prior to upload by using the parameter
dryRun will not upload the data specified in the manifest file. Instead, the client checks the manifest file format, all file paths exist, all files are unique, Provenance can be set (optional) and the parent synId exists. The number of files and total upload size is also summarized in the
dryRun output. This helps ensure your data upload does not end prematurely due to a typo in the file path or parent synId.
After validating the manifest, you can now upload the files to Synapse by removing the
dryRun parameter. Once the upload is complete, you will receive an email notification. This notification will also show any errors from the upload.
Files can be downloaded in bulk using the
syncFromSynapse function. This function allows you to download all the files in a folder or project along with all the annotations and provenance on those files. A manifest file called SYNAPSE_METADATA_MANIFEST.tsv that contains the metadata will also be added in the path.
You can modify values in the manifest and re-upload it to Synapse using
syncToSynapse to edit files in bulk. The manifest allows you to modify everything: file path, provenance, annotations, and versions. If the files have not changed and you only want to update the file annotations, add a column called forceVersion to the manifest with the value False for each row. This will stop
syncToSynapse from uploading new versions of the files.
You can also update annotations using File Views.
Please note that you cannot move things with a manifest. If the parentId is changed, it will create a copy and the file will exist in two different locations.