Overview
Users should submit SARS-CoV-2 sequences to public repositories, such as GISAID and GenBank, to share their data with coronavirus researchers and the public health community. Here we describe how to download sequences and metadata from your CZ GEN EPI account for GISAID data submissions. Please note that you need to register with GISAID to be able to upload data into their repository. We also list general steps to upload your data into GISAID. You can find a detailed protocol regarding GISAID SARS-CoV-2 submissions here. Once you submit your data to GISAID, remember to change your sample Public IDs in CZ GEN EPI to GISAID IDs.
Click here if you are interested in submitting to GenBank. You may also be interested in reading about differences between GenBank and GISAID.
After reading this guide, you will be able to:
- Download sequences and metadata from your CZ GEN EPI account
- Edit metadata files to comply with GISAID requirements
- Upload data to the EpiCoV platform within GISAID
Downloading data to be uploaded in GISAID
There are two files required for GISAID data submissions, namely a sequence fasta file and a metadata file. You can easily download both files from your CZ GEN EPI account. The sequence file will be ready for your GISAID submission ‘as is’ when you download it. However, the downloaded metadata file is only a partially filled template that will need to be edited to comply with GISAID format requirements (see below). Download data files from your CZ GEN EPI account following these steps:
1. Select the samples you are interested in submitting to GISAID from the SARS-CoV-2 Sample page and click on the "Download" icon. Note that GISAID has a limit of 1000 samples per submission, so you should not select more than 1000 samples to download.
   Click on the "Download" icon on the right-hand side of the SARS-CoV-2 Sample page after selecting samples of interest.
2. A "Select Download" dialog box will appear. Select "GISAID Submission Template". This will allow you to download a sequence file (fasta format) and a metadata file (".tsv" file extension).
   Select "GISAID Submission Template" from the "Select Download" dialog box. Once you make your selection, you will be able to download it by clicking on ‘Download’.
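The 1000-sample limit mentioned above can be checked on the downloaded sequence file before you start a submission. The following is a minimal sketch, not part of CZ GEN EPI or GISAID tooling: the function names are hypothetical, and it assumes a standard fasta file in which every sequence starts with a header line beginning with `>`.

```python
GISAID_MAX_SAMPLES = 1000  # per-submission limit stated in this guide


def count_fasta_records(fasta_text: str) -> int:
    """Count sequences by counting header lines, which start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def within_gisaid_limit(fasta_text: str, limit: int = GISAID_MAX_SAMPLES) -> bool:
    """Return True when the text holds at least one and at most `limit` sequences."""
    return 0 < count_fasta_records(fasta_text) <= limit


# Small inline example; in practice, read the text from your downloaded file,
# e.g. pathlib.Path("your_file.fasta").read_text()
sample = ">sample_1\nACGT\n>sample_2\nACGT\n"
print(count_fasta_records(sample))  # 2
print(within_gisaid_limit(sample))  # True
```

If the count exceeds the limit, go back to the Sample page and download a smaller selection.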
Editing your metadata file for submission
Now that you have downloaded the data, you need to edit your metadata file to comply with GISAID requirements. The metadata file downloaded from CZ GEN EPI follows the format requested by GISAID and already has some information filled in, so you only need to edit a few of the fields. Below we provide a table describing the metadata requested by GISAID and the information that is already in your downloaded CZ GEN EPI metadata file. The table also highlights what needs to be done to complete the requested metadata.
Note the following information on the table:
- 1st column: Required metadata by GISAID is highlighted in orange.
- 2nd column: Description or examples of requested metadata.
- 3rd column: Information automatically filled in by CZ GEN EPI (‘blank’ fields are left empty).
- 4th column: Describes what users need to do to complete the metadata, including:
  - No action (light purple): Most of the required fields are filled in by CZ GEN EPI and users do not need to add any information. However, we recommend that users read carefully and verify the information filled in by CZ GEN EPI.
  - Required to enter information (orange): Metadata that will need to be entered by users. Enter the requested information when editing the file.
  - Edit or enter information (optional; yellow): If you have more information for the fields highlighted in yellow, provide the appropriate metadata.
You can complete the information highlighted in the table above within the downloaded CZ GEN EPI metadata file by 1) editing the metadata file directly; or 2) transferring the downloaded data to a GISAID metadata template.
Option 1: Edit the file directly
You can simply open the downloaded metadata file by importing it into Excel as text (or copying the data into an Excel file) or using any text editor. Here are some general guidelines to keep in mind (see snapshots below regarding Excel):
- If you are working with Excel and would like to open the file, make sure to import the downloaded metadata file as text. Opening the file directly, without importing it, will disrupt the format.
- Make sure your dates are in the correct format (YYYY-MM-DD) while editing your file in Excel.
- Keep the headers as they are (first two rows).
- The FASTA filename column should match the name of the fasta file containing the consensus genomes.
- Name your file with date of submission and a descriptive name that specifies metadata (“YYYYMMDD_a_descriptive_name_metadata”). The downloaded metadata file from CZ GEN EPI follows this format.
- Save the edited file as comma-delimited (".csv").
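If you edit the ".tsv" file in a text editor rather than in Excel, the final conversion to comma-delimited format can be scripted. The sketch below uses only Python's standard `csv` module; the function name `tsv_to_csv` is hypothetical and this is not an official CZ GEN EPI or GISAID tool. Every cell is handled as plain text, so the two header rows and any YYYY-MM-DD dates pass through unchanged.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to comma-delimited text, cell by cell.

    Cells are never reinterpreted as numbers or dates; fields that contain
    a comma are quoted automatically by the csv writer.
    """
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# In practice, read the downloaded file and write the result, for example:
#   text = pathlib.Path("downloaded_metadata.tsv").read_text(encoding="utf-8")
#   pathlib.Path("edited_metadata.csv").write_text(tsv_to_csv(text), encoding="utf-8")
# Both file names above are placeholders.
```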
To import a file as text using Excel, go to "Data", click on "Get Data" and select "From Text".
While importing data, the Text Import Wizard box will appear. Select "Delimited" and click "Next". Select "Tab" as the delimiter and click "Finish". Within the Import Data box, select the first cell of your worksheet (A1) to put your data and click "OK".
If you import the metadata file into Excel, make sure the date format is set to YYYY-MM-DD. You can select the required format from the "Format Cells" dropdown menu.
If you are copying data into Excel, make sure to keep dates in the correct format.
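Because Excel can silently rewrite dates (for example, to a form like 3/5/2021), it may be worth checking the date values after editing. The sketch below is one possible check, with hypothetical function names; it accepts only complete dates in the YYYY-MM-DD format named in this guide and assumes you have already extracted the date column as a list of strings.

```python
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as 2021-3-5, so compare
    # against the zero-padded form to enforce the exact layout.
    return parsed.strftime("%Y-%m-%d") == value


def find_bad_dates(values):
    """Return (position, value) pairs, counting from 1, for entries that fail the check."""
    return [(i, v) for i, v in enumerate(values, start=1) if not is_iso_date(v)]


print(find_bad_dates(["2021-03-05", "3/5/2021"]))  # [(2, '3/5/2021')]
```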
Option 2: Transfer the information to a metadata template from GISAID
You can download a metadata template from GISAID and transfer the information from the downloaded CZ GEN EPI metadata file. To download and edit the GISAID metadata template:
1. Log into your GISAID account. If you already have a template, skip to step 5 below.
   GISAID login page
2. Navigate to the EpiCoV platform and click on ‘Upload’.
   EpiCoV platform. Click ‘Upload’ to navigate to the submission interface.
3. A popup will appear. Select ‘Batch upload’.
   Click ‘Batch upload’ to continue to the EpiCoV submission interface.
   A second popup will appear. Click ‘OK’ to continue to the submission interface (you will not be submitting data from the command line).
4. Download instructions and the metadata template (‘xls’ file extension).
   Click on ‘Download Instructions and Template’ at the bottom of the page to download the metadata template.
5. Open the template in Excel. You will notice two worksheets within the downloaded template: ‘Instructions’ and ‘Submissions’. Click on the ‘Submissions’ worksheet.
   Worksheets within the downloaded GISAID template. Edit information on the ‘Submissions’ worksheet.
6. Complete the metadata required by GISAID by copying information from the file you downloaded from CZ GEN EPI and pasting it into the GISAID template. Keep the headers as they are (first two rows) and fill in information as needed. Note that you can copy the downloaded metadata file and paste over (overwrite) the entire template, given that the headers are the same in both files.
   Downloaded GISAID metadata template opened in Excel. Required metadata are highlighted in orange font.
   When pasting date information, make sure it stays in the correct format (YYYY-MM-DD). Setting the cell format to ‘Text’ often helps avoid unwanted changes.
   If you are copying data into Excel, make sure to keep dates in the correct format.
7. After editing, save your file using the following format:
   - Filename: Include the date of submission and a descriptive name that specifies metadata. General required format: ‘YYYYMMDD_a_descriptive_name_metadata’. The downloaded metadata file from CZ GEN EPI follows this format.
   - File extension: Use file extension ‘.xls’ (EXCEL 97 - 2003 Format, NOT ‘.xlsx’) or ‘.csv’.
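The naming rule above can be captured in a small helper. This is an illustrative sketch only: the function name is hypothetical, and replacing spaces and punctuation in the description with underscores is a stylistic assumption, not a documented GISAID requirement.

```python
import re
from datetime import date

# Extensions listed in this guide; '.xlsx' is explicitly not accepted.
ALLOWED_EXTENSIONS = {".xls", ".csv"}


def metadata_filename(submitted: date, description: str, extension: str = ".csv") -> str:
    """Build a name of the form YYYYMMDD_a_descriptive_name_metadata plus extension."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension must be one of {sorted(ALLOWED_EXTENSIONS)}")
    # Collapse anything that is not a letter or digit into single underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return f"{submitted:%Y%m%d}_{slug}_metadata{extension}"


print(metadata_filename(date(2021, 3, 5), "county lab batch 2"))
# 20210305_county_lab_batch_2_metadata.csv
```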
Uploading and submitting data to GISAID
Now that you have edited the metadata file, you are ready to upload your data into GISAID. See the SARS-CoV-2 GISAID submission protocol for details about the submission process. Here we summarize steps to upload the downloaded fasta file from CZ GEN EPI and the edited metadata file to the EpiCoV platform within GISAID. To submit data:
1. Go to the EpiCoV platform within GISAID for Batch Upload (see steps 1 through 3 in the previous section). Upload your files using the web interface.
   EpiCoV interface for uploading data.
2. Select your sequence quality confirmation option from the dropdown menu. This is important because the GISAID data curation team evaluates the quality of your sequences by looking for gaps and/or insertions that result in frameshifts. Such frameshifts can result from real insertions and deletions as part of virus evolution, in which case your sequence does not need to be corrected. Frameshifts can also result from sequence assembly errors, which you can correct through quality checks. Therefore, GISAID confirms with data submitters that detected frameshifts are indeed supported and reliably captured by their sequencing and assembly workflows. GISAID then attaches a flag to the sequence indicating whether or not frameshifts have been verified.
   You can choose one of three options regarding sequence quality confirmation. You can also contact the curation team to note that frameshifts in your sequences have not been verified, but that you would like to release the sequences as they are.
   There are three confirmation options:
   a. Notify me about ALL DETECTED FRAMESHIFTS AND/OR SPIKE TRUNCATIONS in this submission for reconfirmation of affected sequences.
   b. Notify me only about NOT PREVIOUSLY REPORTED FRAMESHIFTS AND/OR SPIKE TRUNCATIONS in this submission for reconfirmation of affected sequences.
   c. I confirm ANY FRAMESHIFTS AND/OR SPIKE TRUNCATIONS in this submission and request their release without reconfirmation by a Curator.
   We suggest that data submitters pick the second option (b) in the dropdown menu. By making this selection, you only need to reconfirm sequences containing frameshifts that have not been seen before (see Types of frameshifts and when to fix them for details). Once you confirm the sequences, you can re-submit and select the third confirmation option (c) during submission. You can also choose to send a message to the GISAID sequence curation team to indicate that frameshifts in your sequences have not been verified, but you would like to release the sequences as they are.
3. Once you pick a confirmation option, click ‘Check and Submit’ at the bottom of the page to finalize your submission.
   After uploading files and selecting a confirmation option, click ‘Check and Submit’ to finish your submission.
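Before uploading, it can help to confirm the guideline from the editing section that the FASTA filename column matches the name of your fasta file. The sketch below is one way to do that for a ".csv" metadata file. It is not a GISAID tool: the function name is hypothetical, and the header text "FASTA filename" is an assumption about how that column is labeled in one of the two header rows, so adjust `column_label` to match your file.

```python
import csv
import io


def mismatched_fasta_names(csv_text, fasta_name, column_label="FASTA filename", header_rows=2):
    """Return (row_number, value) pairs where the FASTA filename cell differs from fasta_name."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    headers, data = rows[:header_rows], rows[header_rows:]

    # Look for the column label in any of the header rows.
    column = None
    for header in headers:
        for index, cell in enumerate(header):
            if cell.strip() == column_label:
                column = index
    if column is None:
        raise ValueError(f"no header cell equal to {column_label!r}")

    problems = []
    for row_number, row in enumerate(data, start=header_rows + 1):
        if not any(cell.strip() for cell in row):
            continue  # ignore completely empty rows
        value = row[column] if column < len(row) else ""
        if value != fasta_name:
            problems.append((row_number, value))
    return problems
```

An empty list means every data row names the expected fasta file; any entries returned point to rows worth correcting before you click ‘Check and Submit’.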