Overview
Are you ready to evaluate how your SARS-CoV-2 genome sequences compare to sequences within your community and at large? The first thing you need to do is upload your consensus genome sequences and their respective sample information (known as metadata) to the CZ GEN EPI platform and you will be on your way to learning useful information.

Below is a step-by-step guide to upload data to the CZ GEN EPI platform. Follow the guide to upload your data in four simple steps:
Step 1: Navigate to the web interface for uploading data
Step 2: Select your sample or sequence file
Step 3: Add your sample metadata
Step 4: Review your submission and upload data
If you are interested in learning how to get your data files ready before uploading, jump ahead to:
Things to know about your sequence data file
After reading this user guide, you will be able to:
- Get your data ready to upload
- Upload your sequences and metadata information
- Understand uploaded sample information on your main dashboard (Sample page).
Step 1: Navigate to the web interface where you will upload your data
First, sign in to your CZ GEN EPI account (if you don’t have an account, request one here). Click ‘Upload’ on the right-hand corner of your main dashboard. This will take you to the ‘Upload Samples’ page.
To begin uploading data, find the ‘Upload’ icon on your dashboard.
Once you click ‘Upload’, you will be directed to the ‘Upload Samples’ page.
Step 2: Select your sample or sequence file
Before uploading your sequence data file (same as sample file), there are a few things you should note regarding format.
Things to know regarding sequence data files
- File format:
- Sequence or sample names:
- When preparing sequence fasta files, make sure that:
- Sequence or sample names are LESS than 120 characters long.
- Sequence or sample names ONLY contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-), underscores (_), and forward slashes (/).
- Sequence or sample names DO NOT contain spaces.
- File or sequence names DO NOT contain any personally identifiable information (PII).
- File limit:
- A maximum of 500 sequences can be uploaded at once, so a single fasta file can contain at most 500 sequences. However, we highly recommend uploading no more than 200 sequences per file: the smaller the sequence file (for example, 200 vs 500 sequences), the faster it will upload to the platform. This limit does not prevent you from uploading more than 500 sequences to your CZ GEN EPI account across multiple batches or uploads (for example, you can upload 1000 sequences by splitting them into multiple fasta files containing 200 sequences each and uploading them individually).
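The naming rules and upload limits above can be checked locally before you upload. The following is a minimal Python sketch and not part of the CZ GEN EPI platform: the function names (`read_fasta`, `check_names`, `split_into_batches`) are hypothetical, and it assumes that the entire header line after `>` is the sample name.

```python
import re

MAX_NAME_LENGTH = 120            # names must be shorter than this
MAX_SEQUENCES_PER_UPLOAD = 500   # hard limit per upload
RECOMMENDED_BATCH_SIZE = 200     # recommended sequences per file
# Letters, digits, periods, hyphens, underscores, and forward slashes only.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9._/-]+")


def read_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the whole header line is treated as the sample name.
            name, chunks = line[1:], []
        elif line.strip() and name is not None:
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_names(records):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if len(records) > MAX_SEQUENCES_PER_UPLOAD:
        problems.append(
            f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES_PER_UPLOAD}"
        )
    for name, _ in records:
        if len(name) >= MAX_NAME_LENGTH:
            problems.append(f"name too long: {name!r}")
        if not ALLOWED_NAME.fullmatch(name):
            problems.append(f"name has disallowed characters: {name!r}")
    return problems


def split_into_batches(records, size=RECOMMENDED_BATCH_SIZE):
    """Split records into consecutive batches of at most `size` sequences."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

Each batch returned by `split_into_batches` could then be written to its own fasta file and uploaded separately. This check does not look for personally identifiable information, which still needs to be reviewed by hand.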
Select your sequence data file
Now that you know the appropriate format for your sequence data file and have it ready, upload sequences by following these steps:
- Click on ‘Select Sample Files’ within the ‘Upload Samples’ page and select the sequence file(s) of interest from your computer. Note that you can select multiple files as long as the total number of sequences does not exceed 500 (remember that we recommend uploading up to 200 sequences).
To upload sequence file(s), click on ‘Select Sample Files’ within the ‘Upload Samples’ page.
- Once you select your data file(s), review the summary of the number of files and total number of sequences ready to be uploaded under the ‘Select Sample Files’ icon. If you are done selecting sequence files to upload, click ‘Continue’ on the bottom-left corner of your screen. This will direct you to the ‘Metadata and Sharing’ page where you will add sample metadata (see Step 3).
After selecting your sequence file(s) you will see the total number of files imported and total number of samples or sequences selected for upload. When done adding sequence file(s) click ‘Continue’.
Once you select your sequence file(s) you will be directed to the ‘Metadata and Sharing’ page.
Step 3: Add your sample metadata
After selecting your sequence file(s), you will be prompted to add your sample metadata through the ‘Metadata and Sharing’ page. Before adding your metadata, there are a few things you should note regarding required information and how you can add it to the platform.
Things to know regarding metadata
Which metadata do you need?
Required metadata:
- Sample Name: Sample name (same as sequence name) specified in your sequence fasta file(s).
- Private ID: Private sample name or identifier from your group. Note that this could be the same as the sample name on your sequence fasta file.
- Collection Date: The date the sample was originally collected.
- Required format: YYYY-MM-DD
- Collection Location: The location where samples were originally collected. Following the general format below will ensure that your specific location is found in the reference data from GISAID. General format: Continent or Region/Country/State or Province/Division.
- Examples of acceptable formats:
- North America/USA/California/San Francisco County
- North America/USA/Illinois/Chicago
- North America/USA/California
- Sample Privacy: You need to specify if you wish to keep your sample data private under a ‘Sample is Private’ field. By default, your data is public. Please note that ‘Public’ within CZ GEN EPI means that your data will be visible to groups or organizations with which you already have a data sharing relationship (for example, California DPHs with California Department of Public Health). ‘Public’ DOES NOT mean that everyone can see it. Even if you decide to share your data with other groups and organizations, your sample private identifiers will remain private (only you and your group can see sample private identifiers).
Optional metadata:
- GISAID ID (Public ID): If sequences have been uploaded to the GISAID public repository, you can provide the public ID that is available on GISAID.
- Sequencing Date: The date the samples were sequenced.
- Required format: YYYY-MM-DD
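To illustrate the date and location formats described above, here is a small Python sketch you could adapt to check values before entering them. It is an illustration only: the function names and the level names in `LOCATION_LEVELS` are hypothetical, and the assumption that a location has between one and four non-empty levels is ours, not a documented platform rule.

```python
from datetime import datetime

# Hypothetical labels for the four levels of the general location format.
LOCATION_LEVELS = ("region", "country", "division", "location")


def parse_iso_date(value):
    """Return a date if value is strictly YYYY-MM-DD, otherwise None."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None
    # strptime also accepts unpadded fields such as "2021-3-5",
    # so require that the value round-trips exactly.
    return parsed if parsed.isoformat() == value else None


def split_location(value):
    """Split 'Region/Country/Division/Location' into labeled levels, or return None."""
    parts = [part.strip() for part in value.split("/")]
    if len(parts) > len(LOCATION_LEVELS) or any(not part for part in parts):
        return None
    return dict(zip(LOCATION_LEVELS, parts))
```

For example, `split_location("North America/USA/California")` returns the first three levels, while a value with an empty level such as `"North America//California"` returns `None`.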
How can you upload metadata?
There are two ways to add your metadata to the CZ GEN EPI platform after uploading your sequence data file(s):
- Manual entry: You can enter metadata information manually through the web interface. This option is not recommended if you are uploading over 10 sequences with different metadata. Note that manual entry is not possible when uploading over 100 sequences.
- Downloading, completing, and uploading a metadata file: You can upload a table containing metadata information. This option is recommended any time you are uploading multiple sequences with different metadata and is the only option when uploading over 100 sequences. Note the following:
- You can only upload a single metadata file, even if you uploaded multiple sequence fasta files. This file needs to contain the metadata for each and all the uploaded sequences (up to 500).
- You will be able to download a tab-delimited table template (file extension ‘.tsv’) containing all of your sample names based on the sequence file(s) provided in Step 2. You can then edit the table by completing the remaining metadata information.
- To prepare your metadata file, it is best to download the provided table template after adding sequence file(s) in Step 2. However, you can use your own tab- or comma-delimited file (‘.tsv’ or ‘.csv’ file extensions, respectively) if you follow the template format.
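If your metadata already lives in another system, a tab-delimited file in the template layout can also be generated with a short script. The Python sketch below is one possible approach: the column headers in `COLUMNS` are assumptions based on the metadata fields listed in this guide, so copy the exact headers from the downloaded template before using it, and `write_metadata_tsv` is a hypothetical helper name.

```python
import csv
import io

# Assumed headers; replace with the exact headers from the downloaded template.
COLUMNS = [
    "Sample Name",
    "Private ID",
    "Collection Date",
    "Collection Location",
    "Sample is Private",
    "GISAID ID (Public ID)",
    "Sequencing Date",
]


def write_metadata_tsv(rows, handle):
    """Write metadata rows (dicts keyed by column header) as tab-delimited text."""
    writer = csv.DictWriter(
        handle,
        fieldnames=COLUMNS,
        delimiter="\t",
        restval="",          # missing optional values become blank cells
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # raises ValueError on unknown column names


# Usage: build one row and print the resulting table.
buffer = io.StringIO()
write_metadata_tsv(
    [{
        "Sample Name": "sample_001",
        "Private ID": "lab_001",
        "Collection Date": "2021-03-05",
        "Collection Location": "North America/USA/California",
    }],
    buffer,
)
print(buffer.getvalue())
```

Optional columns that are not supplied are written as blank cells rather than dropped, which keeps every column present in the file.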
Add sample metadata
Now that you are familiar with the type of metadata you need, you are ready to add this information to the platform in one of two ways, manually or through a metadata file.
Entering metadata manually
- Fill out the metadata fields in the table. Note that the ‘Private ID’ column will be automatically populated with the sample name obtained from the sequence fasta file(s). You can edit this information if private sample IDs differ from those provided in the fasta file.
To add metadata manually, simply fill in the fields through the web interface.
- If samples have the same metadata for ‘Collection Date’, ‘Collection Location’, and/or ‘Sequencing Date’, you can select ‘Apply to All’ after filling in the first sample entry to automatically fill in information for the rest of the samples.
After filling in the information for the first sample, you can click ‘Apply to all’ to automatically fill information for the rest of the samples if they have the same information for a given column.
- If you don’t have information for optional metadata, leave the fields blank.
- After completing the required metadata fields, you will be able to continue to the next page to review your submission.
After completing the required fields, you will be able to continue to the next page.
Downloading, completing, and uploading metadata file
Once you upload your sequence fasta file(s), the platform will allow you to download a metadata table in TSV format (tab-delimited), where the ‘Sample Name’ column has been automatically populated based on your fasta file(s). To download the table, complete the rest of the metadata locally on your computer, and upload it to the platform:
- Click on ‘Download Metadata Template (TSV)’
After selecting your sequence file(s), you can download a metadata template file.
- When you open the downloaded metadata file, it should have seven columns (one for each of the required and optional metadata) and three example rows already filled in for your reference (Example Samples A, B, C). You can choose to delete these example rows or keep them in your metadata file (the system will recognize the default examples).
Example metadata table downloaded from the ‘Metadata and Sharing’ page after selecting 7 sequences to upload.
- Fill in the required metadata (Private ID, Collection Date, Collection Location). Sample privacy (‘Sample is Private’ column) is the only required entry that can be left blank in the metadata file. When you upload the metadata file, you will notice the ‘Sample is Private’ entries will automatically default to ‘No’ if the column was left blank in the metadata file.
- If you don’t have information for the optional metadata, simply leave entries for those columns blank (DO NOT delete the column).
- Save your tab-delimited metadata file (file extension ‘.tsv’) on your computer and upload it to the platform by clicking on ‘Select Metadata File’ and selecting the appropriate file from your computer. Note that you can only select one file.
To upload the completed metadata file, click on ‘Select Metadata File’.
- Note that if you are uploading a metadata file for more than 100 sequences, it may take a few minutes for the information to be added to the web interface. Be patient!
When uploading metadata for more than 100 sequences, the platform might take a few minutes to upload. This delay may result in an ‘Unresponsive Page’ pop up message in some computer systems. If this happens, make sure to click on ‘Wait’ instead of ‘Exit Page’.
- If there is a problem with your metadata file (for example, missing required fields), you will see error or warning messages. Review the details of each message and fix your metadata file accordingly (see the troubleshooting guide describing common error and warning messages and how to resolve them). After fixing the errors, re-upload the metadata file by clicking on ‘Select Metadata File’ and selecting the edited file. Note that re-uploading a metadata file will overwrite the original file.
Examples of warning messages after uploading a metadata file. In this example, the sample names on the metadata file did not match those on the sequence file and there were required inputs missing.
- If the file upload is successful, you will see a checkmark by the uploaded file name and the metadata table on the web interface will be automatically filled in with the uploaded information. Once you have successfully uploaded your metadata file, you will be able to continue to the ‘Review’ page to review your submission.
After successfully adding the metadata file without warnings, you will be able to continue to the ‘Review’ page.
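The two problems shown in the warning example (sample names that do not match the sequence file, and missing required inputs) can be caught locally before uploading. The following Python sketch is an assumption-laden illustration, not the platform's own validation: `check_metadata` is a hypothetical helper, and the header names in `REQUIRED` are assumed to match the downloaded template.

```python
import csv
import io

# Assumed headers for the required columns; confirm against your template.
REQUIRED = ("Sample Name", "Private ID", "Collection Date", "Collection Location")


def check_metadata(tsv_text, fasta_names):
    """Return a list of problems found in tab-delimited metadata text."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 holds the column headers.
    for line_no, row in enumerate(rows, start=2):
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_no}: missing {column}")
        seen.add((row.get("Sample Name") or "").strip())
    seen.discard("")
    expected = set(fasta_names)
    for name in sorted(expected - seen):
        problems.append(f"no metadata row for sequence {name}")
    for name in sorted(seen - expected):
        problems.append(f"metadata row {name} has no matching sequence")
    return problems
```

Note that this sketch would report the template's example rows as having no matching sequence, so delete them first if you run a check like this.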
Step 4: Review your submission and upload data
Please review the information on the ‘Review’ page carefully before starting the final upload process.
- Review the sample summary and table
After adding metadata (Step 3) you will be prompted to review your submission information on the ‘Review’ page. In the provided example, there are two sequences that should be private and this information can be confirmed by looking at the sample review table.
- Accept the terms of the sample submission if you agree with the statement regarding data privacy and having permission to upload data to the CZ GEN EPI platform. Checking off the terms box will allow you to start uploading your data to the platform by clicking ‘Start Upload’.
Checking off the box to agree with the terms and privacy policy of the CZ GEN EPI platform will allow you to start the final upload process.
- When samples are done uploading you will see an ‘Upload Complete’ message. Click on ‘Go to Samples’ to see uploaded samples in your main dashboard.
Message indicating that samples have been uploaded to the platform.
Main dashboard (Sample page)
After data upload is complete you will be able to see your newly uploaded samples in your main dashboard or Sample page. Public samples will have a globe icon, whereas private samples will have a lock icon. Note that there are new metadata columns on the dashboard:
Public ID: Preferably, a GISAID ID. If a GISAID ID is not available, the CZ GEN EPI platform will automatically generate a public ID based on when the sample was uploaded (sequential number/year). Note that private samples will also receive a Public ID. However, private samples will not be visible to other groups. Since a GISAID ID is the preferred public ID, we encourage users to submit their sequences to GISAID and update their Public ID in the platform whenever possible.
Lineage: SARS-CoV-2 lineage for each sample will be automatically detected and assigned by the platform after a few minutes. Lineages are assigned using the UShER phylogenetic placement tool. Lineage phylogenetic placement is verified routinely and updated as needed. Therefore, lineage assignments might change over time in your dashboard.
GISAID: Indicates whether or not a given sample has been submitted to the GISAID repository by the user, if it has been accepted (you will see an ISL Accession #) or rejected, or if the sample is not eligible for submission to the GISAID repository (private samples).
Sample page showing uploaded samples. Note the different icons between public and private samples.
SARS-CoV-2 lineage will be automatically detected and assigned to each sample after sequences are processed within the platform.
Ready to add more samples to your account?
Click ‘Upload’ on the right-hand corner of your dashboard and follow Steps 2 through 4 of the guide.