Are you ready to evaluate how your virus genome sequences compare to sequences within your community and at large? The first thing you need to do is upload your consensus genome sequences and their respective sample information (known as metadata) to the CZ GEN EPI platform and you will be on your way to learning useful information.
Below is a step-by-step guide to upload data to the CZ GEN EPI platform. Follow the guide to upload your data in four simple steps:
If you are interested in learning how to get your data files ready before uploading, jump ahead to:
After reading this user guide, you will be able to:
- Get your data ready to upload
- Upload your sequences and metadata information
- Understand uploaded sample information on the main dashboard (Sample page) for each pathogen workspace.
Step 1: Navigate to the web interface where you will upload your data
First, sign in to your CZ GEN EPI account (if you don’t have an account, request one here). Select the workspace for the pathogen of interest and click "Upload" on the right-hand side of the main dashboard or Sample page. This will take you to the Upload Samples page.
Once you click "Upload", you will be directed to the Upload Samples page. Uploading sample file(s) is the first step of the data upload process.
Step 2: Select your sample or sequence file
Before uploading your sequence data file (same as sample file), there are a few things you should note regarding format.
Things to know regarding sequence data files
- File format:
- Sequence or sample names:
- When preparing sequence fasta files, make sure that:
- Sequence or sample names are LESS than 120 characters long.
- Sequence or sample names ONLY contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-), underscores (_), and forward slashes (/).
- Sequence or sample names DO NOT contain spaces.
- File or sequence names DO NOT contain any personally identifiable information (PII).
- File limit:
- Our team benchmarked up to 10,000 samples uploaded at once. A file containing 10,000 sequences can be uploaded within 1 minute.
- You can upload multiple sequence files.
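If you would like to check sample names before uploading, the naming rules above can be sketched in a few lines of Python. This is only an illustrative check, not the platform's own validation: the function names are hypothetical, and it assumes that the entire header line after ">" is the sample name.

```python
import re

# Rules taken from the list above: fewer than 120 characters, and only
# letters, digits, periods, hyphens, underscores, and forward slashes.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9._/-]+")
MAX_NAME_LENGTH = 120  # names must be shorter than this


def check_sample_name(name):
    """Return a list of problems found in one sample name (empty if none)."""
    problems = []
    if not name:
        return ["name is empty"]
    if len(name) >= MAX_NAME_LENGTH:
        problems.append("name is 120 characters or longer")
    if " " in name:
        problems.append("name contains spaces")
    elif not ALLOWED_NAME.fullmatch(name):
        problems.append("name contains characters other than A-Z, a-z, 0-9, . - _ /")
    return problems


def check_fasta_names(fasta_text):
    """Map each offending sample name in FASTA text to its list of problems."""
    report = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: everything after ">" is treated as the sample name.
            name = line[1:].strip()
            problems = check_sample_name(name)
            if problems:
                report[name] = problems
    return report
```

To use it, read your fasta file into a string and pass it to `check_fasta_names`. An empty result means no name broke the rules listed above.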
Select your sequence data file
Now that you know the appropriate format for your sequence data file and have it ready, upload sequences by following these steps:
- Click on "Select Sample Files" within the Upload Samples page and select the sequence file(s) of interest from your computer.
- Once you select your data file(s), review the summary of the number of files and the total number of sequences ready to be uploaded. If you are done selecting sequence files to upload, click "Continue" on the bottom-left corner of your screen. This will direct you to the Metadata and Sharing page where you will add sample metadata (see Step 3).
Step 3: Add your sample metadata
After selecting your sequence file(s), you will be prompted to add your sample metadata through the Metadata and Sharing page. Before adding your metadata, there are a few things you should note regarding required information and how you can add it to the platform.
Things to know regarding metadata
Which metadata do you need?
- Sample Name: Sample name (same as sequence name) specified in your sequence fasta file(s).
- Private ID: Private sample name or identifier from your group. Note that this could be the same as the sample name on your sequence fasta file.
- Collection Date: The month and year the sample was originally collected.
- Required format: YYYY-MM-DD
- Collection Location: The location where samples were originally collected. Following the general format below will ensure that your specific location is found in the reference data from GISAID. General format: Continent or Region/Country/State or Province/Division.
- Examples of acceptable formats:
- North America/USA/California/San Francisco County
- North America/USA/Illinois/Chicago
- North America/USA/California
- Sample Privacy: You need to specify if you wish to keep your sample data private under a "Sample is Private" field. By default, your data is public. Please note that "Public" within CZ GEN EPI means that your data will be visible to groups or organizations with which you already have a data sharing relationship (for example, California DPHs with California Department of Public Health). "Public" DOES NOT mean that everyone can see it. Even if you decide to share your data with other groups and organizations, your sample private identifiers will remain private (only you and your group can see sample private identifiers).
- GISAID ID or GenBank Accession (Public ID): If SARS-CoV-2 or Mpox sequences have been uploaded to public repositories (GISAID or GenBank), you can provide the Public ID that is available on these databases.
- SARS-CoV-2 Public ID: Please provide the GISAID ID (e.g., hCoV-19/USA/MS-MSPHL-0012/2023) or GenBank Isolate Name without the "SARS-CoV-2/human" prefix that is commonly associated with isolate names. For example, for GenBank Isolate Name "SARS-CoV-2/human/AUS/VIC14483/2020", you should use "AUS/VIC14483/2020" as the Public ID.
- Mpox Public ID: Please provide the GenBank Accession ID (e.g., NC_063383). Note that you have to strip the sequence version from the GenBank Accession ID. The sequence version is specified within accession IDs using a period followed by a version number. For example, for GenBank Accession ID "OP535341.1", you should use "OP535341" as the Public ID.
- Sequencing Date: The month and year the samples were sequenced.
- Required format: YYYY-MM-DD
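The Public ID and date rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the platform's own logic: the function names are hypothetical, and the examples are the ones given in this guide.

```python
import re
from datetime import datetime

SARS_COV_2_PREFIX = "SARS-CoV-2/human/"


def sars_cov_2_public_id(isolate_name):
    """Drop the "SARS-CoV-2/human/" prefix from a GenBank isolate name.

    Names without the prefix (for example, GISAID IDs) are returned unchanged.
    """
    if isolate_name.startswith(SARS_COV_2_PREFIX):
        return isolate_name[len(SARS_COV_2_PREFIX):]
    return isolate_name


def mpox_public_id(accession):
    """Strip a trailing ".<version>" from a GenBank accession ID."""
    return re.sub(r"\.\d+$", "", accession)


def is_valid_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # strptime alone would also accept "2023-4-13", so check the shape first.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

For example, `mpox_public_id("OP535341.1")` gives "OP535341", and `sars_cov_2_public_id("SARS-CoV-2/human/AUS/VIC14483/2020")` gives "AUS/VIC14483/2020".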
How can you upload metadata?
There are two ways to add your metadata to the CZ GEN EPI platform after uploading your sequence data file(s):
- Manual entry: You can enter metadata information manually through the web interface. This option is not recommended if you are uploading over 10 sequences with different metadata. Note that manual entry is not possible when uploading over 100 sequences.
- Downloading, completing, and uploading a metadata file: You can upload a table containing metadata information. This option is recommended any time you are uploading multiple sequences with different metadata and is the only option when uploading over 100 sequences. Note the following:
- You can only upload a single metadata file. If you uploaded multiple sequence files, make sure to gather metadata for all the samples into a single file.
- You will be able to download a tab-delimited table template (file extension ".tsv") containing all of your sample names based on the sequence file(s) provided in Step 2. You can then edit the table by completing the remaining metadata information.
- To prepare your metadata file, it is best to download the provided table template after adding sequence file(s) in Step 2. However, you can use your own tab- or comma-delimited file (".tsv" or ".csv" file extensions, respectively) if you follow the template format.
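Because only a single metadata file can be uploaded, metadata kept in several tables has to be combined first. The sketch below shows one way to do this with Python's standard csv module. It assumes that every table is tab-delimited and uses exactly the same header row. The function name is hypothetical.

```python
import csv
import io


def merge_metadata_tables(tsv_texts):
    """Combine several tab-delimited tables that share a header into one."""
    header = None
    rows = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        table = [row for row in reader if row]  # skip blank lines
        if not table:
            continue
        if header is None:
            header = table[0]
        elif table[0] != header:
            raise ValueError("metadata tables have different column headers")
        rows.extend(table[1:])
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
        writer.writerows(rows)
    return out.getvalue()
```

Read each file into a string, pass the list of strings to `merge_metadata_tables`, and save the returned text with a ".tsv" extension.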
Add sample metadata
Now that you are familiar with the type of metadata you need, you are ready to add this information to the platform in one of two ways, manually or by uploading a metadata file.
Entering metadata manually
- Fill out the metadata fields in the table through the web interface. Note that the "Private ID" column will be automatically populated with the sample name obtained from the sequence fasta file(s). You can edit this information if private sample IDs differ from those provided in the fasta file.
- If samples have the same metadata for Collection Date, Collection Location, and/or Sequencing Date, you can select "Apply to All" after filling in the first sample entry. This will automatically fill in information for the rest of the samples.
- If you don’t have information for optional metadata, leave the fields blank.
- After completing the required metadata fields, you will be able to continue to the next page to review your submission (see Step 4).
Downloading, completing, and uploading metadata file
Once you upload your sequence fasta file(s), the platform will allow you to download a metadata table in TSV format (tab-delimited), where the "Sample Name" column has been automatically populated based on your fasta file(s). If you are uploading more than 100 sequences, you will not see an interactive metadata table. The table will be static and, thus, you will not be able to make edits directly through the web interface. All changes will have to be made through the metadata file.
To download the table template, complete the rest of the metadata locally on your computer, and upload the metadata file to the platform:
- Click on "Download SARS-CoV-2 Metadata Template (TSV)" for SARS-CoV-2 samples or "Download General Viral Metadata Template (TSV)" for Mpox samples.
- When you open the downloaded metadata file, it should have seven columns (one for each of the required and optional metadata) and three example rows already filled in for your reference (Example Samples A, B, C). You can choose to delete these example rows or keep them in your metadata file (the system will recognize the default examples).
- Fill in the required metadata (Private ID, Collection Date, Collection Location). Sample privacy ("Sample is Private" column) is the only required entry that can be left blank in the metadata file. When you upload the metadata file, you will notice the "Sample is Private" entries will automatically default to "No" if the column was left blank in the metadata file.
- If you don’t have information for the optional metadata, simply leave entries for those columns blank (DO NOT delete the column).
- Save your metadata file as tab-delimited (file extension ".tsv") or comma-delimited (file extension ".csv") on your computer. Upload the metadata file to the platform by clicking on "Select Metadata File" and selecting the appropriate file from your computer. Note that you can only select one file. The metadata table on the web interface will be automatically filled in with the uploaded information.
- Note that if you are uploading metadata for a large number of samples, it may take a few minutes for the information to be added to the web interface. Be patient!
When uploading metadata for a large number of samples, the platform might take a few minutes to upload. This delay may result in an "Unresponsive Page" pop-up message on some computer systems. If this happens, make sure to click on "Wait" instead of "Exit Page".
- If there is a problem with your metadata file (for example, missing required fields), you will see error and/or warning messages. Specific sample entries resulting in errors and/or warnings will be listed under each message and highlighted within the metadata table. Errors must be fixed before the metadata file can be uploaded. See Uploading data: Troubleshooting guide for a description of common errors and warning messages and how to resolve them. After fixing the errors, re-upload the metadata file by clicking on "Select Metadata File" and selecting the appropriate edited file. Note that re-uploading a metadata file will overwrite the original file.
- If the file upload is successful, you will see a checkmark by the uploaded file name and no error messages. Once you have successfully uploaded your metadata file, you will be able to continue to the Review page to review your submission (see Step 4).
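Missing required fields are one of the problems mentioned above, and you can look for them before uploading. The sketch below is a hypothetical pre-check, not the platform's validation. It assumes a tab-delimited file whose column headers match the field names used in this guide ("Sample Name", "Private ID", "Collection Date", "Collection Location"). The headers in your downloaded template may differ, so adjust them as needed.

```python
import csv
import io

# Assumption: these header names follow the field names used in this guide;
# change them to match the headers in your downloaded template.
REQUIRED_COLUMNS = ["Private ID", "Collection Date", "Collection Location"]


def find_missing_required(tsv_text):
    """Return (sample name, column) pairs where a required entry is blank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = []
    for row in reader:
        sample_name = row.get("Sample Name") or ""
        for column in REQUIRED_COLUMNS:
            # A missing column and an empty cell are both treated as blank.
            if not (row.get(column) or "").strip():
                missing.append((sample_name, column))
    return missing
```

An empty list means every row has the three required entries filled in. "Sample is Private" is not checked because it can be left blank and defaults to "No".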
Step 4: Review your submission and upload data
Please review the information on the Review page carefully before starting the final upload process.
- Review the sample summary and table. Make sure that the "Sample is Private" field is accurate.
- When samples are done uploading you will see an "Upload Complete" message. Click on "Go to Samples" to see uploaded samples in your main dashboard.
Main dashboard (Sample page)
After data upload is complete you will be able to see your newly uploaded samples in your main dashboard or Sample page for the pathogen of interest. Public samples will have a globe icon, whereas private samples will have a lock icon. Note that there are new metadata columns on the dashboard:
Public ID: Preferably, a GISAID ID or GenBank Isolate Name for SARS-CoV-2 or GenBank Accession ID for Mpox. If GISAID or GenBank IDs are not available, the CZ GEN EPI platform will automatically generate Public IDs based on when the sample was uploaded (sequential number/year). Note that private samples will also receive a Public ID. However, private samples will not be visible to other groups. Since a GISAID or GenBank ID is the preferred Public ID, we encourage users to submit sequences to GISAID and/or GenBank and update sample Public IDs by editing metadata within the platform whenever possible.
Lineage: SARS-CoV-2 or Mpox lineage for each sample will be automatically detected and assigned by the platform after a few minutes. Lineages are assigned using the UShER phylogenetic placement tool (SARS-CoV-2) or Nextclade (Mpox). Lineage phylogenetic placement is verified routinely and updated as needed. Therefore, lineage assignments might change over time in your dashboard.
GISAID (see note): Indicates whether or not a SARS-CoV-2 sample has been submitted to the GISAID repository by the user. This field is not included in the Mpox Sample page. If a given SARS-CoV-2 sample has been submitted and accepted by GISAID, you will see a GISAID ID under Public ID. If the sample was not accepted, you will see "Not accepted" under the GISAID column. If the sample is private or has not been submitted to GISAID, it will read "Not found". This is an optional field.
SARS-CoV-2 and Mpox lineages will be automatically detected and assigned to each sample after sequences are processed within the platform. Additionally, sequence quality will be automatically assessed using Nextclade.
Note on GISAID
As of April 13, 2023, CZ GEN EPI uses only GenBank, instead of GISAID, to automatically collect SARS-CoV-2 contextual samples for phylogenetic analyses. You may notice that the GISAID column is no longer available on the metadata table. Click here to learn more details about the transition from GISAID to GenBank.
Ready to add more samples to your account?
Click "Upload" on the right-hand side of your dashboard and follow Steps 2 through 4 of the guide.