You, or someone from your group, can build phylogenetic trees in CZ GEN EPI with specific samples of interest. You can also specify the type of tree build (Overview, Targeted, or Non-contextualized) based on your epidemiological questions. Because you decide when to build these trees and which tree type to use, they are known as "on-demand" phylogenetic trees, as distinct from the automatic tree builds. Below are details about how to build on-demand phylogenetic trees.
After reading this user guide, you will be able to:
- Generate on-demand phylogenetic trees
- Further customize on-demand trees
- Understand why some tree builds may fail
Generating phylogenetic trees on-demand
See steps below to build an on-demand Overview, Targeted, or Non-contextualized phylogenetic tree with samples of interest.
- (Optional) When selecting samples of interest to force-include in your tree build, the first option is to select samples from the table on your Sample page. Alternatively, you can add samples using their CZ GEN EPI Private or Public IDs and/or IDs from public databases (GISAID ID for SARS-CoV-2 or GenBank ID for Mpox) (see step 4 below). To include samples from the table on your Sample page, select samples of interest by adding a checkmark. Remember that you can filter samples on the table to search and select samples of interest.
Select samples for on-demand phylogenetic tree building and click on the tree icon representing the "Run Phylogenetic Analysis" dropdown menu on the right-hand side of the Sample page.
- Click on "Nextstrain Phylogenetic Tree" from the "Run Phylogenetic Analysis" dropdown menu on the right-hand side of the page.
Select "Nextstrain Phylogenetic Tree" from the dropdown menu.
- A dialog box will appear where you will provide a name for the tree build and specify the desired tree type. The tree type will depend on the questions you are trying to answer (see information about the different types of on-demand trees here).
Dialog box for specifying tree name, type, and, if applicable, samples to be force-included using sample IDs.
You will see a warning at the bottom of the "Create New Phylogenetic Tree" dialog box if you select samples with "bad" QC tags from the samples table. Although this will not prevent you from running the analysis, you should be mindful of sequences containing errors that may result in misleading or unreliable trees.
- When you click on the different types of trees you will see information summarizing the main purpose of each tree type. Select the most suitable tree for your analysis. Note that you can define samples for Overview and Non-contextualized trees based on location and/or collection date range of interest (see below for details). You can also define SARS-CoV-2 samples by lineage. For Targeted trees, you can opt to preferentially select contextual samples from a given location. If you don't define samples here, Nextstrain will automatically select samples from your default location.
When selecting a tree type within the dialog box, you will see a summary describing its main purpose. Options for SARS-CoV-2 Overview and Non-contextualized trees include dropdown menus where you can define samples by location, lineage, and/or collection date range. Mpox Overview and Non-contextualized trees can only be defined by location and/or collection date range. For Targeted trees, you can opt to preferentially select contextual samples from a given location.
- (Optional) You can also use the "Force-include samples by ID (optional)" box to add samples from CZ GEN EPI and/or public databases (GISAID for SARS-CoV-2 or GenBank for Mpox) to your tree. Note that samples selected on the Sample page (step 1) and/or added using sample IDs through the dialog box will be force-included on your tree without undergoing quality control. See details about sample selection here.
Scroll down within the "Create New Phylogenetic Tree" dialog box to find the field where you can add samples for your tree build under "Force-Include Samples by ID (optional)". In this example, three samples were selected from the Sample page table and two other samples were added by specifying their IDs within the dialog box.
Note the following when adding samples through the "Create New Phylogenetic Tree" dialog box:
- Use GISAID IDs to add SARS-CoV-2 samples from GISAID (e.g., hCoV-19/USA/CA-CDPH-500000901/2021 or USA/CA-CDPH-500000901/2021).
- Use GenBank accession numbers to add Mpox samples from GenBank (e.g., U12345 or AF123456).
- Use CZ GEN EPI Public IDs or Private IDs to add samples from CZ GEN EPI.
- Multiple sample IDs must be separated by tabs or commas, or entered one ID per row.
- In the case of Targeted trees, samples specified here also serve as the focal samples that CZ GEN EPI uses to add all contextual samples.
- Added samples will not be affected by user-defined samples (see section below).
- Adding more than 2000 samples will increase the tree building run time.
Samples selected through steps 1 and 4 are referred to as "user-selected samples".
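If you keep sample IDs in a spreadsheet or text file, it can help to tidy the list before pasting it into the "Force-Include Samples by ID (optional)" box. The following is a minimal Python sketch of the separator rules listed above (tabs, commas, or one ID per row). The function name `split_sample_ids` and the private ID `my-private-id-01` are hypothetical, and dropping duplicate IDs is an assumption made for convenience; this is not how CZ GEN EPI itself parses the field.

```python
import re


def split_sample_ids(raw: str) -> list[str]:
    """Split pasted text into sample IDs on tabs, commas, or line breaks.

    Hypothetical helper for preparing a list offline. Blank entries are
    dropped, and duplicates are removed (an assumption, kept in first-seen order).
    """
    seen = set()
    ids = []
    for part in re.split(r"[\t,\r\n]+", raw):
        sample_id = part.strip()
        if sample_id and sample_id not in seen:
            seen.add(sample_id)
            ids.append(sample_id)
    return ids


# Example input mixing all three separators; "my-private-id-01" is made up.
pasted = (
    "hCoV-19/USA/CA-CDPH-500000901/2021,\tUSA/CA-CDPH-500000901/2021\n"
    "my-private-id-01\n"
    "my-private-id-01"
)

# One ID per row is one of the accepted layouts for the dialog box.
print("\n".join(split_sample_ids(pasted)))
```

Slashes inside GISAID IDs are left untouched because only tabs, commas, and line breaks are treated as separators.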
- After making all of your tree selections (samples, tree name, and type), click on "Create Tree" at the bottom of the "Create New Phylogenetic Tree" dialog box.
To begin your tree build, click on "Create Tree" at the bottom of the dialog box after specifying your selections (samples, tree name, and tree type).
- Tree builds may take up to 12 hours. Go to the Phylogenetic Tree page to check the tree status. Once the build is complete, you will be able to view the tree in Nextstrain, download it, edit its name, and/or delete it (learn about functionalities within the Phylogenetic Tree page here).
Click on the Phylogenetic Tree page to see tree build status. Once the tree build is completed, you will be able to click on icons to view the tree in Nextstrain, interpret it using Galago, and/or download it. You can also edit the tree name or delete it by clicking on the "More Actions" dropdown menu.
- View your tree in Nextstrain.
Customizing on-demand tree builds
All on-demand tree builds can be customized by specifying which samples you would like to see on the tree. Samples can be directly selected from the sample table and/or force-included using sample IDs (steps 1 and 4 from previous section). These selections are referred to as user-selected samples. However, you can further customize Overview and Non-contextualized trees by defining samples of interest based on location, lineage(s), and/or collection date range. These selections for Overview and Non-contextualized trees are referred to as user-defined samples. For Targeted trees, you can opt to preferentially include contextual samples from a given location. Note that these selections determine which samples are chosen for the tree.
To further customize on-demand trees:
- After selecting the desired tree type within the "Create New Phylogenetic Tree" dialog box, specify options for location, lineage, and/or collection date range in the provided fields. Note that Mpox samples can only be defined based on location and/or collection date range.
Customization options for SARS-CoV-2 on-demand trees. Mpox tree builds have similar options but samples cannot be defined by lineage.
Note the following:
- Lineage and date range selections will limit which samples from your selected or default location will be included on the tree.
- User-selected samples (i.e., samples selected through the Sample page or those added using sample IDs within the "Create New Phylogenetic Tree" dialog box) will be force-included on your tree regardless of location, lineage, and/or date range parameters selected during customization. In other words, sample customization parameters added here will not overwrite user-selected samples. See the diagram below to better understand tree sample composition:
Diagram summarizing sample composition for on-demand trees.
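The relationship between user-selected and user-defined samples can also be expressed as a small Python sketch. This is a simplified model for illustration only: the `Sample` class, the `compose_tree_samples` function, and the example records are hypothetical, and the sketch ignores collection dates and the subsampling that Nextstrain performs when choosing samples. It shows only the rule stated above, namely that user-selected samples are kept regardless of the customization parameters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Sample:
    """Hypothetical, minimal sample record used only for this illustration."""
    sample_id: str
    location: str
    lineage: str


def compose_tree_samples(
    pool: Iterable[Sample],
    user_selected_ids: Set[str],
    location: str,
    lineages: Optional[Set[str]] = None,
) -> List[str]:
    """Return IDs that are force-included or that match the filters.

    lineages=None stands for the default "All" lineages selection.
    """
    chosen = []
    for sample in pool:
        forced = sample.sample_id in user_selected_ids
        matches = sample.location == location and (
            lineages is None or sample.lineage in lineages
        )
        if forced or matches:
            chosen.append(sample.sample_id)
    return chosen


pool = [
    Sample("A", "California", "BA.2"),
    Sample("B", "California", "B.1.1.7"),
    Sample("C", "Texas", "BA.2"),
]

# "C" is outside the selected location but was user-selected, so it is kept.
print(compose_tree_samples(pool, {"C"}, "California", {"BA.2"}))  # ['A', 'C']
```

In this model, sample "B" is left out because its lineage does not match the filter, while sample "C" appears on the tree only because it was force-included.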
- Click on the downward arrow under "Lineage" to see the list of lineages. To make selections, simply click on lineage(s) of interest. By default, "All" lineages are selected. This option is only available for SARS-CoV-2 tree builds.
Click on the downward arrow under "Lineage" to see the list of lineages. The total number of selected lineages will show within the lineage box. To deselect a given lineage, click on it and the check mark will be removed.
- Click on the downward arrow under "Collection Date" to specify a date range of interest. By default, "All time" is selected. You can choose from preset date ranges or specify dates following the YYYY-MM-DD format.
Click on the downward arrow under "Collection Date" to see date options. Selected dates will show on the Collection Date field.
When specifying date ranges, you can set a time interval. If you leave the beginning date blank, you are effectively selecting all the days prior to a given date. If you leave the end date blank, you are selecting days from a specified date until the present day. When done specifying the date range of interest, click on "Apply". Selected dates will then show on the Collection Date field.
- The location field will show your default location. If you are interested in a different location, define or change the location. Note that Mpox tree builds will show location only at the state level (Country/State). Use the search box under the Location dropdown menu to specify a location of interest.
Continue to build your tree (steps 5 through 7 from the previous section) after making your selections for location, lineage(s), and/or collection date range.
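The open-ended date range behavior described above (a blank beginning date or a blank end date) can be sketched in Python as follows. The function names `parse_bound` and `in_collection_range` are hypothetical, and treating both endpoints as inclusive is an assumption; the guide does not state how CZ GEN EPI handles samples collected exactly on a boundary date.

```python
from datetime import date
from typing import Optional


def parse_bound(text: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; a blank field means no bound on that side."""
    text = text.strip()
    return date.fromisoformat(text) if text else None


def in_collection_range(
    collected: date,
    start: Optional[date],
    end: Optional[date],
    today: Optional[date] = None,
) -> bool:
    """Check whether a collection date falls within the selected range.

    A blank start selects all days prior to the end date; a blank end
    selects days from the start date until the present day. Endpoints are
    treated as inclusive here (an assumption).
    """
    upper = end if end is not None else (today or date.today())
    if start is not None and collected < start:
        return False
    return collected <= upper


# Blank beginning date: everything up to 2021-12-31 is selected.
start, end = parse_bound(""), parse_bound("2021-12-31")
print(in_collection_range(date(2020, 5, 17), start, end))  # True
print(in_collection_range(date(2022, 1, 1), start, end))   # False
```

The optional `today` argument exists only so that the "until the present day" case can be checked against a fixed date.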
Failed on-demand tree builds
On-demand tree builds may fail if one or more of the user-defined customization parameters does not fit the data. In other words, if the tree build fails, it is usually because user-defined samples from a given location, lineage, and/or collection date range do not exist in either CZ GEN EPI or public databases (GISAID or GenBank). If this scenario does not apply to you and your on-demand tree builds are failing, please contact our team by sending an email to email@example.com.
If a tree build fails, you will see a "FAILED" status flag by the tree name on the Phylogenetic Tree page.