Overview
One of the main functions of CZ GEN EPI is to facilitate phylogenetic tree building through Nextstrain. Because you can easily access and build several types of phylogenetic trees, you can better understand how your samples fit within the context of pathogen dynamics inside and outside your jurisdiction. Below is a description of the phylogenetic tree types generated by CZ GEN EPI.
After reading this user guide, you will be able to:
- Understand ways of generating phylogenetic trees within CZ GEN EPI
- Understand the purpose of automatic tree builds
- Understand the purpose of on-demand tree builds
Generating Phylogenetic Trees
CZ GEN EPI currently generates phylogenetic trees in two ways:
- Automatically: CZ GEN EPI automatically generates an ‘Overview tree’ that provides a broad overview of viral diversity. Automatic trees are generated daily.
- On-demand: You or anyone in your group can build ‘Overview’, ‘Targeted’, or ‘Non-contextualized’ trees that include samples of interest. On-demand trees are generated whenever you request them, but building a tree can take up to 12 hours.
All Nextstrain-generated trees can be accessed through the Phylogenetic Tree page. Note that phylogenetic placements made through UShER are not saved on the Phylogenetic Tree page.
Automatically generated Overview trees
Automatic trees are built by CZ GEN EPI on a daily basis. These routinely updated phylogenies are designed to provide a broad overview of the viral diversity in your samples and how it compares to viral diversity outside of your jurisdiction.
Tree image summarizing highlights for automatically generated phylogenetic trees.
Note the following:
- The tree will display samples automatically selected by Nextstrain, including:
- Samples from your jurisdiction: The automatic build includes at most 2000 samples from your jurisdiction, so subsampling occurs when more than 2000 samples are available.
- Samples from outside your jurisdiction: Samples that are closely related to samples from your jurisdiction, collected nationally and internationally. These samples provide important context for interpreting viral diversity within your jurisdiction and how it evolves. National samples include samples collected within your state (maximum of 500 sequences) and in the broader USA (maximum of 400 sequences). International samples are limited to a maximum of 100 sequences.
- The tree will include samples from the past 3 months (12 weeks) from your jurisdiction to provide better support and visualization of recent cases.
- All automatic tree builds will be named ‘{Your jurisdiction name} Contextual Recency-Focused Build’ by default. However, you can edit tree names as needed through the Phylogenetic Tree page.
- Automatic trees fall within the ‘Overview’ tree category, which is designed to help you understand the overall picture of viral diversity within your jurisdiction.
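The sample caps for the automatic build can be summarized in a short sketch. This is a hypothetical illustration in Python, not CZ GEN EPI or Nextstrain code: the category names and the functions `capped_counts` and `needs_subsampling` are invented for this example, and only the numeric caps (2000, 500, 400, 100) come from this guide. Because samples from outside your jurisdiction are chosen by genetic similarity, the caps are upper bounds rather than guaranteed counts.

```python
# Hypothetical sketch: category names and functions are invented for
# illustration; only the numeric caps come from the CZ GEN EPI guide.
AUTOMATIC_BUILD_CAPS = {
    "jurisdiction": 2000,   # samples from your jurisdiction
    "state": 500,           # other samples collected within your state
    "usa": 400,             # samples from the broader USA
    "international": 100,   # samples collected outside the USA
}


def capped_counts(available: dict[str, int]) -> dict[str, int]:
    """Upper bound on how many samples per category can appear on the tree."""
    return {
        category: min(available.get(category, 0), cap)
        for category, cap in AUTOMATIC_BUILD_CAPS.items()
    }


def needs_subsampling(available: dict[str, int]) -> dict[str, bool]:
    """True for each category whose available samples exceed its cap."""
    return {
        category: available.get(category, 0) > cap
        for category, cap in AUTOMATIC_BUILD_CAPS.items()
    }


if __name__ == "__main__":
    available = {"jurisdiction": 3500, "state": 320, "usa": 10000, "international": 50000}
    counts = capped_counts(available)
    print(counts)  # {'jurisdiction': 2000, 'state': 320, 'usa': 400, 'international': 100}
    print(sum(counts.values()))  # 2820
    print(needs_subsampling(available)["jurisdiction"])  # True
```

In this example, a jurisdiction with 3500 available samples exceeds its cap of 2000, so subsampling applies to the jurisdiction's samples, while the state category stays below its cap.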
On-demand phylogenetic trees
You can build three types of phylogenetic trees with specific samples of interest: Overview, Targeted, and Non-contextualized. Go to ‘Build on-demand trees’ to learn how to select samples and build each of the trees described below.
Overview trees
Overview trees: Purpose
On-demand Overview trees are designed to help you understand the overall picture of viral diversity within your jurisdiction for lineages and/or time periods of interest. Unlike automatically generated Overview trees, in which Nextstrain selects all the samples, on-demand Overview trees also let you specify samples of interest.
Overview trees: Included samples
The overview trees contain samples from your jurisdiction and genetically similar samples from outside of the jurisdiction. Although this tree is similar to the CZ GEN EPI automatically generated Overview trees, you can customize the on-demand Overview tree. This on-demand tree version allows you to:
- Specify lineages and/or collection date range of interest to narrow down samples from your jurisdiction.
- If you have over 2000 samples in your account, you can specify samples to bypass subsampling and force the build to include samples of interest on the tree.
- Learn how to select samples for Overview trees here.
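Narrowing your jurisdiction's samples by lineage and collection date amounts to a simple filter. The Python sketch below is a hypothetical illustration, not CZ GEN EPI code: the `Sample` record, its fields, and `narrow_samples` are invented for this example. Lineages are matched exactly here; this guide does not say whether selecting a lineage also includes its sublineages.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; real CZ GEN EPI metadata fields may differ."""
    name: str
    lineage: str
    collected: date


def narrow_samples(
    samples: Iterable[Sample],
    lineages: Optional[set[str]] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[Sample]:
    """Keep samples whose lineage is in `lineages` (if given) and whose
    collection date falls within [start, end], inclusive (if given)."""
    kept = []
    for s in samples:
        if lineages is not None and s.lineage not in lineages:
            continue  # exact lineage match only
        if start is not None and s.collected < start:
            continue
        if end is not None and s.collected > end:
            continue
        kept.append(s)
    return kept


if __name__ == "__main__":
    samples = [
        Sample("s1", "BA.5", date(2022, 7, 1)),
        Sample("s2", "BA.2", date(2022, 7, 15)),
        Sample("s3", "BA.5", date(2022, 9, 30)),
    ]
    narrowed = narrow_samples(
        samples, lineages={"BA.5"}, start=date(2022, 6, 1), end=date(2022, 8, 31)
    )
    print([s.name for s in narrowed])  # ['s1']
```

Leaving a criterion unset (for example, no lineage list) applies no restriction for that criterion, mirroring the "lineages and/or collection date range" choice described in this guide.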
The on-demand Overview tree also includes randomly selected samples that represent samples collected over time and space. Randomly selected samples help maintain the accuracy and precision of the estimated viral evolutionary rate. Additionally, randomly selected samples reflect the history of the pandemic.
Overview trees: Limitations
Note that customizing the tree prevents it from giving an unbiased overview of viral diversity within your jurisdiction, unlike the CZ GEN EPI automatically generated tree. Customized Overview trees are biased because the samples you choose are not evenly subsampled across time.
Targeted trees
Targeted trees: Purpose
Targeted trees facilitate outbreak investigations because they allow you to identify and examine samples most closely related to samples of interest, such as those from a potential outbreak. Compared to the Overview tree, the Targeted tree allows for a higher resolution of contextual samples by keeping as many closely related samples as possible. In contrast, contextual samples that have identical sequences are usually removed to allow more viral diversity in Overview trees.
Typical questions addressed with Targeted trees include:
- Are all of my samples part of the same outbreak?
- Did a given outbreak originate from a single introduction event?
- Has an outbreak been contained to a localized setting, or has it spread and begun circulating within the wider community?
Tree image summarizing highlights for Targeted phylogenetic trees.
Targeted trees: Included samples
First, you select samples of interest, which become the focal samples for the tree (learn how to add samples to Targeted trees here). Nextstrain then adds twice as many contextual samples as focal samples, chosen for close genetic similarity to the focal samples. Half of the contextual samples are chosen on genetic similarity alone, regardless of collection location. The other half represent samples collected over time from other regions within your state, from outside your state, and internationally.
Targeted trees also include randomly selected samples that represent samples collected over time and space. Randomly selected samples help maintain the accuracy and precision of the estimated viral evolutionary rate. Additionally, randomly selected samples reflect the history of the pandemic.
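The contextual sample counts for a Targeted tree follow from simple arithmetic. As a hypothetical sketch (the names `TargetedContext` and `targeted_context` are invented here), and assuming Nextstrain can find enough closely related samples to reach the target, a tree with 25 focal samples would receive 50 contextual samples: 25 chosen on genetic similarity alone and 25 spread across time and regions. The randomly selected background samples are left out because this guide does not state how many are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetedContext:
    focal: int
    contextual_total: int
    by_genetic_similarity: int  # chosen regardless of collection location
    by_place_and_time: int      # spread over time across regions in and outside the state


def targeted_context(n_focal: int) -> TargetedContext:
    """Target contextual sample counts for a Targeted tree: twice the number
    of focal samples, split half by genetic similarity and half by place/time.
    Randomly selected background samples are not counted (assumption)."""
    if n_focal < 0:
        raise ValueError("n_focal must be non-negative")
    total = 2 * n_focal
    by_similarity = total // 2
    return TargetedContext(
        focal=n_focal,
        contextual_total=total,
        by_genetic_similarity=by_similarity,
        by_place_and_time=total - by_similarity,
    )


if __name__ == "__main__":
    print(targeted_context(25))
    # TargetedContext(focal=25, contextual_total=50, by_genetic_similarity=25, by_place_and_time=25)
```

Because the contextual total scales with the number of focal samples, selecting many focal samples makes the tree correspondingly larger, which is consistent with tree builds taking longer.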
Non-contextualized trees
Non-contextualized trees: Purpose
Non-contextualized trees are designed to provide an overview of samples collected within your jurisdiction alone. Looking at samples from your jurisdiction may help you evaluate the following:
- Do populations sampled by other groups within my jurisdiction show different patterns of viral diversity than what my group has captured?
- If there is a different pattern in viral diversity sampled between groups in my jurisdiction, is my group preferentially sampling certain viral lineages?
Tree image summarizing highlights for Non-contextualized trees.
Non-contextualized trees: Included samples
Non-contextualized trees show all of the samples collected within your jurisdiction (up to 2000), including your samples and public data from CZ GEN EPI and GISAID. Note the following:
- If there are over 2000 samples from your jurisdiction, Nextstrain will automatically select samples through subsampling based on temporal representation rather than close relationships between samples.
- You can select samples of interest for the Non-contextualized tree build to force them to be included in the tree (see how to add samples of interest to Non-contextualized trees here). Unlike in other tree types, the samples you select are not used to choose contextual samples, because Non-contextualized trees contain no contextual samples.
- If you select a large number of samples for the tree build, it may bias the proportion of lineages on the tree. This is because user-selected samples are not subject to the same temporal subsampling as the Nextstrain-selected samples. Selecting fewer samples therefore produces a less biased tree.
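The interaction between temporal subsampling and forced inclusion can be illustrated with a toy example. This is not Nextstrain's algorithm: the function `temporal_subsample`, the month-based grouping, and the deterministic round-robin selection are assumptions made for illustration (real subsampling is typically randomized). Forced samples are always kept and count toward the 2000-sample limit, and the remaining slots are spread across collection months.

```python
from collections import defaultdict
from datetime import date


def temporal_subsample(
    samples: dict[str, date],
    forced: set[str],
    max_total: int = 2000,
) -> list[str]:
    """Keep every forced sample, then fill the remaining slots by taking
    one sample per collection month in turn (round-robin across months)."""
    chosen = sorted(name for name in forced if name in samples)
    remaining = max(max_total - len(chosen), 0)

    # Group the non-forced samples by (year, month), earliest first.
    by_month: dict[tuple[int, int], list[str]] = defaultdict(list)
    for name, collected in sorted(samples.items(), key=lambda kv: (kv[1], kv[0])):
        if name not in forced:
            by_month[(collected.year, collected.month)].append(name)
    queues = [by_month[month] for month in sorted(by_month)]

    picked: list[str] = []
    round_idx = 0
    while len(picked) < remaining:
        took_any = False
        for queue in queues:
            if round_idx < len(queue) and len(picked) < remaining:
                picked.append(queue[round_idx])
                took_any = True
        if not took_any:  # every month is exhausted
            break
        round_idx += 1
    return chosen + picked


if __name__ == "__main__":
    samples = {
        "a1": date(2022, 1, 3), "a2": date(2022, 1, 10),
        "a3": date(2022, 1, 17), "a4": date(2022, 1, 24),
        "b1": date(2022, 2, 7),
        "c1": date(2022, 3, 1), "c2": date(2022, 3, 15),
    }
    print(temporal_subsample(samples, forced=set(), max_total=4))
    # ['a1', 'b1', 'c1', 'a2']: January, February, and March all represented
    print(temporal_subsample(samples, forced={"a2", "a3", "a4"}, max_total=4))
    # ['a2', 'a3', 'a4', 'a1']: January only
```

With no forced samples, the four slots cover all three months. Forcing three January samples leaves a single slot, which also goes to January, so the resulting tree shows only one month of the jurisdiction's viral diversity.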
Non-contextualized trees: Limitations
Note that without contextual data, sound epidemiologic inferences are impossible. Therefore, we caution against using Non-contextualized trees for any purpose other than looking at viral diversity across your jurisdiction.