What is a lineage?
To facilitate surveillance, SARS-CoV-2 viruses that are closely related and share signature mutations (genetic changes) are tracked as lineages or variants. A lineage is a group of closely related viruses that evolved from a common ancestor and thus share a genetic history. A variant is a virus with mutations relative to the original SARS-CoV-2 virus detected in 2019. Variants with certain defining sets of mutations can be of greater public health importance than others. For this reason, SARS-CoV-2 variants are named and tracked by Pango, Nextstrain, and GISAID. Each of these platforms has its own nomenclature system that highlights specific virus mutations, but the Pango lineage and Nextstrain clade nomenclatures are the most widely used. When a variant is shown to be a public health threat, it is designated a ‘variant of concern’ (VOC) and named with a letter of the Greek alphabet (Alpha, Beta, Gamma, Delta, etc.). The World Health Organization (WHO) uses this Greek-letter system to label VOC, which makes it easier to discuss SARS-CoV-2 dynamics and public health responses with general audiences. For an overview of SARS-CoV-2 variants and their lineage or clade classification, including cross-references between nomenclature systems, visit CoVariants.org and WHO: Tracking SARS-CoV-2 variants.
How does CZ GEN EPI assign SARS-CoV-2 lineages?
When users upload data to CZ GEN EPI, the platform automatically assigns lineages to samples using the Phylogenetic Assignment of Named Global Outbreak Lineages (pangolin) tool. Lineages named according to the Pango nomenclature are known as Pango lineages. The pangolin tool draws on three approaches to assign a Pango lineage: pangoLEARN, UShER, and Scorpio. pangoLEARN uses a machine learning algorithm to assign lineages. UShER is a phylogenetic placement approach: lineages are assigned based on where sequences land when added to a pre-calculated SARS-CoV-2 phylogeny. Scorpio classifies sequences based on the presence of specific, lineage-defining mutations. Because these mutations must be defined in advance, Scorpio is limited to classifying VOC and other curated variants that require monitoring.
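The mutation-based idea behind Scorpio can be illustrated with a minimal sketch. This is not Scorpio's actual implementation: the function name `classify_by_mutations`, the `min_fraction` threshold, and the two "constellations" below are hypothetical and chosen only for illustration. The sketch assumes each sample has already been reduced to a set of mutation labels.

```python
# Hypothetical definitions: each name maps to a set of defining mutations.
# These sets are illustrative only, not curated lineage definitions.
CONSTELLATIONS = {
    "Example-A": {"S:N501Y", "S:P681H", "S:D614G"},
    "Example-B": {"S:L452R", "S:T478K", "S:D614G"},
}


def classify_by_mutations(sample_mutations, constellations=CONSTELLATIONS,
                          min_fraction=1.0):
    """Return the name whose defining mutations are best supported, or None.

    A definition qualifies only if at least `min_fraction` of its defining
    mutations are present in the sample.
    """
    best_name, best_support = None, 0
    for name, defining in constellations.items():
        support = len(defining & sample_mutations)  # shared mutations
        if support / len(defining) >= min_fraction and support > best_support:
            best_name, best_support = name, support
    return best_name


# A sample carrying every "Example-A" mutation plus one unrelated change.
print(classify_by_mutations({"S:N501Y", "S:P681H", "S:D614G", "S:A222V"}))
# A sample carrying only a shared mutation matches no definition.
print(classify_by_mutations({"S:D614G"}))
```

The second call returns `None`, which mirrors the limitation described above: a sequence that does not match a curated definition receives no mutation-based call.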
Lineages are assigned through pangolin using UShER (default) or pangoLEARN, and assignments are supplemented with information from Scorpio, such as the VOC name, whenever possible. To see which approach was used to assign a given lineage, go to the Sample page in your CZ GEN EPI account and hover over the assigned lineage for the sample of interest.
Sample page showing lineage assignment details for three samples. This information appears when users hover over a lineage name. Note that some assignments will not have a Scorpio call. This can happen for variants that are neither VOC nor otherwise closely monitored.
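For users who also run pangolin outside the platform, the same distinction can be checked in a lineage report. The sketch below is a minimal example that assumes a CSV with `taxon`, `lineage`, and `scorpio_call` columns, as in pangolin's lineage report; the helper name `split_by_scorpio_call` and the sample rows are hypothetical, and column names should be verified against your own output.

```python
import csv
import io


def split_by_scorpio_call(report_csv_text):
    """Split report rows into those with and without a Scorpio call.

    Assumes the columns `taxon`, `lineage`, and `scorpio_call` are present.
    """
    reader = csv.DictReader(io.StringIO(report_csv_text))
    with_call, without_call = [], []
    for row in reader:
        call = (row.get("scorpio_call") or "").strip()
        entry = (row["taxon"], row["lineage"], call)
        # An empty call means no curated definition matched this sample.
        (with_call if call else without_call).append(entry)
    return with_call, without_call


# Made-up report rows, for illustration only.
example_report = (
    "taxon,lineage,scorpio_call\n"
    "sample_1,B.1.1.7,Alpha (B.1.1.7-like)\n"
    "sample_2,B.1.2,\n"
)
with_call, without_call = split_by_scorpio_call(example_report)
print("With Scorpio call:", with_call)
print("Without Scorpio call:", without_call)
```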
If you are interested in learning about a specific lineage listed in your account, see lineage notes. The Pango lineage system is updated to accommodate the growing number and diversity of SARS-CoV-2 samples, and CZ GEN EPI always uses the latest Pango lineage definitions. Updates may lead to better phylogenetic resolution and, as a consequence, to the reassignment of sample lineages. Users may therefore notice that some lineage assignments change over time.
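If you keep your own records of lineage assignments, reassignments can be spotted by comparing two snapshots. The following is a minimal sketch, not a CZ GEN EPI feature: the function `find_reassignments`, the sample identifiers, and the lineage values are hypothetical, and each snapshot is assumed to be a mapping from sample ID to assigned lineage.

```python
def find_reassignments(previous, current):
    """Return {sample_id: (old_lineage, new_lineage)} for changed samples.

    Samples missing from `current` are ignored rather than reported.
    """
    changes = {}
    for sample_id, old_lineage in previous.items():
        new_lineage = current.get(sample_id)
        if new_lineage is not None and new_lineage != old_lineage:
            changes[sample_id] = (old_lineage, new_lineage)
    return changes


# Illustrative snapshots taken before and after a lineage definition update.
before = {"sample_1": "B.1.1.529", "sample_2": "B.1.2"}
after = {"sample_1": "BA.1", "sample_2": "B.1.2"}
print(find_reassignments(before, after))
```

Only samples whose assignment actually differs between the two snapshots are returned, so unchanged samples do not clutter the result.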
Want to learn more about SARS-CoV-2 lineages?
Check out these articles to get you started: