Ultraconserved, super weird
When the human genome was published in 2001, regions with perfect sequence identity with mouse and rat genomes were soon identified (Bejerano et al., 2004). Despite millions of years of divergence between humans and rodents, 481 regions have been perfectly conserved -- not a single base pair is different over a stretch of ≥200 nucleotides for each of these regions. This conservation extends as far as chickens (96% sequence identity; divergence from humans ~310 million years ago, MYA), and with somewhat relaxed definitions (e.g., considering sequences ≥100 bp in length), thousands of sequences with remarkable conservation are found in opossum (diverged from humans ~180 MYA), frog (~360 MYA), and pufferfish (~450 MYA) (Stephen et al., 2008). Furthermore, these sequences are often located adjacent to critical developmental genes (Woolfe et al., 2005) and possess enhancer activity (Pennacchio et al., 2006). Given these striking features, proving the essentiality of these DNA elements should be easy . . . right?
In 2007, Ahituv et al. reported the deletion of ultraconserved enhancers adjacent to known essential genes (Ahituv et al., 2007). Unexpectedly, mice with deletion of any of four ultraconserved elements (uc248, uc329, uc47, or uc482) were viable with no post-natal phenotypes. Given these data, some speculated that ultraconserved elements may not arise from purifying selection. However, enhancer elements often have “backups” such that loss of one enhancer’s activity can be functionally compensated by another (Osterwalder et al., 2018). Accordingly, mice with combinatorial loss of two ultraconserved enhancers flanking a gene with known roles in the central nervous system have decreased weight and neuron density (Dickel et al., 2018). However, the phenotypes observed in these animals are less dramatic than the originally hypothesized lethality arising from ultraconserved element loss. Furthermore, thousands of single and combinatorial ultraconserved element deletion cell lines have effectively no phenotype (Schneider et al., 2019). Based on these studies, it appears that the most highly conserved elements of the human genome have subtle, if any, functional roles. That’s weird.
Two landmark papers inspired us to study ultraconserved elements. Lareau et al. and Ni et al. discovered that ultraconserved elements frequently overlap exons that are subject to alternative splicing. More than that, these ultraconserved alternative exons were most commonly found in genes encoding RNA binding proteins. For example, ultraconserved, alternatively spliced exons are found in every member of the SR gene family. Even more than that, a significant fraction of ultraconserved alternative exons, when spliced into mRNA, introduce a premature translation termination codon. These stop codon-containing exons signal their host transcript for degradation via the nonsense-mediated decay (NMD) pathway (Kurosaki et al., 2019), and thus, are termed “poison exons” based on their ability to “poison” their host transcript. So, the situation is this: over the course of hundreds of millions of years of evolution, sequence elements that lead to the destruction of their host transcripts have been perfectly conserved and shared between many species. That’s super weird.
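To make the “poison exon” idea concrete, here is a minimal Python sketch that checks whether including a cassette exon places an in-frame stop codon ahead of the normal one. The function names and the toy sequences are hypothetical and chosen only for illustration; the sketch looks only for a premature stop codon and does not model NMD itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def introduces_premature_stop(upstream, exon, downstream):
    """True if including `exon` creates a stop codon before the normal stop.

    `upstream` must start at the first base of the start codon, and the
    exon-skipped transcript (upstream + downstream) must contain the
    normal stop codon; otherwise the function returns False.
    """
    skipped_stop = first_in_frame_stop(upstream + downstream)
    included_stop = first_in_frame_stop(upstream + exon + downstream)
    if skipped_stop is None or included_stop is None:
        return False
    # In the exon-included transcript the normal stop sits len(exon) bases later.
    return included_stop < skipped_stop + len(exon)


# Toy sequences (not real genes): the first exon carries an in-frame TAA.
poison = introduces_premature_stop("ATGGCC", "GGATAAGGC", "GAATGA")
coding = introduces_premature_stop("ATGGCC", "GGAGGAGGC", "GAATGA")
```

Because the comparison is made against the shifted position of the normal stop, the same check also flags exons that cause a premature stop downstream through a frameshift.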
I dig the pig
To study ultraconserved alternative exons, we needed a tool to experimentally manipulate them. However, when we began our work, there were no suitable tools available. RNAi can knock down genes, and while it can be adapted to target specific isoforms, total gene expression is affected. Furthermore, RNAi cannot be used to study the effects of poison exon deletion because skipping a poison exon leads to increased transcript abundance. Antisense oligonucleotides can manipulate splice isoform ratios, but this technology is low throughput, and it is hard to predict which target sequences will have the most potent effect on splicing outcomes. CRISPR/Cas9 is highly efficient, generates predictable editing outcomes, and is amenable to massively multiplexed screens. However, when we began our work, CRISPR/Cas9 was most routinely used to generate gene knockouts.
Motivated by reports that simultaneous delivery of two gRNAs (paired-gRNAs, pgRNAs) could generate kilobase deletions (Diao et al., 2017; Gasperini et al., 2017; Zheng et al., 2014; Zhu et al., 2016), we reasoned that we could delete entire alternative exons and/or cis-elements required for their inclusion. This approach came to be affectionately known within our lab as “paired guide RNAs for alternative exon removal”, or pgFARM (pronounced pig farm).
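The paired-guide idea can be sketched in a few lines of Python. This is an illustrative toy, not our design pipeline: it scans only the forward strand for 20-nt protospacers followed by an NGG PAM (the SpCas9 motif), assumes a cut 3 bp upstream of the PAM, and pairs every upstream site with every downstream site. The function names and the synthetic sequence are hypothetical, and real guide design would also scan the reverse strand and score on- and off-target activity.

```python
import re


def find_cut_sites(seq, start, end):
    """Return (protospacer, cut_position) for forward-strand sites in seq[start:end].

    A site is a 20-nt protospacer followed by an NGG PAM. The cut position
    is the index of the first base after the cut, 3 bp upstream of the PAM.
    """
    sites = []
    # The lookahead lets overlapping candidate sites all be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq[start:end]):
        protospacer_start = start + m.start()
        sites.append((m.group(1), protospacer_start + 17))
    return sites


def deletion_pairs(seq, exon_start, exon_end):
    """Pair upstream and downstream sites whose cuts flank the exon.

    Returns (upstream_guide, downstream_guide, deletion_size) tuples.
    """
    upstream = find_cut_sites(seq, 0, exon_start)
    downstream = find_cut_sites(seq, exon_end, len(seq))
    return [
        (u_guide, d_guide, d_cut - u_cut)
        for u_guide, u_cut in upstream
        for d_guide, d_cut in downstream
    ]


# Synthetic example: flank + 30-nt "exon" + flank, one PAM site per flank.
upstream_flank = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"
exon = "C" * 30
downstream_flank = "TTTT" + "TTCATTCATTCATTCATTCA" + "AGG" + "TTTT"
seq = upstream_flank + exon + downstream_flank
exon_start = len(upstream_flank)
exon_end = exon_start + len(exon)
pairs = deletion_pairs(seq, exon_start, exon_end)
```

Each returned pair describes a predicted deletion spanning the whole exon plus some flanking intronic sequence on each side.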
Using pgFARM, we (1) performed a variety of proof-of-principle studies to optimize the technique and (2) explored functional roles for previously uncharacterized isoforms. In the course of this work, we established several of pgFARM’s key features:
- Efficiency: for many substrates, we can enforce nearly 100% exon exclusion, allowing for identification of splicing-dependent phenotypes in polyclonal cell lines. For example, pgFARM-mediated skipping of MBNL1’s exon five (which contains a nuclear localization sequence) results in specific loss of nuclear-localized MBNL1 protein isoforms.
- Specificity: for all tested substrates, we detected no evidence of unwanted, cryptic splice isoforms -- we only affected the inclusion of the targeted exon.
- Speed: pgFARM constructs can be designed and cloned in ~1 week. Once these vectors are delivered to Cas9-expressing cell lines, exon exclusion isoforms can be detected in < 48 hours. Therefore, it is feasible to develop specific RNA isoform knockout cell lines within a few weeks.
- Generalizability: pgFARM is robust in multiple human and mouse cell lines. In our manuscript, we confirmed pgFARM works for > 20 distinct exons -- a number that continues to grow as we apply pgFARM in new settings.
- Scalability: we took advantage of CRISPR/Cas9’s scalability to perform high-throughput functional screens. Our goal was twofold. First, we wanted to test if pgFARM could provide a methodology to increase the “resolution” of functional genomics from genes to transcripts. Second, we wanted to test the intuitive, but previously unproven, hypothesis that ultraconserved exons are essential for cell viability. Toward these goals, we constructed a library containing ~10,000 pgRNAs targeting coding and poison exons in hundreds of genes.
Our high-throughput functional screens provided us with our first insights into which conserved poison exons are essential for cell growth. For example, in HeLa cells, targeting 43% of the poison exons in our library significantly decreased cell growth over a two-week time course, compared to 58% of constitutive exons -- only a modest increase relative to poison exons. These data suggested that poison exons are frequently required for normal cell growth in a manner similar to constitutive coding exons in the same gene. Consistent with this, we validated that pgFARM-mediated exclusion of coding and conserved poison exons in the genes CPSF4 and SMG1 resulted in reduced cell growth.
Tumor suppressor exons
More than 100 poison exons were required for normal cell growth in vitro. However, we also observed examples of genes (e.g., SNRNP70) where targeting the coding exon had dramatic effects on cell viability but targeting the corresponding poison exon had little or no effect on cell growth. Several SR genes showed these trends, which sparked our interest because overexpression of SRSF1 is sufficient to transform cells (Anczuków et al., 2012) and many SR genes are abnormally expressed in a variety of cancers (Park et al., 2019; Urbanski et al., 2018). Given that poison exons regulate gene product abundance, why were these exons not enriched in our in vitro screen? We began to suspect that our cell viability screening approach in vitro was not sufficient for robustly uncovering pro-proliferative phenotypes. We speculated that since transformed cell lines such as HeLa cells are already so highly proliferative in culture, it would be hard to further “take the brakes off”. Therefore, we sought out a more stringent selection pressure -- in vivo tumorigenesis.
Inspired by work from Chen et al., we repeated our poison exon knockout screen but instead of simply growing the cells in culture, we implanted them into mice. For these experiments, I greatly benefitted from the expertise of Alice Berger and Maria McSharry, who taught me how to perform xenograft experiments. After weeks of growth in vivo, we harvested tumor samples and profiled the pgFARM library diversity using Illumina sequencing. As in our in vitro screens, we found that many poison exons are essential for cell growth in vivo. However, when we turned our attention to poison exons in SR and hnRNP genes, the story was very different. While targeting the coding exons of these genes resulted in strong depletion, pgRNAs targeting their poison exons were enriched. This was an incredibly exciting result because it suggests that the oncogenic effects of certain SR and hnRNP proteins are constrained by the inclusion of their poison exons. In other words, in certain contexts, poison exons are tumor suppressors. We found that the tumor suppressor activity of poison exons extends beyond SR and hnRNP genes. In total, we found 61 enriched poison exons and performed validation experiments showing that pgFARM-mediated exclusion of a poison exon in EPC1 is sufficient to increase tumor growth in vivo.
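For readers unfamiliar with pooled screens, “depleted” and “enriched” refer to how the relative abundance of each pgRNA changes between the starting library and the harvested sample. The Python sketch below shows one simple way to compute this as a log2 fold change of read-count frequencies. It is a hedged illustration rather than the analysis used in our study: the function name, the pseudocount, and the read counts are all made up for the example.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Log2 fold change in pgRNA frequency between two samples.

    `before` and `after` map pgRNA names to read counts. Counts are
    converted to frequencies so that sequencing depth cancels out, and a
    pseudocount avoids taking the log of zero for dropped-out guides.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for guide, count in before.items():
        freq_before = (count + pseudocount) / total_before
        freq_after = (after.get(guide, 0) + pseudocount) / total_after
        changes[guide] = math.log2(freq_after / freq_before)
    return changes


# Invented read counts for three hypothetical pgRNAs.
library = {"coding_exon_pgRNA": 100, "poison_exon_pgRNA": 100, "control_pgRNA": 100}
tumor = {"coding_exon_pgRNA": 10, "poison_exon_pgRNA": 190, "control_pgRNA": 100}
lfc = log2_fold_changes(library, tumor)
```

In this toy data set the coding-exon guide has a negative value (depleted), the poison-exon guide a positive value (enriched), and the control stays at zero. A real analysis would also use replicates, many guides per exon, and a statistical test.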
As poison exon inclusion is frequently auto-regulated, it is possible that SR, hnRNP, and other genes control the inclusion of their own poison exons as “fail safes” to ensure that their expression does not reach levels that would activate cell malignancy. Interestingly, SR and hnRNP genes frequently contain multiple NMD-sensitive splice variants, suggesting that post-transcriptional control of their abundance has redundancy similar to ultraconserved enhancer redundancy mentioned above. In the future, it will be interesting to test if disrupting multiple NMD-inducing splicing events in the same gene enhances the phenotypes observed in our study.
Moving genomics from genes to transcripts
While identifying RNA isoforms and quantifying their abundance is relatively straightforward using RNA-seq, a major bottleneck in the RNA splicing field has been the lack of tools to functionally interrogate specific isoforms. In this regard, we believe pgFARM is a major step forward, and we hope that it is used, and improved upon, by other investigators to study how RNA processing contributes to health and disease.
In addition to pgFARM, other important strategies are being employed to manipulate RNA splicing. These include methods such as CRISPR-SKIP (Gapinske et al., 2018) and TAM (Yuan et al., 2018), which use Cas9-directed base editors to mutate splice sites. Disease-associated mis-splicing has also been corrected by delivering single gRNAs to mutate splice sites in the DMD gene (Long et al., 2018). Additionally, direct manipulation of splicing at the RNA level is possible through the use of RNA-targeting Cas variants such as dCasRx (Konermann et al., 2018).
Ultimately, the most appropriate tool will depend on the specific needs of each experiment. For example, a current limitation of pgFARM is its inability to enforce exon inclusion. To achieve this, methods such as dCasRx or TAM might be most suitable. On the other hand, because dCasRx directly targets RNA, its efficacy might differ depending on the targeted transcript and the relative abundance of substrate-specific RNA binding proteins in a given cell type. In this regard, pgFARM has advantages because it is relatively substrate agnostic -- Cas9’s ability to cut different regions of DNA is similar (or, at least, predictable). Additionally, because DNA-targeting strategies such as pgFARM induce permanent edits, they might be useful for long-term experiments such as in vivo studies or the generation of clonal cell lines.
Overall, we hope that the ongoing development of new DNA- and RNA-targeting CRISPR/Cas9 systems will provide a diverse toolkit to manipulate various forms of RNA processing including cassette exon exclusion/inclusion, alternative 5'/3' splice site selection, alternative branchpoint selection, and alternative cleavage and polyadenylation site choice.
RNA processing in health and disease
Our study uncovered roles for conserved poison exons in cell growth and tumorigenesis. In the future, it will be important to test if ultraconserved poison exons are required for embryonic viability. As mentioned above, several of the genes we studied produce multiple NMD-sensitive splice variants. Therefore, for these genes, the essentiality of a poison exon could be masked by functional redundancy provided by other NMD-sensitizing features. Strategies to disrupt multiple features in the same cell will be important and may require simultaneous delivery of many gRNAs (Sanson et al., 2019).
While we were interested in studying conserved poison exons given their incredible sequence conservation, our work also served the more general goal of optimizing pgFARM for future use in exploring how specific disease-associated RNA isoforms contribute to cancer phenotypes. RNA-seq studies of human cancers frequently reveal hundreds or thousands of aberrantly expressed RNA isoforms. Parsing through this data and deciding which isoforms might contribute to specific hallmarks of cancer is a daunting task: “Does this isoform contribute to increased angiogenesis? Or maybe it’s the mis-splicing of this exon that increases a cell’s resistance to apoptotic signals? Or maybe this one confers resistance to chemotherapy?” Through the design and application of rationally designed, disease-centric pgRNA libraries, we hope that pgFARM will help find the needle in a haystack.