
Why are there splice variants within the same organism? What might contribute to the need for the feature?


I only know that splice variants are produced by different combinations of introns and exons. I wish to know why there is a need for such a function. Perhaps using the same amount of DNA sequence to produce multiple proteins saves genetic material. Also, I want to know what contributes to the need for the splice variants feature (e.g. evolutionary pressure, the need to increase complexity).

I found that for the human protein CD81, Ensembl predicts many splice variants (http://grch37.ensembl.org/Homo_sapiens/Gene/Splice?db=core;g=ENSG00000110651;r=11:2398547-2418649;t=ENST00000475945). However, only one of them is characterized, so splice variants seem redundant. Are there any examples of splice variants that actually carry different functions?


The reason is simply to provide enough variation from a genome of limited size to produce the repertoire of proteins made by the cells of multicellular organisms. It is also a matter of efficiency and reduced energy consumption.

Consider that, on average, about 100,000 unique protein types are produced in a human cell [1], but the human genome is estimated to contain only 19,042 protein-coding genes [2]. The cell therefore needs some way to vary that limited instruction set.

Also remember that differentiated cell types express certain genes but not others, and they produce different proteins than other cells do. That implies that there are more than 100,000 different types of human proteins in total, and that far fewer than 19,042 genes are expressed in any one cell at the same time.

So without splice variants, one of two things would have to be true. Our genomes would need to be far larger than the approximately 3 billion base pairs they already are, or our repertoire of proteins would be significantly smaller. There would also be a lot of redundancy, because many common exons would have to be repeated over and over again. That would require much more energy for DNA synthesis, nucleic acid synthesis, and so on. The process would quickly become inefficient and would likely make complex multicellular animal life untenable.
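The arithmetic in this answer can be made concrete with a back-of-the-envelope sketch. It takes the answer's figures (100,000 protein types, 19,042 genes) at face value. It also adds a simplifying assumption that is not in the answer: all variation comes from independently included or skipped "cassette" exons, so a gene with k such exons yields at most 2^k isoforms. The function names are hypothetical.

```python
import math

# Figures quoted in the answer, taken at face value here.
PROTEIN_TYPES = 100_000
PROTEIN_CODING_GENES = 19_042

def products_per_gene(proteins: int, genes: int) -> float:
    """Average number of distinct proteins each gene would need to encode."""
    return proteins / genes

def cassette_exons_needed(products: float) -> int:
    """Smallest k with 2**k >= products, assuming k independent cassette exons
    (each included or skipped) and no other source of variation."""
    return max(0, math.ceil(math.log2(products)))

ratio = products_per_gene(PROTEIN_TYPES, PROTEIN_CODING_GENES)
print(f"about {ratio:.2f} proteins per gene")
print(f"{cassette_exons_needed(ratio)} independent cassette exons would suffice on paper")
```

Under these idealized assumptions, a handful of combinable exons per gene covers the gap. Duplicating whole genes for every protein variant would not be needed.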

There is a small error in your question. Splice variants are identified by the mRNAs that are produced, and exons are defined by the sequences present in the mature mRNA. Introns are, by definition, spliced out of the pre-mRNA, so splice variants are not "produced by different combinations of introns and exons." Splice variants consist only of different combinations of exons. The only time an intron would be found in a mature mRNA is if a splice site is mutated and no longer recognized by the spliceosome, so the intron is incorrectly retained. This will generally result in a non-functional protein.
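The point that splice variants differ only in which exons they contain can be shown with a toy enumeration. This is a minimal sketch that assumes a hypothetical four-exon gene. E1 and E4 are always included, while E2 and E3 are cassette exons that may each be kept or skipped independently. Real genes also use alternative 5'/3' splice sites, mutually exclusive exons and other patterns that are not modelled here.

```python
from itertools import product

def enumerate_isoforms(exons):
    """List every exon path a toy gene could produce by cassette-exon skipping.

    `exons` is an ordered list of (name, is_cassette) pairs. Constitutive exons
    appear in every isoform; each cassette exon is independently included or
    skipped. Only exon names are joined, so no intron ever appears in an isoform.
    """
    cassette = [name for name, is_cassette in exons if is_cassette]
    isoforms = []
    for choices in product((True, False), repeat=len(cassette)):
        included = dict(zip(cassette, choices))
        path = [name for name, is_cassette in exons if not is_cassette or included[name]]
        isoforms.append("-".join(path))
    return isoforms

# Hypothetical gene: E1 and E4 are constitutive, E2 and E3 are cassette exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in enumerate_isoforms(gene):
    print(isoform)
```

Each output lists only exons, and the two cassette exons give 2 × 2 = 4 possible isoforms from a single gene.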

EDIT

The collection of components required to carry out the intricate processes involved in generating and maintaining a living, breathing and, sometimes, thinking organism is staggeringly complex. Where do all of the parts come from? Early estimates stated that about 100,000 genes would be required to make up a mammal; however, the actual number is less than one-quarter of that, barely four times the number of genes in budding yeast. It is now clear that the 'missing' information is in large part provided by alternative splicing, the process by which multiple different functional messenger RNAs, and therefore proteins, can be synthesized from a single gene.

-Expansion of the eukaryotic proteome by alternative splicing: Nilsen and Graveley

Alternative splicing of pre‐messenger ribonucleic acid (pre‐mRNA) allows the generation of different mRNAs from the same gene. Evolution of alternative splicing affecting translated regions of mRNAs permits the synthesis of different proteins from a single gene, significantly increasing the diversity of the protein repertoire.

-Patthy, László (Apr 2008) Alternative Splicing: Evolution. In: eLS. John Wiley & Sons Ltd, Chichester.

Also, while I am not one to accept a Nobel Prize at face value, such prizes are usually awarded once the field has accepted the proposed explanation. The 1993 Nobel Prize in Physiology or Medicine was awarded to Richard J. Roberts and Phillip A. Sharp "for their discoveries of split genes."


Insofar as organismal complexity is defined as the number of known cell types, there is a strong relationship between splicing, the repertoire of isoforms, and organismal complexity.

See paper here


Also, I want to know what contributes to the need for the splice variants feature.

One common function of splice variants that no one has mentioned is to act as a dominant negative. In this role, the variant inhibits the protein encoded by the longer, functional full-length transcript (see dominant-negative splice variants). This provides an additional layer of post-transcriptional regulation.

This is actually quite a common function of splice variants.
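One common way a shorter isoform can act as a dominant negative is by joining complexes with the full-length protein and inactivating them. The model below is an assumption for illustration and is not stated in the answer. Suppose a fraction f of subunits is full-length, subunits assemble at random into n-subunit complexes, and any truncated subunit poisons the complex. Then only f^n of complexes remain functional. The function name is hypothetical.

```python
def functional_fraction(full_length_fraction: float, subunits: int) -> float:
    """Fraction of complexes built only from full-length subunits.

    Hypothetical model: subunits assemble at random into complexes of size
    `subunits`, and a single truncated splice-variant subunit inactivates the
    whole complex (a dominant-negative effect).
    """
    if not 0.0 <= full_length_fraction <= 1.0:
        raise ValueError("full_length_fraction must be between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return full_length_fraction ** subunits

# Half of all subunits come from the truncated isoform.
for n in (1, 2, 4):
    print(f"{n}-subunit complex: {functional_fraction(0.5, n):.4f} still functional")
```

For a monomer, activity simply tracks the fraction of full-length protein. For a tetramer under these assumptions, making half the subunits from the truncated variant leaves only 1/16 of complexes active. This is why a modestly expressed variant can exert strong control over a multimeric protein.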

In addition, as others have mentioned, splicing adds complexity at a reduced cost. Many transcription factors have both transactivation domains (the ability to regulate transcription of DNA) and protein-protein interaction domains. With splice variants, the organism can make versions of such a protein that retain only a subset of those activities.


Frontiers in Cell and Developmental Biology


    Organizational Cell Biology

    E.T. Spiliotis, L. Dolat, in Encyclopedia of Cell Biology, 2016

    Conclusions

    Septins comprise a network of filaments that interface with cell membranes and the actin and microtubule cytoskeleton. Septins are integral components of the cell membrane’s skeleton. They impede lateral diffusion and vesicle delivery and thereby affect the generation and maintenance of distinct membrane compartments and domains. In the cytoplasm, septins act as filamentous actin- and microtubule-associated proteins (MAPs) that regulate the spatial organization and functions of the cytoskeleton. Septins interact directly with actomyosin and actin-binding proteins, affecting the localization, formation, cross-linking, and contractility of actin microfilaments. Septins bundle microtubules and appear to regulate the organization, dynamics, and posttranslational modifications of microtubules, as well as their interactions with motors and MAPs. Thus, septins fine-tune the spatial organization and functions of the cell, regulating a diversity of cellular processes ranging from cell division and motility to cell death and autophagy.

    Abnormalities in septin expression and septin mutations underlie a number of disease states (Dolat et al., 2014a). Male infertility and Hereditary Neuralgic Amyotrophy (HNA), an autosomal dominant neuromuscular disorder, are linked to septin gene mutations (Kuo et al., 2012; Kuhlenbaumer et al., 2005). Blood disorders such as Bernard–Soulier syndrome and acute myeloid and lymphoblastic leukemias are characterized by septin gene deletions and by fusions with the mixed lineage leukemia (MLL) gene (Cerveira et al., 2011; Bartsch et al., 2011). Septins are abnormally expressed in many solid tumors of epithelial origin, including breast, ovarian, renal and colorectal carcinomas, and changes in epigenetic marks of septin expression have been used as diagnostic biomarkers (Connolly et al., 2011). Alterations in septin expression have also been reported in neurodevelopmental disorders (e.g., Down syndrome and schizophrenia). Septin aggregates have also been found in the neurofibrillary tangles and Lewy bodies of patients with neurodegenerative disorders such as Alzheimer’s and Parkinson’s (Dolat et al., 2014a). Recent studies indicate that septins are also important for host defense against pathogenic bacteria (Mostowy and Cossart, 2011). More studies are required to elucidate how abnormalities in septin expression contribute to the pathogenesis of human diseases, but septins are integral components of the cytoskeleton and its functions. Future studies will enhance our understanding of septins as regulators of the spatial organization of the cell.


    Sandwalk

    This post is about a recent review of alternative splicing published by my colleague Ben Blencowe in the Dept. of Molecular Genetics at the University of Toronto (Toronto, Ontario, Canada). (The other author is Jernej Ule of The Francis Crick Institute in London (UK).) They are strong supporters of the idea that alternative splicing is a common feature of most human genes.

    I am a strong supporter of the idea that most splice variants are due to splicing errors and only a few percent of human genes undergo true alternative splicing.

    This is a disagreement about the definition of "function." Is the mere existence of multiple splice variants evidence that they are biologically relevant (functional), or should we demand evidence of function, such as conservation, before accepting such a claim?

    Background: what are splice variants?

    Let me begin by defining some terms. Modern techniques are capable of detecting specific RNA molecules that may be present at less than one copy per cell. By scanning many different tissues, workers have compiled extensive lists of transcripts that are complementary to various parts of the genome. This gives rise to the idea of pervasive transcription and that was one of the reasons why ENCODE researchers claimed that most of our genome is functional.
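To put "less than one copy per cell" in perspective, RNA-seq abundances in transcripts per million (TPM) can be roughly converted to copies per cell. This minimal sketch assumes a fixed total of 200,000 mRNA molecules per cell. That figure is an illustrative assumption within the commonly cited range of about 10^5 to 10^6 for mammalian cells. Real totals vary with cell type and size, and measurements on pooled tissue hide cell-to-cell variation. The constant and function names are hypothetical.

```python
# Assumed total mRNA molecules in one cell (illustrative; not from the post).
MRNA_PER_CELL = 200_000

def copies_per_cell(tpm: float, mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """Approximate copies per cell from TPM (transcripts per million).

    TPM estimates how many of every million transcripts belong to one
    transcript, so copies per cell ~= TPM * mrna_per_cell / 1e6.
    """
    return tpm * mrna_per_cell / 1_000_000

for tpm in (100.0, 10.0, 5.0, 1.0, 0.1):
    print(f"{tpm:>6} TPM ~ {copies_per_cell(tpm):.2f} copies per cell")
```

Under this assumption, a transcript at 5 TPM averages about one copy per cell. Anything detected below that level is, on average, present at less than one copy per cell.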

    Most knowledgeable scientists now agree that many of those transcripts are spurious transcripts produced by accidental transcription. Many of those transcripts overlap with known genes and the primary transcript will be processed by splicing if it overlaps a splice site. This gives rise to transcripts that are characterized as splice variants and those transcripts are not so easily dismissed as mistakes by workers in the field of alternative splicing. That's because alternative splicing is a real phenomenon that has been well-studied in a few genes since the early 1980s.

    I restrict the term "alternative splicing" to those situations where the alternate transcripts are known to be biologically relevant, or when we have a strong reason to suspect true alternative splicing. In situations where the transcript variants don't have the characteristics of true alternative splicing, and where there's no evidence of biological relevance, I will refer to those transcripts as "transcript variants" or "splice variants." This differs from standard usage in the field where all the splice variants are automatically assumed to be examples of true alternative splicing. 1

    It's hard to find a modern, up-to-date database that lists all the variants for an individual gene, but scanning old databases suggests that there may be dozens of splice variants for most genes. One of the most widely quoted papers in the field is the Pan et al. (2008) paper from the Blencowe/Frey labs. This is the paper where they claim that 95% of human multiexon protein-coding genes are alternatively spliced and that there are, on average, "at least seven alternative splicing events" per gene.

    I reject this terminology. I would say there are at least seven splice variants per gene, and it remains to be seen whether they are examples of splicing errors or true alternative splicing. Nevertheless, in spite of the lack of supporting evidence (other than the mere existence of splice variants), this paper is widely quoted as evidence of pervasive alternative splicing.

    An example of splice variants

    The top figure below shows some of the splice variants for the human triose phosphate isomerase gene (TPI1) from the Ensembl: human database. I think these are only a small subset of the variants that have been reported for this gene but even in this small subset you can see predictions of eight different proteins plus two variants that don't encode proteins.

    The bottom figure shows the same data for the mouse gene [Ensembl: mouse]. There are only three variants of the mouse TPI1 gene in the Ensembl database, and only one of them is predicted to make a different protein: one that's missing the C-terminal half of the protein. Note that the patterns of transcript variants of the mouse and human genes are not the same. Production of these variants is not conserved in mammals.

    Triose phosphate isomerase is an important metabolic enzyme found in all species, including bacteria. The enzyme catalyzes an important reaction in gluconeogenesis/glycolysis. The structure of the protein is well known and its function is well understood. It seems very unlikely that humans would make seven functional variants of this protein, especially since none of them are found in other mammals.

    (Note: There seems to be an increasing reluctance to publish examples of transcript variants for specific genes. I can't recall when I last saw any images like the ones I posted above. I wonder whether this is because the proponents of alternative splicing are embarrassed to show representations of the data or because they don't look at it themselves. I suspect the latter explanation. It seems as though workers in the field are increasingly relying on bioinformatic analysis of transcript variant databases without ever looking at specific genes to see if the databases make sense. It's time to re-issue my Challenge to Fans of Alternative Splicing.)

    The Deflated Ego Problem

    The controversy over the frequency of alternative splicing is related to something I call The Deflated Ego Problem. The "problem" is based on the view that humans are extraordinarily complex compared to other species and that this complexity should be reflected in the number of genes. Many scientists were "shocked" to discover that humans don't have very many more genes than the nematode Caenorhabditis elegans and even fewer genes than some flowering plants.

    In order to preserve their view of human exceptionalism, these shocked scientists have been forced to come up with an explanation for this "anomaly." I listed seven of these explanations in the Deflated Ego post but the one I want to draw your attention to is alternative splicing. The idea is that while humans may not have a lot more genes than nematodes, they make much better use of those genes by producing multiple proteins from each gene. Thus, the complexity of humans is explained by alternative splicing and not by an increase in the number of genes.

    The lack of genes is often referred to as the G-value paradox (see Deflated egos and the G-value paradox). It's only a problem if you haven't been following the work of developmental biologists over the past forty years. They have established that complexity and species differences are usually explained by changes in how genes are regulated and not by large increases in the evolution of new genes [Revisiting the deflated ego problem]. There is no "problem" and scientists should not have been shocked. 2

    Here's an explicit explanation of the imaginary problem as expressed by Gil Ast in a 2005 Scientific American article (Ast, 2005).

    The old axiom "one gene, one protein" no longer holds true. The more complex an organism, the more likely it became that way by extracting multiple protein meanings from individual genes.
    When a first draft of the human sequence was published the following summer, some observers were therefore shocked by the sequencing team's calculation of 30,000 to 35,000 protein-coding genes. The low number seemed almost embarrassing. In the years since, the human genome map has been finished and the gene estimate has been revised downward still further, to fewer than 25,000. During the same period, however, geneticists have come to understand that our low count might actually be viewed as a mark of our sophistication because humans make such incredibly versatile use of so few genes.

    Through a mechanism called alternative splicing, the information stored in the genes of complex organisms can be edited in a number of ways, making it possible for a single gene to specify two or more distinct proteins. As scientists compare the human genome to those of other organisms, they are realizing the extent to which alternative splicing accounts for much of the diversity among organisms with relatively similar gene sets.

    Indeed, the prevalence of alternative splicing appears to increase with an organism's complexity: as many as three quarters of all human genes are subject to alternative splicing. The mechanism itself probably contributed to the evolution of that complexity and could drive our further evolution.

    This view has become standard dogma in the alternative splicing world so that almost every new paper begins with a reference to it as though it were established theory. It seems to be widely accepted that multiple versions of metabolic enzymes such as triose phosphate isomerase will explain human complexity. 3

    But it is not a fact that most genes exhibit some form of alternative splicing; it's merely speculation designed to assuage deflated egos. Furthermore, the explanation relies on the assumption that less complex animals must make fewer proteins from a similar set of genes. Recent experiments have shown that this assumption is false, so the whole argument falls apart [Alternative splicing in the nematode C. elegans].

    Explain these facts

    • Splicing is associated with a known error rate that's consistent with the production of frequent spurious splice variants. Explain why this fact is ignored.
    • The unusual transcript variants are usually present at less than one copy per cell. Explain how thousands of such rare transcripts could have a function.
    • The unusual transcript variants are rapidly degraded and usually don't leave the nucleus. What is their function?
    • The transcripts are not conserved, as expected if they are splicing errors. Give a rational evolutionary explanation for why we should ignore the lack of sequence conservation.
    • In the vast majority of cases, the predicted protein products of these transcripts have never been detected. Explain that.
    • The number of different unusual transcripts produced from each gene makes it extremely unlikely that they could all be biologically relevant. Explain how such strange transcripts, and even stranger protein variants, could have evolved.
    • The number of detectable transcripts correlates with the length of the gene and the number of introns, which is consistent with splicing errors. Explain how this is consistent with biologically relevant alternative splicing.
    • Gene annotators who have looked closely at the data have determined that >90% of them are spurious junk RNA or noise and they have not been included in the standard reference database. Why do genome annotators dismiss most splice variants?
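Several of the points above, especially the link between intron number and the number of detectable variants, can be made quantitative with a simple null model of splicing noise. This sketch assumes, hypothetically, that each intron is mis-spliced independently with the same small error rate e. A transcript with n introns then carries at least one error with probability 1 - (1 - e)^n, which rises with intron number, as the splicing-error interpretation predicts. The value 0.001 below is a placeholder, not a measured rate, and the function names are hypothetical.

```python
def p_any_error(error_rate: float, n_introns: int) -> float:
    """P(at least one mis-splicing event) for a transcript with n_introns introns.

    Assumes every intron is mis-spliced independently with the same
    error_rate, so P(no error) = (1 - error_rate) ** n_introns.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return 1.0 - (1.0 - error_rate) ** n_introns

def expected_misspliced(error_rate: float, n_introns: int, molecules: int) -> float:
    """Expected number of mis-spliced molecules among `molecules` transcripts."""
    return molecules * p_any_error(error_rate, n_introns)

ERROR_RATE = 0.001  # hypothetical per-intron error rate, for illustration only
for n in (1, 5, 10, 50):
    print(f"{n:>2} introns: P(>=1 error) = {p_any_error(ERROR_RATE, n):.4f}, "
          f"expected mis-spliced per 10,000 molecules = "
          f"{expected_misspliced(ERROR_RATE, n, 10_000):.1f}")
```

Even a low per-intron error rate yields a steady trickle of aberrant molecules from long, intron-rich genes. Deep sequencing across many tissues will detect these as rare "splice variants."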

    This brings me, finally, to the paper I want to discuss. It was published last October (2019).

    This review article begins with the statement that "Transcripts from nearly all human protein-coding genes undergo one or more forms of alternative splicing." This statement is misleading, at best. I could easily make the case that nearly all genes produce multiple transcript variants but most of them are due to splicing errors. The interesting question is how many of them might, instead, be due to biologically relevant alternative splicing. The burden of proof is on those who claim functionality and, in the absence of evidence of function, the default assumption is junk RNA.

    Most of the review article deals with the variety of RNA-binding and DNA-binding proteins that give rise to splice variants. I don't find this very interesting since it's not clear whether these are spurious binding events that give rise to errors in splicing or whether they are biologically relevant.

    The authors clearly believe that alternative splicing "... accounts for the vast range of biological complexity and phenotypic attributes across metazoan species." They conclude that "... it is becoming clear that alternative splicing has been particularly important for enriching proteomic complexity in animals in ways that have provided an expanded toolkit for evolution."

    It's important to note that the authors are aware of the fact that the pattern of production of splice variants is not conserved between species. In fact, they explicitly mention this point in support of their claim that "... alternative splice patterns have diverged rapidly among species." They believe that the lack of conservation can be explained away by postulating rapid selection such that the patterns of thousands of genes are different, even between closely related species. This is a common rationale (rapid selection for divergence) used to dismiss the lack of sequence conservation.

    The other interpretation, of course, is that most of the splice variants are due to splicing errors and that's why they are not conserved (see Using conservation to determine whether splice variants are functional for an extended discussion of this issue).

    The most interesting part of the review paper, in my opinion, is the section called "Function versus Noise or Evolutionary Fodder." This is the part of the paper that deals with the controversy, and it's good to see it finally addressed since most papers on alternative splicing ignore it. Here's how Ule and Blencowe begin this section.

    Alternative splicing is well documented at the transcript level, and microarray and RNA-seq experiments routinely detect evidence for many thousands of splice variants. However, large-scale proteomics experiments identify few alternative isoforms. The gap between the numbers of alternative variants detected in large-scale transcriptomics experiments and proteomics analyses is real and is difficult to explain away as a purely technical phenomenon. While alternative splicing clearly does contribute to the cellular proteome, the proteomics evidence indicates that it is not as widespread a phenomenon as suggested by transcript data. In particular, the popular view that alternative splicing can somehow compensate for the perceived lack of complexity in the human proteome is manifestly wrong. [my emphasis LAM]

    ... The results from large-scale proteomics experiments are in line with evidence from cross-species conservation, human population variation studies, and investigations into the relative effect of gene expression and alternative splicing. Gene expression levels, not alternative splicing, seem to be the key to tissue specificity. While a small number of alternative isoforms are conserved across species, have strong tissue dependence, and are translated in detectable quantities, most have variable tissue specificities and appear to be evolving neutrally. This suggests that most annotated alternative variants are unlikely to have a functional cellular role as proteins. [my emphasis, LAM]

    As you might have guessed, Ben Blencowe was unhappy with this result, so he responded with a critical letter published in the same journal a few months later (Blencowe, 2017) [see Debating alternative splicing (Part IV)]. In that letter, he made the same points that he makes in the Ule and Blencowe review, namely that the mass spec experiments are flawed for technical reasons: they are not detecting protein variants that should be there. However, the authors do concede that "... alternative splicing events lie on an evolving spectrum of regulation and functionality; therefore, it is very challenging to draw a line between those that are functional or non-functional."

    Tress et al. responded to Blencowe's letter back in 2017 (Tress et al., 2017b). As experts in proteomics they were probably aware of all of the objections that Blencowe raised, and many more. After considering Blencowe's criticisms, they write, "We believe our conclusions are well substantiated and invite readers to judge for themselves in the article and related papers."

    Resolving the controversy

    I don't think it's possible to state conclusively that almost all human protein-coding genes produce protein variants by biologically relevant alternative splicing. Scientists who make such claims are wrong because there's nothing to support such a claim other than wishful thinking. On the other hand, it's not possible to conclude that most splice variants are noise, although I firmly believe that the evidence tilts in the direction of noise. The appropriate null hypothesis is that the transcripts do not have a function, and the burden of proof is on those who claim a function.

    The main problems I have with the alternative splicing literature are: (1) proponents of widespread alternative splicing are using questionable evolutionary arguments to rationalize their claim, and (2) they are mostly ignoring any objections to their claims and refusing to acknowledge that they could be mistaken.

    It's interesting that Ule and Blencowe do not address any of the other criticisms of alternative splicing. They only respond to one paper. Here's a short list of other papers they might have considered.

    Bhuiyan, S.A., Ly, S., Phan, M., Huntington, B., Hogan, E., Liu, C.C., Liu, J., and Pavlidis, P. (2018) Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics, 19:637. [doi: 10.1186/s12864-018-5013-2]

    Bitton, D.A., Atkinson, S. R., Rallis, C., Smith, G.C., Ellis, D.A., Chen, Y.Y., Malecki, M., Codlin, S., Lemay, J.-F., and Cotobal, C. (2015) Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast. Genome Research. [doi: 10.1101/gr.185371.114]

    Hsu, S.-N., and Hertel, K.J. (2009) Spliceosomes walk the line: splicing errors and their impact on cellular function. RNA biology, 6:526-530. [doi: 10.4161/rna.6.5.986]

    Melamud, E., and Moult, J. (2009a) Stochastic noise in splicing machinery. Nucleic acids research, gkp471. [doi: 10.1093/nar/gkp471]

    Melamud, E., and Moult, J. (2009b) Structural implication of splicing stochastics. Nucleic acids research, gkp444. [doi: 10.1093/nar/gkp444]

    Mudge, J.M., and Harrow, J. (2016) The state of play in higher eukaryote gene annotation. Nature Reviews Genetics, 17:758-772. [doi: 10.1038/nrg.2016.119]

    Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet, 6:e1001236. [doi: 10.1371/journal.pgen.1001236]

    Saudemont, B., Popa, A., Parmley, J.L., Rocher, V., Blugeon, C., Necsulea, A., Meyer, E., and Duret, L. (2017) The fitness cost of mis-splicing is the main determinant of alternative splicing patterns. Genome biology, 18:208. [doi: 10.1186/s13059-017-1344-6]

    Stepankiw, N., Raghavan, M., Fogarty, E.A., Grimson, A., and Pleiss, J.A. (2015) Widespread alternative and aberrant splicing revealed by lariat sequencing. Nucleic acids research, 43:8488-8501. [doi: 10.1093/nar/gkv763]

    Tress, M. L., Martelli, P. L., Frankish, A., Reeves, G. A., Wesselink, J. J., Yeats, C., Ísólfur Ólason, P., Albrecht, M., Hegyi, H., Giorgetti, A. et al. (2007) The implications of alternative splicing in the ENCODE protein complement. Proceedings of the National Academy of Sciences, 104:5495-5500. [doi: 10.1073/pnas.0700800104]

    Zhang, Z., Xin, D., Wang, P., Zhou, L., Hu, L., Kong, X., and Hurst, L. D. (2009) Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC biology, 7:23. [doi: 10.1186/1741-7007-7-23]

    1. Some authors recognize this problem but they solve it by distinguishing between functional alternative splicing and spurious alternative splicing. I don't think this is helpful.

    Ast, G. (2005) The alternative genome. Scientific American, 292:58-65. [doi: 10.1038/scientificamerican0405-58]


    The importance of evaluation of the predictive models for complex phenotypes

    One of the main challenges in feature selection is accurately estimating how well machine learning models predict new samples that were unseen during training. This is especially hard when the data are high-dimensional and the number of labeled training examples is relatively small. Given the massive dimensionality of modern GWAS and NGS studies, it is in fact not very hard to find genetic features that fit a small training set almost perfectly but fail to generalize to unseen data, a phenomenon known as model overfitting. Therefore, models learned from genetic data should always be tested on independent data not used for training. If the number of labeled examples is small, one must resort to cross-validation techniques that repeatedly split the data into training and test sets, with predictive accuracy reported as an average over the test folds. In many applications of genomic predictors, there are examples of so-called selection bias [40]. Here, cross-validation is used to estimate the performance of the learning algorithm only, not of the preliminary feature selection done on the whole data set, which leads to information leakage and grossly over-optimistic results. Further, if cross-validation is used to select the hyper-parameters of the learning algorithm or to select features, this needs to be done within an internal cross-validation loop, separately during each round of an outer cross-validation loop [40–43]. This two-level technique is sometimes referred to as nested cross-validation [42, 44]. Figure 1 shows an example of how cross-validation error behaves when it is used as a selection criterion with greedy forward selection. The error curve constantly decreases as a function of the number of selected features. This clearly indicates that, in the inner loop, cross-validation becomes part of the training algorithm itself, so it cannot be trusted as a measure of true prediction performance on unseen data.
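The selection bias described above can be reproduced in a small simulation. This is a minimal sketch under stated assumptions, not the authors' experiment. The data are pure Gaussian noise with labels unrelated to any feature. Features are ranked by a simple difference of class means, a stand-in filter rather than Fisher's exact test, and the classifier is a nearest-centroid rule rather than RLS. All function names are hypothetical.

```python
import random
import statistics

def make_noise_data(n_samples, n_features, seed):
    """Gaussian features with balanced 0/1 labels that are unrelated to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def stratified_folds(y, n_folds):
    """Deal each class round-robin into folds so every fold contains both classes."""
    folds = [[] for _ in range(n_folds)]
    seen = {}
    for i, label in enumerate(y):
        folds[seen.get(label, 0) % n_folds].append(i)
        seen[label] = seen.get(label, 0) + 1
    return folds

def top_features(X, y, idx, k):
    """Filter step: rank features by |difference of class means| on samples idx."""
    idx = list(idx)
    scores = []
    for j in range(len(X[0])):
        a = statistics.fmean(X[i][j] for i in idx if y[i] == 1)
        b = statistics.fmean(X[i][j] for i in idx if y[i] == 0)
        scores.append((abs(a - b), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def nearest_centroid(X, y, train, test, feats):
    """Predict each test sample's class from the nearest training-class centroid."""
    cents = {c: [statistics.fmean(X[i][j] for i in train if y[i] == c) for j in feats]
             for c in (0, 1)}
    preds = []
    for i in test:
        d = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cent))
             for c, cent in cents.items()}
        preds.append(min(d, key=d.get))
    return preds

def cv_accuracy(X, y, k, n_folds, select_inside_folds):
    """K-fold CV accuracy; features are selected inside each fold or once on all data."""
    n = len(y)
    leaky = None if select_inside_folds else top_features(X, y, range(n), k)
    correct = 0
    for test in stratified_folds(y, n_folds):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        feats = top_features(X, y, train, k) if select_inside_folds else leaky
        preds = nearest_centroid(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / n

X, y = make_noise_data(n_samples=40, n_features=500, seed=0)
print("selection on all data (leaky):", cv_accuracy(X, y, 10, 5, select_inside_folds=False))
print("selection inside each fold:   ", cv_accuracy(X, y, 10, 5, select_inside_folds=True))
```

Because the labels are random, any accuracy well above 0.5 reflects information leaking from the test folds into feature selection. In simulations of this kind, the leaky estimate is typically far above chance, while the estimate with selection inside each fold stays near 0.5.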

    The figure illustrates how the external and internal cross-validation results behave as functions of the number of selected features. The external cross-validation consists of three training/test splits. The wrapper-based feature selection method, greedy RLS [23], is run separately during each round of the external cross-validation. Greedy RLS, in turn, employs an internal leave-one-out cross-validation on the training set to score the candidate feature sets. The red curve depicts the mean values of these internal cross-validation errors. As the red curve shows, the internal cross-validation MSE used for model training keeps improving. This is expected, because the internal cross-validation quickly overfits to the training data when it is used as a selection measure. The blue curve depicts the area under the curve (AUC) on the test data held out during the external cross-validation round, that is, data completely unseen during the internal cross-validation and feature selection process. In contrast to the red curve, the blue curve starts to level off soon after the number of selected variants reaches around 10. This indicates that adding extra features is no longer beneficial, even though the internal scoring function keeps improving. The green curve depicts the AUC of the RLS model trained using features selected by a single-locus, p-value-based filter method (Fisher’s exact test), run with the same external training/test splits as the greedy selection method. Like the blue curve, the green one also stops improving soon after a relatively small set of features has been selected. The data used in the experiments are the Wellcome Trust Case Control Consortium (WTCCC) hypertension dataset combined with the UK National Blood Services’ controls.
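The setup in this figure can be sketched as follows: wrapper selection scored by an internal leave-one-out loop, run separately inside each round of an external cross-validation. This is not greedy RLS. To stay self-contained, the sketch uses a nearest-centroid classifier, accuracy instead of MSE or AUC, and random noise data. All function names are hypothetical. For each outer round, it records two values per number of selected features: the internal LOO accuracy used for selection and the accuracy on the held-out outer fold.

```python
import random

def centroid_predict(X, y, train, x, feats):
    """Assign x to the class whose mean (over `train`, on `feats`) is nearest."""
    best, best_dist = None, float("inf")
    for c in sorted({y[i] for i in train}):
        rows = [X[i] for i in train if y[i] == c]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dist = sum((x[j] - m) ** 2 for j, m in zip(feats, centroid))
        if dist < best_dist:
            best, best_dist = c, dist
    return best

def loo_accuracy(X, y, train, feats):
    """Inner leave-one-out accuracy, computed using training indices only."""
    hits = sum(centroid_predict(X, y, [t for t in train if t != i], X[i], feats) == y[i]
               for i in train)
    return hits / len(train)

def greedy_forward(X, y, train, max_feats):
    """Add one feature at a time, keeping whichever maximizes inner LOO accuracy.

    Returns the selection path as (selected_features, inner_score) pairs.
    Ties are broken toward the higher feature index.
    """
    chosen, path = [], []
    candidates = set(range(len(X[0])))
    for _ in range(max_feats):
        score, best = max((loo_accuracy(X, y, train, chosen + [j]), j) for j in candidates)
        chosen.append(best)
        candidates.remove(best)
        path.append((list(chosen), score))
    return path

def nested_cv(X, y, n_outer, max_feats):
    """Outer CV: selection sees only the training part; the test fold is scored afterwards."""
    n = len(X)
    results = []  # (n_features, inner_loo_accuracy, outer_test_accuracy)
    for f in range(n_outer):
        test = list(range(f, n, n_outer))
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        for feats, inner in greedy_forward(X, y, train, max_feats):
            hits = sum(centroid_predict(X, y, train, X[i], feats) == y[i] for i in test)
            results.append((len(feats), inner, hits / len(test)))
    return results

rng = random.Random(1)
n, p = 36, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [0] * (n // 2) + [1] * (n // 2)
rng.shuffle(y)  # labels carry no information about X, so true accuracy is 0.5
for k, inner, outer in nested_cv(X, y, n_outer=3, max_feats=5):
    print(f"{k} features: inner LOO acc {inner:.2f}, outer test acc {outer:.2f}")
```

On noise data, the internal score tends to climb as features are added, because the selection has been tuned to it. The outer-fold accuracy, by contrast, stays near chance, and only the outer numbers are an honest estimate of performance on unseen data.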

    The evaluation of predictive power is also important when considering predictive models constructed on the basis of statistically significant variants. There are numerous observations showing that increases in the proportion of variance explained by significant variants do not go hand in hand with improved genetic prediction of disease risk. For instance, when statistical modeling was applied to a single training sample only, a panel of thousands of non-significant variants could collectively capture over one-third of the heritability for schizophrenia, but the same panel explained only a few percent of disease susceptibility in another replication cohort [8]. Similarly, while the statistical explanatory power of genetic variation in human height could be substantially increased by considering an increasing number of common variants in a single population sample [45], the proportion of variance accounted for in other independent samples was much smaller [46]. These examples underscore the importance of rigorous validation of the predictive accuracy of models based on genetic profiles. While external cross-validation is a valid option, it is not free of study-specific factors. For example, if there is a problem during the genotyping phase, it will also appear in every training and test data split. Such errors, stemming from problems in experimental design and/or quality control, have led to the need to re-evaluate established methods and to use caution when claiming replication [47]. The recommended option for truly validating the generalizability of predictive risk models is to use a sufficiently large set of independent samples with no overlap between the examined cohorts [48]. However, one should consider here whether the aim is to validate the predictive model itself (e.g. using external cross-validation or independent validation samples) or the predictive variants selected by the model (replication of the model construction or its application to separate cohorts) [49].

    Through the development of better model validation techniques and unbiased examination of all feature subsets on a genome-wide scale, we are likely to keep improving the accuracy of predictive models and increasing their reproducibility in independent population samples. A challenge here is that differences in population genetic structure, attributable to confounding factors such as the ethnicity or ancestry of the subjects, may result in highly heterogeneous datasets with a number of hidden subject subgroups, which may associate with divergent disease phenotypes and therefore cause increased false-positive rates [50]. Relatedly, while various feature selection methods and predictive modeling frameworks have been compared on individual cohorts [23, 24, 27], there are not yet any definitive results on whether one method will universally lead to optimal results in other subject cohorts or populations. Such confounding variability should also be taken into account in model construction and evaluation, perhaps through some form of population-stratified cross-validation. Failure to replicate a genetic association should not be considered only a negative result, as it may also provide important clues about the genetic architecture of the study populations or about genetic interactions among risk variants [51]. When epistatic interactions are involved, simple methods such as single-locus filters are unlikely on their own to provide optimal results, while in extremely large datasets wrapper methods may pose computational limitations if combined with complex prediction models. Finally, even though the improvements obtained by machine learning wrappers over traditional p-value-based filters may seem quite modest (e.g. Figure 1), even slight improvements in predictive accuracy may turn out to yield significant clinical benefits. Moreover, it has been argued that modest predictive improvements may be further aggregated through pathway- and network-level analyses of the selected variants.
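    One possible form of the population-stratified cross-validation suggested above is sketched below, assuming each subject carries a discrete population or ancestry label (the labels `EUR`, `AFR` and `EAS` are hypothetical placeholders). `stratified_folds` spreads each population evenly across folds so that every training/test split keeps similar proportions, while `leave_one_group_out` holds out one whole population at a time to probe transfer to a population unseen during training. Both function names are illustrative and are not taken from any of the cited tools.

```python
import random
from collections import defaultdict


def stratified_folds(groups, n_folds, seed=0):
    """Assign sample indices to folds so that each population is spread evenly."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for g in sorted(by_group):
        members = by_group[g]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % n_folds].append(i)
        # Continue the round-robin where this group stopped, keeping fold sizes even.
        offset += len(members)
    return folds


def leave_one_group_out(groups):
    """Yield (group, train, test) splits that hold out one population at a time."""
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test


groups = ["EUR"] * 9 + ["AFR"] * 6 + ["EAS"] * 3  # hypothetical ancestry labels
for fold in stratified_folds(groups, n_folds=3):
    print(sorted(groups[i] for i in fold))
for g, train, test in leave_one_group_out(groups):
    print(g, len(train), len(test))
```

    With these toy labels, each of the three stratified folds receives three EUR, two AFR and one EAS subject, whereas the leave-one-group-out splits deliberately test on a population absent from training.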


    Splice variant antigens

    Splice variant antigen frequency in cancer

    Splice variant antigens are post-transcriptionally derived TSAs arising from alternative splicing events, including those from mRNA splice junction mutations 52,53,54,55,56,57 , intron retention 58,59,60,61,62,63 or dysregulation of the spliceosome machinery in the tumour cell 15,64,65 . Other types of post-transcriptionally derived TSAs include alternative ribosomal products (for example, ribosomal frameshifting 66,67 , non-canonical initiation 68,69,70,71 , termination codon read-through 69 , reverse-strand transcription 72 and doublet decoding 73 ) and post-translational splicing 74,75,76 ; these two mechanisms are difficult to apply in anticancer therapies, given the lack of tools for predicting such products.

    The study of splice variant proteins has historically focused on haematological malignancies, and splice variant protein expression has been understudied in solid tumours. As such, putative splice variant antigens derived from these proteins have received less attention in solid tumours, with their expression only recently validated 77 . In haematological cancers, in which SNV burden is relatively low 6 , splice variant antigens could broaden the number of available TSA targets for therapeutic application. Splice variant proteins can arise through cis-acting mutations that disrupt or create splice site motifs, or through trans-acting alterations in splicing factors, which have historically been identified in haematological malignancies 77,78 . The role of the spliceosome machinery in generating splice variants in haematological malignancies is a current area of investigation. Mutations in spliceosome proteins (for example, splicing factor 3b subunit 1 (SF3B1), serine- and arginine-rich splicing factor 2 (SRSF2), U2 small nuclear RNA auxiliary factor 1 (U2AF1) and U2AF2) are common in myelodysplastic syndrome, acute myeloid leukaemia (AML), chronic myelomonocytic leukaemia (CMML) and chronic lymphocytic leukaemia (CLL) 79,80,81,82,83 . The sharing of these spliceosome protein mutations across haematological cancer types has led to the hypothesis that spliceosome dysregulation may cause the expression of splice variant mRNAs that are not detectable in normal tissues, leading to the translation of TSAs 84,85,86 . Beyond haematological malignancies, a recent reanalysis of the TCGA pan-cancer dataset demonstrated a strong association between somatic mutations in components of the spliceosome machinery and the expression of splice variant products 77 , providing evidence for the relevance of splice variant antigens in solid tumours.

    Tools for predicting splicing events and splice variant antigens

    Several types of splice variant callers have been described in the literature. Two of these tools, Spliceman 87 and MutPred Splice 88 , predict the capacity of exonic variants surrounding an annotated splice junction to interfere with normal splicing. Other tools provide de novo identification of alternative splicing events, including JuncBase 89 , SpliceGrapher 90 , rMATS 91 , SplAdder 92 and ASGAL 93 . Many of these tools (for example, SpliceGrapher, SplAdder and ASGAL) predict alternative splicing events by constructing splicing graphs. A splicing graph is generated by comparing spliced alignments of RNA-seq reads against a reference genome; it consists of vertices (nodes) that represent predicted splice sites for a given gene and edges that represent the exons and introns between those splice sites. In addition to these splice variant callers, at least one peer-reviewed tool, Epidisco 46 (the computational pipeline for the multi-institutional PGV-001 personalized vaccine trial 94 ), has been described as capable of predicting splice variant antigens.
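    The splicing-graph idea can be illustrated with a deliberately simplified sketch. It assumes exon coordinates are already known (here, three hypothetical transcripts on the plus strand) rather than inferred from read alignments, and it is not the algorithm of SpliceGrapher, SplAdder or ASGAL, which are considerably more elaborate. Vertices are splice sites tagged as exon starts or ends; exon edges run from a start to an end, and intron edges run from an end to the next start. Enumerating source-to-sink paths then recovers the known isoforms plus any combination the graph permits.

```python
from collections import defaultdict


def build_splice_graph(transcripts):
    """Nodes are splice sites; exon edges run start->end, intron edges end->next start."""
    edges = defaultdict(set)
    for exons in transcripts:
        for i, (start, end) in enumerate(exons):
            edges[(start, "start")].add((end, "end"))  # exon edge
            if i + 1 < len(exons):
                edges[(end, "end")].add((exons[i + 1][0], "start"))  # intron edge
    return edges


def enumerate_isoforms(edges):
    """Return every source-to-sink path, each read back as a list of exons."""
    targets = {v for vs in edges.values() for v in vs}
    sources = sorted(v for v in edges if v not in targets)
    paths = []

    def walk(node, path):
        successors = edges.get(node)
        if not successors:  # reached a terminal exon end
            paths.append([(path[i][0], path[i + 1][0]) for i in range(0, len(path), 2)])
            return
        for nxt in sorted(successors):
            walk(nxt, path + [nxt])

    for s in sources:
        walk(s, [s])
    return paths


A, B, C, D, E = (100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)
known = [[A, B, C, D, E], [A, C, D, E], [A, B, C, E]]  # hypothetical transcripts
paths = enumerate_isoforms(build_splice_graph(known))
novel = [p for p in paths if p not in known]
print(novel)  # [[(100, 200), (500, 600), (900, 1000)]]
```

    Here two independent skipping events, each seen in a different transcript, combine into a path (skipping both B and D) that none of the input transcripts contains; this is the kind of candidate isoform a graph-based caller can propose.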

    Jayasinghe et al. 52 reported MiSplice, which integrates DNA-seq and RNA-seq data in order to discover mutation-induced splice sites, which they applied to the TCGA pan-cancer dataset. Splice variant mutations contained 2–2.5x more predicted TSA candidates than did SNVs, with some tumorigenesis-related genes containing ≥40 unique predicted TSAs. Furthermore, predicted splice variant antigen burden was correlated with programmed cell death 1 ligand 1 (PDL1) expression, suggesting that PDL1 blockade therapy may be efficacious in tumours with a high frequency of splice variant antigens. Additionally, Kahles et al. 77 reported a comprehensive analysis of splice variants in the TCGA pan-cancer dataset and then used mass spectrometry to identify tryptic-digested polypeptides that contained splice variant antigens in 63 primary breast and ovarian cancer samples. This method found, on average, 1.7 predicted splice variant antigens per sample, with up to 30% more alternative splicing events in tumours than in normal tissues. Notably, Kahles et al. 77 also reported several known (SF3B1 and U2AF1) and novel (transcriptional adaptor 1 (TADA1), serine–threonine protein phosphatase PPP2R1A and isocitrate dehydrogenase 1 (IDH1)) splicing quantitative trait loci that were associated with alternative splicing events in 385 genes, suggesting that these loci are important for predicting the burden of splice variant antigens.

    While these studies have demonstrated TSAs derived from cancer-specific splice junctions, further work will be needed to refine the computational methods for splice variant antigen prediction. Particular emphasis is needed on identifying novel splice junctions that are likely to yield mRNA isoforms that will not undergo nonsense-mediated decay 95 . To address this problem, improved full-length mRNA isoform inference procedures or hybrid (that is, long- and short-read) RNA-seq algorithms will need to be developed. These procedures would identify the full-length splice variant transcript, allowing transcripts that contain premature stop codons, which could trigger nonsense-mediated decay, to be filtered out.
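    Once full-length isoforms are available, the filtering step could look like the sketch below. It assumes the widely used "50-nucleotide rule" heuristic (a stop codon lying more than about 50 nt upstream of the last exon-exon junction is predicted to trigger nonsense-mediated decay); this rule is not taken from the studies above, has known exceptions, and the threshold, the function names and the toy sequences are all illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(mrna, cds_start):
    """0-based index of the first in-frame stop codon at or after cds_start, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicted_nmd(exons, cds_start, threshold=50):
    """Heuristic only: flag a transcript whose stop codon ends more than `threshold`
    nt upstream of the last exon-exon junction (the so-called 50-nt rule)."""
    stop = first_stop("".join(exons), cds_start)
    if stop is None or len(exons) < 2:
        return False  # no stop codon, or no downstream junction to mark
    last_junction = sum(len(e) for e in exons[:-1])
    return last_junction - (stop + 3) > threshold


normal = ["ATGAAACCC", "GGG" * 30, "TTTTAACCCC"]       # stop codon in the last exon
ptc = ["ATGAAACCC", "TGA" + "GGG" * 29, "TTTTAACCCC"]  # premature stop in exon 2
candidates = {"normal": normal, "ptc": ptc}
kept = [name for name, ex in candidates.items() if not predicted_nmd(ex, cds_start=0)]
print(kept)  # ['normal']
```

    The check only makes sense on a full-length isoform, because both the reading frame and the position of the last junction depend on the complete exon chain, which is why full-length or hybrid isoform inference matters here.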

    While tumour-specific splice variants of particular genes have been described in multiple tumour types, there are currently no reports of the use of splice variant antigens in personalized therapies. For example, tumour-associated splice variants have been described in select genes, including receptor for hyaluronan-mediated motility (RHAMM; two tumour-enriched variants, RHAMM-48 and RHAMM-147, in multiple myeloma) 96 and Wilms tumour protein 1 (WT1; one variant, E5+, enriched in multiple cancers) 97,98,99 . WT1-derived peptides have been studied as a therapeutic target in leukaemias 100,101,102,103,104 and in lung 105 and kidney cancers 106 ; however, these trials did not use epitopes specific for the E5+ splice variant. Additionally, an HLA-B44-restricted epitope derived from a variant of the minor histocompatibility antigen HMSD (HMSD-v), which is selectively expressed by primary haematological malignant cells (including those of myeloid lineage as well as multiple myeloma) but also by normal mature dendritic cells, was observed to be targeted by the CD8+ cytotoxic T cell clone 2A12-CTL 107 . Co-incubation of 2A12-CTL with primary AML cells conferred tumour resistance on immunodeficient mice after injection, suggesting that this HMSD-v-derived antigen is a viable target for immunotherapy. Finally, Vauchy et al. 108 described a CD20 splice variant (D393–CD20) whose expression is detectable in transformed B cells and upregulated in various B cell lymphomas. They subsequently demonstrated the capacity of D393–CD20-derived epitope vaccines to trigger both CD4+ and CD8+ T cell responses in HLA-humanized transgenic mice, supporting the use of CD20 splice variant epitopes for targeted immunotherapies in B cell malignancies.




    The biology of stem-ness in tumors and its consequences in gliomas in vivo

    One topic in this area is the issue of stem-ness in tumor cells and what drives this character. This work builds on our previous findings, mentioned in the abstract among others, that stem-like cells are located in the perivascular niche (PVN) and are driven by NO signaling.

    Mathematical and mouse modeling

    My laboratory has a long-standing collaboration with Franziska Michor of the Computational Biology department at the Dana-Farber. We combine mathematical modeling with mouse modeling to understand the likelihood of events in the evolution of glioma development, or to optimize therapy based on parameters obtained from mouse models. In these projects we have:

    1. Identified the most probable cell of origin for PDGF-induced gliomas,
    2. Determined the order of genetic events in the evolution of these tumors,
    3. Identified the first events in glioma formation and an optimized schedule for delivering radiation therapy based on parameters obtained from our PDGF-induced glioma model.

    The biology of therapeutic response in gliomas

    Many laboratories are studying the biology of these tumors (and other tumor types), but few are trying to understand the biology of how these tumors respond to therapy. This is conceptually important because the disease that kills people in the western world is a treated and recurrent tumor, not an untreated tumor. Therefore, we have put effort into developing technologies to understand how these tumors respond to standard therapy, with the same rigor with which we have studied the biology of the tumor in the first place.

    MRI and bioluminescence imaging and preclinical trial drug development

    In order to perform preclinical trials in mice, we need to identify tumors, quantify their size, and follow them over time non-invasively. One approach we have used is MRI scanning with T2-weighted images, or with T1-weighted images with and without contrast, as is done in people. However, MRI measures only anatomic structure, not biologic processes. Therefore, we have developed bioluminescence imaging strategies for use in preclinical trials of brain tumor-bearing mice. We initially developed a reporter mouse that expressed luciferase from the E2F1 promoter, which measures proliferation, and from a Gli-responsive promoter, which measures SHH signaling. We are now developing genetic backgrounds in which Cre recombinase activity activates luciferase expression; these will allow us to “see” tumor cells in vivo that have been deleted for PTEN or in which INK4a/arf has been knocked down. This will allow us to easily identify mice with tumors and to count live tumor cells in vivo non-invasively.

    The glioma tumor microenvironment

    Gliomas are composed of not only tumor cells per se but also reactive astrocytes, microglia, endothelial cells and pericytes. Multiple lines of evidence indicate that many if not all of the cells that make up the stroma in these tumors contribute to the tumor biology and may be valid therapeutic targets.

    Novel models of glioma subtypes and ependymomas

    We have also developed a modified version of the RCAS/tv-a system that achieves loss of function combined with lineage tracing, using short hairpins and fluorescent tags. This system mimics mesenchymal GBMs by combining knockdown of NF1 and p53 while lineage tracing each of these two events from specific cell types, with a penetrance of essentially 100%. We are using this model to understand the evolution of mesenchymal GBM from proneural GBM and the complexity of these tumors. This type of lineage tracing allows us to appreciate cellular heterogeneity in ways that germline strategies cannot. We have also developed a new model of ependymoma by expressing a commonly occurring gene fusion (C11orf95/RELA) with this system.

    PDGFR inhibition as a therapeutic strategy for PDGF-driven GBM

    PDGF signaling characterizes the proneural subgroup of GBM and is sufficient to induce similar tumors in mice. One might therefore expect PDGFR inhibition to be a good therapeutic strategy for at least the proneural GBM subgroup. However, several trials of PDGFR inhibitors have been conducted in humans with GBM, and none has been successful. A simple explanation is that patients were not stratified for PDGFR-active tumors before enrolling in these trials. However, there are several additional, more interesting possibilities as to why this might be the case, and we are investigating under what circumstances PDGFR inhibition might be effective. One contributing factor is likely to be the cellular heterogeneity of these tumors, in which some subclones of cells express PDGFR while others express EGFR in humans; similar results can be seen in mice. A second contributing factor in resistance to PDGFR inhibition is that most of the gene expression changes accompanying the oncogenic transformation of Olig2-expressing cells by PDGFR in vivo are not reversed by PDGFR inhibitors in vivo, even when that inhibition achieves full cell cycle arrest. Additionally, mutant forms of PDGFR alpha found in some GBMs appear to reduce the effect of PDGFR inhibition. Finally, we have found that additional alterations found in human gliomas, such as loss of Ink4a/arf, p53 or PTEN, enhance the oncogenic character of these tumors and prevent PDGFR inhibition from achieving full cell cycle arrest.

    The role of TrkB splice variants in cancer

    Cancer-driving mutations are found across a wide range of tumor types, yet are often only present in a subset of tumor cells, making early detection and subsequent treatment of cancer difficult. The identification of a unique oncogenic driver that is found in nearly all tumor cells, across various cancer subtypes, would be a highly valuable biomarker as well as a promising diagnostic and profitable therapeutic target. We have identified one such target in the form of a splice variant of the TrkB neurotrophin receptor. This variant is expressed highly across nearly all human cancers when compared to normal tissues and forced expression drives multiple tumor types in mice. Furthermore, forced expression of this splice form of TrkB, when combined with loss of PTEN, is sufficient to induce cancers from many organ sites in mice. We are working on understanding the mechanism and implications of these findings.

    YAP1 gene fusions in cancer

    YAP1 is a transcriptional co-activator and a proto-oncogene. Several different YAP1 gene fusions have been identified in various human cancers. Here, we show that overexpression of several of these gene fusions in mice is sufficient to cause local tumor formation. Each of these YAP1 fusion proteins exerts YAP activity as well as the activity of its C-terminal fusion partner. These fusion proteins evade negative regulation by the Hippo pathway owing to constitutive nuclear localization and resistance to degradation. Combined point mutations in YAP1 (S127/397A-YAP1) that achieve these functions also induce tumor formation in vivo. Genetic disruption of the TEAD-binding domain of these oncogenic YAP1 fusions is sufficient to inhibit tumor formation in vivo, while pharmacological inhibition of the YAP1–TEAD interaction also reduces the YAP activity of the fusion proteins in vitro.


    Alternative Splicing: A Potential Source of Functional Innovation in the Eukaryotic Genome

    Alternative splicing (AS) is a common posttranscriptional process in eukaryotic organisms by which multiple distinct functional transcripts are produced from a single gene. The release of the human genome draft revealed a much smaller number of genes than anticipated. Because of its potential role in expanding protein diversity, alternative splicing has attracted increasing interest over the last decade. Although recent studies have shown that 94% of human multiexon genes undergo AS, the evolution of AS, and thus its potential role in functional innovation in eukaryotic genomes, remains largely unexplored. Here we review available evidence regarding the evolution of AS prevalence and its functional role. In addition, we stress the need to correct for the strong effect of transcript coverage on AS detection and set out a strategy to ultimately elucidate the extent of the role of AS in functional innovation on a genomic scale.

    1. Introduction

    The first draft of the human genome sequence [1, 2] was unveiled in February 2001 and, surprisingly, it was shown to contain approximately 23,000 genes, only a fraction of the number of genes originally predicted [3]. To put this into perspective, there are approximately 20,000 genes in the genome of the nematode C. elegans. The lack of an association between gene number and organismal complexity has resulted in increased interest in alternative splicing (AS), given that it has been proposed to be a major factor in expanding the regulatory and functional complexity, protein diversity, and organismal complexity of higher eukaryotes [4–6]. However, despite the best efforts of many research groups, we still understand very little about the actual role played by AS in the evolution of functional innovation (here understood as the appearance of novel functional transcripts) underpinning the increased organismal complexity observed.

    Alternative splicing is a posttranscriptional process in eukaryotic organisms by which multiple distinct transcripts are produced from a single gene [4]. Previous studies using high-throughput sequencing technology have reported that up to 92–94% of human multiexon genes undergo AS [7, 8], often in a tissue- or developmental stage-specific manner [7, 9]. With the development and constant improvement of whole genome transcription profiling and bioinformatics algorithms, the ubiquity of AS in the mammalian genome became clear. The concept of one gene, one protein gave way as evidence mounted for a high incidence of AS in nonhuman species [7, 8], such as the fruit fly [10], Arabidopsis [11] and other eukaryotes [5]. Despite the advances in our understanding and characterisation of AS, several questions remain unanswered. First, the large differences in transcript coverage between species have hampered direct comparisons of the prevalence of alternative splicing in different species [6]. Second, even if comparable AS estimates between species could be obtained, it is unclear to what extent changes in AS prevalence during evolution have contributed to overall protein diversity rather than reflecting splicing noise. Finally, we understand very little about how AS has evolved through time and how this relates to the functional parameters of genes. Here we review how alternative splicing is regulated and recent progress in our understanding of the evolution of alternative splicing.

    2. Alternative Splicing and Its Regulation

    In 1977, Chow et al. [12–15] reported that the 5′ and 3′ terminal sequences of several adenovirus 2 (Ad2) mRNAs varied, implying a new mechanism for generating several distinct mRNAs. Following this study, alternative splicing was also found in the gene encoding the thyroid hormone calcitonin in mammalian cells. Subsequent studies revealed that many other genes were also able to generate more than one transcript by cutting out different sections of their coding regions (reviewed in [4, 16]).

    Depending on which exonic segments are cut out, or whether introns are retained, splicing events can be classified into four basic types (Figure 1). These four major modes of splicing are (1) exon skipping, (2) intron retention, (3) alternative 5′ splice site (5′ss), and (4) alternative 3′ splice site (3′ss) [22, 23]. In addition, mutually exclusive exons, alternative initiation, and alternative polyadenylation provide further mechanisms for generating transcript isoforms. Moreover, different types of alternative splicing can occur in a combinatorial manner, and one exon may be subject to more than one AS mode, for example, 5′ss and 3′ss at the same time (Figure 1). The prevalence of each type of AS has been found to vary between taxa. Several studies have shown that exon skipping is common in metazoan genomes [24], whereas intron retention is the most common type of AS among plants [25] and fungi [26].


    Different types of alternative splicing. Blue boxes are constitutive exons and red boxes are alternatively spliced regions. Introns are represented by straight lines between boxes. Four common types of splicing events are shown: (1) exon skipping, (2) intron retention, (3) alternative 5′ splice site (5′ss), and (4) alternative 3′ splice site (3′ss).
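    The basic modes in the figure can be told apart by comparing exon coordinates of two isoforms. The sketch below is a minimal illustration, assuming plus-strand genes, isoforms given as lists of (start, end) exon coordinates, and exactly one internal event per pair; the function name is hypothetical, and compound or terminal-exon events simply fall through to "complex/other". On the plus strand the 5′ splice site (donor) is at an exon's end and the 3′ splice site (acceptor) at its start, which is what the two splice-site branches test.

```python
def classify_event(iso_a, iso_b):
    """Name the single AS event that distinguishes two plus-strand isoforms.

    Isoforms are lists of (start, end) exon coordinates. Assumes exactly one
    internal event; terminal-exon differences and compound events fall through
    to "complex/other".
    """
    only_a = sorted(set(iso_a) - set(iso_b))
    only_b = sorted(set(iso_b) - set(iso_a))
    if len(only_b) > len(only_a):  # let only_a hold the larger difference
        only_a, only_b = only_b, only_a
    if len(only_a) == 1 and not only_b:
        return "exon skipping"
    if len(only_a) == 2 and len(only_b) == 1 and only_b[0] == (only_a[0][0], only_a[1][1]):
        return "intron retention"  # one exon spans two exons plus the intron between
    if len(only_a) == 1 and len(only_b) == 1:
        (sa, ea), (sb, eb) = only_a[0], only_b[0]
        if sa == sb and ea != eb:
            return "alternative 5' splice site"  # exon end (donor) moves on + strand
        if ea == eb and sa != sb:
            return "alternative 3' splice site"  # exon start (acceptor) moves
        if ea < sb or eb < sa:
            return "mutually exclusive exons"
    return "complex/other"


ref = [(100, 200), (300, 400), (500, 600)]
print(classify_event(ref, [(100, 200), (500, 600)]))              # exon skipping
print(classify_event(ref, [(100, 400), (500, 600)]))              # intron retention
print(classify_event(ref, [(100, 250), (300, 400), (500, 600)]))  # alternative 5' splice site
print(classify_event(ref, [(100, 200), (350, 400), (500, 600)]))  # alternative 3' splice site
```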

    Alternative splicing is tightly regulated by cis elements as well as by trans-acting factors that bind to these cis elements. Trans-acting factors, mainly RNA-binding proteins, modulate spliceosome activity through cis elements such as exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs). In the canonical model of AS, serine/arginine-rich (SR) proteins typically bind to ESEs, whereas heterogeneous nuclear ribonucleoproteins (hnRNPs) tend to bind to ESSs or ISSs [27]. Given the crucial roles of these regulators in the splicing machinery, cis- and trans-acting mutations that disrupt the splicing code are known to cause disease (reviewed in [28–30]). It has been estimated that 15–60% of disease-causing mutations act by affecting the splicing pattern of genes ([31] and reviewed in [30]). Moreover, AS has also been shown to be regulated without the involvement of auxiliary splicing factors [32], and AS may also be combined with other posttranscriptional events such as the use of multiple internal translation initiation sites, RNA editing, mRNA decay, and binding by microRNAs and other noncoding RNAs [33, 34], suggesting the existence of additional noncanonical mechanisms of AS that are yet to be identified [35].

    Recently, a direct role of histone modifications in alternative splicing has been reported, in which a histone modification (H3K36me3) affects the splicing outcome by influencing the recruitment of splicing regulators via a chromatin-binding protein in a number of human genes, such as FGFR2, TPM2, TPM1 and PKM2 [36]. Moreover, it has been reported that CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing, providing the first evidence of developmental regulation of splicing outcome through heritable epigenetic marks [37]. Additionally, non-coding RNAs have also emerged as key determinants of alternative splicing patterns [34]. Together, these findings reveal an additional epigenetic layer in the regulation of transcription and alternative splicing [38]. Genomewide genetic and epigenetic studies have been proposed in at least 100 specific blood cell types [39]; these will provide high-quality reference epigenomes (from DNA methylation and histone mark assays) together with detailed genetic and transcriptome data (whole genome sequencing, RNA-Seq, and miRNA-Seq), giving us an opportunity to assess the genomewide influence of epigenetic factors on the regulation of AS in specific blood cell types. We expect that the rise of comparative epigenetics will provide a different perspective on the evolution of the transcriptome.

    3. Identification of Alternative Splicing Events

    Alternative splicing is difficult to estimate from genomic parameters alone [40]. A number of regulatory motifs for AS have been uncovered but the presence of known alternative splicing motifs does not guarantee that a gene is actually alternatively spliced [40]. Thus, alternative splicing patterns are generally assessed from examining transcript data. For any gene of interest, alternative splicing events can be identified by using reverse transcription polymerase chain reaction (RT-PCR) conducted on a complementary DNA (cDNA) library. Over the last decade, as high-throughput transcriptome technologies have improved, it has become possible to assess alternative splicing patterns on a genomewide scale. Three main sources of transcriptome data have been used to assess splicing patterns: expressed sequence tags (ESTs), splice-junction microarrays, and RNA sequencing (RNA-Seq).

    The first wave of genomewide transcriptome analysis consisted of large-scale direct sequencing of cDNAs and ESTs [41], which allowed alternative splicing events to be identified by aligning cDNA/EST sequences to the reference genome. ESTs are unedited, randomly selected, single-pass sequence reads of 200–800 nucleotides derived from cDNA libraries [42]. Currently, there are eight million ESTs for human, including about one million sequences from cancer tissues, and about 71 million ESTs for around 2000 species in dbEST [43]. However, ESTs are based on low-throughput Sanger sequencing and are aggregated over a wide range of tissues, developmental states, and diseases, with widely different levels of sensitivity.

    More recently, splice-junction microarrays and RNA-Seq have been increasingly used to analyse alternative splicing events quantitatively. Splicing microarrays target specific exons or exon-exon junctions with oligonucleotide probes. The fluorescence intensities of individual probes reflect the relative usage of alternatively spliced exons in different tissues and cell lines [44]. High-density splice-junction microarrays are a cost-effective way to assay previously known exons and AS events with a low false-positive rate. Their disadvantage is that they require prior knowledge of existing AS variants and gene structures. More importantly, unlike RNA-Seq and ESTs, microarrays do not provide additional sequence information.

    RNA-Seq has emerged as a powerful technology for transcriptome analysis due to its ability to produce millions of short sequence reads [45–47]. RNA-Seq experiments provide in-depth information on the transcriptional landscape [45]. The ever-increasing accumulation of high-throughput data will continue to provide richer opportunities to investigate further aspects of AS, such as low-frequency AS events as well as tissue-specific and/or development-specific AS events [7, 8, 47–49]. Earlier datasets consisted of reads of 50 bp or less, limiting the information about combinations of AS events in a single transcript, but read lengths are likely to continue increasing over the next decade. With the increasing capacity of next-generation sequencing (RNA-Seq), the study of alternative splicing is likely to undergo a revolution [50]. Deeper sequencing of transcriptomes in human and other species has improved our understanding of the occurrence of AS events and of AS expression patterns in different tissues [7, 51] and developmental stages [10].
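    A common way to turn junction-spanning reads into a quantitative AS measure is the "percent spliced in" (PSI) of a cassette exon. The sketch below uses a frequently used simplification, assumed here rather than taken from the cited studies: inclusion reads, which fall on the two junctions flanking the exon, are halved before being compared with reads on the single skipping junction. Dedicated tools apply more careful normalization and statistics, and the read counts are hypothetical.

```python
def psi(inclusion_reads, skipping_reads, n_inclusion_junctions=2):
    """Percent spliced in (PSI) for a cassette exon from junction read counts.

    Inclusion reads are spread over the two junctions flanking the exon, so they
    are divided by n_inclusion_junctions before comparison with the single
    skipping junction. Returns None when no informative reads are available.
    """
    inc = inclusion_reads / n_inclusion_junctions
    total = inc + skipping_reads
    if total == 0:
        return None
    return inc / total


# Hypothetical junction counts for one cassette exon in two tissues.
psi_a = psi(inclusion_reads=40, skipping_reads=10)  # 20 / 30, about 0.67
psi_b = psi(inclusion_reads=10, skipping_reads=45)  # 5 / 50 = 0.10
print(round(psi_a - psi_b, 2))  # delta PSI: 0.57
```

    A large difference in PSI between tissues, as in this toy example, is the kind of signal used to call tissue-specific AS events; in practice it would be paired with a read-depth threshold and a statistical test.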

    Transcript assembly from sequence-based technologies, such as ESTs and RNA-Seq, can follow either an align-then-assemble or an assemble-then-align strategy, depending on the quality of the reference genome and sequence data [47]. AS events can then be detected algorithmically by comparing the different transcripts. However, detecting AS isoforms, as opposed to single AS events, is still challenging because short sequences provide little information about the combination of exons. Several applications have been developed for transcript assembly and AS isoform detection; their strategies and comparisons between them have been reviewed previously [47].
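    In the align-then-assemble route, splice junctions are usually read directly off spliced alignments. In the SAM/BAM format, a read that spans an intron carries an `N` (skipped region) operation in its CIGAR string, and `POS` is the 1-based leftmost reference position. The sketch below extracts intron coordinates from (POS, CIGAR) pairs and counts how many reads support each junction; the reads are hypothetical and the function names are illustrative, not part of any particular aligner or library.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")  # per the SAM specification


def introns_from_cigar(pos, cigar):
    """Introns (1-based, inclusive reference coordinates) implied by N operations."""
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in CONSUMES_REFERENCE:
            ref += n
    return introns


def count_junctions(alignments):
    """Count how many reads support each junction; alignments are (pos, cigar) pairs."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(introns_from_cigar(pos, cigar))
    return counts


reads = [(100, "50M200N50M"), (120, "30M200N70M"), (100, "5S45M100N50M")]
print(count_junctions(reads))  # Counter({(150, 349): 2, (145, 244): 1})
```

    Soft clips (`S`) and insertions (`I`) do not advance the reference coordinate, which is why the third read's intron starts at 145 rather than 150. Junction counts of this kind are the raw material both for splicing graphs and for junction-based quantification.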

    4. Prevalence of Alternative Splicing across Eukaryotic Genomes

    Initial whole genome analyses suggested that 5%–30% of human genes were alternatively spliced (reviewed in [6, 16]). EST-based AS databases identify AS events in 40–60% of human genes [5, 52, 53]; however, this number has recently been revised repeatedly, with the latest estimates showing that up to 94% of human multiexon genes produce more than one transcript through alternative splicing [7, 8, 16]. Understanding how alternative splicing has changed over time could provide insights into how it has affected transcript and protein diversity and phenotype evolution [6]. In fungi, AS is thought to be rare due to the low number of exons in yeast [23]. In plants, it has been estimated from EST data that around 20% of genes undergo AS [25]; a recent study using RNA-Seq, however, suggests that at least 42% of intron-containing genes in Arabidopsis are alternatively spliced [11]. We expect significantly higher percentages of AS occurrence to be discovered in various eukaryotes, given that in-depth transcriptome studies using next-generation sequencing such as RNA-Seq are ongoing. A few studies have attempted to compare AS prevalence among different taxa, with animals generally reported to have a higher AS incidence than plants [16] and vertebrates a higher AS incidence than invertebrates [24]. However, these studies are either based on limited data or failed to correct for differences in transcript coverage [6].

    A number of databases provide AS data for multiple species [5, 52–54]. However, these resources focus primarily on animal species and have poor coverage of protist, fungal, and plant genomes, making it difficult to compare divergent taxa. Most importantly, none of them take into account the well-documented effects of differential transcript coverage across genes within and between species, which strongly influence AS detection rates [6, 24, 55, 56]. Random sampling has been used to address this [24] and has been shown to minimize the transcript-coverage bias (Figure 2). We expect similar strategies to be employed in future comparative AS data resources.
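The random-sampling correction can be sketched as follows. This is a minimal illustration under assumed inputs (a mapping from each gene to the isoform label of every transcript sampled from it) and is not necessarily the exact procedure used in [24]. The idea is to compare every gene at the same transcript depth, so that highly covered genes, which have more chances to reveal a minor isoform, no longer inflate the AS rate. The function name `subsampled_as_rate` and its parameters are hypothetical.

```python
import random
from typing import Dict, List


def subsampled_as_rate(
    gene_isoforms: Dict[str, List[str]],
    depth: int,
    replicates: int = 100,
    seed: int = 0,
) -> float:
    """Estimate the fraction of genes showing AS at a fixed transcript depth.

    Genes with fewer than `depth` transcripts are excluded. For each remaining
    gene, `depth` transcripts are drawn without replacement, and the gene
    counts as alternatively spliced if the draw contains more than one
    distinct isoform. The rate is averaged over `replicates` draws.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    eligible = [iso for iso in gene_isoforms.values() if len(iso) >= depth]
    if not eligible:
        raise ValueError("no gene has enough transcripts for this depth")
    total = 0.0
    for _ in range(replicates):
        spliced = sum(len(set(rng.sample(iso, depth))) > 1 for iso in eligible)
        total += spliced / len(eligible)
    return total / replicates


# Usage with toy data: geneA has one isoform, geneB two, geneC too few transcripts.
genes = {
    "geneA": ["a1"] * 10,
    "geneB": ["b1", "b2"] * 5,
    "geneC": ["c1"] * 2,
}
print(subsampled_as_rate(genes, depth=10))
```

Comparing species or taxa at one shared `depth` is what makes their rates comparable; the choice of depth trades the number of eligible genes against sensitivity to rare isoforms.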



    Footnotes

    Gene therapy denotes the replacement of faulty genes or the addition of new genes to cure or improve the ability to fight disease.

    OMIM, https://www.omim.org (accessed January 10, 2017); Genetic Alliance, http://www.diseaseinfosearch.org (accessed January 25, 2017).

    Small insertions or deletions can be created to inactivate an element; larger defined deletions can be created to remove entire elements; specific nucleotide substitutions can be made in the element; or new genetic elements can be inserted into precise locations in the genome.

    There are six clinical trials involving the use of ZFNs to disrupt expression of CCR5. Three of these trials have been completed, one is ongoing, and two are currently recruiting participants. For more information, see https://www.clinicaltrials.gov/ct2/show/NCT02500849?term=zinc+finger+nuclease+CCR5&rank=1 (accessed January 10, 2017).

    Structured Approach to Benefit-Risk Assessment in Drug Regulatory Decision-Making, PDUFA V Plan (FY 2013-2017). Draft of February 2013. http://www.fda.gov/downloads/ForIndustry/UserFees/PrescriptionDrugUserFee/UCM329758.pdf (accessed January 30, 2017).

    The FDA recently held a public hearing to discuss its regulations and policies on manufacturer communications about unapproved or off-label uses of medical products, including cell-based therapies (FDA, 2016a).

