One Gene and many proteins

Imagine a gene with $n$ exons and $m$ introns. How many proteins are possible from that gene? Would all the proteins be isoforms?

I might be wrong, but aren't the numbers $n$ and $m$ related by $n=m+1$?

The answer seems to be combinatorial: how many combinations of $n$ objects can be assembled under certain restrictions? In other words, how many isoforms can a given gene have?

Restrictions include: whether exon-intron junctions fall inside a codon or exactly between codons, how many exons actually contain protein-coding sequence (some exons encode untranslated regions, such as the 3'- or 5'-UTR), and how the particular gene undergoes alternative splicing.

So, as you can see, the answer depends heavily on the sequence of the given gene. As far as I know, the number of isoforms actually produced by most genes is small, far below the combinatorial maximum, even though there are genes with hundreds of exons.
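To make the combinatorial question concrete, here is a minimal Python sketch under a deliberately simple model that is an assumption, not something stated above: the first and last exons are always kept, each internal exon is either included or skipped (cassette exons), and exon order never changes. Under that model a gene with $n$ exons (and $m = n - 1$ introns) has $2^{n-2}$ possible isoforms. The function name `exon_skipping_isoforms` is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(n_exons: int) -> list[tuple[int, ...]]:
    """Enumerate exon combinations under a simple cassette-exon model.

    Illustrative assumptions: exons are numbered 1..n_exons, the first and
    last exons are always retained, each internal exon is either included
    or skipped, and exon order is never changed.
    """
    if n_exons < 1:
        raise ValueError("a gene needs at least one exon")
    if n_exons == 1:
        return [(1,)]
    internal = list(range(2, n_exons))  # exons that may be skipped
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append((1, *kept, n_exons))
    return isoforms


n = 4        # exons
m = n - 1    # introns, since n = m + 1
forms = exon_skipping_isoforms(n)
print(f"{n} exons, {m} introns -> {len(forms)} isoforms")
for form in forms:
    print(form)
```

Real genes also use alternative 5' and 3' splice sites, mutually exclusive exons and retained introns, while many combinations would break the reading frame, so real isoform counts can be higher or lower than this simple estimate.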

For most eukaryotic genes (and some prokaryotic ones), the initial RNA transcribed from a gene’s DNA template must be processed before it becomes a mature messenger RNA (mRNA) that can direct the synthesis of protein. In this way, the primary transcript (pre-mRNA) becomes the mature mRNA. A key step in this processing is known as splicing.
RNA splicing involves the removal, or “splicing out”, of certain sequences in the pre-mRNA, referred to as intervening sequences, or introns. The final, mature mRNA thus consists of the remaining sequences, called exons, which are joined to one another during splicing.
Splicing different combinations of exons together can lead to a variety of different proteins being produced from a single gene. For example, three different proteins can be produced from the same gene by combining different sets of its exons.

Examples of genes where alternative splicing leads to different protein products include the gene encoding CGRP (Calcitonin Gene-Related Peptide), which can produce either CGRP or calcitonin depending on how its transcript is spliced. Typically CGRP, a neurotransmitter, is the product found in neurones, whereas calcitonin, a hormone concerned with the regulation of calcium levels in the blood, is produced in non-neuronal cells such as those of the thyroid gland.

One Gene-One Polypeptide Concept

It is well established that hereditary characters are maintained and transmitted from one generation to the next through DNA molecules, because DNA can duplicate itself and the duplicated molecules can be passed on to the offspring. The general activity of genes brings about the expression of hereditary traits in the organism.

The main questions, then, are how genes (DNA molecules) govern the biosynthetic processes of cells and how they control the phenotypic properties of organisms. The answers to these basic questions were sought in the relationship between genes and specific biochemical reactions.

The heritable changes that geneticists first studied were necessarily those that could most easily be observed. An English physician, Sir Archibald Garrod, made a penetrating study of some rare hereditary diseases in humans and recognised that certain biochemical deficiencies were caused by enzymatic abnormalities.

On the basis of his studies of congenital (present from birth) diseases in humans, Garrod suggested a relationship between genes and enzymes. The idea that the action of a gene is concerned with the formation of a particular enzyme was ignored by most geneticists for some thirty years.

James B. Sumner of Cornell University and John H. Northrop of the Rockefeller Institute showed, between 1926 and 1930, that enzymes are proteins. The idea of a gene-enzyme relationship was revived by George W. Beadle and Edward L. Tatum (1941).

From studies on heritable metabolic abnormalities of the fungus Neurospora crassa, they concluded that each intermediate biosynthetic step of a metabolic process is governed by a distinct gene. Beadle and Tatum formulated the one gene-one enzyme concept in 1944. The theory states that a gene exerts its influence on the phenotype through its role in the production of an enzyme.

Beadle and Tatum studied gene action in Neurospora crassa. Normally the fungus can grow on a minimal culture medium containing agar, sucrose, nitrate, inorganic minerals and biotin as the only vitamin. This means that the organism can synthesize all the other vitamins and amino acids required for its metabolism.

When the conidia of this fungus are treated with mutagenic agents (e.g., X-rays), some of them become unable to grow on minimal medium.

These mutant spores are then tested systematically by adding particular vitamins, amino acids, etc. to the minimal medium to determine which substance or substances they are unable to synthesize. The mutants can be crossed with the normal (wild-type) strain, and their products of meiosis, the 8 ascospores, can be tested individually for their nutritional requirements.

If an arginine-requiring strain (a⁻) is crossed with the normal strain (a⁺), all 8 ascospores can grow on a medium containing arginine, but only 4 can grow on a medium lacking arginine. This 4:4 segregation indicates that a single gene has mutated.
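The 4:4 ratio follows from simple bookkeeping, sketched below in Python under simplifying assumptions that are not from the text: one gene with hypothetical allele labels "a+" and "a-", meiosis giving two products of each allele, one mitotic division doubling them to eight ascospores, and no crossing-over or spore order modelled.

```python
def ascus_from_cross(allele_1: str, allele_2: str) -> list[str]:
    """Model the 8 ascospores from a cross between two haploid strains.

    Simplified model: the diploid zygote (allele_1/allele_2) undergoes meiosis,
    giving 4 haploid products (two of each allele); each product then divides
    once by mitosis, giving 8 ascospores in total.
    """
    meiotic_products = [allele_1, allele_1, allele_2, allele_2]
    return [spore for cell in meiotic_products for spore in (cell, cell)]


def grows(spore: str, medium_has_arginine: bool) -> bool:
    # "a-" spores need arginine supplied; "a+" spores make their own.
    return spore == "a+" or medium_has_arginine


spores = ascus_from_cross("a-", "a+")
with_arg = sum(grows(s, True) for s in spores)
without_arg = sum(grows(s, False) for s in spores)
print(with_arg, without_arg)  # 8 4
```

A single defective gene gives the 4:4 split; a different ratio would point to more than one gene being involved.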

Arginine-requiring mutants can be more complex, because arginine synthesis involves a chain of intermediate steps, each controlled by one gene. Three steps have been identified in the conversion of glutamic acid to arginine (glutamic acid → ornithine → citrulline → arginine), and each of them has been found to be controlled by one gene.

The mutation of a single gene blocks one step. This can be demonstrated by showing that the mutant grows on minimal medium supplemented with the substance it can no longer synthesize. In several biochemical studies, extracts of Neurospora have shown that tryptophan, like arginine, is synthesized through a sequence of chemical reactions.

Mutations in tryptophan-requiring strains map to several genetic locations. Each mutant is defective in one of the steps of the biosynthetic sequence, and a mutation in a specific chromosomal region is reflected in the loss of activity of one enzyme. Thus the basic gene-enzyme relationship is clear.
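The logic of the supplementation test can be sketched in Python. This assumes the classic textbook arginine pathway, glutamic acid → ornithine → citrulline → arginine, with each step controlled by one gene given the hypothetical names A, B and C. A mutant blocked at one step grows only if it is given the compound it cannot make, or any compound further down the pathway.

```python
# Assumed pathway: each arrow is one enzymatic step controlled by one
# hypothetical gene (A, B, C), as in the classic Neurospora example.
PATHWAY = ["glutamic acid", "ornithine", "citrulline", "arginine"]
STEP_GENE = ["A", "B", "C"]  # gene for step i: PATHWAY[i] -> PATHWAY[i + 1]


def rescuing_supplements(mutant_gene: str) -> list[str]:
    """Return the compounds that restore growth of a single-gene mutant.

    A mutant in the gene for step i cannot make PATHWAY[i + 1], so it grows
    on minimal medium only if that compound, or any later one, is added.
    """
    step = STEP_GENE.index(mutant_gene)
    return PATHWAY[step + 1:]


for gene in STEP_GENE:
    print(gene, "mutant grows with:", rescuing_supplements(gene))
```

Read in reverse, the pattern of which supplements rescue which mutant is what allows the order of the steps in the pathway to be inferred.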

Later research has verified the basic conclusion about the gene-enzyme relationship. Beadle and Tatum’s concept of one gene-one enzyme has since been revised to one gene-one polypeptide chain (protein), in view of the complexity of the structures and functions of enzymes. Modern research has shown that a gene is a stretch of DNA that is directly concerned with the synthesis of a particular protein.

The expression of genes by transcription into complementary RNA sequences, and the subsequent translation of the hereditary information contained in mRNA into a polypeptide chain, which forms the ultimate product of gene action, is called primary gene action.

Analysis beyond primary gene action is greatly complicated by the integrated state of cellular and developmental metabolism, by the remoteness of the phenotype from primary gene action, by the number of intervening steps influenced by other genes (gene interaction) and by environmental factors (gene activation).

In prokaryotes, transcription and translation take place in the same cell compartment, whereas in eukaryotes the two processes are carried out in two separate compartments of the cell, i.e., the nucleus and the cytoplasm. In addition, some genetic information (organelle DNA) is present and used within certain cytoplasmic organelles, particularly plastids and mitochondria.

The operation of the nuclear and extra-nuclear genomes is coordinated by mechanisms that are not yet fully understood. The genetic regulation of primary gene action, in the strict sense of the term, occurs only at the level of transcription.

The whole series of biochemical processes which lead from a gene to the phenotypic expression by which it is recognised is referred to as gene action system (Waddington, 1962).

Thus gene has two essential functions:

(i) Replication or self-reproduction and

(ii) Intervention in the mechanism by which the phenotype of the organism is produced in a given environment (phenogenesis).

Primary Gene Action (Gene and Protein Synthesis):

The genotype (total genetic material) of the cell determines the potential types of protein and also their relative amounts in the cell. Proteins serve as structural components of the cells that make up the framework of the living body. Special proteins that act as catalysts, bringing about numerous chemical reactions and controlling them precisely, are called enzymes.

In fact, most functions of the living system are carried out by proteins. Thus, from both the structural and the functional point of view, proteins are important constituents of the cell; in other words, proteins constitute the fundamental molecular machinery of the cell.

Genes act by controlling the structure and the rate of production of specific proteins (enzymes). Genes are segments of the DNA molecule. The DNA of each gene is transcribed into a complementary mRNA strand, which attaches to the ribosomes, where it serves as the template for a protein (enzyme).

The sequence of amino acids in a protein (i.e., the structure of the protein or enzyme) is determined by the sequence of nucleotides in mRNA, which in turn is determined by the sequence of nucleotides in DNA (the gene).
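How a nucleotide sequence fixes an amino acid sequence can be shown with a small Python sketch of translation using the standard genetic code (three mRNA bases per amino acid, one-letter amino acid codes, `*` for stop). The simplification of starting at the first AUG and the function name `translate` are assumptions for illustration; real start-site selection is more involved.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA from its first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # whole codons only
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))  # MAIVMGR
```

Changing a single base in a codon can change the amino acid it specifies, which is why the gene's nucleotide sequence determines the protein's amino acid sequence.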

A series of enzyme-controlled reactions determines the traits of an organism. Since the structure of these enzymes is controlled by genes, it follows that genes determine traits.

The whole sequence of events may be summarized as: gene (DNA) → mRNA → protein (enzyme) → trait.

The basic building blocks of proteins are amino acids. Except for proline, all amino acids share a common structure consisting of a central carbon atom (the α-carbon) to which are linked an α-amino group (–NH2), an α-carboxyl group (–COOH) and a hydrogen atom. The remaining part of the amino acid is called the R-group, which varies from one amino acid to another. It is the R-group that gives each amino acid its chemical properties.

The general formula of an amino acid is $\mathrm{H_2N{-}CH(R){-}COOH}$.

Proteins are polymeric molecules and are formed by combination of many amino acid molecules in linear sequence. The amino acids are joined with one another through special type of bond, called peptide linkage.

This peptide bond is formed by the elimination of a molecule of water between the carboxyl (–COOH) group of one amino acid and the amino (–NH2) group of the next. The peptide bond is of the amide (–CO–NH–) type.

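An example of peptide bond formation is:

$$\mathrm{H_2N{-}CHR_1{-}COOH + H_2N{-}CHR_2{-}COOH \rightarrow H_2N{-}CHR_1{-}CO{-}NH{-}CHR_2{-}COOH + H_2O}$$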

In this way, many amino acids are joined end to end by peptide bonds, making a long chain. One end of the protein chain carries a free amino (–NH2) group (the amino end) and the other a free carboxyl (–COOH) group (the carboxyl end).

The linear chain of amino acids formed in this manner is termed a polypeptide chain or protein. Peptide bond formation is catalysed by the peptidyl transferase activity of the ribosome, and chain elongation uses energy supplied by guanosine triphosphate (GTP).

In living systems, 20 amino acids commonly take part in protein synthesis (22 if the rarer selenocysteine and pyrrolysine are counted). These are listed in Table 20.1.

The molecular weight of a protein depends on the length of the molecule, i.e., on the number of amino acids in it.
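Because one molecule of water is lost for every peptide bond, the mass of a linear chain of $n$ residues can be estimated as the sum of its free amino acids minus $(n-1)$ waters. The Python sketch below assumes rounded average molar masses for just three amino acids (glycine, alanine, serine) to keep it short; the table and function name are illustrative, not a complete reference.

```python
WATER = 18.015  # g/mol, lost for each peptide bond formed

# Rounded average molar masses (g/mol) of a few free amino acids (illustrative subset).
FREE_AA_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}


def polypeptide_mass(sequence: str) -> float:
    """Approximate molar mass of a linear polypeptide (one-letter codes).

    Sum of the free amino acids minus one water per peptide bond;
    a chain of n residues has n - 1 peptide bonds.
    """
    if not sequence:
        return 0.0
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER


print(round(polypeptide_mass("GGG"), 2))
```

With these values, triglycine ("GGG") comes out at about 189.2 g/mol. For whole proteins, a common rule of thumb is roughly 110 g/mol per residue, which is why molecular weight scales with chain length.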

Proteins are recognised as having four levels of structure:

(i) Primary structure:

It is the linear sequence of amino acids in a polypeptide chain. The newly synthesized polypeptide chain is called a primary protein.

(ii) Secondary structure:

When primary polypeptide chains are coiled into a helix or folded into sheets, hydrogen bonds form between the C=O group of one amino acid and the N–H group of another amino acid facing it. Such proteins are said to be secondary proteins. In biological proteins, the two commonest arrangements are the α-helix and the β-pleated sheet.

(iii) Tertiary structure:

When a very long polypeptide chain is extensively folded and coiled, compressing the long spiral chain into a globular form held together by intrachain bonds, especially disulphide (–S–S–) bridges between cysteine residues, the result is the tertiary structure of the protein.

The tertiary structure of a soluble protein usually places hydrophobic (water-hating) groups on the inside and hydrophilic (water-loving) groups on the outside. Examples are insulin (a hormone) and the enzyme ribonuclease. Many tertiary proteins act as catalysts or enzymes.

(iv) Quaternary structure:

The association of more than one polypeptide chain to form a stable unit corresponds to the quaternary structure. Quaternary proteins result when interchain bonds or bridges are established that link two or more otherwise independent polypeptide chains. Most proteins with molecular weight higher than 20,000 possess quaternary structure and are composed of more than one polypeptide chain.

The best-known example of a quaternary protein is haemoglobin (molecular weight about 64,500), which is formed of two α- and two β-chains. The chemical and biological individuality of a protein depends on the order in which its amino acids are linked. So in a given protein chain the amino acids must be arranged in a precise order.

But can these amino acids be so precisely arranged? They can, and this precision in the sequence of amino acids during protein synthesis is controlled by the DNA molecule, which itself is not directly involved in the synthesis. DNA sends its message to the sites of protein synthesis through a special type of RNA called messenger RNA (mRNA).

The information for the structure of polypeptide (protein) is stored in a polynucleotide chain. The sequence of bases in a polynucleotide chain determines the sequence of amino acids in a particular polypeptide.

According to Francis Crick (1956), the transfer of information from DNA to mRNA and then from mRNA to protein (amino acid sequence) is unidirectional: it does not flow in the reverse direction, i.e., from protein back to RNA or DNA.

The DNA molecule also carries the information for its own replication. Francis Crick termed this flow of information from DNA to RNA to protein the central dogma. This is shown in Fig. 20.1.

The central dogma of molecular biology, therefore, involves the following three major processes for preservation and transmission of genetic information:

(i) Replication: indicated by the arrow encircling DNA, signifying that DNA is the template for its own replication.

(ii) Transcription: the arrow between DNA and RNA indicates that all cellular RNAs are synthesized on DNA templates.

(iii) Translation: the process by which all proteins are specified by RNA templates on the ribosomes.

In certain cells infected with RNA viruses, e.g., TMV, φMS2, φR17 etc., the viral RNA produces new copies of itself with the help of RNA replicase. The genetic RNA of some viruses, e.g., RSV (Rous sarcoma virus), acts as a template for the production of a complementary DNA strand (reverse transcription).

On this ground, however, Barry Commoner (1968) suggested that the flow of information should be regarded as cyclic rather than one-way, but such reversals of the normal flow of information are rare events.

A Secret Flexibility Found in Life’s Blueprints

The millimeter-long roundworm Caenorhabditis elegans has about 20,000 genes — and so do you. Of course, only the human in this comparison is capable of creating either a circulatory system or a sonnet, a state of affairs that made this genetic equivalence one of the most confusing insights to come out of the Human Genome Project. But there are ways of accounting for some of our complexity beyond the level of genes, and as one new study shows, they may matter far more than people have assumed.

For a long time, one thing seemed fairly solid in biologists’ minds: Each gene in the genome made one protein. The gene’s code was the recipe for one molecule that would go forth into the cell and do the work that needed doing, whether that was generating energy, disposing of waste, or any other necessary task. The idea, which dates to a 1941 paper by two geneticists who later won the Nobel Prize in medicine for their work, even has a pithy name: “one gene, one protein.”

Over the years, biologists realized that the rules weren’t quite that simple. Some genes, it turned out, were being used to make multiple products. In the process of going from gene to protein, the recipe was not always interpreted the same way. Some of the resulting proteins looked a little different from others. And sometimes those changes mattered a great deal. There is one gene, famous in certain biologists’ circles, whose two proteins do completely opposite things. One will force a cell to commit suicide, while the other will stop the process. And in one of the most extreme examples known to science, a single fruit fly gene provides the recipe for more than 38,000 different proteins.

But these are dramatic cases. It was never clear just how common it is for genes to make multiple proteins and how much those differences matter to the daily functioning of the cell. Many researchers have assumed that the proteins made by a given gene probably do not differ greatly in their duties. It’s a reasonable assumption — many small-scale tests of sibling proteins haven’t suggested that they should be wildly different.

It is still an assumption, however, and testing it is quite an endeavor. Researchers would have to take a technically tricky inventory of the proteins in a cell and run numerous tests to see what each one does. In a recent paper in Cell, however, researchers at the Dana-Farber Cancer Institute in Boston and their collaborators reveal the results of just such an effort. They found that in many cases, proteins made by a single gene are no more alike in their behavior than proteins made by completely different genes. Sibling proteins often act like strangers. It’s an insight that opens up an interesting new set of possibilities for thinking about how the cell — and the human body — functions.

Lucy Reading-Ikkanda for Quanta Magazine

Proteins transact much of a cell’s daily business. Messages are sent from one part of the cell to another, for instance, by a protein bucket brigade — one attaches to another, which then switches on another, which then modifies another, and so on, culminating in a string of alterations that delivers the message. A protein’s particular shape helps determine what it can attach to and therefore what it can do. Finding out which proteins another protein will stick to is often the first step in understanding its role in the cell.

Marc Vidal, a biologist at Dana-Farber, has a long history of tracing such protein partnerships on a grand scale. His lab looks to see how large numbers of proteins interact with one another and how those interactions might change in someone with a disease. But it can be frustrating to do this when you aren’t sure whether you should assume that proteins from the same gene do the same thing. Even if we perfectly understand a particular genome sequence, “we still don’t have a perfect knowledge of the components that are encoded by the genome,” Vidal said. “And the reason is that the good old rules don’t hold.”

To see just how often the old rules might be broken, the Vidal lab and their collaborators gathered a set of proteins made from about 1,500 genes — about 8 percent of our total complement. They sorted out which proteins came from the same genes, finding that about 500 of the genes made at least two. Then they ran multiple tests in which each of the proteins was given the chance to attach to more than 15,000 other proteins often found in the cell. Finally, they compared each protein’s results to those of its sibling proteins — all those proteins made by the same gene. How often did sibling proteins attach to the same partners? How often did they not?

The answer was rather unexpected. “It was so striking,” said David Hill, a scientist at Dana-Farber, that he thought, “This can’t be right; we’ve got to figure out what we did wrong.” But the results held up to prodding. They found that 61 percent of sibling protein pairs share some but not all of their interactions. Moreover, nearly one in five of all sibling protein pairs had nothing in common. Comparing the proteins in their data set with proteins made by separate genes, the team found that in many cases the sibling proteins’ interactions were as different as if they’d had totally unrelated origins.

Because this paper suggests that different functions for proteins from the same gene are relatively common, it implies that the phenomenon probably matters for the everyday life of the cell, said Neil Kelleher, a biologist at Northwestern University who was not involved with the research. “We don’t know how much of the complexity of cells and tissues in our body arises from this,” he said. But it’s possible these different proteins could be part of what’s behind the distinct cell types in the body. Perhaps lung cells prefer to make one protein, while another protein predominates in a heart cell.

Some diseases might have their roots in one protein dominating where it shouldn’t. For example, a 2014 paper implies that certain alternative forms of proteins may play a role in autism. Additionally, the new research suggests that when researchers are trying to understand the biological underpinnings of a disease, they should not assume that it will be enough to pinpoint the genes involved. If a gene makes multiple proteins, biologists will need to deduce which protein is responsible for the problem.

Yet it remains to be seen just how relevant the new findings will be for understanding typical cell behavior. Stefan Stamm, a biologist at the University of Kentucky, notes that the study does not assess whether every observed protein interaction happens on a regular basis in real life. Previous work suggests that some of these proteins exist only in small numbers in the wild. But Stamm agrees that we’re ignorant of a lot of the variety in the world of proteins. “Personally, I think that there are more [alternative versions] than are being reported,” he said.

Hill estimates that the team has substantially increased the number of genes known to make multiple proteins. But “this is still the tip of the iceberg,” he notes. The team started with just 1,500 genes. Looking at 10,000 (half of all human genes) would make it clearer how widespread multiprotein genes are. The team might also choose to look more deeply at a small handful of genes, getting a better picture of what their multiplicity of proteins is doing and observing how important they actually are within the cell. Either way, there is still much more to know.

The complexity implied by this finding may feel slightly overwhelming: How can we begin to unpack the biology of cells and tissues if there are so many different proteins coming from genes that, not all that long ago, people thought could make only a single one? But Kelleher said that in a sense, these results are reassuring. Theoretically, taking into account all the ways that a recipe provided by a gene could be interpreted — all the chances to substitute salt for sugar, say, or all the times when baking soda could replace baking powder — there could be up to 50 different proteins per gene.

This study suggests that in reality, only a small fraction of those possibilities are made. And only some of these proteins behave differently from one another. “People go ‘Oh my God, it’s so vast.’ But we can measure this,” he said hopefully. “It’s not so complicated as to be unknowable.”

Key Terms

  • DNA: deoxyribonucleic acid, a type of nucleic acid; a biopolymer of deoxyribonucleotides carrying four different chemical groups, called bases: adenine, guanine, cytosine, and thymine
  • messenger RNA: Messenger RNA (mRNA) is a molecule of RNA that encodes a chemical “blueprint” for a protein product.
  • protein: any of numerous large, complex naturally-produced molecules composed of one or more long chains of amino acids, in which the amino acid groups are held together by peptide bonds


RNA is very similar to DNA, with the following exceptions:

  • it is single-stranded
  • it has uracil instead of thymine
  • it has the sugar ribose instead of deoxyribose

The base-pairing rule is followed during transcription, except that when an RNA strand is built, uracil (rather than thymine) pairs with adenine:

DNA Strand: T G C A T C A G A
RNA Strand: A C G U A G U C U
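The pairing rule used in the DNA Strand / RNA Strand example can be written as a lookup table. This Python sketch, with the hypothetical function name `transcribe`, pairs bases position by position in the same left-to-right order as that example and does not model strand polarity (5'/3').

```python
# Base-pairing rule for building RNA on a DNA template: A pairs with U, not T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the RNA bases complementary to a DNA template strand.

    Spaces are ignored; bases are paired position by position, and strand
    polarity (5'/3') is not modelled.
    """
    return "".join(DNA_TO_RNA[base] for base in template.replace(" ", ""))


print(" ".join(transcribe("T G C A T C A G A")))  # A C G U A G U C U
```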

Transcription begins on the area of DNA that contains the gene. Each gene has three regions:

1. Promoter - turns the gene on or off, defines the start of a gene
2. Coding region - has the information on how to construct the protein
3. Termination sequence - signals the end of the gene

RNA polymerase is responsible for reading the gene and building the mRNA strand. It reads only one strand, the template strand, moving along it in the 3' to 5' direction while building the RNA 5' to 3'.
Introns – regions of the RNA that will not be expressed (sometimes loosely called “junk DNA”) and are spliced out
Exons – regions of the RNA that will be expressed
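Splicing out introns amounts to keeping the exon segments and joining them in order. The Python sketch below uses a made-up transcript and hypothetical exon coordinates (0-based, end-exclusive); its introns begin with GU and end with AG, the usual splice-site bases. Dropping an exon from the coordinate list mimics alternative splicing.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a pre-mRNA, dropping everything between them.

    `exons` holds hypothetical (start, end) positions, 0-based with `end`
    exclusive, in transcript order; the regions between them are introns.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up transcript: exon 1 + intron + exon 2 + intron + exon 3.
pre = "AUGGCC" + "GUAAGUUUAG" + "GAAUUU" + "GUCCAG" + "UAA"
exon_coords = [(0, 6), (16, 22), (28, 31)]

print(splice(pre, exon_coords))             # AUGGCCGAAUUUUAA
print(splice(pre, [(0, 6), (28, 31)]))      # AUGGCCUAA (exon 2 skipped)
```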

Why do so many scientists misunderstand the Central Dogma of Molecular Biology?

Scientists are careful people who do not make statements without knowing the facts. They try hard to get things right. Or so we hope.

But there is a very curious case where intelligent, knowledgeable scientists seem to get things wrong at an alarming frequency. It is quite interesting, because the misunderstanding isn’t that hard to avoid. It is a mystery why scientists keep flunking it. In an added twist of diabolical irony, the very article published to explicitly demolish the misunderstanding is often cited in support of it!

I am thinking of the so-called Central Dogma of Molecular Biology, a statement about the information flow between DNA, RNA and protein. It was published in 1958 by Francis Crick, co-discoverer of the double-helical structure of DNA, Nobel Prize laureate 1962 and a true giant of 20th century science.

The misunderstanding of the Central Dogma can be stated in different ways. It is claimed that, e.g.:

  • The Central Dogma is a simplification since there are now many known exceptions to it.
  • The presence of feed-back loops in cellular signalling invalidates it.
  • Phenomena such as non-coding RNA regulation and expression violate it.
  • It is largely valid for prokaryotes but not for eukaryotes due to the more complex genetic regulation in the latter.

All these statements are wrong. Not about the facts of the biological phenomena in themselves, but in the claim that they affect the Central Dogma.

What does the Central Dogma say? Here is the full statement from Crick’s 1958 paper (Symp. Soc. Exp. Biol. 1958, vol 12, pp 138-163):

The Central Dogma

This states that once ‘information’ has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.

It is important to note that as stated here, the Central Dogma is even today exactly correct! Nothing in this statement has been shown to be wrong.

Incidentally, the paper containing the first statement of the Central Dogma is a veritable fireworks of ideas, interpretations and hypotheses. It is a joy to read. It’s a snapshot of a scientific field at a time of fascinating confusion and progress. To be sure, it does put forth a number of hypotheses which have turned out to be either flatly wrong or subtly mistaken. But the Central Dogma is not one of those.

In 1970, Crick published a paper (Nature 1970, vol 227, issue 5258, pp 561-563) to explicitly combat widespread misconceptions of the Central Dogma. It is this paper that is so often cited in support of those very same mistaken interpretations of the Central Dogma that Crick tried to eradicate. Rarely has a brilliant scientific paper failed so miserably! Referring to a paper containing the misunderstanding, he wrote:

This is not the first time that the idea of the central dogma has been misunderstood, in one way or another. In this article I explain why the term was originally introduced, its true meaning, and state why I think that, properly understood, it is still an idea of fundamental importance.

And he illustrated the Central Dogma in the following way:

The essential message of this image is the absence of arrows from protein to either RNA, DNA or protein. Crick wrote:

These are the three transfers which the central dogma postulates never occur:

Protein -> Protein
Protein -> DNA
Protein -> RNA

Or, in other words, there is no enzyme that can be called a “protein-directed reverse translatase” or “reverse ribosome”, nor is there any “protein-directed protein polymerase”. This holds perfectly and exactly true to this day.
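Because the Central Dogma is a rule about transfers of sequence information between three kinds of molecule, it can be tabulated exhaustively. This Python sketch lists all nine possible transfers using the three classes from Crick's 1970 paper: general transfers (which occur in all cells), special transfers (which occur only in special circumstances) and the three forbidden transfers out of protein. Regulatory effects such as transcription factors binding DNA are not sequence transfers, so they do not appear in this table at all.

```python
from itertools import product

MOLECULES = ("DNA", "RNA", "protein")
# Classes of sequence-information transfer as described by Crick (1970).
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify_transfer(source: str, target: str) -> str:
    """Classify a transfer of sequence information between two molecule types."""
    if source == "protein":
        return "forbidden"  # the three transfers the Central Dogma rules out
    if (source, target) in GENERAL:
        return "general"    # can occur in all cells
    if (source, target) in SPECIAL:
        return "special"    # only in special circumstances (e.g., RNA viruses)
    raise ValueError(f"unknown molecule type in {source!r} -> {target!r}")


for src, dst in product(MOLECULES, repeat=2):
    print(f"{src:>7} -> {dst:<7} {classify_transfer(src, dst)}")
```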

What is the basis of the misunderstanding? Why do so many believe that the Central Dogma has been superseded? Basically, it’s a confusion of information flow in the cell with information flow from the sequences of DNA into RNA and protein.

The mistake consists in believing that the Central Dogma is about information flow in general in the cell. It is not. The Central Dogma is about the specific sequence information in DNA, which gets transformed into RNA and protein.

As soon as it became reasonably clear which roles DNA and protein played, neither Crick nor really anyone else ever considered that the information flow in the cell did not involve proteins affecting DNA expression. In fact, the notion that any knowledgeable scientist, let alone Francis Crick, could have believed that DNA expression can occur without being regulated directly or indirectly by the environment is, frankly, ludicrous.

I have recently come across several cases of misunderstandings of the Central Dogma in Ph.D. theses. In Sweden, a Ph.D. thesis in the biomedical sciences most often consists of 3-5 regular scientific papers, either already published in scientific journals, or in manuscript form intended for publication. In addition, the thesis also contains an introduction which puts the results of the regular scientific papers in perspective. This introduction often contains a brief overview of the fundamental facts of molecular biology and biochemistry. Its function is in part to show that the student understands the field in general. It is here the misunderstanding usually rears its ugly head.

I will here quote from a recent Ph.D. thesis which clearly shows the misunderstanding. The name of the author is uninteresting, since I am concerned about the general phenomenon, not the individual scientist.

The central dogma of molecular biology […] is often stated in its popular form as “DNA is transcribed into RNA and RNA is translated into proteins in the ribosomes”. The dogma in this form postulates that the information flow in a cell is essentially a one way process […]

Exceptions to this central flow are however numerous […]

Moreover, many proteins called transcription factors […] bind to the DNA and influence the rate of transcription of genes, thus modifying how information is read.

In this example, there is even an illustration showing “a more realistic representation” than the Central Dogma. This is incorrect, because Crick’s Central Dogma is still exactly right, and there is no need for a “more realistic representation”. The basis of the misunderstanding is clearly present in the last sentence of the first paragraph, where it is claimed that “the central dogma postulates that the information flow in the cell…”. This is wrong. The Central Dogma does no such thing.

So how serious is this problem? In a rather simplistic exercise, I gathered a bunch of Ph.D. theses from 2009 to 2015 that were lying around at my place of work, SciLifeLab. In 9 of them, there was no mention of the Central Dogma, mostly because they focused on technology rather than basic science. In the remaining 8 theses, 4 mention the Central Dogma and describe it correctly. The final 4 either get it completely wrong or make statements that show evidence of the misunderstanding. Out of 17 (or 8, depending on how one sees it), 4 fail. That is not a very good statistic.

It should be noted that a possible explanation – that there is something wrong with the education leading up to the Ph.D. theses – does not seem likely. The pattern is not very consistent, so it is more likely that the young scientists have formed these views on their own. It would be interesting to figure out what the root cause is, but I have no idea how to do that.

What is the moral of this? It can be condensed into two rules:

  1. If you cite a paper, make sure that you have read it.
  2. Before making a judgement about a hypothesis, make sure you first understand it.

If you think morals are boring, then have a look at this wonderful film made in 1971 at Stanford, which illustrates how the ribosome translates mRNA into protein, one of the transfers of sequence information allowed by the Central Dogma.

Addition: Casey Bergman (@caseybergman) made me aware of a blog post from 15 Jan 2007 by Laurence A. Moran which traces the origin of the misunderstanding all the way back to Jim Watson, of all people! He apparently messed up the vital distinction in the first edition of his textbook Molecular Biology of the Gene (1965). And other textbooks add to the confusion as well. Sigh…

Correction: In the sentence ”In fact, the notion that…”, it now says ”without” rather than ”with”! Thanks James Gilbert for spotting!

The information content of DNA is in the form of specific sequences of nucleotides.

DNA dictates the synthesis of proteins, which are the links between genotype and phenotype.

The symptoms of an inherited disease can reflect a person's inability to synthesize a particular enzyme.

The one gene-one enzyme hypothesis had to be broadened: not all proteins are enzymes, yet their synthesis also depends on specific genes.
The resulting one gene-one protein hypothesis also falls short, because many proteins are composed of several polypeptides, each of which has its own gene.

Therefore, the hypothesis has been restated as the one gene-one polypeptide hypothesis.

Transcription and translation are the two main processes linking gene to protein
The bridge between DNA and protein synthesis is RNA.

RNA is chemically similar to DNA, except that it contains ribose as its sugar and substitutes the nitrogenous base uracil for thymine.

An RNA molecule almost always consists of a single strand.

The specific sequence of hundreds or thousands of nucleotides in each gene carries the information for the primary structure of a protein: the linear order of its amino acids, each drawn from 20 possible kinds.

To get from DNA, written in one chemical language, to protein, written in another, requires two major stages, transcription and translation.

During transcription, a DNA strand provides a template for the synthesis of a complementary RNA strand. Fig. 17.2

This process is used to synthesize any type of RNA from a DNA template.

Transcription of a gene produces a messenger RNA (mRNA) molecule.

During translation, the information contained in the order of nucleotides in mRNA is used to determine the amino acid sequence of a polypeptide.

Translation occurs at ribosomes.

The basic mechanics of transcription and translation are similar in eukaryotes and prokaryotes.

Because bacteria lack nuclei, transcription and translation are coupled.

In a eukaryotic cell, almost all transcription occurs in the nucleus and translation occurs mainly at ribosomes in the cytoplasm.

In addition, the primary transcript is modified in various ways during RNA processing before the finished mRNA is exported to the cytoplasm.

Nucleotide triplets specify amino acids

In the triplet code, three consecutive bases specify an amino acid, creating 4^3 = 64 possible code words. Fig. 17.3.
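Where the 64 comes from can be checked directly: each of the three positions can hold any of the four RNA bases, so there are 4 × 4 × 4 combinations. A minimal Python sketch that simply enumerates them (listing the bases in the order U, C, A, G is only a convention):

```python
from itertools import product

RNA_BASES = "UCAG"  # the four RNA bases; U takes the place of DNA's T

# Every ordered choice of three bases is one possible codon.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

print(len(codons))  # 64, i.e. 4 ** 3
print(codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```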

During transcription, one DNA strand, the template strand, provides a template for ordering the sequence of nucleotides in an RNA transcript.

Uracil is the complementary base to adenine.

During translation, blocks of three nucleotides, codons, are decoded into a sequence of amino acids. The codons are read in the 5'->3' direction along the mRNA.

Each codon specifies which one of the 20 amino acids will be incorporated at the corresponding position along a polypeptide.

Nirenberg determined the first match: UUU coded for the amino acid phenylalanine.

He created an artificial mRNA molecule entirely of uracil and added it to a test tube mixture of amino acids, ribosomes, and other components for protein synthesis.

This "poly(U)" was translated into a polypeptide containing a single amino acid, phenylalanine, in a long chain.

By the mid-1960s the entire code was deciphered. Fig 17.4.

61 of 64 triplets code for amino acids.

The codon AUG not only codes for the amino acid methionine but also indicates the start of translation.

Three codons do not indicate amino acids but signal the termination of translation.
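The counts above can be reproduced from a lookup table of the standard code. This sketch assumes the compact 64-letter encoding commonly used for the standard genetic code (one-letter amino acid symbols in U, C, A, G order, `*` for stop); it should be checked against the codon chart in Fig 17.4 rather than taken as authoritative. The names `CODE` and `AMINO_ACIDS` are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acids, in the order produced by
# product(BASES, repeat=3): UUU, UUC, UUA, UUG, UCU, ... GGG. "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

sense_codons = [c for c, aa in CODE.items() if aa != "*"]
stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
amino_acids_used = set(CODE.values()) - {"*"}

print(len(sense_codons))      # 61
print(stop_codons)            # ['UAA', 'UAG', 'UGA']
print(len(amino_acids_used))  # 20
print(CODE["AUG"])            # M (methionine, also the start signal)
```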

To extract the message from the genetic code requires specifying the correct starting point.

This establishes the reading frame and subsequent codons are read in groups of three nucleotides.
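The role of the start codon can be illustrated with a toy translator. This is a simplified sketch, assuming translation begins at the first AUG in the mRNA and ends at the first in-frame stop codon; real ribosomes use additional signals to choose the start site. The function name `translate` is hypothetical, and the table is the standard code in the same compact form as above.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (one-letter amino acids, UCAG order; "*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an mRNA (written 5'->3') starting at its first AUG.

    The AUG fixes the reading frame; after that the sequence is read in
    consecutive, non-overlapping triplets until a stop codon is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no reading frame is established
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is released
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGC"))  # MFG
```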

The genetic code must have evolved very early in the history of life

The genetic code is nearly universal, shared by organisms from the simplest bacteria to the most complex plants and animals.

In laboratory experiments, genes can be transcribed and translated after they are transplanted from one species to another.

This has permitted bacteria to be programmed to synthesize certain human proteins after insertion of the appropriate human genes.

Transcription is the DNA-directed synthesis of RNA. Fig 17.6a

Messenger RNA is transcribed from the template strand of a gene.

RNA polymerase separates the DNA strands and bonds the RNA nucleotides to the 3' end of the growing polymer as they base-pair along the DNA template.

The template strand is read 3'->5', so the RNA molecule is built 5'->3'.
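The antiparallel relationship between template and transcript can be sketched in a few lines of Python. Assumptions for illustration: the template strand is written 5'->3' as is conventional, so reading it 3'->5' means walking the string backwards; the sequence and the function name `transcribe` are hypothetical.

```python
# Pairing rules for building RNA against a DNA template (DNA A -> RNA U).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so we walk the string
    backwards and emit the complementary RNA base at each step.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5_to_3))


template = "TTACCAAACAT"     # hypothetical template strand, written 5'->3'
print(transcribe(template))  # AUGUUUGGUAA
```

The result has the same sequence as the non-template (coding) strand, ATGTTTGGTAA, with U in place of T.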

Specific sequences of nucleotides along the DNA mark where gene transcription begins and ends.

RNA polymerase attaches and initiates transcription at the promoter, "upstream" of the information contained in the gene. The stretch of DNA that is transcribed is called the transcription unit. Fig 17.7

The terminator signals the end of transcription.

Bacteria have a single type of RNA polymerase that synthesizes all RNA molecules.

Eukaryotes have three RNA polymerases (I, II, and III) in their nuclei.

RNA polymerase II is used for mRNA synthesis.

Transcription can be separated into three stages: initiation, elongation, and termination.

Initiation - The presence of a promoter sequence determines which strand of the DNA helix is the template.

Within the promoter is the starting point for the transcription of a gene.

The promoter also includes a binding site for RNA polymerase upstream of the start point.

In eukaryotes, proteins called transcription factors recognize the promoter region, especially a TATA box, and bind to the promoter.

Once the transcription factors have bound to the promoter, RNA polymerase binds to them, creating a transcription initiation complex.

Elongation - RNA polymerase then starts transcription. Fig 17.6b

As RNA polymerase moves along the DNA, it untwists the double helix, and adds nucleotides to the 3' end of the growing strand.

Behind the point of RNA synthesis, the double helix re-forms and the RNA molecule peels away.

A single gene can be transcribed simultaneously by several RNA polymerases at a time. This helps the cell make the encoded protein in large amounts.

Termination - Transcription proceeds until after the RNA polymerase transcribes a terminator sequence in the DNA.

Eukaryotic cells modify RNA after transcription

At the 5' end of the pre-mRNA molecule, a modified form of guanine is added, the 5' cap, which helps protect mRNA from hydrolytic enzymes. Fig 17.8.

At the 3' end, an enzyme adds a poly(A) tail.

The poly(A) tail also inhibits hydrolysis, and it helps ribosomes attach and the mRNA leave the nucleus.

Most eukaryotic genes and their RNA transcripts have long noncoding stretches of nucleotides.

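Noncoding segments, called introns, lie between coding regions, called exons, which are translated into amino acid sequences. The leader and trailer sequences at the two ends of the transcript (the 5' and 3' UTRs) are also kept in the mature mRNA but are not translated.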

RNA splicing removes introns and joins exons to create an mRNA molecule with a continuous coding sequence.
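At the sequence level, splicing amounts to cutting out intron segments and joining what remains. A minimal sketch, assuming the exon boundaries are already known (the spliceosome finds them from sequence signals, which this code does not model). The sequence and coordinates are made up for illustration, with the intron beginning with GU and ending with AG as most introns do; `splice` is a hypothetical name.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a pre-mRNA, discarding everything between them.

    `exons` holds (start, end) positions, 0-based with `end` exclusive,
    in transcript order; anything not covered by an exon is treated as intron.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2
pre_mrna = "AUGAAA" + "GUAAGUCCUUUAG" + "GGCUAA"
mature = splice(pre_mrna, [(0, 6), (19, 25)])
print(mature)  # AUGAAAGGCUAA
```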

This splicing is accomplished by a spliceosome. Fig 17.10.

Spliceosomes consist of a variety of proteins and several small nuclear ribonucleoproteins (snRNPs).

Each snRNP has several protein molecules and a small nuclear RNA molecule (snRNA).

In this process, the snRNA acts as a ribozyme, an RNA molecule that functions as an enzyme.

RNA splicing appears to have several functions.

1. Some introns contain sequences that control gene activity in some way.

2. Splicing may regulate the passage of mRNA from the nucleus to the cytoplasm.

3. Splicing enables one gene to encode more than one polypeptide.

Alternative RNA splicing gives rise to two or more different polypeptides, depending on which segments are treated as exons. Fig 19.11.
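This bears on the question of how many different mRNAs one gene could yield. Under one deliberately simple model, assumed here and not taken from the text, the first and last exons are always kept, exon order never changes, and each internal exon is independently included or skipped. A gene with n exons (and n - 1 introns) then has 2^(n-2) possible isoforms. The sketch below, with hypothetical exon names, enumerates them; real genes use only some of these combinations, and not every combination keeps the reading frame intact.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """List the mRNAs possible if every internal exon may be skipped.

    Simplifying assumption: the first and last exons are always kept and
    exon order never changes, so each internal exon is an independent
    include/skip choice, giving 2 ** (len(exons) - 2) isoforms.
    Requires at least two exons.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        # combinations() preserves the original exon order within each choice
        for kept in combinations(internal, k):
            isoforms.append(first + "".join(kept) + last)
    return isoforms


exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene: n = 4 exons, 3 introns
for mrna in exon_skipping_isoforms(exons):
    print(mrna)
# prints E1E4, E1E2E4, E1E3E4 and E1E2E3E4, one per line: 2 ** (4 - 2) = 4
```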

Proteins often have a modular architecture with discrete structural and functional regions called domains. Fig 17.11.

In many cases, different exons code for different domains of a protein.

Introns increase the opportunity for recombination between two alleles of a gene.

Exon shuffling could lead to new proteins through novel combinations of functions.

Translation is the RNA-directed synthesis of a polypeptide. Fig 17.12.

Transfer RNA (tRNA) transfers amino acids from the cytoplasm's pool to a ribosome (two-dimensional structure, Fig 17.13a; three-dimensional structure and symbol, Fig 17.13b).

The ribosome adds each amino acid carried by tRNA to the growing end of the polypeptide chain.

During translation, each type of tRNA links an mRNA codon with the appropriate amino acid.

Each tRNA arriving at the ribosome carries a specific amino acid at one end and has a specific nucleotide triplet, an anticodon, at the other.

Codon by codon, tRNAs deposit amino acids in the prescribed order and the ribosome joins them into a polypeptide chain.

tRNA molecules are transcribed from DNA templates in the nucleus. Each tRNA is used repeatedly, cycling through three steps:

1. It picks up its designated amino acid in the cytosol.

2. It deposits the amino acid at the ribosome.

3. It returns to the cytosol to pick up another copy of that amino acid.

The anticodons of some tRNAs recognize more than one codon, because the rules for base pairing between the third base of the codon and the corresponding (5') base of the anticodon are relaxed. This is called wobble.

At the wobble position, U on the anticodon can bind with A or G in the third position of a codon.
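These pairing rules can be written as a small check. A simplified sketch: it includes only Watson-Crick pairs plus G-U pairing at the wobble position (U in the anticodon reading A or G, as described above, and the reverse G-U case); other wobble pairs, such as those involving the modified base inosine, are left out. Both sequences are written 5'->3', and the function name `anticodon_reads` is hypothetical.

```python
# Watson-Crick pairs, written as (codon base, anticodon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# G-U pairs, tolerated only at the wobble position in this sketch:
# codon G with anticodon U, and codon U with anticodon G.
WOBBLE_ONLY = {("G", "U"), ("U", "G")}


def anticodon_reads(anticodon: str, codon: str) -> bool:
    """Return True if the anticodon can pair with the codon (both 5'->3').

    The strands are antiparallel, so codon base 1 meets the anticodon's 3'
    base and codon base 3 (the wobble position) meets its 5' base.
    """
    pairs = zip(codon, reversed(anticodon))
    for position, pair in enumerate(pairs, start=1):
        if pair in WATSON_CRICK:
            continue
        if position == 3 and pair in WOBBLE_ONLY:
            continue
        return False
    return True


print(anticodon_reads("UUU", "AAA"))  # True: Watson-Crick at every position
print(anticodon_reads("UUU", "AAG"))  # True: U-G wobble at position 3
print(anticodon_reads("UUU", "GAA"))  # False: G-U is not allowed at position 1
```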

Each amino acid is joined to the correct tRNA by an aminoacyl-tRNA synthetase. Fig 17.14. There are 20 different synthetases, one for each of the 20 amino acids.

The synthetase catalyzes the formation of a covalent bond between the amino acid and its tRNA, forming an aminoacyl-tRNA, or activated amino acid.

Ribosomes facilitate the specific coupling of the tRNA anticodons with mRNA codons.

Each ribosome has a large and a small subunit. Fig 17.15

These are composed of proteins and ribosomal RNA (rRNA), the most abundant RNA in the cell.

Each ribosome has a binding site for mRNA and three binding sites for tRNA molecules.

The P site holds the tRNA carrying the growing polypeptide chain.

The A site carries the tRNA with the next amino acid.

Discharged tRNAs leave the ribosome at the E site.

Within the ribosome, rRNA rather than protein is the catalyst for peptide bond formation.

Translation can be divided into three stages: initiation, elongation, and termination.

Initiation brings together mRNA, a tRNA with the first amino acid, and the two ribosomal subunits. Fig 17.17.

First, a small ribosomal subunit binds with mRNA and a special initiator tRNA, which carries methionine and attaches to the start codon, AUG.

Initiation factors bring in the large subunit such that the initiator tRNA occupies the P site.

Elongation consists of a series of three-step cycles (codon recognition, peptide bond formation, and translocation), each adding one amino acid to the preceding one. Fig 17.18.

Termination occurs when one of the three stop codons reaches the A site. Fig 17.19.

Typically a single mRNA is used to make many copies of a polypeptide simultaneously.

Multiple ribosomes, polyribosomes, may trail along the same mRNA. Fig 17.20.

During and after synthesis, a polypeptide coils and folds to its three-dimensional shape spontaneously.

In addition, proteins may require posttranslational modifications.

These may involve attaching sugars, lipids, or phosphate groups to amino acids.

Enzymes may remove some amino acids or cleave whole polypeptide chains.

Two or more polypeptides may join to form a protein.

Free ribosomes are suspended in the cytosol and synthesize proteins that reside in the cytosol.

Bound ribosomes are attached to the cytosolic side of the endoplasmic reticulum. Fig. 17.21.

They synthesize proteins of the endomembrane system as well as proteins secreted from the cell.

Translation in all ribosomes begins in the cytosol, but a polypeptide destined for the endomembrane system or for export has a specific signal peptide region at or near the leading end.

A signal recognition particle (SRP) binds to the signal peptide and attaches it and its ribosome to a receptor protein in the ER membrane.

After binding, the SRP leaves and protein synthesis resumes with the growing polypeptide snaking across the membrane into the cisternal space via a protein pore.

Other kinds of signal peptides are used to target polypeptides to mitochondria, chloroplasts, the nucleus, and other organelles that are not part of the endomembrane system.

In these cases, translation is completed in the cytosol before the polypeptide is imported into the organelle.

RNA plays multiple roles in the cell: a review Table 17.1.

Comparing protein synthesis in prokaryotes (Fig 17.22) and eukaryotes (Fig 17.25)

One big difference is that prokaryotes can transcribe and translate the same gene simultaneously.

The new protein quickly diffuses to its operating site.

In eukaryotes, the nuclear envelope segregates transcription from translation.

In addition, extensive RNA processing takes place between these two processes.

Point mutations can affect protein structure and function

Mutations are changes in the genetic material of a cell (or virus).

These include large-scale mutations in which long segments of DNA are affected (for example, translocations, duplications, and inversions).

A chemical change in just one base pair of a gene causes a point mutation.

If these occur in gametes or cells producing gametes, they may be transmitted to future generations.

For example, sickle-cell disease is caused by a mutation of a single base pair in the gene that codes for one of the polypeptides of hemoglobin. Fig 17.23

A point mutation that results in the replacement of a pair of complementary nucleotides with another nucleotide pair is called a base-pair substitution. Fig 17.24a.

Some base-pair substitutions have little or no impact on protein function.

Missense mutations are those that still code for an amino acid but change the indicated amino acid.

Nonsense mutations change an amino acid codon into a stop codon, nearly always leading to a nonfunctional protein.
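The outcomes of a base-pair substitution can be told apart by translating the codon before and after the change. A minimal sketch, assuming the standard genetic code (the same compact table as commonly used; check against Fig 17.4) and calling the no-change case "silent", a common term not used above. The sickle-cell example uses the well-known GAG to GUG change that replaces glutamate with valine in beta-globin; the other inputs are arbitrary, and `classify_substitution` is a hypothetical name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (one-letter amino acids, UCAG order; "*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a one-base change at `position` (0, 1 or 2) of a sense codon."""
    before = CODE[codon]
    if before == "*":
        raise ValueError("expected an amino acid codon, not a stop codon")
    mutant = codon[:position] + new_base + codon[position + 1:]
    after = CODE[mutant]
    if after == before:
        return "silent"    # same amino acid, so the protein is unchanged
    if after == "*":
        return "nonsense"  # the amino acid codon became a stop codon
    return "missense"      # a different amino acid is specified


# Sickle-cell allele: GAG (glutamate) -> GUG (valine) in beta-globin mRNA.
print(classify_substitution("GAG", 1, "U"))  # missense
print(classify_substitution("GAG", 2, "A"))  # silent: GAA also codes glutamate
print(classify_substitution("UGG", 2, "A"))  # nonsense: UGA is a stop codon
```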

Insertions and deletions are additions or losses of nucleotide pairs in a gene. Fig 17.24b.

These have a disastrous effect on the resulting protein more often than substitutions do.

Unless these mutations occur in multiples of three, they cause a frameshift mutation.

All the nucleotides downstream of the deletion or insertion will be improperly grouped into codons.
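The contrast between a frameshift and an in-frame insertion is easy to see by splitting a sequence into codons before and after the change. The sequence below is hypothetical: inserting one base regroups every downstream codon, while inserting three bases adds a single codon and leaves the rest of the frame intact. The helper name `codons` is illustrative only.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "AUGAAAGGGUUUCCC"                      # hypothetical coding sequence
plus_one = original[:4] + "C" + original[4:]      # insert 1 base after the 4th base
plus_three = original[:3] + "CCC" + original[3:]  # insert 3 bases (one codon)

print(codons(original))    # ['AUG', 'AAA', 'GGG', 'UUU', 'CCC']
print(codons(plus_one))    # ['AUG', 'ACA', 'AGG', 'GUU', 'UCC']
print(codons(plus_three))  # ['AUG', 'CCC', 'AAA', 'GGG', 'UUU', 'CCC']
```

A deletion of one or two bases shifts the frame in the same way.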

Mutations that arise from errors during DNA replication, DNA repair, or DNA recombination are called spontaneous mutations.

Mutagens are chemical or physical agents that interact with DNA to cause mutations.

Physical agents include high-energy radiation like X-rays and ultraviolet light.

Mutagens and carcinogens are closely linked: most carcinogens are mutagenic, and most mutagens are carcinogenic.

The Mendelian concept of a gene views it as a discrete unit of inheritance that affects phenotype.

A gene is a specific nucleotide sequence along a region of a DNA molecule.

A gene is a region of DNA whose final product is either a polypeptide or an RNA molecule.

You’ve probably heard about GMOs, or Genetically Modified Organisms, but what exactly is a gene, and what does it mean to modify the genes of a plant or animal?

This short film is designed to help. Here we discuss a basic definition of a gene, show what a gene looks like, what it is that genes actually code for, and the basic idea behind Genetically Modified Organisms.

Most of our cells contain 46 DNA molecules, packaged as chromosomes. Each molecule is made of millions of building blocks called nucleotides, and these nucleotides come in 4 different types, which scientists have labeled A, C, T and G.

A gene is a special stretch of DNA, a sequence of As, Cs, Ts and Gs that codes for something.

A gene contains information for a cell to read and use, but what exactly does that information do?

You might have heard that there’s a blue eye gene, a freckle gene, or possibly even an anger gene, but single genes don’t literally make things like eyeballs or freckles or temper tantrums. Genes make proteins. Those proteins then interact with each other and with all sorts of chemicals inside the body to build things like eye pigments, freckles, and mood-altering hormones.

A single DNA molecule can contain hundreds or thousands of genes (or unique protein recipes). Humans have roughly 20,000 altogether. Some genes are small, only about 300 letters long. Others are well over a million.

The length and sequence of a gene determine the size and shape of the protein it builds. The size and shape of a protein determine the function that protein will have inside the body.

Hemoglobin, for example, is a protein found in red blood cells. Its unique shape and size allow it to capture oxygen molecules when blood flows near the lungs, and then release them later when blood flows near oxygen-starved tissues.

Pepsin is a digestive protein. Its unique shape allows it to break down the proteins in food inside your stomach so they can be absorbed by the body.

Keratin is a structural protein. Its unique shape and size allow it to link together with other keratin proteins to form hard structures like fingernails, claws, and beaks.

Different creatures have different genes which is ultimately why their bodies look and function differently. But one of the many reasons that scientists believe all life on earth is related, is that the basic DNA code, the language of As Cs Ts and Gs is pretty much the same for all living things. Many creatures even share some of the same genes.

You might not be too surprised to learn that humans and chimps (which are closely related) share 96% of their genetic code, but what do you think a lowly fruit fly has in common with a beautiful swimsuit model? Surprisingly, about half of its genes.

Because all creatures use DNA in pretty much the same way, genetic engineers have found that if they take a gene from, say, a bacterial cell and insert it into the DNA of an animal or plant cell, that animal or plant cell will then read the new gene and produce the bacterial protein.

Engineers have mixed and matched the genes of different organisms to produce many new creatures including corn that is toxic to insects but supposedly safe for human consumption, tomatoes that last up to twice as long in the grocery store before going bad and a new form of bacteria that produce the human protein insulin which we then collect from these bacteria and sell to people with Diabetes who need extra insulin to survive.

So just to sum things up a bit, what exactly is a gene? A gene is a special stretch of DNA, not the entire strand of DNA, just a segment, that codes for something. Each gene is like a unique recipe which usually tells a cell how to make a protein or a group of proteins.

Different creatures have different genes, but all genes are written in the same basic DNA language of As Cs Ts and Gs.

What is the relationship between a gene and a protein?

The terminology can be somewhat confusing. Dysferlin is a protein, and "the dysferlin gene" means "the gene which contains the instructions for producing the dysferlin protein." Each gene tells the cell how to put together the building blocks for one specific protein. However, the gene (DNA) sits inside a different compartment of the cell (the nucleus) from the location of the cellular machines that make proteins (ribosomes). Therefore, the cell first copies the gene's instructions into messenger RNA (mRNA), which is smaller and more portable than DNA and is able to leave the nucleus to reach the ribosomes. A ribosome then reads each set of three nucleotides in the mRNA code and converts the instructions into a chain of amino acids that attach together to form a protein. The mRNA also tells the ribosome where to start the protein and when the protein is finished, namely when it should stop attaching new amino acids. Because the nucleotides are read in groups of three, it is important for the ribosome to know how to group the nucleotides. If the nucleotides are grouped incorrectly, the ribosome will choose the wrong amino acids and the protein will not function. Usually, when a protein is not properly produced, it is because there is some mutation in the gene which contains its instructions.
