18.1: Transcription—from DNA to RNA - Biology

Transcription: from DNA to RNA

Section summary

Bacteria, archaea, and eukaryotes must all transcribe genes from their genomes. While the cellular location may be different (eukaryotes perform transcription in the nucleus; bacteria and archaea perform transcription in the cytoplasm), the mechanisms by which organisms from each of these clades carry out this process are fundamentally the same and can be characterized by three stages: initiation, elongation, and termination.

A short overview of transcription

Transcription is the process of creating an RNA copy of a segment of DNA. Since this is a process, we want to apply the Energy Story rubric to develop a functional understanding of transcription. What does the system of molecules look like before the start of the transcription? What does it look like at the end? What transformations of matter and transfers of energy happen during the transcription and what, if anything, catalyzes the process? We also want to think about the process from a Design Challenge standpoint. If the biological task is to create a copy of DNA in the chemical language of RNA, what challenges can we reasonably hypothesize or anticipate, given our knowledge about other nucleotide polymer processes, must be overcome? Is there evidence that Nature solved these problems in different ways? What seem to be the criteria for success of transcription? You get the idea.

Listing some of the basic requirements for transcription

Let us first consider the tasks at hand by using some of our foundational knowledge and imagining what might need to happen during transcription if the goal is to make an RNA copy of a piece of one strand of a double-stranded DNA molecule. We'll see that using some basic logic allows us to infer many of the important questions and things that we need to know in order to properly describe the process.

Let's imagine that we want to design a nanomachine/nanobot that would conduct transcription. We can use some Design Challenge thinking to identify problems and subproblems that need to be solved by our little robot.

• Where should the machine start? Along the millions to billions of base pairs, where should the machine be directed?
• Where should the machine stop?
• If we have start and stop sites, we will need ways of encoding that information so that our machine(s) can read this information—how will that be accomplished?
• How many RNA copies of the DNA will we need to make?
• How fast do the RNA copies need to be made?
• How accurately do the copies need to be made?
• How much energy will the process take and where is the energy going to come from?

These are, of course, only some of the core questions. One can dig deeper if they wish. However, these are already good enough for us to start getting a good feel for this process. Notice, too, that many of these questions are remarkably similar to those we inferred might be necessary to understand about DNA replication.

The building blocks of transcription

The building blocks of RNA

Recall from our discussion on the structure of nucleotides that the building blocks of RNA are very similar to those in DNA. In RNA, the building blocks consist of nucleotide triphosphates that are composed of a ribose sugar, a nitrogenous base, and three phosphate groups. The key differences between the building blocks of DNA and those of RNA are that RNA molecules are composed of nucleotides with ribose sugars (as opposed to deoxyribose sugars) and utilize uridine, a uracil-containing nucleotide (as opposed to thymidine in DNA). Note below that uracil and thymine are structurally very similar: uracil simply lacks the methyl (CH3) functional group found on thymine.

Figure 1. The basic chemical components of nucleotides.
Attribution: Marc T. Facciotti (original work)

Transcription initiation


Proteins responsible for creating an RNA copy of a specific piece of DNA (transcription) must first be able to recognize the beginning of the element to be copied. A promoter is a DNA sequence onto which various proteins, collectively known as the transcription machinery, bind and initiate transcription. In most cases, promoters exist upstream (5' to the coding region) of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding coding portion of the gene is transcribed all the time, some of the time, or infrequently. Although promoters vary among species, a few elements of similar sequence are sometimes conserved. At the -10 and -35 regions upstream of the initiation site, there are two promoter consensus sequences, or regions that are similar across many promoters and across various species. Some promoters will have a sequence very similar to the consensus sequence (the sequence containing the most common sequence elements), and others will look very different. These sequence variations affect how strongly the transcriptional machinery can bind to the promoter to initiate transcription. This helps to control the number of transcripts that are made and how often they get made.

Figure 2. (a) A general diagram of a gene. The gene includes the promoter sequence, an untranslated region (UTR), and the coding sequence. (b) A list of several strong E. coli promoter sequences. The -35 box and -10 box are highly conserved sequences throughout the strong promoter list. Weaker promoters will have more base pair differences when compared to these sequences.

Note: possible discussion

What types of interactions are changed between the transcription machinery and the DNA when the nucleotide sequence of the promoter changes? Why would some sequences create a "strong" promoter and why do others create a "weak" promoter?

Bacterial vs. eukaryotic promoters

In bacterial cells, the -10 consensus sequence, called the -10 region, is AT rich, often TATAAT. The -35 sequence, TTGACA, is recognized and bound by the protein σ. Once this protein-DNA interaction is made, the subunits of the core RNA polymerase bind to the site. Due to the relatively lower stability of AT associations, the AT-rich -10 region facilitates unwinding of the DNA template, and several phosphodiester bonds are made.
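One rough way to picture how a promoter compares with the consensus is to count the positions at which its -35 and -10 boxes differ from TTGACA and TATAAT. The Python sketch below does only that. The function names and the second pair of example boxes are hypothetical, and real promoter strength also depends on features that are ignored here, such as the spacing between the two boxes.

```python
# Consensus hexamers named in the text for bacterial promoters.
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def mismatches(box: str, consensus: str) -> int:
    """Count positions where a promoter box differs from the consensus."""
    if len(box) != len(consensus):
        raise ValueError("box and consensus must have the same length")
    return sum(1 for a, b in zip(box.upper(), consensus) if a != b)


def promoter_mismatch_score(box_35: str, box_10: str) -> int:
    """Total mismatches across both boxes; 0 means a perfect consensus match."""
    return mismatches(box_35, CONSENSUS_35) + mismatches(box_10, CONSENSUS_10)


# The second pair of boxes is a made-up example, not a real promoter.
print(promoter_mismatch_score("TTGACA", "TATAAT"))  # 0
print(promoter_mismatch_score("TTTACA", "TATGTT"))  # 3
```

Under this simplified picture, a lower score corresponds to a promoter that looks more like the consensus.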

Eukaryotic promoters are much larger and more complex than prokaryotic promoters, but both have an AT-rich region; in eukaryotes, it is typically called a TATA box. For example, in the mouse thymidine kinase gene, the TATA box is located at approximately -30. For this gene, the exact TATA box sequence is TATAAAA, as read in the 5' to 3' direction on the nontemplate strand. This sequence is not identical to the E. coli -10 region, but both share the quality of being an AT-rich element.

Instead of a single bacterial polymerase, the genomes of most eukaryotes encode three different RNA polymerases, each made up of ten protein subunits or more. Each eukaryotic polymerase also requires a distinct set of proteins known as transcription factors to recruit it to a promoter. In addition, an army of other transcription factors, together with regulatory DNA sequences known as enhancers and silencers, helps to regulate the synthesis of RNA from each promoter. Enhancers and silencers affect the efficiency of transcription but are not necessary for the initiation of transcription or its progression. Basal transcription factors are crucial in the formation of a preinitiation complex on the DNA template that subsequently recruits RNA polymerase for transcription initiation.

Initiation of transcription begins with the binding of RNA polymerase to the promoter. Transcription requires the DNA double helix to partially unwind such that one strand can be used as the template for RNA synthesis. The region of unwinding is called a transcription bubble.

Figure 3. During elongation, RNA polymerase tracks along the DNA template, synthesizes mRNA in the 5' to 3' direction, and unwinds then rewinds the DNA as it is read.


Transcription always proceeds from the template strand, one of the two strands of the double-stranded DNA. The RNA product is complementary to the template strand and is almost identical to the nontemplate strand, called the coding strand, with the exception that RNA contains a uracil (U) in place of the thymine (T) found in DNA. During elongation, an enzyme called RNA polymerase proceeds along the DNA template, adding nucleotides by base pairing with the DNA template in a manner similar to DNA replication, with the difference that the newly synthesized RNA strand does not remain bound to the DNA template. As elongation proceeds, the DNA is continuously unwound ahead of the core enzyme and rewound behind it. Note that the direction of synthesis is identical to that of synthesis in DNA: 5' to 3'.
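The relationship between the template strand, the coding strand, and the RNA product can be checked with a few lines of Python. This is a minimal sketch: the function names and the six-base sequences are hypothetical, and the template is assumed to be written 3' to 5' so that the RNA reads 5' to 3' from left to right.

```python
# DNA template base -> RNA base that pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (written 5' to 3') from a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_rna(coding_5_to_3: str) -> str:
    """The RNA matches the coding strand with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


# Hypothetical six-base example.
coding = "ATGGCT"    # 5' -> 3'
template = "TACCGA"  # 3' -> 5', complementary to the coding strand
print(transcribe(template))   # AUGGCU
print(coding_to_rna(coding))  # AUGGCU
```

Both routes give the same RNA, which is why the nontemplate strand is called the coding strand.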

Figure 4. During elongation, RNA polymerase tracks along the DNA template, synthesizing mRNA in the 5' to 3' direction, unwinding and then rewinding the DNA as it is read.

Figure 5. The addition of nucleotides during the process of transcription is very similar to nucleotide addition in DNA replication. The RNA is polymerized from 5' to 3', and with each addition of a nucleotide, a phosphoanhydride bond is hydrolyzed by the enzyme, resulting in a longer polymer and the release of two inorganic phosphates.

Note: possible discussion

Compare and contrast the energy story for the addition of a nucleotide in DNA replication to the addition of a nucleotide in transcription.

Bacterial vs. eukaryotic elongation

In bacteria, elongation begins with the release of the σ subunit from the polymerase. The dissociation of σ allows the core enzyme to proceed along the DNA template, synthesizing mRNA in the 5' to 3' direction at a rate of approximately 40 nucleotides per second. The base pairing between DNA and RNA is not stable enough to maintain the stability of the mRNA synthesis components. Instead, the RNA polymerase acts as a stable linker between the DNA template and the nascent RNA strands to ensure that elongation is not interrupted prematurely.
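To get a feel for what a rate of about 40 nucleotides per second means, the short Python sketch below estimates how long a single polymerase would take to transcribe a gene. The function name and the 1,200-nucleotide gene length are hypothetical, and the rate is assumed to be constant with no pausing.

```python
def elongation_seconds(gene_length_nt: int, rate_nt_per_s: float = 40.0) -> float:
    """Time for one polymerase to transcribe a gene at a constant rate."""
    if gene_length_nt < 0 or rate_nt_per_s <= 0:
        raise ValueError("length must be non-negative and rate positive")
    return gene_length_nt / rate_nt_per_s


# A hypothetical 1,200-nucleotide gene at the ~40 nt/s rate quoted above.
print(elongation_seconds(1200))  # 30.0
```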

In eukaryotes, following the formation of the preinitiation complex, the polymerase is released from the other transcription factors, and elongation is allowed to proceed as it does in prokaryotes with the polymerase synthesizing pre-mRNA in the 5' to 3' direction. As discussed previously, RNA polymerase II transcribes the major share of eukaryotic genes, so this section will focus on how this polymerase accomplishes elongation and termination.


In bacteria

Once a gene is transcribed, the bacterial polymerase needs to be instructed to dissociate from the DNA template and liberate the newly made mRNA. Depending on the gene being transcribed, there are two kinds of termination signals. One is protein-based and the other is RNA-based. Rho-dependent termination is controlled by the rho protein, which tracks along behind the polymerase on the growing mRNA chain. Near the end of the gene, the polymerase encounters a run of G nucleotides on the DNA template and it stalls. As a result, the rho protein collides with the polymerase. The interaction with rho releases the mRNA from the transcription bubble.

Rho-independent termination is controlled by specific sequences in the DNA template strand. As the polymerase nears the end of the gene being transcribed, it encounters a region rich in CG nucleotides. The mRNA folds back on itself, and the complementary CG nucleotides bind together. The result is a stable hairpin that causes the polymerase to stall as soon as it begins to transcribe a region rich in AT nucleotides. The complementary UA region of the mRNA transcript forms only a weak interaction with the template DNA. This, coupled with the stalled polymerase, induces enough instability for the core enzyme to break away and liberate the new mRNA transcript.
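The signal described above (a GC-rich stretch that can fold back on itself, followed by a run of U in the transcript) can be sketched as a simple pattern search. The Python below is an illustration only: the function names, the stem length, the loop sizes, the GC threshold, and the example transcript are all assumptions chosen for this sketch, and the input is assumed to contain only A, C, G, and U.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string containing only A, C, G, U."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(rna))


def find_terminator(rna: str, stem_len: int = 5, loop_range=(3, 8), u_run: int = 4):
    """Return the start index of the first GC-rich hairpin followed by a U run, or None."""
    rna = rna.upper()
    for start in range(len(rna) - 2 * stem_len):
        left = rna[start:start + stem_len]
        # Require a GC-rich stem (assumed threshold: at least 80% G or C).
        if sum(b in "GC" for b in left) < 0.8 * stem_len:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem_len + loop
            right = rna[right_start:right_start + stem_len]
            # The second half of the stem must pair with the first half.
            if right != reverse_complement(left):
                continue
            tail = rna[right_start + stem_len:right_start + stem_len + u_run]
            if tail == "U" * u_run:
                return start
    return None


# Made-up transcript: stem GCCGC, loop GAAA, stem GCGGC, then UUUU.
print(find_terminator("AAGCCGCGAAAGCGGCUUUUAA"))  # 2
print(find_terminator("AUAUAUAUAUAUAUAUAUAU"))    # None
```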

In eukaryotes

The termination of transcription is different for the different polymerases. Unlike in prokaryotes, elongation by RNA polymerase II in eukaryotes takes place 1,000–2,000 nucleotides beyond the end of the gene being transcribed. This pre-mRNA tail is subsequently removed by cleavage during mRNA processing. On the other hand, RNA polymerases I and III require termination signals. Genes transcribed by RNA polymerase I contain a specific 18-nucleotide sequence that is recognized by a termination protein. The process of termination in RNA polymerase III involves an mRNA hairpin similar to rho-independent termination of transcription in prokaryotes.

In archaea

Termination of transcription in the archaea is far less studied than in the other two domains of life and is still not well understood. While the functional details are likely to resemble mechanisms that have been seen in the other domains of life, the details are beyond the scope of this course.

Cellular location

In bacteria and archaea

In bacteria and archaea, transcription occurs in the cytoplasm, where the DNA is located. Because the location of the DNA, and thus the process of transcription, are not physically segregated from the rest of the cell, translation often starts before transcription has finished. This means that mRNA in bacteria and archaea is used as the template for a protein before the entire mRNA is produced. The lack of spatial segregation also means that there is very little temporal segregation for these processes. Figure 6 shows the processes of transcription and translation occurring simultaneously.

Figure 6. Transcription and translation occurring simultaneously in bacteria and archaea.
Source: Marc T. Facciotti (own work)

In eukaryotes

In eukaryotes, the process of transcription is physically segregated from the rest of the cell, sequestered inside of the nucleus. This results in two things: the mRNA is completed before translation can start, and there is time to "adjust" or "edit" the mRNA before translation starts. The physical separation of these processes gives eukaryotes a chance to alter the mRNA in such a way as to extend the lifespan of the mRNA or even alter the protein product that will be produced from the mRNA.

mRNA processing

5' G-cap and 3' poly-A tail

When a eukaryotic gene is transcribed, the primary transcript is processed in the nucleus in several ways. Eukaryotic mRNAs are modified at the 3' end by the addition of a poly-A tail. This run of A residues is added by an enzyme that does not use genomic DNA as a template. Additionally, the mRNAs have a chemical modification of the 5' end, called a 5'-cap. Data suggest that these modifications help both to increase the lifespan of the mRNA (by preventing its premature degradation in the cytoplasm) and to help the mRNA initiate translation.

Figure 7. Pre-mRNAs are processed in a series of steps. Introns are removed, and a 5' cap and poly-A tail are added.

Alternative splicing

Splicing occurs on most eukaryotic mRNAs: introns are removed from the mRNA sequence and exons are ligated together. This can create a much shorter mRNA than initially transcribed. Splicing allows cells to mix and match which exons are incorporated into the final mRNA product. As shown in the figure below, this can lead to multiple proteins being coded for by a single gene.

Figure 8. The information stored in the DNA is finite. In some cases, organisms can mix and match this information to create different end products. In eukaryotes, alternative splicing allows for the creation of different mRNA products, which in turn are used in translation to create different protein sequences. This ultimately leads to the production of different protein shapes, and thus different protein functions.
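The "mix and match" idea can be made concrete by listing the mRNAs that could be assembled from a small set of exons. The Python sketch below is a deliberately simplified model: the exon sequences are made up, the first exon is assumed to be always included, and every combination of the remaining exons is treated as possible, which real cells do not do.

```python
from itertools import combinations


def splice_variants(exons, constitutive=(0,)):
    """Return mRNAs built by keeping subsets of exons in genomic order.

    `constitutive` lists exon indices that are always kept (an assumption
    made here so that every variant shares the first exon).
    """
    optional = [i for i in range(len(exons)) if i not in constitutive]
    variants = []
    for k in range(len(optional) + 1):
        for chosen in combinations(optional, k):
            kept = sorted(set(constitutive) | set(chosen))
            variants.append("".join(exons[i] for i in kept))
    return variants


# Hypothetical exon sequences; introns are assumed to be already removed.
exons = ["AUGGCU", "GAAUUC", "CCGUAA"]
for mrna in splice_variants(exons):
    print(mrna)
```

With three exons and one of them fixed, this toy model produces four different mRNAs from a single gene.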

SC.912.L.18.1 Macromolecules

As food travels through the digestive system, it is exposed to a variety of pH levels. The stomach has a pH of 2 due to the presence of hydrochloric acid (HCl), and the small intestine has a pH ranging from 7 to 9. HCl converts pepsinogen into pepsin, an enzyme that digests proteins in the stomach. Which of the following most likely happens to pepsin as it enters the small intestine?

A. It becomes inactive.

B. It begins to replicate.

C. Its shape changes to engulf large proteins.

D. Its activity increases to digest more proteins.

Molecular Cell Biology. 4th edition.

Although DNA stores the information for protein synthesis and RNA carries out the instructions encoded in DNA, most biological activities are carried out by proteins. The accurate synthesis of proteins thus is critical to the proper functioning of cells and organisms. We saw in Chapter 3 that the linear order of amino acids in each protein determines its three-dimensional structure and activity. For this reason, assembly of amino acids in their correct order, as encoded in DNA, is the key to production of functional proteins.

Three kinds of RNA molecules perform different but cooperative functions in protein synthesis (Figure 4-20):

Figure 4-20

The three roles of RNA in protein synthesis. Messenger RNA (mRNA) is translated into protein by the joint action of transfer RNA (tRNA) and the ribosome, which is composed of numerous proteins and two major ribosomal RNA (rRNA) molecules.

Messenger RNA (mRNA) carries the genetic information copied from DNA in the form of a series of three-base code “words,” each of which specifies a particular amino acid.

Transfer RNA (tRNA) is the key to deciphering the code words in mRNA. Each type of amino acid has its own type of tRNA, which binds it and carries it to the growing end of a polypeptide chain if the next code word on mRNA calls for it. The correct tRNA with its attached amino acid is selected at each step because each specific tRNA molecule contains a three-base sequence that can base-pair with its complementary code word in the mRNA.

Ribosomal RNA (rRNA) associates with a set of proteins to form ribosomes. These complex structures, which physically move along an mRNA molecule, catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis. Ribosomes are composed of a large and small subunit, each of which contains its own rRNA molecule or molecules.
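The three-base code words in mRNA, and the complementary three-base sequence on each tRNA, can be sketched in Python as follows. The function names and the example mRNA are hypothetical, and strict A-U and G-C pairing is assumed for simplicity.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons(mrna: str):
    """Split an mRNA (5' to 3') into consecutive three-base code words."""
    mrna = mrna.upper()
    usable = len(mrna) - len(mrna) % 3  # ignore any incomplete trailing codon
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def anticodon(codon: str) -> str:
    """Complementary three-base sequence, written 5' to 3' (strict pairing assumed)."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(codon.upper()))


print(codons("AUGGCUUAA"))  # ['AUG', 'GCU', 'UAA']
print(anticodon("AUG"))     # CAU
```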

Translation is the whole process by which the base sequence of an mRNA is used to order and to join the amino acids in a protein. The three types of RNA participate in this essential protein-synthesizing pathway in all cells; in fact, the development of the three distinct functions of RNA was probably the molecular key to the origin of life. How each RNA carries out its specific task is discussed in this section, while the biochemical events in protein synthesis and the required protein factors are described in the final section of the chapter.

Portions of DNA Sequence Are Transcribed into RNA

The first step a cell takes in reading out a needed part of its genetic instructions is to copy a particular portion of its DNA nucleotide sequence (a gene) into an RNA nucleotide sequence. The information in RNA, although copied into another chemical form, is still written in essentially the same language as it is in DNA: the language of a nucleotide sequence. Hence the name transcription.

Like DNA, RNA is a linear polymer made of four different types of nucleotide subunits linked together by phosphodiester bonds (Figure 6-4). It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides, that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A (Figure 6-5), the complementary base-pairing properties described for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs with C, and A pairs with U). It is not uncommon, however, to find other types of base pairs in RNA: for example, G pairing with U occasionally.

Figure 6-4

The chemical structure of RNA. (A) RNA contains the sugar ribose, which differs from deoxyribose, the sugar used in DNA, by the presence of an additional -OH group. (B) RNA contains the base uracil, which differs from thymine, the equivalent base in DNA.

Figure 6-5

Uracil forms base pairs with adenine. The absence of a methyl group in U has no effect on base-pairing; thus, U-A base pairs closely resemble T-A base pairs (see Figure 4-4).

Despite these small chemical differences, DNA and RNA differ quite dramatically in overall structure. Whereas DNA always occurs in cells as a double-stranded helix, RNA is single-stranded. RNA chains therefore fold up into a variety of shapes, just as a polypeptide chain folds up to form the final shape of a protein (Figure 6-6). As we see later in this chapter, the ability to fold into complex three-dimensional shapes allows some RNA molecules to have structural and catalytic functions.

Figure 6-6

RNA can fold into specific structures. RNA is largely single-stranded, but it often contains short stretches of nucleotides that can form conventional base-pairs with complementary sequences found elsewhere on the same molecule.

39 Transcription: from DNA to RNA

Both prokaryotes and eukaryotes perform fundamentally the same process of transcription, with the important difference of the membrane-bound nucleus in eukaryotes. With the genes bound in the nucleus, transcription occurs in the nucleus of the cell and the mRNA transcript must be transported to the cytoplasm. The prokaryotes, which include bacteria and archaea, lack membrane-bound nuclei and other organelles, and transcription occurs in the cytoplasm of the cell.

Transcription requires the DNA double helix to partially unwind in the region of mRNA synthesis. The DNA sequence onto which the proteins and enzymes involved in transcription bind to initiate the process is called a promoter. In most cases, promoters exist upstream of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding gene is transcribed all of the time, some of the time, or hardly at all.

Figure 2: The initiation of transcription begins when DNA is unwound, forming a transcription bubble. Enzymes and other proteins involved in transcription bind at the promoter. Note the base-pairing between the RNA transcript and the template strand of DNA. From: Wikimedia public domain.

Transcription always proceeds from one of the two DNA strands, which is called the template strand. The mRNA product is complementary to the template strand and is almost identical to the other DNA strand, called the non-template strand, with the exception that RNA contains a uracil (U) in place of the thymine (T) found in DNA. This means that the base-pairing rules between a DNA molecule and an RNA molecule are:

adenine (A) in DNA pairs with uracil (U) in RNA

thymine (T) in DNA pairs with adenine (A) in RNA

guanine (G) in DNA pairs with cytosine (C) in RNA

cytosine (C) in DNA pairs with guanine (G) in RNA

An enzyme called RNA polymerase proceeds along the DNA template adding nucleotides by base pairing with the DNA template in a manner similar to DNA replication.

Figure 3: During elongation, RNA polymerase tracks along the DNA template, synthesizes mRNA in the 5′ to 3′ direction, and unwinds then rewinds the DNA as it is read. Again, notice the base-pairing between the template strand of DNA and the newly forming RNA.

Once a gene is transcribed, the RNA polymerase needs to be instructed to dissociate from the DNA template and liberate the newly made mRNA.

In a prokaryotic cell, by the time transcription ends, the transcript would already have been used to begin making copies of the encoded protein because the processes of transcription and translation can occur at the same time since both occur in the cytoplasm (Figure 4). In contrast, transcription and translation cannot occur simultaneously in eukaryotic cells since transcription occurs inside the nucleus and translation occurs outside in the cytoplasm.

Figure 4: Multiple polymerases can transcribe a single bacterial gene while numerous ribosomes concurrently translate the mRNA transcripts into polypeptides. In this way, a specific protein can rapidly reach a high concentration in the bacterial cell.

Reverse Transcription


In reverse transcription, RNA is used as a template to produce DNA. The enzyme reverse transcriptase transcribes RNA to generate a single strand of complementary DNA (cDNA). The enzyme DNA polymerase converts the single-stranded cDNA into a double-stranded molecule as it does in DNA replication. Special viruses known as retroviruses use reverse transcription to replicate their viral genomes. Scientists also use reverse transcriptase processes to detect retroviruses.
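The first step, copying RNA into a single strand of complementary DNA, follows the same base-pairing logic as transcription but in the opposite direction. The Python sketch below illustrates it. The function name and the example RNA are hypothetical, and the cDNA is written 5' to 3', which is why the sequence is reversed.

```python
# RNA template base -> DNA base that pairs with it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5_to_3: str) -> str:
    """Return the complementary DNA (cDNA), written 5' to 3'."""
    return "".join(RNA_TO_DNA[b] for b in reversed(rna_5_to_3.upper()))


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```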

Eukaryotic cells also use reverse transcription to extend the end sections of chromosomes known as telomeres. The enzyme telomerase reverse transcriptase is responsible for this process. The extension of telomeres can produce cells that are resistant to apoptosis, or programmed cell death, and that may become cancerous. The molecular biology technique known as reverse transcription-polymerase chain reaction (RT-PCR) is used to amplify and measure RNA. Since RT-PCR detects gene expression, it can also be used to detect cancer and to aid in genetic disease diagnosis.


This page takes a simple look at the structure of RNA and how the information in DNA is used to make messenger RNA. It is designed for 16 - 18 year old chemistry students, and if you are doing biology or biochemistry, you will probably need more detail than this page gives.

Note: If you have come straight to this page from a search engine, you should be aware that this is the third page in a sequence of pages about DNA. These pages are written to be read one after the other, so unless you already understand the structure of DNA, it is best to start from the beginning.

The function of messenger RNA in the cell

You will probably know that the sequence of bases in DNA carries the genetic code. Scattered along the DNA molecule are particularly important sequences of bases known as genes. Each gene is a coded description for making a particular protein.

Note: It would be more accurate to say that each gene codes for a particular polypeptide, because some proteins are made of more than one polypeptide chain. For simplicity, I'm going to refer from now on to the synthesis of a protein, rather than a polypeptide - it sounds less scary!

To be really accurate, some genes code for other sorts of molecule apart from proteins, but we are only going to be looking at the genes involved in protein synthesis.

Getting from the code in DNA to the final protein is a very complicated process.

The code is first transcribed ("copied", although with one important difference - see later) to messenger RNA. That then travels out of the nucleus of the cell (where the DNA is found) into the cytoplasm of the cell. The cytoplasm contains essentially everything else in the cell apart from the nucleus. Here the code is read and the protein is synthesised with the help of two other forms of RNA - ribosomal RNA and transfer RNA. We'll talk a lot more about those in a later page.

I'm going to take this complicated process very gently - a bit at a time!

How does messenger RNA differ from DNA?

There are several important differences.

RNA is much shorter than DNA. DNA contains the code for making lots and lots of different proteins. Messenger RNA contains the information to make just one single polypeptide chain - in other words for just one protein, or even just a part of a protein if it is made up of more than one polypeptide chain.

Overall structure

DNA has two strands arranged in a double helix. RNA consists of a single strand.

The sugar present in the backbone of the chain

DNA (deoxyribonucleic acid) has a backbone of alternating deoxyribose and phosphate groups. In RNA (ribonucleic acid), the sugar ribose replaces deoxyribose.

If you have read this sequence of pages from the beginning, you will already have come across the difference between these two sugars. But to remind you . . .

The only difference is the presence of an -OH group on the 2' carbon atom in ribose.

Note: If you don't understand what 2' means, you obviously haven't read the first page in this sequence of pages. It's a bad idea trying to take short cuts with this!

RNA uses the base uracil (U) rather than thymine (T)

The structure of uracil is very similar to that of thymine.

The nitrogen shown in blue in the uracil is the one which attaches to the 1' carbon in the ribose. In the process, the hydrogen shown in blue is lost together with the -OH group on the 1' carbon in the ribose.

The only difference between the two molecules is the presence or absence of the CH3 group.

Uracil can form exactly the same hydrogen bonds with adenine as thymine can - the shape of the two molecules is exactly the same where it matters.

Compare the hydrogen bonding between adenine (A) and thymine (T):

. . . with that between adenine (A) and uracil (U):

In DNA the hydrogen bonding between A and T helps to tie the two strands together into the double helix. That isn't relevant in RNA because it is only a single strand. However, you will find several examples in what follows on this and further pages where the ability of adenine (A) to attract and bond with uracil (U) is central to the processes going on.

The base pairing of guanine (G) and cytosine (C) is just the same in DNA and RNA.

So in RNA the important base pairs are:

adenine (A) pairs with uracil (U)

guanine (G) pairs with cytosine (C).

Transcription is the name given to the process where the information in a gene in a DNA strand is transferred to an RNA molecule.

The coding strand and the template strand of DNA

The important thing to realise is that the genetic information is carried on only one of the two strands of the DNA. This is known as the coding strand.

The other strand is known as the template strand, for reasons which will become obvious in a moment.

Note: These two strands are often given other names as well, sometimes in a very confusing way (at least to a non-biochemist!). The two terms coding and template are commonly used, and seem to me to best describe the function of the two chains.

The coding strand

The information in a gene on the coding strand is read in the direction from the 5' end to the 3' end.

Remember that the 5' end is the end which has the phosphate group attached to the 5' carbon atom. The 3' end is the end where the phosphate is attached to a 3' carbon atom - or if it is at the very end of the DNA chain has a free -OH group on the 3' carbon.

You may remember this diagram of a tiny part of a DNA chain from the first page in this sequence:

If the left-hand chain was the coding chain, the genetic code would be read from the top end (the 5' end) downwards. The code in this very small fragment of a gene would be read as ". . . A T T G C . . .".

The template strand

The template strand is complementary to the coding strand. That means that every A on the coding strand is matched by a T on the template strand (and vice versa). Every G on the coding strand is matched by a C on the template strand (and again vice versa).

If you took the template strand and built a new DNA strand on it (as happens in DNA replication), you would get an exact copy of the original DNA coding strand formed.

Almost exactly the same thing happens when you make RNA. If you build an RNA strand on the template strand, you will get a copy of the information on the DNA coding strand - but with one important difference.

In RNA, uracil (U) is used instead of thymine (T). So if the original DNA coding strand had the sequence A T T G C T, this would end up in the RNA as A U U G C U - everything is exactly the same except that every T has been replaced by U.

The transcription process

Finding the start of the gene on the coding strand

Transcription is under the control of the enzyme RNA polymerase. The first thing that the enzyme has to do is to find the start of the gene on the coding strand of the DNA. Remember that DNA has lots of genes strung out along the coding strand. That means that the enzyme has to pick the right strand and identify the beginning of each gene.

It does this by recognising and binding with one or more short sequences of bases "upstream" of the start of each gene. "Upstream" means that it is slightly closer to the 5' end of the coding strand than the gene.

These base sequences are known as promoter sequences.

Remember that the two strands of DNA are hydrogen bonded together. You can think of the enzyme as being wrapped around both strands. In fact, the enzyme is big enough to enclose not only the promoter sequence but the beginning of the gene itself.

Transcribing the gene and making the RNA

Once the enzyme has attached to the DNA, it unwinds the double helix over a short length, and splits the two strands apart. This gives a "bubble" in which the coding strand and template strand are separated over the length of about 10 bases.

The next diagram shows the enzyme in the process of starting to make the new RNA strand.

New nucleotides are added to the growing RNA chain at the 3' end. The next nucleotides to be added in the example here would contain the bases G and then C. The new G in the RNA would complement the C below it in the template strand. Next after that in the template strand is a G. That would be complemented by a C in the growing RNA.

Note: Remember that a nucleotide contains the base attached to a sugar (in this case, ribose) which is attached to a phosphate group. The ribose and the phosphate add to the backbone of the RNA chain with the bases hanging off that backbone.

Now compare the bit of RNA with the coding strand directly above it. Apart from the fact that every thymine (T) is now a uracil (U) instead, the chains are identical.

Now the enzyme moves along the DNA, zipping it up again behind it. Essentially it moves the bubble along the chain, adding new nucleotides all the time. The growing RNA tail becomes detached from the template strand as the enzyme moves along.

How does the enzyme know where to stop after it reaches the end of the gene? You will remember that it recognises the beginning of the gene by the presence of a promoter sequence of bases upstream of the start.

After the end of the gene ("downstream" of the gene), there will be a termination sequence of bases. Once the enzyme gets to those, it stops adding new nucleotides to the chain and detaches the RNA molecule completely from the template chain.
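The whole sequence of events (find the promoter, copy the gene, stop at the termination sequence) can be mimicked with a toy model. This is a sketch under loud assumptions: the two marker sequences below are placeholders chosen for illustration, the rule that copying starts immediately after the promoter is a simplification, and real promoters and termination signals are more varied than a single fixed string.

```python
PROMOTER = "TATAAT"    # placeholder promoter motif (assumption for illustration)
TERMINATOR = "GGGCCC"  # placeholder termination motif (assumption for illustration)

def transcribe_gene(coding_strand):
    """Return the RNA for the first promoter-to-terminator region, or None."""
    promoter_at = coding_strand.find(PROMOTER)
    if promoter_at == -1:
        return None                      # no promoter, so the enzyme never binds
    start = promoter_at + len(PROMOTER)  # toy rule: the gene starts right after the promoter
    stop = coding_strand.find(TERMINATOR, start)
    if stop == -1:
        return None                      # no termination sequence found downstream
    gene = coding_strand[start:stop]
    return gene.replace("T", "U")        # RNA matches the coding strand with U for T

# A made-up stretch of coding strand: filler, promoter, gene, terminator, filler.
dna = "CCG" + PROMOTER + "ATTGCT" + TERMINATOR + "AAT"
print(transcribe_gene(dna))  # AUUGCU
```

Everything upstream of the promoter and downstream of the termination sequence is ignored, just as the enzyme ignores DNA outside the gene it is copying.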

So . . . we've produced a molecule of messenger RNA - so called because it is now going to carry the genetic code (the message) out of the nucleus of the cell to the cytoplasm where protein synthesis can take place.

Before we look at how that synthesis works, we need to stop and consider the nature of the code itself. That's on the next page in this sequence.

Overview of Transcription

Transcription is the first stage of the expression of genes into proteins. In transcription, an mRNA (messenger RNA) intermediate is transcribed from one of the strands of the DNA molecule. The RNA is called messenger RNA because it carries the "message," or genetic information, from the DNA to the ribosomes, where the information is used to make proteins. RNA and DNA use complementary coding where base pairs match up, similar to how the strands of DNA bind to form a double helix.

One difference between DNA and RNA is that RNA uses uracil in place of the thymine used in DNA. RNA polymerase mediates the manufacture of an RNA strand that complements the DNA strand. RNA is synthesized in the 5' -> 3' direction (as seen from the growing RNA transcript). There are some proofreading mechanisms for transcription, but not as many as for DNA replication. Sometimes coding errors occur.
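Directionality is a common source of mistakes when working out a transcript by hand. The sketch below assumes the template strand has been written 5' to 3', as sequences usually are; because the RNA grows 5' to 3' while the template is read from its 3' end, the template has to be walked backwards. The function name is hypothetical.

```python
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_5_to_3):
    """Return the RNA (5' to 3') made from a template strand written 5' to 3'."""
    # The polymerase reads the template from its 3' end, so walk it in reverse.
    return "".join(RNA_PARTNER[base] for base in reversed(template_5_to_3))

template = "AGCAAT"  # template strand written 5' to 3' (3'-TAACGA-5' read the other way)
print(transcribe_template(template))  # AUUGCU
```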

How to identify a transcription start site

The critical issue in mapping a true site of transcription initiation is to be able to distinguish it from a 5' end generated by RNA cleavage or degradation and from a 5' end generated by incomplete copying of RNA into cDNA. The conventional hallmark of TSSs in most eukaryotes is addition of a 7-methyl guanosine cap structure to the 5'-triphosphate of the first base transcribed by RNA polymerase II. This unique feature of the transcription initiation nucleotide is the basis of several methods aiming to enrich and identify capped messages and subsequently to map the exact positions in the genome of the nucleotides to which the cap is added. The main methods used are cap analysis of gene expression (CAGE) [12], oligo-capping [13] and robust analysis of 5'-transcript ends (5'-RATE) [14]. CAGE is the most commonly used and exploits the 2',3'-diol structure of the cap nucleotide, which is present in only one other place on an RNA molecule besides the cap: its extreme 3' end. The diol structure is susceptible to a specific chemical oxidation which can be followed by biotinylation, enabling selection of capped messages by affinity capture with streptavidin. The enriched capped RNA fraction is then converted into cDNAs that span the entire lengths of the capped RNA molecules. Oligo-capping and 5'-RATE take advantage of the fact that the 5' cap is resistant to phosphatase treatment, which removes mono-, di- or triphosphates from cleaved or degraded RNA. Subsequent removal of the cap using tobacco acid pyrophosphatase leaves a 5'-monophosphate, which is amenable to ligation with a specific linker nucleotide that marks the position of the native 5' end of RNA and can later be used to select and sequence the 5' ends of capped cDNAs [13, 14].

Full-length cDNAs generated by the techniques described above can be further converted into short DNA tags derived from their 5' ends [12, 13, 15], which are very suitable for next-generation sequencing [16]. The combination of cap-selection and next-generation sequencing can generate sequence information about the exact positions of cap-addition sites for millions of RNA molecules [4, 15, 17], thus making it possible to obtain digital information about the number of transcriptional initiation events occurring at any genomic position. This information can be used to infer the positions, as well as the relative strengths, of different promoter elements [15], as exemplified in the recent articles from the FANTOM consortium [9–11]. It can also be correlated with information on the positions of other annotated genomic elements, such as repetitive elements [10] or short RNAs [9, 18], to identify any association between these elements and transcription initiation.
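The "digital information" mentioned above is, at its simplest, a tally of how many sequenced tags start at each genomic position. The following sketch shows that tally using only the Python standard library. The tag coordinates are invented for illustration and do not come from any data set, and real pipelines add mapping, filtering and clustering steps that are not shown here.

```python
from collections import Counter

def count_initiation_events(tag_starts):
    """Count how many tags begin at each (chromosome, strand, position)."""
    return Counter(tag_starts)

# Made-up mapped 5' ends of capped-RNA tags: (chromosome, strand, position).
tags = [
    ("chr1", "+", 1000), ("chr1", "+", 1000), ("chr1", "+", 1000),
    ("chr1", "+", 1002), ("chr1", "-", 5400), ("chr1", "-", 5400),
]

counts = count_initiation_events(tags)
for site, n in counts.most_common():
    print(site, n)
```

Positions with more tags correspond to more frequently used start sites, which is how relative promoter strength can be inferred from the counts.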

Transcription in Eukaryotes

Prokaryotes and eukaryotes perform fundamentally the same process of transcription, with a few significant differences (see Table 1). Eukaryotes use three different polymerases, RNA polymerases I, II, and III, all structurally distinct from the bacterial RNA polymerase. Each transcribes a different subset of genes. Interestingly, archaea contain a single RNA polymerase that is more closely related to eukaryotic RNA polymerase II than to its bacterial counterpart. Eukaryotic mRNAs are also usually monocistronic, meaning that they each encode only a single polypeptide, whereas prokaryotic mRNAs of bacteria and archaea are commonly polycistronic, meaning that they encode multiple polypeptides.

The most important difference between prokaryotes and eukaryotes is the latter’s membrane-bound nucleus, which influences the ease of use of RNA molecules for protein synthesis. With the genes bound in a nucleus, the eukaryotic cell must transport protein-encoding RNA molecules to the cytoplasm to be translated. Protein-encoding primary transcripts, the RNA molecules directly synthesized by RNA polymerase, must undergo several processing steps to protect these RNA molecules from degradation during the time they are transferred from the nucleus to the cytoplasm and translated into a protein. For example, eukaryotic mRNAs may last for several hours, whereas a typical bacterial mRNA lasts only a few minutes.

The primary transcript (also called pre-mRNA) is first coated with RNA-stabilizing proteins to protect it from degradation while it is processed and exported out of the nucleus. The first type of processing begins while the primary transcript is still being synthesized: a special 7-methylguanosine nucleotide, called the 5′ cap, is added to the 5′ end of the growing transcript. In addition to preventing degradation, the cap is recognized by factors involved in subsequent protein synthesis, which helps initiate translation by ribosomes. Once elongation is complete, another processing enzyme adds a string of approximately 200 adenine nucleotides to the 3′ end, called the poly-A tail. This modification further protects the pre-mRNA from degradation and signals to cellular factors that the transcript needs to be exported to the cytoplasm.

Eukaryotic genes that encode polypeptides are composed of coding sequences called exons (ex-on signifies that they are expressed) and intervening sequences called introns (int-ron denotes their intervening role). Transcribed RNA sequences corresponding to introns do not encode regions of the functional polypeptide and are removed from the pre-mRNA during processing. It is essential that all of the intron-encoded RNA sequences are completely and precisely removed from a pre-mRNA before protein synthesis so that the exon-encoded RNA sequences are properly joined together to code for a functional polypeptide. If the process errs by even a single nucleotide, the reading frame of the rejoined exons would be shifted, and the resulting polypeptide would be nonfunctional. The process of removing intron-encoded RNA sequences and reconnecting those encoded by exons is called RNA splicing and is facilitated by the action of a spliceosome containing small nuclear ribonucleoproteins (snRNPs). Intron-encoded RNA sequences are removed from the pre-mRNA while it is still in the nucleus. Although they are not translated, introns appear to have various functions, including gene regulation and mRNA transport. On completion of these modifications, the mature transcript, the mRNA that encodes a polypeptide, is transported out of the nucleus, destined for the cytoplasm for translation. Introns can be spliced out differently, resulting in various exons being included or excluded from the final mRNA product. This process is known as alternative splicing. The advantage of alternative splicing is that different types of mRNA transcripts can be generated, all derived from the same DNA sequence. In recent years, it has been shown that some archaea also have the ability to splice their pre-mRNA.
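Splicing and alternative splicing can be pictured as cutting a string at known boundaries and joining the pieces that are kept. The sketch below is a toy model only: the pre-mRNA sequence and exon coordinates are invented, the function is hypothetical, and the cell locates intron boundaries through the spliceosome rather than through a list of positions.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments of pre_mrna, leaving out introns and any skipped exons.

    exons: list of (start, end) pairs, 0-based, end exclusive.
    skip:  indexes of exons to omit (a toy model of alternative splicing).
    """
    return "".join(pre_mrna[s:e] for i, (s, e) in enumerate(exons) if i not in skip)

# Made-up pre-mRNA: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccgucag" + "AAAUAA"
exons = [(0, 6), (16, 22), (31, 37)]

mature = splice(pre_mrna, exons)                # all three exons joined
variant = splice(pre_mrna, exons, skip={1})     # middle exon left out
shifted = splice(pre_mrna, [(0, 6), (17, 22), (31, 37)])  # boundary off by one base

print(mature)   # AUGGCCGAUUACAAAUAA
print(variant)  # AUGGCCAAAUAA
print(shifted)  # AUGGCCAUUACAAAUAA
```

In the last case a single misplaced boundary removes one base from the second exon, so every three-base group after that point is read differently. That is the single-nucleotide error described above.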

Table 1. Comparison of Transcription in Bacteria Versus Eukaryotes
Property | Bacteria | Eukaryotes
Number of polypeptides encoded per mRNA | Monocistronic or polycistronic | Exclusively monocistronic
Strand elongation | core + σ = holoenzyme | RNA polymerases I, II, or III
Addition of 5′ cap | No | Yes
Addition of 3′ poly-A tail | No | Yes
Splicing of pre-mRNA | No | Yes

Think about It

  • In eukaryotic cells, how is the RNA transcript from a gene for a protein modified after it is transcribed?
  • Do exons or introns contain information for protein sequences?

Clinical Focus: Travis, Part 2

This example continues Travis’s story that started in The Functions of Genetic Material.

In the emergency department, a nurse told Travis that he had made a good decision to come to the hospital because his symptoms indicated an infection that had gotten out of control. Travis’s symptoms had progressed, with the area of skin affected and the amount of swelling increasing. Within the affected area, a rash had begun, blistering and small gas pockets underneath the outermost layer of skin had formed, and some of the skin was becoming gray. Based on the putrid smell of the pus draining from one of the blisters, the rapid progression of the infection, and the visual appearance of the affected skin, the physician immediately began treatment for necrotizing fasciitis. Travis’s physician ordered a culture of the fluid draining from the blister and also ordered blood work, including a white blood cell count.

Travis was admitted to the intensive care unit and began intravenous administration of a broad-spectrum antibiotic to try to minimize further spread of the infection. Despite antibiotic therapy, Travis’s condition deteriorated quickly. Travis became confused and dizzy. Within a few hours of his hospital admission, his blood pressure dropped significantly and his breathing became shallower and more rapid. Additionally, blistering increased, with the blisters intensifying in color to purplish black, and the wound itself seemed to be progressing rapidly up Travis’s leg.

  • What are possible causative agents of Travis’s necrotizing fasciitis?
  • What are some possible explanations for why the antibiotic treatment does not seem to be working?

We’ll return to Travis’s example in later pages.

Key Concepts and Summary

  • During transcription, the information encoded in DNA is used to make RNA.
  • RNA polymerase synthesizes RNA, using the antisense strand of the DNA as template by adding complementary RNA nucleotides to the 3′ end of the growing strand.
  • RNA polymerase binds to DNA at a sequence called a promoter during the initiation of transcription.
  • Genes encoding proteins of related functions are frequently transcribed under the control of a single promoter in prokaryotes, resulting in the formation of a polycistronic mRNA molecule that encodes multiple polypeptides.
  • Unlike DNA polymerase, RNA polymerase does not require a preexisting 3′-OH group to begin synthesis, so a primer is not needed during initiation.
  • Termination of transcription in bacteria occurs when the RNA polymerase encounters specific DNA sequences that lead to stalling of the polymerase. This results in release of RNA polymerase from the DNA template strand, freeing the RNA transcript.
  • Eukaryotes have three different RNA polymerases. Eukaryotes also have monocistronic mRNA, each encoding only a single polypeptide.
  • Eukaryotic primary transcripts are processed in several ways, including the addition of a 5′ cap and a 3′-poly-A tail, as well as splicing, to generate a mature mRNA molecule that can be transported out of the nucleus and that is protected from degradation.

Multiple Choice

During which stage of bacterial transcription is the σ subunit of the RNA polymerase involved?

Answer: The σ subunit of the RNA polymerase is involved in initiation.

Which of the following components is involved in the initiation of transcription?

Answer: A promoter is involved in the initiation of transcription.

Which of the following is not a function of the 5′ cap and 3′ poly-A tail of a mature eukaryotic mRNA molecule?

  1. to facilitate splicing
  2. to prevent mRNA degradation
  3. to aid export of the mature transcript to the cytoplasm
  4. to aid ribosome binding to the transcript

Answer 1. Facilitating splicing is not a function of the 5′ cap and 3′ poly-A tail.

Mature mRNA from a eukaryote would contain each of these features except which of the following?

Answer: Mature mRNA from a eukaryote would not contain intron-encoded RNA.

Fill in the Blank

A ________ mRNA is one that codes for multiple polypeptides.
Answer: A polycistronic mRNA is one that codes for multiple polypeptides.

The protein complex responsible for removing intron-encoded RNA sequences from primary transcripts in eukaryotes is called the ________.
Answer: The protein complex responsible for removing intron-encoded RNA sequences from primary transcripts in eukaryotes is called the spliceosome.

Think about It

  1. What is the purpose of RNA processing in eukaryotes? Why don’t prokaryotes require similar processing?
  2. Below is a DNA sequence. Envision that this is a section of a DNA molecule that has separated in preparation for transcription, so you are only seeing the antisense strand. Construct the mRNA sequence transcribed from this template. Antisense DNA strand: 3′-T A C T G A C T G A C G A T C-5′
  3. Predict the effect of an alteration in the sequence of nucleotides in the –35 region of a bacterial promoter.
