Are longer and shorter DNA similarly charged?

A longer DNA molecule would have more phosphate groups, so it should have a greater negative charge, right? It was taught in my class that only terminal ends of DNA are charged and all the phosphates in the middle are not charged.

My teacher said that this is the reason that Electrophoresis separates the fragments. All DNA have same charge and so same force on all of them. Then difference in masses would give different acceleration and thus separation of fragments according to size. But isn't Gel Electrophoresis more like a sieve where longer molecules move slowly and shorter pass through easily?

Each phosphate group carries a single negative charge, not just the ones at the ends. Gel electrophoresis works so well with DNA because charge is linearly proportional to length. A double-stranded DNA molecule of 100 base pairs has a charge of about -200. If it were twice as long, it would have twice as many negative charges, and if it were 3.4 times longer, it would have 3.4 times as many. Because the charge-to-mass ratio is therefore essentially the same for every fragment, charge can effectively be ignored: the gel acts as a sieve, and the longer the fragment, the slower it migrates.
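
The proportionality described above can be sketched in a few lines. This is only an illustration, assuming one -1 charge per nucleotide and ignoring end effects and bound counterions; the function name backbone_charge is made up for the example.

```python
def backbone_charge(base_pairs: int, double_stranded: bool = True) -> int:
    """Approximate net charge in elementary charges.

    Assumption: every nucleotide contributes one phosphate carrying -1,
    and terminal groups and bound counterions are ignored.
    """
    strands = 2 if double_stranded else 1
    return -strands * base_pairs


if __name__ == "__main__":
    for bp in (100, 200, 340):
        q = backbone_charge(bp)
        # The charge grows with length, but the charge per base pair does not.
        print(f"{bp} bp: charge {q}, charge per bp {q / bp}")
```

The charge per base pair comes out the same for every fragment, which is why the electric force alone cannot sort fragments and the sieving effect of the gel does the separating.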

Nucleotide content of the DNA molecule

Consider two DNA molecules: one that is GC-rich, meaning that a major percentage of it is made up of guanine-cytosine pairs, and another in which guanine-cytosine pairs are less common. The GC-rich DNA will have the higher melting temperature.

This is because more heat energy is required to disrupt the stable base stacking interaction in this molecule. Thus the melting temperature of DNA is influenced by its GC content. This can also be shown graphically (figure).

The figure shows two melting curves. Since the higher the GC content of DNA, the higher its melting temperature, the first DNA sequence, which is GC-rich, has a higher melting temperature than the second, which is not.
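
To make the GC effect concrete, here is a rough sketch using the Wallace rule, a rule of thumb for short oligonucleotides (roughly 14 to 20 bases) that assigns 2 °C per A or T and 4 °C per G or C. It is not taken from this article, it is only a coarse estimate, and the two sequences below are invented for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule estimate of melting temperature in deg C (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical sequences of equal length, one GC-rich and one AT-rich.
    for name, seq in (("GC-rich", "GCGCGGCCGCGGCCGC"), ("AT-rich", "ATATTAATATTATAAT")):
        print(name, f"GC fraction {gc_fraction(seq):.2f}", f"Tm about {wallace_tm(seq)} C")
```

With equal lengths, the GC-rich sequence always gets the higher estimate, matching the ordering of the two melting curves.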

Length of the DNA molecule

A longer molecule of double-stranded DNA requires more energy to disrupt than a shorter one. This is because the longer the molecule, the greater the stabilizing forces between the two DNA strands, so more heat energy is required to dissociate the strands and the melting temperature is higher.

Ionic strength of the DNA solution

The backbone of a DNA helix is made up of sugar and Phosphate. Each phosphate group in a DNA strand carries a negative charge. Thus overall each strand of DNA molecule carries a negative charge. The negative charges on both DNA strands will repel each other.

In eukaryotic cells, proteins known as histones play an important role in the compaction of DNA within the nucleus of the cell. Histone proteins are rich in basic amino acids, and their positive charge helps neutralize the negative charges on the DNA molecule.

In the laboratory, the DNA molecules present in a solution are stabilized by adding positively charged ions, such as sodium (Na+). Being positively charged, these ions bind to the sugar-phosphate backbone and neutralize the negative charges on the phosphate groups. Thus the stability of DNA in a solution depends on the ionic strength of that solution.

Suppose we have the same DNA molecules in two solutions, with 50 millimolar sodium chloride added to the first and 100 millimolar added to the second.
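
A worked estimate for this comparison can be sketched with one commonly quoted empirical approximation for long duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 675/N. This formula is not from the article, its coefficients differ between sources, and the 50% GC content and 1000 bp length below are assumed values chosen only for illustration.

```python
import math


def empirical_tm(na_molar: float, gc_percent: float, length_bp: int) -> float:
    """Approximate duplex melting temperature in deg C.

    Assumed empirical formula (coefficients vary between sources):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length_bp


if __name__ == "__main__":
    # Same hypothetical DNA (50% GC, 1000 bp) in 50 mM and 100 mM NaCl.
    low_salt = empirical_tm(0.050, 50.0, 1000)
    high_salt = empirical_tm(0.100, 50.0, 1000)
    print(f"50 mM: {low_salt:.1f} C, 100 mM: {high_salt:.1f} C")
    print(f"difference: {high_salt - low_salt:.1f} C")
```

Under this approximation, doubling the sodium concentration raises the estimated melting temperature by 16.6 × log10(2), roughly 5 °C, consistent with positive ions shielding the repulsion between the two backbones. The 675/N term also reproduces the length effect described earlier: longer duplexes get a higher estimate.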

Single-molecule sequencing technologies

When considering the properties of single-molecule sequencing technologies, the focus is most frequently on read length, error rate, and throughput (Figure 3); however, input sample quantity and quality requirements, simplicity and parallelizability of sample preparation, and data analysis are also important components that must be factored in when considering whether a technology, single-molecule or otherwise, is appropriate for a given problem. Some of the applications frequently undertaken with current sequencing technologies and the relative importance of various properties of different sequencing methods are shown in Table 1. Important properties of single-molecule technologies that relate to these various applications are discussed below.

The attributes of single-molecule sequencing technology. The current read counts and read lengths for single-molecule sequencing technologies are shown by the dots. Each technology is striving for improvements in their key attributes with the research aimed in the directions shown by the arrow.

Sequencing by synthesis

The first commercially available single-molecule sequencing system was developed by our colleagues at Helicos BioSciences [18]. In this system, individual molecules are hybridized to a flow cell surface containing covalently attached oligonucleotides. Fluorescently labeled nucleotides and a DNA polymerase are added sequentially and incorporation events detected by laser excitation and recording with a charge coupled device (CCD) camera. The fluorescent 'Virtual Terminator' nucleotide prevents the incorporation of any subsequent nucleotide until the nucleotide dye moiety is cleaved [19]. The images from each cycle are assembled to generate an overall set of sequence reads. On a standard run, 120 cycles of nucleotide addition and detection are carried out. Well over a billion molecules can be followed simultaneously in this approach. Because there are two 25-channel flow cells in a standard run, 50 different samples can be sequenced simultaneously, with the additional possibility of significantly greater throughput of samples through multiplexing. Sample requirements are the simplest of all technologies: sub-nanogram amounts are necessary and very poor quality DNA, including degraded or modified DNA, can be sequenced [20, 21]. Average read lengths are relatively short (about 35 nt) with raw individual nucleotide error rates currently about 3 to 5%, occurring randomly throughout the sequence reads and predominantly in the form of a 'dark base' or deletion error, which is accounted for in the alignment algorithm [22]. This error rate is not an issue when detecting polymorphisms because 30× coverage is typically used for diploid genomes with second generation systems to overcome the uneven coverage induced by amplification. Over-sampling is needed to overcome the stochastic nature of heterozygote detection, with 30× coverage advisable to ensure that nearly all heterozygotes are called correctly. 
At this coverage level, accurate consensus sequences are generated regardless of error rates within this range. Single-molecule systems have a much more even coverage and thus do not require as much depth for complete detection of heterozygotes. The even coverage relative to second generation systems was shown with ChIP experiments, in which sequence reads were relatively constant with respect to GC content with single-molecule sequencing, whereas significant deviations were observed at both high and low GC content with amplification-based sequencing [23] and with whole-genome sequencing of a human sample [24].
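
The stochastic nature of heterozygote detection can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the calling procedure used by any of the cited systems: coverage at the site is fixed at exactly n reads, each read samples either allele with probability 0.5, sequencing errors are ignored, and a heterozygote is called only if each allele is seen at least min_reads times (a hypothetical threshold).

```python
from math import comb


def het_detection_probability(coverage: int, min_reads: int) -> float:
    """Probability that both alleles are each seen at least min_reads times.

    Assumptions: exactly `coverage` error-free reads, each drawn from either
    allele with probability 0.5, independently of the others.
    """
    total = 0
    # i counts reads from the first allele; the second allele gets coverage - i.
    for i in range(min_reads, coverage - min_reads + 1):
        total += comb(coverage, i)
    return total / 2 ** coverage


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, round(het_detection_probability(n, min_reads=3), 4))
```

The probability climbs toward 1 as coverage increases. Uneven coverage pushes some sites well below the nominal depth, which is one way to see why amplification-based systems need more average depth than systems with even coverage.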

The Helicos Sequencer system can also sequence RNA molecules directly, thus avoiding the many artifacts associated with reverse transcriptase and providing unparalleled quantitative accuracy for RNA expression measurements [25]. The very high read count per sample allows precise expression measurements to be made with either RNA or cDNA [26–29], a feature not yet possible with other single-molecule technologies. Indeed, whole classes of RNA molecules that cannot be visualized using other technologies can be detected using a single-molecule approach [30, 31]. As with many single-molecule systems, repeated reads of the same molecule can markedly improve the error rate and also allow detection of very rare variants in a mixed sample. For example, a rare variant in a sample containing a mixture of few tumor cells among many normal cells might not be detectable with amplified DNA. With repeat sequencing of the same molecule, the error rate can be driven sufficiently low that mutations in heterogeneous samples such as tumors can be readily detected. Because of the minimal sample preparation needs, the ability to use exceptionally small starting quantities, and the high read count, this technology is ideal for quantitative applications such as ChIP, RNA expression, and copy number variation, and situations in which sample quantity is limiting or degraded [20, 23]. Standard, whole human genome resequencing is readily accomplished [24], but it is currently less expensive on second generation systems.

Pacific Biosciences has developed another sequencing-by-synthesis approach using fluorescently labeled nucleotides. In this system, DNA is constrained to a very small volume in a zero-mode wave guide [32] and the presence of a fluorescently labeled cognate nucleotide near the DNA polymerase is measured. The dimensions of the wave guide are so small that light can penetrate only the region very close to the edge, where the polymerase used for sequencing is constrained. Only nucleotides in that small volume near the polymerase can be illuminated and fluoresce for detection. Because the nucleotide that is being incorporated in the extending DNA strand spends a longer time near the polymerase, it can, to a large extent, be distinguished from non-cognate nucleotides. All four potential nucleotides are included in the reaction, each labeled with a different color fluorescent dye so that they can be distinguished from each other. Each nucleotide has a characteristic incorporation time that can further aid in improving base calls. Sequence reads of up to thousands of bases, longer than possible with second generation systems, are obtained in real time for each individual molecule [33–36]. However, the current throughput is less than 100,000 reads per run, so the overall sequence yield is much lower than second generation systems and the Helicos system. In addition, the raw error rate, currently 15 to 20% [37, 38], is significantly higher than with any other current sequencing technology, creating challenges in using the data for some applications, such as variant detection.

Much longer reads, referred to as 'strobe reads' [39], can be generated by turning off the laser for periods of time during sequencing, which prevents premature termination caused by laser-induced photodamage to the polymerase and nucleotides. If long reads are not necessary, the high raw error rate can be overcome by ligating a hairpin oligonucleotide to each end of the DNA, creating a circular template (called SMRTbell for single molecule real time), and then repeatedly sequencing the same molecule [37]. This procedure works when the molecules are relatively short but it cannot be used with long reads, so those retain the high raw error rate. Even with a high error rate, the very long reads can be productively used for joining sequence contigs. An additional benefit for this system is the ability to potentially detect modified bases. It is possible to detect 5-methylcytosine [40], although the role of sequence context and other factors in affecting the accuracy of such assignments remains to be clarified. In principle, direct RNA sequencing should also be possible with this system, but this has not been reported yet for natural RNA molecules because nucleotides bind repeatedly to the reverse transcriptase before nucleotide incorporation, thereby giving false signals with multiple insertions that prevent determination of a meaningful sequence. In addition, the low read count of this system will limit it to the identification of common mRNA isoforms rather than quantitative expression profiling or complete transcriptome coverage, both of which require a much higher read count than possible in the foreseeable future. In general, the long reads and short turnaround time make this system most useful for helping to assemble genomes, assessing the analysis of structural variation, haplotyping, metagenomics, and identification of splicing isoforms.
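
How repeated passes over the same circular template can overcome a high raw error rate can be sketched with a majority-vote model. This is a deliberately simplified illustration, not the actual consensus algorithm: it assumes every pass is wrong at a given position independently with the same probability, and pessimistically treats all wrong calls as agreeing with each other.

```python
from math import comb


def consensus_error(raw_error: float, passes: int) -> float:
    """Probability that a majority of passes are wrong at one position.

    Assumptions: an odd number of passes, independent errors with the same
    probability in every pass, and all wrong calls counted as the same base.
    """
    if passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    majority = passes // 2 + 1
    return sum(
        comb(passes, k) * raw_error ** k * (1 - raw_error) ** (passes - k)
        for k in range(majority, passes + 1)
    )


if __name__ == "__main__":
    # 15% is the lower end of the raw error range quoted in the text.
    for n in (1, 3, 5, 9, 15):
        print(n, f"{consensus_error(0.15, n):.2e}")
```

Under these assumptions the per-position error falls quickly with the number of passes. The number of passes is limited by how many times the polymerase can travel around the template within a read, which is why the approach suits short molecules.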

Life Technologies, a major provider of both first and second generation sequencing systems, is developing the fluorescence resonance energy transfer (FRET)-based single-molecule sequencing-by-synthesis technology initially introduced by Visigen [41]. Substantial advances have been made, with commercial release of the 'Starlight' system expected in the near future. The current technology consists of a quantum-dot-labeled polymerase that synthesizes DNA using four distinctly labeled nucleotides in a real-time system [42]. Quantum dots, which are fluorescent semiconducting nanoparticles, have an advantage over fluorescent dyes in that they are much brighter and less susceptible to bleaching, although they are also much larger and more susceptible to blinking. The genomic sample to be sequenced is ligated to a surface-attached oligonucleotide of defined sequence and then read by extension of a primer complementary to the surface oligonucleotide. When a fluorescently labeled nucleotide binds to the polymerase, it interacts with the quantum dot, causing an alteration in the fluorescence of both the nucleotide and the quantum dot. The quantum dot signal drops, whereas a signal from the dye-labeled phosphate on each nucleotide rises at a characteristic wavelength. The real-time sequence is captured for each extending primer. Because each sequence is bound to the surface, it can be reprimed and sequenced again for improved accuracy. It is not clear what the sequence specifications will be, but its similarity to the Pacific Biosciences technology makes that a likely reference point. If so, it will have the same strengths in terms of applications (genome assembly, structural variation, haplotyping, metagenomics) while potentially being challenged by quantitative applications requiring a high read count (such as ChIP or RNA expression).

Optical sequencing and mapping

There are other technologies that enable very long reads to be produced but at the cost of significantly lower throughput. For example, it is possible to adhere very long DNA molecules, up to hundreds of kilobases long, to surfaces and interrogate them for particular sequences by cutting them with various restriction enzymes or labeling them after treatment with sequence-specific nicking enzymes. The lengths of the examined molecules are dependent on the ability to handle such long DNA without mechanically shearing it. Complete restriction digests that allow ordering of sequence contigs have been generated for human and other genomes from collections of single molecules spanning entire genomes [43]. Highly repetitive and duplicated genomes, such as maize, are particularly difficult to assemble with traditional sequencing but have been successfully analyzed with this single-molecule system [44]. The restriction sites provide sequence landmarks on the DNA and thus long repeat regions and other intricate structural variations can be assigned in an unambiguous manner. Specialized applications such as genome-wide methylation mapping can also be undertaken [45].

Similarly, DNA molecules can be constrained to nanotubes and specifically labeled for viewing [46]. Single molecules of RNA have been visualized using scanning tip Raman spectroscopy [47]. In an alternative method also using adsorption of long DNA molecules to a surface, guanines could be distinguished from all other bases and the partial sequence read with a scanning electron microscope [48]. Possibilities for reading other bases through insertion of heavy atoms such as bromine or iodine on particular nucleotides have been suggested by ZS Genetics [49]. Although the low strand throughput and incomplete sequence reading are currently limiting, there is potential for reads that are hundreds of kilobases long, again limited primarily by the ability to handle the DNA without shearing it. Other technologies using direct reading of stretched DNA have been reviewed elsewhere [7]. These optical sequencing technologies provide a powerful view of genome structure, but they cannot provide the detailed sequence data or access to many other sequencing applications that require high read counts, such as gene expression measurements.


All of the sequencing techniques described so far require some kind of label on the DNA or nucleotide substrates to detect the individual base for sequencing. However, nanopore approaches generally do not require an exogenous label but rely instead on the electronic or chemical structure of the different nucleotides for discrimination. The advantages and potential means of using nanopores have been reviewed [14, 50]. Nanopores of greatest interest thus far include those assembled with solid-state systems constructed of materials such as carbon nanotubes or thin films [51–54] and the biologically based α-hemolysin [55–59] or MspA [60, 61]. These bacterial pore proteins have been extensively studied and engineered to optimize the detection of specific bases and the translocation rate of DNA through the pore. Although sequencing native DNA based on its natural properties would eliminate the labeling step and potentially allow very long reads with minimal sample preparation, thus reducing costs, the differences among nucleotides are very modest and their detection is compounded by difficulties in controlling the pace and directionality of the DNA through the nanopore. Specific detection and unidirectional flow are required for high accuracy sequencing.

A variety of methods have been used to slow the pace of DNA through nanopores, including attachment of polystyrene beads [53], salt concentrations [62], viscosity [63], magnetic fields [64], and the introduction of regions of double-stranded DNA on a single-stranded target [54, 58]. At the high translocation speeds typically found (potentially millions of bases per second), detecting a signal over background noise from each nucleotide can be a challenge, and this has been overcome in some cases by reading groups of nucleotides (such as by using hybridization of known sequences as is being developed by NabSys [53]) or encoding the original sequence in a more complex manner by converting the nucleotide sequence using a binary code of molecular beacons (as is being developed by NobleGen [65]). Maintaining a unidirectional flow of DNA has been enhanced by coupling an exonuclease to the process and reading the cleaved nucleotides (as developed by Oxford Nanopore [66]).
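
The signal-to-noise problem at high translocation speeds comes down to simple arithmetic: the faster the DNA moves, the less time and the fewer measurements are available per base. The sketch below uses the "millions of bases per second" figure from the text. The 100 kHz sampling rate and the slowed rates are assumed values for illustration, not the specification of any instrument.

```python
def dwell_time_us(bases_per_second: float) -> float:
    """Average time each base spends in the pore, in microseconds."""
    return 1e6 / bases_per_second


def samples_per_base(sampling_rate_hz: float, bases_per_second: float) -> float:
    """Average number of current measurements taken per base."""
    return sampling_rate_hz / bases_per_second


if __name__ == "__main__":
    sampling_rate_hz = 100_000  # assumed detector sampling rate
    for rate in (1_000_000, 10_000, 100):  # free translocation vs. slowed DNA
        print(
            f"{rate} bases/s: {dwell_time_us(rate):g} us per base, "
            f"{samples_per_base(sampling_rate_hz, rate):g} samples per base"
        )
```

At a million bases per second the assumed detector would take less than one sample per base, so slowing the DNA by several orders of magnitude is what makes it possible to average enough samples per base to rise above the noise.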

Although nanopore sequencing technologies continue to advance, simply showing the ability to sequence DNA, something not yet demonstrated by nanopores with natural DNA, is not sufficient. There needs to be a path to lower costs, longer reads, or higher accuracy relative to other technologies that will provide nanopores with a unique advantage relative to other methods. Even if reagent costs can be significantly reduced, sample preparation and informatic costs remain and these may become the dominant costs of sequencing and will vary depending on the technology being used. The ever-rising hurdles created by extant technology will not be easy to overcome. With the variety of second generation and single-molecule technologies already commercialized and others on the horizon, there will need to be substantial advances on many fronts to make these technologies commercially viable.

Biology Chapter 19

When chromosomes condense before mitosis or meiosis, the scaffold proteins and 30-nm fibers are folded into still larger and more tightly packed structures that ultimately lead to chromosomes that are visible during cell division

DNases are enzymes that cut DNA, but they can't cut efficiently if DNA is tightly wrapped with proteins, as they only work effectively if DNA is in a decondensed, or open, configuration

In mammals, the sequence recognized by these enzymes is a C next to a G in one strand of DNA - the sequence is abbreviated CpG

Methylated CpG sequences are recognized by proteins that trigger chromatin condensation

Addition of these groups provides condensed or decondensed chromatin depending on the specific set of modifications made to particular histones

The Histone Code Hypothesis postulates that particular combinations of histone modifications set the state of chromatin condensation for a particular gene

This has an important role in regulating transcription

Histone acetyltransferases (HATs) add acetyl groups to the positively charged lysine residues in histones

Histone Deacetylases (HDACs) remove them

Acetylation of histones usually results in decondensed chromatin, a state associated with active transcription

The added acetyl groups neutralize the positive charge on lysine residues and loosen the close association of the histones that make up the core of the nucleosome with the negatively charged DNA - also creates a binding site for other proteins that help open the chromatin

When HDACs remove acetyl groups from histones, this usually leads to condensed chromatin, a state associated with no transcription

Scientists engineer tunable DNA for electronics applications

DNA may be the blueprint of life, but it’s also a molecule made from just a few simple chemical building blocks. Among its properties is the ability to conduct an electrical charge, fueling an engineering race to develop novel, low-cost nanoelectronic devices.

Now, a team led by ASU Biodesign Institute researcher Nongjian "N.J." Tao and Duke theorist David Beratan has been able to understand and manipulate DNA to more finely tune the flow of electricity through it. The key findings, which can make DNA behave in different ways (coaxing electrons to flow smoothly like electricity through a metal wire, or making them hop about as they do in the semiconductor materials that power our computers and cellphones), pave the way for an exciting new avenue of research advancements.

The results, published in the online edition of Nature Chemistry, may provide a framework for engineering more stable and efficient DNA nanowires, and for understanding how DNA conductivity might be used to identify gene damage.

Building on a series of recent works, the team has been able to better understand the physical forces behind DNA’s affinity for electrons.

“We’ve been able to show theoretically and experimentally that we can make DNA tunable by changing the sequence of the ‘A, T, C, or G’ chemical bases, by varying its length, by stacking them in different ways and directions, or by bathing it in different watery environments,” said Tao, who directs the Biodesign Center for Bioelectronics and Biosensors and is a professor in the Ira A. Fulton Schools of Engineering.

Along with Tao, the research team consisted of ASU colleagues, including lead co-author Limin Xiang and Yueqi Li, and Duke University’s Chaoren Liu, Peng Zhang and David Beratan.

Untapped potential

Every molecule or substance has its own unique attraction for electrons — the negatively charged particles that dance around every atom. Some molecules are selfish and hold onto or gain electrons at all costs, while others are far more generous, donating them more freely to others in need.

But in the chemistry of life, it takes two to tango. For every electron donor there is an acceptor. These different electron dance partners drive so-called redox reactions, providing energy for the majority of the basic chemical processes in our bodies.

For example, when we eat food, a single sugar molecule gets broken down to generate 24 electrons that go on to fuel our bodies. Every DNA molecule contains energy, known as a redox potential, measured in tenths of electron volts. This electrical potential is similarly generated in the outer membrane of every nerve cell, where neurotransmitters trigger electronic communication between the roughly 100 billion neurons that form our thoughts.

But here’s where the ability of DNA to conduct an electrical charge gets complicated. And it’s all because of the special properties of electrons — where they can behave like waves or particles due to the inherent weirdness of quantum mechanics.

Scientists have long disagreed over exactly how electrons travel along strands of DNA, said David N. Beratan, professor of chemistry at Duke University and leader of the Duke team.

“Think of trying to get across a river,” explained Limin Xiang, a postdoctoral researcher in Tao’s lab. “You can either walk across quickly on a bridge or try to hop from one rock to another. The electrons in DNA behave in similar ways as trying to get across the river, depending on the chemical information contained within the DNA.”

Previous findings by Tao showed that over short distances, the electrons flow across DNA by quantum tunneling, spreading fast like waves across a pond. Across longer distances, they behave more like particles and the hopping takes effect.

This result was intriguing, said Duke graduate student and co-lead author Chaoren Liu, because electrons that travel in waves are essentially entering the “fast lane,” moving with more organization and efficiency than those that hop.

“In our studies, we first wanted to confirm that this wave-like behavior actually existed over longer distances,” Liu said. “And second, we wanted to understand the mechanism so that we could make this wave-like behavior stronger or extend it to longer distances.”

Flick of the switch

DNA strands are built like chains, with each link comprising one of four molecular bases whose sequence codes the genetic instructions for our cells. Like metal chains, DNA strands can easily change shape, bending, curling and wiggling around as they collide with other molecules around them.

All of this bending and wiggling can disrupt the ability of the electrons to travel like waves. Previously, it was believed that the electrons could only be shared over at most three bases.

Using computer simulations, the Beratan team found that certain sequences of bases could enhance the electron sharing, leading to wave-like behavior over long distances. In particular, they found that stacking alternating series of five guanine (G) bases created the best electrical conductivity.

The team theorizes that creating these blocks of G bases causes them to all “lock” together so the wave-like behavior of the electrons is less likely to be disrupted by the random motions of the DNA strand.

“We can think of the bases being effectively linked together so they all move as one. This helps the electron be shared within the blocks,” Liu said.

Next, the Tao group carried out conductivity experiments on short DNA strands of six to 16 bases, carrying alternating blocks of three to eight guanine bases. By tethering their test DNA between a pair of gold electrodes, the team could switch on and control a small current to measure the amount of electrical charge flowing through the molecule.

They found that by varying a simple repeating “CxGx” pattern of DNA letters (x is the odd- or even-numbered G or C letters), there was an odd-even pattern in the ability of DNA to transport electrons. With an odd number, there was less resistance, and the electrons flowed faster and more freely (more wave-like) to blaze a path across the DNA.

They were able to exert precise molecular-level control and make the electrons hop (known as incoherent transport, the type found in most semiconductors) or flow faster (coherent transport, the type found in metals) based on variations in the DNA sequence pattern.

The experimental work confirmed the predictions of the theory.

Information charge

The results shed light on a long-standing controversy over the exact nature of electron transport in DNA and might provide insight into the design of DNA nanoelectronics and the role of electron transport in biological systems, Beratan said.

In addition to practical DNA-based electronic applications (for which the group has filed several patents), one of the more intriguing aspects is relating their work — done with short simple stretches of DNA — back to the complex biology of DNA thriving inside of every cell.

Of utmost importance to survival is maintaining the fidelity of DNA to pass along an exact copy of the DNA sequence every time a cell divides. Despite many redundant protection mechanisms in the cell, sometimes things go awry, causing disease. For example, absorbing too much UV light can mutate DNA and trigger skin cancer.

One of the DNA chemical letters, “G,” is the most susceptible to oxidative damage by losing an electron (think of rusting iron — a result of a similar oxidation process). Xiang points out that long stretches of Gs are also found on the ends of every chromosome, maintained by a special enzyme known as telomerase. Shortening of these G stretches has been associated with aging.

But for now, the research team has solved the riddle of how the DNA information influences the electrical charge.

“This theoretical framework shows us that the exact sequence of the DNA helps dictate whether electrons might travel like particles, and when they might travel like waves,” Beratan said. “You could say we are engineering the wave-like personality of the electron.”

The research was funded by a multimillion-dollar, multi-institute project under the support of the Department of Defense’s Multidisciplinary University Research Initiative (MURI) program, aimed at aiding high-priority basic science that could lead to innovative advancements.

CITATION: "Engineering nanometer-scale coherence in soft matter," Chaoren Liu, Yuqi Zhang, Peng Zhang, David N. Beratan, Limin Xiang, Yueqi Li, Nongjian Tao. Nature Chemistry, June 20, 2016. DOI: 10.1038/nchem.2545

Basic Techniques to Manipulate Genetic Material (DNA and RNA)

To understand the basic techniques used to work with nucleic acids, remember that nucleic acids are macromolecules made of nucleotides (a sugar, a phosphate, and a nitrogenous base) linked by phosphodiester bonds. The phosphate groups on these molecules each have a net negative charge. An entire set of DNA molecules in the nucleus is called the genome. DNA has two complementary strands linked by hydrogen bonds between the paired bases. Exposure to high temperatures (DNA denaturation) can separate the two strands and cooling can reanneal them. The DNA polymerase enzyme can replicate the DNA. Unlike DNA, which is located in the eukaryotic cells’ nucleus, RNA molecules leave the nucleus. The most common type of RNA that researchers analyze is the messenger RNA (mRNA) because it represents the protein-coding genes that are actively expressed. However, RNA molecules present some other challenges to analysis, as they are often less stable than DNA.

DNA and RNA Extraction

To study or manipulate nucleic acids, one must first isolate or extract the DNA or RNA from the cells. Researchers use various techniques to extract different types of DNA (Figure 2). Most nucleic acid extraction techniques involve steps to break open the cell and use enzymatic reactions to destroy all macromolecules that are not desired. A lysis buffer (a solution which is mostly a detergent) breaks open the cells. Note that lysis means “to split”. The detergents break apart lipid molecules in the cell membranes and nuclear membranes. Enzymes then inactivate the unwanted macromolecules: proteases break down proteins, and ribonucleases (RNAses) break down RNA. Adding alcohol precipitates the DNA. Human genomic DNA is usually visible as a gelatinous, white mass. One can store the DNA samples frozen at –80°C for several years.

Figure 2: This diagram shows the basic method of DNA extraction.

Scientists perform RNA analysis to study gene expression patterns in cells. RNA is naturally very unstable because RNAses are commonly present in nature and very difficult to inactivate. As with DNA extraction, RNA extraction involves using various buffers and enzymes to inactivate macromolecules and preserve the RNA.

Gel Electrophoresis

Because nucleic acids are negatively charged ions at neutral or basic pH in an aqueous environment, an electric field can mobilize them. Gel electrophoresis is a technique that scientists use to separate molecules on the basis of size, using this charge. One can separate the nucleic acids as whole chromosomes or fragments. The nucleic acids are loaded into a slot near the negative electrode of a semisolid, porous gel matrix and are pulled toward the positive electrode at the gel’s opposite end. Smaller molecules move through the gel’s pores faster than larger molecules. This difference in the migration rate separates the fragments on the basis of size. Researchers can run molecular weight standard samples alongside the molecules to provide a size comparison. We can observe nucleic acids in a gel matrix using various fluorescent or colored dyes. Distinct nucleic acid fragments appear as bands at specific distances from the gel’s top (the negative electrode end) on the basis of their size (Figure 3). A mixture of genomic DNA fragments of varying sizes appears as a long smear, whereas uncut genomic DNA is usually too large to run through the gel and forms a single large band at the gel’s top.
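The size comparison against molecular weight standards can be sketched in code. The example below assumes that, over a limited size range, log10(fragment size) falls roughly linearly with migration distance; that relationship is a common working approximation and is not stated in the text above. The ladder values and the function names `fit_ladder` and `estimate_size` are hypothetical.

```python
import math


def fit_ladder(ladder):
    """Least-squares fit of log10(size_bp) = slope * distance_mm + intercept.

    `ladder` is a list of (distance_mm, size_bp) pairs read off the standard lane.
    """
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Estimate fragment size (bp) from how far a band migrated."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder: (migration distance in mm, fragment size in bp).
LADDER = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
slope, intercept = fit_ladder(LADDER)

# A band that ran farther than another is estimated to be smaller.
unknown_bp = estimate_size(25.0, slope, intercept)
```

The negative slope reflects the statement that smaller molecules move through the gel faster than larger ones.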

Figure 3: a) Shown are DNA fragments from seven samples run on a gel, stained with a fluorescent dye, and viewed under UV light and b) a researcher from International Rice Research Institute, reviewing DNA profiles using UV light. (credit: a: James Jacob, Tompkins Cortland Community College b: International Rice Research Institute)

Nucleic Acid Fragment Amplification by Polymerase Chain Reaction

Although genomic DNA is visible to the naked eye when it is extracted in bulk, DNA analysis often requires focusing on one or more specific genome regions. Polymerase chain reaction (PCR) is a technique that scientists use to amplify specific DNA regions for further analysis (Figure 4). Researchers use PCR for many purposes in laboratories, such as cloning gene fragments to analyze genetic diseases, identifying contaminant foreign DNA in a sample, and amplifying DNA for sequencing. More practical applications include determining paternity and detecting genetic diseases.

Figure 4: Scientists use polymerase chain reaction, or PCR, to amplify a specific DNA sequence. Primers (short pieces of DNA complementary to each end of the target sequence) are combined with genomic DNA, Taq polymerase, and deoxynucleotides. Taq polymerase is a DNA polymerase isolated from the thermostable bacterium Thermus aquaticus that is able to withstand the high temperatures that scientists use in PCR. Thermus aquaticus grows in the Lower Geyser Basin of Yellowstone National Park. Reverse transcriptase PCR (RT-PCR) is similar to PCR, but cDNA is made from an RNA template before PCR begins.

DNA fragments can also be amplified from an RNA template in a process called reverse transcriptase PCR (RT-PCR). The first step is to recreate the original DNA template strand (called cDNA) by pairing DNA nucleotides with the mRNA. This process is called reverse transcription, and it requires an enzyme called reverse transcriptase. After the cDNA is made, regular PCR can be used to amplify it.

Which of the following statements about gel electrophoresis is correct?
a. Longer DNA fragments migrate farther than shorter fragments
b. Migration distance is inversely proportional to the fragment size
c. Positively charged DNA migrates more rapidly than negatively charged DNA
d. None of these statements are true

Migration distance is inversely proportional to the fragment size.

Gel electrophoresis is used to separate DNA fragments after restriction endonucleases have cut the DNA. The fragments are prepared in a suitable salt buffer and loaded at one end of the gel.

Electrophoresis is carried out in an agarose gel, which contains a network of pores. Smaller DNA fragments travel farther through the gel, whereas larger fragments are left behind.

At the beginning of the run, DNA fragments of all lengths are relatively close together. As time goes on, the difference in the migration rates of fragments of different lengths causes them to separate.

B) All DNA molecules will migrate up the gel toward the positive electrode.

Because DNA has a negative charge regardless of its size, all DNA molecules will migrate toward the positive electrode at the top of the gel. This follows from Coulomb's law: the force between charges is attractive if the charges have opposite signs.

Instead of migrating down the gel, the DNA will migrate up the gel (the opposite direction).

C. The DNA fragments travel at different speeds through the gel.


Migration distance is inversely proportional to the fragment size.

Gel electrophoresis is a technique used in molecular biology that uses electricity to separate biological molecules based on size (DNA) or charge (proteins). For the DNA molecule, different sizes are separated from one another based on how fast they can migrate through the gel matrix.

In this technique, small DNA fragments migrate farther than long ones because they experience less friction in the matrix. In other words, the smaller the DNA fragment, the farther it migrates, and vice versa. This shows that an INVERSE RELATIONSHIP exists between the migration distance and the fragment size in the electrophoresis procedure.


Hybridization Technology

Denaturing Nucleic Acids

In opposition to these stabilizing interactions is the electrostatic repulsion of the charged sugar-phosphate backbone.

There are two basic approaches to denaturing double-stranded DNA: heating and chemical treatment.

Chemical denaturants can be divided into three classes

Before leaving the topic of DNA denaturation, let's look a little more closely at heat denaturation.
Consider what happens when we heat a nucleic acid solution - say the E. coli genome. To prepare the DNA for this experiment, we shear it up into small pieces (approximately 500 bp long) and heat it slowly while monitoring the A260.

The initial A260 is stable until, over an interval of approximately 5 degrees C, the A260 suddenly increases by approximately 40%.

This increase in absorbance is referred to as the hyperchromic shift.

The hyperchromic shift is due to the melting of the double helix into two single strands. The increased rotational freedom of the N-bases on strand separation accounts for the observed increase in absorbance.

The melting temperature, or Tm, is the temperature at the midpoint of the hyperchromic shift.

Three main factors affect the melting temperature.

The GC content of the nucleic acid sample.
A higher GC content raises the Tm, because AT base pairs share 2 H-bonds while GC base pairs share 3 H-bonds.

Tm is sensitive to Na+ concentration.
Na+ shields the negative charges of the sugar-phosphate backbone from interacting with one another. The repulsion between the negatively charged phosphate backbones is the major force destabilizing the double helix; therefore, increasing the Na+ concentration increases helix stability and decreasing the Na+ concentration decreases helix stability.

DNA hybrid length
The longer the DNA hybrid is, the more H-bonds there are holding the two strands together. The longer the hybrid, the more H-bonds that must be simultaneously broken for the two strands to separate.
This is known as the 'zipper effect' after the (in)famous Canadian inventor Zippy. For this course we will only consider the two extremes of hybrid length: hybrids less than 50 bp (short) and those around 500 bp (long) in length.
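The three factors above (GC content, Na+ concentration and hybrid length) can be seen together in a rough numerical sketch. The formula used here is one commonly quoted empirical approximation for longer duplexes; it does not come from this text, its coefficients vary between sources, and the function names `gc_percent` and `estimate_tm` are hypothetical. Treat the output as an illustration of trends, not as a predicted temperature.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def estimate_tm(seq, na_molar=0.05):
    """Rough Tm estimate in degrees C (assumed empirical approximation).

    Tm = 81.5 + 16.6 * log10([Na+]) + 0.41 * (%GC) - 675 / N
    where N is the hybrid length in base pairs and [Na+] is in mol/L.
    """
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(seq) - 675.0 / n


gc_rich = "GC" * 25        # 50 bp, 100% GC
at_rich = "AT" * 25        # 50 bp, 0% GC
short_hybrid = "ATGC" * 10   # 40 bp
long_hybrid = "ATGC" * 125   # 500 bp

# Each comparison changes one factor and holds the others fixed.
gc_effect = estimate_tm(gc_rich) - estimate_tm(at_rich)
salt_effect = estimate_tm(long_hybrid, 0.5) - estimate_tm(long_hybrid, 0.05)
length_effect = estimate_tm(long_hybrid) - estimate_tm(short_hybrid)
```

All three differences come out positive under this approximation, matching the qualitative statements above: more GC, more Na+ and a longer hybrid each raise the Tm.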

Lecture 2: Biology Background, First and Second-generation sequencing

Scribed by Claire Margolis and revised by the course staff


In this lecture, we discuss how the sequencing process works for certain mainstream technologies. We first introduce some biological background. We then introduce and discuss two main sequencing technologies: Sanger (first generation sequencing technology) and Illumina (second generation sequencing technology).

Basics of DNA

The human genome is the entire DNA sequence of a human individual. Human DNA comes in 23 pairs of chromosomes, and each pair contains one chromosome inherited from the mother and one inherited from the father, yielding 46 chromosomes total. 22 of the pairs are autosomal chromosomes, and the last pair are the sex chromosomes. Every cell in an organism contains the same exact genomic data living in the cell’s nucleus. In humans, the genome is 3 billion base pairs (bp) long. Different species have genomes of very different sizes. Bacterial genomes are a few million bp, most viral genomes are tens of thousands of bp, and certain plants have genomes that are hundreds of billions of bp long. There are two types of cells: prokaryotic (no nucleus and found in organisms like bacteria) and eukaryotic (contains a nucleus and found in higher organisms like humans). While understanding the human genome is important, the techniques of this class are broadly applicable to other organisms.

Across humans, genomes are about 99.8% similar. Out of the 3 billion base pairs, individual genomes vary at 3-4 million base pair locations. These variations are captured in Single Nucleotide Polymorphisms (SNPs), though there are some large variations called Structural Variants (SVs). Differences in the individual genomes arise due to two reasons:

  1. Random mutations, which accumulate over the course of evolution. These arise mainly due to “errors” during the DNA replication process during cell division. Most of these mutations are deleterious, leading to phenotypic changes that are harmful and resulting in the death of the cell. Occasionally, natural selection favors certain mutations, and these are preserved in the population.
  2. Recombination, which occurs during reproduction in higher organisms like mammals. During recombination, the genetic material passed by the parent organisms to their child is a mixture of genetic material from the parents.

DNA structure

DNA is comprised of a sugar-phosphate backbone and four nucleotide bases: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). DNA is double-stranded and structured in a double-helix formation with pairs of nucleotides as “rungs” of the helix (hence the term “base pair”). Adenine always chemically binds with Thymine, and Cytosine always binds with Guanine. In other words, A is complementary to T, and similarly C is complementary to G. The A-T and C-G pairs are known as complementary pairs. The structure of DNA is shown below.

A DNA sequence is conventionally written in the 5’ end (head) to the 3’ end (tail) direction. When we write a DNA strand, we only write the letters representing the bases from one of the strands. The other strand, which is the reverse complement of the first strand, can be inferred because we know the complementary pairs. To get the reverse complement, we reverse the order of the nucleotides in the original string and then complement the nucleotides (i.e. interchange A with T and C with G). The figure below shows an example of a DNA fragment and its reverse complement strand.
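The reverse-complement rule described above (reverse the string, then swap A with T and C with G) is short enough to write out. This is a minimal sketch; the function name `reverse_complement` is our own choice, and it assumes the input contains only the letters A, C, G and T.

```python
# Translation table that swaps each base with its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    complemented = seq.upper().translate(COMPLEMENT)
    return complemented[::-1]  # reverse so the result also reads 5' to 3'


other_strand = reverse_complement("ACGAATC")
```

Applying the function twice returns the original strand, which is a quick sanity check that the two strands determine each other.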

DNA replication

DNA lies at the foundation of cell replication. When a cell undergoes cell division, also known as mitosis, the DNA in its nucleus is replicated and through a series of steps shown in the figure below, one parent cell yields two identical daughter cells.

Several biomolecules are involved during mitosis, and we give a heavily simplified explanation of the mitotic process here. In the figure, we start with two chromosomes: red and blue. First, the DNA is replicated, resulting in the more familiar X-shaped chromosomes. Through a complex cascade of biomolecular signals and within-cell restructuring, the (now-replicated) chromosomes are lined up in the middle of the cell. For each chromosome, the halves are pulled apart, and each of the two daughter cells receives a copy of the original chromosome. This results in two daughter cells that are genetically identical to the original parent cell. For us, DNA duplication is the most important part of this diagram; this is the natural process we exploit in order to do sequencing.

During DNA replication, the two strands of DNA are first unzipped, resulting in two single strands each acting as a template for replication. A short RNA primer is then attached to a specific site on the DNA; the bases in the primer are complementary to the bases in the site. An enzyme facilitates (or “catalyzes”) a chemical reaction, and DNA polymerase is the enzyme that catalyzes the complementary pairing of new nucleotides to the template DNA, extending the bound primer. The nucleotides that DNA polymerase uses to extend a strand are called dNTPs (deoxynucleotide triphosphates). Biochemically, they are slightly different from the nucleotides in a way that makes them easier to work with during DNA replication. The dNTPs corresponding to A, C, G, and T are dATP, dCTP, dGTP, and dTTP, respectively. The DNA replication is illustrated below.

Sanger sequencing

The first technique used to get reads from DNA was a process called Sanger sequencing, which is based on the idea of sequencing by synthesis. Fred Sanger published the method in 1977 and won his second Nobel Prize, in 1980, for its invention. Sanger sequencing was the main technology used to sequence genomic data until the mid-2000s, when it was replaced by second-generation sequencing technologies. The two sequencing techniques are related because they both use sequencing by synthesis; however, second-generation sequencing massively parallelizes the process, resulting in a gain of roughly 6 orders of magnitude in terms of cost and speed.

We look at sequencing from a computational point of view, and we need to understand the technology a bit in order to motivate what we do. In the following, we try to answer the following 3 questions.

  1. How do we get 6 orders of magnitude improvement between Sanger sequencing and second-generation sequencing?
  2. How are errors introduced? All measurements have errors, and the reasons why these errors exist depend on the technology.
  3. Why is the read length limited? One of the biggest computational challenges of sequencing is that although the sequence of interest is very long (> 1M bp), the data we get is very short (

Sequencing by Synthesis

Sequencing by synthesis takes advantage of the fact that DNA strands, which are normally in double-helix form, split apart for mitosis and each strand is copied. Sanger figured out a clever way of converting the sequencing problem into a problem of measuring mass.

We mentioned above that DNA polymerase naturally uses dNTPs to synthesize a new strand. The synthesis process occurs very quickly, making it hard to make any sort of measurement during synthesis. Sanger overcame this problem by figuring out a way to terminate synthesis using a modified version of dNTPs called ddNTPs (dideoxynucleotide triphosphates). DNA polymerase can attach a ddNTP to the sequence just like with dNTPs, but it cannot attach anything to the ddNTP. In other words, the attachment of a ddNTP halts the replication of the DNA molecule.

We will denote ddNTPs corresponding to A, C, G, and T as A*, C*, G*, and T*. By introducing a small amount of one type of ddNTP into the experiment (e.g. T*), when the reactions finish, we are left with: 1. small percentages of strands containing T*s at locations corresponding to A’s in the template, and 2. a large fraction of strands containing only normal dNTPs. This procedure is known as the chain termination method. We now describe Sanger’s sequencing procedure:

We first replicate the sequence using a technique called polymerase chain reaction (PCR), which also takes advantage of DNA replication to exponentially increase the amount of DNA. For our purposes, we will assume that after running n cycles of PCR, we obtain 2^n times the original amount of the molecule. PCR dramatically increases the amount of biological material.

We break apart the two strands by heating up the sample. One of the single strands will be used as the template strand or the strand to which new bases will be attached.

We add a template strand of DNA to a test tube along with free-floating dNTPs and a few modified ddNTPs (1% of the nucleotides). All ddNTPs are of the same type. We also add a primer or a short sequence that attaches to the beginning of the strand of interest and starts the whole replication process.

We filter out sequences that end in ddNTPs using a technique called gel electrophoresis. This method exploits the fact that the DNA molecule has a charge. By putting the DNA sample in a gel and inducing an electric field over the gel, we can separate strands of different masses (larger strands move slower).

We measure the mass of isolated strands. This can be done by either radioactively labeling nucleotides and measuring the level of radioactivity or by adding fluorescent tags to the nucleotides and measuring the strength of the light emitted (i.e. take a picture).

The figure below illustrates a simple example showing the process of Sanger sequencing.

We combine these to get the sequence. The measured masses for each terminating base are:

A: 30.0, 61.3, 74.4
C: 48.2, 99.3
G: 56.7
T: 86.3

Merging these 4 sorted lists gives us the underlying sequence. In the example we get

30.0 - A
48.2 - C
56.7 - G
61.3 - A
74.4 - A
86.3 - T
99.3 - C

giving us the sequence to be ACGAATC.
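The merging step can be sketched in a few lines. The example below uses the masses from the worked example above and the standard-library `heapq.merge` to combine the four sorted lists. The names `masses` and `merge_reads` are hypothetical, and the sketch assumes each per-base list is already sorted and that no two masses are equal.

```python
import heapq

# Sorted mass measurements for strands terminated by each ddNTP type.
masses = {
    "A": [30.0, 61.3, 74.4],
    "C": [48.2, 99.3],
    "G": [56.7],
    "T": [86.3],
}


def merge_reads(masses_by_base):
    """Merge the per-base sorted mass lists and read off the bases in mass order."""
    # Tag every mass with its base so the label survives the merge.
    tagged = [[(mass, base) for mass in values]
              for base, values in masses_by_base.items()]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return "".join(base for _, base in heapq.merge(*tagged))


sequence = merge_reads(masses)
```

Because lighter strands are shorter, reading the bases in order of increasing mass gives the sequence position by position.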

Limitations of Sanger sequencing

Sanger sequencing works for sequences below roughly 700 bp in length. This read limitation stems from the fact that as the length of a sequence increases, distinguishing between the mass of a length-n sequence and the mass of a length-(n+1) sequence becomes increasingly harder. To see this, note that a tolerance of 0.1% in measurement would make it impossible to distinguish a sequence of length 1000 from one of length 1001 even if all bases had the same molecular weight. Such errors in measuring mass are also a reason for errors in Sanger sequencing, though the error rate is around 0.001%.
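The tolerance argument above can be checked numerically. This is a small sketch under the same simplifying assumption the text makes, namely that every base has the same mass, so the relative mass gap between strands of length n and n + 1 is 1/n. The function names are hypothetical, and exact fractions are used to avoid floating-point ties at the boundary.

```python
from fractions import Fraction


def relative_mass_gap(n):
    """Relative mass difference between strands of length n and n + 1,
    assuming all bases have the same mass."""
    return Fraction(1, n)


def max_resolvable_length(tolerance):
    """Largest n whose gap to length n + 1 still exceeds the measurement tolerance."""
    n = 1
    while relative_mass_gap(n + 1) > tolerance:
        n += 1
    return n


# A tolerance of 0.1%, as in the text.
tolerance = Fraction(1, 1000)
limit = max_resolvable_length(tolerance)
can_split_1000_from_1001 = relative_mass_gap(1000) > tolerance
```

With a 0.1% tolerance the gap at length 1000 is exactly equal to the tolerance, so lengths 1000 and 1001 cannot be told apart, in line with the claim above.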

Additionally, Sanger sequencing is slow (low-throughput) because the mass measuring process is time consuming. Sanger sequencing allowed scientists to sequence around 3000 bases per week. One of the main reasons that the procedure is slow is that it requires measuring the mass of many molecules, a costly process. The equipment used for Sanger sequencing is shown below.

Illumina sequencing

Second-generation sequencing, pioneered by Illumina, makes a few modifications to the Sanger process shown above. The sequencing procedure also massively parallelizes the process, dramatically increasing the throughput while decreasing the price.

Illumina achieves parallelization by running several synthesis experiments at once. Each of many template strands is anchored on a chip, and only ddNTPs with fluorescent tags are available during the synthesis procedure (no dNTPs). Each type of ddNTP is tagged such that it emits a different wavelength or color. Since ddNTPs halt synthesis, the synthesis of new strands is synchronized. All new strands are the same length at the end of each synthesis cycle, at which point a picture of the chip is taken. These pictures are then analyzed by “base caller” software to identify (or “call”) the complementary nucleotides. Base calling will be discussed in greater detail next lecture. To override the chain termination, Illumina sequencing uses reversible termination. The sequencing process introduces an enzyme which can turn a ddNTP into a regular dNTP after it has bound, allowing the synthesis reactions to continue instead of being permanently halted.
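The cycle described above (add one tagged terminator, take a picture, unblock, repeat) can be mimicked with a toy simulation for a single error-free template. This is only an illustrative sketch: the dye colors are made up, the function names are hypothetical, strand direction is ignored, and real base calling works on noisy images of whole clusters.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colors; the text does not say which color tags which base.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_synthesis(template, cycles):
    """Return the color observed at each cycle for one template strand."""
    images = []
    for position in range(min(cycles, len(template))):
        # Exactly one tagged terminator, complementary to the template base, is added.
        incorporated = COMPLEMENT[template[position]]
        # A picture is taken while synthesis is halted.
        images.append(DYE_COLOR[incorporated])
        # Reversible termination: the block is removed so the next cycle can proceed.
    return images


def call_bases(images):
    """Toy base caller: map each observed color back to the incorporated base."""
    return "".join(COLOR_TO_BASE[color] for color in images)


read = call_bases(sequence_by_synthesis("ACGT", cycles=4))
```

The called read is the complement of the template, one base per cycle, which is why all strands are the same length at each picture.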

In order to guarantee that enough light is emitted such that ddNTP signals are detectable, each of the template strands is cloned, resulting in clusters of the same strand being synthesized in unison. Because of reversible termination, Illumina sequencing removes the need to measure masses. In contrast to the gel electrophoresis procedure required for Sanger sequencing above, the figure below shows a glass slide used during Illumina sequencing. Illumina sequencing can sequence billions of template strands simultaneously, which greatly increases the throughput.

Errors in Illumina sequencing arise due to time steps where no ddNTP attaches to some sequence and hence the same base is read twice. Additionally, dNTPs still exist in solution, and therefore occasionally a dNTP rather than a ddNTP may be attached to a strand being synthesized. The DNA polymerase then continues synthesis until it adds a different ddNTP. For this reason, although all strands within each cluster are identical, the photograph may be noisy.

The Sanger sequencing figure is due to Claire Margolis. The DNA replication figure is taken from Alberts B, Johnson A, Lewis J, et al, Molecular Biology of the Cell. 4th edition. The rest are taken from Ben Langmead’s notes.

Lydell Grant

DNA test frees man serving life sentence for Houston murder, leads to new arrest.

New DNA testing in a 2010 Houston murder case has led to the exoneration of one man — after nine years behind bars — and the arrest of another.

Lydell Grant, 43, was found guilty of stabbing 28-year-old Aaron Scheerhoorn to death in the vicinity of a nightclub on the sworn testimony of multiple witnesses, according to reports. He was cleared last month, and released from prison, after DNA obtained from Scheerhoorn’s fingernails was tested using new technology. He had been serving a life sentence.

On Thursday, Jermarico Carter, 41, was charged with the murder after the same DNA linked him to the crime. Investigators got a match to his DNA using an FBI database containing the DNA of convicted criminals, Fox 26 Houston reported Sunday.

“On behalf of the Houston Police Department, I want to extend an apology to Mr. Grant and his family as they have waited for justice all these years,” Houston Police Chief Art Acevedo said Friday on Twitter. Acevedo said Carter “has recently confessed to his role in Mr. Scheerhoorn’s killing.” Carter was in custody in Georgia on unrelated charges, the chief said.

Grant said Saturday that he wasn’t mad at Carter, even though he had sat in jail for a crime Carter committed, Fox 26 reported. “I’m not mad at him at all,” he said, according to the station. “I forgive him because he know now what he did.”

The station reported that as of Saturday Grant no longer had to wear an ankle monitor or abide by a curfew. Houston prosecutors said they will move for Grant’s formal exoneration before the Texas Court of Criminal Appeals. Grant’s lawyer Mike Ware of the Innocence Project of Texas was quoted by The Associated Press as saying that he believes erroneous witness identifications based on outdated and flawed techniques used by detectives helped to wrongly convict his client.


