Lecture 22: Genomics, Proteomics and Metabolomics


Ever since Alexander Fleming accidentally discovered penicillin, there has been constant warfare between pharmaceutical companies and antibiotic-resistant bacteria. Over the last seven decades, pharmaceutical companies have been developing novel antibiotics, and microbes have been developing resistance against them. The pharmaceutical firms were winning until very recently, but not anymore! That is why, in 2015, President Obama announced a national strategy to combat bacterial resistance. And it is not surprising that last year the Nobel Prize in medicine went to three researchers working on antibiotic and antimalarial compounds.

Challenge of Antibiotic Sequencing

The way research is conducted in the field of antibiotics discovery has not changed much in the last 50 years. According to Thomas Frieden, the director of the Centers for Disease Control and Prevention, “The medicine cabinet is empty for some patients. It is the end of the road for antibiotics unless we act urgently.” We believe the main bottleneck in antibiotics discovery is the lack of computational techniques to streamline the discovery process. In this document I discuss several open problems in computational antibiotic discovery for computer scientists and statisticians. Before getting to the open problems, I need to give some brief background.

Automated antibiotics discovery relies on techniques from genomics, proteomics and metabolomics (Figure 1). Genomics studies focus on the static DNA sequence of an organism, while proteomics and metabolomics focus on the dynamic protein and metabolite products of the genome. While the computational techniques in genomics and proteomics have been extensively studied in the last two decades, metabolomics is a novel field that has barely been touched by computer scientists and statisticians.

Figure 1. In addition to more traditional techniques from genomics, proteomics and metagenomics, antibiotic discovery relies on techniques from metabolomics.

The Central Dogma of Biology

The Central Dogma of Biology states that DNA is transcribed to RNA by RNA polymerase, and RNA is translated to proteins by the ribosome. In 1958, Edward Tatum made an extraordinary observation. When Tatum inhibited the ribosome in a microbe called Bacillus brevis, the microbe continued to produce a short peptide. But how can a microbe produce peptides without a working ribosome? It turned out that the central dogma was not the only pathway for production of peptides. There is a large family of peptides called non-ribosomal peptides (NRPs) that can be produced without the ribosome.

Non-Ribosomal Peptides

NRPs include some of the most effective antibiotics in the history of medicine, including penicillin. But how are these peptides synthesized? It turned out that NRPs are produced by gigantic modular enzymes called non-ribosomal peptide synthetases (NRPSs).

Figure 2. (a) Each A-domain incorporates a single amino acid into the backbone of the NRP. The last domain usually cyclizes the NRP. (b) The NRP code postulates that a specific substring of length 8 of the A-domain defines the amino acid loaded by the domain.

NRP code

Production of an NRP of length 10 amino acids requires an NRPS with 10 modules (called A-domains) and a length of roughly 100,000 amino acids. NRPs are so vital for microbes that some microorganisms devote 20-30% of their genome just to the production of these peptides.
All A-domains are evolutionarily related, and it is possible to align their sequences. In 1999 Mohammed Marahiel discovered that, much like the genetic code that translates each triplet of nucleotide bases into an amino acid, there is a non-ribosomal code that governs which amino acid each A-domain incorporates into the backbone of the NRP. The NRP code postulates that a specific substring of length 8 of the A-domain defines the amino acid loaded by the domain (Figure 2). In 2005, supervised learning based tools were developed to predict the structure of an NRP from its NRPS.
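
As a rough illustration of how such a code can be applied, the sketch below reads 8 signature residues out of an aligned A-domain and looks up the closest entry in a code table. The alignment columns, the two table entries and all function names are hypothetical placeholders chosen for this example; they are not the real NRP code, and this is not the method of any particular tool.

```python
# Hypothetical alignment columns (0-based) holding the 8 signature residues.
# The real positions come from an A-domain alignment and are not given here.
SIGNATURE_COLUMNS = (2, 3, 6, 8, 11, 12, 13, 15)

# Toy code table (made up for illustration): signature -> loaded amino acid.
TOY_NRP_CODE = {
    "DAWTIAAV": "Phe",
    "DLTKVGHI": "Asp",
}


def extract_signature(aligned_domain, columns=SIGNATURE_COLUMNS):
    """Collect the signature residues from an aligned A-domain sequence."""
    return "".join(aligned_domain[c] for c in columns)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_amino_acid(aligned_domain, code_table=TOY_NRP_CODE):
    """Return (amino_acid, mismatches) for the nearest signature in the table."""
    signature = extract_signature(aligned_domain)
    best = min(code_table, key=lambda s: hamming(s, signature))
    return code_table[best], hamming(best, signature)
```

A nearest-signature lookup like this returns a guess even when no table entry matches exactly, which is one way to see why predictions become unreliable for amino acids with few training examples.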

The Bottleneck of supervised learning

The supervised learning methods for mining antibiotics genes have contributed greatly to the field during the last decade. However, these methods suffer from the lack of labelled training data. As a result, the top predictions of supervised learning methods are often incorrect, especially for exotic amino acids for which we do not have much training data (Figure 3). It is possible to use metabolomics data to correct errors from these supervised learning techniques.

Figure 3. Top three supervised learning predictions for the A-domains of an NRP of length 8. The top prediction is incorrect for four out of eight residues.

The traditional approach to antibiotics discovery

The traditional approach to antibiotics discovery is time consuming and expensive. Moreover, this process often results in rediscovery of known antibiotics rather than novel ones. Our goal is to bring techniques from computer science and statistics to automate the process of natural product discovery.


The main technologies in metabolomics are mass spectrometry (MS) and nuclear magnetic resonance (NMR). Mass spectrometry has the advantage of being fast and inexpensive, and it can work on complex samples (taken from the environment or a host) and on very small amounts (picograms) of material, while NMR is very accurate (Figure 4). Our goal is to combine mass spectrometry with genomics to make it accurate as well.

Figure 4. The main technologies in metabolomics are mass spectrometry (MS) and nuclear magnetic resonance (NMR).

Mass Spectrometry

A peptide antibiotic can be modeled as a graph, where each node is an amino acid and each edge is a peptide bond. Mass spectrometry cuts this graph at all possible bridges and 2-cuts, and measures the mass of each connected component produced by the cuts, which is the sum of the masses of all its amino acids. These masses (called a mass spectrum) are given in no specific order, and we want to identify the masses of all amino acids in the peptide antibiotic graph from the mass spectrum (Figure 5).

Figure 5. A peptide antibiotic can be modeled as a graph, where each node is an amino acid and each edge is a peptide bond. Mass spectrometry cuts this graph at all possible bridges and 2-cuts, and measures the mass of each connected component, which is the sum of the masses of all its amino acids. These masses (called a mass spectrum) are given in no specific order, and we want to identify the masses of all amino acids in the peptide antibiotic graph from the mass spectrum.
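
The cutting model can be sketched for the two simplest graph shapes. In a linear peptide every bond is a bridge, so each cut yields a prefix and a suffix; in a cyclic peptide a pair of cuts yields two arcs of the ring. The function names are illustrative, and the use of nominal integer residue masses and of noise-free spectra are simplifying assumptions for this example.

```python
def linear_spectrum(masses):
    """Fragment masses of a linear peptide: cutting each bond (a bridge)
    splits the chain into a prefix and a suffix."""
    total = sum(masses)
    spectrum = []
    prefix = 0
    for m in masses[:-1]:
        prefix += m
        spectrum.append(prefix)          # left component
        spectrum.append(total - prefix)  # right component
    return sorted(spectrum)


def cyclic_spectrum(masses):
    """Fragment masses of a cyclic peptide: every pair of cuts (a 2-cut)
    yields two arcs, so every arc of length 1..n-1 appears."""
    n = len(masses)
    spectrum = []
    for start in range(n):
        running = 0
        for length in range(1, n):
            running += masses[(start + length - 1) % n]
            spectrum.append(running)
    return sorted(spectrum)
```

For example, with the nominal residue masses of glycine, alanine and serine (57, 71 and 87 Da), `cyclic_spectrum([57, 71, 87])` returns `[57, 71, 87, 128, 144, 158]`. The sequencing problem described above is the inverse: recover the residue masses and their order from such an unordered list.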

Relationships to Turnpike problem

The problem of sequencing a peptide from mass spectrometry data resembles the classic turnpike and beltway problems in computer science. In the turnpike problem, we have a set of exits on a highway, and we are given the pairwise distances between these exits. The goal is to reconstruct the positions of the exits on the highway from these pairwise distances. To make the problem even more difficult, in real-world applications some of the pairwise distances are missing, and we also have noise (Figure 6).

The turnpike problem has been shown to be NP-hard in the noisy case. Our goal is to use genome mining results as a cheat sheet to solve a special case of the turnpike problem.

Figure 6. In the turnpike problem, we want to reconstruct the consecutive distances between exits on a highway from their pairwise distances.
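
For the idealized case with no missing distances and no noise, the turnpike problem can be solved with a standard backtracking search, sketched below. The function name `solve_turnpike` is illustrative, and this is the textbook noise-free algorithm rather than the genome-guided method the lecture aims for.

```python
from collections import Counter


def solve_turnpike(distances):
    """Reconstruct exit positions from the multiset of pairwise distances.
    Returns a sorted list of positions starting at 0, or None if impossible."""
    remaining = Counter(distances)
    width = max(remaining)          # the largest distance spans the highway
    remaining[width] -= 1
    points = [0, width]

    def place(candidate):
        # Distances from the candidate to every point placed so far.
        needed = Counter(abs(candidate - p) for p in points)
        if all(remaining[d] >= c for d, c in needed.items()):
            remaining.subtract(needed)
            points.append(candidate)
            return needed
        return None

    def backtrack():
        left = [d for d, c in remaining.items() if c > 0]
        if not left:
            return True
        largest = max(left)
        # The largest unexplained distance must reach one of the two ends.
        for candidate in (largest, width - largest):
            needed = place(candidate)
            if needed is not None:
                if backtrack():
                    return True
                points.pop()                # undo the placement
                remaining.update(needed)
        return False

    return sorted(points) if backtrack() else None
```

Because a set of points and its mirror image have the same pairwise distances, the result is only determined up to reflection. With noisy or missing distances the exact multiset checks in `place` no longer apply, which is where the difficulty discussed above comes from.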

Antibiotics Discovery from complex samples

The first antibiotic, penicillin, was accidentally discovered from a mold; the strain later used for its mass production was isolated from a rotten melon. Since then, the majority of antibiotics have been discovered from microbes living in natural environmental samples. However, most environmental samples are complex, with thousands of microbial strains living together. Metagenomics data collected on these complex samples tell us which microbes are living there, and metabolomics data tell us which chemicals these microbes produce. Our goal is to use Big Data from metabolomics and metagenomics to characterize antibiotics from complex samples.

Genomics and Metagenomics Assembly

Genomics and metagenomics data from next generation sequencing are given as short reads (100 to 300 base pairs) over the alphabet A, C, G and T. Assembling genomes and metagenomes from billions of overlapping DNA substrings resembles solving a puzzle from error-prone pieces. In the error-free case, this problem is equivalent to finding an Eulerian path in a giant de Bruijn graph (Figure 7).

Figure 7. Assembling genomes and metagenomes from billions of overlapping DNA substrings resembles solving a puzzle from error-prone pieces. In the error-free case, this problem is equivalent to finding an Eulerian path in a giant de Bruijn graph.
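
The error-free case can be illustrated on a toy scale. The sketch below builds a de Bruijn graph from the distinct k-mers of a few reads and walks an Eulerian path with Hierholzer's algorithm. The function names are illustrative, and the example assumes error-free reads, a single strand, and that every k-mer occurs exactly once in the genome; real assemblers must relax all three.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer contributes one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path or cycle exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else anywhere (cycle).
    start = next((u for u in sorted(graph) if len(graph[u]) - in_deg[u] == 1),
                 sorted(graph)[0])
    edges = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if edges.get(u):
            stack.append(edges[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For instance, `assemble(["ATGCG", "GCGTT", "GTTAC"], 3)` spells `"ATGCGTTAC"`. When a (k-1)-mer repeats in the genome, several Eulerian paths can exist, which is one reason real assemblies end up as graphs rather than single sequences.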

Antibiotics Discovery: metabolomics meet metagenomics

Antibiotics discovery by metagenomics and metabolomics starts with collecting environmental and host-associated samples. These samples can be collected from untouched areas of the earth (e.g. sponges and lichens in the oceans) or from human body sites. Human microbiomes contain microbes that produce antibiotics shown to kill human pathogens, which makes these compounds potential drug candidates.

The next step is to collect metagenomics and mass spectrometry data on the samples. The metagenomics data are assembled, and the metagenomics assembly is mined using the supervised learning methods introduced in the section NRP code to identify NRP synthetases. During the last decade, there has been extensive research on two questions: “What antibiotics are encoded by these NRP synthetases?” and “What antibiotics are encoded by these mass spectra?” The metabologenomics approach to antibiotics discovery answers both questions simultaneously by building a bridge between mass spectrometry and metagenomics (Figure 8).

Figure 8. Metabolomics meets metagenomics.

The metabologenomics bridge

Now we have short reads (100-300 base pair reads over A, C, G and T) from metagenomics and mass spectra (vectors of masses), and we want to identify the structures of all antibiotics in these samples (Figure 9). After finding NRP genes, all possible candidate antibiotic structures are constructed from these genes. Note that we can have multiple candidates for each NRP gene due to the ambiguity of the NRP code. Then, predicted mass spectra are constructed for each candidate NRP structure by cutting it at all possible bridges and 2-cuts. Note that this procedure results in billions of predicted spectra, the majority of them corresponding to infeasible antibiotic structures. The predicted spectra are then compared against the mass spectra (billions of spectra) to find similarities and detect feasible antibiotic structures. Because pairwise comparison of datasets with billions of entries is impractical, we need to come up with more efficient methods for this task, discussed in the section Open Problems : Matching Omics Big Data. Moreover, each comparison between the predicted spectra and the mass spectra should be done in an error-tolerant manner, due to the noise in metabolomics and metagenomics data. This challenge is discussed in the section Open Problems : Error Tolerant Matching. Finally, we need to assess the statistical significance of the results, which is discussed in Open Problems : Statistical validation of results.

Figure 9. Metabologenomics pipeline

Open Problems

In this section, I propose several open problems in the area of antibiotics discovery by Omics Big Data. These problems are independent and self-contained.

Open Problems : Matching Omics Big Data

In its simplest form, a mass spectrum or predicted spectrum is modeled by a binary vector with 1s at positions where the corresponding mass is present and 0s where it is absent. Modern mass spectrometers have a mass range of 0-2000 Da and an accuracy of 0.01 Da, resulting in sparse vectors of size 200,000 with roughly 100 ones (Figure 10).

The shared-peak count score of two spectra is defined as their dot product, and the problem of matching metabolomics data to metagenomics data amounts to finding all pairs of vectors from a given set of vectors with dot products higher than a threshold. The brute force approach has a complexity of O(N^2) for datasets of size N, making it intractable for billions of spectra.
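
A small sketch makes the representation concrete. Because the vectors are so sparse, each spectrum can be stored as the set of its occupied mass bins, and the dot product becomes the size of a set intersection. An inverted index from bins to spectra then avoids scoring pairs that share no peak at all. The function names, the 0.01 Da bin width taken from the accuracy figure above, and the "at least threshold" convention are assumptions of this example; it is one simple way to prune comparisons, not a solution to the open problem.

```python
from collections import defaultdict

BIN_WIDTH = 0.01  # Da, matching the instrument accuracy quoted above


def to_bins(masses, bin_width=BIN_WIDTH):
    """Sparse binary vector, stored as the set of occupied mass bins."""
    return frozenset(int(round(m / bin_width)) for m in masses)


def shared_peak_count(a, b):
    """Dot product of two binary vectors = number of shared bins."""
    return len(a & b)


def match_spectra(observed, predicted, threshold):
    """Return (i, j, score) for observed[i] / predicted[j] pairs whose
    shared-peak count is at least `threshold`."""
    index = defaultdict(list)            # bin -> predicted spectra using it
    for j, spectrum in enumerate(predicted):
        for b in spectrum:
            index[b].append(j)
    hits = []
    for i, spectrum in enumerate(observed):
        counts = defaultdict(int)
        for b in spectrum:
            for j in index.get(b, ()):
                counts[j] += 1
        hits.extend((i, j, c) for j, c in sorted(counts.items())
                    if c >= threshold)
    return hits
```

The index only helps when most pairs share no bins; bins that are occupied in very many spectra still produce long posting lists, so the worst case remains quadratic.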

It turns out that a very similar problem arises in gene expression analysis, for finding genes that are co-expressed under similar conditions, and in document clustering, for finding documents that share many words.

Biclustering of a data matrix refers to the simultaneous clustering of its rows and columns into sub-matrices of similar behavior, and it has been applied to gene expression (genes and conditions), marketing (customers and products) and document-word clustering (documents and words).

The goal here is to develop an efficient pipeline for matching billions of mass spectra against billions of predicted spectra.

Figure 10. The problem of matching metabolomics data to metagenomics data amounts to finding all pairs of vectors from a given set of vectors with dot products higher than a threshold.

Open Problems : Error Tolerant Matching

Machine learning predictions of antibiotic structure from metagenomes are error-prone, especially for the less studied microbial strains for which smaller training datasets are available. We attempt to correct errors in the metagenomics predictions using metabolomics data. This problem resembles the problem of demodulating digital data from an analog signal corrupted by noise in a communication channel. Since Viterbi decoding provides the optimal solution to the demodulation problem, it can be applied to connecting metabolomics data to metagenomics data while allowing for variations between the two (Figure 11).

Figure 11. Viterbi decoding provides the optimal solution for error-tolerant matching of metabolomics data to metagenomics data.
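
For readers unfamiliar with Viterbi decoding, the sketch below shows the generic dynamic program on a hidden Markov model: it finds the most likely sequence of hidden states given a sequence of noisy observations. How states, transitions and emissions should be defined for spectra and gene predictions is exactly the open question, so the model passed in here is left entirely to the caller, and the function name and argument layout are choices made for this example.

```python
def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden state path for a sequence of observations.

    log_start[s]    : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[s][o]  : log probability that state s emits observation o
    Returns (path, log_probability_of_path).
    """
    # Best log score of any path ending in each state after the first symbol.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[-1][last]
```

Working in log space avoids numerical underflow on long sequences. The running time is proportional to the number of observations times the square of the number of states.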

Open Problems : Matching Graphs

In practice, it is impossible to assemble metagenomics short reads into complete microbial genomes. Modern assemblers instead combine short reads into assembly graphs. Complex metagenomics samples contain thousands of antibiotics genes, each corresponding to a subgraph of the assembly graph. Our goal is to align thousands of graphs to billions of mass spectra. This is a generalization of both multiple sequence alignment and biclustering (Figure 12).

Open Problems : Statistical validation of results

When searching billions of mass spectra against metagenomes with billions of base pairs, one needs to compute the statistical significance of matches between the metabolomics and metagenomics data, and filter out wrong matches based on their statistical significance (Figure 13). The p-value of a match between metabolomics and metagenomics data is defined as the proportion of randomly generated data with scores exceeding the match score.

The naive Monte Carlo approach is computationally intractable for p-values below 10^-6, since it needs on the order of a million random samples just to observe a single score above the threshold. The problem of computing very small p-values of match scores between metabolomics data and metagenomics data is equivalent to finding the probability of visiting a rare state in a Markov chain, and importance sampling based methods can be used to solve it.
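
The contrast between the two estimators can be seen in a deliberately simplified toy model, which is an assumption of this sketch and not the scoring model of the lecture: the score is the number of matched peaks among `n` independent peaks, each matching by chance with probability `p`, so the exact p-value is a binomial tail that can be used as a check. The importance sampler draws from a tilted distribution in which high scores are common and reweights each sample by the likelihood ratio.

```python
import math
import random


def exact_tail(n, p, t):
    """Exact P(score >= t) when the score is Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(t, n + 1))


def naive_monte_carlo(n, p, t, trials, rng):
    """Fraction of random scores that reach t; usually 0 for rare events."""
    hits = sum(sum(rng.random() < p for _ in range(n)) >= t
               for _ in range(trials))
    return hits / trials


def importance_sampling(n, p, t, trials, rng):
    """Sample matches with inflated probability q = t / n (requires t < n),
    then reweight each sample by the likelihood ratio under p versus q."""
    q = t / n
    total = 0.0
    for _ in range(trials):
        s = sum(rng.random() < q for _ in range(n))
        if s >= t:
            total += (p / q) ** s * ((1 - p) / (1 - q)) ** (n - s)
    return total / trials
```

With `n = 100`, `p = 0.01` and `t = 10` the exact tail probability is below 10^-6, so a few thousand naive samples will almost always return zero, while the same number of tilted samples gives an estimate close to the exact value. Scores that arise from a Markov chain need a more careful choice of proposal distribution than this single tilted coin.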

Figure 12. Antibiotics genes are subgraphs of the assembly graph. Each path in an antibiotic gene subgraph corresponds to a feasible antibiotic structure. When linking metabolomics data to metagenomics data, we need to align billions of mass spectra against thousands of antibiotics gene subgraphs.

Open Problems : Learning fragmentation pattern in metabolomics by graphical models

Given a chemical structure, how can we predict its mass spectrum? Given a mass spectrum, how can we identify which chemical structure it corresponds to? These are open problems that we are trying to solve using large collections of paired mass spectra and chemical structures.

The training data is given as pairs of sets that correspond to matching predicted spectra and mass spectra. These sets can be represented in binary form. The goal is to learn the structure and the parameters of a bipartite graphical model from the training data (Figure 14).

Figure 13. P-value of a match between metabolomics and metagenomics data is defined as the portion of randomly generated data with scores exceeding match score.

Figure 14. (a) The training data is given as pairs of sets that correspond to matching predicted spectra and mass spectra, respectively. (b) These sets can be represented in binary form. (c) The goal is to learn the structure and the parameters of the bipartite graphical model that describes the dependencies between the predicted spectra and the mass spectra.

Open Problems: Automated hypothesis generation

In recent years, researchers have collected large-scale mass spectrometry and metagenomics data from environmental and host-associated samples. One example is a 3D cartography of human skin, where paired data are available from different parts of the skin. Metagenomics tells us which microbes are living on human skin, and metabolomics tells us which chemicals they produce.

This pattern of co-occurrence of metabolomics and metagenomics features can be used for generating hypotheses about the relationships between microbes and chemicals. For example, if a metabolomics feature corresponding to chemical Y is present only at the intersection of the occurrence patterns of a metabolomics feature corresponding to chemical X and a metagenomics feature corresponding to microbe A, then one can hypothesize that microbe A might convert chemical X to chemical Y (Figure 15). The statistical significance of such hypotheses should be computed to avoid generating false hypotheses.

Figure 15. Automated hypotheses generation.
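
The co-occurrence rule in the example can be written down directly with sets of sampling sites. The sketch below also attaches a simple chance estimate under an assumed null model in which the sites of Y are a uniformly random subset of all sampled sites; that null model, like the function names, is an illustrative assumption and not the validation method of the lecture.

```python
from math import comb


def supports_conversion(sites_x, sites_a, sites_y):
    """True if chemical Y is observed, and only where both chemical X
    and microbe A are observed."""
    return bool(sites_y) and sites_y <= (sites_x & sites_a)


def chance_probability(n_sites, sites_x, sites_a, sites_y):
    """Probability that len(sites_y) sites drawn uniformly at random from
    n_sites all fall inside the overlap of X and A (assumed null model)."""
    overlap = len(sites_x & sites_a)
    return comb(overlap, len(sites_y)) / comb(n_sites, len(sites_y))
```

For example, with X at sites {1, 2, 3, 4}, A at sites {3, 4, 5} and Y at sites {3, 4} out of 10 sampled sites, the rule is satisfied and the chance probability under this null is 1/45. A small overlap and few Y sites give weak evidence, which is why significance needs to be computed before a hypothesis is reported.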

MLSB 2009 - Ljubljana

Molecular biology and all the biomedical sciences are undergoing a true revolution as a result of the emergence and growing impact of a series of new disciplines/tools sharing the “-omics” suffix in their name. These include in particular genomics, transcriptomics, proteomics and metabolomics, devoted respectively to the examination of the entire systems of genes, transcripts, proteins and metabolites present in a given cell or tissue type.

The availability of these new, highly effective tools for biological exploration is dramatically changing the way one performs research in at least two respects. First, the amount of available experimental data is no longer a limiting factor; on the contrary, there is a plethora of it. Given the research question, the challenge has shifted towards identifying the relevant pieces of information and making sense of them (a “data mining” issue). Second, rather than focusing on components in isolation, we can now try to understand how biological systems behave as a result of the integration and interaction between the individual components that one can now monitor simultaneously (so-called “systems biology”).

Taking advantage of this wealth of “genomic” information has become a ‘conditio sine qua non’ for anyone who aims to remain competitive in molecular biology and in the biomedical sciences in general. Machine learning naturally appears as one of the main drivers of progress in this context, where most of the targets of interest deal with complex structured objects: sequences, 2D and 3D structures or interaction networks. At the same time, bioinformatics and systems biology have already induced significant new developments of general interest in machine learning, for example in the context of learning with structured data, graph inference, semi-supervised learning, system identification, and novel combinations of optimization and learning algorithms.

The aim of this workshop is to contribute to the cross-fertilization between the research in machine learning methods and their applications to systems biology (i.e., complex biological and medical questions) by bringing together method developers and experimentalists.

The workshop is organized as a "core event" of the Pattern Analysis, Statistical Modelling and Computational Learning Network of Excellence 2 (PASCAL 2).

EMBL external research community survey

Scientists working together on a mass spectrometer at the EMBL Metabolomics Core Facility. Credit: Kinga Lubowiecka/EMBL

If you have accessed EMBL experimental services at one or more of our facilities to support the conduct of your research, we would like to hear from you.

EMBL experimental services include:

  • EMBL Hamburg: the beamlines, macromolecular X-ray crystallography or sample preparation and characterisation services.
  • EMBL Grenoble: the MX or bioSAXS beamlines at the ESRF owned by the ESRF and jointly operated by the ESRF-EMBL Joint Structural Biology Group, or the Cryo electron microscope (CM01) at ESRF operated by the European Photon and Neutron campus partners.
  • EMBL Heidelberg: the microscopy, chemical biology, genomics, proteomics, metabolomics or protein expression and purification services.
  • EMBL Rome: the flow cytometry, microscopy, genetic and viral engineering or gene editing and embryology services.
  • EMBL Barcelona: the selective plane illumination microscopy (SPIM)/light-sheet microscopy or optical projection tomography services.

Please take the time to fill out the survey – your participation is extremely important to inform the assessment.

The survey will remain open until 30 June 2021 and should take no longer than 20 minutes to complete.

We appreciate your time and input, and please feel free to share this survey with other users.

Best wishes,
Edith Heard
Director General
European Molecular Biology Laboratory

Introduction to metabolomics

Course parameters:

Language: English
Level of course: PhD course (with participation of a few master students)
Time: April 2021
No. of contact hours/hours in total incl. preparation, assignment(s) or the like:

3 days of lectures and exercises (24 h)
Preparation by reading book chapters, articles and preparing data for analysis (20 h)
Writing report (16 h)
Capacity limits: max. 30, min 8 participants. PhD students have highest priority. The teaching will be carried out as online teaching through Blackboard and Zoom

Objectives of the course:

The purpose of the course is to give an up-to-date introduction to the basic methods in metabolomics. The course will cover nuclear magnetic resonance (NMR) spectroscopy and gas and liquid chromatography mass spectrometry (GC/LC-MS) methodologies used in metabolomics, as well as design of experiments.

Learning outcomes and competences:

At the end of the course, the student should be able to:

  • Place metabolomics in the context of genomics, proteomics and the other omics fields.
  • Compare and contrast the methods for a metabolomics study considering the benefits and the pitfalls of the methods.
  • Describe and explain how to prepare samples from different foods and bio-fluids for metabolomic analyses.
  • Evaluate and apply the appropriate experimental design in a metabolomics study.
  • Describe appropriate design of experiments for metabolomic analyses.
  • Explain the principles of metabolite detection and identification in LC-MS, GC-MS and NMR.
  • Outline applications of metabolomics to your field of research.
  • Apply multivariate data analysis methods relevant for metabolomic data.
  • Critically evaluate metabolomic data described in scientific literature and describe your own results in a scientific way.

Compulsory programme:
Attendance for a minimum of 80% of the theoretical and practical lessons is required to obtain the course diploma. Approved report.

Course contents:

Metabolism is the set of chemical reactions that occur in living organisms in order to maintain life, and metabolites are the end products of these chemical reactions. Metabolomics is the systematic study of metabolite profiles in any biological sample or tissue. Set in a systems biology context, metabolomics is the most recent member of the omics family and is complementary to genomics and proteomics.

The course gives an introduction to the use of metabolomics and its methods. The objectives are to give participants knowledge of advanced NMR and MS techniques, including hands-on experience with processing, analysis and identification of metabolites in such data.

Theoretical and practical aspects of these methods, including design of experiments, sample preparation, data acquisition, processing and further analysis, will be discussed. The course will cover NMR- and MS-based metabolomics applied to various types of samples; meat and vegetable extracts, cells, milk, urine, blood, and feces will be used as examples.


Enrolled in a science-based PhD programme. Master students can participate as a part of their master project in agreement with the supervisor.

Name of lecturer:
Assistant professor Ulrik Kræmer Sundekilde, Department of Food Science, Aarhus University, Denmark.

Type of course/teaching methods:
Lectures and exercises.

Selected chapters and papers will be announced on BlackBoard. You will receive a folder with information and printouts from the lectures at the beginning of the course.

Course assessment:
Passed/not passed assessment based on the written report considering the learning outcomes.


Department of Food Science.

Special comments on this course:

The course is organized in combination with the PhD course ‘Introduction to multivariate data analysis’ and it is encouraged to follow both courses, although the courses can be taken individually if necessary.

  • 12. April: Lectures
  • 13. April: Lectures
  • 15. April: Lectures
  • 16. April: Workshop: Working with own data
  • 23. April: Deadline for handing in report
  • 28. April: Examination seminar (peer-feedback, teacher-feedback)
  • 8. April: Lectures
  • 9. April: Lectures
  • 16. April: Data preparation, workshop
  • 23. April: Deadline for handing in report
  • 29. April: Notice of assessment

Online on Blackboard and Zoom.

Deadline for registration is 21 March 2021. Information regarding admission will be sent out no later than two workdays after registration deadline.

For registration: Opens later.

If you have any questions, please contact Ulrik Sundekilde, e-mail:

Metabolomics only (covering bread/coffee/fruit and folder):

PhD students and master students enrolled at Danish Universities and AU staff: 0 DKK

Chemometrics only (covering bread/coffee/fruit, folder and LatentiX license):

PhD students and master students enrolled at Danish Universities and AU staff: 750 DKK

Metabolomics and Chemometrics (covering bread/coffee/fruit, folder and LatentiX license):

PhD students and master students enrolled at Danish Universities and AU staff: 750 DKK

The International Conference on Genome Informatics (formerly known as Genome Informatics Workshop or GIW) is the longest running international bioinformatics conference.

The ABACBS Annual Conference is the national bioinformatics conference in Australia and now in its fifth year.

This year these two conferences will join forces to bring together members from all over Australia, the Asia-Pacific region and the world to enjoy the opportunity to interact and hear fantastic stories about genomics, bioinformatics and computational biology research with an emphasis on how advances in computational and statistical techniques are applied to solve important biological and biomedical problems.

The joint conference will feature international and national keynote speakers, oral presentations selected from full-length paper submissions and abstracts, and posters selected from abstracts.

We look forward to seeing you at GIW/ABACBS-2019!

Genomics and integrative analytics

Non-human, agricultural, environmental and microbial genomics

Proteomics and metabolomics

Methods development and reproducibility research

Charles Perkins Centre, University of Sydney, Sydney, Australia

The Machine Learning Applications Using Amazon Cloud Workshop has been cancelled and replaced by the Single Cell RNA-Seq Data Analysis on the Cloud Workshop.

Authors’ contributions

The article is based on the invited lecture given by HNK at ISAFG2015. HNK conceived the overall framework of this review article and assisted in the writing of the initial draft. PS and LK wrote the initial draft. HNK made significant contributions with the figures. All three authors mutually shared discussions and worked equally. All authors read and approved the final manuscript.


PS was funded by an EU-FP7 Marie Curie Actions Grant (CIG-293511) and LJAK was funded by a Grant from the Danish Innovation Fund for the BioChild Project (Grant Number 0603-00457B). HNK, as project leader and Grant holder, thanks the EU-FP7 Marie Curie Actions Grant and the Danish Innovation Fund. This paper is part of the collection ‘ISAFG2015’ (6th International Symposium on Animal Functional Genomics, 27–29 July 2015, Piacenza, Italy). The publication of the papers in this collection was partly sponsored by the OECD Co-operative Research Programme: Biological Resource Management for Sustainable Agricultural Systems (CRP). HNK’s participation in ISAFG2015 was financed by the OECD Co-operative Research Programme. The opinions expressed and arguments employed in this paper are the sole responsibility of the authors and do not necessarily reflect those of the OECD or of the governments of its Member countries.

Competing interests

The authors declare that they have no competing interests.

BCMB3X92 Proteomics and Func Genomics Complete Course Notes (Distinction)

These BCMB3X92 notes include my raw personal lecture notes, laboratory examples, and textbook summaries.

I received a final mark of 80 (Distinction).

Course information
+-- Contact
+-- Lectures
+-- Amino acids and their structures
Lecture 01
Lecture 02
+-- Systems biology
+-- +-- Techniques/technologies
Lecture 03
+-- Sequence search
+-- Sequence motifs
+-- The big why?
Lecture 04
+-- 2D gels
+-- +-- 2D gel prep
+-- +-- Charge dimension: IEF
+-- +-- Mass dimension: Acrylamide
+-- +-- Choosing a stain
+-- +-- After running the gel
+-- 2D gel approaches
+-- +-- Difference in-gel electrophoresis (DIGE)
+-- +-- Radiolabelling
+-- Summary
Lecture 05
+-- Mass spectrometry
+-- +-- Ionisation systems
+-- +-- Mass analyser
+-- +-- Detection
+-- Interpreting the data
+-- Time of flight
+-- Summary
Lecture 06
+-- Peptide mass fingerprinting (PMF)
+-- Micro-reverse phase columns
+-- Summary
Lecture 07
+-- Quadrupole (Q) MS
+-- Tandem MS
+-- +-- Fragmentation
+-- +-- Selective reaction monitoring (SRM)
+-- Quadrupole time of flight
+-- Summary
Lecture 08
+-- Ion traps
+-- Revisiting 2D gels
+-- Shotgun proteomics
+-- +-- Separating the peptides
Lecture 09
+-- Protein identification with Shotgun
+-- Protein quantification with Shotgun
+-- +-- Relative quantification
+-- +-- Absolute quantification
Lecture 10
+-- Isobaric affinity tags
Lecture 11
+-- The SILAC mouse
+-- +-- Summary
+-- Other 'omes
+-- +-- Stimulomes
+-- +-- Regulomes
+-- +-- Phenomes
Lecture 12
+-- More 'omes
+-- +-- Surfaceomes
+-- +-- Secretomes
Lecture 13
Lecture 14
+-- Post translational modifications (PTM)
Lecture 15
+-- Phosphorylation
+-- Phosphoproteomics
+-- +-- 2D gels
+-- +-- Characterisation of phosphopeptides
+-- +-- Enrichment of phosphopeptides
+-- MS$^3$
Lecture 16
Lecture 17
+-- Structure
+-- +-- N-linked glycosylation
+-- +-- O-linked glycosylation
+-- Function
+-- Analytical
+-- +-- Characterisation of glycopeptides by MS
+-- +-- Enrichment techniques
+-- +-- Ion traps
+-- +-- Information from spectra
Lecture 18
Supplementary notes
+-- Properties of glycobiology
+-- Biosynthesis of N-glycans
+-- +-- Processing
+-- Mucin
+-- Sialic acids
+-- Major functions of glycosylation
Lecture 19
+-- Redox proteomics and other PTMs
+-- +-- Oxidation and oxidative stress
+-- +-- Post-translational cleavage
+-- +-- Deamidation
Lecture 20
Lecture 21
+-- Proteoforms
+-- Top-down proteomics
+-- +-- Whole protein analysis
Lecture 22
+-- Diagnostic proteomics
+-- +-- Serum proteomics
+-- Profiling using SELDI-TOF
+-- MS tissue profiling / imaging
+-- Summary
Lecture 23
+-- Methods for validating proteomics results
+-- +-- Western blotting
+-- +-- Transcript levels
+-- +-- Immunohistochemistry
+-- +-- The human protein atlas
+-- +-- Selected reaction monitoring (SRM)
+-- +-- Metabolomics
+-- Summary
Lecture 24
+-- Metabolomics
Lecture 25
+-- Protein-protein interactions (PPI)
+-- Purification / enrichment
+-- +-- Yeast 2-Hybrid
+-- +-- Immunoprecipitation
+-- +-- Native PAGE
+-- +-- 2D Blue native and SDS-PAGE
+-- +-- Tandem affinity purification (TAP)
+-- Limitations of PPI investigations
+-- Summary
Lecture 26
+-- Exam


Abstract: The genome sequences of important model systems are available and the focus is now shifting to large-scale experiments enabled by this data. Following in the footsteps of genomics, we have functional genomics, proteomics, and even metabolomics, roughly paralleling the biological hierarchy of the transcription, translation, and production of small molecules. Proteomics is initially concerned with determining the structure, expression, localization, biochemical activity, interactions, and cellular roles of as many proteins as possible. There has been great progress owing to novel instrumentation, experimental strategies, and bioinformatics methods. The area of protein-protein interactions has been especially fruitful. First-pass interaction maps of some model organisms exist, and the proteins in many important organelles are about to be determined. Researchers are also beginning to integrate large-scale data sets from various “omics” disciplines in targeted investigations of specific biomedical areas and in pursuit of a general framework for systems biology.


Cardiovascular calcification is an insidious form of ectopic tissue mineralization that presents as a frequent comorbidity of atherosclerosis, aortic valve stenosis, diabetes, renal failure, and chronic inflammation. Calcification of the vasculature and heart valves contributes to mortality in these diseases. An inability to clinically image or detect early microcalcification coupled with an utter lack of pharmaceutical therapies capable of inhibiting or regressing entrenched and detectable macrocalcification has led to a prominent and deadly gap in care for a growing portion of our rapidly aging population. Recognition of this mounting concern has arisen over the past decade and led to a series of revolutionary works that has begun to pull back the curtain on the pathogenesis, mechanistic basis, and causative drivers of cardiovascular calcification. Central to this progress is the discovery that calcifying extracellular vesicles act as active precursors of cardiovascular microcalcification in diverse vascular beds. More recently, the omics revolution has resulted in the collection and quantification of vast amounts of molecular-level data. As the field has become poised to leverage these resources for drug discovery, new means of deriving relevant biological insights from these rich and complex datasets have come into focus through the careful application of systems biology and network medicine approaches. As we look onward toward the next decade, we envision a growing need to standardize approaches to study this complex and multifaceted clinical problem and expect that a push to translate mechanistic findings into therapeutics will begin to finally provide relief for those impacted by this disease.


Systems Biology is widely accepted as a major future direction of biological research. The ethos of Systems Biology is to generate, analyse and integrate multiple data sets for understanding and modelling a biological system. We want to know the components (molecules) of the system, how they work/interact together, and, ideally, have some quantitation: the abundance of a particular component and/or the rates of action/interaction. Due to technological advances within molecular biology, we are now able to obtain quantitative information about molecules within a biological system on both small and large scales.

The purpose of this module is to introduce students to the basic concepts of Systems Biology. The module includes subjects relevant to prokaryotic and eukaryotic systems and is thus suitable for all bioscience students. Learning methods include lectures, seminars, computational practical sessions, article discussions, workshops, and research and problem solving during both lectures and computer-based investigations.
