From: Peter Van Osta
Subject: Human Cytome Project - Update 24 Jan. 2005
Date: Mon, 24 Jan 2005 09:38:24 +0100

A Human Cytome Project - an idea
Hi,
As the on-line version of my article on the Human Cytome Project and the application of cytomics in medicine and drug discovery (pharmaceutical research) evolves, I put the updated version in this newsgroup for reference. The original "question" on a Human Cytome Project was posted in this newsgroup on Monday 1 December 2003.
Original on-line version:
http://ourworld.compuserve.com/homepages/pvosta/humcyt.htm
A Human Cytome Project - an idea
Introduction
The completion of the Human Genome Project holds many promises for the understanding of the genetics of man and the involvement of genes in human diseases. However, if we want to use this knowledge to improve medicine more efficiently, the use of this information has to be viewed from a different perspective than is currently being done. Predicting the dynamics of the cell and its fate in diseases from the genome upwards is likely to fail due to the complexity of metabolic processing and environmental influences on the cellular metabolism and the entire organism.
The clinical reality of disease processes extends beyond the present-day disease models and the (current) boundaries of scientific development. When we close the doors of our labs behind us and as physicians are confronted with the clinical reality of diseases in the outside world, our disease models fail all too often, as we can witness in the diagnosis and treatment of complex diseases. This is also painfully obvious in the dramatically high attrition rates during clinical development of new drugs.
When the endpoint of research is not only an experiment in a laboratory, but an impact on the clinical reality of everyday pathological processes, we fail to deliver for more than 80 to 90 percent of all drugs being developed. Reality extends beyond the frontiers of science. Outside the boundaries of scientific knowledge, significant parts of (biological/clinical) reality remain unexplained and not well understood.
Drug discovery has to come up with drugs which can stand the test of clinical reality, but is being squeezed between the failing (theoretical) disease models and the demands for success of pharmaceutical companies and society. Applied research has to provide the stepping stones to cross the river from basic theoretical disease models to clinical reality, ideally without getting our feet wet or drowning before we reach the other side of the river.
How do we close the gap from model to clinic? The functional correlation between genome structure and clinically expressed disease is too low to lead to functional predictions from the genome and even proteome level upwards, without taking into account the spatial and temporal dynamics of cells, organs and organisms. Pathological processes have to be viewed from another organizational level of biology in order to capture the dynamics of in-vivo processes involved in diseases.
The current bottom-up view on genomic and proteomic research suffers from a correlation and prediction deficit in relation to the entire organism. The genome and proteome are the omega of biological research, not the alpha of drug discovery or disease treatment. From disease to gene we may find a link, but turning around and going back to develop a treatment for the clinical disease fails in many cases. Discovering the involvement of a gene or protein in a disease does not predict the potential for successful development of a treatment for the clinical disease entity as such.
The extraction of the appropriate attributes of a biological process in health and/or disease requires capturing the spatial and temporal dynamics of its manifestations at multiple scales and dimensions of biological organization. Disease entities express themselves in a space-time continuum in which their physical and chemical attributes evolve in a highly dynamic way. Capturing the appropriate features and disease-describing parameters from the background noise of their surrounding processes and structures is more difficult than finding a needle in a haystack.
On Monday 1 December 2003 I posted a message about the idea of a Human Cytome Project (HCP) to the bionet.cellbiol newsgroup (Van Osta P, 2003). It seems that it was the right moment to ask the question, as there were already ideas emerging on the role of the cell as the final arbiter in the production of metabolic products and also the concept of predictive medicine by cytomics (Valet G, 2003).
The idea of a Human Cytome Project is already being discussed at scientific conferences (FOM 2004, ISLH 2004, ISAC XXII, EWGCCA 2004 …). At Focus on Microscopy (FOM) in Philadelphia on Wednesday afternoon, 7 April 2004, the idea of a Human Cytome Project was discussed at a scientific meeting for the first time. A round table discussion was held at the European Microscopy Congress (EMC) and articles on the idea are already starting to appear (Valet G, 2004; Valet G, 2004b; Valet G, 2004c). As the idea of a Human Cytome Project seems to have generated some interest in the scientific community, I decided to put the original message and question on my personal website for reference, so here it is.
Monday, 1 December 2003 10:57:46 +0100
Hi,
I was wondering if there is already something going on to set up a sort of "Human Cytome Project"? In my opinion the hardware and most of the software seem to be available to set up such a project? For the cellular level, light-microscopy based reader technology would be very interesting to use?
Studying and mapping the genome, transcriptome and proteome at the organizational level of the cell for various cell types and organ models could provide us with a lot of information of what actually goes on in organisms in the spatio-spectro-temporal space?
I have been thinking (working) about a concept which could provide the basic framework for exploring and managing this cellular level of biological organization research on a large scale, but I would like to know if there is already some thought/work going on in the direction of setting up an initiative such as a "Human Cytome Project" ?
This is just an idea, so I am really interested to hear if there is something in it, or even if it is not worth while what I just wrote.
Best regards,
Peter Van Osta.
The path which led to the idea of a Human Cytome Project
I will give a bit more background to the path which for me has led to the idea that something like a Human Cytome Project might be feasible. The idea for large scale screening of the dynamics of the (living) cell came when I visited the Sanger Center in the UK in 2001 and was shown a big room filled with DNA-sequencers. From then on I wanted to create a system which could mean for cell-based research what DNA-sequencing had meant for Human Genome research.
However I did not want to create a catalog of the cytome, but to allow for the functional exploration of the cell in order to capture and describe the dynamics of cellular processes and not only create a catalog of its components. The multidimensional world of the cell requires a higher-dimensional approach than the linear world of DNA and also a different inner- and outer resolution is needed for each level of biological integration. It became clear to me that the cellular level is the lowest level of biological organization close enough to the complex dynamics of a disease process. Only a high correlation to the disease process itself allows a model to be used as a valid disease model.
Today powerful techniques to explore the cytome are available, such as flow cytometry (Edwards B.S., 2004) and advanced digital microscopy (Price J. H., 2003; Tsien R, 2003), which enable the exploration of cellular function and phenotype. There are now exciting technological developments going on in what is called High Content Screening which will allow us to explore cellular systems on a large scale (Taylor DL, 2001; Giuliano KA, 2003). These developments and other technological advances made me feel confident that the exploration of the human cytome would be feasible. We should be able to open the door to the cell wide open to look at cellular structure and dynamics better than we do now by just looking through the keyhole.
My personal interest and research
I myself wanted to know if a system to explore cells on a very large scale could be implemented and would work. Managing the flow of data from physics to features is the centerpiece of such a system. I wanted to transform the space-time continuum of biological processes in cells into their digital representations on a truly massive scale. Once a process is represented in a digital state it becomes accessible to quantitative content extraction and analysis.
As technologies evolve, it should be easy to exchange components of a system or expand it with new technologies. The system should therefore be modular and scalable. The core of the system should be of a different design than the interface to the outside world, and the two should evolve separately, only linked to each other for the exchange of information. The concept should allow for up-scaling the system for processing massive amounts of high-dimensional data.
The core has to be able to deal with multidimensional spaces and datasets and manage the dataflow between modules, each module dealing with a part of the entire process, from acquisition and detection to data generation. From center to periphery, the system becomes increasingly machine and technology related, while the core is only a data-transfer module unaware of technical or physical constraints. Each machine which becomes connected to the core enables the exploration of a subset of a physical (3D, spectral, temporal) space and informs the core about its capacities and restrictions.
A device attached to the system as such should allow for the exploration of a part of this spatio-spectro-temporal continuum. Devices differ in their sampling of the electromagnetic spectrum (LM, EM, CT, NMR …), the spatial scale at which they can operate (nm, microns, mm …) and their temporal resolution (nsec, msec, sec, min …). A given device has an inner and outer spatial, spectral and temporal resolution limit. All (imaging) devices generate pixel or voxel density profiles which can be used for (semi-) quantitative exploration. A given input data point represents a spatial, spectral and temporal sampling of the spatio-spectro-temporal continuum.
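The way a device could inform the core about its capacities and restrictions can be sketched in a few lines of Python. This is only an illustration under my own assumptions and not the M5 framework itself: the class names, axis names and all resolution limits below are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AxisRange:
    """Inner (finest) and outer (largest) limit a device can sample on one axis."""
    inner: float
    outer: float

    def covers(self, value: float) -> bool:
        return self.inner <= value <= self.outer


@dataclass
class Device:
    """A device at the periphery; it reports its limits per axis to the core."""
    name: str
    limits: dict  # axis name -> AxisRange (axis names are hypothetical)


@dataclass
class Core:
    """The core only stores what devices report; it knows no physics itself."""
    devices: list = field(default_factory=list)

    def register(self, device: Device) -> None:
        self.devices.append(device)

    def candidates(self, request: dict) -> list:
        """Names of devices whose reported limits cover every requested axis value."""
        return [
            d.name
            for d in self.devices
            if all(axis in d.limits and d.limits[axis].covers(value)
                   for axis, value in request.items())
        ]


# Illustrative numbers only, not specifications of real instruments.
core = Core()
core.register(Device("light_microscope", {
    "spatial_um": AxisRange(0.2, 1e4),
    "temporal_s": AxisRange(1e-2, 1e5),
}))
core.register(Device("electron_microscope", {
    "spatial_um": AxisRange(1e-3, 1e2),
    "temporal_s": AxisRange(1.0, 1e4),
}))

print(core.candidates({"spatial_um": 0.01}))
print(core.candidates({"spatial_um": 1.0, "temporal_s": 0.1}))
```

The point of the sketch is the direction of the information flow: the periphery describes itself, and the core selects devices without containing any instrument-specific knowledge.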
The physical dimensions of the high-dimensional space and the meaning of each pixel/voxel are only relevant for the quantification module, as the detection module only deals with “density” patterns in a 5D space. Anisotropy in spatial, temporal and spectral sampling is only accounted for at the periphery of the system, as it has an impact on the quantification of objects. Each dimension (XYZ, spectral, temporal) is regarded as a continuum, sampled at discrete intervals, each with its own inner and outer resolution.
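The separation between detection and quantification can be illustrated with a small Python sketch. It is a simplified example under my own assumptions (nested lists as the 5D dataset, a plain threshold as the detector, hypothetical function names), not a description of an existing implementation.

```python
def detect(density, threshold):
    """Detection: works on indices only and knows nothing about physical units.

    `density` is a nested list indexed as [t][c][z][y][x] (time, channel, 3D space).
    Returns the index tuples of all samples above the threshold.
    """
    hits = []
    for t, channels in enumerate(density):
        for c, stack in enumerate(channels):
            for z, plane in enumerate(stack):
                for y, row in enumerate(plane):
                    for x, value in enumerate(row):
                        if value > threshold:
                            hits.append((t, c, z, y, x))
    return hits


def quantify_volume(hits, t, c, spacing_um):
    """Quantification: only here does the (anisotropic) sampling enter.

    `spacing_um` is (dz, dy, dx) in micrometres; returns a volume in cubic micrometres
    for the detected voxels at time point `t` and channel `c`.
    """
    dz, dy, dx = spacing_um
    n_voxels = sum(1 for hit in hits if hit[0] == t and hit[1] == c)
    return n_voxels * dz * dy * dx


# One time point, one channel, a 2 x 2 x 2 stack; the values are made up.
density = [[[[[0.9, 0.1], [0.8, 0.2]],
             [[0.7, 0.0], [0.3, 0.4]]]]]

hits = detect(density, threshold=0.5)
# Axial sampling (2.0 um) is coarser than lateral sampling (0.5 um): anisotropic voxels.
volume = quantify_volume(hits, t=0, c=0, spacing_um=(2.0, 0.5, 0.5))
print(len(hits), volume)
```

The same list of detected indices could be quantified with a different spacing when it comes from another device, without touching the detection step.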
The system design allows for distributed operation, so a system could run on different platforms and interact with components over a network. It should use open standards for its communication with the outside world to allow for easy integration in a heterogeneous environment (XML, CORBA …). The output of the system should be a set of linked feature hyperspaces, each describing structural and functional aspects of the individual cell and its components. The data output must be in a format which can easily be parsed and fed into data analysis and visualization systems.
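As an illustration of an output format which can easily be parsed, the following Python sketch writes per-cell feature values to XML and reads them back using only the standard library. The element and attribute names (cells, cell, feature) and the feature names are hypothetical; they are not an existing schema.

```python
import xml.etree.ElementTree as ET


def features_to_xml(cells):
    """Serialize {cell_id: {feature_name: value}} to an XML string."""
    root = ET.Element("cells")
    for cell_id, features in cells.items():
        cell = ET.SubElement(root, "cell", id=str(cell_id))
        for name, value in sorted(features.items()):
            feature = ET.SubElement(cell, "feature", name=name)
            feature.text = repr(float(value))
    return ET.tostring(root, encoding="unicode")


def xml_to_features(text):
    """Parse the XML string back into {cell_id: {feature_name: value}}."""
    root = ET.fromstring(text)
    return {
        cell.get("id"): {f.get("name"): float(f.text) for f in cell.findall("feature")}
        for cell in root.findall("cell")
    }


# Made-up feature values for two cells.
cells = {
    "cell_1": {"nuclear_area_um2": 85.5, "mean_intensity": 0.42},
    "cell_2": {"nuclear_area_um2": 102.0, "mean_intensity": 0.37},
}

document = features_to_xml(cells)
restored = xml_to_features(document)
print(document)
```

Any analysis or visualization system with an XML parser could consume such a document, which is the reason for preferring an open, self-describing format at the boundary of the system.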
Since 2001 I have been thinking about, and working on, the design of such a scalable system. The first version of the M5 framework is now operational, which allows me to study its practical use in more detail (Van Osta P., 2004). This core is being built into a system for the exploration of cells, tissues and model organisms by using a microscopy based reader.
The roots and predecessor of my own work
The predecessor of this system and a source of inspiration dates back to the late eighties and early nineties of the twentieth century (Geerts H, 1987; Ver Donck L, 1992; Cornelissen F, 1993; Geerts, H, 1992; Geusebroek J.M., 2000; Van Osta P, 2002).
This use of digital microscopy in drug discovery originated from Nanovid microscopy long ago (De Mey J., 1981; De Brabander M., 1986; De Brabander M, 1986b; Geuens G, 1986; Geerts H, 1987; De Brabander M, 1989; Geerts H., 1991). Nanovid microscopy itself had its origin in the study of microtubules (De Mey J., 1976; De Brabander M., 1977). Automated calcium (Ca2+) ratio imaging was used for studying the effect of drugs on isolated cardiomyocytes. This research dates back to the mid-eighties of the twentieth century (Borgers M, 1985; Ver Donck L, 1986; Ver Donck L, 1987; Borgers M, 1988; Ver Donck L, 1988; Ver Donck L, 1990; Geerts H, 1989; Olbrich HG, 1991; Ver Donck L, 1991; Ver Donck L, 1992; Cornelissen F, 1993; Ver Donck L, 1993; Cornelussen RN, 1996).
Drug discovery research by using cellular disease models with automated microscopy based systems was done in this environment for many years, before it became fashionable in the outside world (Geerts H, 1989; Ver Donck L, 1992; Cornelissen F, 1993; Nuydens R, 1993; Nuydens R, 1995; Nuydens R, 1995b; Geerts H, 1996; Nuydens R, 1998).
Why a Human Cytome Project?
Human Genome Project
The Human Genome Project (Lander ES, 2003; Venter JC, 2003) has set a new milestone in medicine and the understanding of human biology (Guttmacher, A., 2002; Guttmacher, A., 2003). Since its conception in 1986, it has answered many questions, but it has also left us with more questions to answer and it opened new horizons for exploration (Dulbecco R., 1986; Collins F., 2003). The results of the Human Genome Project led to a first estimate that there are only about 34,000 genes in the human genome and by the end of 2003 the number was reduced to some 25,000 genes (Claverie J.-M., 2001; Wright F. A., 2001; Pennisi E., 2003). Now, at the end of 2004, the euchromatic sequence of the human genome is complete and the number of genes is estimated to be about 20,000 to 25,000 (Collins FS, 2004).
The Caenorhabditis elegans (C. elegans) genome comprises over 18,000 genes. The fruit fly (D. melanogaster) genome consists of about 13,000 genes and as such it has fewer genes than C. elegans, although as an organism it is far more complex. Gene number alone does not predict functional complexity. Although there is much more variation in the sizes of the genomes, this is not reflected in the number of genes.
The functional uncoupling of the dynamics of cellular function from its genomic gene-count came as a shock. The complexity and diversity of organisms is not reflected in the structural complexity of their genomes alone, but to a large extent it is hidden in the dynamics of gene expression and cellular processing. As there is no linear relation between the complexity of an organism and the physical structure of its genome, there is also no one-on-one relation between the phenotype of an organism and its genome. Relatively small genomic differences between organisms, such as man and chimpanzee, do result in large functional differences in gene processing and functional expression.
The structural relatedness of the human and chimpanzee genome does not explain the large difference in brain function, for which gene expression profiles in the brain are a better predictive instrument (Caceres M, 2003; Uddin M, 2004). Functional differences between chimpanzee and man are more pronounced in the brain than in other organs. Gene expression differences are more related to cerebral physiology and function in humans than gene sequences. Epigenetic phenomena within individual cells and differential processing in different cell types have more predictive power than the piecemeal and one-dimensional gene sequence approach, when applied to complex structures such as the brain (Wilson KE, 2004).
From single gene and genome to the entire cell
Now that we are starting to use the information coming out of the Human Genome Project, people start to understand that the dynamics of the cell and its fate in disease processes cannot simply be explained from its individual genes, genome or its proteome. Although all cells in the human body share the same genome, there is considerable heterogeneity in their phenotype and dynamics. Structural information alone or information from too low an organizational level cannot sufficiently predict higher-order phenomena as it does not sufficiently take into account interactions at higher organizational levels and influences from outside the low-level organizational unit. Cells have come up with compensation mechanisms to maintain their structural and functional integrity in the face of perturbations and uncertainty (Stelling J, 2004). Organisms are capable of buffering genetic variation (Hartman JL 4th, 2001). Genetic buffering mechanisms modify the genotype-phenotype relationship by concealing the effects of genetic and environmental variation on phenotype (Rutherford SL., 2000).
So if the structure of the genome alone cannot explain the differences between species, disease processes and the dynamics of the cell, where do our functional complexity and interspecies differences come from? How do we continue in the post-genome era to study the dynamics of the cell and entire organisms? How are genes related to the function of an organism and where do we lose track? These questions are not of academic importance alone, but their answers have a significant impact on the diagnosis and treatment of (complex) diseases, drug discovery and development.
Let us take a walk from gene to protein and take a closer look at “The Central Dogma of Molecular Biology”, which I personally prefer to call an axiom instead of a dogma. Science should only have axioms and leave dogmas to religion.
Associating genes with diseases
In order to start studying the contribution of a certain gene to a disease we must first find the gene(s) which might play a role in a given disease. The strength of the association must be detectable by the method being applied, which in complex gene-disease relationships has to find the association on a background of significant functional and phenotypical noise, such as in multifactorial diseases like diabetes (Doria A., 2000). Variation in the phenotypical expression of many quantitative traits (length, weight …) is due to the simultaneous segregation of multiple quantitative trait loci (QTL) as well as environmental influences. Genetic dissection of complex traits and quantitative trait loci is a complex process (Darvasi A., 1998; Darvasi A, 2002). A mono-factorial approach is likely to fail in a multifactorial process of pathogenesis (Templeton AR., 1998).
Giving a gene its place in a disease process is not a trivial endeavour and it is complicated by both technological and methodological difficulties. Association studies offer a potentially powerful approach to identify genetic variants that influence disease processes (Lohmueller KE, 2003; Roeder K, 2005). The density of Single Nucleotide Polymorphisms (SNP) makes them a popular target for studying gene-disease associations. However it is not only the density alone which counts, but also the information content of a given polymorphism (Bader JS. 2001; Ohashi J, 2001; Byng MC, 2003; Chapman JM, 2003; Garner C, 2003).
False positive correlations of genetic markers with disease are reported due to a flawed statistical analysis (Nurminen M., 1997; Edland SD, 2004; Wacholder S, 2004). In microarray experiments, defining the appropriate sample size to find differentially expressed genes is an important issue (Wang SJ, 2004). In complex diseases, in which multiple genes and the dynamics of gene products play a role, associating particular genes with a disease entity is even more difficult than in so-called monogenic diseases (Carey G., 1994; Long AD, 1999). Proper subgroup analyses in a randomised controlled trial (RCT) require careful design (Brookes ST, 2001).
Establishing the role of a gene in the actual causation of a disease process is a much bigger step than finding and establishing a positive gene-disease correlation (Templeton AR., 1998).
From genome sequence to gene activity
The genome sequence alone does not allow us to predict the functional impact of sequence variations as epigenetic modulation influences functional gene expression. Epigenetic modulation of gene function is a cause of non-Mendelian inheritance patterns and variability in the expression and penetrance of a disease. Even transmission of an identical gene sequence is not a guarantee for identical gene expression as the (in)-activation of a gene by epigenetic modulation occurs differently when a gene is of paternal or maternal origin.
Epigenetic modulation of gene expression is heritable during cell division but is not contained within the DNA sequence itself (Reik W, 2001; Bjornsson HT, 2004; Kelly TL, 2004; Chong S, 2004). Epigenetic modulation is one of the problems encountered when cloning, as the cloning process differs in its epigenetic regulation of (embryonic) gene expression (Mann M, 2002).
This differential inactivation of genes from maternal and paternal origin even leads to functional X-chromosome mosaicism in women as their cells at random inactivate one of their X chromosomes. X-inactivation occurs early in embryonic development and all cells subsequently inherit a different functional X chromosome. The inactivated X chromosome can be seen in a microscope as a Barr body in the interphase nuclei of female mammals. Differential activation of genes creates a functional chimera.
Chemical modification by methylation of cytosine residues is a major regulator of mammalian genome function and plays an important role in the intra-uterine development of an organism and the regulation of gene expression (Urnov FD, 2001). Tissue specific imprinting in genes leads to differential gene expression in different tissues (Weinstein LS, 2001). Aberrant DNA methylation has been implicated in the pathogenesis of a number of diseases associated with aging, including cancer and cardiovascular and neurological diseases (Walter J, 2003; Jiang YH, 2004; Macaluso M, 2004). A dietary component such as folic acid is a key component of DNA methylation during in utero development, disease development and aging (McKay JA, 2004). Genes and environment interact and this might play a critical role in the pathogenesis and inheritance of complex diseases (Vercelli D, 2004).
Even the first steps in the expression of a gene do not show a one-on-one relation to the gene sequence itself. The relation of gene structure to its function is a bidirectional process, and our understanding of the impact of the different modulators is still not sufficient to create highly correlating disease models.
From gene to protein, a bumpy road
A eukaryote, such as Homo sapiens, has no one-on-one relation to its genes. The dynamics of gene expression is regulated by hypo-, iso- and epigenetic operators. The gene may be the structural unit of inheritance, but the protein domain is the functional unit of metabolism.
When we talk about protein structure, the primary structure refers to the amino acid sequence in a protein (1D). The primary structure is most closely related to mRNA and as such the gene sequence and gene structure from which the protein originates. The terms secondary and tertiary structure refer to the 3D conformation of a protein chain. Secondary structure refers to the interactions of the backbone chain (alpha helical, beta sheet, etc.). Tertiary structure refers to interactions of the side chains. Quaternary structure refers to the interaction between separate chains in a multi-chain protein (4D). The combined shape of the secondary and tertiary structure and the quaternary structure is referred to as the conformation of the protein. With increasing dimensionality, the relation between a higher order organization of protein structure and its gene relaxes as other physical and chemical influences play an increasingly important role in its physical and functional integrity.
In a mature enzyme, only a relatively small number of its amino-acids interact with a ligand, the majority of amino-acids help to create the appropriate 3D and even 4D structures required for its in-vivo functionality. Structural proteins and enzymes may show interactions over larger parts of their molecular surface to form functional homo- or hetero-polymers in their quaternary structure. From a single gene to a protein, we have to deal with the dynamics of gene expression regulation and mRNA formation (promoters, cis- and trans-regulation, transcription, splicing). We have to deal with the interaction of tRNA with mRNA in the translation of an mRNA sequence into a protein sequence and post-processing of the protein sequence into a functional 3D and 4D structure (Wobble, sequence processing, protein folding and interaction).
A structural similarity at the genome level does not lead to functional similarity, due to epigenetic regulation (Eckhardt F., 2004). Sequence variation due to mutations does not bleed through to the protein level one-on-one. Basic mechanisms act as powerful uncouplers of gene structure from protein function. Mutations in the DNA and errors during transcription of the DNA-sequence into mRNA are not linearly predictive for the structure and function of the protein resulting from the translation of the DNA-sequence into the protein-sequence, due to the degeneracy of the genetic code. The deleterious effects of sequence variations are to a certain extent suppressed by the Wobble-mechanism used in base-pairing in translating mRNA to protein (Crick F, 1966).
Protein sequence = k x gene sequence
In this formula, ‘k’ is smaller than one for most amino acids built into a protein, due to mechanisms such as splicing variation and the Wobble mechanism.
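How strongly the degenerate genetic code alone buffers single-base changes can be counted directly from the standard codon table. The Python sketch below is my own illustration and not taken from the cited literature. It uses DNA codons, considers only single-base substitutions in the 61 sense codons, and ignores splicing and all other mechanisms that contribute to ‘k’.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA alphabet.
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... with the third base cycling fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_counts():
    """For each codon position, count single-base substitutions in sense codons.

    Returns a list of (synonymous, total) tuples for positions 1, 2 and 3.
    A substitution is synonymous when the mutant codon encodes the same amino acid.
    """
    counts = []
    for position in range(3):
        synonymous = total = 0
        for codon, amino_acid in CODON_TABLE.items():
            if amino_acid == "*":  # skip stop codons
                continue
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                total += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous += 1
        counts.append((synonymous, total))
    return counts


for position, (synonymous, total) in enumerate(synonymous_counts(), start=1):
    print(f"codon position {position}: {synonymous} of {total} substitutions are silent")
```

Most of the silent substitutions fall on the third codon position, the position where Wobble base-pairing occurs, while changes at the second position of a sense codon always alter the encoded amino acid or introduce a stop.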
In eukaryotes, a genome that is relatively simple compared to the functional and structural complexity of the organism can be used, because of the existence of introns and exons. An exon in general defines a functional domain and these domains are rearranged to create a more complex proteome than the genome it is derived from. Constitutive and alternative splicing of genes is dynamically regulated at the moment of transcription and pre-mRNA splicing by cis- and trans-acting factors (Kornblihtt AR, 2004). Before the completion of the Human Genome Project it was expected that man would need about 100,000 genes to explain the structural and functional complexity of our species. This number has collapsed to about 25,000 genes, which is a factor of four (75 percent) lower than expected (Collins FS, 2004). The functional differences between species are more related to differential processing, due to different up- and down-regulation of genes in different cell types and organs. Different promoters and splicing variants are used to tune protein and enzyme structure and function in different cell locations and organs (Ayoubi TA, 1996; Masure S, 1999; Nogues G, 2003; Yeo G, 2004). Promoter variation and differential splicing allow for spatiotemporal differentiation in protein expression, while the organism does not have to manage an explosion in genomic size and sequence-complexity. This mechanism helps to uncouple the protein from the rigidity of the gene sequence in order to allow for functional variation while restricting structural variation at the genome level. Functional differentiation in gene expression allows for a better adaptability to changing conditions, without the need for fast-paced changes in gene structure.
Protein folding of a linear amino-acid sequence into a 3D protein also acts as a functional uncoupler of gene sequence to protein function. Changes in the physical and chemical environment of the protein may change the shape and alter the conformation of a protein. Putting a protein in a different physical and chemical environment affects the van der Waals, hydrogen, ionic and covalent bonds which hold the protein together in its particular conformation; breaking those bonds can cause the molecule to unfold and change or even lose its function (denaturation). 3D and 4D protein folding is a complex process. Even today the protein folding problem remains one of the most basic unsolved problems in computational biology. Predicting protein folding from the gene upwards ignores the influence of the post-translational modification (PTM) and the influence of the in-vivo physico-chemical environment of the protein. Proteoglycans and glycoproteins are not derived from a gene sequence as such, but their structure is the result of extensive post-translational modification. Cell membranes contain phospholipids, which are not encoded by DNA as such, but they result from metabolic processing and nutritional components.
While the protein-sequence at the moment of translation is related to the gene-sequence, the final structure and function of an enzyme is in addition defined by post-translational modification (PTM) and its physico-chemical environment (Kukuruzinska MA, 1998; Uversky VN, 2003; Schramm A, 2003; Seddon AM, 2004). Studying protein folding is a computationally complex process and still the focus of intensive research (Murzin A. G., 1995; Orengo, C.A., 1997; Dietmann S, 2001; Day R, 2003; Harrison A, 2003; Pearl F, 2005). Epicellular regulation of protein glycosylation also plays an important role in the dynamics of protein activity (Medvedova L, 2004).
The majority of proteins are subjected to a multitude of post-translational modifications. Post-translational modification involves cleaving, attaching chemical groups (prosthetic groups) and internal cross-linking (disulfide bonds). Already more than a hundred different types of PTM are known, which act as functional uncouplers of protein structure from the gene sequence (Hoogland C, 2004). A protein precursor may be differently processed in different cell types and, in addition, diseased cells may process a given precursor abnormally (Dockray GJ., 1987; Poly WJ., 1997; Rehfeld JF., 1990; Rehfeld JF, 2003). Post-translational protein modifications finely tune the cellular functions of each protein and play an important role in cellular signaling, growth and transformation (Parekh RB, 1997; Seo J, 2004).
In a functional protein only a very few specific residues are actually responsible for enzyme activity, while the fold is much more closely related to ligand type (Martin AC, 1998). The effect of an amino-acid change on protein structure and function depends on the location of the amino-acid in the 3D structure, its physico-chemical properties and the physico-chemical environment in which it is processed and used. Amino-acids which are distant neighbours in the protein sequence can become close neighbours in the 3D structure of the protein and as such a protein sequence variation is only a weak determinant of the function of a mature protein.
By just going from DNA-sequence to 3D protein structure, the relation between genome sequence and the functional status of a cell begins to fade. By taking this relation even further from gene to organism, we lose additional predictive power. How will we be able to design models that will allow us to predict the functional outcome of a disease, when we use a fuzzy model to start with? Powerful uncouplers of the structural relation of even a protein to the gene it is primarily derived from do not allow us to draw hard conclusions about the impact on the functional status of an organism from the gene and genome sequence.
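How differential splicing can create a proteome more complex than the genome it is derived from can be made concrete with a small Python sketch. It is a deliberately simplified illustration under my own assumptions: exons are either constitutive (always included) or cassette exons (included or skipped), the exon names are hypothetical, and the count is an upper bound, because in a living cell splicing is regulated and not every combination is actually produced.

```python
from itertools import product


def isoforms(exons):
    """Enumerate transcript isoforms for a list of (exon_name, is_constitutive).

    Constitutive exons are always included; each cassette exon is either
    included or skipped, so n cassette exons give at most 2**n isoforms.
    """
    choices = [
        [name] if constitutive else [name, None]
        for name, constitutive in exons
    ]
    return [
        tuple(exon for exon in combination if exon is not None)
        for combination in product(*choices)
    ]


# A hypothetical gene with two constitutive and two cassette exons.
gene = [("E1", True), ("E2", False), ("E3", False), ("E4", True)]
for isoform in isoforms(gene):
    print("-".join(isoform))

# The upper bound grows exponentially with the number of cassette exons.
print(len(isoforms([(f"E{i}", False) for i in range(10)])))
```

Under these assumptions a single gene with ten cassette exons already allows for up to 1,024 different transcripts, which shows why gene number alone says little about the size of the proteome.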
From proteome to cell
Eukaryotic cells are highly compartmentalized; proteins do not exist in the cell as in a homogeneous fluid, but in different compartments of the cell, each with a different physico-chemical environment. The 3D and 4D structure of a protein and its functionality are highly dependent on the in-vivo physico-chemical environment of the protein.
Studying proteins without taking into account their spatial and temporal organization in a cell, ignores the complexity and dynamics of protein expression and interaction in a cell. Studying proteins in-vivo reveals more about their function and dynamics (Chen, X., 2002; Hesse J, 2002; Pimpl P, 2002; Viallet PM, 2003; Murphy R. F., 2004). Without information about the relation between cellular structure and function, a lot of information is lost. A 2D protein-profile may show the entire protein content of a cell, but we lose all information about the intracellular spatial and temporal distribution of these proteins.
Eukaryotic cells are highly spatially differentiated structures. Proteins involved in trans-membrane trafficking, require a membrane to do their work and cannot do their work outside this specific physico-chemical environment. A protein has to reach the appropriate physico-chemical environment in the cell in order to do its work properly (Graham TR., 2004). Studying a protein outside its in-vivo physico-chemical context leads to a loss of correlation with its in-vivo dynamics.
There are three main cellular compartments in a eukaryotic cell: the nucleus, the cytoplasm and the cell membrane. The nucleus itself is a highly organized 3D structure with highly spatially and temporally differentiated DNA- and RNA-processing machinery (Lamond AI, 2003; Politz, J., 2003; Pombo, A., 2003; Iborra F, 2003; Cremer T, 2004). Both transcription and splicing of the mRNA message are carried out in the nucleus (Sleeman JE., 2004). The distribution of eu- and heterochromatin changes throughout the cell cycle; chromosomes and spindles appear during cell division. The dynamics of gene transcription is visible in the chromatin condensation patterns in the nucleus.
The cytoplasm itself contains several organelles: smooth and rough endoplasmic reticulum (SER and RER), ribosomes, the Golgi apparatus, mitochondria and lysosomes. Each organelle deals with a different set of processes necessary for cell development and maintenance. The membranes of organelles are highly dynamic structures which undergo profound changes during the life cycle of a cell (Ellenberg, J. 1997; Zaal, K. J. M., 1999). The endoplasmic reticulum (ER) is a multifunctional signalling organelle that controls a wide range of spatially and temporally differentiated cellular processes (Berridge MJ., 2002).
The structural compartmentalisation of the intracellular environment allows for a functional differentiation and provides a process flow management mechanism. The membrane structure and the mitochondrial membrane potentials (MMP) of mitochondria play an important role in their function (Zhang H., 2001; Pham N.A., 2004). Microtubules play an important role in cellular function and their organization and dynamics are being studied by microscopy based techniques (De Mey J., 1981; De Brabander M., 1986; Geuens G, 1986; De Brabander M, 1989; Geerts H., 1991; Olson KR, 1999).
The dynamics of intracellular ion-fluxes, such as those of calcium (Ca2+), are organized in a highly dynamic and spatially and temporally complex pattern. Ions are themselves not encoded by the genome, but play an important role in cellular function. The intra- and extra-cellular dynamics of ions (concentration, flux) interact with a spatially and temporally regulated pattern of protein expression and differential protein activity. The complexity of intracellular calcium-signaling extends beyond the mere expression profiles of genes encoding the proteins involved in calcium-dynamics (Berridge MJ., 1981; Bootman MD, 2002; Cancela JM, 2002; Berridge MJ., 2003; Berridge MJ, 2003b). For their proper function and survival cells have to manage Ca2+ concentration and flux in space, time and amplitude (Bootman MD, 2001). Calcium is involved in the delicate process of spatial and temporal organization of cellular communication (Berridge MJ., 2004).
As an example of spatial compartmentalisation in the cell, hydrolytic lysosomal enzymes require a specific physical and chemical environment to do their work, which inside the cell only exists inside the lysosomes (De Duve C, 1955). The boundary membrane of the lysosome keeps the hydrolytic enzymes away from the rest of the cytoplasm and so controls what will be digested (De Duve C., 1966).
The cell membrane separates the interior of the cell from its environment, but is a highly dynamic structure (Kenworthy, A. K., 1998; Varma, R., 1998). The appropriate spatial and temporal dynamics of the cell membrane are vital for the survival of the cell. The cell membrane provides the physical boundaries in which the cell can maintain a highly dynamic physical and chemical environment. Cell-to-cell communication is dynamically managed at the level of the cell membrane (Nohe A, 2004).
Proteins do their work in spatially different cellular environments and with different spatial and temporal patterns. A protein can be mobile in one cellular compartment and immobile in another (Ellenberg J., 1997). Co-expressed proteins may in reality never interact with each other because they do their work in separate cellular compartments. The substrates of proteins may migrate through different cellular compartments in order to be subjected to a highly dynamic interplay of enzymatic processes. Proteins which do their work in the same cellular compartment may only be expressed at different stages during the life cycle of a cell.
An important temporal differentiation of cellular processes occurs during the cell cycle. The different stages in the cell cycle each depend on the spatial and temporal expression of multiple proteins. The passage of the cell through the cell cycle is controlled by proteins in the cytoplasmic compartment, such as different Cyclins, Cyclin-dependent kinases (Cdks) and the Anaphase-Promoting Complex (APC). First there is the G1 phase (growth and preparation of the chromosomes for replication). Next the cell enters the S phase (synthesis of DNA and duplication of the centrosome) and finally the G2 phase, which prepares the cell for the actual mitosis (M). Mitosis itself consists of a spatial and temporal sequence of events: prophase (mitotic spindle), prometaphase (kinetochore), metaphase (metaphase plate), anaphase (breakdown of cohesins) and telophase, in which a nuclear envelope reforms around each cluster of chromosomes and these return to their more extended form.
However our understanding of the cell cycle is still far from complete. The regulation of the cell cycle by G1 cell cycle regulatory genes is more complex than we thought (Pagano M, 2004).
We need to study and understand the intracellular in-vivo dynamics of protein metabolism and its spatial and temporal organization in different cell types. We need to study intracellular protein ecology, not just study ex-vivo protein interactions or build a protein catalogue of only scalar dimensions.
The dynamics of cellular function
Taxonomy is the science of organism classification and refers to either a hierarchical classification of things, or the principles underlying the classification. Today the emphasis of biological research is on classifying genes and proteins in large catalogues, instead of studying the spatial and temporal dynamics of cellular processes in vivo. The global analysis of cellular proteins, or proteomics, is now a key area of research which is developing in the post-genome era (Chambers G, 2000; Ideker T., 2001; Aitchison J.D, 2003). Proteins show functional grouping into modules which can be grouped into elegant schemes (Hartwell, L.H., 1999; Segal, E., 2003).
In vivo, however, the spatial and temporal distribution and interaction of proteins with other proteins, substrates, etc., adds another layer of complexity which is not taken into account by functional studies alone. Expression studies, no matter how we group them, do not reveal the intracellular spatial and temporal distribution of proteins and the functional outcome of their metabolic activity (spatial and temporal substrate trafficking) in various cellular compartments. Studying proteins only from a functional point of view ignores the impact of their intracellular spatial and temporal dynamics. Molecular taxonomy or systems biology (genomics, proteomics) will not provide us with the functional answers we need.
Systems biology studies biological systems systematically and extensively and in the end tries to formulate mathematical models that describe the structure of the system (Ideker T., 2001; Klapa MI, 2003; Rives A.W, 2003). However, the level of biological integration which is being studied (genes, proteins, pathways) is still too far away from pathological reality to allow for the development of highly predictive and highly correlating disease models. The end-point of present day systems biology only takes into account infra-cellular dynamics and loses track when iso- and epi-cellular phenomena interfere with the dynamics of the model. Studying the physics and chemistry of protein interactions cannot ignore the spatial and temporal dynamics of cellular processes.
The cell is at the crossroads of life itself, being the lowest order unit operating in a functionally complete way. As such the cell is for life what the atom is for physics: the smallest biological level of organization operating as a functional unit. Dysfunctional cells, by whatever cause (gene malfunction, infection, nutritional or environmental problems), will eventually cause the entire organism to lose its functional integrity. The dynamics of cellular systems allow for the adaptation of the cell to a wide variety of conditions and challenges; a relatively uniform physical structure combined with a web of interacting dynamic processes leads to the multitude of cells which we see in living organisms. In a living organism there is no such thing as an average cell type from a functional point of view. Cells are functionally highly diverse in both spatial and temporal dimensions.
The stochastic variation of cellular processing at the molecular level is another cause of functional uncoupling of the cytome from the genome and adds to the variability in functional behavior between cells (McAdams H.H., 1999; Raser J.M., 2004). Structural research alone underestimates the complexity of dynamic processes as it does not sufficiently capture the dynamic complexity of the cell. The dynamic interaction of processes in multiple pathways is the centerpiece of cellular life, not the individual components or even individual enzymatic reactions in the cell. There is no monotonic sequence of causation from genome structure to cellular dynamics.
Cellular function can be compared to a symphony in which multiple “instruments” contribute to a complex, but in a healthy state harmonic, “sound”.
Genes and the dynamics of disease processes
The challenges faced by the medical world today are no smaller than the ones we faced a century ago. The spectrum of diseases may have changed through time, as degenerative diseases and cancer play an increasing role in modern society. On the other hand, an old enemy is on the rise again: however much we thought that infectious diseases were a thing of the past, they are back, with a new and frightening face.
The increase in our knowledge of the involvement of genes in disease processes, together with large-scale proteomics, has not led to an increase in the productivity of pharmaceutical research (Drews J., 2000; Huber, L.A., 2003; Lansbury PT Jr., 2004). The gap between the gene and the functional outcome of a disease is too wide to bridge it from one direction only (Workman P., 2001). Much thought has gone into finding how the knowledge coming out of genomics and proteomics could revolutionize drug discovery, for instance in drug target discovery (Lindsay MA., 2003). The target of a drug molecule may be a protein, but the target of disease therapy is the entire cell and by extension the cell population of an organism. Every drug and its target may be part of a disease therapy, but the therapy is not restricted to the drug and its target. Every target is part of a therapy, but not every therapy is confined to a traditional drug target.
In the case of diseases where we have already found a genetic basis, this does not always allow us to create a model for the disease process. To discover the involvement of a gene in a disease process does not tell us anything about its place and relative importance in the multiple and multilevel elements involved in the causation of a disease, such as genes, nutrition, infectious agents and the environment. To discover a causative element is not the same as understanding and predicting its dynamic involvement in a disease process. What we do know is that all causation has to pass through cells, as they constitute the “quanta” of the organism itself.
Many diseases of clinical importance have heterogeneous mechanisms which lead to the disease, and only in a subpopulation can the disease be traced back to a single gene. In most cases a multiplicity of mechanisms contributes to the disease process. Genetic information has a high predictive value in only a minority of cases.
Non-coding sequences, inter-gene and epigenetic interactions have a significant impact on the prediction of the age of occurrence, severity, and long-term prognosis of diseases (El-Osta A., 2004; Perkins DO, 2004).
The importance of the dynamics of the cell, and of its involvement in pathological processes and current therapeutic efforts, requires a better understanding of its function and phenotype in relation to pathological processes in diseases such as cancer, Alzheimer's disease and infectious diseases such as AIDS, tuberculosis (TB), influenza (flu), etc.
Trying to predict a disease process from the genome (proteome) upwards, is like trying to solve a higher order polynomial while omitting the majority of elements and expecting that the equation will work:
e.g.: $\text{Disease process} = a \cdot x + b$
Instead of using a higher order multi-dimensional model, closer to in-vivo functional dynamics in which a matrix or web of causation and consequences interacts in a high-dimensional space-time continuum:
e.g.: $\text{Disease process} = a \cdot u^{n} + b \cdot v^{o} + c \cdot w^{p} + d \cdot y^{q} + e \cdot z^{r}$
In addition, each parameter which is being used in an equation is in itself the result of an underlying or “overlying” dynamic process. Each layer of organization can be fed into higher or lower order levels of organization as there is always a cross-influence in both directions. It is a matter of expanding or collapsing the set of parameters and taking into account or ignoring underlying “modifying” influences. Reducing the complexity allows for a better understanding of a simplified model, but the model then matches the complexity and dynamics of biological reality less well. When we create a model, we should not regard it as a one-to-one substitute for reality, which we capture only partially in our model.
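The contrast between the two example equations above can be made concrete with a small sketch. All coefficients, exponents and factor values below are invented for illustration and carry no biological meaning; the point is only that two cases which look identical in a one-factor model can differ widely once the omitted terms are included.

```python
# Illustrative sketch only: every number below is hypothetical.

def simple_model(x, a=1.0, b=0.0):
    """One-factor linear model: a * x + b."""
    return a * x + b


def multi_factor_model(factors, coefficients, exponents):
    """Sum of coefficient * factor ** exponent over all factors."""
    return sum(c * f ** e for c, f, e in zip(coefficients, factors, exponents))


# Hypothetical coefficients (a..e) and exponents (n..r) for factors u, v, w, y, z.
coefficients = [1.0, 0.5, 2.0, 1.5, 0.25]
exponents = [1, 2, 1, 3, 2]

# Two hypothetical cases sharing the same first factor (u) but nothing else.
case_1 = [2.0, 1.0, 0.0, 1.0, 2.0]
case_2 = [2.0, 3.0, 1.0, 2.0, 0.0]

simple_1 = simple_model(case_1[0])
simple_2 = simple_model(case_2[0])
full_1 = multi_factor_model(case_1, coefficients, exponents)
full_2 = multi_factor_model(case_2, coefficients, exponents)

print("one-factor model:", simple_1, simple_2)
print("multi-factor model:", full_1, full_2)
```

With these made-up numbers the one-factor model gives the same value for both cases, while the multi-factor model separates them.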
Infectious diseases
Infectious diseases still pose a significant threat to the health and well-being of (modern) society. After years of relative neglect, nations are increasingly aware of the present and future threats of infectious diseases and are even setting up new agencies, such as the European Centre for Disease Prevention and Control (ECDC), or expanding the role of existing organizations, such as the Centers for Disease Control and Prevention (CDC). Besides their political and economic impact on society, how do we deal with infectious diseases in science?
In infectious diseases the environment, in this case the infectious agents, interacts in a complex way with the host defense system, of which much remains to be explored. We must be aware of the fact that the golden era of antibiotics is already behind us, as many infectious agents (e.g. TB, MRSA and other bacterial diseases) are showing an increasing resistance against most classes of antibiotics which are available today (Davies J, 1994). We have succeeded in less than a century in destroying our best weapons against infectious diseases, due to misuse of antibiotics both by physicians and their patients. Only the elderly remember the days when mortality due to infections was a major cause of premature death, but the moment is approaching when this nightmare will return. Emerging infectious diseases (EIDs) and re-emerging infectious diseases challenge our defenses (Ranga S, 1997; Fauci AS., 2004; Morens DM, 2004).
Viral diseases (e.g. AIDS, influenza) are even harder to fight as they use the cellular machinery of the body itself to reproduce. We need to study the pathological process in cells in more detail and in a different way, in order to have a chance to succeed in the new therapeutic challenges ahead of us. Viruses, under the selective pressure of modern antiviral drugs, are also showing increasing resistance to treatment. We are running out of time in our battle against infectious diseases and a systematic approach will only give us the answers when it is too late. We are not setting the agenda, but the diseases are taking the lead.
Due to modern technology, the time to respond to a new infectious challenge is being reduced. In modern times, diseases take planes too, which makes it even harder to fight them by classical isolation or quarantine. Airplanes may be safe to travel in, compared to other transport systems, but they can cause secondary mortality by transporting pathogens over large distances at a speed unknown to previous generations, which gives a new meaning to airborne infections (Gerard E, 2002; Van Herck K, 2004; Blair JE, 2004). Infectious diseases may initially go unnoticed in underdeveloped areas of the world (e.g. Ebola virus, Lassa fever, Marburg virus), but as soon as they board a plane, it is modern technology which will give them free access to the world (Clayton AJ, 1979; Gillen PB, 1999). A relatively long incubation time combined with a high mortality rate will allow a disease to spread widely and cause a pandemic before we can even start a treatment program. If an unknown disease causes such a pandemic, we may run out of time before we can find a cure, as we first have to develop a diagnostic tool. A recent example which is a model of what can happen was the Severe Acute Respiratory Syndrome or SARS (Peiris, J.S.M. 2003; Berger A, 2004; Heymann DL, 2004; Tambyah PA, 2004).
Robert Koch presented his work on tuberculosis on 24 March 1882 before the members of the Berlin Physiological Society, which meant a breakthrough in the understanding of this terrible disease (Winkle S, 1997, pp. 137-141). Now, after more than 100 years of research and drug development, TB is on the rise again. In the war against infections such as tuberculosis, there are no easy wins. We may win a battle, but against the majority of pathogens we can only reach a status quo and never completely win the war. Variability through mutation is a powerful weapon against our drug treatments and pathogens use it to their great advantage.
We must keep our defenses up to date and changing in order to outsmart our bacterial and viral enemies. New antibiotics are not found within the human genome. Penicillin was discovered by accident and many important antibiotics were found in the most unlikely places (Fleming, A, 1929). No hypothesis or model can be formulated to find the unexpected, but we have to find new antibiotics as bacteria are closing in on us and some of our worst enemies are even winning the race.
Scientists are waiting with fear for the next influenza pandemic, which will hit us some day (Gust ID, 2001; Capua I, 2004). Scientists are trying to understand the lethal potential of the deadliest influenza epidemic of all times, which occurred after the First World War. Soon the virus which caused that pandemic, the ‘Spanish flu’, will re-emerge from the test tubes of the laboratory. Recent outbreaks of avian flu have given us a preview of what can happen and evidence is increasing that the possibilities for spreading avian influenza A virus (H5 or H7 subtype) are worse than was previously assumed (Koopmans M, 2004; Kuiken T, 2004).
New pathogens can have a devastating effect on a human population. Examples of what can happen when a new infectious agent hits a population with little or no immunological “experience” with a (re-)introduced pathogen can be found in the histories of indigenous people confronted with infectious diseases introduced by European colonization, as in Australia and Tasmania. Within 100 years of European colonization the total population of full-blood Aboriginal people in Tasmania became extinct. Introduced infectious diseases killed many more Aborigines than did direct conflict. Infectious diseases such as smallpox, measles, and influenza were major killers and even chickenpox was deadly as the Aboriginals had no immunological history even with chickenpox. Of the 90 percent of the Aboriginal population that died out as a result of European contact, it is estimated that around 80 or 90 percent of the deaths were the result of disease.
Most people have no idea of the role smallpox played in the destruction of an entire civilization after it was brought to America by the conquistadores. About 50 to 90 percent of the Native American population died of smallpox and the speed at which people died is beyond our imagination (McMichael AJ, 2004; Winkle S., 1997, pp. 855-861). A mortality of 50 percent for a new disease, for which we have no immunity, could kill half of the population of a country or an entire continent. Western society now has to fear the introduction of new pathogens from distant places, and when the disease has the right pathological profile, it will spread extensively into the population before it is diagnosed (e.g. AIDS). Re-emerging infectious diseases are a global problem with a local impact. It is an unpleasant thought that this time we will face the fate of the indigenous people during European colonization. In modern times we not only have to fear the accidental spreading of infectious diseases, but bio-terrorism will challenge our defenses sooner or later (Broussard LA, 2001; Gottschalk R, 2004).
Finding the infectious agent for a new and unknown disease requires something other than sequencing a genome, as this approach only works when we have the time to do the sequencing while the pathogen takes its course. Analyzing the genome sequence of a new infectious agent can only start after it has been isolated by more traditional means (Berger A, 2004). Once we know the new pathogen, we can use its genome sequence to develop rapid diagnostic tools, based on PCR, but in order to do this we must first isolate it from the patient. Developing a therapy after this takes much longer, and the genome sequence itself without additional functional information is not enough. Only after Koch's postulates had been fulfilled did the WHO officially declare, on 16 April 2003, that a previously unknown coronavirus was the cause of SARS. Modifying the disease progression requires an interaction with the actual disease process which extends beyond understanding the genome structure of the pathogen. Focusing more on the dynamics of the interaction of cellular systems with pathogens and using tools for functional research of the disease process at the cellular level (and beyond) will hopefully allow us to respond in time when we are faced with an unknown pathogen.
When we do not already have an antibiotic or antiviral drug at hand at the moment a new disease hits us, we are in serious trouble. In this case the only thing left is the medieval solution of quarantining the infected people, which only works if we are able to contain them before they spread over a country or even the planet (e.g. SARS or HIV).
Although all cells in the human body may share the same genome, there is a high spatial and temporal differentiation in gene expression and metabolic dynamics in different cell types and organs. In HIV, it is the CD4 lymphocytes which express the receptors by which the virus can enter the cell (Fauci AS, 1996). A hepatocyte may share its entire genome with a CD4 lymphocyte, but it does not express the proteins encoded by the genes which allow the virus to enter the cell. The progress of an HIV infection is also a highly dynamic process of interaction between the host and the virus (Wei, X., 1995). The observation of differences in disease progress led to the discovery of a genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CCR5 structural gene (Dean M, 1996). The emerging picture on infectious diseases is one of highly polygenic patterns, with occasional major genes, along with significant inter-population heterogeneity (Frodsham AJ, 2004). The complex interactions and regulation of the Interleukin-1 (IL-1) family of proteins are just one of the issues in elucidating the dynamics of the human immune system (Laurincova B., 2000).
Clinical observations lead to genetic conclusions, but the way back to clinical treatment of diseases is a long and winding road for which the gene sequence or protein structure does not provide us with all the necessary information about the dynamics of the disease process. Studying the cellular dynamics of disease processes provides us with one of the stepping stones from gene to clinic. By focusing on genomics and proteomics alone, there remains a correlation and predictive deficit in our disease models.
Mendelian diseases
Mendelian inherited and monogenic diseases have always been at the center of attention in the relation of genetic variation to diseases. Monogenic diseases served as a model to demonstrate the relevance of genetic information to the development of a disease and the outcome of a disease process. Phenotype-genotype relationships are complex even in the case of many monogenic diseases. Increasingly complex interactions have now been demonstrated in a number of monogenic Mendelian diseases (Nabholz CE, 2004). The (phenotypical and functional) expression and development of even a monogenic disease depends on its context, which comprises both other genes and environmental factors. These inter-gene and epigenetic interactions have a significant impact on the prediction of the age of occurrence, severity, and long-term prognosis of even ‘genetic’ diseases (Cajiao I, 2004; Hull J, 1998; Frank RE, 2004; Salvatore F, 2002; Sontag MK, 2004; Sangiuolo F, 2004).
The beta-thalassemias show a remarkable phenotypic diversity caused by the action of many secondary and tertiary modifiers, and a wide range of environmental factors (Weatherall DJ., 2001). Sickle cell anaemia and cystic fibrosis can serve as an example that genotype at a single locus rarely completely predicts phenotype (Summers KM., 1996). Although the gene defect in Huntington’s disease has been known for years, the contribution of the gene defect to the functional outcome of the disease is not yet known (Georgiou-Karistianis N, 2003). Cell based research will help to elucidate the disease mechanism in Huntington’s disease (Arrasate M, 2004).
In cystic fibrosis, the severity of the disease cannot be linked one-to-one to genetic variation in CFTR (Grody W, 2003). Cystic fibrosis is the most common autosomal recessive disorder in Caucasians, with a frequency of approximately 1 in 3000 live births, so finding a cure for this disease has a high impact on our society. Success stories with rare diseases may sound impressive from a scientific point of view, but there is no escape from the economic reality of the size of the patient population. So let us take a closer look at cystic fibrosis, as it is a disease for which the gene held responsible was identified about 14 years ago (Rommens JM, 1989; Collins FS., 1990). The method (reverse genetics) used to identify the gene did not require an understanding of the gene function at that moment or any understanding of the impact of genetic heterogeneity on the phenotypical expression of the disease (Iannuzzi MC, 1990; Audrezet MP, 2004). Starting from the gene for a single genetic disease such as cystic fibrosis, where did we get after 14 years of hard labour?
A once ‘monogenic’ disease such as cystic fibrosis shows remarkable phenotypic and clinical variation (Decaestecker K, 2004). By now about 1000 mutations of the cystic fibrosis transmembrane conductance regulator gene (CFTR) have been identified, which leads to a highly variable phenotypic and clinical presentation of the disease (McKone EF, 2003). Mutations in the CFTR gene have been classified into 5 functional categories (Welsh MJ, 1993). A list of 1000 mutations is thus reduced to 5 functional classes at the protein level, a ratio of 0.5 percent of distinct types of CFTR chloride channel dysfunction per mutation. Due to the functional uncoupling of gene structure from protein function in cystic fibrosis, genetic sequence variation has a low impact on functional variation at the protein level (1000 to 5). More important than gene sequence variation is the spatial location of a mutation in the 3D structure of a protein (Rich DP, 1993). Even more important is the cellular and organ location of a functional defect, as in cystic fibrosis it is mainly the pathological process in the lungs (Pseudomonas aeruginosa infection) which is a major cause of morbidity and mortality (Elkin S, 2003).
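The ratio quoted above can be checked with a few lines of arithmetic. This is only a sketch using the round figures from the text (about 1000 mutations, 5 functional classes); the even split across classes is an assumption made purely for illustration, since the text does not say how the mutations are distributed.

```python
# Round figures taken from the text; the even split is an illustrative assumption.
known_mutations = 1000    # approximate number of CFTR mutations
functional_classes = 5    # functional categories at the protein level

# Distinct functional classes per mutation, expressed as a percentage.
ratio_percent = 100 * functional_classes / known_mutations

# Average number of mutations per class if they were evenly distributed.
mutations_per_class = known_mutations / functional_classes

print(f"{ratio_percent:.1f} percent, about {mutations_per_class:.0f} mutations per class on average")
```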
Other genes act as modulators of the disease outcome, even in a disease such as cystic fibrosis, once regarded as a monogenic disease (Hull J, 1998; Frank RE, 2004; Salvatore F, 2002; Sontag MK, 2004; Sangiuolo F., 2004). We even need to take into account epigenetic information and environmental influences on disease outcome, even in a so-called monogenic disease such as cystic fibrosis.
Human populations show considerable genetic heterogeneity (allelic variation) and even geographic variation, which leads to difficulties in using gene sequence based diagnostic tools (Liu W, 2004; Raskin S, 2003). So, the sequence of one individual’s genome allows studying one person’s genetic profile, but does not lead to a population-wide prediction of genetic profiles. Genetic heterogeneity uncouples clinical outcome from model gene sequences (Imahara SD, 2004). This problem is not solved by simply adding more sequence information without a functional understanding of the meaning of sequence variation on phenotypic expression and disease outcome in the patient. Structural information without functional understanding leads to predictive deficits. The functional understanding of a disease process must be at the level of the patient and his cells and not at a lower order organizational level, such as the genome or proteome alone.
Genetic heterogeneity leads to a reduced sensitivity and an increase in false negative results if a genetic test is not adapted to this genetic heterogeneity. A mutational test leads to a simpler, almost ‘binary’ readout, instead of the more ‘analog’ interpretation of a continuum of values in a functional test, but this comes at a price. A test which detects a disease marker at a higher organizational level can detect a disease more easily and will lead to fewer false negatives in this case.
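The effect of genetic heterogeneity on the sensitivity of a mutation panel can be sketched numerically. The model below is a deliberate simplification and rests on assumptions not made in the text: an autosomal recessive disease, two disease alleles per affected person drawn independently from the population's allele frequencies, and a test that is only positive when both alleles are on the panel. The allele names and frequencies are hypothetical.

```python
# Simplified sketch; allele names, frequencies and the model are hypothetical.

def panel_sensitivity(allele_frequencies, panel):
    """Fraction of affected individuals in whom both disease alleles are detected."""
    coverage = sum(freq for allele, freq in allele_frequencies.items() if allele in panel)
    return coverage ** 2


# A panel designed around the two mutations that dominate in population A.
panel = {"m1", "m2"}

# Relative frequencies of disease alleles among affected chromosomes.
population_a = {"m1": 0.70, "m2": 0.20, "other": 0.10}
population_b = {"m1": 0.30, "m2": 0.10, "m3": 0.40, "other": 0.20}

sensitivity_a = panel_sensitivity(population_a, panel)
sensitivity_b = panel_sensitivity(population_b, panel)

for name, sensitivity in (("A", sensitivity_a), ("B", sensitivity_b)):
    print(f"population {name}: sensitivity {sensitivity:.2f}, "
          f"false negatives {1 - sensitivity:.2f}")
```

Under these assumptions the same panel detects roughly four in five affected individuals in population A but fewer than one in five in population B, which is the kind of loss of sensitivity described above.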
The complexity of even monogenic diseases and the web of functional interactions at the genome level, protein interactions and environmental influences on the disease outcome will dilute the predictive power of structural sequence information at the DNA level. Using low-dimensional intracellular data to predict iso- and epicellular phenomena has too little predictive power to be used as such in clinical situations.
No pharmaceutical company would like the idea that it requires 14 years of preclinical research to reach an IND after a new drug target was identified, as in cystic fibrosis. Even if only 1000 genes out of our 25,000 were involved in human diseases and each required the same amount of work as the cystic fibrosis gene, it would take us the equivalent of 14,000 years of work to achieve the same results. Up to this moment, however, no causal (gene) therapy has come out of the identification of the CFTR gene, only an improvement of prenatal diagnostics (Klink D, 2004).
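The back-of-the-envelope estimate above works out as follows. The sketch uses only the figures given in the text and assumes, as the text does, the same effort for every gene; work done in parallel would shorten the calendar time but not the total effort.

```python
# Figures from the text; equal effort per gene is the text's own simplification.
years_per_gene = 14       # years of work since the CFTR gene was identified
disease_genes = 1000      # hypothetical number of genes involved in human disease
total_genes = 25_000      # approximate number of human genes used in the text

total_work_years = years_per_gene * disease_genes
fraction_of_genome = disease_genes / total_genes

print(f"{total_work_years} years of work for {fraction_of_genome:.0%} of all genes")
```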
Pseudomonas aeruginosa lung infection is the major cause of morbidity and mortality in patients with cystic fibrosis (Elkin S, 2003). Over the past decades we have seen an improvement of symptomatic therapy, but still no causal therapy, leaving aside a lung transplant.
How are we going to develop drugs which have a large enough patient population to pay for the costs of drug discovery and development if we need to target individual mutant protein molecules? If it can be so difficult to go from a single gene to develop a therapy based on genetic information, how do we expect to proceed for the entire genome and proteome?
Degenerative diseases and cancer
The increasing longevity of Western populations is increasingly straining public healthcare systems, due to an increase in the incidence of degenerative diseases and cancer. A diminishing active population has to support the growing financial demands of a healthcare system. Improving the health and self-reliance of the growing number of elderly people by efficient treatments of degenerative diseases and cancer is an important political issue. Where are we, and where are we going in science, to solve these fundamental problems of modern society?
Unraveling the pathological mechanism of a complex disease is a major scientific challenge and still beyond reach of present day science in many cases. For degenerative diseases, such as Alzheimer’s disease, cancer, birth defects, cardiovascular diseases, Parkinson’s disease, diabetes, and nerve degeneration it is the dynamics of the cellular machinery itself which fails. Sharing one genome does not lead to sharing the same pathology, as cellular differentiation leads to a highly diverse spatial and temporal cellular function and morphology. Differential and heterogeneous degeneration patterns of different cell types are the consequence of a highly differentiated spatial and temporal expression pattern of proteins in different cell types and different sub-cellular compartments.
Unraveling part of the genetics of a disease does not yet bring therapeutic success. Multiple genes and (multiple) environmental factors contribute to the disease process and its clinical outcome in complex diseases (Liebman MN, 2002). In Crohn’s disease the gene defect found does not explain the severity of the disease (Peltekova VD, 2004). In breast cancer genetic variants of BRCA1 and BRCA2 do not have a consistent level of penetrance and as such their presence alone does not explain the disease process (Ford D et al, 1998; Hartge, 2003). Although there is evidence for the involvement of the gene for PPAR-gamma in type 2 diabetes, the mechanism by which it contributes to the disease process of diabetes is not clear and could not be deduced from genetic information alone (Barroso I, 1999).
A genetic origin, as in APC (Adenomatous Polyposis Coli) and HNPCC (Hereditary Non-Polyposis Colorectal Cancer), accounts for only about 5 percent of all cases of colorectal cancer (Kinzler, 1996). Genes which are involved in diabetes, such as GCK (glucokinase), HNF1A and HNF4A (Hepatocyte Nuclear Factor), are linked to less than 5 percent of cases of diabetes (Edlund, 1998; Fajans, 2001).
One of the major emerging health problems of modern society is Alzheimer’s disease (AD). This is not only because widely known people, such as the former president of the USA, Ronald Reagan, suffered from the disease in a long and unpleasant disease process. Today AD is still a chronic disease without a cure which causes patients to receive long-term care (Souder E, 2004).
Presently available drugs improve symptoms, but do not have a profound disease-modifying effect and fail to alter the course of AD, so it may be time to change the way we think about AD therapeutics (Crentsil V., 2004; Citron M., 2004; Kostrzewa RM, 2004). Will we see a breakthrough in the understanding of the cellular and molecular alterations that are responsible for the degeneration of neurons in AD patients (Mattson MP., 2004)?
In Alzheimer’s disease (AD), only a minority of cases can be linked to a single hereditary gene mutation; the complexity of the disease process extends beyond our present understanding and disease models (Selkoe DJ., 2001; Eikelenboom P, 2004). Neurodegeneration in AD may be caused by deposition of amyloid beta-peptide in plaques in brain tissue (Amyloid Hypothesis), but no causal treatment has come out of this in 10 years of hard work (Hardy J, 2002; Lee HG, 2004; Lee HG, 2004b). Little is understood about the dynamics of amyloid beta-peptide and its fundamental role in the disease process of AD (Regland B, 1992; Koo EH., 2002; LeVine H 3rd., 2004).
A complex disease requires studying and understanding a complex in-vivo pattern of a spatially and temporally changing metabolic process, which goes beyond studying gene expression profiles, either single or multiplexed. Studying the multi-scale spatial and temporal dynamics of a complex disease process in a long-term space-time continuum is a tremendous scientific challenge. Instead of focusing on individual (molecular) targets in drug research and therapy, complex diseases may require pathway-engineering or cell replacement to restore the appropriate dynamics of spatial and temporal patterns of intracellular molecular processes. Functional or structural protein (re-)modeling or restoration in-vivo may be a better approach for complex diseases than just docking a small molecule to an active binding site.
At this moment the cell is the target for many therapeutic efforts to come to a causal therapy of complex diseases such as diabetes, which we can now only treat with external substitution. Many diseases are far more complex and multi-factorial than monogenic diseases and should be studied with more power at a higher biological level than the genome or proteome to capture the complexity of the disease process.
One of the most promising domains of research today is stem cell research (He Q, 2003; Doss MX, 2004). Since the isolation and growth in culture of proliferative cells derived from mouse embryos in 1981, stem cell research has come a long way (Evans MJ, 1981; Martin GR., 1981). Instead of treating complex disease processes with a multitude of drugs, each with its own spectrum of sometimes serious and cumulative side effects, failing components of the human cytome could be engineered or replaced by stem cells (adult or embryonic) differentiated into the appropriate cell type.
When the distortion of cellular metabolism goes beyond a mere dysfunction of a single protein, a complete replacement of the dysfunctional cells has a better chance to restore the complex and delicate balance and regulation of metabolic processing. The fine dynamics of spatial and temporal regulation of cellular metabolism and its response to changing demands of an organism in complex diseases are best met by replacing the failing part of the cytome with a well balanced cellular substitute. Those parts of cellular processes which are beyond the reach of (present-day) drug therapy or which are insufficiently treated by non-cellular means have the prospect of being restored to a physiologically appropriate level. With stem cell therapy we would be able to replace a non-functional part of the human cytome with a set of functioning and dynamically regulated cellular units.
Several diseases which currently cannot be treated or cured completely are the target of intensive research. In diabetes long term insulin replacement therapy does not prevent a multitude of chronic and severe side effects, such as circulatory abnormalities, retinopathy, nephropathy, neuropathy and foot ulcers. In juvenile diabetes however there is an immunological component which complicates treatment. The prospect to find a cure for diabetes which would restore the dynamics of insulin production is an important scientific and social challenge (Heit JJ, 2004).
There is hope for the development of stem cell therapies in human neurodegenerative disorders (Kim SU., 2004; Lazic SE, 2004; Lindvall O, 2004). Much research goes into finding a cure for degenerative diseases such as Parkinson’s disease (Drucker-Colin R, 2004; Hermann A, 2004; Roitberg B, 2004). Scientists are investigating the possibility to treat a failing heart with cellular cardiomyoplasty (Wold LE, 2004).
When we want to use stem cells for disease therapy we have to deal with the functional and structural characteristics of cells which are being used (Baksh D, 2004). The differentiation of stem cells of either adult or embryonic origin, into mature and functional cells is a complex and dynamically regulated process. Understanding the differentiation pathways of embryonic and adult stem cells and their spatio-temporal dynamics of differentiation and structural organization will require intensive research (Raff M., 2003). When using stem cells from an individual who suffers from a degenerative disease, the disease may not be cured when the same deficient pathway is activated in the differentiating stem cell. The molecular process may need to be corrected first in this case, for instance by gene therapy or by using exogenous stem cells.
Gene therapy also holds many promises for the therapy of life threatening diseases, but in order to improve gene therapy we will need a better understanding of what goes on inside the cell and what the consequences are for the cellular metabolism when we modify its function by inserting genes. At this moment monogenic diseases are the target for gene therapy, but in the future entire parts of pathways may need reconstruction. The gene is the means to achieve the ultimate goal to change the cellular metabolism to cure a disease.
The scientific challenges posed by complex diseases, such as many degenerative and chronic diseases and cancer, will keep scientists busy, far beyond the current scope of present day science.
Drug discovery and development
Drug discovery and development: an inefficient process
At the end of the drug discovery and development pipeline, there are patients waiting for treatments, company presidents and shareholders waiting for profit and governments trying to balance their health care budget. For pharmaceutical and biotech companies, the critical issue is to select new molecular entities (NME) for clinical development that have a high success rate of moving through development to drug approval. Finding new drugs (which can be patented to protect the enormous investments involved) and at the same time reducing unwanted side effects is vital for the industry.
The cost to develop a single drug which reaches the market has increased tremendously in recent years and only 3 out of 10 drugs which reached the market in the nineties generated enough profit to pay for the investment (DiMasi, J., 1994; Grabowski H, 2002; DiMasi JA, 2003). This is mainly due to the low efficiency and high failure rate of the drug discovery and development process. Pharmaceutical companies are always trying hard to reduce this failure rate. They want to find ways to bring down the enormous costs involved in drug discovery and development (Dickson M, 2004; Rawlins MD., 2004).
Only about 1 out of 5,000 to 10,000 drugs makes it from early pre-clinical research to the market, which is not an example of a highly efficient process. The current focus of the pharmaceutical industry on blockbuster drugs is a consequence of the mismatch between the soaring costs and the profits required to keep the drug discovery and development process going. Only diseases with patient populations large enough (and wealthy enough) to pay back the costs for a full-blown drug development are now worthwhile working on. Research for new antibacterial drugs is being abandoned, due to an insufficient return on investment (R.O.I.) to pay for the development costs of new drugs (Lewis L, 1993; Projan SJ., 2003; Shlaes DM., 2003). If the industry cannot bring the costs down, it may as well try to raise its income by changing its price policy, but this shifts the solution for the problem from inside to outside the company and places the burden on the national health care systems.
Companies which were more successful in the past achieved a higher efficiency even without the availability of extensive genomic and proteomic data and new low-level disease models. Introducing new technology and generating more data alone are not sufficient to improve the drug discovery process (Drews J. 1999; Horrobin DF, 2003; Omta S.W.F., 1995).
To be complete, there are alternative views which criticize the calculation of the cost of drug discovery and development. The consequence of accepting this alternative view would be that the pharmaceutical industry would be losing money due to costs outside its core mission, which is even worse, because research and development can be improved, but this would not help in this case. The result is in each case that drugs are only worthwhile to develop if they have an enormous market potential; otherwise they do not earn back the money invested when they finally reach the market.
The basic numbers for time spent and costs made in drug discovery and development can be found in several documents published by institutes which generate reports about the pharmaceutical industry (Boston Consulting Group, Tufts Center for the Study of Drug Development, Pharmaceutical Research and Manufacturers of America (PhRMA), etc.). Let us now take a closer look at the drug discovery and development process. It takes an average pharmaceutical company about 10 to 15 years and US$500 to US$800 million to bring one new drug to the market. Of these 15 years about 6.5 years or 43 percent of the total time is spent in pre-clinical research and about 7 years or 46 percent of the total time is spent in clinical research (1.5 years in phase I, 2 years in phase II and 3 years in phase III). Phase I of a clinical trial deals with drug safety and blood levels in healthy volunteers. Phase II deals with basic efficacy of a new drug. Finally Phase III deals with the efficacy of the drug in large patient populations. It is easy to understand that the increase of the population used to study the effect has a dramatic impact on the complexity and the cost of the clinical trial.
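As a rough check on the timeline shares quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 15-year upper end of the range as the total, and the constant and function names are made up for this sketch.

```python
# Durations in years, as quoted in the text (upper end of the 10 to 15 year range).
TOTAL_YEARS = 15.0
PRECLINICAL_YEARS = 6.5
CLINICAL_YEARS = 7.0
PHASE_YEARS = {"phase I": 1.5, "phase II": 2.0, "phase III": 3.0}


def share_of_total(years, total=TOTAL_YEARS):
    """Return the fraction of the total timeline taken by one stage."""
    return years / total


preclinical_share = share_of_total(PRECLINICAL_YEARS)
clinical_share = share_of_total(CLINICAL_YEARS)
phase_sum = sum(PHASE_YEARS.values())

print(f"pre-clinical: {preclinical_share:.1%}")
print(f"clinical:     {clinical_share:.1%}")
print(f"sum of phase durations: {phase_sum} years")
```

The shares come out at roughly 43 and 47 percent (the text gives the latter as 46). The three phase durations listed add up to 6.5 years, slightly less than the 7 years quoted for clinical research as a whole; the text does not say where the remaining half year goes.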
To process a New Drug Application (NDA) takes the U.S. Food and Drug Administration (FDA) on average 1.5 years based on the results and documents provided by the pharmaceutical industry. The situation in Europe for the European Medicines Evaluation Agency (EMEA) is probably of the same order of magnitude. About 0.1 percent of the original molecules screened in drug discovery enter phase I (5 out of 5,000 to be optimistic) and 0.02 percent of the original molecules finally reach the FDA (1 out of 5,000). Of the 5 molecules entering phase I, about 4 out of 5 or 80 percent fail to make it to an NDA. After approval by the FDA, the drug hits the market and enters phase IV of the clinical study process.
In the 1990’s about 38 percent of the drugs which came out of discovery research dropped out in phase I. Of those molecules which made it out of phase I, 60 percent failed in phase II clinical studies. And now we get to the really expensive phase III, in which 40 percent of the remaining candidates failed. Of those drugs which made it through the clinical trials to the FDA, 23 percent failed to be approved. All this translates to an overall success rate of about 11 percent from the start of the clinical trials (Kola I., 2004).
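The overall figure follows from multiplying the fractions that survive each stage. A minimal sketch of that arithmetic in Python, using only the failure rates quoted above; the names failure_rates and cumulative_success are illustrative and do not come from the cited source:

```python
# Per-stage failure rates quoted in the text (Kola I., 2004), as fractions.
failure_rates = {
    "phase I": 0.38,
    "phase II": 0.60,
    "phase III": 0.40,
    "FDA review": 0.23,
}


def cumulative_success(rates):
    """Multiply the per-stage survival fractions (1 - failure rate)."""
    surviving = 1.0
    for rate in rates.values():
        surviving *= 1.0 - rate
    return surviving


overall = cumulative_success(failure_rates)
print(f"Overall success from the start of phase I: {overall:.1%}")
```

With these inputs the product is 0.62 x 0.40 x 0.60 x 0.77, or about 0.115, which is consistent with the roughly 11 percent quoted. The sketch treats the stages as independent filters applied in sequence, which is a simplification.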
So the disease models used in drug discovery fail to predict failure in clinical development in about 80 to 90 percent of the drugs which enter drug development. And the combined predictive power of all clinical trials fails to predict failure in about 1 out of 4 or 25 percent of drugs submitted to the FDA for approval.
What can we learn from these numbers and what is being done in drug discovery? The role of absorption, distribution, metabolism, excretion and toxicity (ADMET) is an important part of the drug discovery process as ADMET is an important cause of failure in drug development (Yan Z, 2001; Lin J, 2003; Nassar AE, 2004). The drug discovery process (target identification, target validation, lead identification/optimization …) and preclinical development such as ADMET studies, fail to predict the failure of a drug in clinical development for 4 out of 5 or at least 80 percent of the molecules which enter phase I. Improving the predictivity of disease models in drug discovery and ADMET is an important issue to reduce the failure rate in drug development.
A new drug spends about 90 percent or 13.5 years of its career within the discovery and development process, before it reaches the FDA for the last 10 percent or 1.5 years. So the FDA does not account for the majority of the time it takes to bring a new drug to the market, nor does it account for the majority of failures, which is only 20-25 percent or 1 out of 5 or 1 out of 4 drugs which enter phase I or 1 out of 5,000 (0.02 percent) if we start from the beginning of the process. Although the investments in the early stages of the drug discovery process have increased tremendously, this means nothing compared to the cost of failure in phase III of a clinical trial.
A failure in phase IV in general means lawsuits against the company and a serious blow to its reputation. Due to the inefficiency of the drug discovery and development process, pharmaceutical companies rely on only a few drugs for their income and profit. The withdrawal of a drug from the market has serious consequences both due to the loss in revenue for the company and the financial consequences of lawsuits.
Some examples of drug withdrawals over the years give an indication of the impact on patients, society and the pharmaceutical industry. Thalidomide was withdrawn from the market in the sixties (McBride WG, 1961). In 2001 cerivastatin (Baycol) was removed from European and USA markets because of the risk for rhabdomyolysis (Bayer, 2001; Furberg CD, 2001; Davidson MH., 2002; Kind AH, 2002; Ravnan SL, 2002; Staffa JA, 2002; Maggini M, 2004). Cisapride (Prepulsid) was withdrawn from the market due to cardiovascular adverse effects (Griffin JP., 2000; Wilkinson JJ, 2004). In 2004 the COX-2 inhibitor rofecoxib (Vioxx) was withdrawn because of cardiovascular adverse effects (Dyer C., 2004; Juni P, 2004).
Improving drug discovery
In order to improve this process, where should we try to optimize it? After about 7 years in pre-clinical research, a new drug is ready for filing an investigational new drug application (IND) after which the FDA’s Center for Drug Evaluation and Research (CDER) oversees the clinical studies. The clinical trials, from phase I to III, are highly regulated and a company can only optimize the flow of events; to a large extent it cannot decide freely what needs to be done in these stages of the process. Once a drug hits the FDA (CDER), strict rules need to be followed for the approval and failure to comply will only delay this process. So it is by improving the quality and shortening the process in drug discovery that a pharmaceutical company can make a significant difference.
How should we proceed to improve drug discovery? We have seen an enormous investment in research at the infra-cellular level, such as HTS, genome based and proteome based disease models in the past ten years and at the same moment have witnessed a disproportional decline in the productivity of research and development in drug discovery (Horrobin DF, 2000; Horrobin DF, 2001; Bleicher KH, 2003). The pharmaceutical industry has yet to find a way to reduce its high attrition rates (Kola I., 2004). The consolidation in the pharmaceutical industry will not solve this problem in the long run, as it only reduces the costs but does not improve scientific productivity; it only postpones the moment of truth. The scientists themselves will have to find new ways to improve their productivity; management cannot do this in their place. Society tries to protect itself against the adverse effects of new drugs, such as with Thalidomide in the sixties (McBride WG, 1961). This is done by increasingly stringent regulations but the currently used methods in the discovery process for new drugs cannot keep pace with these new requirements. However, as we can see, increasingly strict regulations do not explain all the problems pharmaceutical research is facing today.
Failing disease models in drug discovery
In the end it is clinical reality which decides on the fate of new drugs and not the technology or disease models used to create them. The early stages of drug discovery involve the identification and early validation of a disease-modifying target (Lindsay MA., 2003). Failing to make the right decision at the important step of hit to lead transition has costly time and resource implications in downstream drug development (Alanine A., 2003). Why do these early stages of drug discovery fail so often and why are they the cause of a huge efficiency deficit later on in the drug discovery process?
When a situation becomes desperate and the overall understanding of a biological process is insufficient or patchy, the only way out seems to be to improve throughput, but this clogs the pipeline downstream with low-quality data in relation to biomedical and clinical reality. There is a time-shift between the implementation of a new approach (linking genes almost directly to clinical diseases) and finding out about its impact on commercial success, which makes the feedback loop inefficient due to its long delay in relation to the quarterly and annual business cycle. From a business perspective, any process can be sped up and content can be sacrificed or complexity reduced. This may be true for a nuts and bolts process, but not for processes in a biomedical context when our understanding of pathogenesis is still very patchy and incomplete. The complexity of intermediary modulation of gene-disease (un-)coupling was clearly underestimated.
There is a fundamental problem with studying disease-relevant mechanisms in the current disease models as the pharmaceutical industry has been investing heavily in studying the bricks, instead of looking at the building as a whole. The emphasis in recent years has been on increasing quantity while at the same moment sacrificing the quality of correlation with clinical reality. You could also think of it as a pointillist painting, of which we have been looking at the individual dots, instead of looking at the entire painting. Another analogy is that we are trying to explain the tidal patterns of the oceans, by studying a water molecule and ignoring the moon. We have to look at biological phenomena at the appropriate scale of integration and from a functional point of view in order to get a grip on the development of pathological processes. We should try to understand the dynamics of disease processes at a higher level of biological integration, closer to the clinical reality, than the genome or proteome. An integrated cellular approach is needed to study disease processes (Lewis W. 2003).
If we modify a gene, e.g. by creating transgenic animals, we must try to understand the dynamics of the pathways we are modifying. Quite often transgenic animals or animals with gene knock-outs do not show the expected phenotype, because of a different genetic background and the highly dynamic interplay of metabolic pathways and environmental influences on the final phenotype (Sanford LP, 2001; Pearson H. 2002).
The (early stage) disease models we use do not work as they should and do not provide enough predictive power. One can study cellular components, like DNA and protein as such, but this will not reveal the complex interactions going on at the cellular level of biological integration or in other words, the cytome. Both medicine and pharmaceutical research would benefit from using more cell oriented disease models and even higher-order models, instead of using infra-cellular models to try to describe complex pathological processes at a molecular level and getting lost in the maze of molecules which are the building blocks of cells.
An important moment in the drug discovery and development pipeline is the transition from discovery research to clinical development, for which different approaches to develop gatekeepers have been proposed to reduce the failure rate in drug development on both sides of the transition (Lappin G., 2003; Nicholson J.K., 2002; Pritchard J.F., 2003). Drug discovery should improve the quality of drugs it allows to enter development and drug development should be able to protect itself from drugs likely to fail in phases I to III. A better quality of drugs entering drug development is needed, not just more quantity. Failing in larger numbers will not bring the solution to create a better process from discovery to phase III and IV.
A highly defined oligo-parametric infra-cellular disease model used in High Throughput Screening (HTS) which in its setup ignores the complexity of higher order biological phenomena, may produce beautiful results in the laboratory, but fails to generate results of sufficient predictive power to avoid considerable financial losses later on in the drug discovery pipeline (Bleicher KH, 2003). A living cell may be a less well defined experimental environment for the biochemist, but it will provide us with the additional modulating influences on our disease models which are lost in lower-order disease models.
Metabolic variation in disease models
Nowadays the first stages of drug discovery use genetically homogeneous disease models, which as a result do not show the same metabolic heterogeneity as patient populations. Genetic and metabolic heterogeneity is now seen as a reason to exclude potential patients from treatment, not as a consequence of the failure of drug development. If we cannot develop drugs which will work in a genetically and metabolically heterogeneous environment, we try to reduce the patient population until it fits our abilities. However this micro-management of patient populations leads to a level of complexity in disease treatments the pharmaceutical industry, physicians and society cannot deal with in the end.
Pharmacogenomics is used to explain differences in drug metabolism during drug development (Dracopoli NC., 2003; Halapi E, 2004; Kalow W., 2004). Toxicogenomics and genotyping are used as a tool to identify safer drugs, worthwhile to enter clinical development (Guzey C, 2004; Koch WH., 2004; Yang Y, 2004).
Pharmacogenomics is not yet used to design or use early stage disease models with sufficient genetic heterogeneity to select drug molecules which will hold their activity in a metabolically heterogeneous environment. Genetic heterogeneity, epigenetic modulation and metabolic variation are not taken into account in the first stages of the drug discovery process. Optimizing a drug molecule for binding to one particular genetic variant inevitably leads to failure in a genetically heterogeneous patient population. Randomization in experimental design to counteract a systematic bias in one’s results involves more than sample unit randomization patterns.
Biological variation in heterogeneous cell or animal populations may be an unpleasant fact of life, but it correlates better to the real conditions of the genetically and metabolically heterogeneous patient populations. Ignoring biological variation in drug discovery will cause failure in drug development. Using pharmacogenomics only to exclude slow metabolizers, etc., from clinical trials and thereby homogenizing the trial population can lead to a dramatic reduction in potential patient population and a decline in profit generation potential. The important phase of a drug life cycle starts when it hits the market and we had better take care that it will spend its full life cycle generating enough revenue to fuel the company.
Hypo- or Subcellular disease models
We have seen an increase in the capacity of DNA and RNA expression techniques, but they still deliver data only up to the level of the expressed protein, not beyond. The quantitative chain of functional causation stops at the protein level. Higher order spatial and temporal dimensions of cellular dynamics are beyond the reach of these techniques. Gene expression studies do not tell you about the functional outcome of protein dynamics and enzymatic activity in the different cellular compartments. Up- and down-regulation of gene expression does not inform you about the functional interrelation of the encoded proteins and their spatial and temporal dynamics in the cell. Molecular pathways do not exist as parallelized, unrelated up- and down-regulating patterns, but are highly dynamic and intertwined modular networks (Sauer U., 2004). Where and how should we interfere with a disturbed cellular metabolic process in order to treat a disease? We fail to predict success with our current disease models, which translates itself into a high (up to 90 percent) attrition rate in drug development.
Southern, Northern and Western blots may show the quantitative sequence of gene expression up to protein concentration (Alwine JC, 1977; Alwine JC, 1979; Howe JG, 1981; Hinshelwood MM, 1993). DNA microarrays give a quantitative indication of gene expression (Barbieri RL, 1994; Schena M, 1995; DeRisi J, 1996; Jeong JG, 2004; Kawasaki ES., 2004). However finding a positive correlation between the pattern of gene expression and a given disease state is not the same as finding a causative relationship between (a) gene(s) and the causation matrix of a disease (Miklos GL, 2004). Moving up to the level of the dynamics of protein expression already demands a higher degree of sophistication in both assay design and data analysis (Kumble KD., 2003). However, without a functional assay on in-vivo dynamics of protein function and studying its spatial and temporal expression patterns (process flux) in the cell (compartments) and tissue, the functional impact on the cell remains unclear (Kriete A, 2003; Young MB, 2003; Egner, A., 2004).
Studying subcomponents of cellular pathways ignores the functional unity of the biological processes in the cell and the functional interactions between pathways. Without a better understanding of the phenotypic and functional outcome in the cell, the failure rate of the drug discovery process will remain high and very costly. There is a predictive deficit in the current oligo-parametric disease models used in pharmaceutical research which necessitates complex and expensive studies later on in the drug development pipeline to make up for the predictive deficit.
The popular techniques to explore and analyze low-dimensional data at high speed are based on the idea that this would provide all the data with sufficient predictive power to allow for a bottom-up approach to drug discovery. The current High Throughput Screening (HTS) and other early stage methods allow gathering low-dimensional data at high speed and volume, but their predictive power is too low as they lack depth of descriptive power (Perlin MW, 2002; Entzeroth M, 2003). We are just clogging the drug discovery pipeline with under-correlating data in relation to clinical reality. A bigger flow of unmanageable data does not equal a higher correlation to clinical reality.
The knowledge gathered at the infra-cellular level has to be viewed in its relation to the (living) cell and the biological and non-biological processes influencing its function and health, which requires a top-down functional and phenotypical approach rather than a bottom-up descriptive approach. Complex disease processes cannot be explained by simple oligo-parametric low-level models. A high-speed oligo-parametric disease model does not equal high predictive power. The ability to study a simplified disease model at high speed will not allow us to succeed; we must study and verify the functional outcome of the disease process itself.
A game of chess is not described by naming its pieces, but by the spatial and temporal interaction of both players, or in other words the flow of actions and reactions described in a space-time continuum. If we add the color of the pieces, it is a spatio-spectro-temporal flow of events. The individual pieces or moves do not explain the final outcome of the game; only when the entire process is analyzed from a positional and functional point of view can we understand and predict why one player wins or loses. You have to study a game of chess at the appropriate organizational level in order to understand it, or you will fail to find an explanation for the outcome of the game.
Isocellular disease models
Using cellular disease models in more detail is not a trivial endeavor. Cellular disease models need to be related to at least the in vivo cellular disease process we want to study, so a validation of this correlation is important (Gattei V, 1993; Thornhill MH, 1993; Lidington EA, 1999; Dimitrova D. S., 2002).
We now know that metabolic pathways show complex interactions and that gross genetic rearrangements can impair entire parts of cellular metabolism. The cellular models used in research should be validated for their functional and phenotypical representation of in vivo, in-organism processes. However, many popular cell lines are not selected for their close linkage to clinical reality, but for their maintainability in the laboratory, lack of phenotypical variation, ease of transfectability, etc. It is assumed that those cellular models are a valid representative of the disease process, but a thorough assessment is almost never done. Primary cells in general require a more complex tissue culture medium than most popular cell lines. Cancer cells (and transformed cells) can usually grow on much simpler culture medium. Replicative senescence and varying behavior at each passage also make primary cells less popular, as they may necessitate a change of cells during long term experiments and introduce variability in experimental data. Reduction of unpleasant variability in experiments by choosing a specific disease model may create ‘nice’ results, but of a reduced predictive value. Quite often results obtained with one cell line cannot be confirmed by using another cell line, without even talking about primary cells.
CHO cells (Chinese Hamster Ovary, Cricetulus griseus) are used in many assays, but they are not derived from a human cell and are aneuploid (Tjio, J. H., 1958). HeLa cells are derived from an aggressive cervical cancer; they have been transformed by human papillomavirus 18 (HPV18) and have different properties from normal cervical cells (Gey, G.O., 1952). The U-2 OS osteosarcoma cell line is easy to maintain and transfect (Ponten J, 1967). The PC12 cell line, which responds reversibly to nerve growth factor (NGF), has been established from a rat adrenal pheochromocytoma; it has a homogeneous and near-diploid chromosome number of 40 (Greene LA, 1967). HEC cells are derived from a human endometrial adenocarcinoma and are also very popular (Kuramoto H., 1972).
Some popular cell lines may correlate with themselves and not with the complex dynamics of the pathological process they are supposed to represent. Studying the dynamics of the involvement of a protein in a disease in patients and transforming this knowledge into a disease model in a particular cell line requires a careful assessment before embarking on a drug discovery process. Functional cell model drift should be verified at regular intervals and taken into account.
Even within individual cell lines there is not always homogeneity in phenotype and function. Cancer cells show genetic and chromosomal instability as they tend to lose parts of chromosomes (Duesberg P., 1998; Lengauer C, 1998; Duesberg P, 2004). Using cell lines derived from cancers therefore poses a correlation risk in relation to clinical reality for research done with these types of cell lines. Continuous sub-cultivation of cells and an increase in the number of passages may lead to chromosome rearrangements and loss of functional reactivity (Dzhambazov B, 2003). Loss of function destabilises a cell when critical parts of pathways are lost. Cell cycling may continue in parts of the cell culture, but this will cause a drift in experimental results.
Many of the most popular cell lines lack parts or even entire chromosomes and therefore large chunks of metabolic pathways. A drug molecule cannot interact with proteins which are not present in the cell line, and an adverse or even positive effect will go unnoticed. Functional loss of proteins and enzymes in cancer cells makes them unresponsive to a drug if the protein(s) which it targets are lost without killing the cell as such.
Even when a protein is successfully expressed in a cell as shown on a Western blot, this does not equal functional success. Western blotting tells you how much protein has accumulated in cells. Even knowing the rate of synthesis of a protein by Radio-Immune Precipitation (RIP) does not predict the functional outcome of protein expression. Protein function also depends on the metabolic background of the cell in which the protein is expressed and on its spatial and temporal organisation. If the enzymatic and structural background of the cell does not meet the prerequisites to put a functional protein in the right location, embedded in the right functional environment, nothing appropriate will happen. An appropriate functional assay is required to validate proper function of the expressed protein. A functional comparison between isolated native cardiac myocytes and cloned hERG demonstrates the advantages of cardiac myocytes over heterologously expressed hERG channels in predicting QT interval prolongation and TdP in man (Davie C, 2004).
In vivo enzymatic reactions are neither linearly correlated to protein concentration nor of “zero order”. The intracellular environment causes a more complex functional pattern for a given protein, such as a bell-shaped relation between protein concentration and function. A “blunt” on/off expression in a transfected cell does not correlate well to the physiological condition in a primary cell. When the appropriate metabolic environment is not present when studying a protein in a cellular disease model, predictivity of the disease model may be low compared to physiological conditions.
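As an illustration only, the difference between a proportional and a bell-shaped relation between protein concentration and functional output can be sketched in a few lines of Python. The Gaussian shape, the optimum and the width are assumptions chosen for the example, not measured values, and the function names are hypothetical.

```python
import math

def linear_response(concentration, slope=1.0):
    """Function rises in proportion to concentration (the naive assumption)."""
    return slope * concentration

def bell_response(concentration, optimum=1.0, width=0.5):
    """Hypothetical bell-shaped response: maximal at `optimum`, falling off on both sides."""
    return math.exp(-((concentration - optimum) ** 2) / (2 * width ** 2))

# Compare expression near the assumed optimum with strong over-expression.
for c in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"conc={c:>4}: linear={linear_response(c):.2f}  bell={bell_response(c):.3f}")
```

Under these assumptions, strong over-expression gives the highest value in the linear picture but almost no function in the bell-shaped one, which is the mismatch a “blunt” on/off expression system may introduce.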
A traditional (homogeneous) cell culture in the laboratory may not mimic the physiological conditions in an entire organism, so our approach to cell-based research (and beyond) also requires some redesign. Creating a virtual organism, by differential screening of a multitude of cell types representing the main cell types in the human body (cardiomyocytes, hepatocytes …), could help us to improve the predictive value of cellular disease models. We need to study cell-to-cell and cell-type-specific pathway dynamics in more detail, as is the case for nuclear factor-kappaB (NF-KappaB) (Schooley K, 2003).
Metabolic pathways in cells do not exist in a void, but are interconnected and highly dynamic processes. Blocking a pathway has far-reaching consequences for the intracellular environment. The upstream metabolites will either find their way through other metabolic pathways or pile up. Some inborn errors of metabolism are an example of this principle (PKU …). Drugs blocking pathways also distort the delicate balance in metabolic processes and may cause upstream effects by metabolites which are normally metabolized before they can cause any harm. The kinetics of the “pharmakon” may be documented, but the change in cellular metabolism and the pathway-network distortion are less well understood. Upstream metabolites may become processed by other pathways and unexpected adverse effects may show up. Adverse effects on cellular metabolism are only present in those cells which have an intact metabolic pathway, and not even all cell types activate the same pathways at all times.
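The pile-up and rerouting of upstream metabolites can be made concrete with a deliberately crude toy model. The sketch below assumes a linear pathway (source -> A -> B) with a slow side route (A -> X) and integrates it with simple Euler steps; the pathway layout, the rate constants and the function name are all invented for illustration and do not describe any real metabolic network.

```python
def simulate(inhibition, steps=2000, dt=0.01):
    """Toy pathway: source -> A -> B, with a slow side route A -> X.

    `inhibition` (0..1) is the fraction of the A -> B enzyme blocked by a drug.
    All rate constants are invented for illustration.
    """
    influx, k_main, k_side = 1.0, 2.0, 0.1
    a = b = x = 0.0
    for _ in range(steps):
        main_flux = k_main * (1.0 - inhibition) * a  # normal route, reduced by the drug
        side_flux = k_side * a                       # alternative route for the same metabolite
        a += (influx - main_flux - side_flux) * dt
        b += main_flux * dt
        x += side_flux * dt
    return a, b, x

for inh in (0.0, 0.9):
    a, b, x = simulate(inh)
    print(f"inhibition={inh}: A={a:.2f}  B={b:.2f}  side product X={x:.2f}")
```

In this toy setting, blocking the main enzyme raises the level of the upstream metabolite A and pushes more material into the side product X, while the intended product B drops.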
Differential multiplexing in (high-content) cell based screening could help us to acquire more information about the spatial and temporal dynamics of cellular processes. Developments are going on towards experimental multiplexing and up-scaling of the capacity of quantitative cellular research (Perlman ZE, 2004). Techniques such as High-Content Screening (HCS) or multiplexed quantification can be applied to cellular systems to study intra-cellular events on a large scale (Van Osta P., 2000; Taylor DL, 2001; Van Osta P., 2002b; Abraham VC, 2004; Van Osta P., 2004). Subcellular differential phenotyping is already possible on a large scale by using human cell arrays. Light-microscope technology is used to explore the spatial and temporal dynamics in cell arrays in great detail (Ziauddin J, 2001; Bailey SN, 2002; Baghdoyan S, 2004; Conrad C, 2004; Hartman JL 4th, 2004).
Analyzing a large number of tissues for candidate gene expression is now greatly facilitated by using Tissue MicroArray (TMA) technology (Kononen J, 1998; Simon R, 2002; Braunschweig T, 2004).
From individual cell to cytome
Studying cell function and drug impact at the level of the individual cell is called cellomics (Russo E., 2000). However, the concept of cellomics does not take into account the supra-cellular heterogeneity which is present in every cellular system, such as a cell culture or an organism. By studying cells while ignoring their diversity we make the same mistake as the statistician who drowned crossing a river that on average was just three feet deep.
Due to the heterogeneity of cell types and differences between cells in a healthy and disease state, we need to take this heterogeneity into account. Cytomes can be defined as cellular systems and the subsystems and functional components of the body. Cytomics is the study of the heterogeneity of cytomes or more precisely the study of molecular single cell phenotypes resulting from genotype and exposure in combination with exhaustive bioinformatics knowledge extraction (Davies E, 2001; Ecker RC, 2004b; Valet G, 2003; Valet G, 2004).
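The point about averages hiding heterogeneity can be illustrated with a small simulation. The sketch below assumes a mixed culture with two subpopulations and a single readout per cell; the proportions, means and spreads are hypothetical numbers chosen only to make the effect visible.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical single-cell readout (arbitrary units) for a mixed culture:
# 80% non-responding cells and 20% strongly responding cells.
non_responders = [random.gauss(1.0, 0.2) for _ in range(800)]
responders = [random.gauss(6.0, 0.5) for _ in range(200)]
population = non_responders + responders

mean_signal = statistics.mean(population)
# Count cells whose value lies close to the population average.
near_mean = sum(1 for v in population if abs(v - mean_signal) < 0.5)

print(f"population mean: {mean_signal:.2f}")
print(f"cells within 0.5 of the mean: {near_mean} of {len(population)}")
```

With these assumed numbers the population mean falls between the two groups, at a value that hardly any individual cell actually shows, so a well-averaged readout would describe a cell that is almost absent from the culture.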
In order to get the broader view on pathological processes, we should move on to the phenotypical and functional study of the cellular level or the cytome in order to understand what is really going on in important disease processes. Although the genome and proteome level have their predictive value in order to understand the processes involved in disease (and health), the cytome level allows for an understanding of pathological phenotypes at a higher level. By integrating the knowledge from the genome and proteome, we could give guidance to the exploration of the cytome, which was not possible before this knowledge was available.
The cytome level will also provide guidance to focus the research at the genome and proteome level and so create a better cross-level understanding of what is going on in cells (Gong JP, 2003; Valet G, 2004; Valet G, 2004b). Some would see this as taking a step back from the current structural and systematic descriptive approach, but it is mainly a matter of integrating research at another level of biological integration and looking in a different way at the web of interactions going on at the cellular level. Biological processes do not exist in a void, but they are a part of a web of interactions in space and time, rather than being an island on their own. A cell is a multidimensional physical structure (3D and time) with a finite size, not a dimensionless quantity. We cannot ignore the spatial and temporal distribution of events without losing too much information.
In recent years the tools have matured to start studying the cellular level of biological integration, but the tools are still used in the same way as if they were derived from low-content high-throughput phenomena as this is still the dominant research model. The tools to generate and explore a high-dimensional feature space are still scattered and not brought into line with the exploration of the cytome.
Functional processing in cellular pathways
The interconnection of genome, proteome and cytome data will be necessary in order to allow for an in-depth understanding of the processes and pathways interacting at the cellular level. A monocausal approach will have to be replaced with a poly- and pluricausal approach in order to understand and explain the phenomena going on at the cellular level. Pluricausal means causal contributions at different levels, such as genes, other cells and environmental influences. Polycausal means multiple causal contributions at the same biological level, such as polygenic diseases or multiple agonistic and antagonistic environmental influences. The concept of a multithreaded, multidimensional, weighed causality is needed in order to study the web of interactions at the cellular level. A drug modulates cellular function, but changes can be studied at different levels of biological integration:
Disease outcome = drug x (a x clinical^n + b x physiological^p + c x cellular^q + d x genetic^r)
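The weighted relation above can be written down as a small function, if only to make its structure explicit. The sketch reads the trailing n, p, q and r as exponents on each level, which is an assumption about the original notation; the function name and all numerical values are placeholders, not fitted or published figures.

```python
def disease_outcome(drug, levels, weights, exponents):
    """Evaluate drug x sum(weight * level ** exponent) over the levels of integration."""
    total = sum(weights[name] * levels[name] ** exponents[name] for name in levels)
    return drug * total

# All numbers below are placeholders, not fitted values.
levels = {"clinical": 1.0, "physiological": 0.8, "cellular": 0.5, "genetic": 0.2}
weights = {"clinical": 0.4, "physiological": 0.3, "cellular": 0.2, "genetic": 0.1}  # a, b, c, d
exponents = {"clinical": 1, "physiological": 1, "cellular": 2, "genetic": 1}        # n, p, q, r

print(disease_outcome(drug=1.0, levels=levels, weights=weights, exponents=exponents))
```

Written this way, the relation is a weighted sum over levels of biological integration scaled by the drug term, so the practical difficulty lies in estimating the weights and exponents rather than in evaluating the expression.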
Diagnosis and drug discovery merge if we take parallel models for both. The clinical diagnosis or para-clinical diagnosis of a disease should show a high correlation with the disease models used to study its possible treatment. A cause (e.g. a single gene defect, a bacterium) can have multiple consequences and as such be poly-consequential, which is the mirror situation of a single consequence being caused by multiple causes (co-causality or co-modulation) acting either synergistically or antagonistically (e.g. a disease with both a genetic and an environmental component). In reality, a pathological condition is a mixture of those extremes (e.g. a bacterial or viral infection and the host’s immune system) and as such a simple approach is not likely to succeed in unraveling the mechanism of a disease. With the current systematic and descriptive approach, however, we are looking at too low a level of biological integration and we get lost in a maze of molecular structures and interactions. The cell is the lowest acceptable target, not its single components, like DNA or proteins.
We are looking at the alphabet, not even words or sentences; nature is not a dictionary, but a novel. We should study the flow of events in a cell with more power, not only the building blocks. As an example, Mendel did not need to know about DNA in order to formulate his laws of inheritance and he did not know that the discovery of the physical carrier of inheritance, DNA, would confirm his views later on, but his laws are still valid as such. Certainly physics was not at the stage it was in the 20th century when Newton formulated the law of gravity, but his observations and conclusions were valid. When Einstein formulated his relativity theory, he did not have modern physics at his disposal. His theory does not fit well to the quantum level, but does explain phenomena at a higher level of functional integration and as such is an appropriate model.
The value of a scientific model does not lie in the scale of phenomena it describes, but in its predictive correlation to the reality it tries to capture. The more we may try to exclude elements from reality, the better we may be able to build a model which holds in a tightly controlled situation in our laboratory, but fails when challenged by full-blown reality in the outside world.
What we find should not be in contradiction to what lower level structural descriptive research discovers, but we should not wait for its completion to start working on the problems we are facing in medicine and health care today.
Epicellular disease models
Organoids, parts of organs, isolated organs and animals are being used as epicellular disease models. Animal models are an important part of the drug discovery process. The correlation of the animal model to the actual disease process in man is an important issue to consider (Huskey SE, 2003). In the past twenty years a lot has changed in the use of animal models to study human disease and develop new drugs.
We can now use genetically modified animals to study gene regulation and cell differentiation in a mammalian system (Gordon JW, 1980; Isola LM, 1991; Brusa R., 1999). Transgenic and gene-deleted (knockout) mice are used extensively in drug discovery (Rudmann DG, 1999).
In genetically modified mice however, the observed phenotype is not always the direct result of the genetic alteration (Linder CC., 2001; Schulhof J, 2001). The effect of the genetic modification is not completely straightforward, due to variations in the genetic background of the animals (Crusio WE., 2004). Transgenic mice containing the same genetic manipulation exhibit profoundly different phenotypes due to diverse genetic backgrounds (Sigmund CD., 2000; Sanford LP, 2001; Holmes A, 2003; Thyagarajan T, 2003; Bothe GW, 2004).
Using inbred mouse strains reduces variation in genetic background, but also reduces the correlation of the disease model to real-world genetic and metabolic variation encountered in human populations. Finding a strong correlation in an inbred laboratory animal population is no guarantee that the correlation will hold in an outbred natural population (Mackay TF., 1996; Macdonald SJ, 2004). Do we want nice results with a low standard deviation (SD), or do we need results highly correlating with clinical reality? If one wishes to obtain the optimal mouse model for a human disease, one needs to choose the correct genetic background as well as the correct mutation (Erickson RP., 1996).
We still do not have an in-depth understanding of the delicate spatial and temporal interplay in metabolic pathways in cells, organs or entire organisms in transgenic animals. Introducing or removing a gene without a clear understanding of its spatial and temporal expression pattern, leaves us with a correlation deficit in relation to the disease process in man.
When we modify a gene, we modify a pathway-web with upstream and downstream consequences for cellular metabolism in different cellular compartments (nucleus, Golgi …). The metabolites which (dis-) appear due to the modification will modify a highly dynamic network of metabolic interactions. In-vivo spatial and temporal variation in protein structure and activity profiles will add to the complexity of unravelling the functional impact of modified gene expression.
How to explore and find new directions for research
We may now be capable of studying a low-level layer of biological integration in great detail, such as the genome or proteome, but it is in the higher-order spatial and temporal patterns of cellular (and beyond) dynamics where the answers to our questions can be found. However, these higher-order levels of biological integration are still being studied in a dispersed way, due to the formidable technological and scientific challenges we are facing.
A 4-D physical space is still a formidable challenge to deal with compared to the 1-D problem of a DNA-sequence. The even higher-order feature hyperspace which is derived from this 4-D space is even further away from what we can easily comprehend. We focus the major efforts of our applied research on the level of technology we can achieve, not on the level of spatial and temporal understanding which is required. Applied research is suffering from a scale and dimensionality deficit in relation to the physical reality it should deal with. Reality does not simplify itself to adapt to the technology we use to explore biology just to please us.
At the moment we expect that an oligo- or even mono-parametric low-dimensional analysis will allow us to draw conclusions with sufficient predictive power to work all the way up to the disease processes in an entire organism. We are using disease models with a predictive deficit, which allow us to gather data at great speed and quantity, but in the end the translation of the results into efficient treatment of diseases fails in the majority of cases (up to 90 percent). The cost of this inefficient process is becoming a burden, which both society and the pharmaceutical industry will not be able to support indefinitely. As "the proof is in the pudding", not in its ingredients, we have to improve the productivity of biomedical and pharmaceutical research and broaden our functional understanding of disease processes in order to prepare ourselves for the challenges facing medicine and society.
If there were no consequences on the speed of exploration in relation to the challenges medicine is facing today, the situation would of course be entirely different. In many cases, the formulation of an appropriate hypothesis is very difficult and the resulting cycle of formulating a hypothesis and verifying it is a slow and tedious process. In order to speed up the exploration of the cytome, a more open and less deterministic approach will be needed (Kell DB, 2004).
Analytical tools need to be developed which can find the needle in the haystack, without a priori knowledge or in other words we should be able to find the black cat in a dark room, without knowing or assuming that there is a black cat. An open and multi-parametric exploration of the cytome should complement the more traditional hypothesis driven scientific approach, so we can combine speed with in-depth exploration in a two-leveled approach to cytomics. The multi-parametric reality which we need to deal with requires a more multi-factorial exploration than the way we explore the cellular level at this moment.
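One very simple way to look for the unexpected without stating in advance what it should look like is to score every cell by how far it deviates from the bulk of the population on any measured feature. The sketch below is only one hypothetical possibility among many and is not a method proposed in the literature cited here; the three features, their distributions and the function name are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical multi-parametric measurements: each cell is a tuple of three features.
cells = [(random.gauss(10, 1), random.gauss(5, 0.5), random.gauss(100, 10)) for _ in range(500)]
cells.append((10.2, 9.0, 101.0))  # one unusual cell, deviating in the second feature only

def robust_scores(rows):
    """Score each row by its largest per-feature deviation from the median, in MAD units."""
    columns = list(zip(*rows))
    medians = [statistics.median(col) for col in columns]
    mads = [statistics.median(abs(v - m) for v in col) for col, m in zip(columns, medians)]
    return [max(abs(v - m) / d for v, m, d in zip(row, medians, mads)) for row in rows]

scores = robust_scores(cells)
most_unusual = max(range(len(cells)), key=scores.__getitem__)
print(f"most unusual cell: index {most_unusual}, score {scores[most_unusual]:.1f}")
```

No hypothesis about which feature matters is supplied; the unusual cell stands out because all features are examined together, which is the kind of open exploration that a single pre-selected readout would miss.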
We now close our eyes to much of the complexity we observe, because our disease models are not up to the challenge we are facing today. Feeling happy with answers to questions in low-complexity disease models will not help us at the end of the drug discovery pipeline. We reduce the complexity of our datasets beyond the limits of predictive power and meaningfulness. We must reduce the complexity of possible conclusions (improvement or deterioration), but not the quality of data representation or data extraction into our mathematical models. The value of a disease model does not lie in the technological complexity of the machinery we use to study it, but in its realistic representation of the disease process we want to mimic.
A disease model which fails to generate data and conclusions which hold into drug development, years later, fails to fulfill its mission. Disease models are not meant to predict future behavior of the model, but to predict the outcome of a disease and a treatment. The residual gap between the model and the disease is in many cases too big to allow for valid conclusions out of experiments with current low-level disease models. Due to deficient early-stage disease models, the attrition rate in pharmaceutical research is still very high (80 percent or 4 out of 5 drugs in clinical research).
It is often said that the easy targets to treat are found already, but in relation to the status of scientific knowledge and understanding, “targets” were never easy to find. Disease models were just inadequate to lead to an in-depth understanding of the actual dynamics of the disease process. Just remember the concept of “miasma” before the work of Louis Pasteur and Robert Koch on infectious diseases. Only when looking back with present day knowledge we declare historical research as “easy”, but we tend to forget that those scientists were fighting an uphill battle in their days.
Instead of focusing on ever further simplifying our low-dimensional and oligo-param |