Current group: bionet.cellbiol

Human Cytome Project - Update 20 Dec. 2003

From: Peter Van Osta
Subject: Human Cytome Project - Update 20 Dec. 2003
Date: Mon, 20 Dec 2004 08:54:31 +0100
A Human Cytome Project - an idea

Hi,

As the on-line version of my article on the Human Cytome Project and the
application of cytomics in medicine and drug discovery (pharmaceutical
research) evolves, I put the updated version in this newsgroup for
reference. The original "question" on a Human Cytome Project was posted in
this newsgroup on Monday 1 December 2003.

Original version:

http://ourworld.compuserve.com/homepages/pvosta/humcyt.htm

Introduction

The completion of the Human Genome Project holds many promises for the
understanding of the genetics of man and the involvement of genes in human
diseases. However, the use of this information has to be viewed from a
different perspective than is currently the case if we want to use this
knowledge to improve medicine more efficiently. Predicting the dynamics of
the cell and its fate in diseases from the genome upwards is likely to
fail due to the complexity of metabolic processing and environmental
influences on the cellular metabolism and the entire organism.

The clinical reality of disease processes extends beyond the present-day
disease models and the (current) boundaries of scientific development.
When we close the doors of our labs behind us and as physicians are
confronted with the clinical reality of diseases in the outside world, our
disease models fail all too often, as we can witness in the diagnosis and
treatment of complex diseases. This is also painfully obvious in the
dramatically high attrition rates during clinical development of new
drugs.

When the endpoint of research is not only an experiment in a laboratory or
a publication, but to have an impact on the clinical reality of everyday
pathological processes, we fail to deliver in more than 80 to 90 percent
of all drugs being developed. Reality extends beyond the frontiers of
science. Outside the boundaries of scientific knowledge, significant parts
of (biological/clinical) reality remain unexplained and not well
understood.

Drug discovery has to come up with drugs which can stand the test of
clinical reality, but is being squeezed between the failing (theoretical)
disease models and the demands for success of pharmaceutical companies and
society. The functional correlation between genome structure and
clinically expressed disease is too low to lead to functional predictions
from the genome and even proteome level upwards, without taking into
account the spatial and temporal dynamics of cells, organs and organisms.
Pathological processes have to be viewed from another organizational level
of biology in order to capture the dynamics of in-vivo processes involved
in diseases.

The current bottom-up view on genomic and proteomic research suffers from
a correlation and prediction deficit in relation to the entire organism.
The genome and proteome are the omega of biological research, not the
alpha of drug discovery or disease treatment. From disease to gene we may
find a link, but turning around and going back to develop a treatment for
the clinical disease fails in many cases. Discovering the involvement of a
gene or protein in a disease does not predict the potential for
successful development of a treatment for the clinical disease entity as
such.

On Monday 1 December 2003 I posted a message about the idea of a Human
Cytome Project (HCP) to the bionet.cellbiol newsgroup (Van Osta P, 2003).
It seems that it was the right moment to ask the question, as there were
already ideas emerging on the role of the cell as the final arbiter in the
production of metabolic products and also the concept of predictive
medicine by cytomics (Valet G, 2003).

The idea of a Human Cytome Project is already being discussed at
scientific conferences (FOM 2004, ISLH 2004, ISAC XXII, EWGCCA 2004
…). At Focus on Microscopy (FOM) in Philadelphia on Wednesday
afternoon, 7 April 2004, the idea of a Human Cytome Project was for the
first time discussed at a scientific meeting. A round table discussion was
held at the European Microscopy Congress (EMC) and articles on the idea are
already starting to appear (Valet G, 2004; Valet G, 2004b; Valet G, 2004c). As
the idea of a Human Cytome Project seems to have generated some interest
in the scientific community, I decided to put the original message and
question on my personal website for reference, so here it is.

Monday, 1 December 2003 10:57:46 +0100

Hi,

I was wondering if there is already something going on to set up a sort of
"Human Cytome Project"? In my opinion the hardware and most of the
software seems to be available to set up such a project? For the cellular
level, light-microscopy based reader technology would be very interesting
to use?

Studying and mapping the genome, transcriptome and proteome at the
organizational level of the cell for various cell types and organ models
could provide us with a lot of information of what actually goes on in
organisms in the spatio-spectro-temporal space?

I have been thinking (working) about a concept which could provide the
basic framework for exploring and managing this cellular level of
biological organization research on a large scale, but I would like to
know if there is already some thought/work going on in the direction of
setting up an initiative such as a "Human Cytome Project" ?

This is just an idea, so I am really interested to hear if there is
something in it, or even if it is not worth while what I just wrote.

Best regards,

Peter Van Osta.

The path which led to the idea of a Human Cytome Project

I will give a bit more background to the path which for me has led to the
idea that something of a Human Cytome Project might be feasible. The idea
for large scale screening of the dynamics of the (living) cell came when I
visited the Sanger Center in the UK in 2001 and was shown a big room
filled with DNA-sequencers. From then on I wanted to create a system which
could mean for cell-based research what DNA-sequencing had meant for Human
Genome research.


However I did not want to create a catalog of the cytome, but to allow for
the functional exploration of the cell in order to capture and describe
the dynamics of cellular processes and not only create a catalog of its
components. The multidimensional world of the cell requires a
higher-dimensional approach than the linear world of DNA and also a
different inner- and outer resolution is needed for each level of
biological integration. It became clear to me that the cellular level is
the lowest level of biological organization close enough to the complex
dynamics of a disease process. Only a high correlation to the disease
process itself allows a model to be used as a valid disease model.

Today powerful techniques to explore the cytome are available, such as
flow cytometry (Edwards B.S., 2004) and advanced digital microscopy (Price
J. H., 2003; Tsien R, 2003), which enables the exploration of the cellular
function and phenotype. There are now exciting technological developments
going on in what is called High Content Screening which will allow us to
explore cellular systems on a large scale (Taylor DL, 2001; Giuliano KA,
2003). These developments and other technological advances made me feel
confident that the exploration of the human cytome would be feasible. We
should be able to open the door to the cell wide open to look at cellular
structure and dynamics better than we do now by just looking through the
keyhole.

My personal interest and research

I myself wanted to know if a system to explore cells on a very large scale
could be implemented and would work. Managing the flow of data from
physics to features is the centerpiece of such a system.

As technologies evolve, it should be easy to exchange components of a
system or expand it with new technologies. The system should therefore be
modular and scalable. The core of the system should be of a different
design than the interface to the outside world, and the two should evolve
separately, linked to each other only for the exchange of information. The
concept should allow for up-scaling the system for processing massive
amounts of high-dimensional data.

The core has to be able to deal with multidimensional spaces and datasets
and manage the dataflow between modules, each module dealing with a part
of the entire process, from acquisition and detection to data generation.
From center to periphery, the system becomes increasingly machine and
technology related, while the core is only a data-transfer module unaware
of technical or physical constraints. Each machine connected to the core
makes it possible to explore a subset of a physical (3D, spectral,
temporal) space and informs the core about its capacities and
restrictions.

A device attached to the system as such should allow for the exploration
of a part of this spatio-spectro-temporal continuum. Devices differ in
their sampling of the electromagnetic spectrum (LM, EM, CT, NMR …),
the spatial scale at which they can operate (nm, microns, mm …) and
their temporal resolution (nsec, msec, sec, min …). A given device
has an inner and outer spatial, spectral and temporal resolution limit.
All (imaging) devices generate pixel or voxel density profiles which can
be used for (semi-) quantitative exploration. A given input data point
represents a spatial, spectral and temporal sampling of the
spatio-spectro-temporal continuum.
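
The idea of a device reporting its inner and outer resolution limits to a core that knows nothing about its physics could be sketched roughly as follows. This is only an illustration and not the M5 framework itself: the class names (`Resolution`, `Device`, `Core`), the method names and all numeric limits are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    """Inner (finest step) and outer (largest extent) limit on one axis."""
    inner: float
    outer: float

    def supports(self, step, extent):
        # A request fits if its step is no finer than the inner limit
        # and its extent is no larger than the outer limit.
        return self.inner <= step and extent <= self.outer


@dataclass(frozen=True)
class Device:
    name: str
    spatial: Resolution   # metres
    spectral: Resolution  # metres (wavelength), reported but not queried here
    temporal: Resolution  # seconds


class Core:
    """Stores only what each device reports about itself."""

    def __init__(self):
        self._devices = {}

    def register(self, device):
        self._devices[device.name] = device

    def candidates(self, spatial_step, spatial_extent, time_step, duration):
        # Return the names of devices able to sample the requested region.
        return sorted(
            d.name
            for d in self._devices.values()
            if d.spatial.supports(spatial_step, spatial_extent)
            and d.temporal.supports(time_step, duration)
        )


core = Core()
# Illustrative limits only, not measured specifications of real instruments.
core.register(Device("light_microscope",
                     Resolution(2e-7, 1e-2),
                     Resolution(1e-9, 4e-7),
                     Resolution(1e-3, 3.6e3)))
core.register(Device("ct_scanner",
                     Resolution(5e-4, 5e-1),
                     Resolution(1e-11, 1e-10),
                     Resolution(1.0, 6e2)))
```

With these made-up limits, a request for micron-scale sampling over a 100 micron field is matched only by the hypothetical light microscope, while a millimetre-scale request over 10 cm is matched only by the hypothetical CT scanner.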

The physical dimensions of the high-dimensional space and the meaning of
each pixel/voxel are only relevant for the quantification module as the
detection module only deals with “density” patterns in a 5D
space. Anisotropy in spatial, temporal and spectral sampling are only
accounted for at the periphery of the system, as they have an impact on
the quantification of objects. Each dimension (XYZ, spectral, temporal) is
regarded as a continuum, sampled at discrete intervals, each with its own
inner and outer resolution.
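
The separation between detection (which only sees index space) and quantification (which needs the physical sampling) might look like the minimal sketch below. The `Sampling` record and the function `object_volume_um3` are hypothetical names chosen for the illustration, and the step sizes are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sampling:
    """Step size per axis of a 5D (x, y, z, spectral, temporal) grid."""
    dx_um: float
    dy_um: float
    dz_um: float
    dlambda_nm: float
    dt_s: float


def object_volume_um3(voxel_count, sampling):
    # Detection delivers a voxel count in index space. Only here, at
    # quantification, does anisotropic sampling (dz != dx) matter.
    return voxel_count * sampling.dx_um * sampling.dy_um * sampling.dz_um


# Arbitrary example: axial steps four times coarser than lateral steps.
stack = Sampling(dx_um=0.5, dy_um=0.5, dz_um=2.0, dlambda_nm=10.0, dt_s=1.0)
volume = object_volume_um3(100, stack)
```

The same detected object of 100 voxels would be assigned a different physical volume under a different sampling, which is why the physical dimensions only need to be known at the periphery.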

The system design allows for distributed operation, so a system could run
on different platforms and interact with components over a network. It
should use open standards for its communication with the outside world to
allow for easy integration in a heterogeneous environment (XML, CORBA
…). The output of the system should be a set of linked feature
hyperspaces, each describing structural and functional aspects of the
individual cell and its components. The data output must be in a format
which can easily be parsed and fed into data analysis and visualization
systems.
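
As a small illustration of an easily parsed output format, per-cell feature vectors could be written to and read back from XML with a standard library parser. The element and attribute names (`cells`, `cell`, `feature`) and the feature names are assumptions for this sketch, not a format defined by the system described here.

```python
import xml.etree.ElementTree as ET


def features_to_xml(cells):
    """Serialize {cell_id: {feature_name: value}} to an XML string."""
    root = ET.Element("cells")
    for cell_id, features in cells.items():
        cell = ET.SubElement(root, "cell", id=str(cell_id))
        for name, value in sorted(features.items()):
            # repr() of a float round-trips exactly through float().
            ET.SubElement(cell, "feature", name=name).text = repr(float(value))
    return ET.tostring(root, encoding="unicode")


def xml_to_features(text):
    """Parse the XML produced by features_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {
        cell.get("id"): {
            f.get("name"): float(f.text) for f in cell.findall("feature")
        }
        for cell in root.findall("cell")
    }


# Hypothetical features for two cells.
measured = {
    "c1": {"area_um2": 120.5, "mean_intensity": 0.42},
    "c2": {"area_um2": 98.0, "mean_intensity": 0.57},
}
document = features_to_xml(measured)
```

Any analysis or visualization tool with an XML parser can then recover the feature table without knowing how it was produced.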

Since 2001 I have been thinking about, and working on, the design of such
a scalable system, of which the first version of the M5 framework is now
operational and it allows me to study its practical use in more detail
(Van Osta P., 2004). This core is being built into a system for the
exploration of cells, tissues and model organisms by using a microscopy
based reader.

The roots and predecessor of my own work

The predecessor of this system and a source of inspiration dates back to
the late eighties and early nineties of the twentieth century (Geerts H,
1987; Ver Donck L, 1992; Cornelissen F, 1993; Geerts, H, 1992; Geusebroek
J.M., 2000; Van Osta P, 2002).

This use of digital microscopy in drug discovery originated from Nanovid
microscopy long ago (De Mey J., 1981; De Brabander M., 1986; De Brabander
M, 1986b; Geuens G, 1986; Geerts H, 1987; De Brabander M, 1989; Geerts H.,
1991). Nanovid microscopy itself had its origin in the study of
microtubules (De Mey J., 1976; De Brabander M., 1977). Automated Calcium
(Ca2+) ratio imaging was used for studying the effect of drugs on
cardiomyocytes (Geerts H, 1989; Olbrich HG, 1991; Ver Donck L, 1992;
Cornelissen F, 1993; Cornelussen RN, 1996).

Drug discovery research by using cellular disease models with automated
microscopy based systems was done in this environment for many years,
before it became fashionable in the outside world (Geerts H, 1989; Ver
Donck L, 1992; Cornelissen F, 1993; Nuydens R, 1993; Nuydens R, 1995;
Nuydens R, 1995b; Geerts H, 1996; Nuydens R, 1998).

Why a Human Cytome Project?

Human Genome Project

The Human Genome Project (Lander ES, 2003; Venter JC, 2003) has set a new
milestone in medicine and the understanding of human biology (Guttmacher,
A., 2002; Guttmacher, A., 2003). Since its conception in 1986, it has
answered many questions, but it has also left us with more questions to
answer and it opened new horizons for exploration (Dulbecco R., 1986;
Collins F., 2003). The results of the Human Genome Project led to a first
estimate that there are only about 34,000 genes in the human genome and by
the end of 2003 the number was reduced to some 25,000 genes (Claverie
J.-M., 2001; Wright F. A., 2001; Pennisi E., 2003). Now, at the end of 2004,
the euchromatic sequence of the human genome is complete and the number of
genes is estimated to be about 20,000 to 25,000 (Collins FS, 2004).

The Caenorhabditis elegans (C. elegans) genome comprises over 18,000 genes.
The fruit fly (D. melanogaster) genome consists of about 13,000 genes and
as such it has fewer genes than C. elegans, although as an organism it is
far more complex. Gene number alone does not predict functional
complexity. Although there is much more variation in the sizes of the
genomes, this is not reflected in the number of genes.

The functional uncoupling of the dynamics of cellular function from the
genomic gene count came as a shock. The complexity and diversity of
organisms is not reflected in the structural complexity of their genomes
alone, but to a large extent it is hidden in the dynamics of gene
expression and cellular processing. As there is no linear relation between
the complexity of an organism and the physical structure of its genome,
there is also no one-on-one relation between the phenotype of an organism
and its genome. Relatively small genomic differences between organisms, such
as man and chimpanzee, do result in large functional differences in gene
processing and functional expression.

The structural relatedness of the human and chimpanzee genomes does not
explain the large difference in brain function, for which gene expression
profiles in the brain are a better predictive instrument (Caceres M, 2003;
Uddin M, 2004). Functional differences between chimpanzee and man are more
pronounced in the brain than in other organs. Gene expression differences
are more related to cerebral physiology and function in humans than gene
sequences. Epigenetic phenomena within individual cells and differential
processing in different cell types have more predictive power than the
piecemeal and one-dimensional gene sequence approach, when applied on
complex structures such as the brain (Wilson KE, 2004).

From single gene and genome to the entire cell

Now that we are starting to use the information coming out of the Human
Genome Project, people start to understand that the dynamics of the cell and its
fate in disease processes cannot simply be explained from its individual
genes, genome or its proteome. Although all cells in the human body share
the same genome, there is considerable heterogeneity in their phenotype
and dynamics. Structural information alone or information from too low an
organizational level cannot sufficiently predict higher-order phenomena as
it does not sufficiently take into account interactions at higher
organizational levels and influences from outside the low-level
organizational unit.

So if the structure of the genome alone cannot explain the differences
between species, disease processes and the dynamics of the cell, where
do our functional complexity and interspecies differences come from? How
do we continue in the post-genome era to study the dynamics of the cell
and entire organisms? How are genes related to the function of an organism
and where do we lose track? These questions are not of academic
importance alone, but their answers have a significant impact on the
diagnosis and treatment of (complex) diseases, drug discovery and
development.

Let us take a walk from gene to protein and take a closer look at
“The Central Dogma of Molecular Biology”, which I personally
prefer to call an axiom instead of a dogma. Science should only have
axioms and leave dogmas to religion.

From genome sequence to gene activity

The genome sequence alone does not allow us to predict the functional
impact of sequence variations as epigenetic modulation influences
functional gene expression. Epigenetic modulation of gene function is a
cause of non-Mendelian inheritance patterns and variability in the
expression and penetrance of a disease. Even transmission of an identical
gene sequence is not a guarantee for identical gene expression as the
(in)-activation of a gene by epigenetic modulation occurs differently when
a gene is of paternal or maternal origin.

Epigenetic modulation of gene expression is heritable during cell division
but is not contained within the DNA sequence itself (Reik W, 2001;
Bjornsson HT, 2004; Kelly TL, 2004; Chong S, 2004). Epigenetic modulation
is one of the problems encountered when cloning, as the cloning process
differs in its epigenetic regulation of (embryonic) gene expression (Mann
M, 2002).

This differential inactivation of genes from maternal and paternal origin
even leads to functional X-chromosome mosaicism in women as their cells at
random inactivate one of their X chromosomes. X-inactivation occurs early
in embryonic development and the descendants of each cell subsequently
inherit its choice of active X chromosome, which differs between cell
lineages. The inactivated X chromosome can be seen in a
microscope as a Barr body in the interphase nuclei of female mammals.
Differential activation of genes creates a functional chimera.

Chemical modification by methylation of cytosine residues is a major
regulator of mammalian genome function and plays an important role in the
intra-uterine development of an organism and the regulation of gene
expression (Urnov FD, 2001). Tissue specific imprinting in genes leads to
differential gene expression in different tissues (Weinstein LS, 2001).
Aberrant DNA methylation has been implicated in the pathogenesis of a
number of diseases associated with aging, including cancer and
cardiovascular and neurological diseases (Walter J, 2003; Jiang YH, 2004;
Macaluso M, 2004). A dietary component such as folic acid is a key
component of DNA methylation during in utero development, disease
development and aging (McKay JA, 2004). Genes and environment interact and
this might play a critical role in the pathogenesis and inheritance of
complex diseases (Vercelli D, 2004).

Even the first steps in the expression of a gene do not show a one-on-one
relation to the gene sequence itself. The relation of gene structure to its
function is a bidirectional process, and our understanding of the impact of
the different modulators is still not sufficient to create highly
correlating disease models.

From gene to protein, a bumpy road

A eukaryote, such as Homo sapiens, has no one-on-one relation to its
genes. The dynamics of gene expression is regulated by hypo-, iso- and
epigenetic operators. The gene may be the structural unit of inheritance,
but the protein domain is the functional unit of metabolism.

When we talk about protein structure, the primary structure refers to the
amino acid sequence in a protein (1D). The primary structure is most
closely related to mRNA and as such the gene sequence and gene structure
from which the protein originates. The terms secondary and tertiary
structure refer to the 3D conformation of a protein chain. Secondary
structure refers to the interactions of the backbone chain (alpha helical,
beta sheet, etc.). Tertiary structure refers to interactions of the side
chains. Quaternary structure refers to the interaction between separate
chains in a multi-chain protein (4D). The combined shape of the secondary
and tertiary structure and the quaternary structure is referred to as the
conformation of the protein. With increasing dimensionality, the relation
between a higher order organization of protein structure and its gene
relaxes as other physical and chemical influences play an increasingly
important role in its physical and functional integrity.

In a mature enzyme, only a relatively small number of its amino-acids
interact with a ligand, the majority of amino-acids help to create the
appropriate 3D and even 4D structures required for its in-vivo
functionality. Structural proteins and enzymes may show interactions over
larger parts of their molecular surface to form functional homo- or
hetero-polymers in their quaternary structure. From a single gene to a
protein, we have to deal with the dynamics of gene expression regulation
and mRNA formation (promoters, cis- and trans-regulation, transcription,
splicing). We have to deal with the interaction of tRNA with mRNA in the
translation of an mRNA sequence into a protein sequence and
post-processing of the protein sequence into a functional 3D and 4D
structure (Wobble, sequence processing, protein folding and interaction).

A structural similarity at the genome level does not lead to functional
similarity, due to epigenetic regulation (Eckhardt F., 2004). Sequence
variation due to mutations does not bleed through to the protein level
one-on-one. Basic mechanisms act as powerful uncouplers of gene structure
from protein function. Mutations in the DNA and errors during
transcription of the DNA sequence into mRNA are not linearly predictive of
the structure and function of the protein resulting from the translation
of the DNA sequence into the protein sequence, due to the degeneracy of
the genetic code. The deleterious effects of sequence variations are to
a certain extent suppressed by the Wobble mechanism used in base-pairing
when translating mRNA to protein (Crick F, 1966).
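
The degeneracy of the genetic code can be made concrete with a short sketch. It builds the standard codon table (DNA coding-strand notation) and counts, for each codon position, which fraction of single-base substitutions leaves the encoded amino acid or stop signal unchanged. It only illustrates degeneracy at the level of the code table; it does not model tRNA wobble pairing, transcription errors or any of the other mechanisms discussed here, and the function name `synonymous_fraction` is my own choice.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# with the third base varying fastest; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(position):
    """Fraction of single-base substitutions at `position` (0, 1 or 2)
    that leave the encoded amino acid (or stop signal) unchanged."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


# One value per codon position: first, second, third base.
fractions = [synonymous_fraction(p) for p in range(3)]
```

Under this simple counting, most substitutions in the third base are silent while almost all substitutions in the first and second base change the amino acid, which is one reason why sequence variation does not map one-on-one onto protein variation.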

Protein sequence = k x gene sequence

In this formula, ‘k’ is smaller than one for most amino
acids built into a protein, due to mechanisms such as splicing variation
and the Wobble mechanism.

In eukaryotes, a relatively simple genome compared to their functional and
structural complexity can be used, because of the existence of introns and
exons. An exon in general defines a functional domain and these domains
are rearranged to create a more complex proteome than the genome it is
derived from. Constitutive and alternative splicing of genes is
dynamically regulated at the moment of transcription and pre-mRNA splicing
by cis- and trans-acting factors (Kornblihtt AR, 2004). Before the
Human Genome Project was completed, it was expected that
man would need about 100,000 genes to explain the structural and
functional complexity of our species. This number has collapsed to about
25,000 genes, a factor of four (75 percent) lower than expected
(Collins FS, 2004). The functional differences between species are more
related to differential processing, due to different up- and
down-regulation of genes in different cell types and organs. Different
promoters and splicing variants are used to tune protein and
enzyme structure and function in different cell locations and organs
(Ayoubi TA, 1996; Masure S, 1999; Nogues G, 2003; Yeo G, 2004). Promoter
variation and differential splicing allow for spatiotemporal
differentiation in protein expression, while the organism does not have to
manage an explosion in genomic size and sequence-complexity. This
mechanism helps to uncouple the protein from the rigidity of the gene
sequence in order to allow for functional variation while restricting
structural variation at the genome level. Functional differentiation in
gene expression allows for a better adaptability to changing conditions,
without the need for fast-paced changes in gene structure.
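
How rearranging exons can yield more transcripts than there are genes can be illustrated with a simple enumeration. The sketch assumes a single hypothetical gene with five exons (`E1` to `E5`), of which three are cassette exons that may be included or skipped. It gives an upper bound only: real splicing is regulated, and not every combination is actually produced in a cell.

```python
from itertools import product


def isoforms(exons, optional):
    """Enumerate transcripts when each exon in `optional` may be skipped.

    Exon order is preserved; constitutive exons are always included.
    """
    choices = [(e, None) if e in optional else (e,) for e in exons]
    return [
        tuple(e for e in combo if e is not None)
        for combo in product(*choices)
    ]


# Hypothetical gene: E1 and E5 constitutive, E2 to E4 cassette exons.
gene = ["E1", "E2", "E3", "E4", "E5"]
variants = isoforms(gene, optional={"E2", "E3", "E4"})
```

With n independent cassette exons, a single gene can in principle give rise to up to 2^n distinct transcripts, so the potential proteome grows much faster than the number of genes.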

Protein folding of a linear amino-acid sequence into a 3D protein also
acts as a functional uncoupler of gene sequence to protein function. While
the protein-sequence at the moment of translation is related to the
gene-sequence, the final structure and function of an enzyme is in
addition defined by post-translational modification (PTM) and its
physico-chemical environment (Kukuruzinska MA, 1998; Uversky VN, 2003;
Schramm A, 2003; Seddon AM, 2004). Changes in the physical and chemical
environment of the protein may change the shape and alter the conformation
of a protein. Placing a protein in a different physical and chemical
environment alters the stability of the van der Waals, hydrogen,
ionic and covalent bonds which hold it together in its particular
conformation. Breaking those bonds can cause the molecule to unfold
and to change or even lose its function (denaturation).
Epicellular regulation of protein glycosylation also plays an important
role in the dynamics of protein activity (Medvedova L, 2004).

The majority of proteins are subjected to a multitude of
post-translational modifications. Post-translational modification involves
cleaving, attaching chemical groups (prosthetic groups) and internal
cross-linking (disulfide bonds). Already more than a hundred different types
of PTM are known, which act as functional uncouplers of protein structure
from the gene sequence (Hoogland C, 2004). Even today the protein folding
problem remains one of the most basic unsolved problems in computational
biology. Predicting protein folding from the gene upwards ignores the
influence of the post-translational modification (PTM) and the influence
of the in-vivo physico-chemical environment of the protein. Proteoglycans
and glycoproteins are not derived from a gene sequence as such, but their
structure is the result of extensive post-translational modification. Cell
membranes contain phospholipids, which are not encoded by DNA as such, but
they result from metabolic processing and nutritional components.

In a functional protein only a very few specific residues are actually
responsible for enzyme activity, while the fold is much more closely
related to ligand type (Martin AC, 1998). The effect of an amino-acid
change on protein structure and function depends on the location of the
amino-acid in the 3D structure, its physico-chemical properties and the
physico-chemical environment in which it is processed and used. Amino-acids
which are distant neighbours in the protein sequence can become close
neighbours in the 3D structure of the protein and as such a protein
sequence variation is only a weak determinant of the protein function.

By just going from DNA-sequence to 3D protein structure, the relation
between genome sequence and the functional status of a cell begins to
fade. By taking this relation even further from gene to organism, we lose
additional predictive power. How will we be able to design models that will
allow us to predict the functional outcome of a disease, when we use a
fuzzy model to start with? Powerful uncouplers of the structural relation
of even a protein to the gene it is primarily derived from do not allow
us to draw hard conclusions about the impact on the functional status of an
organism from the gene and genome sequence.

From proteome to cell

Eukaryotic cells are highly compartmentalized; proteins do not exist in
the cell as in a homogeneous fluid, but in different compartments of the
cell, each with a different physico-chemical environment. The 3D and 4D
structure of a protein and its functionality are highly dependent on the
in-vivo physico-chemical environment of the protein.

Studying proteins without taking into account their spatial and temporal
organization in a cell, ignores the complexity and dynamics of protein
expression and interaction in a cell. Studying proteins in-vivo reveals
more about their function and dynamics (Hesse J, 2002; Pimpl P, 2002;
Viallet PM, 2003; Murphy R. F., 2004). Without information about the
relation between cellular structure and function, a lot of information is
lost. A 2D protein-profile may show the entire protein content of a cell,
but we lose all information about the intracellular spatial and temporal
distribution of these proteins.
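
The loss of information can be shown with a toy example. Two hypothetical cells hold the same total amounts of two proteins (`P1`, `P2`), but distributed differently over nucleus and cytoplasm. All names and numbers are invented for the illustration, and the summing function stands in for any whole-cell measurement that pools the compartments.

```python
def total_profile(cell):
    """Sum protein amounts over all compartments of one cell."""
    totals = {}
    for compartment in cell.values():
        for protein, amount in compartment.items():
            totals[protein] = totals.get(protein, 0) + amount
    return totals


# Same totals, opposite intracellular distribution (arbitrary units).
cell_a = {"nucleus": {"P1": 90, "P2": 10}, "cytoplasm": {"P1": 10, "P2": 90}}
cell_b = {"nucleus": {"P1": 10, "P2": 90}, "cytoplasm": {"P1": 90, "P2": 10}}

profile_a = total_profile(cell_a)
profile_b = total_profile(cell_b)
```

Both pooled profiles are identical, so a measurement at that level cannot distinguish the two cells even though their internal organization is reversed.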

Eukaryotic cells are highly spatially differentiated structures. Proteins
involved in trans-membrane trafficking, require a membrane to do their
work and cannot do their work outside this specific physico-chemical
environment. A protein has to reach the appropriate physico-chemical
environment in the cell in order to do its work properly. Studying a
protein outside its in-vivo physico-chemical context leads to a loss of
correlation with its in-vivo dynamics.

There are three main cellular compartments in a eukaryotic cell: the
nucleus, the cytoplasm and the cell membrane. The nucleus itself is a highly
organized 3D structure with spatially and temporally highly differentiated
DNA- and RNA-processing machinery (Lamond AI, 2003; Politz, J., 2003;
Pombo, A., 2003; Iborra F, 2003; Cremer T, 2004). Both transcription and
splicing of the mRNA message are carried out in the nucleus (Sleeman JE.,
2004). The distribution of eu- and heterochromatin changes throughout the
cell cycle, chromosomes and spindles appear during cell division. The
dynamics of gene transcription is visible in the chromatin condensation
patterns in the nucleus.

The cytoplasm itself contains several organelles: smooth and rough
endoplasmic reticulum (SER and RER), ribosomes, the Golgi apparatus,
mitochondria and lysosomes, all enclosed by the cell membrane. Each organelle deals with a
different set of processes necessary for cell development and maintenance.
The membranes of organelles are highly dynamic structures which undergo
profound changes during the life cycle of a cell (Ellenberg, J. 1997;
Zaal, K. J. M., 1999).

The structural compartmentalisation of the intracellular environment
allows for a functional differentiation and provides a process flow
management mechanism. The membrane structure and the mitochondrial
membrane potential (MMP) of mitochondria play an important role in their
function (Zhang H, 2001; Pham NA, 2004). Microtubules play an important
role in cellular function and their organization and dynamics are being
studied by microscopy based techniques (De Mey J., 1981; De Brabander M.,
1986; Geuens G, 1986; De Brabander M, 1989; Geerts H., 1991; Olson KR,
1999).

As an example, hydrolytic lysosomal enzymes require a specific physical and
chemical environment to do their work, which inside the cell only exists
inside the lysosomes (De Duve C, 1955). The boundary membrane of the
lysosome keeps the hydrolytic enzymes away from the rest of the cytoplasm
and so controls what will be digested (De Duve C., 1966).

The cell membrane separates the interior of the cell from its environment,
but is a highly dynamic structure (Kenworthy, A. K., 1998; Varma, R.,
1998). The appropriate spatial and temporal dynamics of the cell membrane
are vital for the survival of the cell. The cell membrane provides the
physical boundaries in which the cell can maintain a highly dynamic
physical and chemical environment. Cell-to-cell communication is
dynamically managed at the level of the cell membrane (Nohe A, 2004).

Proteins do their work in spatially different cellular environments and
with different spatial and temporal patterns. A protein can be mobile in
one cellular compartment and immobile in another (Ellenberg J., 1997).
Co-expressed proteins may in reality never interact with each other
because they do their work in separate cellular compartments. The
substrates of proteins may migrate through different cellular compartments
in order to be subjected to a highly dynamic interplay of enzymatic
processes. Proteins which do their work in the same cellular compartment
may only be expressed at different stages during the life cycle of a cell.

An important temporal differentiation of cellular processes occurs during
the cell cycle. The different stages in the cell cycle each depend on the
spatial and temporal expression of multiple proteins. The passage of the
cell through the cell cycle is controlled by proteins in the cytoplasmic
compartment, such as different Cyclins, Cyclin-dependent kinases (Cdks)
and the Anaphase-Promoting Complex (APC). First there is the G1 phase
(growth and preparation of the chromosomes for replication). Secondly the
cell enters the S phase (synthesis of DNA and centrosomes) and finally the
G2 phase which prepares the cell for the actual mitosis (M). Mitosis
itself consists of a spatial and temporal sequence of events, called
prophase (mitotic spindle), prometaphase (kinetochore), metaphase
(metaphase plate), anaphase (breakdown of cohesins) and telophase, where a
nuclear envelope reforms around each cluster of chromosomes and these
return to their more extended form.

However, our understanding of the cell cycle is still far from complete.
The regulation of the cell cycle by G1 cell cycle regulatory genes is more
complex than we thought (Pagano M, 2004).

We need to study and understand the intracellular in-vivo dynamics of
protein metabolism and its spatial and temporal organization in different
cell types. We need to study intracellular protein ecology, not just
ex-vivo protein interactions or building a protein catalogue of only
scalar dimensions.

The dynamics of cellular function

Taxonomy is the science of organism classification and refers to either a
hierarchical classification of things, or the principles underlying the
classification. Today the emphasis of biological research is on
classifying genes and proteins in large catalogues, instead of studying the
spatial and temporal dynamics of cellular processes in vivo. The global
analysis of cellular proteins or proteomics is now a key area of research
which is developing in the post-genome era (Chambers G, 2000; Ideker T.,
2001; Aitchison J.D, 2003). Proteins show functional grouping into modules
which can be grouped into elegant schemes (Hartwell, L.H., 1999; Segal,
E., 2003).

In vivo, however, the spatial and temporal distribution and interaction of
proteins with other proteins, substrates, etc., adds another layer of
complexity which is not taken into account by functional studies alone.
Expression studies, no matter how we group them, do not reveal the
intracellular spatial and temporal distribution of proteins and the
functional outcome of their metabolic activity (spatial and temporal
substrate trafficking) in various cellular compartments. Studying proteins
only from a functional point of view ignores the impact of their
intracellular spatial and temporal dynamics. Molecular taxonomy or systems
biology (genomics, proteomics) will not provide us with the functional
answers we need.

Systems biology studies biological systems systematically and extensively
and in the end tries to formulate mathematical models that describe the
structure of the system (Ideker T., 2001; Rives A.W, 2003). However, the
level of biological integration which is being studied (genes, proteins,
pathways) is still too far away from pathological reality to allow for the
development of highly predictive and highly correlating disease models.
The end-point of present day systems biology only takes into account
infra-cellular dynamics and loses track when iso- and epi-cellular
phenomena interfere with the dynamics of the model. Studying the physics
and chemistry of protein interactions cannot ignore the spatial and
temporal dynamics of cellular processes.

The cell is at the crossroads of life itself, being the lowest order
functional unit operating in a functionally complete way. As such the cell
is for life what the atom is for physics, the smallest biological level of
organization, operating as a functional unit. Dysfunctional cells, by
whatever cause (gene malfunction, infection, nutritional or
environmental problems), will eventually cause the entire organism to lose
its functional integrity. The dynamics of cellular systems allow for the
adaptation of the cell to a wide variety of conditions and challenges. A
relatively uniform physical structure combined with a web of interacting
dynamic processes leads to the multitude of cells which we see in living
organisms. In a living organism there is no such thing as an average cell
type from a functional point of view. Cells are functionally highly
diverse in both spatial and temporal dimensions.

The stochastic variation of cellular processing at the molecular level is
another cause of functional uncoupling of the cytome from the genome and
adds to the variability in functional behavior between cells (McAdams H.H.,
1999; Raser J.M., 2004). Structural research alone underestimates the
complexity of dynamic processes as it does not capture sufficiently the
dynamic complexity of the cell. The dynamic interaction of processes in
multiple pathways is the centerpiece of cellular life, not the individual
components or even individual enzymatic reactions in the cell. There is no
monotonic sequence of causation from genome structure to cellular
dynamics.
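The effect of stochastic variation can be illustrated with a small simulation. The sketch below is only an illustration, not a model of any real gene: it uses a Gillespie-type simulation of a single molecular species that is produced at a constant rate and decays at a rate proportional to its copy number. The function name and the rate constants (`k_make`, `k_decay`) are hypothetical choices. All simulated cells share exactly the same parameters, yet they end up with different copy numbers.

```python
import random


def simulate_cell(k_make, k_decay, t_end, rng):
    """Simulate one cell: molecules are made at the constant rate k_make,
    and each existing molecule decays at the rate k_decay.
    Returns the copy number present at time t_end."""
    t, n = 0.0, 0
    while True:
        rate_make = k_make
        rate_decay = k_decay * n
        total = rate_make + rate_decay
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_end:
            return n
        # Choose which event happens, in proportion to the two rates.
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1


rng = random.Random(1)  # fixed seed so the run is repeatable
# 200 "cells" with identical (hypothetical) rate constants.
counts = [simulate_cell(20.0, 1.0, 10.0, rng) for _ in range(200)]
mean = sum(counts) / len(counts)
print("lowest:", min(counts), "highest:", max(counts), "mean:", round(mean, 1))
```

The average over the population stays close to the ratio of the two rate constants, but an individual cell can deviate considerably from that average, which is the point made in the text.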

Cellular function can be compared to a symphony in which multiple
“instruments” contribute to a complex, but in a healthy state
harmonic, “sound”.

Genes and the dynamics of disease processes

The challenges faced by the medical world today are no smaller than the
ones we faced a century ago. The spectrum of diseases may have changed
through time, as degenerative diseases and cancer play an increasing role
in modern society. On the other hand, an old enemy is on the rise again:
however much we thought that infectious diseases were a thing of the past,
they are back, with a new and frightening face.

Our increased knowledge of the involvement of our genes in disease
processes, together with large-scale proteomics, has not led to an increase
in the productivity of pharmaceutical research (Drews J., 2000; Huber, L.A.,
2003; Lansbury PT Jr., 2004). The gap between the gene and the functional
outcome of a disease is too wide to bridge it from one direction only
(Workman P., 2001). Much thought has gone into how the knowledge coming out
of genomics and proteomics could revolutionize drug discovery, for instance
in drug target discovery (Lindsay MA., 2003). The
target of a drug molecule may be a protein, but the target of disease
therapy is the entire cell and by extension the cell population of an
organism. Every drug and its target may be part of a disease therapy, but
the therapy is not restricted to the drug and its target. Every target is
part of a therapy, but not every therapy is confined to a traditional drug
target.

In the case of diseases where we have already found a genetic basis, this
does not always allow us to create a model for the disease process. To
discover the involvement of a gene in a disease process does not tell us
anything about its place and relative importance in the multiple and
multilevel elements involved in the causation of a disease, such as genes,
nutrition, infectious agents and the environment. To discover a causative
element is not the same as understanding and predicting its dynamic
involvement in a disease process. What we do know is that all causation
has to pass through cells, as they constitute the “quanta” of
the organism itself.

Many diseases of clinical importance have heterogeneous mechanisms which
lead to the disease, and only in a subpopulation can the disease be traced
back to a single gene. In most cases a multiplicity of mechanisms
contributes to the disease process. Genetic information has a high
predictive value in only a minority of cases.

Non-coding sequences, inter-gene and epigenetic interactions have a
significant impact on the prediction of the age of occurrence, severity,
and long-term prognosis of diseases (El-Osta A., 2004; Perkins DO, 2004).

The importance of the dynamics of the cell and its involvement in
pathological processes and current therapeutic efforts also requires a
better understanding of its function and phenotype in its relation to
pathological processes in diseases, such as in cancer, Alzheimer disease
and infectious diseases, such as AIDS, tuberculosis (TBC), influenza
(flu), etc.

Trying to predict a disease process from the genome (proteome) upwards is
like trying to solve a higher order polynomial while omitting the majority
of elements and expecting that the equation will work:

e.g.: Disease process = a * x + b

Instead of using a higher order multi-dimensional model, closer to in-vivo
functional dynamics in which a matrix or web of causation and consequences
interacts in a high-dimensional space-time continuum:

e.g.: Disease process = a * u^n + b * v^o + c * w^p + d * y^q + e * z^r
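As a rough numerical illustration of the difference between the two equations, the sketch below generates values from a multi-term process and then fits the reduced one-variable linear model to them. Everything in it is an assumption made for the sake of the example: the coefficients (all set to 1), the exponents, the uniform random inputs and the function name `full_process` are hypothetical and are not derived from any disease data.

```python
import random


def full_process(u, v, w, y, z):
    """Hypothetical multi-term 'disease process'.
    All coefficients are 1; the exponents are chosen arbitrarily."""
    return u + v**2 + w**2 + y**3 + z**3


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [[rng.random() for _ in range(5)] for _ in range(2000)]
xs = [s[0] for s in samples]               # the only variable the reduced model sees
ys = [full_process(*s) for s in samples]   # outcome of the full process

# Ordinary least-squares fit of the reduced model: outcome = a * x + b
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
a = sxy / sxx
b = mean_y - a * mean_x

# Fraction of the variation in the outcome explained by the reduced model.
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
print("explained fraction of variation:", round(r2, 2))
```

Under these assumptions the reduced model recovers the contribution of its own variable quite well, but it explains only a small part of the total variation, because the omitted terms carry most of it.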

In addition, each parameter which is being used in an equation is in
itself the result of an underlying or “overlying” dynamic
process. Each layer of organization can be fed into higher or lower order
levels of organization as there is always a cross-influence in both
directions. It is a matter of expanding or collapsing the set of
parameters and taking into account or ignoring underlying
“modifying” influences. Reducing the complexity allows for a
better understanding of a simplified model, but has a decreased match to
the complexity and dynamics of biological reality. When we create a model,
we should not regard it as a one-to-one substitute for reality, which we
capture only partially in our model.

Infectious diseases

In infectious diseases the environment, in this case the infectious
agents, interacts in a complex way with the host defense system of which
much remains to be explored. We must be aware of the fact that the golden
era of antibiotics is already behind us as many infectious agents (e.g.
TBC, MRSA and other bacterial diseases) are showing an increasing
resistance against most classes of antibiotics which are available today
(Davies J, 1994). We have succeeded in less than a century in destroying our
best weapons against infectious diseases, through misuse of antibiotics
both by physicians and their patients. Only the elderly remember the days
when mortality due to infections was a major cause of premature death, but
the moment is approaching when this nightmare will return. Emerging
infectious diseases (EIDs) and re-emerging infectious diseases challenge
our defenses (Ranga S, 1997).

Viral diseases (e.g. AIDS, influenza) are even harder to fight as they use
the cellular machinery of the body itself to reproduce. We need to study
the pathological process in cells in more detail and in a different way,
in order to have a chance to succeed in the new therapeutic challenges
ahead of us. Viruses, under the selective pressure of modern antiviral drugs,
are also showing increasing resistance to treatment. We are running out of
time in our battle against infectious diseases and a systematic approach
will only give us the answers when it will be too late. We are not setting
the agenda, but the diseases are taking the lead.

Due to modern technology, the time to respond to a new infectious
challenge is being reduced. In modern times, diseases take planes too,
which makes it even harder to fight them by classical isolation or
quarantine. Airplanes may be safe to travel with, compared to other
transport systems, but they can cause secondary mortality by transporting
pathogens over large distances at a speed unknown to previous generations,
which gives a new meaning to airborne infections (Gerard E, 2002; Van
Herck K, 2004; Blair JE, 2004). Infectious diseases may initially go
unnoticed in underdeveloped areas of the world (e.g. Ebola virus, Lassa
fever, Marburg virus), but as soon as they board a plane, it is modern
technology which will give them free access to the world (Clayton AJ,
1979; Gillen PB, 1999). A relatively long incubation time combined with a
high mortality rate will allow a disease to spread widely and cause a
pandemic, before we can even start a treatment program. If an unknown
disease causes such a pandemic, we may run out of time before we can find
a cure as we first have to develop a diagnostic tool. A recent example
which is a model of what can happen was the Severe Acute Respiratory
Syndrome or SARS (Peiris, J.S.M., 2003; Berger A, 2004; Heymann DL, 2004;
Tambyah PA, 2004).

Robert Koch presented his work on Tuberculosis on 24 March 1882 before the
members of the Berlin Physiological Society, which meant a breakthrough in
the understanding of this terrible disease (Winkle S, 1997, pp. 137-141).
Now after more than 100 years of research and drug development, TB is on
the rise again. In the war against infections such as Tuberculosis, there
are no easy wins. We may win a fight but for the majority of pathogens we
can only reach a status quo, but never completely win the war. Variability
by mutating is a powerful weapon against our drug treatments and pathogens
use it to their great advantage.

We must keep our defenses up to date and changing in order to outsmart our
bacterial and viral enemies. New antibiotics are not found within the
human genome. Penicillin was discovered by accident and many important
antibiotics were found at the most unlikely places (Fleming, A, 1929). No
hypothesis or model can be formulated to find the unexpected, but we have
to find new antibiotics as bacteria are closing in on us and some of our
worst enemies are even winning the race.

Scientists are waiting with fear for the next influenza pandemic which
will hit us some day (Gust ID, 2001; Capua I, 2004). Scientists are trying
to understand the lethal potential of the deadliest influenza epidemic of
all time, which occurred after the First World War. Soon the virus which
caused the influenza pandemic, called the ‘Spanish flu’ will
re-emerge out of the test tubes of the laboratory. Recent outbreaks of
avian flu have given us a preview of what can happen and evidence is
increasing that the possibilities for spreading avian influenza A virus
(H5 or H7 subtype) are worse than previously was assumed (Koopmans M,
2004; Kuiken T, 2004). Almost the entire native population of Tasmania was
killed by the flu, for which they had no immunological defenses.

Most people have no idea of the role smallpox played in the destruction of
an entire civilization after it was brought to America by the
conquistadores. Almost 50 percent of the indigenous population died of
smallpox and the speed at which people died is beyond our current
imagination (McMichael AJ, 2004; Winkle S., 1997, pp. 855-861). A
mortality of 50 percent for a new disease like smallpox, for which we have
no immunity, would kill half of the population. In modern times we not
only have to fear the accidental spreading of infectious diseases, but
bio-terrorism will challenge our defenses sooner or later (Broussard LA,
2001; Gottschalk R, 2004).

Finding the infectious agent for a new and unknown disease requires
something else than sequencing a genome as this approach only works when
we have the time to do the sequencing while the pathogen takes its course.
Analyzing the genome sequence of a new infectious agent can only start
after it has been isolated by more traditional means (Berger A, 2004).
Once we know the new pathogen, we can use its genome sequence to develop
rapid diagnostic tools, based on PCR, but in order to do this we must
first isolate it from the patient. Developing a therapy after this takes
much longer and the genome sequence itself without additional functional
information is not enough. Only after Koch's postulates had been
fulfilled did the WHO officially declare, on 16 April 2003, that a previously
unknown coronavirus was the cause of SARS. Focusing on cellular systems
and using tools for functional research at this level will hopefully allow
us to respond in time when we are faced with an unknown pathogen.

When we do not already have an antibiotic or antiviral drug at hand at the
moment a new disease hits us, we are in serious trouble. In this case the
only thing left is the medieval solution of quarantining the infected
people, which only works if we are able to contain them before they spread
over a country or even the planet (e.g. SARS or HIV).

Although all cells in the human body may share the same genome, there is a
high spatial and temporal differentiation in gene expression in different
cell types and organs. In HIV, it is the CD4 lymphocytes which express the
receptors by which the virus can enter the cell (Fauci AS, 1996). A
hepatocyte may share its entire genome with a CD4 lymphocyte, but it does
not express the proteins encoded by the genes which allow the virus to
enter the cell. The progress of an HIV infection is also a highly dynamic
process of interaction between the host and the virus (Wei, X., 1995). The
observation of differences in disease progress led to the discovery of a
genetic restriction of HIV-1 infection and progression to AIDS by a
deletion allele of the CCR5 structural gene (Dean M, 1996). Clinical
observations lead to genetic conclusions, but the way back to clinical
treatment of diseases is a long and winding road for which the gene
sequence does not provide us with all the necessary information. There
remains a correlation and predictive deficit in our disease models.

Mendelian diseases

Mendelian inherited and monogenic diseases have always been at the center
of attention in the relation of genetic variation to diseases. Monogenic
diseases served as a model to prove the use of genetic information to the
development of a disease and the outcome of a disease process.
Phenotype-genotype relationships are complex even in the case of many
monogenic diseases. Increasingly complex interactions have now been
demonstrated in a number of monogenic Mendelian diseases (Nabholz CE,
2004). These inter-gene and epigenetic interactions have a significant
impact on the prediction of the age of occurrence, severity, and long-term
prognosis of even ‘genetic’ diseases (Cajiao I, 2004; Hull J,
1998; Frank RE, 2004; Salvatore F, 2002; Sontag MK, 2004; Sangiuolo F,
2004).

The beta-thalassemias show a remarkable phenotypic diversity caused by the
action of many secondary and tertiary modifiers, and a wide range of
environmental factors (Weatherall DJ., 2001). Sickle cell anaemia and
cystic fibrosis can serve as an example that genotype at a single locus
rarely completely predicts phenotype (Summers KM., 1996). Although the
gene defect in Huntington’s disease has been known for years, the
contribution of the gene defect to the functional outcome of the disease
is not yet known (Georgiou-Karistianis N, 2003). Cell based research will
help to elucidate the disease mechanism in Huntington’s disease
(Arrasate M, 2004).

In cystic fibrosis, the severity of the disease cannot be linked
one-on-one to genetic variation in CFTR (Grody W, 2003). Cystic fibrosis
is the most common autosomal recessive disorder in Caucasians, with a
frequency of approximately 1 in 3000 live births, so finding a cure for
this disease has a high impact on our society. Success stories with rare
diseases may sound impressive from a scientific point of view, but there
is no escape from the economic reality of the size of the patient
population. So let us take a closer look at cystic fibrosis as it is a
disease of which the gene held responsible for the disease was identified
about 14 years ago (Rommens JM, 1989; Collins FS., 1990). The method
(reverse genetics) used to identify the gene, did not require an
understanding of the gene function at that moment or any understanding of
the impact of genetic heterogeneity on the phenotypical expression of the
disease (Iannuzzi MC, 1990; Audrezet MP, 2004). Starting from the gene
for a single genetic disease such as cystic fibrosis, where did we get
after 14 years of hard labour?

A once ‘monogenic’ disease such as cystic fibrosis shows
remarkable phenotypic variation and clinical variation (Decaestecker K,
2004). By now about 1000 gene mutations of the cystic fibrosis
transmembrane conductance regulator gene (CFTR) have been identified,
which leads to a highly variable phenotypic and clinical presentation of
the disease (McKone EF, 2003). Mutations in the CFTR gene have been
classified into 5 functional categories (Welsh MJ, 1993). A list of 1000
mutations is thus reduced to 5 functional classes at the protein level, a
ratio of 5 to 1000 or 0.5 percent of distinct types of CFTR chloride
channel dysfunction per mutation. Due to the functional uncoupling of
gene structure from protein function in cystic fibrosis, genetic sequence
variation has a low impact on functional variation at the protein level
(1000 to 5). More important than gene sequence variation is the spatial
location of a mutation in the 3D structure of a protein (Rich DP, 1993).
Even more important is the cellular and organ location of a functional
defect, as in cystic fibrosis it is mainly the pathological process
(Pseudomonas aeruginosa infection) in the lungs which is a major cause of
morbidity and mortality (Elkin S, 2003).

Other genes act as modulators of the disease outcome, even in a disease
such as cystic fibrosis, once regarded as a monogenic disease (Hull J,
1998; Frank RE, 2004; Salvatore F, 2002; Sontag MK, 2004; Sangiuolo F.,
2004). We even need to take into account epigenetic information and
environmental influences on disease outcome, even in a so-called monogenic
disease such as cystic fibrosis.

Human populations show considerable genetic heterogeneity (allelic
variation) and even geographic variation, which leads to difficulties in
using gene sequence based diagnostic tools (Liu W, 2004; Raskin S, 2003).
So, the sequence of one individual’s genome allows studying one
person’s genetic profile, but does not lead to a population-wide
prediction of genetic profiles. Genetic heterogeneity uncouples clinical
outcome from model gene sequences (Imahara SD, 2004). This problem is not
solved by simply adding more sequence information without a functional
understanding of the meaning of sequence variation on phenotypic
expression and disease outcome in the patient. Structural information
without functional understanding leads to predictive deficits. The
functional understanding of a disease process must be at the level of the
patient and his cells and not at a lower order organizational level, such
as the genome or proteome alone.

Genetic heterogeneity leads to a reduced sensitivity and an increase in
false negative results if a genetic test is not adapted to this genetic
heterogeneity. A mutational test leads to a simpler almost
‘binary’ readout, instead of the more ‘analog’
interpretation of a continuum of values in a functional test, but this
comes at a price. A test which detects a disease marker at a higher
organizational level can detect a disease more easily and will lead to
fewer false negatives in this case.
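The loss of sensitivity described above can be made concrete with a small calculation. The sketch below is only an illustration: the mutation names, the allele frequencies in the two populations and the composition of the test panel are all hypothetical numbers, not taken from any study. It computes which fraction of the disease alleles a fixed mutation panel can detect in each population.

```python
# Hypothetical frequencies of disease-causing alleles in two populations
# (illustrative numbers only).
population_a = {"mut_1": 0.70, "mut_2": 0.10, "mut_3": 0.05, "other": 0.15}
population_b = {"mut_1": 0.30, "mut_2": 0.05, "mut_3": 0.25, "other": 0.40}

# Hypothetical mutation panel, designed around population A.
panel = {"mut_1", "mut_2"}


def allele_sensitivity(allele_freqs, panel):
    """Fraction of disease alleles that the panel can detect."""
    return sum(freq for name, freq in allele_freqs.items() if name in panel)


def false_negative_rate(allele_freqs, panel):
    """Fraction of disease alleles that the panel misses."""
    return 1.0 - allele_sensitivity(allele_freqs, panel)


for label, freqs in (("A", population_a), ("B", population_b)):
    print("population", label,
          "sensitivity:", round(allele_sensitivity(freqs, panel), 2),
          "false negatives:", round(false_negative_rate(freqs, panel), 2))
```

With these assumed numbers the same panel misses far more disease alleles in population B than in population A, although nothing about the test itself has changed.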

The complexity of even monogenic diseases and the web of functional
interactions at the genome level, protein interactions and
environmental influences on the disease outcome will dilute the predictive
power of structural sequence information at the DNA level. Using
low-dimensional intracellular data to predict iso- and epicellular
phenomena has too low a predictive power to be used in clinical situations
as such.

No pharmaceutical company would like the idea that it requires 14 years of
preclinical research to reach an IND after a new drug target was
identified, as in cystic fibrosis. Even if only 1000 genes out of our
25,000 were involved in human diseases and each required the same amount
of work as the cystic fibrosis gene, it would take us the equivalent of
14,000 years of work to achieve the same results. And up to this moment
no causal (gene) therapy has come out of the identification of the CFTR
gene, only an improvement of prenatal diagnostics (Klink D, 2004).
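The back-of-the-envelope estimate above can be written out as follows. This is only a sketch of the arithmetic in the text; it takes over the text's own simplifying assumptions that every disease gene needs the same effort as the cystic fibrosis gene and that the efforts are simply added up. The variable names are chosen for this example.

```python
years_per_gene = 14      # years of work since the CFTR gene was identified
disease_genes = 1_000    # assumed number of genes involved in human diseases
total_genes = 25_000     # approximate number of human genes used in the text

# Total effort if every disease gene needs the same amount of work.
total_years = years_per_gene * disease_genes

# Share of the genome that this assumption covers.
share_of_genome = disease_genes / total_genes

print("equivalent years of work:", total_years)
print("share of all genes:", share_of_genome)
```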

Pseudomonas aeruginosa lung infection is the major cause of morbidity and
mortality in patients with cystic fibrosis (Elkin S, 2003). Over the past
decades we have seen an improvement of symptomatic therapy, but still no
causal therapy, leaving aside a lung transplant.

How are we going to develop drugs which have a large enough patient
population to pay for the costs of drug discovery and development if we
need to target individual mutant protein molecules? If it can be so
difficult to go from a single gene to develop a therapy based on genetic
information, how do we expect to proceed for the entire genome and
proteome?

Degenerative diseases and cancer

For degenerative diseases, such as Alzheimer disease and cancer, birth
defects, cardiovascular diseases, and nerve degeneration it is the
dynamics of the cellular machinery itself which fails. Sharing one genome
does not lead to sharing the same pathology, as cellular differentiation
leads to a highly diverse spatial and temporal cellular function and
morphology. Differential degeneration of cell types is the consequence of
a highly differentiated expression pattern of proteins in different cells
and different cellular compartments. An individual who suffers from a
degenerative disease may not be cured with his own stem cells when the same
deficient pathway is activated in the differentiating stem cell. The
molecular process may need to be corrected first in this case, for
instance by gene therapy or by using exogenous stem cells.

One of the most promising domains of research today is stem cell research,
which has to deal with the functional and structural characteristics of
cells which are being studied. Gene therapy holds many promises for the
therapy of life threatening diseases, but in order to improve gene therapy
we will need a better understanding of what goes on inside the cell and
what the consequences are on the cellular metabolism when we modify its
function by inserting genes. At this moment monogenic diseases are the
target for gene therapy, but in the future parts of pathways may need
reconstruction. The gene is the means to achieve the ultimate goal to
change the cellular metabolism to cure a disease. At this moment the cell
is the target for many therapeutic efforts to come to a causal therapy of
diseases, which we can now only treat with external substitution, such as
diabetes. These diseases are far more complex and multi-factorial than
monogenic diseases and should be studied from a different perspective to
capture the complexity of the disease process.

In Crohn’s disease the gene defect found does not explain the
severity of the disease (Peltekova VD, 2004). In breast cancer genetic
variants of BRCA1 and BRCA2 do not have a consistent level of penetrance
and as such their presence alone does not explain the disease process
(Ford D et al, 1998; Hartge, 2003). Although there is evidence for the
involvement of the gene for PPAR-gamma in type 2 diabetes, the
mechanism by which it contributes to the disease process of diabetes is
not clear (Barroso I, 1999) and could not be deduced from genetic
information alone.

Multiple genes and (multiple) environmental factors contribute to the
disease process and its clinical outcome (Liebman MN, 2002). In AD
(Alzheimer’s Disease), only a minority of cases can be linked to
hereditary gene mutations. Neurodegeneration in AD may be caused by
deposition of amyloid beta-peptide (Abeta) in plaques in brain tissue, but
no causal treatment has come out of this in 10 years of hard work (Hardy
J, 2002). In APC (Adenomatous Polyposis Coli) and HNPCC (Hereditary
Non-Polyposis Colorectal Cancer) a genetic origin only accounts for about
5 percent of all cases of colorectal cancer (Kinzler, 1996). Genes which
are involved in diabetes, such as GCK (glucokinase), HNF1A and HNF4A
(hepatocyte nuclear factor), are linked to less than 5 percent of cases of
diabetes (Edlund, 1998; Fajans, 2001).

Instead of focusing on individual (molecular) targets in drug therapy,
complex diseases may require pathway-engineering to manage the dynamics of
spatial and temporal patterns of intracellular molecular processes.
Functional or structural protein (re-)modeling in-vivo may be a better
approach for complex diseases than just docking a small molecule to an
active binding site.

Drug discovery and development

Drug discovery and development: an inefficient process

At the end of the drug discovery and development pipeline, there are
patients waiting for treatments, company presidents and shareholders
waiting for profit and governments trying to balance their health care
budget. For pharmaceutical and biotech companies, the critical issue is to
select new molecular entities (NME) for clinical development that have a
high success rate of moving through development to drug approval. Finding
new drugs (which can be patented to protect the enormous investments
involved) and at the same time reducing unwanted side effects is vital for
the industry.

The cost to develop a single drug which reaches the market has increased
tremendously in recent years and only 3 out of 10 drugs which reached the
market in the nineties generated enough profit to pay for the investment
(DiMasi, J., 1994; Grabowski H, 2002; DiMasi JA, 2003). This is mainly due
to the low efficiency and high failure rate of the drug discovery and
development process. Pharmaceutical companies are always trying to reduce
this failure rate in order to bring down the enormous costs involved in
drug discovery and development.

Only about 1 out of 5,000 to 10,000 drugs makes it from early pre-clinical
research to the market, which is not an example of a highly efficient
process. The current focus of the pharmaceutical industry on blockbuster
drugs is a consequence of the mismatch between the soaring costs and the
profits required to keep the drug discovery and development process going.
Only diseases with patient populations large enough (and wealthy enough)
to pay back the costs of a full-blown drug development are now
worthwhile to work on. If the industry cannot bring the costs down, it may as
well try to raise its income by changing its price policy, but this shifts
the solution for the problem from inside to outside the company and places
the burden on the national health care systems.

Companies which were more successful in the past achieved a higher
efficiency even without the availability of extensive genomic and
proteomic data and new low-level disease models. Introducing new technology
and generating more data alone are not sufficient to improve the drug
discovery process (Drews J., 1999; Horrobin DF, 2003; Omta S.W.F., 1995).

To be complete, there are alternative views which criticize the
calculation of the cost of drug discovery and development. The consequence
of accepting this alternative view would be that the pharmaceutical
industry would be losing money due to costs outside its core mission,
which is even worse, because research and development can be improved, but
this would not help in this case. The result is in each case that drugs
are only worthwhile to develop if they have an enormous market
potential; otherwise they do not earn back the money invested when they
finally reach the market.

The basic numbers for time spent and costs made in drug discovery and
development can be found in several documents published by institutes
which generate reports about the pharmaceutical industry (Boston
Consulting Group, Tufts Center for the Study of Drug Development,
Pharmaceutical Research and Manufacturers of America (PhRMA), etc.). Let
us now take a closer look at the drug discovery and development process.
It takes an average pharmaceutical company about 10 to 15 years and US$
500 to US$800 million to bring one new drug to the market. Of these 15
years about 6.5 years or 43 percent of the total time is spent in
pre-clinical research and about 7 years or 46 percent of the total time is
time spent in clinical research (1.5 years in phase I, 2 years in phase II
and 3 years in phase III). Phase I of a clinical trial deals with drug
safety and blood levels in healthy volunteers. Phase II deals with basic
efficacy of a new drug. Finally Phase III deals with the efficacy of the
drug in large patient populations. It is easy to understand that the
increase of the population used to study the effect has a dramatic impact
on the complexity and the cost of the clinical trial.

To process a New Drug Application (NDA) takes the U.S. Food and Drug
Administration (FDA) on average 1.5 years based on the results and
documents provided by the pharmaceutical industry. The situation in Europe
for the European Medicines Evaluation Agency (EMEA) is probably of the
same order of magnitude. About 0.1 percent of the original molecules
screened in drug discovery enter phase I (5 out of 5,000 to be optimistic)
and 0.02 percent of the original molecules finally reach the FDA (1 out of
5,000). Of the 5 molecules entering phase I, about 4 out of 5 or 80
percent fail to make it to an NDA. After approval by the FDA, the drug hits
the market and enters phase IV of the clinical study process.
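
The percentages in this paragraph follow directly from the counts it
quotes. A minimal Python sketch of that arithmetic, using only the figures
given above (the variable names are illustrative and not part of the
original text):

```python
# Screening funnel, using only the counts quoted in the paragraph above.
screened = 5000        # molecules screened in drug discovery
enter_phase_1 = 5      # molecules entering phase I (the optimistic count)
reach_fda = 1          # molecules that finally reach the FDA

pct_phase_1 = 100 * enter_phase_1 / screened
pct_fda = 100 * reach_fda / screened
pct_lost_in_clinic = 100 * (enter_phase_1 - reach_fda) / enter_phase_1

print(f"enter phase I:          {pct_phase_1:.2f} percent of screened molecules")
print(f"reach the FDA:          {pct_fda:.2f} percent of screened molecules")
print(f"fail between I and NDA: {pct_lost_in_clinic:.0f} percent of phase I entrants")
```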

In the 1990s about 38 percent of the drugs which came out of discovery
research dropped out in phase I. Of the molecules which made it out of
phase I, 60 percent failed in phase II clinical studies. And now we get to
the really expensive phase III, in which 40 percent of the remaining
candidates failed. Of the drugs which made it through the clinical trials
to the FDA, 23 percent failed to be approved. All this translates to an
overall success rate of about 11 percent from the start of the clinical
trials (Kola I., 2004).
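
The overall figure can be checked by multiplying the survival rates of the
successive stages. A minimal sketch in Python, using only the failure
rates quoted above; the stage labels and variable names are illustrative:

```python
# Failure rate per stage in the 1990s, as quoted in the paragraph above.
failure_rates = {
    "phase I": 0.38,
    "phase II": 0.60,
    "phase III": 0.40,
    "FDA approval": 0.23,
}

# Fraction of phase I entrants still alive after each stage.
surviving = 1.0
for stage, failure in failure_rates.items():
    surviving *= 1.0 - failure
    print(f"after {stage}: {surviving:.1%} of phase I entrants remain")
```

The final product lies between 11 and 12 percent, in line with the "about
11 percent" overall success rate quoted above.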

So the disease models used in drug discovery fail to predict failure in
clinical development for about 80 to 90 percent of the drugs which enter
drug development. And the combined predictive power of all clinical trials
fails to predict failure for about 1 out of 4 or 25 percent of the drugs
submitted to the FDA for approval.

What can we learn from these numbers and what is being done in drug
discovery? The study of absorption, distribution, metabolism, excretion
and toxicity (ADMET) is an important part of the drug discovery process,
as ADMET problems are an important cause of failure in drug development
(Yan Z, 2001; Lin J, 2003). The drug discovery process (target
identification, target validation, lead identification/optimization ...)
and preclinical development, such as ADMET studies, fail to predict the
failure of a drug in clinical development for 4 out of 5 or at least 80
percent of the molecules which enter phase I. Improving the predictivity
of disease models in drug discovery and ADMET is an important issue to
reduce the failure rate in drug development.

A new drug spends about 90 percent or 13.5 years of its career within the
discovery and development process, before it reaches the FDA for the last
10 percent or 1.5 years. So the FDA does not account for the majority of
the time it takes to bring a new drug to the market, nor does it account
for the majority of failures: it turns down only about 20 to 25 percent (1
out of 5 or 1 out of 4) of the drugs submitted to it, and only about 1 out
of 5,000 molecules (0.02 percent) gets that far if we start from the
beginning of the process. Although the investments in
the early stages of the drug discovery process have increased
tremendously, this means nothing compared to the cost of failure in phase
III of a clinical trial. A failure in phase IV in general means lawsuits
against the company and a serious blow to its reputation.

Improving drug discovery

In order to improve this process, where should we try to optimize it?
After about 7 years in pre-clinical research, a new drug is ready for
filing an investigational new drug (IND) application, after which the
FDA's Center for Drug Evaluation and Research (CDER) oversees the clinical
studies. The clinical trials, from phase I to III, are highly regulated: a
company can only optimize the flow of events, and to a large extent it
cannot decide freely what needs to be done in these stages of the process.
Once a drug hits the FDA (CDER), strict rules need to be followed for the
approval and failure to comply will only delay this process. So it is by
improving the quality and shortening the process in drug discovery that a
pharmaceutical company can make a significant difference.

How should we proceed to improve drug discovery? We have seen an enormous
investment in research at the infra-cellular level, such as HTS and
genome-based and proteome-based disease models, in the past ten years and
at the same time have witnessed a disproportionate decline in the
productivity
of research and development in drug discovery (Horrobin DF, 2000; Horrobin
DF, 2001; Bleicher KH, 2003). The pharmaceutical industry has yet to find
a way to reduce its high attrition rates (Kola I., 2004). The
consolidation in the pharmaceutical industry will not solve this problem
in the long run, as it only reduces the costs but does not improve
scientific productivity; it only postpones the moment of truth. The
scientists themselves will have to find new ways to improve their
productivity; management cannot do this in their place. Society tries to
protect itself against the adverse effects of new drugs, such as with
Thalidomide in the sixties (McBride WG, 1961). This is done by
increasingly stringent regulations, but the methods currently used in the
discovery process for new drugs cannot keep pace with these new
requirements. However, as we can see, increasingly strict regulations do
not explain all the problems pharmaceutical research is facing today.

Failing disease models in drug discovery

In the end it is clinical reality which decides on the fate of new drugs
and not the technology or disease models used to create them. The early
stages of drug discovery involve the identification and early validation
of a disease-modifying target (Lindsay MA., 2003). There is a fundamental
problem with studying disease-relevant mechanisms in the current disease
models as the pharmaceutical industry has been investing heavily in
studying the bricks, instead of looking at the building as a whole. The
emphasis in recent years has been on increasing quantity while at the same
time sacrificing the quality of correlation with clinical reality.

You could also think of it as a pointillist painting, of which we have
been looking at the individual dots, instead of looking at the entire
painting. Another analogy is that we are trying to explain the tidal
patterns of the oceans, by studying a water molecule and ignoring the
moon. We have to look at biological phenomena at the appropriate scale of
integration and from a functional point of view in order to get a grip on
the development of pathological processes. We should try to understand the
dynamics of disease processes at a higher level of biological integration,
closer to the clinical reality, than the genome or proteome. An integrated
cellular approach is needed to study disease processes (Lewis W. 2003).

If we modify a gene, e.g. by creating transgenic animals, we must try to
understand the dynamics of the pathways we are modifying. Quite often
transgenic animals or animals with gene knock-outs do not show the
expected phenotype, because of a different genetic background and the
highly dynamic interplay of metabolic pathways and environmental
influences on the final phenotype (Sanford LP, 2001; Pearson H. 2002).

The (early stage) disease models we use don't work as they should and do
not provide enough predictive power. One can study cellular components,
like DNA and protein, as such, but this will not reveal the complex
interactions going on at the cellular level of biological integration or,
in other words, the cytome. Both medicine and
pharmaceutical research would benefit from using more cell oriented
disease models and even higher-order models, instead of using
infra-cellular models to try to describe complex pathological processes at
a molecular level and getting lost in the maze of molecules which are the
building blocks of cells.

An important moment in the drug discovery and development pipeline is the
transition from discovery research to clinical development, for which
different approaches to develop gatekeepers have been proposed to reduce
the failure rate in drug development on both sides of the transition
(Lappin G., 2003; Nicholson J.K., 2002; Pritchard J.F., 2003). Drug
discovery should improve the quality of drugs it allows to enter
development and drug development should be able to protect itself from
drugs likely to fail in phases I to III. A better quality of drugs
entering drug development is needed, not just more quantity. Failing in
larger numbers is not the way to create a better process from discovery
to phase III and IV.

A highly defined oligo-parametric infra-cellular disease model used in
High Throughput Screening (HTS) which in its setup ignores the complexity
of higher order biological phenomena, may produce beautiful results in the
laboratory, but fails to generate results of sufficient predictive power
to avoid considerable financial losses later on in the drug discovery
pipeline (Bleicher KH, 2003). A living cell may be a less well defined
experimental environment for the biochemist, but it will provide us with
the additional modulating influences on our disease models which are lost
in lower-order disease models.

Metabolic variation in disease models

Nowadays the first stages of drug discovery use genetically homogeneous
disease models, which as a result do not show the metabolic heterogeneity
of patient populations. Genetic and metabolic heterogeneity is now seen as
a reason to exclude potential patients from treatment, not as a
consequence of the failure of drug development. If we cannot develop
drugs which will work in a genetically and metabolically heterogeneous
environment, we try to reduce the patient population until it fits our
abilities. However, this micro-management of patient populations leads to
a level of complexity in disease treatments which the pharmaceutical
industry, physicians and society cannot deal with in the end.

Pharmaco-genomics is used to study differences in drug metabolism, but not
to design or use early stage disease models with sufficient genetic
heterogeneity to select drug molecules which will hold their activity in a
metabolically heterogeneous environment. Genetic heterogeneity, epigenetic
modulation and metabolic variation are not taken into account in the first
stages of the drug discovery process. Optimizing a drug molecule for
binding to one particular genetic variant inevitably leads to failure in
a genetically heterogeneous patient population. Randomization in
experimental design to counteract a systematic bias in one's results
involves more than sample unit randomization patterns.

Biological variation in heterogeneous cell or animal populations may be an
unpleasant fact of life, but it correlates better with the real conditions
of the genetically and metabolically heterogeneous patient populations.
Ignoring biological variation in drug discovery will cause failure in drug
development. Using pharmaco-genomics only to exclude slow metabolizers,
etc., from clinical trials and thereby homogenizing the trial population
can lead to a dramatic reduction in potential patient population and a
decline in profit generation potential. The important phase of a drug's
life cycle starts when it hits the market, and we had better take care
that it spends its full life cycle generating enough revenue to fuel the
company.

Hypo- or Subcellular disease models

We have seen an increase in capacity of DNA and RNA expression techniques,
but they still deliver data only up to the level of the expressed protein
and not beyond. The quantitative chain of functional causation stops at
the protein level. Higher order spatial and temporal dimensions of
cellular dynamics are beyond the reach of these techniques. Gene
expression studies do not tell you about the functional outcome of protein
dynamics and enzymatic activity in the different cellular compartments.
Up- and down-regulation of gene expression does not inform you about the
functional interrelation of the encoded proteins and their spatial and
temporal dynamics in the cell. Molecular pathways do not exist as
parallelized, unrelated up- and down-regulating patterns, but are highly
dynamic and intertwined networks. Where and how should we interfere with a
disturbed cellular metabolic process in order to treat a disease? We fail
to predict success with our current disease models, which translates into
a high (up to 90 percent) attrition rate in drug development.

Southern, Northern and Western blots may show the quantitative sequence of
gene expression up to protein concentration (Alwine JC, 1977; Alwine JC,
1979; Howe JG, 1981; Hinshelwood MM, 1993). DNA microarrays give a
quantitative indication of gene expression (Barbieri RL, 1994; Schena M,
1995; DeRisi J, 1996; Jeong JG, 2004). Without a functional assay of the
in-vivo dynamics of protein function and a study of its spatial and
temporal expression patterns in the cell (compartments) and tissue, the
functional impact on the cell remains unclear (Kriete A, 2003; Egner, A.,
2004).

Studying subcomponents of cellular pathways ignores the functional unity
of the biological processes in the cell and the functional interactions
between pathways. Without a better understanding of the phenotypic and
functional outcome in the cell, the failure rate of the drug discovery
process will remain high and very costly. There is a predictive deficit in
the current oligo-parametric disease models used in pharmaceutical
research, which necessitates complex and expensive studies later on in the
drug development pipeline to make up for it.

The popular techniques to explore and analyze low-dimensional data at high
speed are based on the idea that this would provide all the data with
sufficient predictive power to allow for a bottom-up approach to drug
discovery. The current High Throughput Screening (HTS) and other early
stage methods allow gathering low-dimensional data at high speed and
volume, but their predictive power is too low as they lack depth of
descriptive power (Perlin MW, 2002; Entzeroth M, 2003). We are just
clogging the drug discovery pipeline with under-correlating data in
relation to clinical reality. A bigger flow of unmanageable data does not
equal a higher correlation to clinical reality.

The knowledge gathered at the infra-cellular level has to be viewed in its
relation to the (living) cell and the biological and non-biological
processes influencing its function and health, which requires a top-down
functional and phenotypical approach rather than a bottom-up descriptive
approach. Complex disease processes cannot be explained by simple
oligo-parametric low-level models. A high-speed oligo-parametric disease
model does not equal high predictive power. It is not the ability to study
a simplified disease model at high speed which will allow us to succeed;
we must study and verify the functional outcome of the disease process
itself.

A game of chess is not described by naming its pieces, but by the spatial
and temporal interaction of both players or, in other words, the flow of
actions and reactions described in a space-time continuum; if we add the
color, it is a spatio-spectro-temporal flow of events. The individual
pieces or moves do not explain the final outcome of the game; only when
the entire process is analyzed from a positional and functional point of
view can we understand and predict why one player wins or loses. You have
to study a game of chess at the appropriate organizational level in order
to understand it, or you will fail to find an explanation for the outcome
of the game.

Isocellular disease models

Using cellular disease models in more detail is not a trivial endeavor.
Cellular disease models need to be related to at least the in vivo
cellular disease process we want to study, so a validation of this
correlation is important (Dimitrova D. S., 2002; Lidington EA, 1999;
Thornhill MH, 1993).

We now know that metabolic pathways show complex interactions and that
gross genetic rearrangements can impair entire parts of cellular
metabolism. The cellular models used in research should be validated for
their functional and phenotypical representation of in vivo, in-organism
processes. However, many popular cell lines are not selected for their
close linkage to clinical reality, but for their maintainability in the
laboratory, lack of phenotypical variation, ease of transfectability, etc.
It is assumed that those cellular models are a valid representation of
the disease process, but a thorough assessment is almost never done.
Primary cells in general require a more complex tissue culture medium
than most popular cell lines. Cancer cells (and transformed cells) can
usually grow on a much simpler culture medium. Replicative senescence and
varying behavior at each passage (which may necessitate a change of cells
during long-term experiments) also make primary cells less popular, as
they introduce variability in experimental data. Reducing unpleasant
variability in experiments by choosing a specific disease model may create
'nice' results, but of a reduced predictive value. Quite often results
obtained with one cell line cannot be confirmed by using another cell
line, not to mention primary cells.

CHO cells (Chinese Hamster Ovary, Cricetulus griseus) are used in many
assays, but they are not derived from a human cell and are aneuploid
(Tjio, J. H., 1958). HeLa cells are derived from an aggressive cervical
cancer; they have been transformed by human papillomavirus 18 (HPV18) and
have different properties from normal cervical cells (Gey, G.O., 1952).
The U-2 OS osteosarcoma cell line is easy to maintain and transfect
(Ponten J, 1967). The PC12 cell line, which responds reversibly to nerve
growth factor (NGF), has been established from a rat adrenal
pheochromocytoma; it has a homogeneous and near-diploid chromosome number
of 40 (Greene LA, 1967). HEC cells, a human endometrial adenocarcinoma
cell line, are also very popular (Kuramoto H., 1972).

Some popular cell lines may correlate with themselves and not with the
complex dynamics of the pathological process they are supposed to
represent. Studying the dynamics of the involvement of a protein in a
disease in patients and transforming this knowledge into a disease model
in a particular cell line requires a careful assessment before embarking
on a drug discovery process. Functional cell model drift should be
verified at regular intervals and taken into account.

Even within individual cell lines there is not always homogeneity in
phenotype and function. Cancer cells in culture show chromosomal
instability as they tend to lose parts of chromosomes (Duesberg P., 1998;
Duesberg P, 2004). Continuous sub-cultivation of cells and an increase in
the number of passages may lead to chromosome rearrangements and loss of
functional reactivity (Dzhambazov B, 2003). Loss of function destabilises
a cell when critical parts of pathways are lost.

Many of the most popular cell lines lack parts of or even entire
chromosomes and therefore large chunks of metabolic pathways. A drug
molecule cannot interact with proteins which are not present in the cell
line, so an adverse or even positive effect will go unnoticed. Functional
loss of proteins and enzymes in cancer cells makes them unresponsive to a
drug if the protein(s) it targets are lost without killing the cell as
such.

Even when a protein is successfully expressed in a cell as shown on a
Western blot, this does not equal functional success. Western blotting
tells you how much protein has accumulated in cells. Even knowing the rate
of synthesis of a protein by Radio-Immune Precipitation (RIP) does not
predict the functional outcome of protein expression. Protein function
also depends on the metabolic background of the cell in which the
protein is expressed and its spatial and temporal organisation. If the
enzymatic and structural background of the cell does not meet the
prerequisites to put a functional protein in the right location, embedded
in the right functional environment, nothing appropriate will happen. An
appropriate functional assay is required to validate proper function of
the expressed protein.

In vivo enzymatic reactions are not linearly correlated to protein
concentration, nor are they of "zero order". The intracellular environment
causes a more complex functional pattern for a given protein, such as a
bell-shaped relation between protein concentration and function. A "blunt"
on/off expression in a transfected cell does not correlate well with the
physiological condition in a primary cell. When the
appropriate metabolic environment is not present when studying a protein
in a cellular disease model, predictivity of the disease model may be low
compared to physiological conditions.

A traditional (homogeneous) cell culture in the laboratory may not yet
mimic the physiological conditions in an entire organism, so our approach
to cell-based research (and beyond) also requires some redesign. Creating
a virtual organism by differential screening of a multitude of cell types
representing the main cell types in the human body (cardiomyocytes,
hepatocytes ...) could help us to improve the predictive value of
cellular disease models. We need to study cell-to-cell and
cell-type-specific pathway dynamics in more detail, as is the case for
nuclear factor-kappaB (NF-KappaB) (Schooley K, 2003).

Metabolic pathways in cells do not exist in a void, but are interconnected
and highly dynamic processes. Blocking a pathway has far-reaching
consequences for the intracellular environment. The upstream metabolites
will either find their way through other metabolic pathways or pile up.
Some inborn errors of metabolism are an example of this principle (PKU
...). Drugs blocking pathways also cause a distortion of the delicate
balance in metabolic processes and may cause upstream effects by
metabolites which are normally metabolized before they can cause any harm.
The kinetics of the "pharmakon" may be documented, but the
change in cellular metabolism and pathway-network distortion are less well
understood. Upstream metabolites may become processed by other pathways
and unexpected adverse effects may show up. Adverse effects on cellular
metabolism are only present in those cells which have an intact metabolic
pathway and not even all cell types activate the same pathways at all
times.

Differential multiplexing in cell based screening could help us to gain
more information from a diversity of cells. In recent years we have seen
developments towards an up-scaling of the capacity of cellular research.
Techniques such as High Content Screening (HCS) can be applied to cellular
systems on a large scale (Abraham VC, 2004). Subcellular differential
phenotyping is already possible on a large scale by using human cell
arrays (Conrad C, 2004). Analyzing a large number of tissues for candidate
gene expression is now greatly facilitated by using Tissue MicroArray
(TMA) technology (Kononen J, 1998; Simon R, 2002; Braunschweig T, 2004).

From individual cell to cytome

Studying cell function and drug impact at the level of the individual cell
is called cellomics (Russo E., 2000). However, the concept of cellomics
does not take into account the supra-cellular heterogeneity which is
present in every cellular system, such as a cell culture or an organism.
By studying cells while ignoring their diversity we make the same mistake
as the statistician who drowned crossing a river that on average was just
three feet deep.
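
The statistician's mistake can be made concrete with a few numbers. The
sketch below is in Python; the depths and the height of the person
crossing are invented for illustration only and do not come from any
measurement. It shows how an average can look safe while a single value in
the distribution is not:

```python
from statistics import mean

# Hypothetical depths in feet at five points along the crossing.
# These values are invented for illustration; they are not from the text.
depths_ft = [1, 1, 2, 2, 9]
wader_height_ft = 6  # hypothetical height of the person crossing

average_depth = mean(depths_ft)
deepest_point = max(depths_ft)

print(f"average depth: {average_depth} ft")
print(f"deepest point: {deepest_point} ft")
print("average looks safe:", average_depth < wader_height_ft)
print("crossing is safe:  ", deepest_point < wader_height_ft)
```

The same holds for a cell population: a mean response can hide a
subpopulation which behaves very differently.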

Cell types are heterogeneous, and cells differ between a healthy and a
diseased state; we need to take this heterogeneity into account. Cytomes
can be defined as cellular systems and the subsystems and
functional components of the body. Cytomics is the study of the
heterogeneity of cytomes or more precisely the study of molecular single
cell phenotypes resulting from genotype and exposure in combination with
exhaustive bioinformatics knowledge extraction (Davies E, 2001; Ecker RC,
2004b; Valet G, 2003; Valet G, 2004).

In order to get the broader view on pathological processes, we should move
on to the phenotypical and functional study of the cellular level or the
cytome in order to understand what is really going on in important disease
processes. Although the genome and proteome level have their predictive
value in order to understand the processes involved in disease (and
health), the cytome level allows for an understanding of pathological
phenotypes at a higher level. By integrating the knowledge from the genome
and proteome, we could give guidance to the exploration of the cytome,
which was not possible before this knowledge was available.

The cytome level will also provide guidance to focus the research at the
genome and proteome level, thereby creating a better cross-level
understanding of what is going on in cells (Gong JP, 2003; Valet G, 2004;
Valet G, 2004b). Some would see this as taking a step back from the
current structural and systematic descriptive approach, but it is mainly a
matter of integrating research at another level of biological integration
and looking in a different way at the web of interactions going on at the
cellular level. Biological processes do not exist in a void, but they are
a part of a web of interactions in space and time, rather than being an
island on their own. A cell is a multidimensional physical structure (3D
and time) with a finite size, not a dimensionless quantity. We cannot
ignore the spatial and temporal distribution of events, without losing too
much information.

In recent years the tools have matured to start studying the cellular
level of biological integration, but the tools are still used in the same
way as if they were derived from low-content high-throughput phenomena as
this is still the dominant research model. The tools to generate and
explore a high-dimensional feature space are still scattered and not
brought into line with the exploration of the cytome.

Functional processing in cellular pathways

The interconnection of genome, proteome and cytome data will be necessary
in order to allow for an in-depth understanding of the processes and
pathways interacting at the cellular level. A monocausal approach will
have to be replaced with a poly- and pluricausal approach in order to
understand and explain the phenomena going on at the cellular level.
Pluricausal means causal contributions at different levels, such as genes,
other cells and environmental influences. Polycausal means multiple causal
contributions at the same biological level, such as polygenic diseases or
multiple agonistic and antagonistic environmental influences. The concept
of a multithreaded, multidimensional, weighted causality is needed in order
to study the web of interactions at the cellular level. A drug modulates
cellular function, but changes can be studied at different levels of
biological integration:

Disease outcome = drug x (a x clinical^n + b x physiological^p + c x
cellular^q + d x genetic^r)
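
The relation above can be sketched in Python to show how the weighted
contributions of the different levels combine. The sketch reads the
trailing letters n, p, q and r as exponents, which is an assumption, and
all numeric values are hypothetical placeholders chosen only to make the
arithmetic easy to follow. It is an illustration of the formula's shape,
not a validated model:

```python
def disease_outcome(drug, levels, weights, exponents):
    """Evaluate drug x sum(weight x level^exponent) over the four levels."""
    total = sum(weights[k] * levels[k] ** exponents[k] for k in levels)
    return drug * total

# All numbers below are hypothetical placeholders, not data.
levels = {"clinical": 1.0, "physiological": 2.0, "cellular": 3.0, "genetic": 4.0}
weights = {"clinical": 0.5, "physiological": 0.25, "cellular": 1.0, "genetic": 2.0}  # a, b, c, d
exponents = {"clinical": 1, "physiological": 2, "cellular": 1, "genetic": 1}         # n, p, q, r

result = disease_outcome(2.0, levels, weights, exponents)
print(result)
```

Setting a weight to zero removes that level from the outcome, which is
what a model restricted to a single level of biological integration
implicitly does.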

Diagnosis and drug discovery merge if we take parallel models for both.
The clinical diagnosis or para-clinical diagnosis of a disease should show
a high correlation with the disease models used to study its possible
treatment. A cause (e.g. a single gene defect, a bacterium) can have
multiple consequences and as such be poly-consequential, which is the
mirror situation of a single consequence being caused by multiple causes
(co-causality or co-modulation) acting either synergistically or
antagonistically (e.g. a disease with both a genetic and an environmental
component). In reality, a pathological condition is a mixture of those
extremes (e.g. a bacterial or viral infection and the host's immune
system) and as such a simple approach is not likely to succeed in
unraveling the mechanism of a disease. With the current systematic and
descriptive approach, however, we are looking at too low a level of
biological integration and we get lost in a maze of structures and
molecular interactions. The cell is the lowest acceptable target, not its
single components, like DNA or proteins.

We are looking at the alphabet, not even at words or sentences. Nature is
not a dictionary; it is a novel. We should study the flow of events in a
cell with more power, not only the building blocks. As an example, Mendel
did not need to know about DNA in order to formulate his laws of
inheritance and he did not know that the discovery of the physical carrier
of inheritance, DNA, would confirm his views later on, but his laws are
still valid as such. Certainly physics was not at the stage it was in the
20th century when Newton formulated the law of gravity, but his
observations and conclusions were valid. When Einstein formulated his
relativity theory, he did not have modern physics at his disposal. His
theory does not fit well to the quantum level, but does explain phenomena
at a higher level of functional integration and as such is an appropriate
model.

The value of a scientific model does not lie in the scale of phenomena it
describes, but in its predictive correlation to the reality it tries to
capture. The more we may try to exclude elements from reality, the better
we may be able to build a model which holds in a tightly controlled
situation in our laboratory, but fails when challenged by full-blown
reality in the outside world.

What we find should not be in contradiction to what lower level structural
descriptive research discovers, but we should not wait for its completion
to start working on the problems we are facing in medicine and health care
today.

Epicellular disease models

Organoids, parts of organs, isolated organs and animals are being used as
epicellular disease models. Animal models are an important part of the
drug discovery process. The correlation of the animal model to the actual
disease process in man is an important issue to consider (Huskey SE,
2003). In the past twenty years a lot has changed in the use of animal
models to study human disease and develop new drugs.

We can now use genetically modified animals to study gene regulation and
cell differentiation in a mammalian system (Gordon JW, 1980; Isola LM,
1991; Brusa R., 1999). Transgenic and gene-deleted (knockout) mice are
used extensively in drug discovery (Rudmann DG, 1999).

In genetically modified mice however, the observed phenotype is not always
the direct result of the genetic alteration (Linder CC., 2001; Schulhof J,
2001). The effect of the genetic modification is not completely
straightforward, due to variations in the genetic background of the
animals (Crusio WE., 2004). Transgenic mice containing the same genetic
manipulation exhibit profoundly different phenotypes due to diverse
genetic backgrounds (Sigmund CD., 2000; Sanford LP, 2001; Holmes A, 2003;
Thyagarajan T, 2003; Bothe GW, 2004).

Using inbred mouse strains reduces variation in genetic background, but
also reduces the correlation of the disease model to real-world genetic
and metabolic variation encountered in human populations. Do we want nice
results with a low standard deviation (SD), or do we need results highly
correlating with clinical reality? If one wishes to obtain the optimal
mouse model for a human disease, one needs to choose the correct genetic
background as well as the correct mutation (Erickson RP., 1996).

We still do not have an in-depth understanding of the delicate spatial and
temporal interplay in metabolic pathways in cells, organs or entire
organisms in transgenic animals. Introducing or removing a gene without a
clear understanding of its spatial and temporal expression pattern, leaves
us with a correlation deficit in relation to the disease process in man.

When we modify a gene, we modify a pathway-web with upstream and
downstream consequences for cellular metabolism in different cellular
compartments (nucleus, Golgi …). The metabolites which (dis-) appear
due to the modification will modify a highly dynamic network of metabolic
interactions. In-vivo spatial and temporal variation in protein structure
and activity profiles will add to the complexity of unravelling the
functional impact of modified gene expression.

How to explore and find new directions for research

We may now be capable of studying a low-level layer of biological
integration in great detail, such as the genome or proteome, but it is in
the higher-order spatial and temporal patterns of cellular (and beyond)
dynamics where the answers to our questions can be found. However, these
higher-order levels of biological integration are still being studied in a
dispersed way, due to the formidable technological and scientific
challenges we are facing.

A 4-D physical space is still a formidable challenge to deal with compared
to the 1-D problem of a DNA-sequence. The even higher-order feature
hyperspace which is derived from this 4-D space is even further away from
what we can easily comprehend. We focus the major efforts of our applied
research on the level of technology we can achieve, not on the level of
spatial and temporal understanding which is required. Applied research is
suffering from a scale and dimensionality deficit in relation to the
physical reality it should deal with. Reality does not simplify itself to
adapt to the technology we use to explore biology just to please us.
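
As a minimal sketch of what deriving a feature space from a 4-D physical
space can mean in practice, the fragment below reduces a tiny time-lapse
stack (indexed as t, z, y, x) to one feature vector per time point. The
data layout, the function name and the choice of features (total intensity
and intensity-weighted centroid) are assumptions made for illustration
only, not a description of any existing system.

```python
from typing import List, Tuple

# A 4-D observation: stack[t][z][y][x] holds one intensity value.
Stack4D = List[List[List[List[float]]]]


def features_per_timepoint(stack: Stack4D) -> List[Tuple[float, float, float, float]]:
    """Return (total intensity, centroid z, centroid y, centroid x) per time point."""
    rows = []
    for volume in stack:
        total = cz = cy = cx = 0.0
        for z, plane in enumerate(volume):
            for y, line in enumerate(plane):
                for x, value in enumerate(line):
                    total += value
                    cz += z * value
                    cy += y * value
                    cx += x * value
        if total == 0:
            # An empty volume has no meaningful centroid; report zeros.
            rows.append((0.0, 0.0, 0.0, 0.0))
        else:
            rows.append((total, cz / total, cy / total, cx / total))
    return rows


# Hypothetical toy data: one bright voxel moving along x over two time
# points, in a volume of 1 x 2 x 3 voxels (z, y, x).
stack = [
    [[[0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]],
    [[[0.0, 0.0, 4.0], [0.0, 0.0, 0.0]]],
]
trajectory = features_per_timepoint(stack)
```

Each time point becomes a point in a four-dimensional feature space, and
the time series becomes a trajectory through that space. A real analysis
would add many more features per object, which is where the feature
hyperspace quickly outgrows the physical space it was derived from.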

At the moment we expect that an oligo- or even mono-parametric
low-dimensional analysis will allow us to draw conclusions with sufficient
predictive power to work all the way up to the disease processes in an
entire organism. We are using disease models with a predictive deficit,
which allow us to gather data at great speed and quantity, but in the end
the translation of the results into efficient treatment of diseases fails
in the majority of cases (up to 90 percent). The cost of this inefficient
process is becoming a burden which neither society nor the pharmaceutical
industry will be able to support indefinitely. As "the proof is in the
pudding", not in its ingredients, we have to improve the productivity of
biomedical and pharmaceutical research and broaden our functional
understanding of disease processes in order to prepare ourselves for the
challenges facing medicine and society.

If the speed of exploration had no consequences for the challenges
medicine is facing today, the situation would of course be entirely
different. In many cases, the formulation of an appropriate
hypothesis is very difficult and the resulting cycle of formulating a
hypothesis and verifying it is a slow and tedious process. In order to
speed up the exploration of the cytome, a more open and less deterministic
approach will be needed (Kell DB, 2004).

Analytical tools need to be developed which can find the needle in the
haystack without a priori knowledge. In other words, we should be able to
find the black cat in a dark room without knowing or assuming that there
is a black cat. An open and multi-parametric exploration of the cytome
should complement the more traditional hypothesis-driven scientific
approach, so we can combine speed with in-depth exploration in a
two-leveled approach to cytomics. The multi-parametric reality which we
need to deal with requires a more multi-factorial exploration than the way
we explore the cellular level at this moment.
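
To make the idea of exploration without a prior hypothesis a little more
concrete, here is a small sketch which ranks cells by how far they deviate
from the bulk of the population across all measured parameters at once. It
is an illustration under stated assumptions, not a proposed method: the
three parameters and their values are invented, and the score is a plain
Euclidean distance in z-score space, which ignores correlations between
parameters.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(cells):
    """Convert each parameter (column) to z-scores over the population."""
    columns = list(zip(*cells))
    mus = [mean(column) for column in columns]
    # A constant parameter has zero spread; use 1.0 to avoid dividing by zero.
    sds = [pstdev(column) or 1.0 for column in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in cells]


def deviation_scores(cells):
    """Distance of every cell from the population mean, all parameters combined."""
    return [sqrt(sum(z * z for z in row)) for row in standardize(cells)]


# Hypothetical measurements per cell: (area, intensity, shape factor).
cells = [
    [100.0, 10.0, 0.50],
    [102.0, 11.0, 0.52],
    [98.0, 9.0, 0.48],
    [101.0, 10.0, 0.51],
    [99.0, 10.0, 0.49],
    [130.0, 25.0, 0.90],
]
scores = deviation_scores(cells)
most_unusual = max(range(len(cells)), key=scores.__getitem__)
```

Nothing in the code states what kind of deviation is being looked for; the
last cell stands out simply because it differs on every parameter. The
hypothesis-driven second level would then ask why.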

We now close our eyes to much of the complexity we observe, because our
disease models are not up to the challenge we are facing today. Feeling
happy with answers to questions in low-complexity disease models will not
help us at the end of the drug discovery pipeline. We reduce the
complexity of our datasets beyond the limits of predictive power and
meaningfulness. We must reduce the complexity of possible conclusions
(improvement or deterioration), but not the quality of data representation
or data extraction into our mathematical models. The value of a disease
model does not lie in the technological complexity of the machinery we use
to study it, but in its realistic representation of the disease process we
want to mimic.

A disease model which fails to generate data and conclusions which still
hold years later in drug development fails to fulfill its mission.
Disease models are not meant to predict future behavior of the model, but
to predict the outcome of a disease and a treatment. The residual gap
between the model and the disease is in many cases too big to allow for
valid conclusions out of experiments with current low-level disease
models. Due to deficient early-stage disease models, the attrition rate in
pharmaceutical research is still very high (80 percent or 4 out of 5 drugs
in clinical research).

It is often said that the easy targets to treat have already been found,
but in relation to the status of scientific knowledge and understanding,
“targets” were never easy to find. Disease models were just
inadequate to lead to an in-depth understanding of the actual dynamics of
the disease process. Just remember the concept of “miasma”
before the work of Louis Pasteur and Robert Koch on infectious diseases.
Only when looking back with present-day knowledge do we declare historical
research “easy”, but we tend to forget that those scientists
were fighting an uphill battle in their days.

Instead of focusing on ever further simplifying our low-dimensional and
oligo-parametric disease models in order to speed them up and only
increasing the complexity of the machinery to study them, we need a
paradigm shift to tackle the challenge ahead of us. Increasing the quantity
of data without a matching quality of correlation to clinical reality leads
to correlation and predictive deficits. We have to create a quantitative
hyperspace derived from high-order spatial and temporal observations
(manifold) to study the dynamics of disease processes in cells and
organisms. The parameterization of the observed physical process has to
represent the high-dimensional (4-D) and multi-scale reality underlying
the disease process. Each physical or feature space can be given a
coordinate system (Cartesian, polar, gauge …) which puts individual
objects and processes into a relative relation to each other for further
quantitative exploration.

Homo siliconensis

Gathering more and better quality information about cellular processes
will hopefully allow us to improve disease models up to a point where
improved in-silico models will help us to complement in-vivo and in-vitro
disease models (Loew LM, 2001; Slepchenko BM, 2003; Takahashi, K., 2003;
Berends M, 2004; De Schutter E, 2004).

Gradually building the “Homo (sapiens) siliconensis” or
in-silico man will allow us to study and validate our disease models at
different levels of biological organization. Building a rough epi-cellular
model, based on our knowledge of physiology and gradually increasing the
spatial and temporal functional resolution of the model by increasing its
“cellularity” could allow for improving our knowledge and
understanding on the way to a full-fledged in-silico model of man.
(Infra-) Cellular resolution is not needed in all cases, so the model
should allow for dynamic up- and down-scaling of the “granularity”
of its structural and functional resolution in both space and time.

What to do and the way to go?

The goal of a Human Cytome Project

The phenotypical and functional characterization of the (human) cytome is
the ultimate goal of an endeavor on the scale of a Human Cytome Project
(HCP). We should reach a point where we are able to design disease models
which are capable of capturing the complexity of the in-vivo in-organism
dynamics of disease processes with high predictive power.

This knowledge should be made broadly available for the improvement of
diagnostics, disease treatments and drug discovery. It is the prerequisite
to come to a better understanding of disease processes and to develop and
improve treatments for new, complex and life-threatening diseases for
which we do not find an answer with our current genome- and
proteome-oriented approach alone.

Studying the Cytome

First try to walk and then run. Studying the (human) cytome as such is
basically another way of looking at research on cellular systems. We go
from a higher level of biological organization (cytome) to a lower one
(proteome and genome). Any research which starts from molecular single-cell
phenotypes in combination with exhaustive bioinformatics knowledge
extraction is cytomics (Valet G, 2003). The only thing you need is
something like a flow-cytometer or a (digital) microscope to extract the
appropriate datasets to start with. Even a small lab or group can take
this approach and prove the concept, either for diagnostics, drug
discovery or basic research. Generating cytome-oriented data and getting
results is within reach of almost every scientist and lab. Increasing the
throughput may be required for industrial research and for a large scale
project, but this is not necessary for a proof of concept or for studying
a specific subtopic.

Organizational aspects

To study the entire human cytome will require a broad multidisciplinary
and multinational approach, which will involve scientists from several
countries and various disciplines working on problems from a functional
and phenotypical point of view and top-down, instead of bottom-up. Both
academia and industry will have to work together to avoid wasting too much
time on scattered efforts and dispersed data. The organizational
complexity of a large multi-center project will require a dynamic
management structure in which society (politicians), funding agencies,
academia and the industry participate in organizing and synchronizing the
international effort. Managing and organizing such an endeavor is a
daunting task and will require excellent managerial skills from those
involved in the process, besides their scientific expertise (Collins F.S.,
2003b).

The challenges of a Human Cytome Project will not allow us to concentrate on
only a few techniques or systematically describing individual components,
but we must keep a broad overview on the cell and its function and
phenotype by multi-modal exploration. We will need an open systems design
in order to be able to exchange data and analyze them with a wide variety
of exploratory and analytical tools in order to allow for creating a broad
knowledgebase and proceed with the exploration of the cytome without
wasting too much time on scattered data.

The project should be designed in such a way that along the road
intermediate results would already benefit medicine and drug development.
Intermediate results could be derived from hotspots found during the
process and worked out in more detail by groups specializing in certain
areas. As such the project could consist of a large scale screening effort
in combination with specific topics of immediate interest. The functional
exploration of pathways involved in pathological processes would allow us
to proceed faster towards an understanding of the process involved in a
disease. It is best to take a dual approach for the project, with one
track which focuses on certain important diseases (cancer, AD …) and
another track which focuses on cellular mechanisms such as cell cycle,
replication and cell type differentiation (stem cells). The elucidation of
these cellular mechanisms will lead to the identification of hot-spots for
further research in disease processes and allow for the development of new
therapeutic approaches.

Technology

Human biology can be explored by using many techniques, such as CT, MRI,
LM, EM, etc. each providing us with a structural and functional subset of
the physical phenomena going on inside the human body. I will focus on the
cellular level.

Image based cytometry

Every scientific challenge leads to the improvement of existing
technologies and the development of new technologies (Tsien R, 2003).
Technology to explore the cytome is already available today and exciting
developments in image and flow based cytometry are going on at the moment.
The dynamics of living cells is now being studied in great detail by using
fluorescent imaging and many sophisticated light microscopy techniques are
now available (Tsien RY, 1998; Emptage NJ., 2001; Haraguchi T. 2002;
Gerlich D, 2003b; Iborra F, 2003; Michalet, X., 2002; Michalet, X., 2003;
Stephens DJ, 2003; Zimmermann T, 2003). Studying intra-vital processes is
possible by using microscopy (Lawler C, 2003).

Advanced microscopy techniques are available to study the morphological
and temporal events in cells, such as confocal and laser scanning
microscopy (LSM), digital microscopy, spectral imaging, Fluorescence
Lifetime Imaging Microscopy (FLIM), Fluorescence Resonance Energy Transfer
(FRET) and Fluorescence Recovery After Photobleaching (FRAP) (Cole, N. B.
1996; Truong K, 2001; Larijani B, 2003; Vermeer JE, 2004). Spectral
imaging microscopy and FRET analysis are applied to cytomics (Ecker RC,
2004). Fluorescent speckle microscopy (FSM) is used to study the
cytoskeleton in living cells (Waterman-Storer CM, 2002; Adams MC, 2003;
Danuser G, 2003).

Laser scanning (LSM) and wide-field microscopes (WFM) allow for studying
molecular localisation and dynamics in cells and tissues (Andrews PD,
2002). Confocal and multiphoton microscopy allow for the exploration of
cells in 3D (Peti-Peterdi J, 2003). Multiphoton microscopy allows for
studying the dynamics of spatial, spectral and temporal phenomena in live
cells with reduced photo toxicity (Williams RM, 1994; Piston DW, 1999;
Piston DW. 1999b; White JG, 2001).

Green fluorescent protein (GFP) expression is being used to monitor gene
expression and protein localization in living organisms (Chalfie M, 1994;
Stearns T. 1995; Lippincott-Schwartz J, 2001; Dundr M, 2002; Paris S,
2004). Using GFP in combination with time-resolved microscopy allows
studying the dynamic interactions of sub-cellular structures in living
cells (Rustom A, 2000). Labelling of bio-molecules by quantum dots now
allows for a new approach to multicolour optical coding for biological
assays and studying the intracellular dynamics of metabolic processes
(Chan WC, 1998; Han M, 2001; Michalet, X., 2001; Chan WC, 2002; Watson A,
2003; Alivisatos, AP, 2004; Zorov DB, 2004).

Extending the resolving power of optical microscopy beyond the diffraction
barrier is a new and interesting development, which is leading to so-called
super-resolving fluorescence microscopy (Iketaki Y, 2003). New microscopy
techniques such as standing wave microscopy, 4Pi confocal microscopy, I5M
and structured illumination are breaking the diffraction barrier and allow
for improving the resolving power of optical microscopy (Gustafsson MG.,
1999; Egner, A., 2004). We are now heading towards fluorescence nanoscopy,
which will improve spatial resolution far below 150 nm in the focal plane
and 500 nm along the optical axis (Hell SW., 2003; Hell SW, 2004).
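
For orientation, the classical diffraction limit which these techniques
try to overcome can be estimated with the Abbe formula for lateral
resolution, d = wavelength / (2 NA), and a commonly quoted wide-field
approximation for axial resolution, d = 2 n wavelength / NA^2. The sketch
below only evaluates these two textbook expressions; the wavelength,
numerical aperture and refractive index are assumed example values, not
figures taken from the cited papers.

```python
def lateral_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe lateral resolution limit: wavelength / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def axial_limit_nm(wavelength_nm: float, numerical_aperture: float,
                   refractive_index: float) -> float:
    """Wide-field axial approximation: 2 * n * wavelength / NA**2."""
    return 2.0 * refractive_index * wavelength_nm / numerical_aperture ** 2


# Assumed example: green emission through a high-NA oil immersion objective.
wavelength_nm = 500.0
numerical_aperture = 1.4
refractive_index = 1.518  # typical immersion oil

lateral = lateral_limit_nm(wavelength_nm, numerical_aperture)
axial = axial_limit_nm(wavelength_nm, numerical_aperture, refractive_index)
print(f"lateral limit: {lateral:.0f} nm, axial limit: {axial:.0f} nm")
```

With these assumed values the lateral limit comes out a little under
180 nm and the axial limit several times larger, which illustrates why
resolution along the optical axis is the harder problem.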

Exploring ion flux in cells, such as for calcium, has been possible for a
long time (Tsien R, 1981; Tsien R, 1990; Cornelissen, F, 1993). Locating
the spatial and temporal distribution of Ca2+ signals within the cytosol
and organelles is possible by using GFP (Miyawaki A, 1997). Fluorescence
ratio imaging is being used to study the dynamics of intracellular Ca2+
and pH (Bright GR, 1989; Silver RB., 1998; Fan GY, 1999; Silver RB., 2003;
Bers DM., 2003).
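
As an illustration of how ratio imaging turns two fluorescence
measurements into a concentration, the sketch below applies the widely
used calibration equation for ratiometric calcium indicators (Grynkiewicz,
Poenie and Tsien, 1985): [Ca2+] = Kd * beta * (R - Rmin) / (Rmax - R). The
function name and all calibration constants in the example are assumptions
chosen for illustration; real values have to be determined for each
instrument and dye.

```python
def calcium_from_ratio(ratio: float, r_min: float, r_max: float,
                       kd_nm: float, beta: float) -> float:
    """Free Ca2+ concentration (nM) from a two-wavelength fluorescence ratio.

    ratio : measured ratio R of the two excitation (or emission) signals
    r_min : ratio at zero Ca2+ (calibration)
    r_max : ratio at saturating Ca2+ (calibration)
    kd_nm : dissociation constant of the indicator in nM
    beta  : ratio of the free to bound signal at the second wavelength
    """
    if not r_min < ratio < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical calibration values for one pixel or region of interest.
concentration_nm = calcium_from_ratio(
    ratio=1.0, r_min=0.2, r_max=5.0, kd_nm=224.0, beta=4.0
)
```

Because the ratio cancels out dye concentration, path length and
illumination intensity to a first approximation, the same function can be
applied pixel by pixel to a ratio image.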

Microscopy is being used to study Mitochondrial Membrane Potentials (MMP)
and the spatial and temporal dynamics of mitochondria (Zhang H, 2001; Pham
NA, 2004). The distribution of H+ ions across membrane-bound organelles
can be studied by using pH-sensitive GFP (Llopis J, 1998).

Electron Microscopy allows studying cells almost down to the atomic level.
Atomic Force Microscopy (AFM) allows studying the structure of molecules
(Alexander, S., 1989; Drake B, 1989; Hoh, J.H., 1992; McNally HA, 2004).
Multiple techniques can be combined, such as using AFM for imaging living
cells and comparing the results with Scanning Electron Microscopy (SEM) and
Transmission Electron Microscopy (TEM) (Braet F, 2001).

High Content Screening (HCS) is available for high speed and large volume
screening of cells and tissues (Conrad C, 2004; Abraham VC, 2004).

Flow cytometry

Flow Cytometry allows us to study the dynamics of cellular processes in
great detail (Perfetto SP, 2004; Voskova D, 2003; Roederer M, 2004).
Interesting developments are leading to fast imaging in flow (George TC,
2004). Combining both image and flow based cytometry can shed new light on
cellular processes (Bassoe C.F., 2003).

Image analysis

In order to come to a quantitative understanding of the dynamics of in-vivo
cellular processes, image processing methods for object detection, motion
estimation and quantification are required.
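
A very small sketch of the first and last of these steps, object
detection followed by measurement, is given below. It uses a fixed
intensity threshold and 4-connected component labeling, which is only one
of many possible detection methods; the image, the threshold and the
function name are invented for the example.

```python
from collections import deque


def detect_objects(image, threshold):
    """Label 4-connected regions above threshold and measure each one."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Breadth-first flood fill collects all pixels of one object.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            total = sum(image[y][x] for y, x in pixels)
            objects.append({"area": area, "mean_intensity": total / area})
    return objects


# Hypothetical 4 x 5 image with two bright objects on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 7],
    [0, 0, 0, 0, 7],
]
objects = detect_objects(image, threshold=5)
```

Each detected object yields a small feature record (here area and mean
intensity). Repeating this over time points and channels is what fills the
feature space used for further analysis.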

New image analysis and quantification techniques are constantly developed
and will enable us to analyze the images generated by the imaging systems
(Van Osta P, 2002; Eils R, 2003; Nattkemper TW, 2004; Wurflinger T, 2004).
The quantification of high-dimensional datasets is a prerequisite to
improve our understanding of cellular dynamics (Gerlich D, 2003; Roux P,
2004).

Imaging principles based on physics and human vision principles allow for
the development of new and interesting algorithms (Geusebroek J. M., 2003;
Geusebroek J. M., 2003b; Geusebroek J. M., 2005). The necessary increase
in computing power requires both a solution at the level of computation
and an increase in processing capacity (Seinstra F.J., 2002; Carrington WA,
2004). Improving the automated quantification of image content allows for
a better understanding of microscopy images (Huang K., 2004).

The development of new and improved algorithms will allow us to extract
quantitative data to create the high-dimensional feature spaces for
further analysis.

Data flow management

My personal interest however is to build a framework in which acquisition,
detection and quantification are designed as modules each using plug-ins
to do the actual work (Van Osta P, 2004). Data representing space, time
and spectral sampling are distributed throughout the system to be
processed. The focus is not on the individual device to create the data
or on individual algorithms, but on the management of the dataflow through
a distributed system to convert spatial, spectral and temporal data into a
feature (hyper-) space for quantitative analysis. The M5 framework manages
the flow of data from physics to features. Up- and downscaling of
cell-based research is dynamically managed by the system as the scale of
processing does not require a change in basic design.

Data analysis and data management

Managing and analyzing data in a multidimensional linked feature space or
hyperspace will require a change in the way we look at data analysis and
data handling in order to succeed. A quantitative multidimensional feature
space is less easily understood than a 3D image looked at in a qualitative
way. People are obsessed by reducing the complexity of their datasets, but
sometimes this is a “reductio ad absurdum”, below the level of
meaningfulness. We have to create tools to be able to understand
high-dimensional feature “manifolds” if we want to capture the
wealth of data cell-based research can provide. Transforming a
high-dimensional physical space into an even higher order feature space
requires an advanced approach to data analysis.

Building the multidimensional matrix of the web of cross-relations between
the different levels of biological organization, from the genome, over the
proteome, cytome all the way up to the organism and its environment, while
studying each level in a structural (phenotype) and functional way, will
allow us to understand the mechanisms of pathological processes and find
new treatments and better diagnostics tools. A systematic descriptive
approach without a functional complement is like running around blind and
it takes too long to find out about the overall mechanisms of a
pathological process or to find distant consequences of a minute change in
the pathway matrix.

We should also get serious about a better integration of functional
knowledge gathered at several biological levels, as the scattered data are
a problem in coming to a better understanding of biological processes. The
current data storage models are not capable of dealing with heterogeneous
data in a way which allows for in-depth cross-exploration. Data management
systems will need to broaden their scope in order to deal with a wide
variety of data sources and models. Storage is not the main issue; the use
and exploration of heterogeneous data is the centerpiece of scientific
data management. Data originating from different organizational levels,
such as genomic (DNA sequences), proteomic (protein structure) and cytomic
(cell) data, should be linked. Data originating from different modes of
exploration, such as LM, EM, NMR and CT, should be made cross-accessible.
Problems in linking knowledge originating from different levels of
biological integration are mainly due to a failure of multi-scale or
multilevel integration of scientific knowledge, from the individual gene
to the entire organism, with appropriate attention to functional processes
at each biological level of integration.

Standardization and quality

On the experimental side, standardization of experimental procedures and
quality control is of great importance to be able to compare and link the
results from multiple research centers. Quality, however, is not only a
matter of experimental procedures, but also of disease model validation
and of verifying the congruence of a model with clinical reality.

We need to design procedures for instrument set-up and calibration (Lerner
JM, 2004). In addition we need to define experimental protocols
(reagents…) and standardize data exchange procedures and standards
such as CytometryML, Digital Imaging and Communications in Medicine
(DICOM), Open Microscopy Environment (OME XML) and the Flow Cytometry
Standard (FCS) (Murphy RF, 1984; Seamer LC, 1997; Leif RC, 2003; Horn RJ.
2004; Samei E, 2004). The methods used for data analysis, data
presentation and visualization need to be standardized. We need to define
quality control (QC) procedures and standards which can be used by
laboratories to test their procedures. A project on this scale requires a
registration and repository of cell types and cell lines (e.g. ATCC,
ECCC). This way of working is already implemented for clinical diagnosis,
by organizations such as the European Working Group on Clinical Cell
Analysis (EWGCCA), which could help to implement standards and procedures
for a Human Cytome Project.
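
To show why a shared data standard makes exchange between centers
practical, the sketch below reads the fixed-layout header of a Flow
Cytometry Standard (FCS) file: a 6-byte version string, 4 spaces, and then
six 8-byte ASCII fields giving the byte offsets of the TEXT, DATA and
ANALYSIS segments. This is a minimal illustration which does not handle
the rest of the format, and the function name and the example header are
invented.

```python
def parse_fcs_header(header: bytes):
    """Return (version, offsets) from the first 58 bytes of an FCS file."""
    if len(header) < 58:
        raise ValueError("an FCS header is at least 58 bytes long")
    version = header[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("not an FCS file")
    names = ("text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end")
    offsets = {}
    for i, name in enumerate(names):
        # Offset fields start at byte 10, are 8 bytes wide and space-padded.
        field = header[10 + 8 * i:18 + 8 * i].decode("ascii").strip()
        offsets[name] = int(field) if field else 0
    return version, offsets


# Hypothetical header, built here instead of being read from a real file.
example_offsets = (58, 1023, 1024, 9999, 0, 0)
example_header = (b"FCS3.0" + b"    "
                  + "".join(f"{n:>8d}" for n in example_offsets).encode("ascii"))
version, offsets = parse_fcs_header(example_header)
```

Any laboratory which writes its files this way can have them read by any
other laboratory's software, which is exactly the kind of agreement a
multi-center project depends on.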

Conclusion

The future development of this idea will decide if a Human Cytome Project
(HCP) will become reality. I can only provide ideas and explain why it
would benefit society and science to explore the cytome in a more
organized and systematic way than is currently being done. This level of
biological organization deserves more in-depth exploration and analysis to
improve our understanding of important human disease processes to allow us
to deal with the challenges we are facing today and in the future.

Acknowledgments

I am indebted, for their pioneering work in automated digital microscopy,
to Frans Cornelissen, Hugo Geerts, Jan-Mark Geusebroek, Roger Nuyens, Rony
Nuydens, Luk Ver Donck and their colleagues. Many thanks also to the
pioneers of Nanovid microscopy, Marc De Brabander, Jan De Mey, Hugo
Geerts, Marc Moeremans, Rony Nuydens and their colleagues.

References, etc.:

http://ourworld.compuserve.com/homepages/pvosta/humcyt.htm
   
