Messages from the Genome
Genetics is the new science. Every day, something fascinating appears in the newspapers or on television, in magazines or books, in connection with the genetic engineering of crops and animals, the Human Genome Project and all that it portends for our future, or the imminent conquest of human disease through gene therapy. The last-named of these has become especially salient lately with the report from France of success in treating an immune disorder by adding working genes to cells. But whatever the specific subject may be, the cornerstone of almost every story about the science of genetics is the central significance of the DNA molecule—the genome—that contains the complete set of instructions for making an organism.
The concept is, indeed, astounding. No matter how many times you encounter the notion that from a small set of chromosomes—tiny objects that can be seen only under extreme magnification—a fly or a whale is constructed, it remains staggeringly hard to take in. But you would not know this from reading the journalists and researchers who interpret the new science for us. To them, there is little that is incomprehensible about the process. We may be ignorant of details, to be sure, but our ignorance, we are assured, is rapidly evaporating. The theory is sound and proved, the research proceeds apace, the authorities have spoken. And what they have told us is that, though the information contained in the genome is immense, we know how it works, and we stand on the verge of controlling it.
Item: In early April, the papers carry a story about the Monsanto Corporation’s having arrived at “a working draft” of the genetic code of the rice plant. Obviously, if rice, which is eaten regularly by most people in the world, can be redesigned in ways to make it hardier, healthier, or tastier, it will do a lot of people good (in addition to making Monsanto a lot of money). What does a “working draft” involve? “By analogy,” a researcher is quoted as saying, “think of an encyclopedia of how to construct a rice plant made up of 100 volumes, each with 1,000 pages and with 1,000 words to a page.” Monsanto printed out all these pages, albeit with some errors and gaps. The volumes contain the information, the information is vast, and this information constructs the rice plant. Q.E.D.
Item: The lead story in the October 1999 National Geographic is entitled “Secrets of the Gene.” It mainly concerns the potential (for both good and evil) that we will have to deal with when the results of the Human Genome Project come in. In it, the director of the Sanger Center, a major research laboratory, asserts: “All the information required to make a human being is written into our DNA.” And he goes on: “We can even put an upper limit on the size of it—about one gigabyte [a billion characters] of data. Your entire genome will easily fit on the hard disk of your desktop.” Again: the informational capacity of the genome is huge; being “written,” the information is readable; and it is all that is required to construct an organism.
Item: In The Origins of Life, a book published last year, the evolutionary biologists John Maynard Smith and Eörs Szathmáry declare that, when it comes to making an elephant, “what we can say . . . is approximately how many base pairs”—the chemical bases in the DNA that carry the gene code—“are actually used.” True, there are lots of unused stretches of DNA; true, too, the base pairs do not always combine with maximum mathematical efficiency. But the genome of elephants, like that of all mammals, is about one gigabyte of data in size, and it does the job.
Item: In Genome,[1] the most recent of his three books on evolution and human nature, the science writer Matt Ridley proclaims ours to be a lucky generation, for we will be the first “to read the book that is our genome.” Just how abundant is that genome? As long, Ridley writes, as 800 bibles—“a gigantic document, an immense book, a recipe of extravagant length.” (This is another way of saying: one gigabyte.) Reading the book of our genome, moreover, “will tell us more about our origins, our evolution, our nature, and our minds than all the efforts of science to date.”
Why so much stress on the size of the genome? Clearly, building an organism requires a tremendous amount of information, and that information must come from somewhere. As Ridley puts it: “Something, somewhere must be imposing a pattern of increasing detail upon the egg as it grows and develops. There must be a plan. But unless we are to invoke divine intervention, that imposer of detail must be within the egg itself.”
Ridley’s “must” expresses the assumption that all of science quite properly rests on—that explanations can appeal only to natural objects and processes. In the present context, what this means is that the capacity of the genome to construct an organism must lie both in the sheer magnitude of the information it contains and in its power to transform that information from “inputs” to “outputs.” How this transformation comes about is precisely what we are told we now know for a virtual certainty. But do we? To address that question properly will require a brief immersion in scientific description.
The physical capacity of the genome—that gigabyte of data—is calculated this way. In the case of mammals, there are about three billion base pairs of nucleotides, each one of which forms a rung on the twisted ladder that makes up the DNA molecule. In sets of three, those base pairs participate in a code; in the code, each set, appropriately called a codon and representing one character of data, designates a specific element of the protein eventually to be constructed by the gene. Divide three billion, the physical length of the genome, by three, the biological character called a codon, and you get one gigabyte.
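The arithmetic behind the one-gigabyte figure can be laid out in a few lines. This is only a sketch of the calculation as just described; the constant names are illustrative, and treating one codon as one character (and one character as one byte) is the assumption on which the figure rests, not an established measure of information.

```python
# Back-of-the-envelope version of the "one gigabyte" calculation.
# Assumption (from the text): each codon counts as one character of data,
# and one character is treated as one byte.

BASE_PAIRS = 3_000_000_000           # approximate mammalian genome length
BASES_PER_CODON = 3                  # a codon is a set of three bases
BYTES_PER_GIGABYTE = 1_000_000_000   # decimal gigabyte: a billion characters

codons = BASE_PAIRS // BASES_PER_CODON
gigabytes = codons / BYTES_PER_GIGABYTE

print(f"{codons:,} codons, or about {gigabytes:.0f} gigabyte of data")
```

The result is exactly one billion codons, which is where the round figure of one gigabyte comes from.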
As for the discrete series of codons that are the genes themselves, there are, in the mammalian genome, only about 100,000 of them, and therefore about 100,000 different kinds of proteins that are used by the body in some capacity or other. In molecular biology there is something called the central dogma: one gene, one protein. This dogma declares what it is genes do, and all they do. Genes, it tells us, do not code for body parts, for processes, for habits, or for instincts. All such organic and temperamental components must somehow be erected out of proteins, or whatever proteins themselves build, and the construction program for those components must be provided for in Ridley’s “egg”—i.e., the genome.
And how does that construction program work—that is, how do we get from inputs to outputs? The genome seems to function as a vehicle of computation, systematically causing a staggering array of factors to coalesce and thereby leading in an orderly fashion to different structural configurations in an organism’s complex arrangement of proteins. To perform this marvelous function, there are, over and above the somatic genes, regulatory genes that somehow control the formation of organs and appendages. The most notable of these genes are the homeotic ones, and their discovery in 1983 is described by Ridley as “probably the greatest intellectual prize that modern genetics has won since the code itself was cracked.”
The important thing about regulatory genes is that their protein products do not themselves form parts of the body or join in its metabolic processes. Instead, they switch on other genes, and sometimes they switch them off. Here, for example, is how the early stages of development of a fruit fly might be described in an introductory textbook, with the regulatory genes identified in italics:
The body formation starts with a gene (actually its protein), a morphogen called bicoid, that establishes the front-rear polarity of the egg; then segmentation genes regulate the origin of segmentation; then gap genes map out the basic subdivision; then pair-rule genes set up modular pair segmentation; then, at last, the homeotic genes come along to specify the type of appendage that belongs at some body segment location.
As this paraphrase suggests, there is a hierarchy of increasing refinement that culminates in the homeotic genes. There are, in fact, hundreds of these homeotic genes, and they assist in the manufacture of a bewildering range of body and organ structures. One homeotic gene will switch on several other genes, and if they, too, happen to include a homeotic gene, that gene will in turn switch on other genes, and so forth, thus producing a “cascade” effect. In combination, as though by a polynomial formula, the mutual effects of these overlappings are modulated or expanded in an orderly fashion—a fashion that may properly be called computational.
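To make the “cascade” concrete, here is a minimal sketch of gene switching on gene. The switch table and the gene names in it are hypothetical placeholders invented for illustration, not real gene identifiers, and the traversal is simply one plausible way to model the description above.

```python
from collections import deque

# Hypothetical switch table: each regulatory gene maps to the genes it turns on.
# The names are placeholders for illustration, not real gene identifiers.
SWITCHES = {
    "homeotic_A": ["gene_1", "gene_2", "homeotic_B"],
    "homeotic_B": ["gene_3", "gene_4"],
}


def cascade(start):
    """Return every gene switched on, directly or indirectly, by `start`."""
    switched_on = []
    seen = {start}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        # A gene with no entry in the table switches on nothing further.
        for target in SWITCHES.get(gene, []):
            if target not in seen:
                seen.add(target)
                switched_on.append(target)
                queue.append(target)
    return switched_on


print(cascade("homeotic_A"))
```

The sketch records only which genes are switched on and in what order; it says nothing about what sets the first gene in motion or what the resulting proteins then do.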
Ridley observes that once a cell knows where it is in the organism—a feat possibly attributable to the action of earlier regulatory genes—it “looks this up in its guidebook and finds the instruction: ‘grow a wing,’ or ‘start to become a kidney cell’ or something like this.” Of course, he does not mean it literally. As he proceeds to explain, “There are no computers and no guidebooks, just a series of automatic steps in which gene switches on gene which switches on gene.” Nevertheless, he asserts, there is a virtue to the guidebook analogy:
[T]he great beauty of embryo development, the bit that human beings find hard to grasp, is that it is a totally decentralized process. Since every cell in the body carries a complete copy of the genome, no cell need wait for instructions from authority; every cell can act on its own information and the signals it receives from its neighbors.
There is, in short, a sort of solipsism through which each cell conducts itself. Although it is not ignorant of its neighborhood, or indifferent to signals from its neighbors, it receives no instruction from anywhere outside itself. (This is not to exclude influences of the environment, but such influences are influential only according to the rules published in the genome.) Self-assembly—the title of Ridley’s chapter on the homeotic gene—is carried out according to each cell’s genomic instructions, instructions that are somehow generally available for each cell to read and decipher.
The wonder, then, is that the genome operates everywhere and always in the same way. “Flies and people are just variations on a theme of how to build a body that was laid down in some worm-like creature in the Cambrian period,” writes Ridley. The genome is the creative machine working always in the identical style, no matter what the organism: yeast, lily, fly, elephant, or human.
Now let us retrace our steps, and ask again whether the genome does, in fact, possess both sufficient information and sufficient power to make an organism.
First, there is a minor correction to be made in the one-gigabyte claim. More than 90 percent of the DNA does not code for proteins, and within each DNA gene there is again another percentage that is unused. Thus, rather than a gigabyte, it may be more correct to say that there are fewer than 300 megabytes of useful data in the genome, an amount that even a vintage hard drive can manage.
Then there is a curious messiness surrounding the numbers. The number of chromosomes differs among species, and for no apparent reason. Chimpanzees have 24, one more than we do. Sheep have 26 plus the X and Y; the Muntjac, a small deer, has just two, plus an X and two Y’s. Crabs have over 250. Perhaps there is no importance to this fact, but it does shake one’s confidence that everything always, and everywhere, operates in a strict order.
Another confidence-shaking fact is this: the number of nucleotides in the genome also varies wildly, and again for no perceptible reason. If the human genome has three billion base pairs, the tiger lily has 100 billion, over 30 times as many; the lowly salamander has fewer than the lily, but fifteen times more than the human. How many genes do their genomes divide into, how much junk? The answer is not known exactly, though it is known that the little zebra fish that swims in the Ganges has 100,000 genes, just as we do, and that those genes happen to be very similar to ours.
Does any of this matter? Perhaps not. What matters, clearly, is that the base pairs, the materials of the genome, function in a way that will build an organism from scratch and without any instructional aid from outside. But consider the stupendous labor to be discharged in this task. Creating enzymes, cells, skeleton, and organs, together with the systems for their regulation; monitoring, metering, repairing, timing, orientating, sensing, and coordinating in movements designed for hunting, fleeing, breeding, seeking shelter, and so on—such a task is quite beyond our ability adequately to describe.
Nest-building, for example—think what it involves, and remember that, as an instinctual behavior, all of it must be provided for in the genome. A timing event occurs, one coordinated with the season; courtship and mating behavior ensue; a nest must be built. A bird is not equipped with cognitive knowledge of the world: “twig,” “branch,” “cat,” “hawk,” “seed,” “berry” are not given to its understanding as concepts but programmed into modules of the brain. Assuming a certain species of bird, one that eats seeds and berries, builds nests of twigs in branches and flees from cats and hawks, we can see that, without knowing what these things are, the bird is nevertheless able to get nourishment, reproduce, and escape from predators.
In nest-building, as in its other activities, the bird’s brain modules must be tuned to sensory inputs, so that their vocabulary is one of light and dark, angles and colors, shimmerings, textures, scents, and accrued recollections of these elements. The bird must coordinate its flight movements of wing and tail to land near a twig, then coordinate movements of eye, leg, and beak so as to ascertain that the thing does not wiggle (that would be not a twig but a snake) and then to pick it up. It must have override mechanisms in the program so that if the twig is near a cat, no landing will take place; or so that, if the bird needs nourishment, it will eat first before collecting the twig. Once a twig is picked up, it must be flown back to the nest location, a matter of utilizing acquired memory clues of light and dark and a sense of distance, again completely without cognitive understanding. The first twig must be placed in a suitable branch for security from ground scavengers and for the engineering requirements of further construction. And so forth, and so on, and so forth, and so on again.
Think of the social behavior of ants, the web-building of spiders, the culture of elephants. Think of the growth of deciduous trees, the elaborateness of a flower’s design, the intricacies of the basal bodies in the cortex of a protozoan. No need to go on: each of us knows enough about life to generate plenty of examples on our own. Even before we ask how it all works, we must confront the ineluctable fact that the labor involved is immense, while the number of genes constructing this immensity and the bytes of information they deploy are quite limited.
There are some 100 trillion cells in the human body. Inside each cell there are many vital and complex materials; the cells in turn collect in mutually dependent ways into the many organs of the body; each organ has many design components and cell types; and there are different cell stages associated with embryological development and with the mature organism. There are so many possible combinations of cellular assignments that the number quickly becomes astronomical; in fact it becomes superastronomical, to an unmanageable degree.
It is very difficult, and always artificial, to attempt to quantify biological processes. It is very difficult because one does not know what to count as units—individual molecules, cells, sets of cells (materials), or some sort of minimum functional constituent such as bone, liver, immune system. It is artificial because we are trying to quantify a living system, and we do not know the essential or permissible parameters of such systems even if we know—which we do not—how many individual elements belong to them. But for illustrative purposes we can take a whack at it.
We are helped in this regard by the late Walter Elsasser, who after a distinguished career in nuclear physics produced four books in the field of biology. In one of them, Reflections on a Theory of Organisms (1986, new edition 1998), he looks at the numbers. To the question, how many different cells can theoretically be compounded out of the four organic elements already fixed—carbon, oxygen, hydrogen, and nitrogen—his answer is, “the number is always extravagantly large.” What he means by that is a number larger—in the mammalian case, much larger—than ten to the 100th power: a figure already so large as to be virtually infinite. For comparison’s sake, the number of protons in the universe is estimated to be ten to the 70th power.
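The gap between these two figures is easy to understate, so a short calculation may help. It uses only the numbers quoted above, taken as given rather than as independent estimates; the variable names are illustrative.

```python
# Figures as given in the text, not independent estimates.
ELSASSER_LOWER_BOUND = 10 ** 100   # "larger than ten to the 100th power"
PROTONS_IN_UNIVERSE = 10 ** 70     # estimate quoted in the text

# How many times larger the lower bound is than the proton count.
ratio = ELSASSER_LOWER_BOUND // PROTONS_IN_UNIVERSE
exponent = len(str(ratio)) - 1     # number of zeros after the leading 1

print(f"ratio = 10^{exponent}")
```

On these figures, even the lower bound exceeds the quoted proton count by a factor of ten to the 30th power.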
What does this tell us? It tells us that until we know the range of permissible cell constitutions, and the method by which that permissible range is disciplined, the alternatives the genome must control will lie on a scale whose extent is impossible to stipulate.
How, again, does it control them? To capture the constructive power of the genome, three metaphors have been commonly invoked.
In the first, the genome is described as defining or actually making an organism’s “building blocks.” In a quite straightforward sense, this is correct. The genome delineates and, with the associated apparatus in the cell, makes all the proteins used in an organism. Some proteins have duties of their own—hemoglobin carries oxygen in the blood; enzymes promote chemical reactions in the cell; Titin, a 30,000-amino-acid-long protein, ties muscle fibers together—while some merge with others and with molecules of different kinds from fats to metals to become the substance of the organism’s structures: fibers, leaves, fruit, skin, bone, brain.
In the sense that building blocks are what buildings are made of, the genome makes building blocks. But the trouble with the building-block metaphor is that it suggests nothing about the dynamics of construction, or of use. It is not a bad metaphor as far as it goes, but it does not go very far.
The second metaphor is that of a blueprint. This figures regularly in many popularized renditions of contemporary genetics (as in Newsweek’s April 10 cover story about the Human Genome Project). The image of a blueprint seems fitting because blueprints prefigure the object pictured and in so doing specify the relationships of its elements. More importantly, the design is read as a set of instructions: make it come to pass that the object looks like this. Ridley, however, forswears the use of this metaphor because blueprints are two-dimensional maps, not one-dimensional codes, and because “each part of a blueprint makes an equivalent part of the machine or building; each sentence of a recipe book does not make a different mouthful of cake.” These are trenchant criticisms.
And there is a more fundamental difficulty with the blueprint metaphor. In the genome, there is no reader, no agent to be instructed and to follow directions. It is the machinist who knows his mill and drilling machines, who knows his materials and how they can be cut and shaped; it is the carpenter or electrician who knows wood and wire and how to cut them to the proper lengths and fasten them correctly together. A blueprint presupposes this kind of intelligence, but that is exactly the kind of intelligence that, in the genome, we need to explain.
Ridley, and he is hardly alone in this, prefers a third metaphor: that of an instruction book or guidebook of some kind, a book carrying the instructions that direct each cell in the self-arrangement routine appropriate for its particular place in the organism, and that then call for the cells to fit themselves together so as to form first parts of the organism and then the whole.
“The genome is a very clever book,” Ridley writes, “because in the right conditions it can both photocopy itself and read itself. The photocopying is known as replication, and the reading as translation.” These are the only two functions ascribed to the book, but he thinks they are enough. By means of replication (from one DNA molecule to another), every cell can have its own book to read. By means of translation, proteins are made.
But does this take us where we need to go? Making proteins is important, but we still do not have a metaphor that captures the genome’s presumed power to design and construct an entire organism. That power, if it is to be found anywhere, must reside in the computational role I described earlier, and that is played by the proteins that are expressed by the homeotic genes.
It is a remarkable thing that the genome is never described as a computer, especially since the computer is a standard metaphor for the brain. Indeed, one kind of computer design—the “neural network”—bears a name that reflects the apparent affinity between brains and machines, being a parallel-processing design rather than the serial processing of the standard computer.
One trillion of the body’s 100 trillion cells are involved in the human brain, and 10 percent of that number, 100 billion, are neurons of various types. Those neurons in most cases put out thousands of connections (called dendrites) that hook up to one another in the millions, forming vast interconnected networks that are very precisely wired to one another. The brain’s system uses chemical and electrical signals that are weighed, balanced, and processed in very elaborate and complex, massively parallel ways. And yet, still and all, the brain is an inferior in the chain of command, for the genome is credited with making the entire organism, including, as one of its parts, the brain.
So, does the genome itself resemble a computer? Not in the least. It is not just that computers do not make anything. After all, sometimes they do: a powerful computer hooked up to an assembly line might very well make an automobile, a washing machine, or (when the day comes) a household robot; and it might replicate itself to boot. At some ultimate echelon of mechanical achievement, where intelligence itself is fully computerized, we might even have a system that begins to approximate the economy within an organism’s genome. No, the real reason the genome does not resemble a computer is that, so far as we understand it, anything having to do with cognition, or simulating cognition, is foreign to the genome and what it does.
Consider a few facts about the marvelous homeotic gene. In an embryo, a very small portion of the protein produced by this gene (according to the usual formula, only 60 amino acids in length) insinuates itself into one of the chromosomes in a small set of cells at some definite regional location at an appropriate time and stage of development; there it switches on a small group of genes, usually between eight and thirteen. But in so doing it is (metaphorically) quite mindless—a very minor trigger, like a stick one might use to prod a herd of cows toward the barn.
What triggers the triggerer? Nobody knows. More than that, nobody has any theoretical proposal to suggest. It is the farmer who picks up the stick, and it is the cows that know their way to the milking barn. Nothing in the homeotic story simulates the farmer’s or the cows’ intelligence. “Triggering” is an interesting biological event; it goes nowhere toward explaining construction. What kicks the homeotic gene into action? No answer exists, factual or theoretical.
And why does a leg or a forebrain form under the prodding of the homeotic gene? This is a somewhat different question from the question that I have just said has no answer. It is like the question, why do the cows come home? The homeotic gene triggers other genes, and genes make proteins. How do the proteins that are expressed by those subordinate genes find their way into a highly structured organization like a leg or an eye? Once again the answer is, no one knows. Not only does no one know, no one has the slightest idea how to look for an answer.
The flamboyant images Ridley uses, based on cognitive metaphors like reading, understanding, knowledge, intention, and so forth, are utterly irreducible to biological mechanics, and there is also no analogy in mechanical or computer technology to help us. The homeotic recital is somewhat like a tale of components for a Boeing 747 coming together to form a landing gear or tail assembly when the shipping crates in which their parts have arrived at the plant are opened by a crowbar. In a way, the crowbar is essential to the landing gear, and, in a way, the crowbar also controls the downstream pathway of the nuts, bolts, struts, wheels, tires, and so forth: for unless the crates are opened, the parts cannot be assembled. But in the genome, the parts themselves know where they belong. Moreover, they go where they belong by themselves, without any help from the crowbar, or anyone wielding it.
At last we are able to observe something very clearly: it is not ignorance that befuddles us, but what we know. We know, to repeat, that the mammalian genome contains only about 100,000 genes. And we know—we can see all around us—that what the genome must do is immense. From this knowledge we have constructed explanations that do not explain.
By way of working toward a conclusion, let me introduce a final consideration that is again based on the work of Walter Elsasser. Within a wide range of variation, there seems to be no definable concept of anatomical or metabolic normalcy. The size of a stomach varies greatly from one individual to another, irrespective of the size of the person; similarly, the location of the stomach can be anywhere from about one to about nine inches below the sternum. Biochemical variations among persons are, as Elsasser points out, even greater. In the “normal” range of blood chemistry, the ratios of upper to lower limits can be two, six, or even nine; and there are always some people whose chemistry falls well outside the normal without any deleterious effect, and much farther outside without fatal consequences. Bone density, which most people tend to think is fairly uniform, has been shown to vary by almost sixfold among young males. Since the body’s chemistry and its organs are functionally interdependent (though to what degree we do not know, and cannot measure) the possible viable combinations are, as before, “extravagantly large.”
This flexibility of design is, of course, what enables life and evolution to proceed. One interesting, even exhilarating, possibility arising from it is that individuality itself may be a feature of living systems, and that this individuality becomes most striking when we look at details rather than at crude summations. If the genome is the sole cause of organic development and maintenance, then in each case, rather than in the general case, it not only makes the component parts but adapts the entire system to itself. This takes us back to the point about solipsism I raised earlier; but what it suggests is not that (as Ridley would have it) the genome works always in the identical style but rather that it operates in an environment of such freedom that, at every point, its alternative viable courses are virtually infinite.
Two sets of conclusions follow from this, the one intellectual and the other practical.
Obviously, incontestably, the fertilization of a single egg by a single sperm cell is materially sufficient to make an organism. As has been proved by the successful cloning of goats and cows from differentiated cells, the “information” to make an organism remains within every collection of chromosomes. The genome does it; it makes the organism. But we do not know how it makes the organism, and what we do know, as we presently know it, offers us no theory, no model, of how an explanation might be framed to tell us how the thing is done. We can sequence the genome until those cows come home, but that will not instruct us how it works. The genome does too much for the capacity we presently perceive it has, and it does not do what it does in any way that we can faintly understand.
Will we ever understand it? At the risk of sounding reckless, or mystical, I would say: perhaps not. But at the very least it should be clear that until some profound new idea, some tremendous leap in human understanding, comes upon us, well beyond anything we now comprehend, the genome will indeed remain a mystery. In my own judgment, this new explanatory idea, if and when it arrives, is likely to take us very far from anything that is intuitively comfortable, farther than relativity theory or quantum mechanics ever took us in physics. Explanation awaits a genius, and probably a long developmental period thereafter; it will not come into the world full-blown.
But if, in the end, we come back to the genome after all, though by a means as yet undreamt of, will we not be returning to what we are being told by our experts that we already know for a certainty? No, we will not—any more than, after relativity theory, we came back to the same three dimensions. It is not a matter of a detail here or there that remains to be cleared up, or of a small and rapidly receding area of ignorance. Rather, everything truly essential about the process is utterly and even radically incomprehensible. Until the moment of illumination arrives, we should, in the name of scientific and intellectual honesty, acknowledge as much.
But there is an urgent practical and indeed moral consideration here as well.[2] Gene therapy, for example, is premised on the idea that we are in fact confident about which genes do what, and (crucially) how, and that by locating and replacing a faulty or missing gene in an organism we can thereby effect a cure. This, after all, is what has happened recently in France, where working genes were added to cells in bone marrow to save the lives of infants who might otherwise have died of a particular immune disorder. Human genetic engineering, for its part, goes still further than this, manipulating genes in the germline itself—that is, at or before the sperm-and-egg stage—and hence affecting not just a single living organism but its heirs unto all eternity.
The promise held out by these enterprises is nothing short of breathtaking. But they are, to put it gently, inherently dangerous, and not just because they can mislead ill people into entertaining false hopes—although they can certainly do that. As has been pointed out in the more thoughtful news stories about the recent French breakthrough, the therapy in this one case is itself highly limited in nature, as is the disease being treated, and its implications for other diseases and other therapies are quite uncertain. More important, this achievement comes after a decade of gene-therapy research characterized by the most grandiose and sweeping promises of imminent cures for everything from cystic fibrosis to Alzheimer’s to all forms of cancer. In practice, there has been one crushing failure after another, including, last year, the death of a teenage boy in a clinical trial at the University of Pennsylvania.
“There were a lot of people going around grandstanding” in the past decade, one honest medical researcher has acknowledged in the New York Times, citing the large numbers of trials that have been approved “without extraordinary oversight in terms of scientific or clinical benefit.” This “awful history,” he went on, indicts not just the companies that have stood to benefit commercially from successful trials, and therefore have an interest in hyping the potential of gene therapy. “Some of it, unfortunately,” is the fault of “the investigators and the academic institutions.”
This is very bad, and it sheds a rather lurid light on academic and scientific pretensions to probity and cautiousness. But there is real cause of alarm when one considers just what all the “grandstanding” has been about. Genes are not machines, and we ourselves do not work like computers. Unless and until we know how collections of genes go together, how they combine to form organisms, whether plants or animals or human beings, we will not know what, really, we are doing when we add a gene to an existing organism, or manipulate the genes in a germline in such a way as to affect future generations. If the process of forming an organism possesses, as I have argued, an element of freedom, and is fraught with an infinite number of choices along the path, each of them setting in train numerous complex and far from predetermined actions and reactions, it is terrible to contemplate the responsibility that lies upon us when, profoundly ignorant as we remain, we make bold to add to, subtract from, or alter the pages in the book of life.
[1] The subtitle is “The Autobiography of a Species in 23 Chapters.” HarperCollins, 344 pp., $26.00.
[2] For an extended discussion, see also “The Moral Meaning of Genetic Technology” by Leon R. Kass in the September 1999 COMMENTARY—Ed.