Unifying Logic: Searching for the Biggest Truths in the Smallest Elements
An Interview with Dr. Leroy Hood
President and Founder
The Institute for Systems Biology
by Mark Compton
There are stars and then there are superstars. In the life science community, Dr. Leroy Hood would certainly be counted among the latter. Besides having been among the first scientists to advocate the Human Genome Project, Dr. Hood is credited with having played a lead role in inventing automated DNA sequencers in the mid-1980s. Moreover, for the past 30 years he has remained at the forefront of efforts to shape the technology scientists use today to read, record and analyze the massive volumes of information required to fathom the secrets of life.
After a distinguished career at the California Institute of Technology, Dr. Hood moved in 1992 to the University of Washington, where he created the cross-disciplinary Department of Molecular Biotechnology. Beyond the academic realm, he's also helped to create numerous biotechnology companies, including Amgen.
Today, Dr. Hood serves as President of the Institute for Systems Biology, a non-profit organization he founded nearly a year ago. Dedicated to the goal of uniting biologists and specialists from other fields in the effort to unravel complex biological codes, the institute was one of the 16 genome centers which contributed last year to the completion of a working draft of the human genome sequence.
[Mark Compton]: What exactly is systems biology?
[Leroy Hood]: For the past 30-40 years, biology at the molecular and cellular level has been studied from the perspective of analyzing individual genes and individual proteins. Systems biology, on the other hand, is interested in analyzing whole systems of genes or proteins. What this means is that we use tools for capturing information from many different elements of the overall system. And we have to be able to integrate the information that's obtained from all the different biological levels--DNA information, RNA information, protein information, protein interaction information, pathways and so forth. The ultimate objective is to use this information to write mathematical models that are capable of predicting something about the structure of the biological system under evaluation as well as predicting something about its properties, given particular kinds of stimuli or perturbations.
Suppose you're flying over Manhattan and you'd like to find out how Manhattan works. You'd have to start by cataloguing the infrastructure, the buildings, the roadways, the communication channels, the cars, the bus routes and all the rest of it. You'd also have to study how power was brought into the city and how it was used and dissipated. And you'd have to study traffic patterns, work habits, human interactions and a great many other things we don't have time to talk about here. Then you'd have to take all that data and integrate it to develop a model capable of predicting how the city functions. And it's exactly the same for biological systems. We have to gather information at different levels and fully integrate it to really understand how systems work.
Why is the systems approach to biological research only now becoming feasible?
I'd say there are a lot of reasons and most of them revolve around the fundamental changes engendered by the Human Genome Project. Altogether, that has altered the way we look at biology in several major ways. First, it's led to a new kind of science which we call "discovery science", where you start by defining all the elements in an object, irrespective of any questions you might want to pose about the object. That is, you sequence all of the bases in the genome or you describe all the proteins present in a cell. By doing so, you create an infrastructure of information, which enormously enriches hypothesis-driven science. And it turns out that systems biology requires the deep integration of both discovery and hypothesis-driven science.
The second major contribution of the Human Genome Project is that it's given us a kind of periodic table of genetic parts that's going to let us define 100,000 or so human genes. Using that parts list, we can look at the regulatory sequences that dictate when, where and how the genes get expressed in different cells. And it actually gives us access to the big, exciting topic of human variation--differences in our genes that cause changes in physiology and predispositions to various diseases. Just having that table alone will change the course of biology.
There have also been several interesting paradigm changes that have helped to push this idea of systems biology in a very powerful way. The first is the idea that biology is an informational science that contains many different levels of information that ultimately must be integrated if we're to understand overall system behavior. The second idea is that this information can be captured through the use of high-throughput biological tools such as large-scale sequencing, genotyping, DNA arrays and proteomics analysis. All those are examples of tools that let us define all the elements in a system really effectively. And, of course, you can't actually study systems before you can at least define what the elements in the system are.
Another important point is that we've discovered that computation and statistics and applied mathematics are critical to improving our understanding of biology. Taking all that information and capturing it, storing it, analyzing it, graphically displaying it and ultimately modeling it and distributing it requires the tools of computer science. And another crucial point is that we've learned the importance of model organisms. That is, in order to study a system you have to be able to perturb it. And you can't carry out many perturbation experiments in humans for a variety of ethical reasons. So we have to use mice and flies and worms and yeast and the like, because there we can perturb the biology and analyze within a systems context how all the elements of the system behave and respond to various perturbations.
Of course, underlying the power of model systems is another fact that genome research has made explicit. And that has to do with the grand unity of life--the fact that organisms from bacteria or yeast up to humans deploy, at the most basic level, very similar informational systems, which points to life having descended from a common ancestor. The great advantage there is that we can learn a lot about humans by studying the informational pathways in much more primitive and manipulable organisms.
Does that suggest that sequencing the mouse genome or the fly genome or the yeast genome is at least as important as sequencing the human genome?
Oh, it's critically important. Absolutely. There are some who have argued we would have been better off doing the mouse genome first, but Congress never would have funded that research with the same level of enthusiasm. They never would have understood the importance of the mouse as a model system. But that actually gets us to another paradigm change--the whole idea of comparative genomics. That's where you take a genome and decipher the logic of life in that organism and then work to compare that with other organisms to learn how the logic of life changes from one to the next. That actually gives you fundamental new insights into evolution, its constraints and all the information pathways and networks that come into play.
Advanced computing capabilities obviously play a critical role in all this. In fact, I understand that IBM is currently working to develop an experimental computer called Blue Gene to enable the modeling of protein behavior. But even though that represents only a fraction of the overall systems knowledge you're pursuing, IBM believes Blue Gene will need to be at least 100 times more powerful than today's fastest computer--that being the monstrous Lawrence Livermore Lab system used to model nuclear warhead yields. Doesn't that suggest that the microbiological systems you're hoping to study are complex beyond imagining?
Well, in the midst of all that complexity there also is clarifying simplicity. So what one has to do in systems biology is search for relationships that tend to simplify the various levels of complexity. To put it another way, if you want to think about networks strictly in terms of biochemical mechanisms, things can get enormously complicated. But you can also ratchet it up to a higher informational level and look at how quantitative expressions of proteins or RNA change as you perturb systems. And as you look at these higher levels, you can tremendously simplify the system and yet learn about it in very deep ways. It's just like when Maxwell and others wrote equations in the 19th century that allowed exact solutions to some very complicated electromagnetic problems. But over the next 100 years, a lot of engineers came along and developed high-level approximations of those solutions. They didn't care about exact solutions at all. And light bulbs, television, satellite radio and all manner of things came out of that. So you can see that by taking a higher-level view of physics, we've managed to skirt a lot of bewildering complexity and yet develop plenty of practical applications. And that's exactly what's going to happen to biology. You don't need to understand all the nitty-gritty details to be able to gain fundamental insights into biological systems and to learn how to manipulate them in the interest of preventive medicine.
But, given that your goal is to see how all these various components of an overall system relate to one another and interact with one another, doesn't that suggest it's absolutely critical for academic researchers and biotechnology concerns to collaborate and freely exchange information with one another?
It does. And I think everybody in academics would agree to that. Having the genome available to everybody has changed biology more than any other single event in the history of the science. Were enormous amounts of data to be gathered up by certain companies and made unavailable to academics, that would be tragic. That's why the government is going to have to pay to make all that data freely accessible.
Certainly, over the course of the Human Genome Project, a rather emotionally charged schism emerged between private and public research interests.
You know, I think that was largely a schism between personalities. Well, that and the other schisms that arose due to the different mandates of academia and industry. But I think, from day one, many of us in the community felt there didn't need to be all that heat and dissension. It was quite obvious from the start that the investors who paid $340 million to set up Celera weren't going to make everything public since they were entitled to some return on their investment. But [Celera CEO] Craig [Venter] made the whole thing quite confusing by claiming early on that Celera was, in fact, going to make everything public--even though it was pretty obvious that was an impossible business arrangement. They could make some things public, but they certainly couldn't make everything public.
There are many, of course, who are very concerned about how patent law is currently applied to genetic discoveries, arguing that the patent claims are much too broad and that the technology itself is poorly understood by the patent office. What are your thoughts?
Well, I certainly agree that the present patent system doesn't capture intellectual property in biology very effectively because it's predicated on thinking that applies to the patenting of machines. In 1900, half of all the US patents were for bicycles, just to give you an idea. And living organisms aren't machines. My argument for the past three to four years has been: "Look, we should completely rewrite the intellectual property laws for biology." We should rewrite them from the point of view that biological systems are informational systems. And if, in fact, the hierarchical nature of biological information were taken into account, I think you could write a set of patent laws that would be perfectly reasonable and consistent. On the one hand, they'd give the companies what they need--that is, the protection to develop particular products. But on the other hand, they wouldn't allow claims to be so global that they end up choking the development of whole fields. I think, in general, there has been a legitimate concern that patents on fragments of gene sequences might let the patent holder gain access to the whole gene or claim ownership of all the proteins that gene is capable of encoding. The fear is that if you capture too much information via a patent at a primitive informational level, that could end up dominating higher levels of information as well. So my argument would be that, if we can't restructure the patent laws to take into account the informational system viewpoint, we should at the very least make quite stringent requirements that define exactly what's demanded to gain control of a gene or a protein--or even an overall system.
Does that mean being able to demonstrate its therapeutic benefit?
I think it really means two things. One, you really ought to know something about the entity you're trying to patent. That is, if it's a protein or a gene, you ought to have a real biological assay that shows its function. Two, you shouldn't have any right to claim a completely new function that someone else later discovers in that gene. Now, you might want to provide for cross-licensing, but I don't think that merely sequencing the gene should give you property rights over all the wonderful things that gene is capable of doing.
In the interest of learning more about these complex systems, your institute seems to be busily adding chemists, computer scientists, engineers, mathematicians and physicists--in addition to biologists--to the fold. What is it about the study of biological systems today that calls for such a multi-disciplinary approach? Is that purely a function of the complexity of the biological systems under evaluation or does it stem more from the convergence of traditionally distinct test and measurement technologies?
Well, it's really all of the above. On the one hand, we need these cross-disciplinary skills because we have to invent new global technologies for analyzing systems information ever more effectively. So it's by drawing on all these different fields that we're getting better DNA arrays, better devices for sequencing and certainly much better techniques for proteomics research. And the techniques that are going to improve these technologies are increasingly focused on miniaturization--in particular microfluidics and microelectronics. And later on we'll probably see more of an emphasis on nanotechnology and beyond. So we need all of these physical, electronic and chemical tools to be able to apply these new global technologies. Moreover, the imperatives for integrating information technologies with biotechnology are enormous. So all of the major problems in the IT area represent challenges in biology as well--whether the concern is data mining, data warehousing, the integration of heterogeneous data formats or any of the other really daunting IT problems. So we need computer scientists, mathematicians and applied mathematicians to generate the computational tools and algorithms that we need to deal with biological information. Finally, there's the whole realm of how to develop models, which will require completely new kinds of integrative mathematics and modeling mathematics. Those are the kinds of questions that biologists don't have the background to even approach, so we'll have to overcome those challenges in partnership with computer scientists and mathematicians and systems engineers and the like.
So you're telling me you've learned a lot more about statistical modeling than you anticipated 30 years ago.
Well, yes. And about all the other disciplines as well--about modeling, about computational tools, about algorithms, about chemistry...I mean, in just about every dimension. If you want to push the envelope, the tools you need rarely lie solely in the biological realm. That isn't to say we can't improve our current cloning techniques. But the really big advances are going to be achieved through microfluidics, nanotechnology, computational biology, single-cell analysis and single-molecule analysis. Those are the real frontiers for deciphering biological information in the future and they all require scientists who are familiar with bleeding-edge techniques in physics, mathematics, engineering and chemistry.
What do you hope to accomplish by discovering how normal physiology differs from the physiology of diseased individuals?
The future of medicine is going to be revolutionized by one thing--and that is the study of human variation. So in a sense, the first generation of genomics was dedicated to sequencing the genomes. The second generation in genomics and proteomics is going to be focused on studying the variations the genomes exhibit, and to correlate those variations both with normal physiology and with predispositions to various diseases. Medicine, in turn, is going to go through three phases. It's now in what I call the reactive phase. That is, you generally wait until people get sick before you try to make them well. We'll move to a predictive phase--and in some cases, I think we've already moved there. That is, we can look at your DNA and make probabilistic statements about how likely it is you'll contract cardiovascular disease or cancer or some neurologic or immunologic disease. My guess is that, over the next 20 years, we'll identify hundreds of genes that give us some very good markers for many of the most common late-onset diseases. And that will lead to the third phase, which is what I call the preventive phase. That is, we'll then take these defective genes, we'll define the systems in which they operate, we'll learn how to manipulate the systems through the application of preventative drugs, and then we'll have the ability to say to people: "In 20 years, you're going to have a lot of heart trouble unless you start taking this pill on a regular basis." And that will allow us to significantly extend the average creative and productive lifespan of individuals by steering them clear of these late-onset diseases.
And when would you expect we'll begin to enter that preventive phase?
The preventive phase and the predictive phase are going to be intermingled all along the way. In some cases, once we discover the genes we're interested in, we can begin to think about what we can do to prevent the disease. Look, the cardiovascular arena already offers a wonderful example: the cholesterol-lowering drugs that were designed on the basis of an understanding of lipid metabolism. And these sorts of developments have had a marvelous effect on heart disease. So some things have already started to happen. Some of the more complicated diseases--in the nervous system and such--may well take another 20-30 years.
Of course, historically, pharmaceutical companies have displayed a bias toward research on diseases that appear economically viable--that is, where the number of potential patients looks to be broad enough to support robust sales. And that's created the so-called "orphan disease" problem. How do you suppose we're going to end up dealing with that?
Well, I think that's quite simple, actually. That is, if systems biology lets us analyze these problems thoroughly and effectively, we should be able to stratify patients so we know exactly how they're going to respond to drugs. And that will bring down the whole cost structure of creating new drugs from $600 million per drug to something more on the order of tens of millions per drug. And if you bring the cost structure down like that, then you can begin to address a much broader repertoire of diseases. Now maybe there will still be some really rare diseases that, even with those economies of scale, still won't be economically viable for drug companies. And so maybe in those cases the government will have to step in, as it already does to a certain extent, to help underwrite the necessary drug development.
Speaking of the government...As systems biology tools become increasingly sophisticated, do you think that will help simplify and streamline the FDA approvals process?
Yes, I do. That is, if you can go to the FDA and say: "In this population of people with cardiovascular disease, we've done this array analysis and we find that it's this 20% that's going to respond really well to this drug," then we'll have succeeded in eliminating many of the variables the FDA worries about most--namely, unexpected adverse reactions or the lack of a positive response.
On another note, one of the goals you've established for the Institute has to do with fostering a better understanding of biology throughout society. But the charter of your organization also calls for you to work in concert with pharmaceutical and biotechnology companies which, to date, have shown little enthusiasm for dialogue with the public. Might not your first challenge be to educate industry about the need for that dialogue?
I think that education already is in progress. I think you can look to early pioneers like Monsanto, which got the bovine growth hormone issue out before the public some 13 or 14 years ago. They spent a lot of money at that time on public education. And I think they really did a reasonable job in that regard. Now, they didn't do such a good job with regard to their genetically engineered foods. But I think more and more of the large pharmaceutical companies (and even some of the smaller biotech companies) recognize that we have to start educating the public because, in the end, that's going to determine the level of resources we get and the laws by which we'll be constrained. So I think that point of view is becoming more and more commonly accepted. Now it's still a long way from acceptance to aggressive involvement. And encouraging greater involvement, I think, is going to require some effective proselytizing.
But part of the issue is: how do you educate the public? There's not a simple answer to that. The approach we're most excited about here is K-12 science education, which I think is marvelously effective. But it's going to be some time before the kids you're educating today are voting citizens. And that leaves the question of what to do about all the adults who are out there today, which is a really challenging question. The Nova programs and all that are wonderful, but they tend to educate those who are already somewhat knowledgeable. How do you get to the ordinary lay public? That's a question I just can't answer.
I know Eric Lander of the Whitehead Institute/MIT Center for Genome Research has been quoted as saying that the public is "deeply, deeply uneducated" about the issues pertaining to biotechnology.
Oh, and they are! There's no question about that.
So an interesting question that comes out of that is: during this interim period, how can the public be effectively drawn into meaningful dialogue?
Well, that's hopefully some of what we're trying to accomplish here. But I don't know if articles like this published on the Internet are going to be the key. I don't know if regular commercial TV is going to be critical. I just have no idea how you're going to break through all the noise created by sports and classic entertainment and all the things people spend most of their time consuming.
Going back to an earlier comment, do you think it's the case that industry really wants to educate the public, but just isn't sure how to go about it?
I think industry is at all levels of sophistication on this subject. I think the more sophisticated pharmaceutical companies fully recognize the importance of public education. There's no question about that. Most of the smaller biotech companies are just struggling to survive. They don't have the time to think about the niceties of public education. But it's my feeling that young scientists are much more aware of these issues than the scientists of my day ever were. And I believe the executive leaders of the large pharmaceutical companies and even some of the biotechs are really aware of these issues. But getting them to do something about it is another challenge.
In an earlier interview, noted health law expert George Annas worried that we might someday come to the point where the word "bio" would become as repugnant to popular sensibilities as the word "nuclear."
You know, I think that's unbelievably unlikely. And I'd like to think it's unlikely because, unlike "nuclear", what "bio" has going for it is that it's going to help us improve the human condition by overcoming human disease. And, for all of us, I think that's a deeply personal issue.
About the Interviewer
Mark Compton monitors trends in information technology and biotechnology from a comfortable perch midway between Silicon Valley and Oregon's Silicon Forest.
