
Dr. Frank Harrell's Interview



Dr. Frank Harrell, PhD. Professor of biostatistics and department chair, Vanderbilt School of Medicine; Nashville, TN.

Dr. Frank Harrell was a statistician in the Duke databank. In this interview, he speaks about statistics and about the history of the databank from his own perspective.



TRANSCRIPT OF INTERVIEW

INTERVIEWEE: Dr. Frank Harrell

INTERVIEWER: Jessica Roseberry

DATE: April 30, 2007

PLACE: Dr. Harrell's office, Vanderbilt School of Medicine, Nashville, TN


JESSICA ROSEBERRY: This is Jessica Roseberry. I’m here with Dr. Frank Harrell. And he’s professor of biostatistics and the department chair at the Vanderbilt School of Medicine. It’s April 30, 2007, and we’re here in his office in Nashville, Tennessee: Vanderbilt’s campus. And I want to thank you very much for agreeing to be interviewed today, Dr. Harrell; I really appreciate that.

FRANK HARRELL: Great to be here and an honor.

ROSEBERRY: Thank you. If you don’t mind just letting me know how you got started in the field of statistics, how that began for you.

HARRELL: Yeah, I was a high school student in Birmingham, Alabama. And I was bored one summer, and my mother said, “Why don’t you volunteer at the VA hospital to be a male candy striper?” And so I volunteered, and the group I was working with, besides needing help moving patients around, they did a lot of research. This was in gastroenterology. And I started helping them with data. And they needed a whole lot of data summarized. They would give me these sheets and sheets of data. I used a hand calculator. And I got interested in data, and I was always interested in math. And I started meeting the biostatistics faculty at UAB [University of Alabama at Birmingham], especially David Hurst, and they started telling me about the field. And then I got into UAB and majored in math, and I kept working and transferred to cardiology. And I got to working with the myocardial infarction research unit [MIRU] and the cardiac cath lab and started developing programs to analyze their data, especially the pressure waveforms and ECG. That was fascinating to me, and I did a little bit of statistical work. And then I still wanted to go into math in graduate school, and Professor Hurst said that, “If you go into math, go to the best school—” at the time—“for PhD in math,” which was UCLA. And he said, “By the way, last year they graduated ten PhDs, and nine are still looking for a job, and one’s an elevator operator.” (Roseberry chuckles) And he said, “Why don’t you go into biostatistics and go to University of North Carolina and get a supporting program in—” I think he said computer science and biomedical engineering. And I did exactly what he said, except my supporting program was physiology and biomedical engineering, and on my project for my supporting program, I worked at Duke with Harold Straus in cardiology and Frank Starmer and Kerry Lee.
And then when I got working on my dissertation, Kerry Lee—I asked Kerry to be on the committee, and he was just a wonderful person to work with. And he was always responsive and just always cared about my work, and he also let me know about a job opportunity at Duke. And so when I was ready to apply for positions, I only applied for one position, and that was the one at Duke to work for Kerry. So Kerry let me know about Duke, he recruited me, and got me started in just a wonderful way.

ROSEBERRY: Well, I understand that there was a fairly strong connection between UNC’s [University of North Carolina’s] statisticians and that department at Duke. Is that right?

HARRELL: That’s right, and that was thanks to Gene [Eugene] Stead and to the chair of Biostatistics for a long time at UNC, Jim Grizzle. And something got Gene Stead really excited about biostatistics, and I don’t know really how it all started. But he was—he thought it was important, and he was very excited about it. And he was extremely excited about this link with UNC. And so he was proud to make sort of a farm system from UNC Biostat, which is one of the biggest departments anywhere, to start this group at Duke. And there’ve been good statisticians at Duke over the years, like Michael O’Fallon and Kerry Lee, and they can probably tell you some more of the ancient history (laughing). But this partnership with Stead and Jim Grizzle was really exceptional. And the last time I heard Dr. Stead speak, which I think was just about a year before he died, he was still talking about it. It was amazing (laughs).

ROSEBERRY: So it really got him excited, it sounds like.

HARRELL: Yeah. And it was a great opportunity for us who came to love the Triangle area to be able to go to a great graduate program—just like Kerry Lee went to UNC—and to have a job in that area, because a lot of people come to there, and they don’t want to leave. So it was a wonderful setup. And then we were active—even though we were at Duke, we were active in things at UNC. Kerry and I would teach courses and direct students in the biostatistics program. So it was a great marriage.

ROSEBERRY: Well, you’ve mentioned Dr. Kerry Lee several times. Do you mind telling me a little bit about him?

HARRELL: Well, one of the most special persons I’ve ever met in my entire life. And I feel a very dear spot for Kerry, because not only did he give me my first really big job, he created an environment that was completely fun to work in. And the discussions he and I had through the years about statistics and cardiology and analyzing cardiovascular data were so stimulating and just plain fun. And he’s so wonderful to talk with. I never saw Kerry get mad. I probably did many things that should have made him mad. The only time I ever heard him get mad, someone in another group did something that wasn’t very wise, and I think Kerry said to them, “The way you did that was suboptimal.” That was about the strongest (laughing) he would say it. And so—but then if you look at Kerry as a person, the example that he sets for other people is such that it makes you a better person. And the patience, the intelligence, the way he smoothes out edges. So you know, life is a roller coaster: there’s always ups and downs. And when you know somebody like Kerry, he takes the roller coaster ride and smoothes it out. And his calmness and his wisdom have a great impact on everyone around him.

ROSEBERRY: Well, you said he made statistics exciting, or the department exciting. I wonder what was exciting to you about—?

HARRELL: Well, first of all getting an opportunity to work on data that was exceptional data. And when Kerry was working full time on the databank, he and Bob [Robert] Rosati and others recognized that there was more statistics persons needed to handle the quantity of research and quantity of data there, so he helped expand the group. But he made it fun in that we actually got into a lot of controversial issues of how data should be analyzed and how you should make conclusions from data that were not from randomized trials. Because in the earlier years of the databank, all the research was done on observational data. Almost all. So we had to get into all these discussions and all this back and forth between Kerry and myself and all the cardiologists. So the cardiologists were intimately involved with the statistical issues. And we actually had some wonderful arguments. I mean, with Kerry, it would never look like an argument. It would just look like a friendly discussion. With some of the other people in the group, it sometimes looked like an argument. (chuckles) But it was always good, because the arguments were at a high level intellectually. And so Kerry was always open to any ideas and always had great feedback when you said, How about this, Kerry? Do you think this idea might work for analyzing this kind of data? So that made it a lot of fun.

ROSEBERRY: What were some of those controversial issues?

HARRELL: Well, there were issues about how to handle continuous variables. Like one of our strongest variables we ever had was left ventricular ejection fraction. In the early days when we started out, we didn’t really know how to consider that variable, because if you look at the effect of it against cardiovascular death, it’s not a linear relationship. So once the ejection fraction’s up above sixty, it doesn’t really matter how high it gets—or, say, above fifty-five maybe. So it has kind of a flat relationship there, and you don’t get any extra benefit from having a seventy that we could see, but the slope of it, as you go from the critically ill level of, say, ten, all the way up to fifty or so is pretty steep. So it took us a while to figure out how to handle that kind of variable, because that was the days before the statistical tools were developed before they are now in terms of non-linear statistical modeling. So we had a lot of discussions about that. And another variable that we liked was the AVO2 difference—

ROSEBERRY: What is that?

HARRELL: That’s something that was collected, I think, in the right heart cath [catheterization]. And when Duke got into the real coronary artery disease, high-throughput era, I think right heart caths sort of got out of vogue. And so they quit collecting that variable. It measures, I think the oxygen uptake and how much oxygen is available, and so it’s sort of the efficiency of the circulation throughout the body. But one of our statisticians, master’s level statisticians—I think it was Karen Pieper, probably—that she saw one of the cardiologists one morning when Duke cardiology was really booming, and she said, “You guys are doing cardiac catheterizations left and right.” And he looked at her, he said, “No, just left.” (laughs) So there were a lot of variables that we had to figure out and also how to analyze categorical variables such as New York Heart Association class. So we started out not knowing how to handle these variables that are ordered, but they’re not continuous like ejection fraction. So you have class 1, 2, 3, 4, but it’s not the very, very powerful variable. We evolved to the point of trying to understand how that variable had some information that was redundant, because once you know the left ventricular ejection fraction, you don’t need to know quite as much about these more subjective class variables like New York Heart Association. So there was a lot of modeling issues, and then there were issues in relationship to the fact that the Duke databank collected a lot of variables on each patient. So it had more data than we were used to dealing with. We had to have some ways to deal with multiple variables and to distill that down to something that the cardiologist could deal with as well as the statisticians.

ROSEBERRY: So were you developing the models yourself, or were you using existing models?

HARRELL: Kerry and I—one thing we tried to do was to keep up on the literature about what were the best available models. And I think we did a pretty good job of that. So we were using models such as the Cox proportional hazards model by trying to use them in a pretty smart way, and by trying to make them very useful in everyday analysis. Because in those days there wasn’t the kind of software that people have, and of course computing is a whole other story, what we were doing back in the dark ages with computing at Duke. So we had to develop a lot of software to make those models useful, especially for our fairly large database.

ROSEBERRY: So I’m not a statistician, so I wonder if you could maybe boil down for me what—was the statistics that you were using, was it predictive for what the cardiologist or the surgeon might do? I mean, was he or she able to see, if I plug in this variable, then that means this outcome will—?

HARRELL: We were—yeah, our main—and this is where Bernie McCants comes in. Because we had this amazing follow-up system that Bernie led and still leads that follows every patient for long periods of time. And so you have these magical outcome variables like how many days it was from the cardiac catheterization until the patient had a heart attack or death, and was it a cardiovascular death. And if they haven’t had those events, we would know how long they’d been followed, so you can make use of the fact that somebody was followed seven years, and that they had not had a myocardial infarction or death. And that’s very good to know. So you would treat them as—if they were seven years out, you would treat their time to an event as seven plus, but plus what you don’t know, because you haven’t followed them beyond that point. And so using the Cox model, you can analyze that kind of data very efficiently. So our main outcomes were time until cardiovascular death, time until cardiovascular death or myocardial infarction, nonfatal. And sometimes we analyzed time until death due to any cause. And we had a few other outcomes we looked at, but those were the most important ones. Those were very hard outcomes that give you a lot of information in a very objective fashion.
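
[Illustration added to the transcript; not part of the interview. The "seven plus" idea described above is usually called right-censoring: a patient followed for seven years without an event still contributes seven years of event-free information. The sketch below uses the Kaplan-Meier estimator, a simpler technique than the Cox model mentioned in the interview, only to show how censored follow-up enters the calculation. The function name and the five patients are hypothetical.]

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient, in years
    events -- True if the event occurred at that time; False if the patient
              was still event-free when follow-up ended (right-censored)
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(events_at):
        # Everyone followed at least until t is still "at risk" at t,
        # including patients who are censored later.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - events_at[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: the patient followed 7.0 years with no event is "7+".
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [True, False, True, False, True]
curve = kaplan_meier(times, events)
```

[The censored patients never count as events, but they do count in the number at risk until their follow-up ends, which is how the "plus what you don't know" is respected.]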

ROSEBERRY: And then you were able to communicate that with the physician, the surgeon.

HARRELL: Right. So we could give estimates of the probability that somebody would go five years without having a cardiac event. Or go one year or go eight years, and we could give a whole survival curve. So this was one of the few if any places that was giving out prognostigrams. These were from statistical models, and we also supplemented that with finding similar patients in the data and reporting their outcomes, more of an empirical approach. So we could estimate these long-term survival trajectories for individual patients and feed those back. And we could do that for both predominant treatments at the time, which were medical therapy and coronary bypass surgery. And you could give these survival curves, and you could point out something that’s still important today, which is: patients who get surgery, they undergo a high risk for a short period of time. If they make it through that high-risk period—which is often just a day or so—and they survive that, their risk might be lower. But by showing the curves, you can see that they cross. And so somebody who’s thinking about getting surgery instead of medical therapy—because in those early days we didn’t have angioplasty or anything else—they would have to think about this risk and see if it’s worth it, to get the long-term benefit. So that was very helpful in decision-making for patients and for their cardiologists and for their other doctors. So—and it was very interesting, and we had to work very hard to make those estimates reliable, because we didn’t want false information to be fed back to physicians that would result in faulty decisions. So it was a real challenge. And then one of the questions that was hot in that day—and I think we did some neat work in that, that we should have probably published more about—is that some surgeons would not take very high-risk cases. And some referring physicians thought if you were high risk, surgery was a bad idea. 
So we did an in-depth analysis to try to estimate how high would your risk have to be for dying during the surgery or shortly after before the benefits of surgery would be washed out by this extra risk. And so virtually everyone thought that if you had a risk of dying in the operating room of 25 percent, that you were a really bad candidate for surgery. And a lot of surgeons wouldn’t take a patient like that. Because if the patient did die, it would look bad for the surgeon’s operative mortality statistics. Later, much later, people were giving surgeons back mortality statistics adjusted for how sick their patients were. So it became less of an issue. But in the early days, 25 percent or higher looked really terrible. But we were able to show, with all the really wonderful data from the databank, that we could not measure a point beyond which the benefits of surgery did not outweigh the risk. So we even estimated that somebody had, say, a 70 percent chance of dying from surgery, that their chances on medical therapy were even worse than that. And it also depends on what is your time horizon. Because your medical chances in the first thirty days are going to be better than if you were treated surgically. But much after that, the medical survival for these very high-risk patients was very, very poor. So we thought that there was not a real break point—this is not a good surgical candidate. But really the physicians had to think hard about what was their time perspective. There’s a famous cardiac surgeon that I got to work with, John Kirklin, and the way he would handle it is he would ask, “What do you want to be alive to see?” And if the patient says, My son is graduating in five months, he would say, Maybe you want to hold off on this surgery. And if the patient said, Well, my daughter is just starting college, and I would like to see her graduate in four years, and he would say, Well, maybe this surgery is the thing for you. 
So he had a wonderful way of talking to the patients.
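
[Illustration added to the transcript; not part of the interview. The crossing survival curves described above can be sketched with a deliberately crude model: medical therapy carries a constant yearly hazard, while surgery carries an up-front operative mortality followed by a lower constant hazard. Every number and function name here is an invented assumption, not a databank estimate.]

```python
import math


def survival_medical(t_years, hazard=0.10):
    """Probability of surviving t_years on medical therapy (constant hazard)."""
    return math.exp(-hazard * t_years)


def survival_surgery(t_years, operative_mortality=0.05, hazard=0.04):
    """Probability of surviving t_years after surgery.

    The operative risk is treated as if it were paid immediately at time 0,
    which is a simplification of the "day or so" high-risk period.
    """
    return (1.0 - operative_mortality) * math.exp(-hazard * t_years)


def crossover_time(operative_mortality=0.05, medical_hazard=0.10,
                   surgical_hazard=0.04):
    """Time at which the two curves cross.

    Solves (1 - p) * exp(-h_s * t) = exp(-h_m * t) for t, giving
    t = -ln(1 - p) / (h_m - h_s).
    """
    return -math.log(1.0 - operative_mortality) / (medical_hazard - surgical_hazard)


t_cross = crossover_time()
```

[Before the crossover time the medical curve is higher; after it the surgical curve is higher. That is why the patient's time horizon matters to the decision.]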

ROSEBERRY: Did you find those estimates to be fairly accurate?

HARRELL: Yes, we did. We did a lot of testing on the prognostic models, and we also kept refining how we developed the models to make that process more accurate. I think in the long run, the models were more useful than the physicians would make you think. Because we found that once there were a lot of financial pressures and there was maybe too much money being made by being more invasive with the patient, that the prognostic estimates got downplayed. And that was kind of a disappointment, because we thought that they would drive medical decision making. And then we did a lot of work in terms of diagnosis. So how would you diagnose a patient for a coronary disease before the invasive cardiac catheterization? And we developed some wonderful models, and David Pryor did a lot of the leadership of that. We developed great models and published some great papers on that. We thought those models would change Duke in terms of who was getting a cardiac catheterization, but it really didn’t very much. Which was kind of sad. But doctors are very busy, and sometimes treating the patient in front of them is something they focus on a great deal; stepping back and looking at the value of maybe changing the treatment decisions in groups of patients is not something that they have time to think about always. And then there’s the financial pressures. So we had a lot of models that we thought should have changed medical practice but did not.

ROSEBERRY: So the doctors maybe who were using the databank, the cardiologists—there were a few cardiologists who were very interested in the databank and then some cardiac surgeons as well, but then beyond that, maybe those were the physicians that were not as focused on this kind of work?

HARRELL: (speaking at same time) To some extent. And even the ones on the front lines at Duke, I don’t think it affected their practice maybe as much as we’d hoped. But I think in many ways the published results from the Duke databank had more of a result than they did at Duke.

ROSEBERRY: In what way is that?

HARRELL: I think we gave a lot of physicians very objective information about long-term survival and risk factors. And I think people that didn’t work at Duke would not have access to that information. So trying to summarize it in a concise way for publication—which is a real job in and of itself, and our cardiology researchers at Duke were amazing at that. Including Phil Harris, Rob Califf, Mark Hlatky, David Pryor, Galen Wagner, and many others. Summaries that were in published research papers were quite useful to people. And so I think it did have a big impact there. And we could show the practicing physicians and research physicians what they thought were strong risk factors were not as strong, and some things they didn’t really take into account were pretty strong. And we had some pretty surprising results. And I think the most surprising to me was something Kerry worked very hard on, working with Bob [Robert] Jones, because Bob was a great researcher in the area of radionuclide angiography imaging for diagnosis and prognosis. The prevailing wisdom was that if you had the luxury of measuring the physiology of patients at rest and then at exercise and you studied how exercise stressed the heart and it changed the cardiac function, then it would tell you a lot about the reserve that the patient has: what sort of coronary bloodflow is there to the major muscles of the heart? And so the wisdom was that if someone changed a lot in their left ventricular ejection fraction from rest to exercise, that that was a marker. That if that dropped, it was a marker for coronary disease. And so luckily, Bob had done about a thousand radionuclide angiographies, and so I had this great data, and Kerry—and I helped Kerry a little bit—started looking at that data and analyzing it. And we were very surprised to find that the drop from rest to exercise was almost not even correlated with having coronary disease. And the reason was, where you started at rest really didn’t matter.
The only thing that mattered was, if you put the heart under maximum stress, how well did it do? So you can forget whatever you started at, how well the heart does under no stress. Just forget it. And so what Kerry showed was that the left ventricular ejection fraction under peak exercise was amazingly predictive of things, especially prognosis. And I don’t think we’d ever seen a risk factor that strong and probably haven’t seen one since. I think Kerry and I thought that this would really change practice, because Kerry showed that the information in this left ventricular ejection fraction at peak exercise exceeded the information that great cardiologists get out of the cardiac catheterization. It had more prognostic information than that. And so when you think about what it means to put catheters up into coronary arteries and inject dye and actually see closures, it’s very powerful to know how big the closure is and where there’s coronary stenosis. But to measure this other thing that doesn’t require that degree of invasiveness, and to have it have more prognostic information about long-term cardiac events was just amazing. So we thought it would change things, and I don’t think it ever did. And I don’t think exercise ejection fraction was ever collected outside of maybe that series at Duke, was ever really collected in such a rigorous fashion to be useful in other settings. And I’ve always regretted that. And I’ve talked to Bob Temple and other people at FDA about using that kind of endpoint as an endpoint in clinical trials; I think it would be a magical endpoint. And I think people have not really taken that up. So maybe someday it will make a comeback.

ROSEBERRY: Well, why do you think some of those things didn’t transfer over into the clinical realm?

HARRELL: Well, I think inertia in medicine is a very powerful force. I mean, the worst way to say that is—there was a famous physicist, Max Planck, and he said that people don’t change, they just die, and they’re replaced by the next generation. (laughs) So I think that’s a pretty pessimistic view, and I don’t think it’s that bad, but I think he was half right. Some people get into habits, and the people that get into the worst habits that are the most unchangeable are actually statisticians more so than cardiologists. So Kerry and I felt that we would change how we’d do statistics according to what works the best. But we saw a good many statisticians that were trained in a certain way or used a certain kind of software, and they were going to keep doing that no matter what. And there is a certain fear factor of having to learn something new, which I think the cardiologists also probably felt in their practice—and then it takes work. So if you want to learn something new, like a new way to diagnose coronary disease, it takes real work to understand that. And then it takes work to change how your clinic does treadmill exercise tests or how it does some imaging. It takes real work; it takes a risk, and then there’s financial disincentives, which is the worst excuse of any. But because of our healthcare system, very often we have to help hospitals make money by doing as many high-cost invasive procedures or high-cost noninvasive procedures as possible, because the society doesn’t really provide the right care for the uninsured. So hospitals have all these pressures. And of course, there’s individual financial pressures, because some individuals who do procedures make a lot of money on each procedure they do. So that creates another kind of motive. So it’s very complicated.

ROSEBERRY: Were there examples of times when the work you were doing did change someone’s practice or did change a—?

HARRELL: I think the things that come to mind is when we got into therapeutic comparisons, that—and I can say a lot more about that—that had a pretty big impact. I think also some of the risk-factor discovery that we analyzed in detail. So Phil Harris was really into Prinzmetal’s angina and these different variants of angina that are not the typical presentation. And he showed the value of knowing some of these symptoms that are not talked about as often. And there was some work that Marty Connelly did with regard to what is the prognostic impact of having a heart attack during coronary bypass surgery? So I think it was around 8 percent of patients were diagnosed as having a myocardial infarction during bypass surgery, which used to be more of a stressor on the patient than the way it’s done now. And then he showed that if you do have such a heart attack, that your later risk of dying is higher, as you might suspect. But he actually quantified that, and once you have a number to put on it, I think it gets people’s attention more. In terms of the therapeutic comparisons, in the early days, many of us, especially Bob Rosati, thought that coronary bypass surgery was being overhyped. And in some ways it was in the early days. We found that the majority of patients were not getting survival benefit of coronary bypass surgery versus medical therapy. Now, this turned out to be a time-dependent phenomenon. It turns out to be dependent on how long you follow patients. And in the early days, we didn’t have the very long follow-up that’s available now under Bernie McCants’s leadership. But it also depends on the evolution of the surgical technique. So when bypass surgery was just getting to be a very big phenomenon, which was probably in the late seventies, mid-seventies maybe, the surgeons hadn’t refined the technique. And they hadn’t refined exactly which blood vessels to harvest to graft in in place of the occluded area. That evolved quite a bit.
And Scott Rankin, who’s actually now here in Nashville, and I’m working with him again, when he was at Duke, he did a lot of this kind of research along with Doc [Lawrence] Muhlbaier and Dick Smith, and they learned over time how to do the bypass surgery better and better so that as we started to analyze the data in later years, bypass surgery seemed to be more effective in more patients. And we could quantify that very well because of the long-term follow up. And we also knew details about how the surgery was done, what kind of blood vessels were used during the operation and so on. And so it looked like bypass surgery was more helpful than we thought, in larger numbers of patients, especially those that had more disease like left main disease and bad three-vessel disease. And then angioplasty got big, and we started doing three-way comparisons. And there weren’t many groups in the world who could do three-way comparisons to look for survival benefits long term for the less invasive approach of the—this was balloon angioplasty at the time—and invasive bypass surgery versus medical therapy. I think when we started doing these three-way comparisons that I think that Rob Califf and probably Dan Mark were spearheading, it was really putting the three therapies in perspective and trying to find out which patients should maybe be treated by which one. But it was just very good, objective data. And it didn’t always tell you exactly how to practice cardiology, because you have these curves crossing, and it remained very important, What do you mean by your time perspective as a patient? By what period do you want the benefit to be there versus an early harm by getting a certain therapy? But I think we learned a lot about the three therapies. And we also did I think the most comprehensive job ever undertaken in therapeutic research in cardiology to really understand who was getting selected for which therapy. 
Because those were not randomized studies, and the selection was primarily driven by the cardiologist with some input from the patient. The patient mainly trusts the cardiologist, with good reason. The patient may give a little input, like if they’re afraid of surgery or whatever. But it’s driven mainly by what the doctor knew about the patient and the particular risk factors that the patient had as well as the coronary anatomy of the patient. And the disease in the coronary arteries. And so that means that when you analyze the data and you look at people that happen to get bypass surgery versus those that happen to get angioplasty versus those that remained on medical therapy, it gets to be very tricky, because that’s treatment by indication. That means that your allocation to the different treatments is not random. And if you just compare the treatments and look at the outcomes, that will be biased. So you have to get into a very complex adjustment for who gets which treatment. You have to really understand that. The databank measured so many variables that we could do extremely comprehensive analyses of who was selected for which treatment. And the selections generally make sense. Somebody that had more severe coronary disease, they tended to have more chance of getting bypass surgery. Somebody with one-vessel disease would get angioplasty, or a lot of them would get medical therapy. And then patients who had worse ventricles—if part of the ventricle was essentially dead; it wasn’t pumping, so the left ventricular ejection fraction at rest was low—the patients were more likely to get medical therapy. So you had to adjust for all of these reasons why the patients got the treatments. And that takes a lot of statistical analysis and a lot of great data. And I think we did a really great job on that. And I’m really proud of that paper about the spectrum of coronary disease and treatments. I think Rob Califf was the first author on that paper. So that was wonderful.
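
[Illustration added to the transcript; not part of the interview. The bias from "treatment by indication" can be seen in a toy example with one adjustment variable, disease severity. The counts below are invented so that sicker patients are more likely to get surgery. The adjustment shown is simple direct standardization over two strata, far cruder than the multivariable modeling described above. All names are hypothetical.]

```python
# (severity stratum, treatment) -> (patients, deaths); all counts are invented.
counts = {
    ("mild", "surgery"): (100, 5),
    ("mild", "medical"): (400, 40),
    ("severe", "surgery"): (400, 100),
    ("severe", "medical"): (100, 40),
}


def crude_mortality(treatment):
    """Deaths divided by patients, ignoring who was selected for treatment."""
    rows = [v for (_, t), v in counts.items() if t == treatment]
    return sum(d for _, d in rows) / sum(p for p, _ in rows)


def standardized_mortality(treatment):
    """Mortality if the whole cohort had the same severity mix.

    Each stratum's death rate under this treatment is weighted by that
    stratum's share of all patients, regardless of treatment received.
    """
    total = sum(p for p, _ in counts.values())
    rate = 0.0
    for stratum in {s for s, _ in counts}:
        stratum_size = sum(p for (s, _), (p, _) in counts.items() if s == stratum)
        patients, deaths = counts[(stratum, treatment)]
        rate += (stratum_size / total) * (deaths / patients)
    return rate


crude = {t: crude_mortality(t) for t in ("surgery", "medical")}
adjusted = {t: standardized_mortality(t) for t in ("surgery", "medical")}
```

[In these invented counts the crude comparison makes surgery look worse, only because surgical patients were sicker to begin with. Within each severity stratum, and after standardization, surgery looks better.]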

ROSEBERRY: So you said not many places were able to do that three-way analysis. Were you able to do it because you had such good data and because you had kind of developed the techniques to figure that out?

HARRELL: Yes, so when a new therapy came along, like angioplasty, we could put the same sort of data collection in play with a few new variables. And then the patients got into the same follow-up system regardless of their treatment. So that means you start collecting this wonderful data. And then because Duke had great cardiac surgeons and great interventional cardiologists, that means that there was a lot of bypass surgery being done and a lot of PTCA [percutaneous transluminal coronary angioplasty] or angioplasty done. So there were a lot of patients getting all the treatments so that you could study them.

ROSEBERRY: So how was this data being transferred? I mean, a doctor would fill out a form, give it to you or give it to someone, they would enter it into the databank? You would kind of use software to come up with these models. Is that kind of the flow of—?

HARRELL: Yeah, ideally the cardiologist would enter the data directly into the computer.

ROSEBERRY: Oh, okay.

HARRELL: I don’t think that worked in every case, and some of them probably filled out paper forms and had someone then enter it into the computer. But the computer system was very advanced, and we had some great people leading it: Paul Elliot and Frank Starmer. Frank Starmer’s son is here in bioinformatics at Vanderbilt, and I got to meet him a few months ago. But they developed a system that would code the data in a way that was not only useful clinically, but it was also useful for research purposes. A lot of data are collected all over the world, and it’s not research quality data like we had at Duke back in the ancient days. So it was really ahead of its time. Now, it was still controversial, because many physicians wanted to dictate their notes and just have somebody transcribe them. And then you’re just looking at a whole bunch of words and sentences, and to pull out data to analyze is a nightmare. So this system that we had also took the data that were entered and it output textual reports that resembled reports that were dictated. Not exactly but very close. And some of the physicians were not ready for that, and there was one in particular—who will go unnamed—who said, “These reports that you’re giving back to me are terrible. They have terrible sentence structure, they have commas in the wrong place. I would just never do a report like this.” So unbeknownst to this cardiologist, somebody went to one of his dictated notes and just typed it into the computer so it would print out exactly in the same font, and it would be, you know, the same kind of page as was automatically generated. And gave it to him to look at and sort of implied to him this was all computer generated, and he said, “This is bad. I just can’t use a report like this.” (laughs) So when you’re moving people into the computer age, there are a lot of old habits that you have to help with, and some of them are a little bit irrational. 
That’s one of the great things about Gene Stead, because he would never, ever say that something should be done a certain way because that’s the way it’s been done. He would always be breaking ground, and he would want something done a better way. And if it had to be radically redone, he didn’t care. He just wanted it to be better. Visionary people like Stead are not tied to tradition. I mean, he’s tied to a tradition of excellence and character, and that would always count. But in terms of the technology and the way that he would deal with medicine, he didn’t have this great hangup like some doctors do of being tied to the status quo. And the mentorship that he provided was really key. So when the databank first started—and it had been in operation a few years before I got there in 1978—it was still fairly small. And you had Kerry Lee, Linda Shaw, and Bernie McCants then; and I can’t believe how long they worked at the (laughing) databank. But it was still a pretty small group. And so we did not know what we were part of. We just thought it was a fun place to work. We had no idea. Maybe Kerry did, maybe Bob Rosati did, but I don’t really think we had any idea what we were part of. But we should have had a clue. And the clue was that Gene Stead was the mentor. And if you ever step back and think about it a little bit, just about anything that he had a hand in had a chance of magic happening. So we should have realized then that this is really going to have an impact, and it’s really going to be big. But we had no idea how big. Maybe it was good we didn’t. Maybe we would have gotten big headed and not tried as hard as we did.

ROSEBERRY: So how big did it become?

HARRELL: Well, it got—I think when I started working it was about six or eight people. And I know the Duke Clinical Research Institute now is something like eight hundred people. And so that’s something I never would have believed. And there’s a little bit of a story to that, because—I don’t remember what year this was, but I’m guessing it was about 1989 or ’90, and Rob Califf thought that a new direction should be looked at and that was clinical trials. I had, when I was in graduate school at UNC, worked on a clinical trials coordinating center, and that was the most boring place I’ve ever worked; it was just horrible. And it was under really bad leadership. And of course the coordinating center had to worry with a lot of details, because we were getting data from clinics around the country, and when you’re dealing with remote hospitals and clinics, you have more data issues than if you’re just dealing with Duke data like we did. So it wasn’t a fun place to work, and that was my taste of clinical trials. And then—so I didn’t think it was that interesting, and I thought the observational research was everything, because I thought it was fascinating, fun, and challenging. And because you had this biased treatment selection, it’s not just a clean comparison of groups like surgery and medicine. It’s very complicated to get an unbiased estimate. But Rob kept talking about clinical trials, and he kept pushing it. And I said to Rob, “You don’t want to go there. This is a mistake. This is just boring. And you’re going to be bored to death.” And I really pooh-poohed the whole idea. (laughing) And that was about the dumbest thing I ever said in my whole life. I still can’t believe I told Rob that.

ROSEBERRY: He didn’t listen, did he?

HARRELL: Luckily he didn’t listen. Because it just mushroomed. Because he found out how to get things done and to use a lot of talent that was already there and to expand on that and start hiring more good people, and then he had all these trials with industry. And he found ways to do trials extremely effectively and to recruit patients. And that’s also where Galen Wagner came in. Because he created this group called DUCCS [Duke University Cooperative Cardiology Studies]. One of the real important aspects of being at Duke was the cardiovascular program at Duke was one of the largest in the world. And I think at one time they had maybe fifty fellows there at any one year. So they had all these great young cardiologists coming up who were starting to do great research; we worked with a lot of the fellows. And after they left Duke, Galen and I think with probably some help from Gene Stead and I don’t know who else, started forming kind of a club of all of the former fellows. And I remember going to a big—it was either an American College of Cardiology or American Heart Association meeting—and this was probably around 1996. They had a meeting of this DUCCS group and all these former fellows that are out in practice around the country and in academic research. There were 400 people in the room if I remember correctly; I was absolutely blown away. So Galen and others got the idea of networking these former fellows that had this amazing training at Duke. And they turned this—he and Rob turned this into a clinical trial patient accrual machine that I don’t think has ever been seen in cardiovascular research. And so when Rob started working with pharmaceutical companies, he had this network behind him of patient accrual and excellent treatment and excellent data gathering. And that was a smash success, just a smash. Clinical trials could be launched and patients could start to be accrued much faster. This I think got the pharmaceutical companies interested in a big way.

ROSEBERRY: Well, how does the idea of clinical trials change things, as far as looking at the data?

HARRELL: Well, it has some impact. And not all of it’s good, actually. Because in a clinical trial, you can do an analysis of the data by a simple comparison of the patient groups, the treatment groups. And you don’t really need to take into account risk factors, and all the patients are randomized. The groups are balanced. You’ll find the ages of the patients getting treatment A are about the same as the ages of the patients getting treatment B on the average. They have the same sex distribution, they have the same left ventricular ejection fraction distribution. Because the randomization is done by a random device, like flipping a coin. So you can get an answer by comparing the groups, and you can do a trivial analysis. But with that analysis, the answer is really trivial, because it’ll mainly tell you, is there some kind of difference between the outcome for patient group A and B? But you don’t really understand how that difference got to be there, and you’ll find the differences will vary according to how sick the patients are. You’ll find that sometimes the treatment may be harmful for a group of patients. So there may be an interaction between the type of patient being treated and the treatment. So the benefit of the treatment gets higher the older the patient is. Or it may get higher for a certain type of heart attack. So there’s so much more that you can get out of the data, and some people involved in clinical trials rush the process. There are groups, especially in England, that believe that clinical trials should be presented in a trivial way so that any idiot can understand the results. And I think this has really hurt clinical trials. Because I think what it means is you spend a lot of money to do the trial, often tens of millions of dollars, and the amount of information the trial provides is very low when you look at the published papers. And you have sort of a crude answer. 
But you don’t understand what happens to patients who have different etiology or different risk factors, and you also don’t understand what happens to, say, short-term risk versus long-term risk. This emerged in a huge way in the VIOXX controversy. There was a lot of confusing data that was misanalyzed about a lack of early harm due to VIOXX, and then maybe the harm increased over time. Then if you analyzed the data properly, the harm should’ve been evident even in the early period. So, often clinical trials are analyzed in an oversimplified way. And the way I like to describe that is you look at a budget for a clinical trial: it might be twenty million dollars. If you look at the budget for the statistician part of the trial, and it might be two percent of the whole trial, or I don’t know what percent, but something very low. But if you look at the harm that a bad analysis can do to good data, it’s actually extreme. Or a bad analysis can fail to unlock information in data. And so I’ve always thought that the budget that’s put on the biostat end of things—and I’m a little biased, being a biostatistician—is a little bit too low, and it should have been enough to allow for the analysis to be in depth. So what happens to a lot of clinical trials is the clinical trial gets done, the crude things are summarized, there’s maybe another couple of papers that go into a little bit more detail, but then there’s another exciting clinical trial that comes along, and the statisticians have to drop what they’re doing and go to the next trial so they don’t get behind on that one. The public and the FDA and the industry and the research cardiologists and physicians in general have a thirst for the next big thing. And I think there’s a real harm in not getting enough information out of the last big thing. So clinical trials tend to make many people oversimplify how they think about data. 
And the other thing that happens—and this was in the GUSTO I study—the GUSTO I study was great quality data on 40,000 patients. Huge study. Is a gold mine for statistical analysis. The Duke Clinical Research Institute has made that data available to statisticians to really dig in and do some really wonderful work. And one statistician that came from—who’s actually a clinical epidemiologist in the Netherlands, Ewout Steyerberg, came and spent time at Duke, and he’s spent time with me since then when I was at University of Virginia and did some amazing analyses, and he showed that if you do a real in-depth analysis of clinical data like GUSTO I and you do an analysis where you don’t just compare the groups but you adjust for the risk factors that patients have, you actually gain a lot of sensitivity, a lot more power.

ROSEBERRY: (speaking at same time) Is that within the groups?

HARRELL: Yeah. You have risk factors measured within the groups. The risk factors are perfectly balanced. So if you look at, say, What’s the proportion of patients on tPA therapy who had an anterior myocardial infarction versus those on streptokinase, the proportions are virtually identical. But the patients that have the different kinds of heart attacks, they have different outcomes, so they have different thirty-day mortality. In GUSTO I, thirty-day mortality was the endpoint. And so if you take into account that different patients have different outcomes, which you could call outcome heterogeneity, you actually get a much better analysis. So what Ewout showed is that with the GUSTO I database, if you did a real statistician’s kind of analysis—which was called covariate adjustment, or analysis of covariance—you could have gotten the same information with 30,000 patients, I think was the number he came up with. So you could have had a trial that was 10,000 patients smaller and get the same information as if you just compared the groups in a crude fashion, which was the primary analysis reported in the New England Journal of Medicine. So by taking into account these patient differences—these are not differences in treatment, or different risk factors in treatment groups, because they’re all balanced in randomization, but it’s the fact that older patients have a higher mortality. And then there’s effects of blood pressure and other risk factors. That gives you a much more refined analysis. And all it takes is more time for the statistician. It doesn’t take anybody else any time. But for several more person weeks of time, you could do a really better analysis, and the percentage of the budget that that would entail is tiny. So that’s the one reservation that I have about how most people think about clinical trials. I know Rob Califf does not think about them that way, and I know Kerry doesn’t. But I know a lot of people do.
But when you come up to clash with the people in England that believe in the large, simple trials, it can be not very nice. Large, simple trials are actually a great idea, but what the British statisticians mean is large simple trials analyzed simply or trivially. And once you collect the data, there’s no reason to throw away data. That was something we always believed in in the databank, that most of the data you collect, when it’s driven by experts—so you get cardiologists to define what data to collect—most of that data’s really useful. Some of it overlaps, and some of it’s redundant, but almost none of it is useless. And you get a lot of information from analyzing that data.
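
[Editor's illustration, not part of the interview: the gain from covariate adjustment in a randomized trial can be sketched by simulation. This is a simplified stand-in, not the GUSTO I analysis. GUSTO I had a binary endpoint (thirty-day mortality), for which the adjusted analysis would be a logistic model; the sketch below swaps in a continuous outcome and a linear analysis-of-covariance estimator so that it runs with the standard library alone. The effect size, the single baseline covariate, and the function names are all assumptions. Both estimators are unbiased because of randomization, but the adjusted one varies much less from trial to trial, which is what lets a smaller trial carry the same information.]

```python
import random
import statistics


def one_trial(rng, n=200, effect=0.5, beta=1.0):
    """Simulate one two-arm randomized trial; return (unadjusted, adjusted) estimates."""
    arms = [0, 1] * (n // 2)
    rng.shuffle(arms)                              # random treatment allocation
    xs = {0: [], 1: []}                            # baseline risk factor, by arm
    ys = {0: [], 1: []}                            # outcome, by arm
    for arm in arms:
        x = rng.gauss(0, 1)
        y = effect * arm + beta * x + rng.gauss(0, 1)
        xs[arm].append(x)
        ys[arm].append(y)
    mx = {a: statistics.fmean(xs[a]) for a in (0, 1)}
    my = {a: statistics.fmean(ys[a]) for a in (0, 1)}
    unadjusted = my[1] - my[0]                     # crude comparison of group means
    # Pooled within-arm slope of outcome on the baseline covariate.
    sxy = sum((x - mx[a]) * (y - my[a]) for a in (0, 1) for x, y in zip(xs[a], ys[a]))
    sxx = sum((x - mx[a]) ** 2 for a in (0, 1) for x in xs[a])
    slope = sxy / sxx
    # ANCOVA estimate: correct the crude difference for chance covariate imbalance.
    adjusted = unadjusted - slope * (mx[1] - mx[0])
    return unadjusted, adjusted


def compare(n_trials=400, seed=1):
    """Repeat the trial many times and summarize both estimators."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_trials)]
    unadj = [u for u, _ in results]
    adj = [a for _, a in results]
    return {
        "mean_unadjusted": statistics.fmean(unadj),
        "mean_adjusted": statistics.fmean(adj),
        "var_unadjusted": statistics.variance(unadj),
        "var_adjusted": statistics.variance(adj),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

[In this toy setup the covariate explains about half of the outcome variance, so the adjusted estimator needs roughly half as many patients for the same precision. The 30,000-versus-40,000 figure quoted in the interview comes from the real data and is not reproduced here.]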

ROSEBERRY: So at the DCRI they do give that time to the statisticians to analyze the data properly?

HARRELL: I’m not sure. And this would be a question for the current group of statisticians. I know Kerry does a lot of clinical trials that he’s really in charge of the data-coordinating center. And his area of expertise in later years is really cardiac arrhythmia treatment. So he has a lot of say in how those data should be analyzed. I think a lot of that is National Institutes of Health-sponsored studies. I think a lot of the pharmaceutical studies, we have a tendency to let the pharmaceutical company dictate the deadlines of how the data are analyzed and reported. And I think that can be a little hazardous. Then there’s so many trials going on that you tend to run from one to another. So I would like to know from the current group that’s involved, especially in the industry-sponsored trials, what are they thinking about this pressure to produce and to meet deadlines and how much time they have for doing these more in-depth analyses?

ROSEBERRY: So did you leave Duke about that time, about the time when that shift began to happen?

HARRELL: Yeah. I left when the pressures were getting a little higher and Duke was becoming a little bit more like a business in terms of the Clinical Research Institute. And I had such great memories of the way we were back in the old days when there was time to argue. There was time to have two-hour discussions about, How should we do this? And that was the greatest thing in the world. And the two-hour discussions maybe got shortened a little bit, because of pressures and clinical trials, but the thing that I left that was really hard to leave was the people, because the people were the same. And some of the greatest people I’ve ever known and ever worked with. But there were a lot of pressures, and the institute was getting a little more legalistic. And they were talking to lawyers, and once you talk to lawyers, things tend to get complicated. (chuckles)

ROSEBERRY: So who were in some of those discussions, those arguments? Who was discussing what direction the databank would be—?

HARRELL: In terms of making the big decisions? Well, Rob was leading the effort, and he was talking to everyone and getting input, exploring various options in the usual way he does and really think it through. So I think a lot of people were involved in that, but Rob was definitely the leader. And then Dan Mark would be a good person to ask that question to, to get his recollection of how all that unfolded. And Kerry also. But I remember there were a lot of discussions. And I wish I had not been such a naysayer at that time, because I missed that call by (laughs) a mile.

ROSEBERRY: Well, those long arguments that you would have in the early days, what was—who was involved in those?

HARRELL: Well, this was beautiful. One of the most fun people to argue with in my whole life has been Phil Harris, who is in Australia and was a cardiology fellow at Duke and just a powerhouse of a thinker. And a high producer in terms of research publications. He just has the personality that you feel that when you have a very heart-felt argument with him, it was always at the highest level. And so there was never any sort of character assassination or anything like that. And he was just a joy to talk with. And we would, I think—if you ask Phil, we might have even created arguments when we didn’t need to. We probably argued about things we actually agreed upon just because we loved arguing with each other so much. It was just fun. It was just total fun. And of course, Kerry was involved in those, and then David Pryor was involved in a lot of those, and so David would have very strong opinions about certain aspects of data and analysis. And David had more of an understanding of statistics than most cardiologists. So that would make him actually more interested in the problem. Sometimes the statisticians would say, I wish you were a little less interested (laughing) in that! But he brought in a tremendous amount of good thinking. And he challenged us and in many cases got us to think about things in a different way. Probably more so in diagnostic cardiology research than in any other one area. So we had some great arguments, and arguments with David got louder than arguments with other people. But they were great. And with Kerry Lee around, things never got out of hand, because he was always the force for calm and rationality.

ROSEBERRY: Were some of the arguments kind of statisticians versus clinicians? Was it kind of, See my point of view? See—?

HARRELL: I don’t really think it was. I think it was the fact that we had really interesting data, and there were totally different statistical approaches for analyzing the data, and the different approaches had some clinical ramifications. So the different approaches might have been—one of the approaches might have been easier to understand by a cardiologist in practice than another, and so we would have to have the argument about, What did we sacrifice to make this easier to understand analysis appear in this paper? Sometimes there were arguments just about data and how to use data in, say, prognosis. And should you use the New York Heart Association class for angina or congestive heart failure, should you use it how it stands, should you modify it a little bit, or should we group some of the categories? So this is getting down to the nitty gritty of what an item of information means, how do you translate that meaning into how it appears in a statistical model? So it’s kind of a geek argument that statisticians love to have, and most people outside of our group, say, outside of David Pryor and Kerry Lee and Phil, Rob Califf, Mark Hlatky, Dan Mark, and the others, they would not find it as interesting as we did. So many of those arguments, when you read the paper that came out of it, you wouldn’t know how much arguing (laughing) actually went on.

ROSEBERRY: Well, I want to go back and follow up a little bit about talking about Dr. Stead. And I know he retired while—kind of while the databank was still progressing—

HARRELL: Yeah.

ROSEBERRY: —while it was being used in its observational phase. Was his input still there, was his leadership still felt even after his retirement?

HARRELL: It really was. He never stopped thinking. He was coming up with new ideas way into his nineties, and challenging us to think outside the box. And so sometimes he might come up with an idea and it sounds really weird, and it might not actually be feasible even with current technology. But he might give us an idea of a better way related to that. So—but I think somebody like Stead, even when he’s not giving you ideas, you see a fire in his eyes. And that fire, like a tiger, some of it kind of rubs off. And I can’t really talk about it very intelligently, but you meet somebody like that, and they tend to have an impact on you even when you don’t know how. Maybe they elevate your energy level.

ROSEBERRY: How was the databank supported by other department chairs?

HARRELL: That’s a good question, because my interactions were mainly with the chiefs of the Division of Cardiology and the chair of Medicine. Although there was a very strange thing at Duke in statisticians in that, even though we worked full time in Cardiology, we were actually in, say, Community and Family Medicine or preventive medicine or family medicine—there were different titles for that department. So our academic base was in a department we didn’t interact with that much. And that was because of an old problem at Duke that existed for maybe twenty years. There was a tradition in the Department of Medicine where all this research was going on that they would never appoint a non-MD to its faculty. And so the PhDs got put out in Family Medicine, that didn’t do that much research. And so it made us—our home base there was very strange, very bizarre. And that’s one of the reasons I ended up leaving Duke. Because that was always an odd design. And so that idea of not having PhDs in a department I think had some unintended consequences for many, many years. That was a little awkward. But we tried to keep the statisticians in different groups. Because the other big group was cancer, where there were a lot of great biostatisticians, and then there were other statisticians in other areas. We tried to keep people talking to each other, but I think we had such a critical mass of great statisticians in the cardiovascular end that we also kept to ourselves just a little bit. But not having a strong department of biostatistics at that time at Duke was really—really wasn’t too good. We really should’ve had a department a long, long time ago.

ROSEBERRY: Did Dr. Stead ever push for that kind of—?

HARRELL: You know, I can’t remember. That’d be a good question to ask Kerry. He would probably know. I can’t remember if he pushed for that. I think what he saw was probably a team of researchers that were working together so well, he probably didn’t care what the actual format or what the actual design of the department setup was. He crossed so many boundaries, so probably to him it didn’t matter. But I’d like to know that from more of an insider—?

ROSEBERRY: So did Dr. Wyngaarden or Dr. Greenfield support the databank?

HARRELL: Yes. Yes. Very much. And of course Joe Greenfield, we had a lot of interactions with him. And he was really supportive, because he was really an ECG analysis and interpretation expert. So he liked data. A great basic researcher, too. But I think he created a good environment for people to prosper, and I think he ran a good ship there. So yeah, I had a lot of good interactions with the leadership. And then there was such a strong cardiovascular research environment, including—Kerry and I would very often go to the research conferences that cardiology put on. And of course we would assist some of the researchers in analyzing data leading to their presentations. But we got a lot out of listening to the presentations and learning about all the areas of research in cardiology from basic up to clinical. So it was just such an amazingly strong division that that was just a great group to be associated with. Clinicians were just wonderful, and their research was first-rate.

ROSEBERRY: So it was primarily used for research? There was a strong research element that the databank was involved in?

HARRELL: Yeah. Of course, Cardiology was doing all kind of animal research, cellular level research. It went the whole spectrum. But we had more involvement with the clinical researchers, we also tried to help some of the basic researchers. But on a day-to-day basis, we worked with clinical researchers a great deal. The ones that were actually working almost fulltime in the databank were the ones that we worked with the most. But everyone—not everyone, but a lot of the clinicians were involved in cardiac catheterization, angioplasty, and later stents. And so we would see a lot of them. And they would have a lot of good input. And of course they were responsible for the quality of the data, because the data really originated with them. And this idea of putting out reports that were computer generated was—Gene Stead pushed to a great degree. That was really amazing because a lot of people had tried to collect data for research purposes for clinical practice. But they made it more of an afterthought where the data were collected maybe in a separate phase. And then they either found they could not sustain that data collection because they couldn’t keep hiring extra people to be responsible for data collection, or the quality wasn’t so good. But the Duke system was to integrate that data collection at the most clinical level of patient care. And so you look at the quality of the data that came out of that, it was just really excellent. And it was a model that could be sustained for decades.

ROSEBERRY: How was it sustained fundingwise?

HARRELL: Um, I think the data collection, since it ended up producing clinical reports, actually got sustained from clinical practice. And I think there was even some insurance company reimbursement for parts of that effort, for report production. In terms of the follow-up, the follow-up was more for research purposes, and I think there had to be more original thinking to sustain the follow-up engine that Bernie led. And so I think there was some creative grant writing. And I know Gene Stead was at the forefront of creative grant writing. He would get a grant to do some research, say, in acute myocardial infarction, and he would say, You know, we really need to do more research in chronic coronary artery disease. Let’s twist a little bit of this grant money and start creating a chronic CAD research tool. And he in some ways subverted some of the original grant funding. And it turned out to be a great (laughing) great idea. And in those days, I don’t think NIH [National Institutes of Health] was tracking where the grant money went quite as much as they do now.

ROSEBERRY: Well, I know that Bill Stead [son of Eugene Stead] is here at Vanderbilt, is that right?

HARRELL: This has been amazing, because now I work with Bill. And I worked with him some when I was at Duke. So he’s a major force in informatics, and he’s put systems together for hospital and clinical informatics that’s really famous worldwide. And Vanderbilt is actually exporting that system to other hospitals. And so he has gotten to be really famous for collecting data, a little like what the Duke databank collected but on a wider scale, you know. All diseases, all clinics, all inpatients. And plus a really first-rate physician order entry system. And then the only part of it I’ve really been involved in has been analyzing the laboratory data, so there’s quite a huge databank available for clinical laboratory data for patients who’ve been at Vanderbilt. Now that’s going to be extended, and we think it’s going to be a real gold mine because now the vast majority of patients coming to Vanderbilt, they have excess blood that’s not used when they get blood tests. And unless they opt out, that blood is going to be put in deep freeze for later DNA analysis where the DNA can be correlated with the clinical data that’s in the informatics system. So that’s going to take it to the whole other level. One irony in all of this is that the targeted data that was collected in cardiology in the Duke databank, the quality of that data far exceeds anything that you can get in a full clinical practice system like Vanderbilt has. So if you really want to analyze risk factors, physical exam, patient history, signs and symptoms, the data that we had there—because the data was redefined for the cardiologists in a coded fashion—the quality of that data for research is unbelievable. And when you try to do that on a bigger scale like Vanderbilt does, you can get some great data in pieces, but if you wanted to go and try to do what we did with the databank, you wouldn’t find that data available unless you did a very complex textual analysis of the clinical notes.
So with enough sort of artificial intelligence you can surmise from the wording of the notes that this patient was a smoker. But you have to do a great deal of analysis to make sure that the physician didn’t say in the note, This patient used to be a smoker. Or, At no time in his life was this patient a smoker. So you have to make sure you interpret smoker as a positive only when it’s positive and not negative. So the quality—it’s hard to put into practice what Duke did. But Duke concentrated on cardiology in those days and so could really get the data collection fine tuned and ultimately agreeable by the cardiologists who were generating the data.
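
[Editor's illustration, not part of the interview: the difficulty of reading "smoker" correctly from free text can be sketched with a few hand-written rules. This is a deliberately naive example and not how any system at Duke or Vanderbilt worked. The cue lists and the function name `smoking_status` are hypothetical, and real clinical text processing needs far more than keyword matching. It only shows why a bare search for the word "smoker" is not enough.]

```python
import re

# Hypothetical cue lists; a real system would need many more patterns.
SMOKE = re.compile(r"smok", re.IGNORECASE)
NEVER = re.compile(r"\b(?:never|at no time|non-?smoker|denies)\b", re.IGNORECASE)
FORMER = re.compile(r"\b(?:used to|former(?:ly)?|ex-?smoker|quit|stopped)\b", re.IGNORECASE)


def smoking_status(note: str) -> str:
    """Classify a note as 'never', 'former', 'current', or 'unknown'.

    Only the first sentence that mentions smoking is considered, and
    negation or past-tense cues must appear in that same sentence.
    """
    for sentence in re.split(r"[.;!?]+", note):
        if not SMOKE.search(sentence):
            continue                      # sentence says nothing about smoking
        if NEVER.search(sentence):
            return "never"                # negated mention
        if FORMER.search(sentence):
            return "former"               # past habit, not a current positive
        return "current"                  # plain, unqualified mention
    return "unknown"


if __name__ == "__main__":
    for text in (
        "The patient is a smoker.",
        "This patient used to be a smoker.",
        "At no time in his life was this patient a smoker.",
        "Chest pain wakes him at night.",
    ):
        print(smoking_status(text), "<-", text)
```

[A coded entry form, as used in the databank, avoids this problem entirely because the physician records the status directly instead of leaving it to be inferred from wording.]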

ROSEBERRY: So what kind of variables were being collected at Duke?

HARRELL: Well, it was a huge amount, but like, What were the characteristics of a chest pain that the patients had? And of course, Was it a new-onset chest pain? Did it wake up the patient at night? That’s very important: if your chest pain’s bad enough to wake you up at night when you have no stress, and it’s severe enough to wake you from sleep. Other characteristics of the chest pain, and then there were risk factors like smoking and cholesterol. Blood pressure data, and then vital signs like height and weight. It’s amazing how many medical centers who want to do research, and they don’t collect the heights of the patients. Or the weights. So those are very important pieces of data, but like at Vanderbilt, you can’t really get that. So there were a lot of different variables that we used in the different areas such as physical exam, such as the cardiac physical exam and heart sounds. What heart sounds are present? And we found that the heart sound data, like, Did the patient have a third heart sound? Which is something that’s a symptom of congestive heart failure. We found that that’s a pretty strong prognostic factor. So we did a lot of research into these findings from the cardiac exam. And people criticized that things like third heart sound are not measured reliably. And we showed that they’re not measured reliably, but they’re still useful. They still have prognostic information.

ROSEBERRY: So did those variables change as people’s research interests changed or grew?

HARRELL: Yeah. New variables were added. I don’t know the state of old variables being dropped except the reduction in the right heart catheterization. The biggest example of that is when Bob [Robert] Jones started doing the radionuclide angiography and added a whole new source of data. And that gave us some amazing data to analyze. I think that was more of a fixed series, and I don’t know if that went on in perpetuity like the other data did.

ROSEBERRY: Well, tell me about Dr. Rosati, who was in charge of the databank?

HARRELL: Yes, for many years. And so when Kerry got me the job there, Bob was in charge. He also helped bring me on board there. He was wonderful to work with and a real character. And we got into great discussions with him. He liked to argue about these things just like the other guys. And so we had wonderful discussions. He was one who was really a real medical cardiologist, and he wanted to see a lot of evidence that invasive therapies were beneficial. So he was sort of our conservative overseer. He thought that bypass surgery was being overhyped, and he was right for the longest time. He kept us honest in that way. Until the surgical techniques were really refined, and then it reversed a little bit. Yeah, but he was great. So Kerry and he and I and Phil Harris and Marty Connelly and David Pryor, Mark Hlatky, Dan Mark, we would, at one time or another just get into great discussion.

ROSEBERRY: Well, I know the Duke model was replicated in a lot of other places. I wonder if you can talk a little bit about that.

HARRELL: A lot of places such as Harvard were very envious of what Duke had done. We had a cooperative program with the Harvard databank. I think Harvard also tried to copy Duke in terms of the clinical research institute, and I think there’s a Harvard Clinical Research Institute—I don’t really know much about it. I have the feeling Harvard is still pretty envious of Duke. I don’t think they achieved what Duke did. They didn’t have the people that were quite as much a moving force as what Duke had. But I think they did some really nice things, and we had fun working with the Harvard group, and the two groups could check what the other group had found. I think we did a lot of cross work, especially in regard to diagnostic cardiology, treadmill exercise testing to check each other’s work. And they had some great people there, and one of them’s Lee Goldman, who’s become quite an accomplished person, who’s I think at University of California, San Francisco now. But Lee came here to Vanderbilt a couple of years ago to give an amazing talk about the impact of cardiology in cardiovascular treatments on public health. And it was great to see him again; I got introduced through this cooperative Harvard-Duke databank.

ROSEBERRY: Were there any other replications or cooperations?

HARRELL: I think there are, but right now I can’t remember them.

ROSEBERRY: Sure. Well, I know maybe the fellows who came through took the model and carried it forward, I think.

HARRELL: Right. It’s hard to replicate some of it because the people were so magical. And there aren’t many Kerry Lees out there. And I don’t know of anybody who could manage a follow-up system like Bernie McCants. And then Linda Shaw worked in so many research projects, and she did a lot of the programming on the computerized record system that was developed for the databank. And is still doing a lot of great research, and I still see papers with Linda’s name on it, and she gets me involved in some of it. So people like that, you don’t come across them that often. You don’t come across that quality of person and that sort of constancy, so there must be something at Duke that makes people like that stay that long. And when I left Duke in ’96, that was the biggest mistake of my whole career. And I didn’t know that at the time, but I had no idea that the kind of research that was going on at Duke was not typical of other academic medical centers. And I never—you don’t realize how great a place is until you leave it. And so I started working with clinical researchers at another institution, at the University of Virginia, and it wasn’t the same. And what made me leave University of Virginia after I was there for seven years is one day I was just—got into work, and I was sitting at my desk, and I said, You know, I’ve been here seven years. The amount of clinical research that I’ve gotten exposed to is less than what I would have been exposed to in seven months at Duke. And I said, Wow, that’s pretty bad. That’s pretty bad (laughing). It just wasn’t the powerhouse like Duke had. It didn’t have the same people. The leadership in cardiology, the research done by the cardiologists and other clinical research going on was not at the Duke level. So it’s difficult to do that. And I still don’t fully understand how it all happened. But Gene Stead is a big part of it. Bob Rosati and all the other great leadership that was there.

ROSEBERRY: Well, you mentioned a couple of times the follow-up that was done by Bernie McCants. I wonder if you could talk a little bit more about that.

HARRELL: Well, most cohort studies—when you have a cohort of patients that come in for a defined event such as a diagnostic test or such as because of having chest pain, you follow these cohorts of people, and over the years you have more and more data to handle. The biggest challenge in cohort studies is usually loss to follow-up. So patients move, they somehow change phone numbers and you can’t find the new phone number, they lose interest and you have a larger percentage of patients over time that are lost to follow-up. And that means your follow-up—say somebody at eight years and then they move and you can’t find them again, you know that they’re followed up for eight years, and you know they’re alive at least eight years after they came to Duke. But you don’t know how long they’re going to live. You don’t know if they died at year nine. And so it takes a very dedicated team and it takes a lot of knowledge to run a follow-up operation to not lose people and to keep the data resource being the gold mine that it is. Because data can degrade over time when you start losing people. And especially since you usually lose people not at random. You tend to lose people who are sicker or who are very well. So people who move out of state, usually you don’t move out of state if you’re very sick. So you tend to get a more biased view of the patients that are left. Bernie had an amazing degree of perseverance, and he was smart in how he designed the follow-up system and how he had people making the phone calls and tracking the data and just not losing many people. So he has just a huge success rate. And that’s the bottom line, is what proportion of people have been followed by what length of time without being lost? And that’s what gives you golden data to analyze.
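
The eight-year example above describes what statisticians call a censored observation: the patient is known to be alive at eight years, and nothing is known after that. As a minimal sketch, here is the standard Kaplan-Meier (product-limit) survival estimate, which uses such patients for as long as they were followed. The data are made up for illustration and the function name `kaplan_meier` is hypothetical; this is not the databank's own software.

```python
def kaplan_meier(records):
    """Kaplan-Meier survival estimate.

    records: list of (years_followed, died) pairs. died=False means the
    patient was lost to follow-up (censored) at that time, still alive.
    Returns a list of (time, estimated survival probability) at each death time.
    """
    death_times = sorted({t for t, died in records if died})
    survival = 1.0
    curve = []
    for t in death_times:
        # Everyone still under observation just before time t is "at risk".
        at_risk = sum(1 for time, _ in records if time >= t)
        deaths = sum(1 for time, died in records if died and time == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Made-up cohort: the patient at 8 years moved away and could not be traced.
cohort = [(2, True), (5, True), (8, False), (9, True), (12, False)]
for years, prob in kaplan_meier(cohort):
    print(f"{years} years: estimated survival {prob:.2f}")
```

The patient lost at eight years counts in the at-risk group for the deaths at two and five years but drops out before year nine. The estimate is only trustworthy when loss is unrelated to prognosis; when the people lost tend to be sicker or healthier than those who remain, as described above, the curve is biased, which is why a low loss rate matters so much.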

ROSEBERRY: So it’s the follow-up data that’s a really key piece to the work that you were doing?

HARRELL: That’s really what turns clinical data into research data. Well, the clinical data have to be of research quality. But if you don’t have patient outcomes, then clinical data is not going to be nearly as valuable. So the long-term outcome of being able to say, Here’s more than twenty years of follow-up, and we can tell who lived and who died within that period, and we can do mortality trends. We also did a lot of time-trend analysis that I didn’t talk about. David Pryor was leading this effort. Do people survive more who are being treated in the 1990s as opposed to people who were being treated in the 1970s? Is the survival getting better? So you need long, long-term follow-up to be able to answer that question and not losing many people. And that kind of data is, that’s hard to come by.

ROSEBERRY: But you had it?

HARRELL: We had it. Yeah, it’s still there.

ROSEBERRY: Well, what events in the life of the databank may I have missed and not asked you about?

HARRELL: Um, I can’t think of any big events except we just loved being with each other and we would have—oh, the Division of Cardiology would have a picnic every—I want to say September. And it would be sometimes at the lodge in Bahama or it would be at Jess Peter’s farm where he had this amazing huge lobster boil. We’d have all these events where everybody would get together in Cardiology and just had the greatest time. It was just fun. And the families get together. So we enjoyed each other’s company outside of work a lot. So that made a difference. Then the one other thing that I can think of is I would never be in the position to be a department chair if I hadn’t known what it was like to work with great people, and more important than that, if I hadn’t been nurtured by great people. So to have had the mentors I’ve had, like Kerry and Dr. Stead is just amazing. And Dr. Stead for a time actually personally mentored me. I think Bill Stead told me that there are or have been twenty-five department chairs that were mentored by him. Which is a lot. But to have had that opportunity. And when I was trying to get my first paper published in a medical journal as a first author—this was in JAMA [The Journal of the American Medical Association]—he offered to help me with the paper. I don’t know why he did this. But he looked at it in detail, and then he called me into his office. And he said, “Frank, there’s not a thing in this paper I wouldn’t change.” (laughing) And of course, I was pretty much destroyed. But then he did the greatest thing, he said—he was editor of Circulation, and he said, “I’ll have you work with Penny [Hodgson], editorial expert, and she will help you turn this into a readable paper.” (laughs) And so I started working with her, and it was amazing how this paper turned around. And I endured the pain, and I learned from it. Eventually learned how to write. And then the paper got accepted into JAMA, and that paper’s gotten a lot of citations.
And so for him to take that kind of an interest—and I can brag about having been mentored by Gene Stead.

ROSEBERRY: Well, is there anything specific that you were working on at Duke or learned at Duke that you now carry on into your work here at Vanderbilt?

HARRELL: Well, a tremendous number of data analysis techniques. Kerry and I were developing more and more flexible ways to model outcome data and prognostic models and then diagnostic models. And I use that every day almost. Because we still are doing analysis—different data set, but using the same techniques or slight refinements of those techniques. And so during that time, I learned an amazing amount of statistics that I didn’t learn in graduate school. And I did that working with Kerry. And working with Doc [Lawrence] Muhlbaier and Dick Smith, Liz DeLong and Linda Shaw and many other terrific people plus all the cardiology researchers that had such an interest in data analysis like David Pryor. And so I use those things every day practically. And I’ve just made them easier and easier to do. And continue to implement software that is easier for people to use, which is just building on what we did back in the old days of gigantic computers back at Duke.

ROSEBERRY: Maybe we can talk about some of the software as it has changed through the years from when you were at Duke.

HARRELL: And the hardware changes are even more astounding.

ROSEBERRY: Right.

HARRELL: In this Duke Hospital South, we had—I think it was room 2000 was where I first started working. And several of the people on the team were in the Old Chemistry Building, but I was in the room 2000 where Bob Rosati and the fellows were. And Kerry Lee. And Kerry and I shared an office that’s about one-fourth the size of this office. Kerry never complained for a minute. I think we may have had separate telephones, but we’re in this small office together, room 2000, and the computer was like as big as a car. Huge thing. I even forgot the type—I think it was a Sigma 5 or something. And then we started—the computers would get smaller and smaller and more powerful. And what we were doing with that giant computer then, it would—it probably had one one-thousandth of the capability of this computer that’s on my desk now. Probably even less than that. But we made do and got an amazing amount of work done with the computing of that day. And I think the way we entered things into the computer was pretty archaic, the kind of terminals that we used. We may have had a key punch machine still. Can’t remember. But then the software, in those days you had to write a lot of the software in a very home-grown fashion. And so the programmers working with Kerry were developing software that was very usable for analyzing data from the databank and doing some of this Cox regression analysis. And then I had come from a group at UNC that was using a big software package that was called SAS. And I knew how to add modules to that so that you could put it as part of a system to make it have more capabilities for how you massaged the data. So we started writing code to work in that system, making that available worldwide, and just thousands of people were using that code worldwide. 
And now the SAS Institute, which is now this huge company in Cary, they ended up rewriting that code and taking advantage of a lot of the code that was there without giving any credit to the original developers, by the way. And making it available even to more people. And then later after I left—actually before I left Duke, I was starting to write in a different language, which was called S. And that’s a language that has much more flexibility, and that’s where I’ve added to what I was doing at Duke. And now the language is called R. And that’s a free language that’s available to anyone in the world. And it’s the most powerful language for statistical analysis. So I’ve rewritten all the stuff I started at Duke into that language, and now even more people can use it because they can download it for free. A lot of people use it in developing countries where the software licenses are prohibitive. But that all makes use of code that I started back in the Duke days. That we were using just to analyze the Duke databank. And that now is helping thousands of people analyzing data, that they don’t understand how important the Duke databank was to the development of that software.

ROSEBERRY: Tell me more about the developing countries and how they’re able to use that information.

HARRELL: Well, they’re just able to click on the World Wide Web and download software and not pay any money, whereas if they needed to use SAS or some of the other commercial software, they would have to pay, in many cases something like ten thousand dollars a year. Because a lot of commercial software, you don’t actually get to buy it and own it. You get to rent it, and every year you have to renew the lease. Or the software quits working. So the commercial companies really make a lot of money using that model. But they really get people tied in, and a lot of people cannot afford to do that. So when it costs zero, it means anybody can do it.

ROSEBERRY: Well, Dr. Harrell, are there any questions that I didn’t ask you today that I should’ve asked you?

HARRELL: Um, I can’t think of anything. I think we covered a wide variety of things. And I guess the only thing we didn’t cover is what the Duke campus meant to me, because we would work really hard. But on a hard day, or when we’ve had an argument that was maybe a little more taxing than the usual argument with some of the cardiologists, I would escape and walk over to Duke Gardens. And what an amazing place. And that gives you a whole renewal of your outlook on life. I would sit there by a little pond overlooking the steps that led up to that gazebo, and just to lie there on a beautiful afternoon and clear your head and then you go back into the office and pick right up where you left off with renewed energy. You really cannot overestimate the importance of Duke as a whole to our work in Cardiology and in the databank and the Clinical Research Institute. Plus, Duke being what it is, it attracts a really special type of person and keeps them there for a long time, like I wish I had. Although I love this job; it’s just amazing. But leaving it to go to another place just made me appreciate that even more. But it attracts so many people that it keeps. And that’s because it’s Duke. And that allows you to form something like the databank and to sustain it for decade after decade.

ROSEBERRY: Well, thank you, Sir. I appreciate talking with you.

HARRELL: You’re welcome.

ROSEBERRY: It’s been a pleasure.

HARRELL: My pleasure.

(end of interview)