Biostatistics or data science for public health—whatever you choose to call it—informs understanding of the health and environmental impacts of exposures. Emory University’s Howard Chang discusses with co-hosts Anne Chappelle and David Faulkner the intricacies of interpreting data, the controversial P value, and the team science involved in studying public health challenges.
About the Guest
Howard Chang, PhD, is a Professor in the Emory University Rollins School of Public Health Department of Biostatistics and Bioinformatics, jointly appointed to the Gangarosa Department of Environmental Health. He also serves as the Director of the Master’s Program in Biostatistics for Emory University.
Dr. Chang received a Bachelor of Science from the University of British Columbia in 2004, followed by a PhD from Johns Hopkins University in 2009. Before joining Emory University, he was a Statistical and Applied Mathematical Sciences Institute (SAMSI) postdoctoral fellow and worked with the North Carolina State University Department of Statistics and Children’s Environmental Health Initiative based at the University of Notre Dame.
Dr. Chang’s primary research interest is in the development and application of statistical methods for analyzing complex spatial-temporal exposure and health data. His current projects focus on two broad areas of population health: (1) exposure assessment for air quality and extreme weather events, especially under a changing climate; and (2) health effect estimation and impact assessment leveraging large databases, such as birth/death certificates, hospital billing records, electronic health records, and disease surveillance systems. Dr. Chang also collaborates with colleagues for studies related to ecology, infectious disease, social epidemiology, and community intervention trials.
[00:00:00] Adverse Reactions “Decompose” Theme Music
[00:00:05] David Faulkner: Hello, and welcome to Adverse Reactions Season 2. My name is David Faulkner, and this is my co-host,
[00:00:11] Anne Chappelle: Anne Chappelle.
[00:00:12] David Faulkner: As much fun as the first season of Adverse Reactions was, I think Season 2 is better.
[00:00:16] Anne Chappelle: Hidden.
[00:00:17] David Faulkner: Secretive.
[00:00:18] Anne Chappelle: Exactly.
[00:00:19] David Faulkner: The toxicology that happens when you’re not looking,
[00:00:22] Anne Chappelle: or toxicology that you forgot about.
[00:00:24] David Faulkner: It’s still important, and we’re here to talk about it. Welcome to Season 2 of Adverse Reactions:
[00:00:28] Anne Chappelle: “Hidden Toxicology.”
[00:00:31] Adverse Reactions “Decompose” Theme Music
[00:00:38] David Faulkner: Biostatistics, the hottest science.
[00:00:40] Howard Chang: What makes statisticians or biostatisticians a little bit different is that we are very much aware of the need to think of uncertainties. So, when we come up with risk estimates, when we come up with some products, we want to think about, what’s the uncertainty associated with these risks?
[00:00:58] David Faulkner: Or, heat stressed: the real Howard Chang talks biostats and public health.
[00:01:04] Howard Chang: For many of us, extreme heat exposure might not be that big of an issue because we have air conditioning, but what are the parts of the city or the population that are more vulnerable to heat exposure?
[00:01:15] Adverse Reactions “Decompose” Theme Music
[00:01:23] Anne Chappelle: This is Anne Chappelle, and I am really happy to welcome the real Howard Chang, Professor at the Rollins School of Public Health at Emory University, with joint appointments in the Department of Biostatistics and Bioinformatics and Environmental Health. He’s a very busy man. Howard, what do you do for a living? How would you explain it to my mom?
[00:01:45] Howard Chang: The sexy term now is that I work in data science for public health. I think that very much encompasses what I do and what I think about every day.
[00:01:54] David Faulkner: Very cool. So, data science—isn’t science all about data? So, what is special about how you do data?
[00:02:02] Howard Chang: There has been this explosion of the data that we’re getting from different data streams: from the chemicals we can measure, from satellite imagery, from all the electronic health records that are being collected. So, a lot of health researchers are very interested in leveraging all these different data sources and just the volume of that data. And to design epidemiologic studies and to think about how we understand environmental impacts, we think a lot about the challenges with using these datasets, some of the biases that may be in these data, and try to do our best to answer timely questions.
[00:02:42] Anne Chappelle: So, when I think of data science, I think of computer science, but your undergraduate degree was in statistics.
[00:02:50] Howard Chang: That’s right.
[00:02:51] Anne Chappelle: So, do you think of statistics as like an applied computer science? How much of the computer part of it do you think of with biostatistics?
[00:03:01] Howard Chang: That’s a great question. What is our contribution? And it is a very broad term. I like to think of myself as a statistician. I think about, how do we use these data to make inference? So, how do we say something about the real world? So, I think a lot about how the data are generated and some of the issues. Data science is a huge field, and there’s a lot of research on, for example, how do we even store the data? How do we process the data? How do we make them accessible? But I think what makes statisticians or biostatisticians a little bit different is that we are very much aware of the need to think of uncertainties. So, when we come up with risk estimates, when we come up with some products, we want to think about, what’s the uncertainty associated with these risks?
[00:03:49] David Faulkner: I think this is a really interesting point. I want to spend some time on this. So, you’re talking about biases and data, right? Where does the data come from? And I think that this is a really important point is that whenever we’re talking about research for science or some kind of data collection, we can’t extract fully the human element there. What sort of things do you think about when you’re trying to say, this is good data or a well-collected dataset? What sort of things do you think about when you’re trying to remove that bias?
[00:04:18] Howard Chang: I tend to think of two ways we collect data. One is prospectively. So, in health studies, we might recruit participants. So, that’s usually a very well-designed study where we will recruit them, we will have a protocol plan of what could be measured, and because we’re actively collecting data throughout that process, we can do a lot of QA/QC and checking. Because we’re following individuals and actively sampling, we can go and make corrections. The other way is, if we have a scientific question we want to address, we have to go back and look at data that were not initially planned to be linked with health data, right? So, we go back in time and link health data to other data that might not have been collected for health studies, let’s say, from air quality monitors, and our sample may be biased in some way. So, either way, we do have to evolve. Nowadays, almost everything is team science. So, we try to avoid this bias, even when we are recruiting individuals. You know, what are some of the populations we should target, and what are some of the questions that we need to be aware of when we are designing measurements?
[00:05:28] Anne Chappelle: So, in my world, I deal a lot with a particular chemical. In the sixties, seventies, eighties, there was a lot less control of that chemical. The workplace exposures tended to be very high. Since that time, we’ve dropped those exposures, but it is so hard to get those old exposures away from consideration for these adverse health effects. Two things I wanted to ask you about: one is, how do you say, you’ve got to exclude that? And two is, we didn’t look for the same endpoints maybe 5, 10, 15 years ago. Now we do. That doesn’t necessarily mean it’s a bigger problem now. But can you comment on that?
[00:06:12] Howard Chang: Luckily, of course, many of the toxicants and chemicals have decreased over the past few decades. So, for example, I’m now working on air pollution, and our air quality has improved quite a lot over the last few decades, but there’s still a question about, even at current outdoor levels, do we still see health effects? And like you said, some of the health endpoints are also different now. We’re often interested in long-term exposure, even if it’s pretty low level: if you are exposed for 10, 15 years, let’s say, what’s the association with cognitive decline? That’s really relevant now.
Once you’re at the lower level, it’s even more important to leverage as much data as you have, as you really want to exploit not just a temporal contrast, but potentially the spatial contrast: different regions, or even areas within an urban setting, might have differences in exposures. So, I think that the way we design our studies to make the part of the dose-response curve more relevant has a lot to do with thinking about the current levels that a population is exposed to. And of course, a workplace setting is a little bit different because those are specific populations. Many of these chemical exposures happen at a population level. So again, thinking along that line can kind of help us along in the various studies we do.
[00:07:28] David Faulkner: This leads me to wonder, you’re doing a lot of public health research, generating these data and analyzing and then trying to make some kind of statement about this. Ideally, someone would take this information that you’ve generated and use it to craft policy. I’m curious if you could talk a little about generally what you think about in terms of the role of biostatistics in crafting policy. My experience with policy is that policymakers generally have a very poor grasp of statistics. The American legal system struggles with statistics, because we don’t want certainty, right? In science, can you prove everything beyond the realm of a reasonable doubt? I think there was a quote, there’s lies, damn lies, and statistics.
[00:08:13] Howard Chang: And statistics, yes, of course. Yeah.
[00:08:15] David Faulkner: What do you think of the suggestion that, oh, well, you can use statistics to make any set of data say anything?
[00:08:21] Howard Chang: Yes, no, that’s something that I actually say a lot when I teach our students, that one of the reasons why the statistics department is often under a science is because there are actually a lot of choices. It’s easy, for example, like you say, if you are a shady scientist, to throw some data out and then create more data and then try different ways to analyze the data, slicing, cutting the data different ways. That’s why we rely on the integrity of the scientific community, to be transparent about some of the choices you make. So, we try to justify some of the choices we make, why we think this data might be collected incorrectly, so it’s best to not use it. And it’s just one data point that will change my story, right? So, I think transparency is very important. There’s been a great push for reproducibility. So, making sure that, when possible, your data are made available, and all your analysis code and how you cleaned the data are also made available. So, if anyone wants to incorporate your data in their own analysis, or they want to reproduce what you have done and double check, they can do that. And I think it’s becoming not a requirement, but very much encouraged.
[00:09:32] Anne Chappelle: Could you talk about biological significance against statistical significance? Because this season of Adverse Reactions is about hidden toxicology, you know, to hide underneath some of these numbers and say well, it’s not biologically significant. That is something from an essential toxicology standpoint and pharmacology, all these things, what is important and who gets to decide what is biologically significant?
[00:10:01] Howard Chang: So, I am actually a very big fan of what’s called descriptive epidemiology. So, we think about, what is the burden of a risk factor on the population? So, when we do some of our temperature studies or some of our air pollution studies, the risks we find are usually very, very tiny in comparison to, say, smoking. To help communicate the significance, we will try to convert that risk to a burden estimate. And because with some of the environmental exposures, the entire population is exposed to it and often exposed to it unknowingly. And so, we’ve tried to use the best available science and statistics to try to pick it out. If I say, ozone air pollution leads to asthmatic exacerbation, what does that number look like? At the state level, the country level? I think that will help in terms of understanding the relative contribution of so many different risk factors. There’s the individual risks, there’s the community. And all the risks that we measure are on the population level. I actually don’t have a good answer for that because it’s philosophical. You have your own measure of looking at all the scientific data and recommendations we have and think about, in your own world, what will you interact with? And how do you minimize risk? What’s important to you? So, I think, at the end, it’s a very individualized choice.
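The risk-to-burden conversion described above can be illustrated with a short sketch. It uses Levin's population attributable fraction, which is one standard way to do this and is not necessarily the method Dr. Chang's group uses. The function names and every number below (a relative risk of 1.02, a fully exposed population, 100,000 annual asthma visits) are hypothetical and chosen only for illustration.

```python
def attributable_fraction(relative_risk: float, prevalence: float) -> float:
    """Levin's population attributable fraction.

    relative_risk: risk in the exposed divided by risk in the unexposed.
    prevalence: share of the population that is exposed (0 to 1).
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def attributable_cases(total_cases: float, relative_risk: float,
                       prevalence: float) -> float:
    """Number of observed cases attributed to the exposure."""
    return total_cases * attributable_fraction(relative_risk, prevalence)


if __name__ == "__main__":
    # Hypothetical inputs: a tiny relative risk, but everyone is exposed.
    cases = attributable_cases(total_cases=100_000,
                               relative_risk=1.02,
                               prevalence=1.0)
    print(f"Attributable cases per year: {cases:.0f}")
```

Under these assumed inputs, a relative risk that looks negligible for any one person still corresponds to roughly two thousand cases a year across the population, which is the contrast between individual risk and population burden drawn in the conversation.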
[00:11:25] Anne Chappelle: You’re doing biostatistics and we talk about bias and confounding and relative risk and the burden estimate. So, the way that you’ve approached biostatistics has really been more in the public health area. When we were talking a little earlier, you said that you’re not a toxicologist, but could you talk about the intersection between biostatistics and toxicology?
[00:11:50] Howard Chang: Yes, so, statistics is used everywhere. So, for example, there are many different types of air pollution, different sources of air pollution. You mentioned traffic, electricity generation, power plants, is one, and of course wildfires is another major source in many regions of the country. Knowing that air pollution is bad for human health, as is well-established, something that we are very interested in is, what’s the most vulnerable population? So, let’s say, air pollution and asthma, right? Is it because these are mostly individuals who have uncontrolled asthma such that they are more vulnerable, and the asthma will exacerbate? And so, really understanding what are the subpopulations that are more at risk for the air pollution effects? I think that’s one way we can do a lot more. And so, whether communications or potential things to reduce exposure. Similar things apply to some of the heat wave work we do, that for many of us, extreme heat exposure might not be that big of an issue because we have air conditioning, but what are the parts of the city or the population that are more vulnerable to heat exposure? And think about what the city can do when there’s a heat wave warning to target specific populations. So, I think there’s a lot of work in my field; it’s challenging because, let’s say, for heat exposure, we can get everyone air conditioning, but that will drive up the electricity usage. There are lots of considerations to think about, how do we reduce risk?
[00:13:20] David Faulkner: You’ve done so much work on so many different topics, and I think that’s really cool. You’re in biostatistics, you’re a professor of biostatistics at Emory, which is extremely impressive. So, how does one get to your position?
[00:13:35] Howard Chang: My undergraduate degree in the beginning was actually in microbiology and immunology. So, I’ve always been interested in health sciences, and in my early twenties biotechnology was a big thing. So, I’ve always been interested in human health. But later on, I found out that I’m just not a very good bench scientist.
[00:13:55] Anne Chappelle: That’s me! That’s me, too!
[00:13:58] Howard Chang: And I’ve always enjoyed analyzing data, collecting data. So, I ended up with the statistics major, and biostatistics is challenging because there actually aren’t that many undergraduate programs in biostat. So, I would say many of us are math majors, data majors, computer science or engineering majors who are interested in health applications, and that got us into it. And of course, there are many fields of biostatistics. I mostly work on environmental health, population health. There’s all the genomics. That’s also very important. So, really all biomedical public health disciplines will need quantitative thinking. I think one great thing about being a data person is we get to play in everyone’s backyard. With different exposures, different health outcomes, we will be called upon to help with input on some of the issues. I’ve been lucky with my collaborators; they are always excited to try out new methods, and when there are new, emerging data, they’re always excited about thinking, what’s the best way to analyze it? So, I would say I’m lucky in the sense that I’m excited with them.
[00:15:04] Anne Chappelle: So, you have graduate students, right? Postdocs? Where do they come from and where do they go?
[00:15:11] Howard Chang: Our biostat MPH or MSPH students, they really can go into many different fields. So, some do go into PhD degrees, but not necessarily in biostat. There’s actually quite a few that go into epidemiology, environmental health sciences, based on their interests. The master’s degrees will teach them the quantitative skill sets that really can be applied to different areas of biomedical sciences. Definitely a large proportion of the students are going to pharmaceutical companies because, of course, being able to run trials and those data are very important, but then also a big proportion will go into government or state health departments. Of course, we’re next to the CDC, so many of our students have summer internships there and a good team.
[00:15:55] Anne Chappelle: So, how often do you get into a situation where your research colleagues are like, oh, look at this data, and you go through and you’re like, nope. You are, your field is so critical to correct interpretation of the data, but I would say it tends to be this hidden field, but it really is the crux of—
[00:16:18] Howard Chang: Of everything, yeah.
[00:16:20] Anne Chappelle: Of everything, because if you don’t do your tests right, you make these associations, there’s all kinds of classic graphs you can see that the increase in avocado use is associated with the opening of Whole Foods stores or something in the area. So, this is really critical, and I think as scientists, we all know how important biostatistics is, but how do you get that—
[00:16:45] Howard Chang: Message across to—
[00:16:46] Anne Chappelle: Or to get them to pay for you because you’re not free, you’re not free, either. So, how do you—
[00:16:53] Howard Chang: No, I’m the neighborhood free statistician.
[00:16:56] Anne Chappelle: Exactly! 5 cents for a P value, you know.
[00:16:59] David Faulkner: Right, right.
[00:17:00] Howard Chang: I’m always happy to have coffee and talk to people about the problem. And I think there are situations where your data just don’t support your hypothesis, or there’s something wrong with data collection. And I think we just have to move on. But it’s a learning experience, you can gain something from everything. But often if a statistician is involved early on, a lot of the time, we can help them. I mean, there’s the dreaded power calculations for every study. It may seem kind of trivial when it’s just part of your exam question or a homework question that everyone in the first statistics class has to do, but it does have a lot of implications for following through with your study. Do you have enough data? Because some of these experiments are very expensive, so being able to have a good grasp and ensure success, it’s extremely important.
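As a rough illustration of the kind of power calculation mentioned above, the sketch below uses the textbook normal-approximation formula for comparing the means of two equal-sized groups. The function name `n_per_group` and the chosen effect sizes are assumptions made for this example; real studies typically need a calculation tailored to their design.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    effect_size: difference in means divided by the common standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.8, 0.5, 0.2):
        print(f"effect size {d}: about {n_per_group(d)} participants per group")
```

The required sample size grows with the inverse square of the effect size, which is why small expected effects can make an expensive experiment infeasible unless the design is planned early.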
[00:17:50] David Faulkner: I’ve definitely read a few articles by statisticians saying that they don’t like the way the P values are used and that’s not how they’re supposed to be used. Could you, for the benefit of some of our more research-inclined listeners, tell us a little bit about the humble P value?
[00:18:06] Howard Chang: The humble P value. My personal view on the P value.
[00:18:07] David Faulkner: What do you think it’s actually good for?
[00:18:13] Howard Chang: Personal opinion.
[00:18:14] David Faulkner: Okay. Right, right.
[00:18:16] Howard Chang: I think when we think about statistical inferences, saying something, all the way from our data, I think there are two goals, right? One is providing an estimate, a quantitative estimate, maybe with some uncertainty. The other goal is to say something with a binary decision: yes or no.
And whenever we use a P value, I think we’re in that second paradigm, that my goal is to declare something yes or no, making a hypothesis test. And the P value in my view is a way for the readers and the consumers to decide your own error or risk tolerance in making that decision. The fact that we use the infamous P value of 0.05, for example, is explicitly saying that my tolerance for risk is 5% here, that if I do this experiment many, many times, I will make the wrong decision 5% of the time. And sometimes in some settings, that might be too liberal. I might want to tolerate only 0.1%; so, then I will look at the P value differently.
I think the P value is a tool for those who write papers, do experiments, to present, but the onus is on the readers and those who are reading this study to understand whether or not, based on my results and the P value I provided, it’s enough for me to make that decision. Of course, also taking into account the biological significance of whether or not the estimate is relevant.
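The "many, many times" reading of the 0.05 threshold can be checked with a small simulation. This is a sketch under stated assumptions: every simulated experiment has no true effect (the null hypothesis holds), the data are standard normal with known standard deviation, and a two-sided z-test is used. The function names, sample size, and number of experiments are all hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_p(sample: list[float], sigma: float = 1.0) -> float:
    """Two-sided z-test P value for the hypothesis that the true mean is 0,
    assuming the standard deviation sigma is known."""
    z = fmean(sample) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha: float, n_experiments: int = 10_000,
                        n: int = 25, seed: int = 0) -> float:
    """Share of null experiments (true mean exactly 0) declared significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            rejections += 1
    return rejections / n_experiments


if __name__ == "__main__":
    for alpha in (0.05, 0.001):
        rate = false_positive_rate(alpha)
        print(f"threshold {alpha}: wrong 'yes' in {rate:.2%} of null experiments")
```

When there is truly no effect, the share of wrong "yes" decisions tracks whatever threshold the reader chooses, so a stricter tolerance such as 0.1% simply means demanding a smaller P value before acting.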
[00:19:50] Anne Chappelle: So, what do you wish that people knew about biostatistics?
[00:19:56] Howard Chang: So, in terms of when I collaborate with people, I wish that they have a precise hypothesis of what they want to get. Very clearly, what are the risk factors, what is the outcome, and the specific question they want to answer, instead of saying, I have this outcome, I have this exposure, I want some health effects—help me. So, I think the more precise and more tailored question helps science and helps study design, helps data analysis. And there are always more questions to ask, but to start somewhere, I think, will be most helpful.
[00:20:31] Anne Chappelle: It is really hard to just do that, but that is the first part of being a scientist, is writing the null hypothesis.
[00:20:38] Howard Chang: Even before you collect data.
[00:20:39] Anne Chappelle: You’ve got to boil it down into something really specific.
[00:20:43] Howard Chang: Yeah, and that translates to what they have to collect, how many participants, and what the model might look like. So, it kind of drives everything. So, without a scientific question.
[00:20:54] Anne Chappelle: Data is so expensive to generate. You want to collect a lot of endpoints sometimes, because I only get one mouse and I only got one shot at this. But I could see how that could drive you crazy.
[00:21:05] David Faulkner: What about for people maybe outside of the realm of research? Because statistics are everywhere. When you see a statistic, what should your reaction be? How should people think about it?
[00:21:17] Howard Chang: You definitely want to see where the data have come from. Is it from a large survey, a national or a representative survey, or is it from a very special subpopulation? We see that a lot during election years, but the good thing is that with many of the media you can actually click on it and it takes you to the original paper, to preprints, so if you’re interested you can dive into it a little bit more. But definitely representativeness of the data. And of course, the sample size, because like you say, statistics can lie. So, think about how that applies to yourself and whether or not you can really trust that number.
[00:21:52] David Faulkner: Yeah. It’s like, who’s generating this data and why would they want me to know this number? What are they trying to get me to do?
[00:21:58] Howard Chang: Right. And scientists are usually very open about all the limitations in the study and what are the next steps?
[00:22:05] David Faulkner: What are some things, some hidden things, you’d like to see toxicologists tackle?
[00:22:10] Howard Chang: So, first of all, I want you to know I mostly work in population health and epidemiology. Of course, we go hand-in-hand with toxicology. One thing our research has been really interested in is heat exposure. So, how can extreme heat impact mortality or morbidity? But what’s particularly interesting is, what makes a person vulnerable? Is it the medication they are taking? Is there some pre-existing condition that makes them more vulnerable to dehydration, thermoregulation? So, I think there are all these mechanistic things that we’re starting to try to decipher using population data, where it will be very interesting to see toxicology filling the gaps.
[00:22:48] Anne Chappelle: I think there’s increased focus that a child is not a little adult and a grandparent is not an adult, either. And the toxicology and the metabolic changes of aging and of the young, and so I think that looking at slices of these populations will become more important as these issues become highlighted. Like, wait a minute—why is this OK for this population? That’s fantastic. I’m really glad that I got to talk to the real Howard Chang.
[00:23:21] David Faulkner: The real Howard Chang.
[00:23:23] Howard Chang: And the statistician. Thank you, Anne and David. It’s been a wonderful experience. This is probably the best part of our job—talk about research.
[00:23:30] David Faulkner: Yeah, yeah, you statisticians are the avenue through which all of the tox data actually becomes meaningful. That’s so fascinating to me. And then you have so many cool things that you’re working on, I regret that we only have this short period of time. But thank you so much for joining us.
[00:23:46] Adverse Reactions “Decompose” Theme Music
[00:23:50] David Faulkner: And now, it’s time for the teaser. Next week, on Adverse Reactions,
[00:23:54] Anne Chappelle: the delicious world of Alex Lau,
[00:23:58] David Faulkner: or tox and treats, unconventional careers in food safety.
[00:23:58] Alexandria Lau: If you want to isolate that flavor and put it into like a pudding cup, it’s not just banana puree added to pudding. They have to chemically isolate those compounds and then add it back. So now, you’re eating banana compounds in a place that you don’t typically find it naturally, and so as a toxicologist, you have to look at whether or not those additional uses of that flavor or component or whatever ingredient they’re adding is going to be safe over a person’s life.
[00:24:34] Adverse Reactions “Decompose” Theme Music
[00:24:40] Anne Chappelle: Thank you, all, for joining us for this episode of Adverse Reactions, presented by the Society of Toxicology.
[00:24:46] David Faulkner: And thank you to Dave Leve at Ma3stro Studios.
[00:24:49] Anne Chappelle: That’s Ma3stro with a three, not an E.
[00:24:52] David Faulkner: Who created and produced all the music for Adverse Reactions, including the theme song, “Decompose.”
[00:24:59] Anne Chappelle: The viewpoints and information presented in Adverse Reactions represent those of the participating individuals. Although the Society of Toxicology holds the copyright to this production, it has,
[00:25:10] David Faulkner: definitely,
[00:25:11] Anne Chappelle: not vetted or reviewed the information presented herein,
[00:25:15] David Faulkner: nor does presenting and distributing this podcast represent any proposal or endorsement of any position by the Society.
[00:25:21] Anne Chappelle: You can find out more information about the show at AdverseReactionsPodcast.com,
[00:25:27] David Faulkner: and more information about the Society of Toxicology on Facebook, Instagram, LinkedIn, and Twitter.
[00:25:33] Anne Chappelle: I’m Anne Chappelle.
[00:25:34] David Faulkner: And I’m David Faulkner.
[00:25:36] Anne Chappelle: This podcast was approved by Anne’s mom.
[00:25:39] Adverse Reactions “Decompose” Theme Music
[00:25:42] End of Transcript