Using Data Science in the Real World: Expert Tips

Many businesses are deploying Big Data applications for competitive advantage, yet many of them are “learning on the job,” using trial and error to do the best they can, with mixed results. To provide guidance, I spoke with two leading practitioners of data science to discuss how this rapidly evolving technology is used in business today.

The experts:

Seth Clark, Senior Associate, Booz Allen Hamilton

Dr. James McCarter, Head of Research, Virta Health

Below are edited highlights from our wide-ranging conversation:

What are some challenges or hurdles you’ve encountered as you’ve helped develop data science?

S. Clark: I think one of the biggest challenges toward that cultural gap is just around trust. Human beings trust each other, and developing trust between two people has a sort of defined process. Usually, it just takes time. It takes experience. It takes many examples of someone saying they’re going to do something and then doing it. It’s harder to figure out how you trust an algorithm. How do you trust a data analytic that’s telling you to make a decision? That’s a much fuzzier thing for a human to try and trust.

So, I think it’s about focusing on cultural transformations that can help an organization, from the technologist level (the data scientist and the developer) through the executive leadership, have a strong enough understanding of how data science works so that when there’s an output they can actually look at the data behind it and say, “I see where this came from and I feel that this is trustworthy. We’re going to go out on a limb here and make some decisions based on these insights and see what happens.” You have to have a little bit of faith that it’s going to turn out well.

J. Maguire: Jim, what about yourself? As you’ve encountered data science and used it, what sort of hurdles or challenges have you come across?

J. McCarter: Like Seth said, my answer also tends toward one of culture, and the culture that you build. Over the last three years as we’ve built Virta, it’s been an amalgam of people coming out of the clinical practice of medicine and the research world, married with people coming out of a fast-moving software and Silicon Valley culture.

I think part of it is trying to understand what is the purpose of the algorithm you’re building, or what is the purpose of the data you’re generating. It’s sort of a question of quality versus speed. So, are you trying to make an internal decision that you need today that’s going to impact the next two weeks? Or are you putting together a data set that’s going to be submitted for peer-reviewed publication and is going to have to last for decades? Trying to find that kind of balance between rigor and speed is something that we’ve gotten better at over the last couple of years.

J. Maguire: I think it was Mark Zuckerberg who famously said ‘move fast and break things’, which might be nice in a social media network, but I don’t think it’s going to work in a medical setting quite as well.

J. McCarter: I completely agree, yes. Don’t move fast and break things in a medical practice. What matters most to us is patient outcomes, safety and sustainability. So, anything that jeopardizes those is a direction we can’t take.

Can a data science practitioner “trust” an algorithm?

J. Maguire: It seems like one of the challenges is that the algorithm might change. I mean, if it has AI built into it or even if it’s going to evolve over time. How can a practitioner really trust his or her algorithm to know that it’s going to be right? In other words, this very tool that we’re using is a flexible, ever-changing tool. It’s a piece of evolving software. How can we trust that?

S. Clark: You’re hitting on one of the topics that the field of artificial intelligence and deep learning is trying to come to terms with. There’s a bunch of aspects to trust. So, one is thinking of an algorithm less as something that gives a black-and-white answer, and more as a virtual assistant. Something that’s giving you some advice, and that advice has some amount of confidence associated with it.

It’s the same as if I gave you some advice. If it’s something that I know a lot about, maybe you trust me. If it’s something that I don’t know a whole lot about, then my advice probably isn’t worth a whole lot. I’m giving you something, but it may not be valuable, because the things I know a lot about, music and sailing, give me a certain kind of knowledge that helps me give a good answer, and I don’t have that knowledge in other spaces.

So, in the same way, it’s about making sure not to expect that an artificial intelligence algorithm is going to know everything, and understanding its limitations. A lot of those limitations come down to the data that you use to create these predictive algorithms. So, you often have to go down the stack a little bit to say how good is the data that we’re working with? Is this actually useful data? And can I trust the data that’s been training this algorithm? If you can’t trust the data you have to step down the stack a little bit.
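As a rough illustration of treating model output as advice with an attached confidence rather than a black-and-white answer, here is a minimal Python sketch. The `Advice` class, the `present` function and the 0.8 threshold are all hypothetical names and values chosen for this example; they do not come from either speaker’s systems.

```python
from dataclasses import dataclass


@dataclass
class Advice:
    suggestion: str    # what the model recommends (hypothetical field)
    confidence: float  # the model's own confidence estimate, from 0.0 to 1.0


def present(advice: Advice, threshold: float = 0.8) -> str:
    """Frame a prediction as advice; flag it for human review when confidence is low.

    The 0.8 default threshold is an arbitrary assumption for illustration.
    """
    if not 0.0 <= advice.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    pct = round(advice.confidence * 100)
    if advice.confidence >= threshold:
        return f"Suggested: {advice.suggestion} ({pct}% confident)"
    return f"Low confidence ({pct}%): review '{advice.suggestion}' before acting"
```

The point of the design is that the confidence always travels with the suggestion, so the person receiving it can decide how much weight to give it. A confidence score is only as good as the training data behind it, so this does not replace checking the data further down the stack.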

So, I think it’s changing your mindset, and then also looking at all of the constitutive components that build up into a prediction to help understand whether or not you should trust this. I actually have the same question for Jim, now. I’d be curious, from his perspective as a clinician. How do you convince your patients to trust a predictive insight? I think using it for something like operationalizing a chemical plant is a lot different than convincing someone to take a certain course of treatment.

J. Maguire: I’m sensing that’s a really vital question. I’ll go with that one. For five thousand bonus points, Jim, what is your take on that? Are you either trusting the algorithm? Or, how do the patients know to trust the algorithm?

J. McCarter: Each of our patients has their own health coach and their own physician. As I mentioned, we’re a physician-led organization. So, the Virta doctors are all Virta employees, as are our health coaches, who are mostly dietitians and nurses and other health practitioners and clinicians. So, it’s really that personal relationship.

But, if you think about how we actually reverse type 2 diabetes, we don’t do it by building an AI that replaces the doctor. What we have are two innovations. One is in the area of nutrition. It’s an approach called nutritional ketosis that’s very effective at reversing type 2 diabetes if you can make it work in the real world. Second, to do that in the real world, we’ve developed what we call continuous remote care. The idea behind that is that it’s like having a health coach and a doctor at all times. They’re available through an app.

So, we’re in touch with that patient multiple times a day, as opposed to several times a year. So, we think about our data science as being foundational to the delivery of that continuous remote care. So, rather than replacing the doctor, it’s like giving the doctor superpowers.

A key thing you’ve learned in terms of data science?

S. Clark: I think it comes back to ‘people come first’. I’m of the mindset that technical challenges are easy compared to people problems. Focusing on the team that you’re building and the way that you’re supporting that team and the way that you’re facilitating a diverse team is really important. There’s a big topic right now in the world of artificial intelligence around bias that’s built into artificial intelligence. You have a bunch of white dudes in their twenties developing a particular kind of algorithm. Is their life experience going to somehow imprint itself into the code that they’re writing?

Or into the way that they’re training their algorithms, or the way that they’re selecting their data for training? The answer is yes. It’s almost like, as the technology becomes more capable, as companies like Nvidia release new hardware that makes deep learning and really high-end artificial intelligence available to all kinds of people, we have to focus more on the humanity part of it than on the technology part of it. Not to say that the technology is easy, but the human problems are even harder.

So, focusing on building a diverse team of people who can think about the ultimate value that artificial intelligence and data science are going to provide to humans, I think that’s essential.

J. McCarter: One thing I’d mention along the lines of scaling is that we’re trying to build a number of features that are similar to what might be called crowdsourcing. One is to build a patient community. That’s part of our intervention: in addition to biomarker feedback, online educational materials, a health coach and a physician, the fifth component of the intervention is an online community where patients are providing information to one another.

J. Maguire: It’s actually a peer-to-peer network.

J. McCarter: It’s a peer-to-peer network. It’s optional. If people don’t feel comfortable sharing, they don’t have to. But if they’d like to, it can be as simple as recipes and menu choices for dining out. More often it’s emotional support. Sharing victories and sharing setbacks and asking others for advice and support. Another aspect of scaling that we’re working on now is that we’ve actually conducted the largest and longest trial for reversing type 2 diabetes. But, it’s only been two years so far and it’s only been five hundred people.

And so, as we are now treating many thousands of people, we’ve created the Virta Health registry, which is an institutional review board (IRB) approved protocol that our patients can consent to. We’re finding that over 80% of our patients are choosing to opt in. That allows their anonymized and aggregated data to be used for clinical research. So, that allows us to actually look at many thousands of outcomes as opposed to just hundreds. So, a couple of things we’ve built first on the data science side have been predictive algorithms that allow us to predict how our patients will be doing over the next few weeks and months, and then to prioritize them for the health coach.

So, in the same way that there is an app that faces the patient, there is also an app that faces the health coach and the physician. So, rather than just seeing a long list of patients that you could spend time on as you begin your day as a health coach, we actually provide a prioritized rank of which patients are perhaps most in need of your care.

The way in which that rank is built is by looking at how people’s glucose control is predicted to go over the next few weeks, how their weight is predicted to trend over the next few weeks, and how their retention and engagement in our treatment are likely to go over the next few weeks. So, those are all built with data science algorithms that are based on baseline health characteristics of the patient, as well as day-to-day feedback that we’re receiving from them. And we’re continuing to refine those algorithms.
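To make the idea of a prioritized patient list concrete, here is a minimal Python sketch. It is not Virta’s method: the `PatientForecast` fields, the 0-to-1 risk scale and the equal weighting of the three predictions are all assumptions made for illustration. It only shows the general shape of combining several forecasts into one score and sorting by it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientForecast:
    """Hypothetical per-patient predictions, each scaled 0 (no concern) to 1 (high concern)."""
    patient_id: str
    glucose_risk: float        # predicted risk that glucose control worsens
    weight_risk: float         # predicted risk of an unfavorable weight trend
    disengagement_risk: float  # predicted risk of dropping out of treatment


def priority(forecast: PatientForecast) -> float:
    # Assumption: the three predictions are weighted equally.
    return (forecast.glucose_risk
            + forecast.weight_risk
            + forecast.disengagement_risk) / 3


def rank_patients(forecasts: List[PatientForecast]) -> List[PatientForecast]:
    """Return patients ordered from most to least in need of attention."""
    return sorted(forecasts, key=priority, reverse=True)
```

In a sketch like this, a health coach’s view would simply display the result of `rank_patients` from top to bottom. A real system would choose the weights, and the underlying predictive models, based on clinical evidence rather than a simple average.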

Data analytics and data science are evolving even as we speak. If we were to have this same conversation in 2020 or 2022, what would you be talking about when we talk about data science?

S. Clark: This is my wish list. So, you know, one thing I’d love to see is the acceptance of artificial intelligence more broadly across the whole country and the whole world. There’s a stigma around some aspects of artificial intelligence, simply because it has a very scary science fiction vibe to it that turns people off, when in fact, a lot of artificial intelligence is going to be used to basically just provide everyone with their own personal assistant. So, imagine if you had your own personalized Siri that can help you with all kinds of aspects of your daily life, from small daily life stuff to your job. I would love it if I didn’t have to search through my company’s meeting room list to find an available meeting room where I can go get together with seven other colleagues. I want something to just do that for me. So that’s one area.

I think there’s another area where I’d just love to see better representation in the data science space. I’m really looking forward to a time and a space where, if you take any cross-section of citizens by race, gender, sex, creed, religion or any category you want to look at, and you slice and dice it, you find equal representation working on data science and predictive analytics. I think it’s really important to have better representation so we don’t get into this situation where we have this unconscious bias. So, I’d love to see that.

And then, coming from the federal government consulting world, I’d also love to see the adoption of a lot more artificial intelligence in the way that the US governs itself. I think there are things that we do manually as human beings that could be done a lot better. If we could take advantage of that in the federal government, and the federal government is actually doing that to a degree, I think that’s going to improve citizen services, make more efficient use of tax dollars and ultimately just result in a better life for Americans. So, those would be three things that I would love to see on my wish list the next time we get together.

J. Maguire: I like the optimism, to be sure. Jim, what are we going to be talking about when we talk about data science a few years in the future? It’s hard to predict, but heck, why not give it a try.

J. McCarter: I think that data science together with software engineering, user experience design, the ability to be mobile and remote, those elements together are going to turn medicine upside down. So, if you think about how medicine is delivered today, I would say it’s delivered very unsuccessfully, right? We have a lot of capital investment in hospitals. People generally go to a medical clinic to see their doctor, but they only see their doctor maybe once a year, or maybe 2-3 times a year for fifteen minutes if they’re dealing with a chronic condition.

So, we deliver medical care in a very old-fashioned way. But if you look at where the expense is in the system, the majority of dollars for medicine are being spent on chronic conditions. Mostly chronic metabolic conditions. The solution to those is not to build more hospitals. It’s not to deliver more pharmaceuticals and medical imaging and surgery through hospitals. The way to deal with that is through behavior change. The way to help support behavior change is through models like ours that use continuous remote care. So I think what you’re going to see over the next decade is that the majority of dollars spent on medicine are going to shift from in-person to remote.

Most of this care, most chronic metabolic conditions, can be dealt with remotely. They don’t require in-person visits. If we want to get a handle on that cost curve and bring that cost curve down, it’s going to have to be using technology to drive behavior change, not technology to deliver more pharmaceuticals and surgeries.

J. Maguire: So, it’s really kind of a decentralized medical model that you’re talking about.

J. McCarter: I think so, yeah. The whole medical space is ripe for disruption. As you look at how you actually deliver that continuous remote care, data science has to be at the core.