BJGP Interviews

Using artificial intelligence techniques for early diagnosis of lung cancer in general practice



Today, we’re speaking to Professor Martijn Schut, Professor of Translational AI in Laboratory Medicine and Professor Henk CPM van Weert, GP and Emeritus Professor of General Practice, both based at Amsterdam University Medical Center.

Title of paper: Artificial intelligence for early detection of lung cancer in GPs’ clinical notes: a retrospective observational cohort study

Available at: https://doi.org/10.3399/BJGP.2023.0489

In most cancers, the prognosis depends substantially on the stage at the start of therapy. Therefore, many methods have been developed to enable earlier diagnosis, for example, logistic regression models, biomarkers, and electronic-nose technology (exhaled volatile organic compounds). However, as most patients are referred by their GP, who keeps life-long histories of registered patients, general practice files might contain hidden information that could be used for earlier case finding. An algorithm was developed to identify patients with lung cancer 4 months earlier, just by analysing their files. Unlike other methods, it used all medical information available in general practice.



Transcript

This transcript was generated using AI and has not been reviewed for accuracy. Please be aware it may contain errors or omissions.


Speaker A

00:00:01.600 - 00:00:55.370

Hello and welcome to BJGP Interviews. I'm Nada Khan and I'm one of the associate editors of the journal. Thanks for taking the time today to listen to this podcast.


Today we're speaking to Professor Martijn Schut, Professor of Translational AI in Laboratory Medicine, and Professor Henk van Weert, GP and Emeritus Professor of General Practice, who are both based at Amsterdam University Medical Center. We're here to discuss their paper, which is titled 'Artificial intelligence for early detection of lung cancer in GPs' clinical notes'.


So, yeah, it's great to see you both here today. And Martijn, I'll come to you first.


I suppose we know that it's important to try and diagnose cancer early, but could you talk us through what's the potential for artificial intelligence here in terms of identifying cancer earlier based on patient records?


Speaker B

00:00:55.810 - 00:01:52.220

Yeah, that's a very interesting question, because the potential goes hand in hand with the huge amount of interest in AI. And I think there are great opportunities, but there are also great challenges.


But talking about the opportunities, especially in the context of the article that we wrote, it's on the data side. On the data side, the digitalisation of electronic health records gives great opportunities.


A lot more is digitalised, and that means that we, in our case, have access to free text. And with the advent of large language models, and other new developments in AI, we also have better ways of making use of those data. So those two combined create a really interesting formula: big opportunities for AI in general practice, and in healthcare in general.


Speaker A

00:01:52.300 - 00:02:05.960

And you mentioned access to free-text records, so what GPs are typing into the records.


But before we get into the study, can you just briefly describe what natural language processing is, and how it can be used on free-text records?


Speaker B

00:02:06.760 - 00:03:10.100

So we know that a lot of clinical risk scores work with features of patients, such as their age and their gender or sex. But of course, a lot of information is also written up in an unstructured way, and in our case that is text.


But we can also think of images and audio. We have access to those data in different ways, and natural language processing is one of them. It means that we give AI access to this text through, for example, advanced models like the ones we now have behind ChatGPT.


But that's only one extreme of the spectrum, because you could also imagine that we simply look through the text with keywords, and if certain keywords are mentioned, you include that in the information that is available to your model.
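
To make the simple, keyword end of that spectrum concrete, here is a minimal sketch of keyword-based feature extraction from a free-text note. The keyword list and function name are illustrative assumptions, not the study's actual lexicon or code.

```python
import re

# Hypothetical keyword list -- an illustrative assumption, not the study's lexicon.
KEYWORDS = ["cough", "haemoptysis", "weight loss", "smoking"]

def keyword_features(note: str) -> dict:
    """Return one binary flag per keyword found in a free-text clinical note."""
    text = note.lower()
    return {
        "kw_" + kw.replace(" ", "_"): int(re.search(r"\b" + re.escape(kw) + r"\b", text) is not None)
        for kw in KEYWORDS
    }

print(keyword_features("Persistent cough for six weeks, smoking 20/day, recent weight loss."))
# {'kw_cough': 1, 'kw_haemoptysis': 0, 'kw_weight_loss': 1, 'kw_smoking': 1}
```

These flags can then be fed to a model alongside coded features such as age and sex.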


Speaker A

00:03:10.260 - 00:03:18.820

And Henk, I don't know if you want to comment on what we already know about clinical scoring systems for early diagnosis of cancer.


Speaker C

00:03:19.140 - 00:04:21.310

The problem with what we already know is that we know things because they have been coded in the past. If you look at the ways to access data, the only way used to be by using codes.


And the big jump forward is made by using not only codes, but also text, because codes will always be replicating themselves.


By which I mean that a GP who likes to make notes of what he has been speaking about with patients cannot code all the things that he writes down.


So codes will always form a very limited extraction of the content of a consultation, and will never present us with new information, because codes only exist when the information was already there; otherwise there would be no codes. So, implicitly, there is a replication of what we already know when we code things.


Speaker A

00:04:21.899 - 00:04:49.139

Yeah, absolutely.


And I work with a colleague called Sarah Price, who's done some research around coding, and she's shown that clinical coding can be biased depending on the outcome. So people who have bladder cancer are more likely to have codes for haematuria, or blood in the urine.


So, yeah, there could be a discrepancy in how clinicians code things rather than writing them in the free text.


Speaker C

00:04:49.139 - 00:05:09.160

Yeah, because in the past some marvellous research has been done, by Willie Hamilton, for example, and Julia Hippisley-Cox is well known, but they had to use codes. So there was never a jump forward. And I think that now, with the aid of natural language processing, we can make that jump forward.


Speaker A

00:05:09.559 - 00:05:36.620

And the methods that you used here are quite complex, but I'll try to summarise them briefly.


So essentially you analysed the electronic health records of over half a million Dutch patients, and used natural language processing techniques and machine learning to look back at the records of people diagnosed with cancer. And then you looked to see what data in those records could be used to predict lung cancer.


But is there anything you want to add to that, just for a lay audience, Martijn?


Speaker B

00:05:36.700 - 00:06:20.170

Yeah, one nuance, a small correction on that: we don't only look at the patients with cancer, but at cases and controls. We look at both, because the AI needs to be able to distinguish the cases from the controls.


I think that's one important distinction, because in healthcare, fortunately, we always have to deal with low prevalences. We don't have many patients compared to the healthy ones. That is part of what makes these kinds of models complex.


I think that is also important to realize when you develop these kinds of models.
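
As a rough illustration of that case-control setup, a minimal sketch in Python follows. The toy notes, labels, and pipeline are assumptions made for illustration, not the study's data or model; reweighting the rare positive class is one common way to keep the many controls from swamping the few cases.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy free-text notes and case/control labels -- purely illustrative.
notes = [
    "persistent cough, smoker, unintended weight loss",   # case
    "sore throat, resolved without treatment",            # control
    "routine blood pressure check 140/90",                # control
    "haemoptysis and chest pain, heavy smoker",           # case
    "ankle sprain after football, advised rest",          # control
    "repeat prescription, hypertension 150/95",           # control
]
labels = [1, 0, 0, 1, 0, 0]  # 1 = lung cancer case, 0 = control

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    # Reweight the rare positive class so the model is not
    # dominated by the much larger group of healthy controls.
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(notes, labels)
print(model.predict_proba(["chronic cough in a smoker"])[:, 1])
```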


Speaker C

00:06:20.810 - 00:06:22.250

May I add something? Because...


Speaker A

00:06:22.250 - 00:06:22.730

Yes, please.


Speaker C

00:06:22.730 - 00:07:09.230

Because if you look at the scientific side of it: if you develop a prediction model for a cancer, for example, you used to have to do that with a logistic regression method. And logistic regressions can contain many variables, but not as many as you can use with the new large language models.


So you can analyse many more variables; that's one point. And the second point is that you can analyse those variables in connection to each other.


That's a great advantage compared to the past. If you look at the model that we used for this research, I think we used two layers of 100 variables in different relations to each other.


So that gives you 100 times 100 possibilities.
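
A minimal sketch of that contrast, under the assumption that 'two layers of 100 variables' corresponds to a feed-forward network with two hidden layers of 100 units (the study's actual architecture may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Synthetic data where the outcome depends on an *interaction* of two variables,
# which a plain logistic regression cannot represent without a hand-made term.
rng = np.random.default_rng(0)
X = rng.random((1000, 50))
y = (X[:, 0] * X[:, 1] > 0.4).astype(int)

lr = LogisticRegression(max_iter=1000).fit(X, y)

# Two hidden layers of 100 units each: hidden units combine input variables,
# so interactions between variables can be learned from the data.
mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=500, random_state=0).fit(X, y)

print("logistic regression accuracy:", lr.score(X, y))
print("two-hidden-layer network accuracy:", mlp.score(X, y))
```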


Speaker A

00:07:09.470 - 00:07:14.630

Talk us through what you did develop here. Maybe Martijn, you can try to explain.


Speaker B

00:07:14.630 - 00:08:21.700

Yeah, can I start with this: we picked up a signal.


So we developed prediction models taking in all of these, as you said, over half a million patients, all the clinical notes from the consultations that they had, and put it all in a prediction model. We picked up a signal: we can make a prediction model that performs well. So that's one.


But the second step is that ideally we would also like to get some information from that model: what does it use to predict, what contributes to a prediction of lung cancer?


And then we come to the nature of the complex methods that we use: they are black box. We are not able to open them up and see what is in them.


And that is actually, I'd say, looking forward: we would like to peek into those boxes to see what triggers these predictions for lung cancer, which can then be used as clinical knowledge, independent of the algorithm or the model that we developed.


Speaker A

00:08:21.780 - 00:08:34.980

And the model that you developed actually performed quite well in terms of sensitivity, in distinguishing which patients should be referred for potential lung cancer symptoms.


Speaker B

00:08:36.669 - 00:09:05.149

Correct. I'm going to hand that off to Henk.


Maybe just to say in between: when I mention predictive performance, I'm talking about the C statistic, or the area under the curve, which is the first performance criterion. If that doesn't go well, then we should try other things. But that performed well.


And then we translated that, indeed, into clinically relevant sensitivity and specificity. And that's where Henk played a big role.
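
For readers unfamiliar with the terms: the C statistic is the area under the ROC curve (AUC), and sensitivity and specificity follow once a risk threshold is fixed. A small sketch with toy numbers, not the study's results:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and predicted risks -- illustrative only, not the study's output.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.02, 0.10, 0.05, 0.45, 0.40, 0.01, 0.80, 0.55, 0.20, 0.35])

print("C statistic (AUC):", roc_auc_score(y_true, y_prob))

# Fixing a risk threshold turns the scores into a referral decision,
# from which sensitivity and specificity follow.
threshold = 0.30
y_pred = y_prob >= threshold
tp = np.sum(y_pred & (y_true == 1))
fn = np.sum(~y_pred & (y_true == 1))
tn = np.sum(~y_pred & (y_true == 0))
fp = np.sum(y_pred & (y_true == 0))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```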


Speaker A

00:09:05.490 - 00:09:06.450

Yeah, go ahead, Henk.


Speaker C

00:09:06.770 - 00:10:48.000

Yeah. First I'd like to say something about the content of what we found, because we did a small exercise to discover what was inside the black box.


For that, though, we need much more money to do a good project and really come up with answers. But we found some predictors which were quite astonishing.


There are two things I always tell as an example. The first is that when a GP starts to prescribe incontinence material to a man, that man has an increased risk of lung cancer, which you can of course explain: if you have lung cancer you start coughing, and when you cough there is a small chance that you wet yourself. The other thing we found is that the number of slashes in the file was related to the risk of lung cancer.


And that was quite a big question for us: what would that mean? In the end we came up with the explanation that there is a connection between lung cancer and cardiovascular diseases.


And that connection is, of course, smoking. And GPs always use a slash to note blood pressures. So if you have a lot of slashes in your file, you have a lot of blood pressures noted.


And if you have a lot of blood pressures noted, then you probably have high blood pressure, which is related to lung cancer. Those are two small explanations of what you find inside the black box as we used it.


And if you see what's inside, you can always think of an explanation, which is the funny thing, of course.
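
As a concrete illustration of that slash signal, here is a sketch of how such a surface feature could be computed from a note. This hand-crafted version is an assumption for illustration; in the study the signal was picked up by the model itself rather than coded explicitly like this.

```python
import re

def slash_features(note: str) -> dict:
    """Count raw slashes and blood-pressure-like readings such as '140/90'."""
    return {
        "n_slashes": note.count("/"),
        "n_bp_readings": len(re.findall(r"\b\d{2,3}/\d{2,3}\b", note)),
    }

print(slash_features("BP today 150/95; recheck in 4 weeks, previous 140/90"))
# {'n_slashes': 2, 'n_bp_readings': 2}
```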


Speaker A

00:10:48.720 - 00:10:58.240

Yeah. So do you think models like this could help clinicians target investigations, like chest X-rays or CTs, in people who might be at risk of lung cancer?


Speaker C

00:10:58.480 - 00:11:59.590

Of course. And the reason why we did this is that you can use a model like this for a number of applications.


If you use it in a diagnostic way, you will have other concerns about your sensitivity and specificity than when you use it in, for example, a screening way. If you use it for screening, the number of positives will be much lower than when you use it in a diagnostic sense.


So it is the way you want to use this algorithm that determines what threshold you will use. We worked out the 3% threshold, because that is the referral threshold defined by NICE a few years ago.


And if you work with 3%, you need to investigate 33 people to find one with lung cancer.
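
The arithmetic behind that figure: if referred patients have a 3% probability of lung cancer (the positive predictive value at the threshold), the number needed to investigate is 1 / 0.03, roughly 33. A one-line sketch:

```python
# Number needed to investigate to find one cancer, assuming the risk at the
# referral threshold equals the positive predictive value of a referral.
def number_needed_to_investigate(ppv: float) -> float:
    return 1.0 / ppv

print(round(number_needed_to_investigate(0.03)))  # 33: investigate ~33 people per cancer found
```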


Speaker A

00:11:59.830 - 00:12:26.810

Yeah. I'm also thinking about sort of the potential practical application of something like this in a practice.


So if you were bringing this sort of tool to a general practice, would you be able to then suggest what thresholds they would be interested in, or what the availability was of certain tools like chest X-rays? How do you think this could be applied in practice? Martijn or Henk?


Speaker C

00:12:26.970 - 00:12:35.450

Yeah. For example, if I had to make the choice now, I would go for the 3%, because that is the threshold advised by NICE.


Speaker A

00:12:36.250 - 00:12:38.010

Martijn, do you want to add anything to that?


Speaker B

00:12:38.090 - 00:13:14.510

Yeah.


It's interesting, talking about thresholds, that it is important to realize that these models are not fixed, in the sense that you can configure them with a different threshold depending on the invasiveness of a follow-up action, the costs of a follow-up action, and the severity of the disease. So this extends to other diagnoses, to other conditions.


But it's important to realize that these models are kind of mouldable, so you can still use one model in different situations.


Speaker A

00:13:15.030 - 00:13:33.270

And just in terms of applying something like this, how do you imagine it might work at a practice level, at the GP's level? So might it suggest an alert or something if a patient was above a certain threshold, to trigger an investigation?


Or how do you envisage this being used in practice?


Speaker B

00:13:34.070 - 00:14:47.990

It could very well manifest as a flagging system. But bringing a model from theory, from research, into practice takes a number of steps which in this case still need to be done.


So we took data from three big cities in the Netherlands, on which we externally validated the models that we used: we developed the model in one city and then externally validated it in the other two.


So one big step is external validation, but then there is also the clinical uptake, setting the thresholds, and the technological infrastructure in different GP systems and connections to other systems.


And when you do the updating, that's another big challenge, as is the step of maintaining the model afterwards, because it's not something that we set once and is then fixed in time.


Of course we have to be aware of the fact that these models need to be maintained: we have problems of drift, the setting might change, the way things are registered might change, and all of that has implications for how useful this model remains in practice.
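
One hedged sketch of what such post-deployment maintenance could look like: recomputing the model's AUC on each new batch of outcomes and flagging periods where it drops. The batch format and the 0.75 alert level are illustrative assumptions, not an established monitoring protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def monitor_auc(batches: dict, alert_below: float = 0.75) -> None:
    """Recompute AUC per period of (labels, predicted risks) and flag possible drift."""
    for period, (y_true, y_prob) in batches.items():
        auc = roc_auc_score(y_true, y_prob)
        flag = "  <-- below alert level, investigate possible drift" if auc < alert_below else ""
        print(f"{period}: AUC = {auc:.2f}{flag}")

# Toy batches: random scores hover around AUC 0.5 and so trigger the flag.
rng = np.random.default_rng(0)
batches = {
    "2024-Q1": (rng.integers(0, 2, 200), rng.random(200)),
    "2024-Q2": (rng.integers(0, 2, 200), rng.random(200)),
}
monitor_auc(batches)
```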


Speaker A

00:14:48.150 - 00:15:41.470

And one thing I wanted to touch on is that you mentioned that these sorts of models will use hundreds of different variables.


And I think the way that a lot of GPs practise, when they're thinking about cancer, is that they're thinking about maybe five to ten alarm symptoms, or red flag symptoms, that they're attuned to.


So when their patient presents with those, they're already thinking: right, I need to be doing something, maybe making a referral or ordering more tests.


But in this sort of model, because there could potentially be hundreds of variables, it's more that the system is learning, or as Martijn says, flagging, which patients might need anything further, alongside the clinician's intuition or concern about a patient's symptoms. So it's in addition to the clinician's intuition and thought processes as well.


Speaker C

00:15:41.790 - 00:16:49.340

Of course, this will be very disruptive to a GP's mind, because he will have to refer patients who are not, in his mind, at risk. And that's not what we are used to doing.


I mean, the GP is somebody who calculates the risks for patients, and if the risks are low, in his mind he will not refer. And if you don't know how a risk is made up, then of course the mind of a GP will be in trouble.


Because one thing you have to say: what we have seen until now is that if you speed up the process of diagnosing cancer, and so surgery, by four weeks, there will be a 6% decrease in mortality, which is a huge gain.


So I think that in the end GPs will be prepared to accept that the system might be better than they are, because that's the step you have to accept.


Speaker A

00:16:51.500 - 00:17:10.119

It's really fascinating work, and obviously, as Martijn has mentioned, there's a lot more work to be done on these AI-driven and natural language processing-driven models. But it's very exciting, and I can already see the potential application for lots of different cancers, not just lung cancer.


So is that where you're heading now with this?


Speaker C

00:17:10.599 - 00:17:25.319

Of course. This project is almost 10 years old now, and from the start we saw the potential, not only for cancer but also for many other diseases.


Speaker B

00:17:26.039 - 00:19:01.850

So in addition to what Henk just mentioned: there's a lot of variety in different words and, let it be said, different languages are also a challenge.


If you look at texts, that is also something we have to tackle, technically, for the different models and approaches that we have, but also clinically, because these words have different meanings.


And then also, as you say: yes, this was for lung cancer, and we did similar work for…
