Highest Rated Comments


nlpster 3 karma

There has been a lot of interest in work on structured data (e.g. lab tests, clinical codes, patient metadata) and lots of progress on imaging data, but comparatively little work and progress on medical free text. Given that so much information is stored in free text, why do you think this is, and is it a problem for advancing clinical ML?

nlpster 3 karma

I hear you on the awful data! In addition to your points, it turns out that quite a few clinicians can't type or spell either (under time pressure).

My concern about automated free text analysis lagging behind is motivated by an epidemiological perspective. There is a growing body of evidence that clinical retrospective studies are going to be a bit 'iffy' if they don't use the clinical notes ([1], inter alia).

Some highlights taken from [1] :

  • Diagnostic codes are not always applied at the time of diagnosis (Tate et al 2011); 22% of patients have a free text diagnosis before a coded diagnosis
  • Some diagnoses are uncoded, and the diagnosis is only recorded in free text (11% in Bogon et al 2013)
  • Prescription codes underestimate the duration of long-term therapeutics usage when compared with free text notes in 6 of 28 patients; further patients had been prescribed therapeutics but there was no prescription record (Close et al 2014)
  • Suicide was not recorded as the reason for death in 74% of evaluated cases (Thomas et al 2013); free text matched on 11%, reducing this to 63%.

The problem is that manually reading these notes to extract the information is very time-consuming for researchers. Automating this task would be a real win for epidemiology. And yes, it's extremely hard...
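To make the comparison in the highlights above concrete, here is a toy sketch of the kind of check involved: flag patients whose free text mentions a condition before (or without) a coded diagnosis. Everything in it is an assumption for illustration: the records, the example symptom, the spelling variants, and the crude negation rule are all made up, and real notes (typos, abbreviations, context) are far messier than a regex can handle.

```python
import re
from datetime import date

# Hypothetical toy records, invented for illustration:
# (patient_id, note_date, note_text, coded_diagnosis_date or None)
RECORDS = [
    ("p1", date(2010, 3, 1), "Pt reports haematuria x2 weeks", date(2010, 6, 15)),
    ("p2", date(2011, 1, 10), "No haematuria. Routine review.", None),
    ("p3", date(2012, 5, 4), "c/o hematuria, refer urology", None),
]

# Assumed spelling variants (UK/US) of one example symptom.
PATTERN = re.compile(r"\bha?ematuria\b", re.IGNORECASE)
# Very crude negation rule; real clinical negation handling is much harder.
NEGATION = re.compile(r"\b(no|denies|without)\s+ha?ematuria\b", re.IGNORECASE)


def text_mentions(text):
    """Return True if the note mentions the symptom and it is not obviously negated."""
    return bool(PATTERN.search(text)) and not NEGATION.search(text)


def classify(records):
    """Label each patient by where the condition shows up: text, code, or neither."""
    result = {}
    for pid, note_date, text, coded in records:
        mentioned = text_mentions(text)
        if mentioned and coded is None:
            result[pid] = "text_only"          # only recorded in free text
        elif mentioned and note_date < coded:
            result[pid] = "text_before_code"   # free text precedes the code
        elif coded is not None:
            result[pid] = "coded"
        else:
            result[pid] = "none"
    return result


if __name__ == "__main__":
    for pid, label in classify(RECORDS).items():
        print(pid, label)
```

Counting the "text_only" and "text_before_code" labels over a cohort gives the sort of percentages quoted above, but only as far as the matching rules can be trusted, which is exactly the hard part.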

[1] Price, Sarah Jane (2016). What Are We Missing by Ignoring Text Records in the Clinical Practice Research Datalink? Using Three Symptoms of Cancer as Examples to Estimate the Extent of Data in Text Format That Is Hidden to Research. https://ore.exeter.ac.uk/repository/handle/10871/21692

edit : list formatting

nlpster 2 karma

Thanks! I will have a listen to that, it looks really interesting. I think the correct link is here

Totally agree about healthcare dynamics. Deciding what a ground-truth is going to be for a project is critical and it's not something that you can decide before you explore the data. A lot of people are put off because they don't get a clean dataset to run some stats to test a hypothesis on, but I find handling the noise is where the fun is!