Publication
Assessing the quality and reliability of ChatGPT’s responses to radiotherapy-related patient queries: GPT-3.5 versus GPT-4
dc.contributor.author | Grilo, Ana | |
dc.contributor.author | Marques, Catarina | |
dc.contributor.author | Corte-Real, Maria | |
dc.contributor.author | Carolino, Elisabete | |
dc.contributor.author | Caetano, Marco | |
dc.date.accessioned | 2024-08-29T14:10:06Z | |
dc.date.available | 2024-08-29T14:10:06Z | |
dc.date.issued | 2024-06 | |
dc.description.abstract | Patients frequently resort to the Internet to access cancer information. However, these websites often lack content accuracy and readability. Recently, ChatGPT, an artificial intelligence-powered chatbot, has emerged as a potential paradigm shift in how cancer patients can access vast amounts of medical information. However, given that ChatGPT was not explicitly trained for oncology-related inquiries, the quality of the information it provides still needs to be verified. Evaluating the quality of responses is crucial, as misinformation can foster a false sense of knowledge and security, lead to noncompliance, and delay appropriate treatment. Objective: This study aims to evaluate the quality and reliability of ChatGPT’s responses to standard patient queries about radiotherapy, comparing the performance of GPT-3.5 and GPT-4. Methods: Forty commonly asked radiotherapy questions were selected and submitted to both versions. Responses were evaluated by six radiotherapy experts using a General Quality Score (GQS), assessed for consistency and similarity using the cosine similarity score, and analyzed for readability using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). Statistical analysis was performed using the Mann-Whitney test. Results: GPT-4 demonstrated superior performance, with higher GQS values and no low-scoring responses, in contrast to GPT-3.5. The Mann-Whitney test revealed statistically significant differences for some questions, with GPT-4 generally receiving higher ratings. The cosine similarity score indicated substantial similarity and consistency between the responses of the two versions. Readability scores for both versions corresponded to college level, with GPT-4 scoring slightly better on FRES (35.55) and FKGL (12.71) than GPT-3.5 (30.68 and 13.53, respectively). Responses from both versions were deemed challenging for the general public to read. Conclusions: While GPT-4 generates more accurate and reliable responses than GPT-3.5, both models present readability challenges for the public. ChatGPT shows potential as a valuable resource for addressing common patient queries related to radiotherapy. However, it is crucial to acknowledge its limitations, including the risks of misinformation and readability issues. | pt_PT
dc.description.version | info:eu-repo/semantics/publishedVersion | pt_PT |
dc.identifier.citation | Grilo A, Marques C, Corte-Real M, Carolino E, Caetano M. Assessing the quality and reliability of ChatGPT’s responses to radiotherapy-related patient queries: GPT-3.5 versus GPT-4. JMIR Preprints. 2024 Jun 27:63677. | pt_PT |
dc.identifier.doi | 10.2196/preprints.63677 | pt_PT |
dc.identifier.uri | http://hdl.handle.net/10400.21/17642 | |
dc.language.iso | eng | pt_PT |
dc.peerreviewed | no | pt_PT |
dc.relation.publisherversion | https://preprints.jmir.org/preprint/63677 | pt_PT |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | pt_PT |
dc.subject | Radiotherapy | pt_PT |
dc.subject | Artificial intelligence | pt_PT |
dc.subject | ChatGPT | pt_PT |
dc.subject | Large language model | pt_PT |
dc.subject | Patient information | pt_PT |
dc.title | Assessing the quality and reliability of ChatGPT’s responses to radiotherapy-related patient queries: GPT-3.5 versus GPT-4 | pt_PT |
dc.type | preprint | |
dspace.entity.type | Publication | |
oaire.citation.startPage | 63677 | pt_PT |
oaire.citation.title | JMIR Preprints | pt_PT |
person.familyName | Grilo | |
person.familyName | Carolino | |
person.givenName | Ana | |
person.givenName | Elisabete | |
person.identifier.ciencia-id | 6E17-A1D8-95BE | |
person.identifier.ciencia-id | 1216-EFA3-1E0F | |
person.identifier.orcid | 0000-0003-1986-8814 | |
person.identifier.orcid | 0000-0003-4165-7052 | |
person.identifier.rid | F-1012-2015 | |
person.identifier.scopus-author-id | 55936359500 | |
person.identifier.scopus-author-id | 25821697000 | |
rcaap.rights | openAccess | pt_PT |
rcaap.type | preprint | pt_PT |
relation.isAuthorOfPublication | 3f6308c3-b858-4307-bfc6-4533bc5181c0 | |
relation.isAuthorOfPublication | 77930d39-ed34-44dc-a4a6-9bf833e5e688 | |
relation.isAuthorOfPublication.latestForDiscovery | 77930d39-ed34-44dc-a4a6-9bf833e5e688 |
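The abstract above names three quantitative measures: cosine similarity between the two models' answers, the Flesch Reading Ease Score (FRES), and the Flesch-Kincaid Grade Level (FKGL), with group differences tested via the Mann-Whitney test. The study's own code is not part of this record; the following is a minimal Python sketch of how such metrics are commonly computed. The TF-IDF vectorization and the vowel-group syllable counter are illustrative assumptions, not choices taken from the paper.

# Minimal sketch (not the study's code) of the metrics named in the abstract.
# Assumptions, flagged because the record does not specify them:
#   - TF-IDF vectorization before cosine similarity
#   - a crude vowel-group heuristic for syllable counting in FRES/FKGL
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import mannwhitneyu

def count_syllables(word):
    # Vowel-group heuristic; published tools typically use dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    # Standard formulas:
    # FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    # FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text) or ["_"]
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # words per sentence
    spw = syllables / len(words)   # syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl

def answer_similarity(answer_gpt35, answer_gpt4):
    # Cosine similarity of TF-IDF vectors of the two answers to one question.
    tfidf = TfidfVectorizer().fit_transform([answer_gpt35, answer_gpt4])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Per-question expert GQS ratings for the two models could then be compared:
# stat, p = mannwhitneyu(gqs_gpt35, gqs_gpt4)

Under these formulas, the reported FRES of roughly 31-36 and FKGL of roughly 13 correspond to college-level text, consistent with the abstract's conclusion that both models' responses are challenging for the general public to read.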
Files
Original bundle
- Name: Assessing the quality and reliability of ChatGPT’s responses to radiotherapy-related patient queries_GPT-3.5 versus GPT-4.pdf
- Size: 593.15 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon at submission