Sounding the Alarm
It’s 7am here in Utah, and already the sun is bearing down - time to get some thoughts on paper before the heat truly sets in!
In my recent Investigations & CAPA workshop, a question was posed along the lines of: “is it OK that we use our enterprise chatbot to assist with drafting deviation text?”. “Sure,” I replied: “AI assistants can support deviation management in a variety of ways, from historical trend analysis to root cause analysis, among others.” As this particular workflow is meant primarily to comply with 21 CFR 211.192 [investigations into unexplained discrepancies], I obviously had to follow up with the question: “what documentation might you have available to the regulator to demonstrate a validated workflow?” Note the deliberate wordsmithing here: I am referencing the Process Validation Guidance, as the introduction of an AI assistant to meet a GMP predicate rule certainly needs controls to ensure a “high degree of assurance” that the workflow will produce an outcome commensurate with its risk to product/patient.
The PIC/S Guide for Data Management gives us an excellent roadmap here in its outline of the principles of Process/Data Governance [design – operation – monitoring]. If I were the regulator, I would simply ask the site to demonstrate that each of the three governance principles has been met and that the site has a clear understanding of the validated workflow. I would already have ‘targets’ in mind that must be answered during the inspection; here are just two of several:
Inaccurate Information: inaccuracy and hallucination are well-known risks associated with the use of chatbots: generating information that is not true. Anyone who has used chatbots knows this happens on a regular basis (I have to admit that I recently purchased the wrong bike tube based on information the chatbot proposed as a statement of material fact. It was wrong, and I am now stuck with a useless tube; I should have verified with the user manual or Reddit, but that takes time…). The root cause of this risk is difficult to pinpoint and even more difficult to control.
Inspection Strategy: ask the QA associate whether all the facts included in the deviation text are statements of material fact. As the text may have been drafted by the chatbot, did the associate perform interviews with staff to confirm the statements and conclusions the chatbot made? Did the associate account for any recent process changes, given that the model proposes details based on an averaging/weighting of historical [and potentially outdated] knowledge (the model is unaware of the larger quality system, unlike the human)? Verification is unlikely, because the convenience of using a chatbot makes assuming the accuracy of the model output too easy [going to interview staff takes time, for example]. Here I start to lose confidence in the ALCOA of the deviations data (specifically accuracy). Don’t blame the QA associate for this; it is simply human nature to take the path of least resistance. Blame the head of QA or CDO for failing to understand the principles of ICH Q9… Assuming the accuracy of chatbot outputs is a disaster in the making; in my opinion it is only a matter of time until the recalls and regulatory actions start. Inaccurate data in deviations has always been a risk factor considered during inspection, but it was not a main concern because it was very difficult to cite on a 483. In the future it becomes easy to cite if the risk from inaccuracy and hallucination is not clearly identified (design stage), controlled (operations stage), and regularly evaluated (monitoring stage).
Bias: as more repetitive information is fed into the data set and the model is automatically and/or manually fine-tuned, an echo chamber of repeating information will emerge if the process is not governed. This is one manifestation of bias, among others. The risk is closely related to inaccuracy/hallucination, but it is important to break it out as a separate hazard because it must be controlled differently.
Inspection Strategy: request a list of root cause conclusions and look for repeats (a simple screen for this is sketched below). Well-designed and well-controlled AI models are decent critical thinkers, but this takes expertise and risk management from the admin level to the operations level. Without that expertise and an active control strategy, bias is almost certain to manifest itself in repeated information/conclusions. Has the staff considered all potentially relevant contributing factors prior to signing off (per the SOP), or was the model output assumed to be correct (or is the SOP silent)? The PIC/S guide for inspecting a risk management program (a little-known but excellent guidance) proposes three main observations in the universe of risk management, one of which is termed “unfair assumptions”. Here is where this becomes easy to cite.
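To make the “look for repeats” check concrete, here is a minimal sketch of how an inspector (or the site itself, as part of the monitoring stage) might screen exported root cause conclusions for suspicious repetition. This is an illustration only, not a validated tool: the sample conclusions and the 25% flag threshold are my own assumptions, and real records would come from the site’s deviation system.

```python
from collections import Counter

# Illustrative data only: assumes root cause conclusions have been
# exported from the deviation system as free-text strings.
root_causes = [
    "Operator did not follow SOP-123 step 4",
    "Operator did not follow SOP-123 step 4",
    "Inadequate training on aseptic technique",
    "Operator did not follow SOP-123 step 4",
    "Inadequate training on aseptic technique",
]

def flag_repeats(conclusions, threshold=0.25):
    """Flag any conclusion whose share of all deviations exceeds the
    threshold -- a crude screen for echo-chamber bias. The threshold
    is a hypothetical value, not drawn from any guidance."""
    counts = Counter(c.strip().lower() for c in conclusions)
    total = len(conclusions)
    return {c: n for c, n in counts.items() if n / total > threshold}

for conclusion, n in flag_repeats(root_causes).items():
    print(f"REVIEW: '{conclusion}' appears in {n}/{len(root_causes)} deviations")
```

A flagged repeat is not automatically an observation; it is the trigger to ask the question above: were all contributing factors genuinely considered, or was the model output assumed to be correct?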
Historically, these two bullet points have been cited countless times in FDA 483s, obviously. The difference is that the scope of the observation has been limited (2-3 deviations, or a small cluster considered deficient). The risk to the patient has always been present, but the diversity of humans in QA has mitigated the risk by limiting the scope. In the world of chatbots, the scope may be considered “all”: todo, tous, alle… Yes, the human is still in the loop, but can the human still be considered an adequate risk control measure? I think it is time to sound the alarm, as there is currently an absence of ‘effective control measures’ in place and the entire deviations program is vulnerable to collapse during an external inspection. The ungoverned use of chatbots is reckless, and ultimately unfair to the patients we serve. We can do better.
Pete

