Published in April 2025
We have previously discussed how generative artificial intelligence (genAI) can be applied to literature reviews and the development of global value dossiers. Here, we take a deep dive into its applicability to network meta-analysis (NMA). Using a case study and evaluating the strengths and weaknesses of AI for NMA, we explore its potential to improve the precision and speed of network meta-analyses, and the further considerations required when NMAs inform health technology assessments (HTAs).
In their 2024 article, Reason et al. examined the use of genAI to conduct semi-automated NMAs by using a large language model (LLM; GPT-4) to extract data, generate and run the analysis code, and write up the results.1
The study repeated this process twenty times for each of four example datasets, and compared the results of these AI-conducted analyses with NMAs conducted without the use of AI.
Overall, the study shows that using genAI for data extraction, code generation and report writing provides a fast and largely accurate way to automate an NMA workflow, even if human input may be needed at various stages and the accuracy of results and reporting needs to be independently verified.
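To make this concrete, the sketch below illustrates what an LLM-driven data extraction step of this kind could look like in Python. It is our own illustration rather than the pipeline used in the study; the prompt wording, output schema and function names are assumptions for illustration only.

```python
# Illustrative sketch only: prompting an LLM to extract arm-level trial data
# for an NMA. Model name, prompt wording and output schema are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACTION_PROMPT = """You are assisting with a network meta-analysis.
From the trial report below, extract one record per treatment arm with:
study_id, treatment, n_randomised, n_events (primary binary outcome).
Return a JSON list only, with no commentary.

Trial report:
{report_text}
"""

def extract_arm_data(report_text: str) -> list[dict]:
    """Ask the LLM for structured arm-level data from a single trial report."""
    response = client.chat.completions.create(
        model="gpt-4",  # the study used GPT-4; any capable model could be substituted
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(report_text=report_text)}],
        temperature=0,  # favour reproducible extractions
    )
    # The raw output must still be checked by a human against the source publication.
    return json.loads(response.choices[0].message.content)
```

In practice, any extraction produced this way would be reconciled against the source publications before being carried forward into the analysis.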
However, before wholly adopting this technology in NMAs within the pharmaceutical industry and specifically within the context of HTA, the following challenges must be considered:
While the approach described by Reason et al. automates large parts of the NMA workflow, there are significant aspects that it does not capture. For example, the study did not include a comprehensive feasibility assessment to determine which trials should be included, whether trials were sufficiently similar to compare, and which analyses should be conducted. The analyses assumed that the studies to be included had already been identified and were sufficiently homogeneous for analysis via NMA. However, this feasibility process usually consists of strategic, statistically and clinically informed decision-making, which often requires collaboration between multiple stakeholders, such as statisticians, clinicians and other healthcare or health economics and outcomes research (HEOR) professionals.
Any implicit ‘decision-making’ done by genAI happens in a ‘black box’ and is therefore not transparent to the researcher. This contrasts with the standard HTA process, in which every decision must be clearly and strongly justified. For example, AI can be instructed to use specific NMA code (e.g. National Institute for Health and Care Excellence [NICE] recommended NMA model code), but how precisely AI would follow an analysis plan, and how much human oversight this would require, is unclear. There may be efficiencies in AI interpretation of an analysis plan, but there is no guarantee (without oversight) that the intended instructions would be followed.
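As an illustration of the kind of pre-specified model code referred to above, the sketch below implements a simple fixed-effect binomial-logit NMA in Python with PyMC, using made-up data. The NICE-recommended reference code is typically distributed as WinBUGS/R, so this is only a rough analogue; the parameterization, priors and data here are our own assumptions.

```python
# Rough Python/PyMC analogue of a fixed-effect binomial-logit NMA model;
# all data are made up and priors are illustrative only.
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Arm-level data: study index, treatment index (0 = reference), events, sample size
study = np.array([0, 0, 1, 1, 2, 2])
treat = np.array([0, 1, 0, 2, 1, 2])
events = np.array([12, 18, 9, 15, 20, 24])
n = np.array([100, 100, 80, 80, 120, 120])
n_studies, n_treats = 3, 3

with pm.Model() as fe_nma:
    # Study-specific baseline log-odds, expressed on the network reference
    # treatment (a simplification of the arm-based TSD parameterization)
    mu = pm.Normal("mu", 0.0, 10.0, shape=n_studies)
    # Treatment effects (log-odds ratios) relative to the reference treatment
    d = pm.Normal("d", 0.0, 10.0, shape=n_treats - 1)
    d_all = pt.concatenate([pt.zeros(1), d])  # effect of the reference fixed at 0
    p = pm.Deterministic("p", pm.math.invlogit(mu[study] + d_all[treat]))
    pm.Binomial("obs", n=n, p=p, observed=events)
    trace = pm.sample(2000, tune=1000, chains=4, random_seed=1)
```

An LLM could, in principle, be instructed to populate the data block of a fixed template such as this rather than writing the model from scratch, but the mapping from extracted data to model inputs would still need independent human checking.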
While genAI prompting may be an efficient way of completing key NMA steps, it is unclear how much time (if any) would be saved in practice at the analysis stage. Existing NMA processes already incorporate a large degree of automation via standardized and quality-controlled template code, so genAI prompting may not actually save time for this step. In contrast, data extraction and report writing are more promising sources of efficiency gains with genAI, although we would still strongly recommend independent human modification and verification at these stages. That said, at least initially, substantial time investment would be required either to manually adapt NMA reports generated by AI or to develop detailed, case-specific prompts that ensure accuracy as well as adherence to specific style and content requirements in reporting.
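As one example of the kind of independent verification we would recommend for AI-drafted reports, the sketch below cross-checks whether the effect estimates produced by the analysis actually appear in the drafted text before it goes to human review; the function names, report structure and results are hypothetical.

```python
# Illustrative sketch: cross-check numbers quoted in an AI-drafted report
# against the analysis output before human review. All names are assumptions.
import re

def numbers_in(text: str) -> set[str]:
    """Collect decimal numbers (e.g. '0.72', '1.05') appearing in a report."""
    return set(re.findall(r"-?\d+\.\d+", text))

def check_report(report_text: str, results: dict[str, float], dp: int = 2) -> list[str]:
    """Return the analysis results that never appear (to `dp` decimals) in the report."""
    quoted = numbers_in(report_text)
    return [name for name, value in results.items()
            if f"{value:.{dp}f}" not in quoted]

# Example usage with made-up results
results = {"OR B vs A": 0.72, "OR C vs A": 1.05}
draft = "Treatment B reduced the odds of the event versus A (OR 0.72)."
print(check_report(draft, results))  # ['OR C vs A'] -> flag for human follow-up
```

A check of this kind only flags missing or mismatched figures; it does not replace a full human read-through of the interpretation and wording.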
Where available, national-level criteria for the use of AI in HTA should be fulfilled. For many countries, HTA guidelines currently offer little guidance on how AI fits within the analysis and review process. In the UK, however, the NICE position statement on the use of AI for evidence synthesis states that (amongst other criteria): there must be engagement with NICE regarding plans to use AI; AI should be used to augment rather than replace human input; its usage must be transparent and justified; consideration must be given to compliance with copyright; and the risks of using AI must be reported.2 With specific reference to NMAs, NICE has noted that the use of AI in NMAs is less established than the automation of stages of the literature review process.
Formal guidance on best practice for the use of AI in evidence synthesis should be developed and built upon. For example, the Guidelines International Network (GIN) has published a set of key principles to abide by when using AI in evidence synthesis, and Responsible AI in Evidence SynthEsis (RAISE) guidance and recommendations are being developed by a collaboration of organizations, including Cochrane, to provide targeted guidance on the use of AI in particular evidence synthesis roles.3,4
Overall, AI shows potential to increase the efficiency and accuracy of network meta-analysis. Collaborations such as the new AI methods group, which aim to spearhead the adoption of AI in evidence synthesis methods, will likely drive considerable development over the coming years.
When genAI is used to conduct the analysis portion of an NMA workflow, care needs to be taken with respect to transparency of decision-making and adherence to regulations. Additionally, efficiency gains in this area are currently much less obvious, since running the analysis code is already largely automated even without AI (e.g. via standardized template code).
For incorporation of AI in evidence synthesis methods for HTA, we recommend following national-level guidelines, such as those provided by NICE, where available. In many countries such guidance does not yet exist; while it is being developed, guidance from initiatives such as RAISE and GIN could provide a useful framework to work towards.
While AI does not currently allow for end-to-end automation of the entire NMA workflow, transparency and close human oversight will be essential if this becomes possible in the future. In particular, there are components of NMAs, such as assessing the feasibility of analysis, where human input from statisticians and clinicians will remain invaluable. As such, we would always recommend that any comprehensive AI approach is designed in a modular fashion to enable human input (and verification) at each stage. This should be reflected in future guidance for the use of AI in statistical analyses for HTA.
With intelligent usage growing over time, AI processes will become more established and refined, and there may well be a role for fully automated NMAs using genAI. However, unless future developments in the field manage to open the ‘black box’ of genAI, human verification of every step of the NMA process will remain essential if results are to be used in key decision-making processes.
References
If you would like any further information on the themes presented above, please get in touch, or visit our Statistics page to learn how our expertise can benefit you. Tristan Curteis (UK Head of Statistics), Laura Clark (Senior Statistician), and Matt Hempfling (Statistician) created this article on behalf of Costello Medical. The views/opinions expressed are their own and do not necessarily reflect those of Costello Medical’s clients/affiliated partners.