A Network Meta-Analysis to Compare Effectiveness of Baricitinib and Other Treatments in Rheumatoid Arthritis Patients with Inadequate Response to Methotrexate

Background/Objectives: This article compares the effectiveness of baricitinib (BARI) 4 mg (oral, Janus kinase [JAK] 1/2 inhibitor) versus other targeted synthetic and biologic disease-modifying antirheumatic drugs, each in combination with methotrexate (MTX), in patients with moderate-to-severe rheumatoid arthritis and an inadequate response (IR) to MTX.

Methods: A systematic literature review was conducted to identify randomized controlled trials (RCTs) of the interventions of interest. Bayesian network meta-analyses (NMA) were used to compare American College of Rheumatology (ACR) responses at 24 weeks. A series of prespecified sensitivity analyses addressed the potential impact of, among other factors, baseline risk, treatment effect modifiers, and trial design on treatment response.

Results: Nineteen RCTs were included in the NMA (primary analysis). For ACR20, BARI 4 mg + MTX was found to be more effective than adalimumab (ADA) 40 mg + MTX (odds ratio [OR] 1.33), abatacept (ABA) 10 mg + MTX (IV every 4 weeks) (OR 1.45), infliximab (IFX) 3 mg + MTX (IV every 8 weeks) (OR 1.63), and rituximab (RTX) 1000 mg + MTX (OR 1.63). No differences were found for ACR50. For ACR70, BARI 4 mg + MTX was more effective than ADA 40 mg + MTX (OR 1.37), ABA 10 mg + MTX (OR 1.86), and RTX 1000 mg + MTX (OR 2.26). A sensitivity analysis including 10 additional RCTs in which up to 20% of patients had prior biologic use showed BARI 4 mg + MTX to be more effective than tocilizumab (TCZ) 8 mg + MTX for ACR20 (OR 1.44). Results of all sensitivity analyses were consistent with the direction and magnitude of the primary results. Key limitations include the time span over which the trials were conducted (1999–2017), during which patient characteristics and treatment approaches may have changed.

Conclusion: This NMA suggests that BARI 4 mg + MTX is an efficacious treatment option in the MTX-IR population, as evidenced by the robustness of the results.

Table S1. List of criteria for the inclusion and exclusion of studies during the initial screening process (Level 1)
Table S2. List of criteria for the inclusion and exclusion of studies during the Level 1b screening process
Table S3. List of criteria for the inclusion and exclusion of studies during the full-text review process (Level 2)
Table S9. All trials included in the analysis in the MTX-IR population (N=29): overview
Table S10. Overview of pre-planned sensitivity analyses
Table S11. Percentage of patients achieving ACR20/50/70 response per trial and treatment arm: MTX-IR population
Table S12. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR20 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S13. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR50 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S14. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR70 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S15. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR20 response at week 24: MTX-IR (simultaneous random-effects model)
Table S16. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR50 response at week 24: MTX-IR (simultaneous random-effects model)
Table S17. Primary analysis: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR70 response at week 24: MTX-IR (simultaneous random-effects model)
Table S18. Baseline-risk adjustment (primary analysis): relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR20 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S19. Baseline-risk adjustment (primary analysis): relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR50 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S20. Baseline-risk adjustment (primary analysis): relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR70 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S21. Sensitivity analysis including trials with prior bDMARD use of up to 20%: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR20 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S22. Sensitivity analysis including trials with prior bDMARD use of up to 20%: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR50 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S23. Sensitivity analysis including trials with prior bDMARD use of up to 20%: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR70 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S24. Sensitivity analysis excluding trials conducted solely in Asia-Pacific and/or with low MTX dose: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR20 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S25. Sensitivity
Table S26. Sensitivity analysis excluding trials conducted solely in Asia-Pacific and/or with low MTX dose: relative treatment effect of pairwise comparisons expressed as posterior median odds ratios (with 95% CrIs), ACR70 response at week 24: MTX-IR (simultaneous fixed-effects model)
Table S27. Model fit summary
Figure S1. PRISMA diagram
Figure S2. Response rate in PBO + MTX arm vs. ln(risk ratio): primary analysis
Figure S3. Sensitivity analysis excluding trials conducted solely in Asia-Pacific and/or with low MTX dose: network of evidence, simultaneous fixed effects: ACR20
Figure S4. BARI 4 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S5. TOFA 5 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S6. ADA 40 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S7. CZP + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S8. ETN + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S9. GOL 50 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S10. IFX 3 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S11. ABA 10 mg + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S12. ABA SUBCUT + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S13. RTX + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses
Figure S14. TCZ + MTX: estimated posterior median ACR response rates across primary and main sensitivity analyses

Inclusion and Exclusion criteria
The inclusion and exclusion criteria were based on a strategy (Table 1) that identified the population and disease condition, interventions, comparators, outcomes, and study types of interest (the PICOS criteria). The criteria listed in Table 1 were applied after the initial broad searches had been completed and the top-level list of articles (titles and abstracts) had been identified.
However, because of the large number of studies included at the Level 1 screen, the studies identified as relevant for the review were re-screened using the more stringent criteria presented in Table 2 (protocol amendment). The criteria listed in Table 3 were used at Level 2 to screen the full-text articles.

a If the disease severity of included patients was not clearly stated in the article, the following approach was used and validated by Lilly: if DAS-28 scores were reported, scores > 3.2 and ≤ 5.1 were considered to indicate moderate RA, and scores > 5.1 severe RA. If DAS-28 scores were not reported, swollen and tender joint counts both > 6 were considered a good proxy for moderate-to-severe RA.
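The fallback rule for unstated disease severity can be written as a small decision function. The sketch below is a hypothetical helper (`classify_ra_severity` and its return labels are not taken from the review protocol); it assumes that DAS-28 takes precedence whenever it is reported, as the rule implies, and that the joint-count proxy is used only otherwise.

```python
from typing import Optional


def classify_ra_severity(das28: Optional[float] = None,
                         swollen_joints: Optional[int] = None,
                         tender_joints: Optional[int] = None) -> Optional[str]:
    """Hypothetical helper for the severity fallback rule.

    DAS-28 is used whenever it is reported; otherwise swollen and tender
    joint counts both > 6 are taken as a proxy for moderate-to-severe RA.
    Returns None when neither source of information is available.
    """
    if das28 is not None:
        if das28 > 5.1:
            return "severe"
        if das28 > 3.2:
            return "moderate"
        return "not moderate-to-severe"
    if swollen_joints is not None and tender_joints is not None:
        if swollen_joints > 6 and tender_joints > 6:
            return "moderate-to-severe"
        return "not moderate-to-severe"
    return None  # rule cannot be applied


# Example: a trial that reports only joint counts
print(classify_ra_severity(swollen_joints=10, tender_joints=14))  # moderate-to-severe
```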

Interventions and comparators
b Systematic reviews and meta-analyses will be used only for identification of primary studies that may have been missed in the electronic searches.

Interventions (inclusion): interventions listed in Table 2 that meet the following criteria:
-Licensed treatments at the labelled doses
-Treatments not yet licensed in any form or dose
Interventions (exclusion): same as Table 2
Comparators (inclusion and exclusion): same as Table 2
Outcomes a: To be included in the review, a study must report at least 1 of the outcomes of interest.

#23 "randomized controlled" NEXT trial* OR "randomised controlled" NEXT trial* OR "randomized clinical" NEXT trial* OR "randomised clinical" NEXT trial* OR randomized NEXT trial* OR randomised NEXT trial* OR "random allocation" OR "double blind method" OR "single blind method" OR ((singl* OR doubl* OR treb* OR tripl*) AND (blind* OR mask*)) OR allocated NEXT random* OR random NEXT assignment* OR "open-label" NEXT trial* OR "open-label" NEXT stud* OR "open label trial" OR "non-blinded" NEXT stud*
#14 su("Randomized Controlled Trials" OR "Randomized Controlled Trial" OR "Clinical Trials Phase III" OR "Clinical Trials Phase II" OR "Controlled Clinical Trials" OR "Controlled Clinical Trial" OR "Random Allocation" OR "Clinical Trials" OR "Clinical Trial") OR ti,su(randomized AND trial*) OR ti,su("phase 3" AND trial) OR ti,su("phase III" AND trial*) OR ti,su("phase 2" AND trial*) OR ti,su("phase II" AND trial*) (122,361 records)
#15 dtype,su("Randomized Controlled Trial" OR "Controlled Clinical Trial" OR "Clinical Trial Phase II" OR "Clinical Trial Phase III" OR "Clinical Trial Phase IV" OR "Multicenter Study") OR ti,su("phase IV" AND trial*) OR ti,su("phase 4" AND trial*) (6,035 records)
#16 ti,ab(randomized OR randomised OR randomly) (463,144 records)
#17 ti,ab,su("randomized controlled" P/0 trial* OR "randomised controlled" P/0 trial* OR "randomized clinical" P/0 trial* OR "randomised clinical" P/0 trial* OR randomized P/0 trial* OR randomised P/0 trial* OR "random allocation" OR "double blind method" OR "single blind method" OR ((singl* OR doubl* OR treb* OR tripl*) AND (blind* OR mask*)) OR allocated P/0 random* OR random P/0 assignment* OR "open-label" P/0 trial* OR "open-label" P/0 stud* OR "open label trial" OR "nonblinded" P/0 stud*)

a Addition of trials that allowed up to 20% of patients with prior bDMARD use (NA = not applicable, as already part of the primary analysis; Yes = added in analysis)
b Exclusion of trials conducted solely in Asia-Pacific and/or with a low or unknown dose of MTX (<7.5 mg/week) (No = excluded from analysis; Yes = included in analysis)
c Labeled as "SUBCUT" (subcutaneous) in the analyses
d Treatment arms not included in the analysis, i.e., presented only for completeness
e Open-label trial, excluded via the corresponding sensitivity analysis
f Labeled as "cDMARD + MTX" in the analyses

Endpoints
The following endpoints were chosen for the analysis:
• ACR response (20%, 50%, and 70% improvement in criteria) 30

Safety endpoints were not included in the NMA because the majority of trials allowed rescue therapy in the control arm if an ACR20 response was not achieved. Once patients in the control arm are allowed to switch to the active treatment, the network no longer has a common comparator.
Therefore, only endpoints measured before rescue therapy (generally the 12-week outcomes) would have had a common comparator; however, in most publications, safety endpoints are reported only for the full duration of the trial and not at intermediate time points.
Discontinuations were not included in the NMA, for reasons also linked to rescue therapy. Moreover, discontinuations are already reflected in the ACR response outcomes of many trials, because most trials imputed patients with missing data as non-responders.
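To make the ACR endpoint and the non-responder imputation concrete, the following sketch computes ACR20/50/70 status per patient and an arm-level response rate. It assumes the standard ACR definition: at least the threshold percentage improvement in both tender and swollen joint counts, and in at least 3 of the 5 remaining core-set measures (patient global assessment, physician global assessment, pain, HAQ disability index, and an acute-phase reactant). The dictionary keys and function names are hypothetical. A patient without a week-24 assessment counts as a non-responder but stays in the denominator, which is how discontinuations end up inside the ACR response rates.

```python
from typing import Dict, List, Optional, Tuple

# Mandatory measures plus the five other ACR core-set measures (hypothetical keys)
JOINT_COUNTS = ("tender_joints", "swollen_joints")
OTHER_CORE_MEASURES = ("patient_global", "physician_global", "pain",
                       "haq_di", "acute_phase_reactant")

Visit = Dict[str, float]


def pct_improvement(baseline: float, follow_up: float) -> float:
    """Percentage improvement from baseline (lower scores are better)."""
    if baseline == 0:
        return 0.0  # no room to improve; treated as not improved
    return 100.0 * (baseline - follow_up) / baseline


def is_acr_responder(baseline: Visit, week24: Optional[Visit], threshold: float) -> bool:
    """ACR20/50/70 status; a missing week-24 visit is imputed as non-response."""
    if week24 is None:
        return False  # non-responder imputation (e.g. discontinued patient)
    joints_ok = all(pct_improvement(baseline[k], week24[k]) >= threshold
                    for k in JOINT_COUNTS)
    n_other = sum(pct_improvement(baseline[k], week24[k]) >= threshold
                  for k in OTHER_CORE_MEASURES)
    return joints_ok and n_other >= 3


def response_rate(patients: List[Tuple[Visit, Optional[Visit]]], threshold: float) -> float:
    """Responders divided by all randomized patients, dropouts included."""
    return sum(is_acr_responder(b, w, threshold) for b, w in patients) / len(patients)


base = {"tender_joints": 20, "swollen_joints": 10, "patient_global": 60,
        "physician_global": 60, "pain": 60, "haq_di": 1.5, "acute_phase_reactant": 20}
improved = {"tender_joints": 8, "swollen_joints": 4, "patient_global": 24,
            "physician_global": 24, "pain": 24, "haq_di": 0.6, "acute_phase_reactant": 8}
arm = [(base, improved), (base, None)]  # second patient discontinued
for t in (20, 50, 70):
    print(f"ACR{t}: {response_rate(arm, t):.2f}")  # 0.50, 0.50, 0.00
```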

Overview of Analyses
NMA was conducted using Bayesian mixed treatment comparisons as described in the National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) Technical Support Documents (TSDs) 31 .
Two classes of models were assessed. 32 The simultaneous model was chosen for three reasons: (a) the data for both baseline and treatment effects come from the same sources, (b) some networks contained zero cells, and fitting this type of model increased the stability of the relevant models, and (c) the evidence for several networks was sparse.
In addition, as a sensitivity analysis, a frequentist NMA using the Rücker method was conducted to assess the robustness of the results 33 . For cells with zero counts, the following rule was applied: if one arm of a trial had a zero count, 0.5 was added to all arms of that trial.
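The zero-cell rule can be sketched as follows. The text does not state whether 0.5 was added only to the event counts or to both the event and non-event cells; this sketch assumes the latter (so each arm's total grows by 1), which is the usual convention for a 0.5 continuity correction, and it also treats an arm in which every patient responded as having a zero (non-event) cell. Both function names are hypothetical.

```python
import math
from typing import List, Tuple

Arm = Tuple[float, float]  # (events, total)


def continuity_correct(arms: List[Arm], incr: float = 0.5) -> List[Arm]:
    """Add `incr` to both cells of every arm if any arm of the trial has a zero cell."""
    if any(events == 0 or events == total for events, total in arms):
        # incr is added to events and to non-events, so the total grows by 2 * incr
        return [(events + incr, total + 2 * incr) for events, total in arms]
    return [(float(events), float(total)) for events, total in arms]


def log_odds_ratio(arm_a: Arm, arm_b: Arm) -> Tuple[float, float]:
    """Log odds ratio of arm_a vs arm_b and its (Woolf) standard error."""
    (ra, na), (rb, nb) = arm_a, arm_b
    lor = math.log(ra / (na - ra)) - math.log(rb / (nb - rb))
    se = math.sqrt(1 / ra + 1 / (na - ra) + 1 / rb + 1 / (nb - rb))
    return lor, se


# Hypothetical ACR70 counts: no responders in the control arm
trial = [(0, 50), (6, 100)]
corrected = continuity_correct(trial)
print(corrected)  # [(0.5, 51.0), (6.5, 101.0)]
print(log_odds_ratio(corrected[1], corrected[0]))
```

Without the correction, the log odds in the zero-event arm is minus infinity, which is why the frequentist models could not otherwise be fitted.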
Bayesian NMA was first proposed by Higgins and Whitehead in 1996 34 .
The results of the different interventions from the included trials were combined by means of a Bayesian NMA using a logistic regression model with a binomial likelihood for the categorical (ACR response) outcomes. 35,36 As with any meta-analysis, NMA can be performed with a fixed-effects or a random-effects approach 35,37 . A fixed-effects model assumes that differences in true relative treatment effects (whether estimated directly or indirectly) are caused only by the difference in treatment and by no other factors; that is, there is no heterogeneity in true relative treatment effects beyond that caused by the types of interventions compared. A random-effects model assumes that differences in trial-specific relative treatment effects (beyond the differences attributable to the actual interventions compared) are exchangeable and that the heterogeneity is constant across the different comparisons. The choice for a fixed or were to be used instead. 41

The initial model runs used 3 chains, with a burn-in of 10,000 simulations, and posterior distributions were estimated from a further sample of 10,000 simulations. To address insufficient convergence and autocorrelation, this was increased to 4 chains, a burn-in of 60,000, and a sample of 120,000.
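Whether the 3 or 4 chains have converged to the same distribution is typically checked with the Gelman-Rubin potential scale reduction factor (R-hat). The sketch below is the basic (non-split) form of the statistic for a single parameter, written with the Python standard library; it is an illustration under that assumption, not the software used in the analysis. The chains are simulated.

```python
import random
import statistics
from typing import List


def gelman_rubin(chains: List[List[float]]) -> float:
    """Basic (non-split) potential scale reduction factor for one parameter."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    var_plus = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return (var_plus / w) ** 0.5


rng = random.Random(1)
mixed = [[rng.gauss(0.3, 0.1) for _ in range(2000)] for _ in range(4)]
stuck = mixed[:3] + [[rng.gauss(0.8, 0.1) for _ in range(2000)]]
print(round(gelman_rubin(mixed), 3))  # close to 1
print(round(gelman_rubin(stuck), 3))  # well above 1: one chain has not mixed
```

In practice, R-hat would be computed on the post-burn-in draws of every monitored parameter (for example, each log odds ratio); values clearly above 1 are the kind of signal that motivates a longer burn-in and more chains.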

Bayesian Mixed Treatment Comparisons Binary Endpoint Model for Network Meta-Analysis:
For the primary analysis, we modelled the proportion of patients who experienced an ACR response (20%, 50%, and 70%). This follows a binary endpoint model whose underlying model is a logistic regression. Observed data were included in the model using a binomial likelihood, where $p_{ik}$ is the probability of response for treatment arm $k$ of study $i$:

$$r_{ik} \sim \mathrm{Binomial}(p_{ik}, n_{ik})$$

where $r_{ik}$ is the number of events in treatment arm $k$ of study $i$, and $n_{ik}$ is the total number of subjects in treatment arm $k$ of study $i$.
Treatments included in the FE model were indexed as positive integers, with the baseline treatment ($b_i$) being the lowest-index treatment in study $i$. A logit link function was used to map the probability of response to the linear model such that, for treatment arm $k$ of study $i$:

$$\mathrm{logit}(p_{ik}) = \mu_i + d_{t_{ik}} - d_{b_i}$$

where $\mu_i$ is the study-specific baseline term, $t_{ik}$ is the treatment in arm $k$, and $d_{t_{ik}} - d_{b_i}$ is the log odds ratio of treatment $t_{ik}$ compared with the study baseline $b_i$. For study arms receiving the baseline treatment (i.e., $t_{ik} = b_i$), this simplifies to the study-specific baseline term $\mu_i$. A vague prior $d_k \sim N(0, 100^2)$ was used for the treatment effect coefficients.
The corresponding RE model replaces the constant treatment effect with the study-specific treatment effect $\delta_{ik}$, which is normally distributed with mean $d_{t_{ik}} - d_{b_i}$ and variance $\sigma^2$ (this model is equivalent to the FE model when $\sigma^2 = 0$). The following changes were made for the RE model:

$$\mathrm{logit}(p_{ik}) = \mu_i + \delta_{ik}, \qquad \delta_{ik} \sim N\left(d_{t_{ik}} - d_{b_i}, \sigma^2\right), \qquad \delta_{ik} = 0 \ \text{if} \ t_{ik} = b_i$$

The parameters of interest were the log odds ratios ($d_k$), which give the relative treatment effect of each treatment compared with the reference treatment in the analyses.
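The fixed- and random-effects linear predictors described above can be illustrated with a short sketch that only evaluates the model for given parameter values; it does not perform the Bayesian estimation. Here `mu_i` is a study's baseline log odds, `d` maps each treatment to its log odds ratio versus the network reference (fixed at 0 for the reference), and `baseline` is the study's lowest-index treatment. All treatment names and numeric values are hypothetical.

```python
import math
import random


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fe_probability(mu_i: float, d: dict, treatment: str, baseline: str) -> float:
    """Fixed effects: logit(p_ik) = mu_i + d[t_ik] - d[b_i]."""
    return inv_logit(mu_i + d[treatment] - d[baseline])


def re_probability(mu_i: float, d: dict, treatment: str, baseline: str,
                   sigma: float, rng: random.Random) -> float:
    """Random effects: delta_ik ~ N(d[t_ik] - d[b_i], sigma^2); delta = 0 in the baseline arm."""
    if treatment == baseline:
        return inv_logit(mu_i)
    delta = rng.gauss(d[treatment] - d[baseline], sigma)
    return inv_logit(mu_i + delta)


# Hypothetical log odds ratios versus the network reference ("PBO+MTX", fixed at 0)
d = {"PBO+MTX": 0.0, "DRUG_A+MTX": 0.9, "DRUG_B+MTX": 0.6}
mu = logit(0.35)  # hypothetical study with 35% response in its baseline arm
print(fe_probability(mu, d, "DRUG_A+MTX", "PBO+MTX"))
# The A-vs-B odds ratio depends only on d, whichever baseline a study used
print(math.exp(d["DRUG_A+MTX"] - d["DRUG_B+MTX"]))
```

Setting `sigma=0` in `re_probability` reproduces `fe_probability`, mirroring the statement that the RE model reduces to the FE model when the heterogeneity variance is zero.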
Estimates of the log odds ratios were obtained by iterative (Markov chain Monte Carlo) sampling. Each parameter can be summarised by the mean and standard deviation of these samples (i.e., the mean log odds ratio and its standard error), which can be converted to odds ratios. In addition, credible intervals (CrIs) can be estimated from the samples. These are analogous to confidence intervals (CIs) in a frequentist analysis, but their interpretation differs: a 95% CrI is the range within which the true parameter lies with 95% probability, given the data and the priors, whereas a 95% CI is a range constructed so that, over repeated sampling, 95% of such intervals would contain the true value.
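A minimal sketch of how posterior draws of a log odds ratio can be turned into the summaries used in the supplementary tables (posterior median odds ratio with 95% CrI). The draws here are simulated from a normal distribution purely for illustration; in the analysis they would come from the MCMC output. The probability that the odds ratio exceeds 1 is a common additional summary, included here as an assumption rather than as a quantity reported in this review.

```python
import math
import random
import statistics
from typing import Dict, List


def summarise_log_or(draws: List[float]) -> Dict[str, float]:
    """Posterior median OR, 95% CrI, and P(OR > 1) from draws of a log odds ratio."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    return {
        "median_or": math.exp(statistics.median(draws)),
        "cri_low": math.exp(cuts[0]),    # 2.5th percentile
        "cri_high": math.exp(cuts[-1]),  # 97.5th percentile
        "p_or_above_1": sum(x > 0 for x in draws) / len(draws),
    }


# Simulated draws for illustration only (not results from this analysis)
rng = random.Random(42)
draws = [rng.gauss(0.3, 0.12) for _ in range(20000)]
print({k: round(v, 3) for k, v in summarise_log_or(draws).items()})
```

Because the exponential function is monotone, the median and the 2.5th/97.5th percentiles of the odds ratio are exactly the exponentiated percentiles of the log odds ratio; this does not hold for the mean, which is one reason posterior medians are convenient on the odds-ratio scale.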

Zero Cells
Bayesian models with zero cells do not usually require special precautions (refer to 39 ).
However, for the frequentist models (for the analysis of ACR70), a continuity correction of 0.5 had to be added to zero cells for the models to converge. This is known to generate biased estimates of effect size, so these results should be interpreted with caution. 44,45

Choice of the treatment effect: random vs fixed effects
As with any MA, NMA can be performed with a fixed-effects or a random-effects approach 35,36 . In a random-effects Bayesian NMA, it is assumed that differences in trial-specific relative treatment effects (beyond the differences attributable to the actual interventions compared) are exchangeable and that the heterogeneity is constant across the different comparisons. Initially, both fixed- and random-effects models were fitted for each endpoint.
For the Bayesian NMA, the choice of a fixed- or random-effects model for the primary analysis was based on:
• The model fit as measured by the Deviance Information Criterion (DIC)
• Assessment of the residual deviance
• Convergence of the models
• Sensitivity of the results
• Whether there are limited data to inform the random-effects variance
• Whether there is evidence of the random-effects prior dominating the posterior simulations, indicating that there are not enough data in the analysis to inform this additional parameter

Inconsistency, Model Fit, and Convergence
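It was pre-planned to explore the consistency assumption via the "node-splitting" approach as defined by Dias 46 . However, the networks were primarily star-shaped, with only 2 closed loops, each informed by a single trial. Because a loop formed entirely within one multi-arm trial is consistent by construction, there was no independent direct and indirect evidence to contrast, and this analysis was therefore not performed.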
Model fit was assessed with the DIC and the posterior mean of the total residual deviance. 47 Deviance measures the fit of the model to the data using the likelihood function. A good model fit is indicated by a total residual deviance approximately equal to the number of data points available. The DIC is a statistic that measures Bayesian model fit and penalizes the deviance by the model complexity. When comparing 2 DIC values, a difference of 5 or more is regarded as a meaningful difference. 48 Convergence was verified by trace plots, monitoring the Monte Carlo error, and with Gelman-Rubin diagnostics. 49
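The fit statistics described above can be sketched for binomial data as follows. Each arm's residual deviance compares the observed events with the fitted count n × p; the DIC adds to the posterior mean total residual deviance (Dbar) the effective number of parameters pD = Dbar − Dhat, where Dhat is the deviance evaluated at the posterior mean of the fitted probabilities. Function names and data are hypothetical; in the analysis, the draws would come from the MCMC output.

```python
import math
from typing import List, Sequence, Tuple


def residual_deviance(r: float, n: float, p: float) -> float:
    """Binomial residual deviance of one arm (r events of n, fitted probability p)."""
    r_hat = n * p
    dev = 0.0
    if r > 0:
        dev += r * math.log(r / r_hat)
    if r < n:
        dev += (n - r) * math.log((n - r) / (n - r_hat))
    return 2.0 * dev


def dic(data: Sequence[Tuple[float, float]],
        p_draws: List[List[float]]) -> Tuple[float, float, float]:
    """Return (Dbar, pD, DIC); p_draws holds one list of fitted arm probabilities per draw."""
    def total_dev(ps: Sequence[float]) -> float:
        return sum(residual_deviance(r, n, p) for (r, n), p in zip(data, ps))

    d_bar = sum(total_dev(ps) for ps in p_draws) / len(p_draws)
    p_mean = [sum(col) / len(col) for col in zip(*p_draws)]
    p_d = d_bar - total_dev(p_mean)  # effective number of parameters
    return d_bar, p_d, d_bar + p_d


def meaningful_dic_difference(dic_a: float, dic_b: float, threshold: float = 5.0) -> bool:
    """A DIC difference of 5 or more is regarded as meaningful."""
    return abs(dic_a - dic_b) >= threshold


data = [(12, 50), (21, 50)]  # hypothetical arms: (events, total)
draws = [[0.22, 0.40], [0.26, 0.44], [0.24, 0.42]]
d_bar, p_d, dic_value = dic(data, draws)
print(d_bar, len(data))  # compare Dbar with the number of data points
```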

Sensitivity Analyses
In addition to the analyses described above, the following sensitivity analyses were pre-planned and performed:

Baseline-risk adjustment
To account for potential heterogeneity in ACR response rates across the PBO + MTX treatment arms, meta-regression models adjusting for baseline risk were performed. 38,50

Inclusion / Removal of Trial Types
Additional sensitivity analyses, which included or removed specific trial types, were conducted to investigate the potential impact of treatment effect modifiers. These were:
• Inclusion of trials with prior bDMARD use in up to 20% of patients
• Removal of trials conducted solely in Asia-Pacific, or with a low (<7.5 mg/week) or unknown dose of methotrexate
• Removal of open-label trials

Table S10 provides an overview of all pre-planned sensitivity analyses. Removal of trials due to inconsistency (node splitting) was not performed, as there were only 2 closed loops, each coming from a single trial (RA-BEAM, ATTEST).

Odds ratios >1 are in favour of Treatment 1, and odds ratios <1 are in favour of Treatment 2.
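The idea behind the baseline-risk adjustment can be approximated by a simple two-stage, inverse-variance weighted meta-regression of each trial's log odds ratio on its centred PBO + MTX log odds. This is a frequentist sketch for intuition only, not the Bayesian meta-regression used in the analysis; the input format, function names, and trial values are hypothetical.

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def baseline_risk_regression(studies: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance weighted regression of log OR on centred placebo log odds.

    studies: (log_or, se, placebo_rate) per trial. Returns (intercept, slope, centre):
    centre is the unweighted mean placebo log odds, intercept is the log OR
    predicted at that centre, and slope is the change in log OR per unit of
    placebo log odds.
    """
    x = [logit(p0) for _, _, p0 in studies]
    centre = sum(x) / len(x)
    xc = [xi - centre for xi in x]
    y = [lor for lor, _, _ in studies]
    w = [1.0 / se ** 2 for _, se, _ in studies]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xc)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xc))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xc, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, centre


def adjusted_log_or(intercept: float, slope: float, centre: float, placebo_rate: float) -> float:
    """Predicted log OR for a population with the given PBO + MTX response rate."""
    return intercept + slope * (logit(placebo_rate) - centre)


# Hypothetical trials: (log OR vs PBO + MTX, standard error, PBO + MTX ACR20 rate)
studies = [(0.95, 0.20, 0.22), (0.80, 0.25, 0.28), (0.60, 0.18, 0.35), (0.55, 0.30, 0.40)]
a, b, c = baseline_risk_regression(studies)
print(f"slope per unit placebo log odds: {b:.2f}")
print(f"log OR at 30% placebo response: {adjusted_log_or(a, b, c, 0.30):.2f}")
```

Regressing on the observed placebo response in this way ignores the fact that the same placebo arm contributes to both the covariate and the log odds ratio, which can bias the slope (regression to the mean); Bayesian models that treat the baseline log odds as a model parameter, such as those described in the NICE DSU documents, avoid this problem.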