Using Propensity Score Matching Technique to Estimate Utilization and Costs of General Practitioners’ Services associated with Alzheimer’s Disease

Objective: General practitioners (GPs) play an important role in caring for people with Alzheimer’s disease (AD). However, the cost and the extent of service utilization from GPs due to AD patients are difficult to assess. This study aimed to explore the principles of propensity score matching (PSM) technique to assess the additional GP service use and cost imposed by AD in persons aged ≥60 years in Denmark. Design: PSM was used to estimate the additional use and cost of GP services attributable to AD. Case and control baseline characteristics were compared with and without the application of PSM. Propensity scores were then estimated using the generalized boosted model, a multivariate, nonparametric and automated algorithm technique. Setting: Observational data from Statistics Denmark registry. Subjects: 3368 cases and 3368 controls; cases with AD were defined as patients with diagnoses G30 and F00 and/or those with primary care prescriptions for anti-AD drugs from the years 2004 until 2009. Main Outcome Measures: GP service utilisation and costs attributable to AD. Results: PSM brought a large improvement to the balance of observed covariates among the cases and control groups. AD patients received around 20% more GP services and utilized services that cost 15% more than non-AD controls during a calendar year. Conclusion: AD patients utilize more GP services and incur higher costs as compared to their matched controls. The PSM technique can be an effective tool to reduce imbalance of observable confounders from register based data and improve the estimations.


BACKGROUND
Alzheimer's disease (AD) is a clinical syndrome caused by non-reversible neurodegeneration and is characterized by severe deterioration in cognitive ability. 1 The World Health Organization (WHO) has projected that AD and dementia will become the third-leading cause of burden of disease by 2030. 2 Around 15 000 new dementia cases are detected every year in Denmark. 3 The majority of dementia cases are AD patients. The prevalence of AD is projected to rise in the future with more elderly people and longer life expectancy. In Denmark, General Practitioners (GPs) are the primary caregivers for most chronic conditions. They play an important role in diagnostic evaluation and ongoing care as well as provide usual consultation services. 4 It is important to ascertain the difference in GP service use and cost between those with and without AD to assist with planning for future disease management. When the attributable cost related to dementia is estimated, it is important that the comparison is made for people with similar characteristics. That can be achieved by some form of matching technique. Propensity Score Matching (PSM) appears to offer a better match of observational covariates than simple sub-group matching. In comparison to other traditional regression methods, PSM has two major strengths. First, the matched cases and controls are only selected if they lie within the pre-assigned 'common support' region, 5 while regression methods have to rely on the functional form of covariates. Secondly, PSM does not require functional form assumptions since it is non-parametric. Regression methods are based on certain assumptions, such as linearity, that might or might not hold true. Thus, the main aim of this research was to explore the use of PSM to estimate the impact of AD on service utilization and costs of GP services among patients with AD aged ≥60 years.

Subjects
The data for this study were made available by Statistics Denmark Research Services. Cases with AD were defined as inpatients and outpatients diagnosed with diseases having International Classification of Diseases (ICD)-10 codes G30 and/or F00 and those with primary care prescriptions for at least one of the drugs donepezil, rivastigmine, galantamine and memantine from 2004 to 2009. The diagnostic data were obtained from the National Patient Registry (NPR) 6 and data related to drugs were obtained from the Danish National Prescription Registry (DNPR). 7 From these, a total of 3378 living AD cases were extracted.
The control group consisted of 302 436 people randomly selected from the general population for each year from 2004 to 2009. The selection of the control group was done by Statistics Denmark. All cases and controls below the age of 60 were excluded from further study because of the low incidence of AD and for comparability. Thus, the propensity score estimation was based on 3378 cases and 302 436 controls who were alive at the beginning of 2011 and were residents for the entire year. On matching at a ratio of 1:1 within calipers, a total of 3368 AD cases and the same number of non-AD controls were matched, resulting in a total sample of 6736. A total of 10 out of 3378 cases did not match with controls within the set criteria (that is, within a caliper of 0.25 times the standard deviation of the propensity score). A detailed illustration of the selection of subjects for the study is shown in Table 1. Standardized mean difference and percentage improvement in the mean difference between treatment and control samples were used to quantify the performance of the generalized boosted model (GBM) and logistic models.

Covariates
Covariates included marital status, region of residence, gender, age, highest completed education, comorbidity, hospital service utilization, early retirement status and taxable income. A dummy variable was created for living with a partner, including officially married (G), registered partners (R) and ambient living of two partners (L). The unmarried category also included divorced and widowed individuals. Similarly, a variable was created to recode the municipalities into five Danish regions. Education was grouped into seven categories: 'primary', 'secondary', 'bachelor', 'tertiary', 'long degrees', 'PhD degree' and 'unknown' for missing data. In order to account for comorbidity, Charlson comorbidity index (CCI) scores were generated based on ICD-10 codes. 8 CCI scores were categorized into three levels: low, moderate and high comorbidity, with CCI scores of 0-1, 2 and ≥3, respectively. Personal taxable income for the year 2010 was used for the income variable. A small proportion of income data were missing in 2010 and were retrieved from the previous years. Similarly, a dummy variable was formed to include information on early retirement status.
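The recoding rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cci_category` and `lives_with_partner` are hypothetical, and the civil-status codes G, R and L are taken from the text rather than from the registry documentation.

```python
def cci_category(score: int) -> str:
    """Map a Charlson comorbidity index score to the three levels in the text:
    0-1 -> low, 2 -> moderate, >=3 -> high."""
    if score < 0:
        raise ValueError("CCI score cannot be negative")
    if score <= 1:
        return "low"
    if score == 2:
        return "moderate"
    return "high"


def lives_with_partner(civil_status: str) -> int:
    """Dummy variable: 1 for married (G), registered partners (R) or
    cohabiting partners (L); 0 for every other status."""
    return 1 if civil_status in {"G", "R", "L"} else 0


# Hypothetical example values, not registry data.
print([cci_category(s) for s in (0, 1, 2, 5)])
print([lives_with_partner(c) for c in ("G", "R", "L", "U")])
```

Any status code other than G, R or L (the "U" above is a made-up placeholder) falls into the reference category, which matches the description of the unmarried group absorbing divorced and widowed individuals.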

Outcome Variables
Two dependent variables were used: the number of GP services utilized and gross payment made to GPs. Information about these variables was obtained from Danish health insurance data.

Propensity Score Estimation: Generalized Boosted Model
This study used the 'twang' package in R software for GBM to estimate propensity scores. GBM is a data-adaptive, multivariate, nonparametric and automated algorithm that fits several models through regression trees and merges the predictions of each model. 9 The number of trees assigned was 5000. The iterative fitting algorithm started with a single simple regression tree, and a new tree was added at each iteration so that the model fitted the data better than at the previous iteration. The trees were insensitive to the functional forms of covariates and provided the same propensity scores for any functional form of a variable specification. An interaction depth of four was specified, which determines the number of layers a tree can have. At this interaction depth, the software considers possible two-, three- and four-way interactions in the final model. A shrinkage factor of 0.01 was specified to reduce the impact of each additional tree in an effort to avoid overfitting.
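To make the roles of the number of trees and the shrinkage factor concrete, the following is a minimal Python sketch of boosting for a binary outcome. It is not the 'twang' or 'gbm' implementation: it assumes a single covariate, depth-one trees (stumps) and a plain gradient step, and all names and data are hypothetical.

```python
from math import exp, log


def sigmoid(value):
    """Convert a log-odds score into a probability."""
    return 1.0 / (1.0 + exp(-value))


def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (least squares).
    Returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boosted_propensity(x, z, n_trees=5000, shrinkage=0.01):
    """Estimate P(Z=1 | x) by adding one shrunken stump per iteration."""
    prior = sum(z) / len(z)
    scores = [log(prior / (1.0 - prior))] * len(x)  # start from the base rate
    for _ in range(n_trees):
        # Residual = observed status minus current predicted probability.
        residuals = [zi - sigmoid(s) for zi, s in zip(z, scores)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        scores = [
            s + shrinkage * (left_mean if xi <= threshold else right_mean)
            for xi, s in zip(x, scores)
        ]
    return [sigmoid(s) for s in scores]


# Hypothetical toy data: age as the only covariate, 1 = AD case.
ages = [61, 65, 70, 74, 79, 83, 88, 92]
has_ad = [0, 0, 0, 1, 0, 1, 1, 1]
propensity = boosted_propensity(ages, has_ad)
print([round(p, 3) for p in propensity])
```

Each tree moves the scores only by `shrinkage` times its fitted values, so a small factor such as 0.01 needs many trees but makes each individual tree less influential, which is the overfitting safeguard described above.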

Matching: Nearest Neighbor Matching within Calipers
This study used the propensity scores obtained from GBM to match cases and controls and obtain a matched dataset. The 'MatchIt' package in R implements matching through semi-parametric and non-parametric methods. 10 A caliper, or acceptable distance, of 0.25 times the standard deviation of the propensity score was used in this study. 'MatchIt' analyses are 'doubly robust' in the sense that inferences will be statistically consistent if either the matching analysis or the analysis model is correct. 10 The matching was done in such a way that the neighbourhood contains a control participant as a match for a treated participant under the condition that the absolute difference of propensity scores is the smallest among all possible pairs of propensity scores.
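The matching rule can be illustrated with a short greedy 1:1 nearest-neighbour routine in Python. This is a sketch, not the 'MatchIt' algorithm: it assumes matching without replacement, processes treated units in the order given, and takes the caliper as 0.25 times the standard deviation of all propensity scores pooled together. The function name and the scores are hypothetical.

```python
from statistics import stdev


def match_within_caliper(treated_ps, control_ps, caliper_multiplier=0.25):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.
    Returns (list of (treated_index, control_index) pairs, caliper width)."""
    # Assumption: caliper based on the SD of the pooled propensity scores.
    caliper = caliper_multiplier * stdev(treated_ps + control_ps)
    available = list(range(len(control_ps)))
    pairs = []
    for t_index, t_score in enumerate(treated_ps):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        best = min(available, key=lambda c: abs(control_ps[c] - t_score))
        if abs(control_ps[best] - t_score) <= caliper:
            pairs.append((t_index, best))
            available.remove(best)  # each control is used at most once
    return pairs, caliper


# Hypothetical propensity scores.
cases = [0.30, 0.55, 0.95]
controls = [0.10, 0.28, 0.33, 0.50, 0.58]
pairs, caliper = match_within_caliper(cases, controls)
print(pairs, round(caliper, 4))
```

In this toy example the third case has no control within the caliper and is left unmatched, which is the same mechanism by which a small number of cases can drop out of a matched sample.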

Estimation
After the PSM process was complete, Average Treatment Effect (ATE) and Average Treatment Effect on Treated (ATET) were estimated based on matched data using regression methods in STATA 13.0. Estimations for service utilization and costs were carried out using Poisson regression and multiple linear regression, respectively. The concept of treatment effect is derived from the Neyman-Rubin framework. 9 Mathematically: Where Y1 and Y0 are the potential outcomes; X = covariates included in the study; and Z=1 and Z=0 denote the presence or absence of AD, respectively. The 'E' denotes the expected value. For the sample data, the estimation was done using the following equation: This assumes that selection of AD and non-AD cases depended on observable covariates X. Conditional on X, treatment assignment was assumed to be unconfounded, (Y0, Y1) ⊥ Z | X. 11 Depending on underlying circumstances, the difference in mean outcomes between the two groups (Z=1 and Z=0) in the same condition is calculated.
Similarly, while estimating ATET, the interest was to find the difference between expected outcome values for patients with or without AD. Mathematical expression for estimating un-confounded ATET is given as:
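For orientation, the textbook Neyman-Rubin forms of the two estimands can be written in the notation defined above (Y1, Y0, X, Z, E). These are the standard definitions under the unconfoundedness assumption and are offered as an assumption about the general form of the expressions, not as a reproduction of the equations used in this study.

```latex
\begin{align*}
\mathrm{ATE}  &= E[Y_1 - Y_0]
               = E_{X}\big[\, E(Y_1 \mid X, Z=1) - E(Y_0 \mid X, Z=0) \,\big] \\
\mathrm{ATET} &= E[Y_1 - Y_0 \mid Z=1]
               = E_{X \mid Z=1}\big[\, E(Y_1 \mid X, Z=1) - E(Y_0 \mid X, Z=0) \,\big]
\end{align*}
```

The second equality in each line relies on (Y0, Y1) ⊥ Z | X: the unobserved potential outcome of one group is replaced by the observed outcome of the other group at the same value of X. The two estimands differ only in the population over which X is averaged, the whole sample for ATE and the AD cases for ATET.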

Sensitivity Test
The Rosenbaum sensitivity test for the Hodges-Lehmann point estimate 12 was carried out to assess how robust the findings were to hidden bias due to an unobserved confounder in this study. The 'hlsens' function within the 'rbounds' package was used to carry out this test in R software.

Comparison of Performance of GBM and Logistic Methods in balancing Covariates
A standardized difference of less than 0.1 was considered to indicate a negligible difference in the mean of a covariate between treatment groups. 13 The standardized mean differences in terms of propensity scores for all covariates were less than 0.1 in the case of GBM but not so in the case of the logistic method, as illustrated in Table 2. Moreover, even though percentage improvement in mean difference looked similar for both models, such improvements were larger in the GBM model.
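The two balance statistics can be computed as in the Python sketch below. It assumes the common pooled-standard-deviation form of the standardized mean difference; balance routines in R may use a different denominator, so this is an illustration of the idea rather than the exact statistic reported in Table 2. The ages are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def percent_improvement(before, after):
    """Percentage reduction in the absolute standardized difference."""
    return 100 * (abs(before) - abs(after)) / abs(before)


# Hypothetical ages: cases, all controls, and matched controls.
age_cases = [80, 82, 84, 78, 81]
age_controls_before = [65, 70, 72, 68, 75]
age_controls_after = [79, 82, 85, 78, 80]

smd_before = standardized_mean_difference(age_cases, age_controls_before)
smd_after = standardized_mean_difference(age_cases, age_controls_after)
print(round(smd_before, 2), round(smd_after, 3))
print(round(percent_improvement(smd_before, smd_after), 1))
```

With these made-up values the standardized difference falls from well above the 0.1 threshold before matching to below it afterwards.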

Baseline Characteristics before and after PSM
Table 3 shows the distribution of observable covariates in AD and non-AD groups. Imbalance in covariate distribution was greatly reduced by PSM. Estimations were, therefore, not impacted by the differences in observable confounding covariates. Apart from retirement status, none of the covariates had p-values less than 0.05, indicating a diminished difference between the groups. The mean age for both groups after PSM was almost identical (81 years). The mean difference in income was only around DKK (Danish Krone) 2900 in favour of non-AD subjects, compared to DKK 21 000 before matching. Similar balance was also achieved for other covariates such as gender, marital status, educational status, regional distribution, retirement status, Charlson Comorbidity Index scores and hospitalization status.

Service Utilization
As indicated in Table 4, ATE and ATET values for service utilization varied widely prior to PSM. This might be due to the different distribution of exogenous covariates in the case and control groups. The ATET estimate for the number of services used by people with AD was 3 (95% confidence interval [CI]: 2-5) in addition to the mean of 34 (95% CI: 34-35) GP services. The ATE for an average individual was 17 (95% CI: 14-20) in addition to the mean value of 28 services. Post-PSM, ATE and ATET values almost coincided. On average, an AD patient was likely to use 7 (95% CI: 5-8) more services than the average of 31 (95% CI: 30-32) services in the population. This finding was highly statistically significant (p<0.01).

Costs Related to GP Services
Table 4 illustrates the gross payments made (costs) for GP services. Prior to PSM, ATE and ATET values differed considerably. The ATET value was DKK 97 more in addition to the mean of DKK 3639 in the controls (95% CI: 3608-3670). The ATE value prior to PSM was DKK 1976 (95% CI: 1553-2399) in addition to the mean of DKK 3143 (95% CI: 3131-3155). However, post-PSM, ATE and ATET values were similar. An average person with AD was likely to use services worth DKK 485 (95% CI: 296-676) more than the mean of around DKK 3250 (95% CI: 3129-3349) in the control group.
# GBM is the model used for final estimations.
Propensity score estimation by the logistic method was done using 'glm', a function for multiple logistic regression, and balance statistics (standardized mean difference and percentage improvement in mean difference) were obtained using the 'MatchBalance' function in R software with 500 bootstraps. Tertiary education includes the Danish education level of 'Korte videregående uddannelser'; long degrees include 'Øvrige mellemlange videregående uddannelser'.
GBM: Generalized boosted model; GPs: General practitioners

Sensitivity analysis
Sensitivity analysis was carried out after a separate matching process in R using the 'Matching' package, with a logit model to estimate propensity scores and 500 bootstraps; the logit model performed poorly compared to GBM (Table 2). The maximum value of gamma (Γ) was set to 2, with increments of 0.1. In the absence of hidden bias, the median difference in gross payment made to GPs was only DKK 30.5. It is important to note that the median difference was smaller than the mean difference. Table 5 demonstrates that once Γ increased by 0.1, the bounds denoting the significance level included zero, as the lower bound turned negative. This suggests that the GP gross payment estimate was not robust and that the finding was sensitive to possible hidden bias due to an unobserved confounder. In the sensitivity test for the number of services received from GPs, the lower bound turned negative only when Γ increased to 1.3. The median difference in the number of services received in the absence of hidden bias was 4.
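The Hodges-Lehmann point estimate for matched pairs is the median of all pairwise averages (Walsh averages) of the within-pair differences. The Python sketch below covers only the no-hidden-bias case (Γ = 1); it does not reproduce the bounds that 'hlsens' reports for larger Γ. The function name and the differences are hypothetical.

```python
from statistics import mean, median


def hodges_lehmann(differences):
    """Median of the Walsh averages (d_i + d_j) / 2 for all i <= j."""
    n = len(differences)
    walsh_averages = [
        (differences[i] + differences[j]) / 2
        for i in range(n)
        for j in range(i, n)
    ]
    return median(walsh_averages)


# Hypothetical case-minus-control differences in services used,
# with one extreme pair to mimic a skewed distribution.
pair_differences = [1, 3, 4, 5, 40]
hl_estimate = hodges_lehmann(pair_differences)
print(hl_estimate, mean(pair_differences))
```

Because the estimate is rank-based, a single extreme pair pulls the mean upwards far more than the Hodges-Lehmann estimate, which is one reason a median-type difference can be smaller than a mean difference in skewed cost or utilization data.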

Discussion
This study used the PSM technique, which has a number of advantages over traditional regressions [14][15][16] and experimental evaluation techniques. 17 First, it avoids ethical considerations that occur in Randomized Controlled Trials when both treated and control groups do not receive equally effective treatments. Second, data generation is usually less costly as already available data can be used. Third, the possibility of losing treatment and control patients is lower compared to randomized assignment. Compared to experimental techniques, PSM has certain limitations. First, PSM can only take into consideration the observable characteristics, as it requires the conditional independence assumption, while in other experimental techniques with randomization the treated and non-treated populations are similar for both observed and non-observed characteristics. Second, experimental techniques ensure common support across the whole sample through random assignment, while PSM can only estimate treatment effects provided there is common support among the control population for the treated ones. Third, PSM does not answer the distributional effects of the variables, such as the percentage of AD patients that utilized more services.
This analysis found increased primary care utilization and costs for GP services by the AD patients compared to non-AD controls (Table 4). This is an important finding for three reasons. First, it supports the intuitive view that patients with AD have higher health care needs than their non-AD counterparts. Secondly, this study identifies that these increased needs are at least partially being met. Finally, it reveals that the cost of meeting this need is substantial and has significant implications for health care budgets, given the aforementioned ageing population.
9][20][21] A study carried out in the UK found higher rates of primary care resource utilization in an AD cohort relative to general older adult control patients both prior and preceding the AD diagnosis. 18 Similarly, a Dutch study reported that people with preclinical dementia contacted their GPs more frequently than controls. 21 However, a few studies, such as the ones conducted in the US 22 and France 23, showed no differences in primary care utilization among AD cases.
The number of additional GP services utilized by the AD patients compared to controls in the current study is consistent with the findings of the UK study that reported approximately 4 more consultations per 6 months on average. 18 Service utilization in the current study was, on average, around 20% more for AD patients as compared to the non-AD controls, but the cost was only around 15% more on average (Table 4). This can be explained by the fact that the services utilized by AD patients, such as email and telephone consultations, are low cost compared to other services in Denmark.
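The approximate percentages can be checked directly against the post-PSM estimates reported in the Results (7 additional services on a mean of 31, and DKK 485 on a mean of around DKK 3250). The following is a worked calculation from those rounded figures only, so small deviations from the values in Table 4 are possible.

```latex
\begin{align*}
\text{Relative increase in services} &= \frac{7}{31} \approx 0.226 \quad (\approx 23\%) \\
\text{Relative increase in costs}    &= \frac{485}{3250} \approx 0.149 \quad (\approx 15\%)
\end{align*}
```

The ratio of the two relative increases, roughly 0.149 / 0.226 ≈ 0.66, indicates that each additional service used by AD patients costs less on average than the typical service in the control group, in line with the explanation given above.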
Literature regarding use of GP services by AD patients is scarce, particularly for Denmark. One study investigated costs of patients suffering from all dementias in Denmark, although GP services were not a specific focus. 24 The study found that demented cases utilized GP services less compared to the control group, although the overall cost for demented patients was significantly higher. Such costs were associated with a higher utilization of hospital services, rather than primary care. Kronborg et al. observed that the GP and medical consultation costs did not differ between patient groups. The mean costs found were almost half of the estimations in our study (Table 4). This might be due to the 15-year gap between studies, during which the number of services, frequency of service utilization and costs might have actually increased in addition to annual inflation. However, such differences might have also arisen due to different methodologies adopted for estimations.

Limitations and Strengths of the Study
A potential limitation of this study is the discrepancy between the real figure of people with dementia and the number of cases used in this study. Moreover, sensitivity analysis revealed possible hidden bias for unobservable covariates. 6] However, including stage-specific estimations was not possible due to lack of data.
Important strengths of this study are the well-defined study population, the large sample obtained from a national database, controls matched on all observed covariates using propensity scores (mimicking an RCT with respect to observed covariates), and estimations based on completely matched data. The quality of the matching is evident from the fact that ATE and ATET calculated from the matched data almost coincided. Likewise, the use of GBM for estimating propensity scores was efficient in dealing with uncertain functional forms of covariates and interactions among them, a major problem in model specification. 27 Resource use and cost data were obtained from national databases containing data about actual payments made, thus eliminating recall bias.

Implications
This study has certain implications in terms of costs, research and methodology. The increased utilization and costs are likely to result in a higher total cost of primary care, an important implication for future health care resource allocation. This also has certain research implications. It was found that service utilization and costs among AD patients were, on average, over 20% and 15% higher, respectively. This might point to the fact that even though the AD patients utilize more services, the corresponding increase in costs might not be directly proportional. These findings imply that further research is required to confirm the pattern of service utilization and costs by the AD patients. Lastly, although PSM was successful in diminishing the imbalances in the covariate distribution among treatment groups, other efficient methods can be explored for better estimations. 9] Genetic Matching brings about the reduction in biases even when the property of equal percent bias reduction (EPBR) does not hold. However, PSM performs poorly when such a property does not hold 29 and achieves better covariate balance than unadjusted analyses of the RCT data. 28

CONCLUSIONS
The study concludes that people with AD utilize more services as well as incur higher GP-related costs. PSM aided in reducing the imbalance in observed covariates to a large extent, provided balanced matched data and improved the estimations. However, sensitivity analysis showed there may be some potential hidden bias. Inclusion of important health-related covariates, such as quality of life and functional status, would be necessary to address the hidden bias.

Table 2. Comparison of Performance in Balancing Covariates by Logistic and GBM Methods
*Log of Age was used while estimating propensity scores by logistic regression.

Table 3. Baseline Covariates: Before and After PSM

Table 3. Baseline Covariates: Before and After PSM (continued)
The number of Alzheimer's patients (n=3378) is different from the matched number of Alzheimer's and non-Alzheimer's patients, as 10 cases and controls did not match during nearest neighbor matching within calipers. PSM was done in two steps: first, estimation of propensity scores using generalized boosted regression; second, nearest neighbor matching within calipers using the propensity scores obtained in the first step.
*Mean difference; AD: Alzheimer's disease; GP: General Practice; PCPs: Primary Care Providers; PSM: propensity score matching. Continuous variables expressed as mean ± standard deviation; categorical variables expressed as N (%). Tertiary education in this table includes 'Bachelor', 'tertiary', 'long degrees' and 'PhD and research' levels.

Table 4. Annual (2011) GP Service Utilisation and Costs
AD: Alzheimer's disease; ATE: Average Treatment Effect; ATET: Average Treatment Effect on Treated; CI: confidence interval; GP: General Practice

Table 5. Sensitivity Analysis
*Unconfounded estimate from Rosenbaum Sensitivity Test for Hodges-Lehmann Point Estimate. Gamma is log odds of differential assignment to treatment due to unobserved factors. DKK: Danish Krone; GP: General practitioner