Does fecal calprotectin equally and accurately measure disease activity in small bowel and large bowel Crohn’s disease?: a systematic review
Abstract
Fecal calprotectin (FC) is a highly sensitive biomarker of disease activity in inflammatory bowel disease. However, there are conflicting reports on whether its diagnostic accuracy in Crohn’s disease is influenced by disease location. The aim of this study was to undertake a systematic review of the published literature. Relevant databases were searched from inception to November 8, 2016 for cohort and case control studies which had data on FC in patients with isolated small bowel (SB) and large bowel (LB) Crohn’s disease. Reference standards for disease activity were endoscopy, magnetic resonance imaging, computed tomography or a combination of these. The QUADAS-2 tool was used to assess the risk of bias. The initial search identified 5,619 records. After removal of 2,098 duplicates, 3,521 records were screened. Sixty-one full text articles were assessed for eligibility and 16 studies were included in the final review, with sensitivities and specificities per disease location available from 8 studies. Sensitivities of FC at SB and LB locations ranged from 42.9% to 100% and 66.7% to 100% respectively, while the corresponding specificities were 50% to 100% and 28.6% to 100%. The sensitivities and specificities of FC for measuring disease activity in Crohn’s disease at different disease locations vary widely and no firm conclusion can be drawn. Better-designed studies are needed to establish the effect of disease location on the diagnostic accuracy of FC.
INTRODUCTION
Crohn’s disease (CD) is a chronic disorder characterized by transmural inflammation and patchy distribution in the GI tract. Assessing ongoing GI mucosal inflammation in this condition is important because it helps predict the course of disease [1-7], response to therapy [1], advent of complications [7], and need for hospitalization and surgery [4]. Accordingly, various studies have shown mucosal healing to be the best predictor of positive long-term outcomes [3-6]. Endoscopy is currently regarded as the gold standard test for assessment of mucosal healing [8]. However, it is expensive, invasive, associated with patient discomfort and carries a small risk of serious complications, making it impractical for frequent monitoring. Biochemical markers like CRP are inexpensive but have only moderate diagnostic accuracy, with a specificity of 0.92 (95% CI, 0.72–0.96) but a sensitivity of only 0.49 (95% CI, 0.34–0.64) [9], which limits their use as disease biomarkers.
Since the acutely inflamed intestinal mucosa is deemed to be neutrophil-rich, fecal tests based on neutrophil-derived markers are a realistic option for assessing mucosal inflammation. Among the various fecal markers of intestinal inflammation, fecal calprotectin (FC) is the one most commonly used in clinical practice [10]. FC has a sensitivity of 87% and specificity of 67% when used to detect endoscopic activity in symptomatic CD [9]. It accurately predicts the response to therapy as well as the 1-year risk of relapse [11-13]. There are, however, conflicting reports on whether its diagnostic accuracy in CD is influenced by disease location. FC has been shown to have a lower specificity in CD than in UC, and this might be driven by the different disease locations [14-16]. Some studies report that the FC level is lower in small bowel (SB) disease than in large bowel (LB) disease [17,18], while others did not observe any difference [14,19]. We feel this is an important matter that could potentially either change practice or serve as a basis for downstream research. We thus aimed to undertake a systematic review of the published literature and discuss the effect of disease location on the sensitivity and specificity of FC for measuring disease activity in CD.
METHODS
1. Criteria for Inclusion and Exclusion
Case control and cohort studies that provided data on FC separately by SB and LB locations were selected. Only those studies which had clearly mentioned the use of endoscopy, MRI, CT or a combination of these modalities as reference standard to assess disease activity were included [8,20]. The subjects included both adult and pediatric patients who had been diagnosed with CD on the basis of their clinical symptoms and supporting investigations (endoscopy, biopsies, imaging, blood and stool tests). We also included studies in which healthy volunteers and subjects with IBS were recruited as controls. We excluded studies focusing only on SB-CD and studies where the reference standard for activity used was based on clinical or biochemical criteria. We also excluded studies specifically dealing with postoperative CD as it would not have been possible to define the disease location as SB or LB if the recurrence was limited to the anastomosis.
2. Search Strategy
With the help of a senior librarian, we searched Medline, Embase, Web of Science and Cochrane Library from inception up to November 8, 2016. No language or publication restrictions were applied. Details of the search strategy are provided in Supplementary Material 1.
Conference proceedings from Digestive Diseases Week, United European Gastroenterology Week, European Crohn’s and Colitis Organisation (ECCO) and British Society of Gastroenterology annual meetings over the past 12 years (2005–2016) were also searched for relevant additional studies. We performed a manual search from references in the included studies and pertinent review articles. We also searched the Grey Literature Database OpenGrey to check for eligible studies.
3. Selection
The selected studies were initially screened for eligibility by 3 authors (E.G.S., R.W., and A.A.T.). The abstracts were reviewed and those eligible were included for full text review. The full manuscripts were independently assessed by 2 authors (E.G.S. and G.W.M.) against the inclusion criteria. Any disagreements were resolved by discussion and consensus with the other authors (S.S., R.W., and A.A.T.). Studies published only in abstract format were included as long as the inclusion criteria were satisfied.
4. Data Extraction
Two authors (E.G.S. and G.W.M.) independently completed the data extraction forms for studies in the final selection list. The following data were collated: general information (journal, year, author, title), publication type (full paper or abstract), location, number of centers involved, study design (prospective/cross-sectional), total number of CD subjects and stratification based on disease location, age group (adult/pediatric/both), follow-up period in months, FC levels with cutoff, clinical disease activity index, relevant reference standard (with appropriate disease activity score if provided), number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), and miscellaneous details. If any of the selected studies had missing data or needed clarification, multiple attempts were made to contact the authors by electronic mail to obtain the information.
5. Risk of Bias
To assess the risk of bias, QUADAS-2 was used (Supplementary Material 2). This is a tool for assessing the quality of primary diagnostic accuracy studies included in systematic reviews [21]. The assessment was performed independently by 2 authors (E.G.S. and G.W.M.), and any disagreement was resolved by consensus with coauthors (S.S., R.W., and A.A.T.).
6. Data Synthesis
Sensitivity and specificity in the SB and LB locations were separately derived by calculation from the information provided (i.e., TP, TN, FP, and FN) or as reported in the published literature.
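The calculation described above can be illustrated with a minimal sketch. It assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function name `diagnostic_accuracy` and the counts in the usage line are hypothetical and are not taken from any of the included studies.

```python
from typing import NamedTuple, Optional


class Accuracy(NamedTuple):
    sensitivity: Optional[float]
    specificity: Optional[float]


def diagnostic_accuracy(tp: int, tn: int, fp: int, fn: int) -> Accuracy:
    """Derive sensitivity and specificity from the cells of a 2x2 table."""
    if min(tp, tn, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    with_active_disease = tp + fn      # reference standard positive
    without_active_disease = tn + fp   # reference standard negative
    # A measure is undefined (None) when its denominator is zero.
    sensitivity = tp / with_active_disease if with_active_disease else None
    specificity = tn / without_active_disease if without_active_disease else None
    return Accuracy(sensitivity, specificity)


# Hypothetical counts for a single disease location (illustrative only).
example = diagnostic_accuracy(tp=6, tn=4, fp=1, fn=2)
print(f"sensitivity={example.sensitivity:.1%}, specificity={example.specificity:.1%}")
```

Returning `None` for an empty denominator, rather than zero, keeps a subgroup with no reference-negative (or reference-positive) subjects from being mistaken for a test with 0% accuracy.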
RESULTS
The electronic database search on November 8, 2016 identified 5,619 results. After the removal of 2,098 duplicates, 3,521 records were screened for inclusion. Of these, 61 studies were deemed to be relevant and subjected to full text review. Thereafter, 45 studies [12,13,15,22-63] were excluded either because the numerical data on FC at SB and LB locations were not separately available or because the reference standards used did not conform to the inclusion criteria. Finally, 16 studies were included in the qualitative review (Fig. 1), involving 328 patients with SB-CD and 332 patients with LB disease location.

Fig. 1. PRISMA flow diagram. Footnote a: 16 studies, numerical data not available for fecal calprotectin (FC) at large bowel and small bowel locations separately; 16 studies, reference standards for assessment of disease activity were different from those mentioned in the inclusion criteria; 13 studies, numerical data for FC at the 2 locations were not separately available and the reference standards used for assessment of disease activity were different from those mentioned in the inclusion criteria. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
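The record counts in the flow diagram can be cross-checked with simple arithmetic. The sketch below uses only the figures reported in the text and the figure footnote; the variable names and the short labels for the exclusion reasons are our own shorthand.

```python
# Counts as reported in the Results text and the flow diagram footnote.
identified = 5619
duplicates = 2098
screened = identified - duplicates

full_text_assessed = 61
exclusion_reasons = {
    "FC data not separate by location": 16,
    "reference standard outside inclusion criteria": 16,
    "both of the above": 13,
}
excluded = sum(exclusion_reasons.values())
included = full_text_assessed - excluded

print(screened, excluded, included)
```

The three derived values agree with the 3,521 records screened, 45 studies excluded and 16 studies included that are reported in the text.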
1. Study Demographics
Fourteen relevant cross-sectional [17,19,64-75] and 2 prospective studies [14,76] were published between 2008 and 2016 (Table 1). Four of the 16 studies were published as conference abstracts [65,69,73,74], although 1 of these was subsequently published as a full text article [66]. Almost all the studies were single- or dual-center based, other than the studies by Faubion et al. [66] and Lin et al. [76], which were multi-center. All the studies were performed in Europe and North America apart from a single study originating from Asia [76]. The majority of the studies involved adult subjects. The study by Jones et al. [64] included 23 subjects who were less than 16 years old, while the youngest subject in the study by Jensen et al. [19] was 16 years old. With regard to the reference standards utilized, 11 studies used endoscopy, 2 used endoscopy and CT in combination, and there was one study each for MRI alone, endoscopy and MRI in combination, and a composite assessment of endoscopy/capsule endoscopy/surgery (Table 1).
2. Risk of Bias Assessment
With regard to the QUADAS-2 risk assessments of the selected studies (Table 2), only a single study [75] scored low in all 4 domains of risk of bias and in the domain of concern for applicability. Just 3 further studies [64,71,76] scored low in 3 domains of risk of bias. Most studies had an unclear risk of bias in patient selection. With respect to the index test, 3 studies had high risk [67,68,72] while one [69] was unclear. The studies were almost evenly distributed between low and unclear risk with regard to the reference standard. Eight studies had either high or unclear risk of bias under subject flow and selection [18,64,66-68,70,73,74]. Only 6 studies [64,67,70,72,75,76] had low concern for applicability under subject selection.
3. Sensitivity and Specificity of FC by Location
The data on the effect of disease location on FC levels are heterogeneous (Table 3). Some studies [17,18,67,69] showed that FC was significantly higher in LB than in SB disease, while others [14,19,68,70,74-76] did not corroborate this finding. Absolute FC values are, however, of limited value.
The studies by Jones et al. [64], Sipponen et al. [71], and Zittan et al. [73] showed that FC correlated significantly with the reference standard at the LB location but not at the SB location, while the other 2 studies [67,72] showed that FC correlated with the reference standard at both locations (Table 4). The reference standard used in these studies was endoscopy, scored with either the Simple Endoscopic Score for Crohn’s Disease (SES-CD) [64,71-73] or the Crohn’s Disease Endoscopic Index of Severity (CDEIS) [67]. In the study by Zittan et al. [73], an MR enterography score (MaRIA, magnetic resonance index of activity) was also used in the SB location.
The sensitivity and specificity data were available for 8 studies in total (Table 5). Sensitivities were available in the published literature for just 2 studies [19,67] while in 1 study [73], these were retrospectively provided by the author. For the remaining 5 studies [14,17,66,74,75], the relevant authors provided the raw data on the number of TP, TN, FP and FN, from which the sensitivity and specificity values were retrospectively calculated.
Including data from all the 8 studies, the sensitivity and specificity of FC in the SB ranged from 42.9% to 100% and from 50% to 100% respectively. The sensitivity and specificity of FC in the LB ranged from 66.7% to 100% and from 28.6% to 100% respectively.
DISCUSSION
A variety of clinical studies have indicated a wide range of sensitivities and specificities for FC in CD at different disease locations [14,17-19]. We have undertaken a systematic review to objectively appraise the literature. Overall, the sensitivity and specificity of FC in the SB ranged from 42.9% to 100% and 50% to 100%, while those in the LB ranged from 66.7% to 100% and 28.6% to 100% respectively. These overlapping ranges suggest that FC may be equally useful for measuring disease activity in CD at these 2 locations, but no firm conclusion can be drawn from the published literature. The QUADAS-2 tool indicated that the quality of the selected studies was modest.
The data presented here are heterogeneous, with varying gold standards. There are only 5 studies in the published literature with the primary aim of investigating the effect of disease location on the sensitivity and specificity of FC [17,19,72-74]. In the remaining 11 studies, this information was expressed as a sub-analysis. Moreover, apart from the published data, raw data to calculate sensitivity and specificity were only available in 5 small studies. These data did not pertain to all the cohorts published but only to smaller sub-groups [14,17,65,74,75]. One might speculate that LB disease is within reach of colonoscopy and hence is more commonly validated with a gold-standard investigation. As for SB disease, unless it is in the terminal ileum, it might not be located as accurately, although the sensitivities and specificities of MRI for measuring disease activity are widely published [77]. A possible reason for the effect of disease location on the specificity of FC might be that common diseases of the colon other than LB-CD, such as diverticulitis, microscopic colitis or infectious enteritides, might raise FC. The same might not be said for SB inflammation in cohort studies undertaken in the Western Hemisphere, where CD is the commonest cause of ileal inflammation. Effectively, this systematic analysis highlights the need for properly designed prospective studies to answer this important question.
Despite endoscopy being the gold standard for assessment of disease activity, we also included studies where radiological tests such as CT or MRI were utilized as reference standards to evaluate SB activity, as these are supported by the ECCO guidelines [8,20]. However, the lack of a uniform gold standard was a limiting factor. This heterogeneity, compounded by the inter-observer variability of the various investigative modalities used, limited the validity of the reported sensitivities and specificities. The limitations of CT and MRI may include decreased sensitivity to detect early disease that may otherwise be detected on endoscopy. Even in those studies that used endoscopy as the reference standard, various scoring systems such as the SES-CD and the CDEIS were utilized. These scoring systems themselves have limitations, such as inter-observer variability and the endoscopic evaluation being confined to the terminal ileum or colon, subject to the reach of the colonoscope. Capsule endoscopy is a non-invasive way to evaluate the entire SB. However, its disadvantages include lack of utility when there is an SB stricture as well as the subjective nature of reporting.
There are certain limitations in the published literature that need to be highlighted. The FC cutoffs were not uniform across the reported studies. Cutoff values can influence test accuracy, and different cutoff values for FC apply depending on the intent of use. The current National Institute for Health and Care Excellence (NICE) guideline [78] indicates that an FC value <50 μg/g suggests no significant GI mucosal inflammation, while a value of >250 μg/g corresponds well with endoscopic and histological activity [9,79]. Most of the studies in this systematic review used a cutoff of 100 μg/g, with just 3 studies using a cutoff value of 50 μg/g. The assays used to determine the FC levels were also not uniform. Most studies used an ELISA test while some used the rapid test (Quantum Blue). Stool collection time was not standardized across the studies, and there was a paucity of detail regarding processing of the stool samples. These factors could also contribute to differences in FC across the studies.
Our systematic review included both pediatric and adult studies, though most of the data were from the adult population and the pediatric population appeared under-represented. The specificity of FC appears to improve with patient age. van Rheenen et al. [80], in their meta-analysis of 13 studies, obtained a pooled sensitivity of 93% and specificity of 96% in adults and 92% and 76% in children respectively. The larger share of irritable bowel disease with absence of alarm symptoms was thought to overestimate the specificity in adult subjects compared to children. Henderson et al. [54] undertook a meta-analysis of 8 studies and concluded that the sensitivity and specificity of FC in IBD in the pediatric cohort were 97.8% and 68.2%. Factors that could contribute to the difference in specificity of FC in adult versus pediatric populations include the variation in disease prevalence and spectrum, variation in the FC threshold to trigger endoscopic evaluation, parental expectation and concerns about missed diagnosis [54]. The pediatric cohort in this systematic review was too small to allow any firm conclusions.
We observed that most of the studies originated from the Western Hemisphere, except for the study from Taiwan [76], perhaps indicating that these findings may not be reflective of the situation in the general population worldwide. It would be difficult to obtain homogeneous worldwide data on the accuracy of FC in SB and LB locations due to differences in the incidence and prevalence of IBD across regions [81].
This systematic review has some major strengths. We undertook a comprehensive search including important online databases (Medline, Embase, Web of Science, and Cochrane Library). We had no language or publication restrictions. Moreover, relevant conference proceedings from 2005 onwards were searched to minimize publication bias in our search. We excluded studies that were restricted to SB-CD since we also needed information from the LB in order to compare. We excluded those studies solely describing postoperative cohorts to exclude the effect of non-IBD related anastomotic ulceration on the analysis. Moreover, since the raw figures (i.e., TP, TN, FP and FN in both SB and LB locations) of the selected studies were not provided in the original published manuscripts, electronic communication with relevant study authors was undertaken as part of our data extraction process for this systematic review.
The ranges of sensitivities and specificities for FC by disease location are wide and not directly comparable. As the gold standard comparators used in the various studies are heterogeneous, it has not been possible to pool the data and calculate common variables for FC. Prospective cohort studies with common comparators and similar quantification methodologies for FC are needed to answer this question and to better understand the right place for FC as a disease monitoring tool.
Notes
FINANCIAL SUPPORT
The authors received no financial support for the research, authorship, and/or publication of this article.
CONFLICT OF INTEREST
E.G.S. was supported through the NIHR Nottingham Digestive Diseases Biomedical Research Centre, Nottingham University Hospitals NHS Trust and University of Nottingham. S.S. has received consultancy fees from Falk; speaker fees from MSD; and financial support for educational activities from Merck Sharp & Dohme Ltd, Abbvie and Ferring. G.W.M. has received educational support from Abbvie, Janssen, NAPP, Takeda Pharmaceuticals, Merck Sharp & Dohme Ltd, Ferring and Dr Falk; speaker honoraria from Merck Sharp & Dohme Ltd, Abbvie, Janssen, Ferring and Takeda Pharmaceuticals; and is on the advisory boards for Abbvie, Takeda Pharmaceuticals, Janssen and Dr Falk. R.W., A.A.T., and J.E. have no conflicts of interest to declare.
AUTHOR CONTRIBUTION
Simon EG: conception & design of the study; data acquisition, analysis & interpretation; drafting and revising the article; final approval. Wardle R, Thi AA, Eldridge J, and Samuel S: data interpretation; revising the article; final approval. Moran GW: conception & design of the study; data interpretation; revising the article; final approval. All authors have approved the final version of the manuscript.