The results of an external quality-assessment experiment for serum creatinine measurement are described. Fifty-one laboratories performed quintuplicate analyses during three different analytical runs on six lyophilized sera and two frozen human serum pools. Isotope dilution gas chromatography–mass spectrometry (ID GC-MS) target values were assigned to all the materials. Intralaboratory within- and between-run imprecision results were very similar for all the materials tested (CV ≤2.20% and ≤4.70%, respectively). The overall imprecision obtained was high (CV 6.5–20.0%) because of increased interlaboratory–intermethod variability. A significant positive bias (+9.2% to +43.7%) was found for all the materials at lower creatinine concentration. By using two human sera at different concentrations, we could calculate the constant and the proportional calibration bias displayed by each peer group. The majority of the lyophilized materials showed a behavior divergent from the frozen pools, indicating matrix-related problems. We propose a new algorithm for calculating a matrix bias correction factor that is specific to each instrument–reagent combination and each material.
The analytical goals for creatinine on the basis of biological variability are very demanding (CV ≤2.2% for precision and ≤2.8% for accuracy (1)). Imprecision is strictly dependent on analyzer characteristics, and can be easily verified. In contrast, variables affecting inaccuracy (method specificity, type of calibration, calibrator matrix, and value assignment) are more difficult to identify and control. These variables lead to a wide dispersion of results among different laboratories. As a result, the measurement appears far from the desirable performance.
Currently, there is more room for improvement in accuracy than for precision in creatinine determination. Through the use of reference methods and appropriate materials, it is possible to come closer to the “trueness” of the results. The availability of a definitive method is certainly a problem, but for accuracy of routine methods the real difficulty is the material used. The case of creatinine is particularly critical because the lack of commutability (2) is emphasized by the poor specificity and weakness of the majority of the routinely used picrate reaction-based methods (3)(4)(5).
This fact forces almost all the proficiency testing programs to use peer group target values without any means to verify the real accuracy of any single laboratory, and without progress toward improvement of the agreement between the different laboratories.
Here we describe the results of an external quality-assessment scheme (EQAS) from 51 laboratories in the Lombardy region (Italy).1 We tried to focus on several aspects related to the accuracy of creatinine measurement. First, with a specific experimental design of replicate analyses, we could estimate components of variability. Second, through the use of frozen sera and the ultimate accuracy reference, an isotope dilution gas chromatography–mass spectrometry (ID GC-MS) method, we calculated constant and proportional components of the calibration error. Third, we verified the presence of matrix effects in most lyophilized sera and, with a modification of the algorithm proposed by Ross et al. (6), we calculated a matrix bias correction factor.
Materials and Methods
ID GC-MS creatinine method
A Finnigan MAT95 mass spectrometer (Finnigan MAT, Bremen, Germany) was used for GC-MS analysis. HPLC purification was carried out with a Jasco HPLC pump (model PU880, Tokyo, Japan) and with a variable-wavelength ultraviolet detector (model 875-UV, Jasco). The HPLC column was a Lichrosphere 100 RP-18 (250 × 4 mm, 5-μm particles) from Merck (Darmstadt, Germany).
Creatinine [Standard Reference Material (SRM) 914a, 99.8% purity] and lyophilized reference sera SRM 909a1 (certified value = 84 ± 1 μmol/L) and 909a2 (463 ± 6 μmol/L) were from NIST (October 13, 1993; revision of certificate dated February 24, 1993). Sera were reconstituted according to the NIST insert. [2H3]Creatinine (98 atom % excess) was from Isotec (Miamisburg, OH). N-methyl- N-(tert-butyldimethylsilyl)-trifluoroacetamide (MTBSTFA) was purchased from Fluka (Buchs, Switzerland). All solvents and general chemicals used were of analytical grade.
All solutions and sera were dispensed with known accuracy and imprecision as already reported (7). Calibrators were prepared by mixing various amounts of SRM 914a creatinine with [2H3]creatinine to provide a series of mixtures with known ratios of the two isotopomers between 0.8 and 1.2.
Weighed amounts of each serum were supplemented with a weighed aliquot of the [2H3]creatinine solution to get about a 1:1 ratio of [1H]creatinine:[2H]creatinine. After stirring, supplemented sera were kept at room temperature for 2 h to allow equilibration before protein precipitation obtained with acetone. The aqueous phase was separated and evaporated to dryness under reduced pressure. An isocratic separation of creatine from creatinine was achieved by HPLC with H2O containing 0.1% HCOOH (pH 5.5–5.7 with NH4OH) as mobile phase at 1 mL/min flow rate. Creatinine was monitored at 235 nm and the collected fraction was dried under vacuum at 40 °C. Creatinine was converted into its tert-butyldimethylsilyl derivative with 70 μL of CH3CN:MTBSTFA (2:1 by vol) at 70 °C for 30 min. Gas chromatographic separation was achieved with a 30-m SPB-35 column (Supelchem, Milan, Italy). The injector temperature was at 250 °C, the initial GC oven temperature was set at 170 °C for 1 min and subsequently increased to 180 °C at 2.5 °C/min, and to 270 °C at 30 °C/min. Injections of samples were alternated with duplicate analysis of calibrators having 1H:2H ratios of 0.8, 1.0, 1.2, 1.0, 0.8, etc. The isotopic ratio was determined by monitoring ions at m/z 298 and 301 for unlabeled and labeled creatinine, respectively.
Concentration of serum creatinine (μmol/L) was then computed from the measured isotopic ratio on the basis of the weight of each serum aliquot, the density, and the internal calibrator added, as already described (7).
Eight different materials were sent to 51 clinical laboratories of the Lombardy region: two fresh-frozen human serum pools (CON1 and CON2) and six lyophilized materials (LYO1–LYO6). The frozen pools were delivered in solid CO2, stored at −20 °C, thawed on the day of analysis, and analyzed within 1 h. Lyophilized sera were stored at 4 °C and reconstituted 1 h before analysis. In each material, creatinine was measured in quintuplicate on three consecutive days with the automated analyzers routinely used (15 results per laboratory, per control material). The study participants were asked to classify their analytical method according to the chemical principle, the instrumentation, the source of reagents, and the type of calibrator. According to this classification we identified three homogeneous groups [Boehringer–Hitachi, Johnson & Johnson (J&J), Beckman] and two miscellaneous groups.
CON1 and CON2 were prepared from sera obtained with Serum Separator Tubes (SST Vacutainer; Becton Dickinson, Milan, Italy). Concentration was adjusted by adding appropriate amounts of creatinine (SRM 914a). LYO1–LYO6 were lyophilized commercial materials: LYO1 (Roche N, lot no. A 1136); LYO2 (Roche A, lot no. S 1135 2); LYO3 (Boehringer, Precinorm U, lot no. 177111 61); LYO4 (Boehringer, Precipath A, lot no. 177481 71); LYO5 (Bio-Rad, Lyphochek 1, lot no. 15011); and LYO6 (Bio-Rad, Lyphochek 2, lot no. 15012).
Analytical instruments used in this experiment were: Boehringer Hitachi analyzers 704 (2), 717 (7), 747 (6), 911 (3), (Boehringer Mannheim, Milan, Italy); Beckman CX7 (4), CX3 (1), CX5 (1) (Beckman Analytical, Cassina de Pecchi, Italy); Dax 24 (1) (Bayer, Cavenago, Italy); Olympus AU 5000 (3), Au 510 (1) (Kontron Instruments, Milan, Italy); Shimadzu CL 7000 (1), 7200 (1) (Shimadzu Italia, Milan, Italy); IL 900 (4), ILAB 1800 (2), Monarch (1) and Phoenix (1) (Instrumentation Laboratory, Milan, Italy); Ektachem analyzers 700 XR (8), 500 (3), 250 (1), (J&J, Cinisello Balsamo, Italy).
EQAS data were collected via an ad hoc computer program compiled in CA-Clipper Version 5.2 (Computer Associates, Milan, Italy) and distributed on floppy disk together with the samples. Data were automatically transferred to a Lotus 1-2-3 spreadsheet (release 3.1; Lotus Italia, Milan, Italy).
The mean of each analytical run, the laboratory mean (mean of three analytical runs), the group mean, and the grand mean (mean of laboratory means) were calculated. SD and within-run CV (CVw), between-run CV (CVb, containing only the across-day component of variability), between-laboratories CV (CVinter), and overall CV (CVovr) were calculated with analysis of variance performed on a Lotus 1-2-3 spreadsheet.
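The variance decomposition described above can be illustrated with a minimal sketch. It assumes a balanced nested design (laboratories, runs within laboratories, replicates within runs) and assumes that the overall CV is built from the sum of the three variance components; the paper does not spell out either detail, and the function name and input layout are hypothetical.

```python
from statistics import mean

def variance_components(data):
    """data[lab][run] is a list of replicate results (balanced design assumed)."""
    n_lab, n_run, n_rep = len(data), len(data[0]), len(data[0][0])
    run_means = [[mean(run) for run in lab] for lab in data]
    lab_means = [mean(rm) for rm in run_means]
    grand = mean(lab_means)

    # Mean squares of the two-level nested ANOVA.
    ms_within = sum((x - run_means[i][j]) ** 2
                    for i, lab in enumerate(data)
                    for j, run in enumerate(lab)
                    for x in run) / (n_lab * n_run * (n_rep - 1))
    ms_run = n_rep * sum((run_means[i][j] - lab_means[i]) ** 2
                         for i in range(n_lab)
                         for j in range(n_run)) / (n_lab * (n_run - 1))
    ms_lab = n_run * n_rep * sum((m - grand) ** 2 for m in lab_means) / (n_lab - 1)

    # Variance components; negative estimates are truncated to zero.
    var_w = ms_within
    var_b = max((ms_run - ms_within) / n_rep, 0.0)
    var_l = max((ms_lab - ms_run) / (n_run * n_rep), 0.0)

    def cv(variance):
        return 100 * variance ** 0.5 / grand

    return {"mean": grand, "CVw": cv(var_w), "CVb": cv(var_b),
            "CVinter": cv(var_l), "CVovr": cv(var_w + var_b + var_l)}

# Hypothetical data: two laboratories, three runs, five replicates each.
example = [[[100.0] * 5] * 3, [[110.0] * 5] * 3]
result = variance_components(example)
```

In this toy input all replicates and runs agree within each laboratory, so only the between-laboratories component is nonzero.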
Calibration bias line.
For each peer group we calculated the equation of the line defined by the two frozen pools (CON1 and CON2):
$$y = a_p + b_p x$$
where y is the peer group mean and x is the IDMS value; the fixed constant calibration bias ($a_p$) and the fixed proportional calibration bias ($b_p - 1$) of each peer group were obtained from the parameters of the line.
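As a minimal sketch of this step, the line through the two frozen pools can be computed directly from the two (target value, peer group mean) pairs. The function name and the numbers below are hypothetical and are not taken from the study.

```python
def calibration_line(x1, y1, x2, y2):
    """Line through (x1, y1) and (x2, y2); returns (a, b) with y = a + b * x.

    x is the ID GC-MS target value of a pool, y the peer group mean.
    """
    if x1 == x2:
        raise ValueError("the two pools need different target values")
    b = (y2 - y1) / (x2 - x1)   # slope; proportional bias is b - 1
    a = y1 - b * x1             # intercept; constant bias
    return a, b

# Hypothetical pools: targets 80 and 400 umol/L, peer means 95 and 399 umol/L.
a_p, b_p = calibration_line(80.0, 95.0, 400.0, 399.0)
proportional_bias_percent = 100 * (b_p - 1)
```

With these illustrative numbers the constant bias is +19 μmol/L and the proportional bias is −5%, a pattern in which the two errors partly cancel at intermediate concentrations.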
Statistical verification of matrix effect occurrence.
Each laboratory mean, obtained for every lyophilized material, was corrected for the calibration bias of the laboratory itself according to the following formula:
$$Y_{iL}^{\mathrm{corr}} = \frac{Y_{iL} - a_i}{b_i}$$
where $Y_{iL}$ is the mean of lyophilized serum L for laboratory i, and $a_i$ and $b_i$ are the parameters of that laboratory's calibration bias line.
The statistical significance of the difference between corrected results, grouped according to the peer groups, and ID GC-MS value of each material was calculated (Student’s t-test). A statistically significant difference indicates the presence of a matrix bias (i.e., noncommutability of the material).
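The check described above can be sketched as follows, assuming that each laboratory mean is corrected by inverting its own calibration line and that the comparison with the target is a one-sample Student's t-test; the text names only "Student's t-test", so the one-sample form is an assumption, as are the function names and the numbers.

```python
from math import sqrt
from statistics import mean, stdev

def correct_for_calibration(y, a, b):
    """Invert the laboratory calibration line y = a + b * x."""
    return (y - a) / b

def one_sample_t(values, target):
    """t statistic and degrees of freedom for mean(values) versus target."""
    n = len(values)
    t = (mean(values) - target) / (stdev(values) / sqrt(n))
    return t, n - 1

# Hypothetical peer group: (laboratory mean, a_i, b_i) for one lyophilized material.
labs = [(120.0, 19.0, 0.95), (123.0, 20.0, 0.96), (118.0, 18.0, 0.94)]
corrected = [correct_for_calibration(y, a, b) for y, a, b in labs]
t_stat, dof = one_sample_t(corrected, target=100.0)
```

The resulting statistic would then be compared with the critical value of the t distribution for the given degrees of freedom.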
Matrix bias correction factor.
A factor to correct the bias introduced by the matrix of the lyophilized control materials was obtained by modifying the formula proposed by Ross et al. (6) to take into account the constant component of the calibration bias, which is very common in creatinine measurement with routine methods. The algorithm proposed by Ross et al. (6) for the calculation of the matrix bias correction factor of lyophilized sera is:
$$\text{matrix bias correction factor} = \frac{Y_{pF}/C_F}{Y_{pL}/C_L}$$
where $Y_{pF}$ is the peer group mean of the fresh frozen human pool, $Y_{pL}$ is the peer group mean of the lyophilized serum, $C_F$ is the GC-IDMS value of the fresh frozen human pool, and $C_L$ is the GC-IDMS value of the lyophilized serum.
Modified algorithm:
$$\text{matrix bias correction factor} = \frac{a_p + b_p C_L}{Y_{pL}}$$
where $b_p$ and $a_p$ are the parameters of the calibration bias line of a peer group method p.
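The two factors can be compared in a short sketch. It assumes the ratio forms implied by the symbol definitions: the factor is the value expected for the lyophilized material from the calibration bias alone, divided by the observed peer group mean. The function names and numbers are hypothetical.

```python
def ross_factor(y_f, c_f, y_l, c_l):
    """Proportional-only form: relative recovery on the frozen pool
    divided by relative recovery on the lyophilized material."""
    return (y_f / c_f) / (y_l / c_l)

def modified_factor(a_p, b_p, y_l, c_l):
    """Form with intercept: value predicted by the calibration bias line
    at the target c_l, divided by the observed peer group mean y_l."""
    return (a_p + b_p * c_l) / y_l

# Hypothetical peer group: line y = 19 + 0.95 x, lyophilized target 100 umol/L,
# observed peer group mean 120 umol/L.
factor = modified_factor(19.0, 0.95, 120.0, 100.0)
matrix_adjusted_mean = 120.0 * factor
```

Under these assumptions the two forms coincide when the intercept is zero, which is why the modification matters mainly for methods with a constant calibration bias.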
The reliability of our ID GC-MS method is demonstrated by the results obtained on NIST reference materials SRM 909a1: 84.3 μmol/L, CV 1.05% and 909a2: 470.0 μmol/L, CV 0.41%.
An overview of all results obtained on the six lyophilized materials and the two frozen pools is shown in Table 1. We report ID GC-MS target values and overall and peer group means. Results obtained by the clinical laboratories (including overall means and ANOVA) are also summarized. In some cases, such as LYO5, very large discrepancies among method means are evident.
The results obtained on the two frozen human serum pools were used to calculate constant calibration bias and proportional calibration bias of three homogeneous groups of analytical systems (we considered homogeneous the groups constituted by instruments, reagents, and calibrators from the same manufacturer). Table 2 shows the biases from the ID GC-MS values and the parameters of the lines obtained.
The results of the statistical verification of the occurrence of matrix effect are presented in Table 3. Only three of 18 material/analytical system combinations exhibit commutable behavior.
By using the formula illustrated in Materials and Methods, one can calculate a “matrix bias correction factor” taking into account the different components of the calibration bias. Table 4 shows the matrix bias correction factor of each lyophilized material for the different method groups. By multiplying the peer group means by these factors, one can remove the component of intermethod variability due to matrix effects from the results obtained on lyophilized materials. Fig. 1 shows the peer group means obtained, for each material, before and after results modification according to the matrix bias correction factors. In Fig. 2 comparability of data achievable on fresh frozen sera and lyophilized sera after correction is shown (J&J method group).
Fig. 1. Effects of the application of matrix bias correction factor. Each bar represents the mean percent bias of every peer group mean from ID GC-MS value, before (a) and after (b) the application of the matrix bias correction factor. Plot (b) is representative of the calibration bias. Arrows indicate the frozen pools.
Fig. 2. Bias/concentration profile of J&J peer group. Mean values before (open symbols) and after (closed symbols) correction are compared with mean values of fresh frozen sera (x-axis). (♦, ⋄, LYO5; ▪, □, LYO1; ▾, ▿, LYO3; ✚, ✙, LYO4; ▴, ▵, LYO2; ⬡, ⬡, LYO6; ×, CON1 and CON2).
Table 1. Results of creatinine proficiency testing (μmol/L).
Table 2. Data of calibration bias lines for three homogeneous groups of analytical systems.
Table 3. Results (μmol/L) obtained on lyophilized materials corrected for calibration bias and significance of difference (μmol/L) between corrected values and definitive method values.
Table 4. Matrix bias correction factors of each material for the different method groups.
We developed an ID GC-MS method similar to the one proposed by Stöckl and Reinauer (8) that combines a sufficient practicability with good precision (CVs from 0.41% to 1.42%) (Table 1) and accuracy (bias of 0.36% and 1.51% from NIST target values on 909a1 and 909a2, respectively).
Our experiments (see Table 1 and Fig. 1a) emphasize that: (a) there are very discordant percent biases from the ID GC-MS target values, very high for control materials with lower creatinine concentrations and very small for sera with higher concentrations; (b) the major component of variability is the between-laboratories variability, which is always very high; (c) the intralaboratory variability can be considered acceptable but, especially at the lower concentrations, it is far from the analytical goal calculated on the basis of biological intraindividual variability (1); and (d) the frozen pools (CON1 and CON2), although showing very similar intralaboratory variability, exhibit a lower interlaboratory imprecision. The results are comparable (in terms of imprecision and inaccuracy) with those obtained in a previous experiment (9). Large differences among the method means were present (Table 1). In particular, LYO5 shows bias of >50% between enzymatic and picrate methods, suggesting the presence of some noncreatinine substance reacting with picrate. Unfortunately, 16 laboratories were working with miscellaneous conditions [calibrators and (or) reagents from manufacturers different from those of the instrumentation] or with unique systems, and it was not possible to classify and treat those data. Also, the enzymatic group is not homogeneous, with one laboratory using the UV creatinine reaction and another using the Trinder coupled reaction. For these reasons we performed further calculations only for the three homogeneous groups of analytical systems: J&J analyzers, Beckman CX family, and Boehringer–Hitachi family.
Assuming the frozen pools as not affected by any matrix effect, we used them to calculate calibration bias (i.e., method bias observed relative to ID GC-MS method) according to Ross et al. (6). Percent biases obtained on CON1 were completely different from those on CON2, thus suggesting the occurrence of a significant constant calibration bias (Tables 1 and 2). We decided to take into account constant calibration bias by calculating the equation of the line defined by the two pools. The data of slope and intercept (Table 2) clearly identify a different behavior of the three peer groups. Note the similarity of the parameters of our regression line for Hitachi systems with the equation of the correlation between an HPLC reference method and the Hitachi 911 results presented by Blijenberg et al. (5). Clearly the methods based on the Jaffe reaction are affected by an important positive constant calibration bias (Table 2), probably caused by a nonspecific signal. This is particularly evident at low creatinine concentrations or with some types of artificial material such as LYO5. This positive bias, in the case of the Boehringer–Hitachi group, can be almost completely attributed to the picrate reactivity with proteins. The reading window of the Boehringer method is quite long (∼90 s), with a prolonged delay from the starter addition (∼90 s). This favors the interference from slow-reacting interferents such as proteins (10). In fact, an extensively dialyzed albumin solution (50 g/L) gives (on a Hitachi 747) an apparent creatinine value of 21 ± 0.9 μmol/L. The apparent accuracy displayed for samples with intermediate concentration is due to a concomitant negative proportional bias. Better performances were obtained with enzymatic methods, both for dry and wet chemistry (Table 1). In particular, laboratories using wet chemistry enzymatic methods provided very promising results. This finding is in agreement with Blijenberg et al. (3)(4), but the very limited number of participants using these methods (two) does not allow any generalization.
Table 3 shows clearly that lyophilized sera behave differently from the frozen pools. Only in three of 18 material/method combinations was the difference between the two types of materials not significant. These results imply that the use of target values on these types of materials is useless and can lead to faulty considerations. The bias introduced by the matrix is typical for a defined analytical system. Fig. 1a shows how different this effect is for the various materials and analytical systems. With the application of the algorithm proposed, it is possible to calculate factors (shown in Table 4) that are able to correct for the error introduced by the matrix. Fig. 1b, in which the matrix effect is corrected, shows almost identical behavior for the different materials with similar creatinine content, whether frozen or lyophilized. Indeed the bias/concentration profile of results obtained on lyophilized sera after correction closely resembles behavior of fresh frozen sera (Fig. 2). The proposed algorithm has a more general applicability than the previous one (6) and can give reliable results even when a constant calibration bias is present.
All matrix bias correction factors were calculated with the peer group means, but we tried also to calculate the factors by using single laboratory data of the same peer group. The results obtained showed a noteworthy concordance among laboratories of the same group. The variability of the obtained factors, measured as CV, ranged between 0.80% and 3.75% according to the material and the group of methods. This homogeneity of data allows us to hypothesize the possibility of the use of a relatively small number of pilot laboratories to calculate the matrix bias correction factor for a defined lot of control material to be used in an EQAS.
The major problem of EQAS, when artificially manipulated control materials are involved, is the bias introduced by the materials themselves for the different types of methods. This fact forces the use of peer group means, but without any guarantee, apart from the producer declaration, of the real accuracy of the analytical system. However, it is not possible to verify whether the differences among the various analytical systems are caused by the characteristics of the material only or by real accuracy problems, with a risk “of an implicit endorsement of methodologies that fail to satisfy fundamental accuracy goals” (11). Obviously the more straightforward approach to this problem should be the use of fully commutable material such as fresh or frozen sera, but the costs of distributing this type of material prevent its use, at least on a regular basis. The matrix-adjusted target values can be an acceptable compromise that allows the utilization of the lyophilized sera provided that two important limitations are adequately considered: (a) the matrix bias correction factor can be calculated only for well-defined analytical systems; (b) the serum pools used in generating the algebraic correction are assumed to behave the same as normal fresh serum specimens. The latter can be an important drawback; the probability that a minimally manipulated serum pool could exhibit a noncommutable behavior is low, but a check of the commutability, e.g., according to the College of American Pathologists’ protocol (12), is advisable. Moreover, this approach is not intended to substitute the direct comparison with a Reference Method on fresh sera (13), but only to minimize the matrix effect, thus allowing the use of Reference Method target values for lyophilized materials.
This work was supported by grant no. 869, funded by Lombardy Region, Italy. The assays performed in this study were made possible through the efforts of: L. Guerrini, M. Cavalleri, C. Petrini, A. Bianchi Bosisio, E. Solbiati, P. Cueroni, A. Nespolo, E. Scarazzatti, R. Vigoni, S. Bossini, A. Petrella, P. Maestrini, F. Tirelli, A. Cespa, L. Ferrari, G. Casiraghi, C. Okely, R. Antinozzi, F. Mariani, M.L. Carati, A. Ferrari, T. Baratto, G. De Leo, C. Ottomano, E. Guagnellini, G. Gallina, G. Giocoli, M. Musmeci, V. Malacrida, M. Iannone, A. Marocchi, M. Panteghini, M. Mancosu, A. Marelli, R. Trotti, F. Aguzzi, P. Mocarelli, A. Pagano, P.A. Bonini, and M. Murone.
1 Nonstandard abbreviations: EQAS, external quality-assessment scheme; ID GC-MS, isotope dilution gas chromatography–mass spectrometry; SRM, Standard Reference Material; and MTBSTFA, N-methyl-N-(tert-butyldimethylsilyl)-trifluoroacetamide.
1P <0.05; NS, not significant.
- © 1997 The American Association for Clinical Chemistry
Gene-level analysis of ImmunoChip or genome-wide association studies (GWAS) data has not been previously reported for systemic sclerosis (SSc, scleroderma). The objective of this study was to analyze genetic susceptibility loci in SSc at the gene level and to determine if the detected associations were shared in African-American and White populations, using data from ImmunoChip and GWAS genotyping studies. The White sample included 1833 cases and 3466 controls (956 cases and 2741 controls from the US and 877 cases and 725 controls from Spain) and the African American sample, 291 cases and 260 controls. In both Whites and African Americans, we performed a gene-level analysis that integrates association statistics in a gene possibly harboring multiple SNPs with weak effect on disease risk, using Versatile Gene-based Association Study (VEGAS) software. The SNP-level analysis was performed using PLINK v.1.07. We identified 4 novel candidate genes (STAT1, FCGR2C, NIPSNAP3B, and SCT) significantly associated and 4 genes (SERBP1, PINX1, TMEM175 and EXOC2) suggestively associated with SSc in the gene-level analysis in White patients. As an exploratory analysis we compared the results on Whites with those from African Americans. Of previously established susceptibility genes identified in Whites, only TNFAIP3 was significant at the nominal level ($p = 6.13 \times 10^{-3}$) in African Americans in the gene-level analysis of the ImmunoChip data. Among the top suggestive novel genes identified in Whites based on the ImmunoChip data, FCGR2C and PINX1 were only nominally significant in African Americans (p = 0.016 and p = 0.028, respectively), while among the top novel genes identified in the gene-level analysis in African Americans, UNC5C ($p = 5.57 \times 10^{-4}$) and CLEC16A (p = 0.0463) were also nominally significant in Whites. We also present the gene-level analysis of SSc clinical and autoantibody phenotypes among Whites. Our findings need to be validated by independent studies, particularly due to the limited sample size of African Americans.
Citation: Gorlova OY, Li Y, Gorlov I, Ying J, Chen WV, Assassi S, et al. (2018) Gene-level association analysis of systemic sclerosis: A comparison of African-Americans and White populations. PLoS ONE 13(1): e0189498. https://doi.org/10.1371/journal.pone.0189498
Editor: Masataka Kuwana, Keio University, JAPAN
Received: August 31, 2017; Accepted: November 27, 2017; Published: January 2, 2018
Copyright: © 2018 Gorlova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Genetic data is available from dbGaP repository (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000357.v1.p1). Additional data contains potentially identifying participant information and is restricted by the Ethics Committee of Instituto de Parasitología y Biomedicina. Interested, qualified researchers may request the data by contacting Comite de Etica del CSIC at firstname.lastname@example.org. All other relevant data are within the paper and its Supporting Information files.
Funding: Funding was provided to MDM by the National Institutes of Health (NIH) the National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS https://www.niams.nih.gov/) Centers of Research Translation (CORT) P50-AR054144, NIH grant N01-AR-02251 and R01-AR-055258, and the Department of Defense (DD) Congressionally Directed Medical Research Program (http://cdmrp.army.mil/) W81XWH-07-1-011 and WX81XWH-13-1-0452 for the collection, analysis and interpretation of the data. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Systemic sclerosis (SSc, scleroderma) [MIM 181750] is an autoimmune disease characterized by three key features: (1) fibrosis of skin and internal organs, (2) a vasculopathy, and (3) autoantibody production. It is a multiorgan system disease with considerable phenotypic heterogeneity, resulting in a broad spectrum of disease severity. Several genome-wide, ImmunoChip, and follow-up association studies were conducted to identify SNPs associated with SSc risk [1–8]. All published studies implemented SNP-level analysis, meaning that each SNP was analyzed separately and those with a genome-wide level of statistical significance were deemed risk associated. SNP-level analysis is effective for identification of SNPs with strong individual effects; however, it is underpowered to detect genes carrying multiple SNPs of small or medium effect size [9–11]. In the latter case, gene-level analysis can be beneficial because it will detect genes with multiple small effect size SNPs as significant even if these genes do not harbor any individual SNPs significant at the genome-wide level. However, a gene-level analysis has never been applied to SSc.
In this study we performed a gene-level analysis focusing on the data generated by the ImmunoChip platform. We compared results from the gene-level analysis with the results generated by traditional SNP-level analysis. We also performed a gene-level analysis of ImmunoChip and genome-wide association study (GWAS) data on African-American SSc patients. Although based on a relatively small group of patients, this study represents the first report of genetic analysis of African Americans with SSc. The results of the gene-level analysis of SSc clinical phenotypes (limited SSc (lcSSc) and diffuse SSc (dcSSc)) as well as autoantibody subsets (anti-centromere autoantibodies (ACA) and anti-DNA topoisomerase I (ATA) autoantibodies) among Whites are also presented.
Materials and methods
The study has been approved by the Institutional Review Boards of the participating US institutions, namely Boston University; Georgetown University; Medical University of South Carolina; University of Alabama, Birmingham; University of California Los Angeles; University of Michigan; University of Minnesota; University of Washington; University of Pittsburgh; University of Texas Health Science Center, Houston, University of Texas MD Anderson Cancer Center, Houston, Geisel School of Medicine, Dartmouth College; and ethics committees of the participating foreign institutions, namely Institute of Parasitology and Biomedicine López-Neyra, IPBLN-CSIC, Granada, Spain; University of Florence, Florence, Italy; Hospital Universitario, Madrid, Spain; Valle de Hebrón Hospital, Barcelona, Spain; Hospital de la Santa Creu i Sant Pau, Barcelona, Spain; Hospital Universitario y Politécnico La Fe, Valencia, Spain; University Hospital San Cecilio, Granada, Spain. All clinical investigation has been conducted according to the principles expressed in the Declaration of Helsinki. Written informed consent has been obtained from the participants.
Race of the study participants was self-reported, and we used principal component analyses to remove race outliers as described below. Details on the White population, genotyping and quality control can be found in our previously published manuscript . The White sample included 956 cases and 2741 controls from the US and 877 cases and 725 controls from Spain, after exclusion of individuals based on quality control (QC) (low call rates, non-European ancestry, or relatedness). There were 1087 (59%) White patients with lcSSc, 574 (31%) with dcSSc, 671 (37%) ACA-positive (ACA+), and 347 (19%) ATA+ patients (not all patients could be classified into two distinct phenotypes or had either ACA or ATA antibodies).
The African American sample, after quality control measures, included 291 cases (56 men and 235 women) and 260 controls (72 men and 188 women). In line with , the distribution of clinical phenotypes and autoantibody subsets in the African American patient population was markedly different from that in Whites. There were 82 (28%) patients with lcSSc, 201 (69%) with dcSSc, 21 (7%) ACA+, and 69 (24%) ATA+ among the African American cases (as with Whites, not all patients could be classified into two distinct phenotypes or had either ACA or ATA antibodies; two patients were not tested for ACA and four for ATA).
ImmunoChip analysis: Genotyping was done by Illumina Infinium single-nucleotide polymorphism (SNP) microarray–ImmunoChip. Genotype calling was done using the Illumina iScan System and the Genotyping Module (v.1.8.4) of the GenomeStudio Data Analysis software. We applied the following criteria for QC: (1) individuals with call rate <90% were excluded, (2) markers with call rates ≤ 90% were excluded, and (3) markers with allele distributions deviating from Hardy-Weinberg equilibrium (HWE) in controls ($p < 1 \times 10^{-5}$) were also excluded. A total of 126,270 markers (101,692 of them with a MAF > 0.1%) passed QC and were included in the analysis.
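The marker-level QC rules can be sketched as a simple filter. This is only an illustration: it uses a 1-df chi-square goodness-of-fit test for HWE, which is an assumption (the text does not say which HWE test was applied, and exact tests are common in practice), and the function names are hypothetical.

```python
from math import erfc, sqrt

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium.

    Arguments are genotype counts in controls; at least one must be nonzero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return erfc(sqrt(chi2 / 2))              # upper tail of chi-square, 1 df

def marker_passes_qc(call_rate, control_counts,
                     min_call_rate=0.90, hwe_threshold=1e-5):
    """Keep markers with call rate > 90% and HWE p-value >= 1e-5 in controls."""
    return (call_rate > min_call_rate
            and hwe_chi2_pvalue(*control_counts) >= hwe_threshold)

# Hypothetical markers: one in perfect HWE, one with no heterozygotes at all.
keep_good = marker_passes_qc(0.99, (25, 50, 25))
keep_bad = marker_passes_qc(0.99, (500, 0, 500))
```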
The same ImmunoChip platform was used for White and African American populations. The genotyping rate was 0.988 in the African American sample while the genotyping rate was 0.998 among Whites.
GWAS analysis: We also performed genome-wide genotyping of both African Americans and Whites. African Americans were genotyped on the Illumina Omni2.5 BeadChip that features ~2.5 million markers capturing variants down to MAF 2.5% and covers, in particular, African genetic diversity. Exactly the same individuals that were successfully genotyped on ImmunoChip were successfully genotyped on this platform. The genotyping rate after QC was 0.993 in the African American population. The quality control for African Americans also included principal component analysis as implemented in SNP & Variation Suite v.7 (Golden Helix). The first three principal components were derived for each individual from the African American sample along with HapMap Phase 2 samples as reference populations. Individuals deviating by more than 6 SDs from the African ancestry cluster centroid were discarded from further analysis. We also excluded individuals deviating more than 4 standard deviations from the cluster centroid. Finally, we excluded duplicate and closely related samples (PI_HAT ≥ 0.5).
The genome wide genotyping of the White populations has been described previously in Radstake et al (2010) . In brief, Hap550K-BeadChip was used for US Whites and Illumina HumanCNV370K BeadChip in Spanish Whites.
For the SNP-level analysis, the association statistics were computed via logistic regression including sex as a covariate for each dataset. For the White samples, a meta-analysis combining the odds ratios (OR) and standard errors (SE) of the individual datasets (US and Spanish, so that the controls in each set were from the same country as the cases) was performed by means of the inverse-variance method under the assumption of a fixed effect, as implemented in PLINK v.1.07.
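The inverse-variance fixed-effect combination can be illustrated with a short calculation. The sketch below does not reproduce the PLINK implementation; the function name is hypothetical, and the standard errors are assumed to be on the log-OR scale.

```python
import math

def fixed_effect_meta(odds_ratios, standard_errors):
    """Pool per-dataset odds ratios by inverse-variance weighting.

    standard_errors are those of log(OR) in each dataset (assumption).
    Returns (pooled OR, pooled SE of log OR, two-sided p-value).
    """
    betas = [math.log(o) for o in odds_ratios]
    weights = [1 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(pooled_beta), pooled_se, p_value

# Illustrative inputs only: two datasets with the same OR and SE
pooled_or, pooled_se, p_value = fixed_effect_meta([1.5, 1.5], [0.1, 0.1])
```

With two equally precise datasets, the pooled OR equals the common OR and the pooled SE shrinks by a factor of √2, which is why combining the US and Spanish sets increases power when the effect is shared.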
For the gene-level analysis we used the Versatile Gene-based Association Study (VEGAS) method. We chose VEGAS because, in simulation studies, it outperforms similar methods in sensitivity and specificity. VEGAS can be applied to data generated by any GWAS design, including family-based GWAS, meta-analyses of GWAS, and DNA-pooling-based GWAS. The test uses information from the complete set of markers within a gene. To account for linkage disequilibrium (LD) between markers, VEGAS uses simulations from the multivariate normal distribution. VEGAS assigns SNPs to autosomal genes according to positions on the UCSC Genome Browser hg18 assembly. In order to capture regulatory regions and SNPs in LD, the gene boundaries are defined as ±50 kb of the 5’ and 3’ UTRs. VEGAS assigned SNPs genotyped by ImmunoChip to 11,501 genes. Assuming independence of the gene-level tests, the threshold for statistical significance in the analysis of ImmunoChip in Whites was set at 4.35×10−6 (0.05/11,501), and at 2.8×10−6 for the analysis of GWAS data. However, since this threshold is likely to be conservative given the overlap between genes, we report findings with p-values <10−5. Also, since the sample size for African Americans was limited, for this population we present findings with p-values below 10−3, acknowledging that this is a study limitation and that the analysis is exploratory and in need of further validation. We use the term “nominal significance” to denote p-values in the range of 0.05 to 10−3, interpreting them as weak evidence of association. We excluded the HLA region from the analysis because it is universally significant.
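The simulation idea behind a gene-based test of this type can be sketched as follows. This is a simplified illustration and not the VEGAS software itself: the gene statistic is taken as the sum of the per-SNP 1-df chi-square statistics, its null distribution is simulated from a multivariate normal distribution whose covariance is the pairwise LD (r) matrix, and the "+1" correction in the empirical p-value is our assumption. All function names are hypothetical.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor; requires a positive-definite matrix
    (fails for SNP pairs in perfect LD, where r = 1)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gene_based_pvalue(snp_chi2, ld_matrix, n_sims=10000, seed=0):
    """Empirical p-value for the sum of SNP chi-square statistics in one gene."""
    rng = random.Random(seed)
    observed = sum(snp_chi2)
    lower = cholesky(ld_matrix)
    n = len(snp_chi2)
    exceed = 0
    for _ in range(n_sims):
        indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated z-scores: z = L * indep, so that Cov(z) equals the LD matrix
        z = [sum(lower[i][k] * indep[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in z) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)

# Bonferroni threshold for 11,501 gene-level tests, as used for ImmunoChip in Whites
bonferroni_threshold = 0.05 / 11501

# Toy gene with two SNPs in moderate LD (illustrative values only)
ld = [[1.0, 0.5], [0.5, 1.0]]
p_gene = gene_based_pvalue([20.0, 18.0], ld, n_sims=2000)
```

Note that resolving p-values near the 4.35×10−6 threshold by simulation requires far more than the few thousand replicates used in this toy call.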
We used PathwayStudio to build a pathway of known and novel SSc risk-associated genes. PathwayStudio uses text mining to identify reported interactions between genes and builds a network based on the known interactions.
Table 1 shows results from the gene-level and SNP-level analyses for 19 non-HLA genes previously shown to be associated with SSc in Whites [1–8, 17, 18]. Out of 19 known SSc genes, all except for SCHIP1, IRF8, and CD247 were nominally significant in the gene-level analysis in Whites. IRF5, STAT4, and TNPO3 were significant in both the SNP- and gene-level analyses among Whites.
In the gene-level analysis of clinical phenotypes (S1 and S2 Tables) and antibody subsets (S3 and S4 Tables), STAT4 was significant for lcSSc and ACA+ patients and TNPO3 in ATA+ patients. Of the 19 genes examined, only TNFAIP3 was nominally significant in African Americans in the gene-level analysis.
Table 2 shows non-HLA genes with p-values below 10−5 in the gene-level analysis for Whites (excluding those already established in Whites), with the addition of the p-values for these genes in African Americans. The genes with p-values below 4.35×10−6 are shown in bold.
Table 2. Comparison of novel candidate genes (p<4.35×10−6, indicated in bold) and suggestive genes (4.35×10−6<p<10−5) detected in Whites to the corresponding statistics in African Americans in the gene-level analysis, based on ImmunoChip.
One of the four genes significant at this level in Whites, namely FCGR2C, was also nominally significant in African Americans. PINX1, which was only borderline significant in Whites, also showed nominal significance in African Americans. Additionally, nominally significant SNPs were observed in the STAT1 and SCT genes and in the borderline significant EXOC2 in the analysis of African Americans (Table 2), even though these genes did not reach significance in the gene-level analysis in that population.
The top genes identified for clinical phenotypes and autoantibody subsets are shown in S1–S4 Tables, with the ImmunoChip-based gene-level analyses in the left portion. FCGR2C, STAT1, and FCGR3B were significant in lcSSc, although FCGR2C and FCGR3B shared the most significant SNP rs455499 and the gene-level p-value for FCGR2C was more significant (S1 Table). In dcSSc, three genes (IL34, ABBA-1, and VAC14) were identified as significant, although they shared the most significant SNP rs11640251 and IL34 had the best p-value in the gene-level analysis (S2 Table). In ACA+ patients, in addition to STAT4, six genes (FCGR2C, SRCAP, PHKG2, LOC90835, RNF40, and FCGR3B) reached significance in the ImmunoChip-based gene-level analysis. The middle four genes shared the most significant SNP rs7188927, with SRCAP showing the best gene-level p-value (S3 Table), and FCGR2C and FCGR3B again shared the most significant SNP rs455499. Of these two genes, FCGR2C showed the more significant gene-level p-value, as in the case of the lcSSc phenotype. Among ATA+ patients, in addition to TNPO3, C16orf68, P2RX1, C3orf25, IFT122, and MBD4 reached significance in the gene-level analysis, with the last three genes sharing the same most significant SNP rs2307293 (S4 Table).
In the reverse approach we selected top non-HLA genes most significant in African Americans (p<10−3) based on the gene level analysis (Table 3; 13 genes but only 9 independent regions, due to the gene overlap).
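The distinction between the number of genes and the number of independent regions can be made concrete with an interval-merging sketch. The source does not state exactly how regions were delineated, so the rule below is an assumption: genes on the same chromosome whose boundaries, extended by the ±50 kb flank used for SNP assignment, overlap are counted as one region. The function name and coordinates are hypothetical.

```python
def count_independent_regions(genes, flank=50_000):
    """Count regions after merging genes whose flanked intervals overlap.

    genes: list of (chromosome, start, end) tuples; the overlap rule is an assumption.
    """
    intervals = sorted((c, s - flank, e + flank) for c, s, e in genes)
    regions = 0
    current_chrom, current_end = None, None
    for chrom, start, end in intervals:
        if chrom != current_chrom or start > current_end:
            regions += 1                      # a new region begins here
            current_chrom, current_end = chrom, end
        else:
            current_end = max(current_end, end)  # extend the current region
    return regions

# Hypothetical coordinates: the first two genes fall into one region once flanked
example_genes = [
    ("chr1", 100_000, 200_000),
    ("chr1", 250_000, 300_000),
    ("chr1", 1_000_000, 1_100_000),
    ("chr2", 100_000, 200_000),
]
n_regions = count_independent_regions(example_genes)
```

Here four genes collapse into three regions, in the same way that 13 genes correspond to 9 independent regions in Table 3.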
Of these genes, only UNC5C (p = 5.57×10−4) and CLEC16A (p = 0.0463) were nominally significant in the gene-level analysis in Whites, and both these genes and PHF19 harbored a nominally significant SNP in Whites (Table 3).
Among the 63 top genes (27 independent regions) selected based on the SNP-level analysis in African Americans (S5 Table; genes with best SNP p-value<10−3 in African Americans), 16 genes (12 independent regions) were also nominally significant in Whites at the gene level and 42 genes harbored at least nominally significant SNPs in Whites (25 different SNPs due to assignment of some SNPs to several genes at once). Since under the null hypothesis the expected number of nominally significant SNPs in Whites is ~1 (27 × 0.05 = 1.35), the results suggest some overlap in genetic susceptibility loci for SSc between Whites and African Americans.
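The null expectation quoted above, and how unusual a given number of replicated regions would be, can be computed directly. The sketch below treats the 27 regions as independent tests at α = 0.05, which is a simplifying assumption, and the function names are hypothetical. The call with 12 regions uses the gene-level count reported above purely as an illustration.

```python
import math

def expected_null_hits(n_tests, alpha=0.05):
    """Expected number of nominally significant results if no test is truly associated."""
    return n_tests * alpha

def binomial_tail(n_tests, n_hits, alpha=0.05):
    """P(X >= n_hits) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    return sum(
        math.comb(n_tests, k) * alpha ** k * (1 - alpha) ** (n_tests - k)
        for k in range(n_hits, n_tests + 1)
    )

expected = expected_null_hits(27)          # 27 x 0.05
tail_probability = binomial_tail(27, 12)   # chance of 12 or more hits under the null
```

Because neighboring genes and SNPs are correlated, the binomial tail should be read as a rough guide rather than a formal test.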
GWAS data analysis
We also performed similar analyses based on the GWAS genotyping. Because that genotyping was performed on different platforms in Whites and African Americans, the results are less comparable. The results are presented in S6–S11 Tables. In brief, among the genes previously identified in Whites, in addition to TNFAIP3, ATG5 showed nominal significance in the gene-level analysis in African Americans, potentially due to denser coverage of this gene (81 vs 62 SNPs) on the 2.5 M Omni platform than on ImmunoChip (S6 Table).
Beyond the genes previously established as associated with SSc in Whites, only one gene, TMEM175, showed a borderline significant association in the gene-level analysis in Whites (p = 3.0×10−6) in GWAS. Its most significant SNP rs2290405 was shared with two other genes (SLC26A1 and DGKQ; S7 Table). The analysis of the corresponding genes in the GWAS data in African Americans did not detect gene-level significance for these genes, but TMEM175 harbored a nominally significant SNP, rs11946340, unlike the other two neighboring genes.
In the analysis of clinical phenotypes and autoantibody subsets, out of already established SSc susceptibility genes, IRF5, TNPO3, and IRF8 were significant in lcSSc patients. There were also seven newly identified genes, of which four (DGKQ, IDUA, TMEM175, and SLC26A1) shared the same most significant SNP rs11724804; of these, DGKQ showed the best gene-level p-value. IRF4, CCDC104, and TLR10 were also significant. Notably all these genes were at least nominally significant in the ImmunoChip-based gene-level analysis, except for CCDC104 which is not on ImmunoChip (S1 Table). In dcSSc, in addition to IRF5 and TNPO3, CPSF4 and ATP5J2 showed significant p-values in the gene-level analysis; they shared the same most significant SNP rs10235235, and CPSF4 was more significant in the gene-level analysis. Except for IRF5 and TNPO3, no gene reached statistical significance in the ACA+ subset in the GWAS-based gene-level analysis. In the ATA+ subset, TLR10 and TLR1, sharing the same most significant SNP rs10024216, were significant (both reached only nominal significance in the ImmunoChip-based gene-level analysis) (S4 Table).
In the reverse analysis, considering the top genes identified in the gene-level analysis in African Americans (12 non-HLA genes but 11 regions due to the gene overlap, S8 Table), none of the corresponding genes was even nominally significant in Whites, but 6 genes harbored nominally significant SNPs. Among the genes identified in the GWAS SNP-level analysis of African Americans as harboring the most significant SNPs (p<10−3; 488 such genes but only 308 independent regions because of the gene overlap), 311 genes (255 independent regions) also harbored at least nominally significant SNPs in Whites. Eight genes (SLC2A13, NRG3, SLC10A7, MKL1, DZIP1L, C8orf58, KIAA1967, and HDAC1; seven independent regions, C8orf58 and KIAA1967 representing the same region) were nominally significant in both Whites and African Americans in the gene-level analysis. The results are presented in S9 Table (a). Five SNPs (rs2994241 in C10orf27/ADAMTS14, rs6025407 in BMP7, rs6796265 in OSBPL10, rs7734699 in MRPS27, and rs6075784 in STK35) were nominally significant in both Whites and African Americans. These five SNPs are marked in green in S9 Table (a), and their risk effects are shown in S9 Table (b).
We also catalogued genes identified in African Americans either in the GWAS or ImmunoChip gene-level analysis (S10 Table), or by the top SNP p-value (with p<10−3) (S11 Table). In the few cases where the same SNP was top in both the GWAS and ImmunoChip analyses, the slight p-value variation is explained by the QC procedures, which eliminated different numbers of individuals from the ImmunoChip and GWAS analyses.
Genes for SSc and other autoimmune diseases are enriched for immune response genes [19–21]. One can expect, therefore, that SSc genes will often be involved in direct interactions. We used PathwayStudio to build an interaction network of known as well as 6 novel candidate genes (both significant and suggestive) (Table 3) identified by the gene-level analysis. Such networks may provide guidance for exploring the biological mechanisms underlying SSc risk. We found that two suggestive candidates, namely EXOC2 and PINX1, interact with known genes associated with risk of SSc. The EXOC2 protein has been shown to bind LST1, and PINX1 and STAT1 show protein/protein interaction (S1 Fig).
We identified 4 novel candidate genes (STAT1, FCGR2C, NIPSNAP3B, and SCT) significantly associated and 4 genes (SERBP1, PINX1, TMEM175 and EXOC2) suggestively associated with SSc in a gene-level analysis in Whites. Some of these genes have been shown to be directly involved in immune response. For example, FCGR2C encodes a member of the low-affinity immunoglobulin gamma Fc receptors. FCGR2C is found on the surface of many immune response cells. The gene encodes a transmembrane glycoprotein involved in phagocytosis and clearing of immune complexes. A suggestive novel gene, SERBP1, encodes a B-cell antigen shown to predict anti-tumor immune response. Another suggestive gene, EXOC2, is associated with innate immunity and has been shown to play a role in susceptibility to Crohn’s disease. We note that the suggestive signal at EXOC2 overlaps with the signal at IRF4 previously described in the cross-disease meta-GWAS of SSc and rheumatoid arthritis, which points to the importance of this region in autoimmune conditions. This gene harbored a SNP, rs908026, with relatively strong statistical evidence for risk association (p = 2.8×10−5). Risk-associated SNPs were observed in other gene-level candidates as well. For example, rs11893432 (STAT1 gene; p = 4.01×10−11) and rs4554699 (FCGR2C gene; p = 2.7×10−8) were significant at the GWAS level. The most significant SNPs in the other novel candidates were rs2290405 in TMEM175 (p = 1.82×10−6), rs17152571 in PINX1 (p = 1.5×10−5), rs3790569 in SERBP1 (p = 5.6×10−5), rs4963128 in SCT (p = 6.7×10−5), and rs3780540 in NIPSNAP3B (p = 7.8×10−5). We acknowledge that further validation in an independent study of SSc in Whites is necessary.
Out of the 19 genes that were previously identified as harboring SSc susceptibility SNPs in Whites, only TNFAIP3 was nominally significant in the gene-level analysis in African Americans. Previously, we showed that SNPs of TNFAIP3 had a strong association with expression of matrix metalloproteinase 1 and 3 in fibroblasts of ethnically diverse patients in response to silica particle stimulation .
Several factors could have contributed to the absence of an association in African Americans for genes found in Whites. First, the distribution of clinical phenotypes is markedly different in the two populations, with a considerably higher proportion of the diffuse phenotype among African Americans (69%) as compared to Whites (31%). Our previous publication  shows differences in the genetic architecture of SSc clinical phenotypes. Thus clinical phenotype-specific analyses by ethnic group would be most meaningful, because they would allow for more accurate racial comparisons. Unfortunately, the limited number of African American participants precluded the phenotype or autoantibody subset analyses in the current study.
Second, the power of the analysis in the African Americans was limited because of the sample size. Moreover, the power of the analysis depends not solely on the sample size but also on the risk allele frequency. S9 Table (b) shows that there is considerable variation in the allele frequencies between the two populations, which could have contributed to the inter-ethnic differences. Third, even if the effects of causal SNPs are similar across ethnicities, GWAS-identified tagging SNP alleles can be in opposite linkage phases in two given ethnic groups. This will result in opposite effects of the tagging SNPs identified as significant in both African Americans and Whites, and the data in S9 Table (b) suggest exactly that: some SNPs identified as nominally significant have very similar frequencies but the opposite direction of effect in African Americans and Whites. It is also possible that the causal SNPs differ between ethnicities although the susceptibility genes are the same. A gene-level analysis of dense genotyping data, such as ours, should be able to capture the susceptibility genes even in the case of ethnic heterogeneity for causal alleles, unless a given ethnicity lacks causal variants in a potential susceptibility gene, in which case the gene will not be associated with disease in that ethnic group. The genes listed in Table 2, except for NIPSNAP3B, are densely SNP-genotyped in both Whites and African Americans. Thus it is not very likely that an individual SNP being polymorphic in one population and monomorphic in the other has led to the loss of an association. For NIPSNAP3B, the top SNP in both populations was the same, rs3780540, and the MAF was actually higher in African Americans (0.116) than in Whites (0.0179), yet it was only significant in Whites.
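The dependence of power on the risk allele frequency at a fixed sample size can be illustrated with a rough calculation. This sketch is not the logistic regression model used in the study: it approximates the power of a two-sided allelic (log-OR Wald) test with a normal approximation, the function name is hypothetical, and the odds ratio of 1.3 and the allele frequencies are illustrative values only. The sample sizes (291 cases, 260 controls) are those of the African American sample.

```python
import math
from statistics import NormalDist

def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    # Risk allele frequency in cases implied by the allelic odds ratio
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    # Variance of log(OR) from the expected 2 x 2 allele-count table
    variance = (
        1 / (2 * n_cases * case_freq)
        + 1 / (2 * n_cases * (1 - case_freq))
        + 1 / (2 * n_controls * control_freq)
        + 1 / (2 * n_controls * (1 - control_freq))
    )
    z_effect = abs(math.log(odds_ratio)) / math.sqrt(variance)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

power_common = allelic_test_power(291, 260, control_freq=0.30, odds_ratio=1.3)
power_rare = allelic_test_power(291, 260, control_freq=0.02, odds_ratio=1.3)
```

Under these assumptions the same odds ratio is considerably harder to detect for the rare allele than for the common one, which is one way differing allele frequencies can produce apparent inter-ethnic differences.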
An interaction network built using Pathway Studio detected a large number of interactions between genes associated with SSc risk. Genes with the largest number of interactions include ITGAM, AIF1, STAT1, IL12RB2; these genes form hubs of the network and are likely to be master genes in biological control of SSc risk. Two suggestive candidates, EXOC2 and PINX1, are also part of this network.
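Identifying hubs amounts to counting interactions per gene. The sketch below is not the Pathway Studio procedure; it simply counts the degree of each node in an edge list. Only the EXOC2–LST1 and PINX1–STAT1 interactions come from the text above; GENE_X and GENE_Y are placeholders and do not represent reported interactions.

```python
from collections import Counter

def interaction_counts(edges):
    """Number of interactions (node degree) for each gene in an undirected edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return degree

edges = [
    ("EXOC2", "LST1"),    # reported in the text
    ("PINX1", "STAT1"),   # reported in the text
    ("STAT1", "GENE_X"),  # placeholder edge for illustration
    ("STAT1", "GENE_Y"),  # placeholder edge for illustration
]
degree = interaction_counts(edges)
hub, hub_degree = degree.most_common(1)[0]
```

Genes at the top of such a ranking are the hub candidates; in the real network these include ITGAM, AIF1, STAT1, and IL12RB2.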
As mentioned before, limitations of this study are (1) a relatively small sample size for African Americans, which prevents us from drawing any definite conclusion concerning the hitherto unresolved issue whether the same genes/SNPs influence SSc risk in different ethnic groups, and (2) the absence of independent validation cohorts. We acknowledge, therefore, that our analyses should be considered exploratory. Nevertheless, we carried out such analyses because the data are unique and their analysis may be important for the understanding of the role of ethnicity in the genetic architecture of SSc.
The results of our exploratory analysis might suggest that there exist both trans-racial and race-specific susceptibility loci for SSc, but further validation by independent studies, in particular a properly powered SSc GWAS in African Americans that allows subset analyses, is necessary to answer this question.
A gene-level analysis focusing on the data generated by the ImmunoChip platform was performed on White and African American SSc patients. This study represents the first report of genetic analysis of African Americans with SSc. The gene-level analysis identified four novel candidate genes (STAT1, FCGR2C, NIPSNAP3B, and SCT) significantly associated with SSc in Whites. As an exploratory analysis we compared the results in Whites with those generated from African Americans. There was weak evidence of existence of SSc susceptibility loci that showed effects in both Whites and African Americans. Our findings need to be validated by independent studies, particularly due to the limited sample size of African Americans. The clinical phenotype and autoantibody subset analyses for Whites are also presented, but future studies should compare the phenotype- and autoantibody-stratified analyses in Whites and African Americans.
We are grateful for the excellent technical support of Julio Charles, Marilyn Perry, Tony Mattar and Deepthi Nair. We also thank the subjects who generously provided the samples for these studies.
- 1. Bossini-Castillo L, Martin JE, Broen J, Simeon CP, Beretta L, Gorlova OY, et al. Confirmation of TNIP1 but not RHOB and PSORS1C1 as systemic sclerosis risk factors in a large independent replication study. Ann Rheum Dis. 2013;72(4):602–7. pmid:22896740; PubMed Central PMCID: PMC3887516.
- 2. Lopez-Isac E, Bossini-Castillo L, Simeon CP, Egurbide MV, Alegre-Sancho JJ, Callejas JL, et al. A genome-wide association study follow-up suggests a possible role for PPARG in systemic sclerosis susceptibility. Arthritis Res Ther. 2014;16(1):R6. pmid:24401602; PubMed Central PMCID: PMC3978735.
- 3. Radstake TR, Gorlova O, Rueda B, Martin JE, Alizadeh BZ, Palomino-Morales R, et al. Genome-wide association study of systemic sclerosis identifies CD247 as a new susceptibility locus. Nat Genet. 2010;42(5):426–9. pmid:20383147; PubMed Central PMCID: PMC2861917.
- 4. Allanore Y, Saad M, Dieude P, Avouac J, Distler JH, Amouyel P, et al. Genome-wide scan identifies TNIP1, PSORS1C1, and RHOB as novel risk loci for systemic sclerosis. PLoS Genet. 2011;7(7):e1002091. pmid:21750679; PubMed Central PMCID: PMC3131285.
- 5. Bossini-Castillo L, Martin JE, Broen J, Gorlova O, Simeon CP, Beretta L, et al. A GWAS follow-up study reveals the association of the IL12RB2 gene with systemic sclerosis in Caucasian populations. Hum Mol Genet. 2012;21(4):926–33. pmid:22076442; PubMed Central PMCID: PMC3298110.
- 6. Gorlova O, Martin JE, Rueda B, Koeleman BP, Ying J, Teruel M, et al. Identification of novel genetic markers associated with clinical phenotypes of systemic sclerosis through a genome-wide association strategy. PLoS Genet. 2011;7(7):e1002178. pmid:21779181; PubMed Central PMCID: PMC3136437.
- 7. Mayes MD, Bossini-Castillo L, Gorlova O, Martin JE, Zhou X, Chen WV, et al. Immunochip analysis identifies multiple susceptibility loci for systemic sclerosis. Am J Hum Genet. 2014;94(1):47–61. pmid:24387989; PubMed Central PMCID: PMC3882906.
- 8. Martin JE, Assassi S, Diaz-Gallo LM, Broen JC, Simeon CP, Castellvi I, et al. A systemic sclerosis and systemic lupus erythematosus pan-meta-GWAS reveals new shared susceptibility loci. Hum Mol Genet. 2013;22(19):4021–9. pmid:23740937; PubMed Central PMCID: PMC3766185.
- 9. Ball RD. Designing a GWAS: power, sample size, and data structure. Methods in molecular biology. 2013;1019:37–98. pmid:23756887.
- 10. van der Sluis S, Posthuma D, Nivard MG, Verhage M, Dolan CV. Power in GWAS: lifting the curse of the clinical cut-off. Molecular psychiatry. 2013;18(1):2–3. pmid:22614290.
- 11. Budhu A, Wang XW. Power play: scoring our goals for liver cancer with better GWAS study design. Journal of hepatology. 2011;54(4):823–4. pmid:21167853.