
JOURNAL OF CANCER RESEARCH AND ONCOBIOLOGY (ISSN:2517-7370)

An Improved Lung Cancer Diagnosis Model Based on Vocs and CT Data with Transcriptome Analysis

Qian Wu1, Jiajing Sheng1, Yingchang Zou1, Yanjie Hu2, Kejing Ying1, Hao Wan1*, Ping Wang1*

1 Biosensor National Special Lab, Key Lab for Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China
2 Zhejiang Sir Run Run Shaw Hospital, Department of Medicine, Zhejiang University, Hangzhou, China

Citation

Wu Q, Sheng J, Zou Y, Hu Y, Ying K, et al. An Improved Lung Cancer Diagnosis Model Based on Vocs and CT Data with Transcriptome Analysis. J Cancer Res Oncobiol. 2019 Feb;2(1):119

© 2019 Wu Q, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 international License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Purpose: Lung cancer (LC) is a leading cause of cancer-related morbidity and mortality globally. Exhaled volatile organic compounds (VOCs) are considered promising biomarkers for LC diagnosis. However, the accuracy of VOC-based LC diagnosis is limited by interference from benign pulmonary diseases. This study aims to establish an improved LC diagnosis model that accurately distinguishes LC patients from patients with benign pulmonary disease and from healthy controls.

Methods: Exhaled breath samples were analyzed by TD-GCMS, and the discriminating power of each VOC was evaluated by its ROC curve. The model was optimized by adding related variables and by applying random forests. To explore the biological relationship between the selected VOCs and lung cancer, transcriptome data were analyzed with edgeR and DAVID.

Results: The VOC-based diagnosis model was optimized by adding variables. To facilitate sensor measurement, a five-variable model with high accuracy was established. Transcriptome analysis yielded lung cancer related metabolic pathways, some of which were consistent with the biological metabolic processes that generate VOCs in vivo.

Conclusions: Our improved model can accurately discriminate LC patients from other patients and healthy people, providing a promising approach for lung cancer diagnosis.

Keywords

Lung cancer; VOCs biomarker; CT data; TD-GCMS; Model optimization; Transcriptome analysis

Introduction

Lung cancer (LC) remains a leading cause of cancer-related morbidity and mortality worldwide [1]. When LC is diagnosed early, patients can be cured by surgery and subsequent chemotherapy. However, a convenient diagnostic method is still in great demand in clinical practice, and developing a reliable approach for early diagnosis is therefore a key clinical requirement [2]. Volatile organic compounds (VOCs) exhaled by humans carry a great deal of information about the condition of the body. In recent decades, an increasing number of studies on detecting and analyzing exhaled VOCs have emerged [3,4], and VOCs have been considered promising biomarkers for non-invasive and convenient LC screening [5]. However, the samples in many studies include only LC patients and healthy people [6]. This differs from the clinical reality, in which many patients have pulmonary non-malignant disease (PNMD). Furthermore, VOC detection is limited by poor accuracy and by results that vary with the conditions and methods used in different studies [7,8]. Enlarging the sample size and optimizing the diagnosis model are two effective ways to increase the reliability of VOCs in lung cancer diagnosis. Zou et al. [9] optimized the diagnosis model by establishing a training cohort and an independent validation cohort to exclude the interference of PNMD. However, substantial improvements are hard to achieve by optimizing at the level of VOCs alone; other related variables should be introduced into the diagnosis model. In addition, with a large number of VOCs as biomarkers, it is difficult to develop a highly precise sensor that can be applied in preliminary clinical LC screening.

In addition, because biological evidence linking VOCs to LC is lacking, VOC-based LC screening remains hard to validate. Transcriptome analysis, which can explain the source of specific VOCs, may solve this problem. The genetic and genomic changes in cancer patients have been broadly recognized [10], and these changes can be conveniently captured by transcriptome analysis. Recently, an increasing number of studies have examined the LC transcriptome data from The Cancer Genome Atlas (TCGA, https://gdc-portal.nci.nih.gov/). Fidler et al. [11] evaluated transcriptome data from TCGA against serum specimens from lung cancer patients and found several protein biomarkers associated with inferior survival. Because the biology of cancer is highly complex, a genomic or proteomic perspective alone is inadequate to explain it. Studying pathway alterations in cancer is critical to understanding the disease [12,13]. Lin et al. [14] identified specific pathways, such as Liver X receptor activation, that possibly indicate important differences in cancer cell metabolism. However, to our knowledge, no study combining VOCs and transcriptome analysis has been reported.

In this study, a large number of exhaled breath samples were analyzed to select a set of VOCs for LC diagnosis. An LC diagnosis model was established based on these VOCs and optimized by adding CT data and other variables using machine learning. Finally, transcriptome analysis and subsequent pathway analysis were employed to explain the biological metabolism of the selected VOCs.

Materials and Methods

The overall flowchart of our research is shown in Figure 1, which presents the VOC analysis procedure as well as the establishment, optimization and verification of the LC diagnosis model. The details are discussed in the following sections.


Figure 1: The overall flowchart of our research. A lung cancer diagnosis model was established, optimized and verified

Collection of study samples

From 1 January 2009 to 31 December 2016, we continuously collected and analyzed exhaled breath samples from 197 LC patients and 70 PNMD patients at Sir Run Run Shaw Hospital, Hangzhou, China. In addition, 178 healthy control samples were collected from the Department of Biomedical Engineering, Zhejiang University, and Sir Run Run Shaw Hospital (part of the data was reported in our previous study [9]). LC patients were diagnosed on the basis of MRI or CT characteristics [15] and confirmed by histology or pathology [16]. Informed consent was obtained from every subject. Approval was obtained from the institutional ethics review committee of Sir Run Run Shaw Hospital, Hangzhou, China (No. 20070525 and ChiCTR-DCD-15007106). The volunteers' gender, age, smoking status and CT data are summarized in Table 1. Variables such as smoking status and CT data were introduced into the diagnosis model during optimization.


Table 1: Statistical information regarding all volunteers

Collection, extraction and analysis of exhaled breath

Before breath sample collection, volunteers were asked to refrain from eating, drinking and smoking for 12 hours and to stay in a ventilated room for 30 minutes. To standardize collection, every subject breathed normally through a disposable mouthpiece, and room air was collected at the same time as background. The VOCs in the collected gas samples were concentrated on Tenax TA sorption tubes (Sigma-Aldrich, St. Louis, MO, USA) and analyzed immediately, without storage.

A thermal desorption device (TurboMatrix 300TD, PerkinElmer, USA) was used to release the VOCs from the Tenax tubes. With the aid of a carrier gas, the released VOCs were transferred to a gas chromatograph-mass spectrometer (GCMS) for separation and qualitative detection. GC procedure: the column was initially heated to 40℃ and held for 1 min, then ramped to 250℃ at 5℃/min and held for 2 min. MS conditions: the temperatures of the interface and the ion source were 250℃ and 200℃, respectively. The scan range was m/z 45 to 500 in scan mode. The solvent cut-off time was 0.4 min.
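As a quick consistency check on the oven program described above, the total run time follows from the initial hold, the linear ramp and the final hold. The sketch below is illustrative arithmetic only; the function name `gc_run_time_min` is hypothetical and is not part of the original workflow.

```python
def gc_run_time_min(start_c, end_c, rate_c_per_min, initial_hold_min, final_hold_min):
    """Total oven-program time: initial hold + linear ramp + final hold."""
    ramp_min = (end_c - start_c) / rate_c_per_min
    return initial_hold_min + ramp_min + final_hold_min

# Values taken from the GC procedure described in the text.
total = gc_run_time_min(start_c=40, end_c=250, rate_c_per_min=5,
                        initial_hold_min=1, final_hold_min=2)
print(total)  # 45.0
```

Under these settings each chromatographic run takes about 45 minutes, excluding thermal desorption and cool-down.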

Chromatographic peaks with a slope greater than 500/min and a peak area higher than 3000 were selected from the GCMS chromatogram. VOCs matching the mass spectral library (NIST 05 and NIST 05s) [4] with a similarity higher than 90% were selected for further analysis. The retention time, Chemical Abstracts Service (CAS) number and peak area of each substance were extracted from the raw data. To eliminate interference from the air background, the background air response was subtracted from the peak area of each substance. Substances present in less than 50% of the samples were discarded to ensure that only valid VOCs were analyzed.
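The filtering rules above can be sketched as follows. This is a minimal illustration and not the authors' code: the data layout, the function name `select_vocs` and the toy values are assumptions, the slope criterion is omitted, and clamping negative background-corrected areas to zero is an added assumption.

```python
def select_vocs(samples, min_similarity=90.0, min_area=3000.0, min_presence=0.5):
    """Apply per-peak filters, subtract background, keep frequently detected VOCs.

    samples: list of dicts mapping a VOC identifier (e.g. a CAS number) to a
    tuple (peak_area, background_area, library_similarity_percent).
    Returns (ubiquitous_ids, corrected), where corrected holds the
    background-subtracted areas of the peaks that passed the filters.
    """
    corrected = []
    for sample in samples:
        kept = {}
        for voc_id, (area, background, similarity) in sample.items():
            if area > min_area and similarity > min_similarity:
                # Assumption: negative corrected areas are clamped to zero.
                kept[voc_id] = max(area - background, 0.0)
        corrected.append(kept)

    # Count in how many samples each VOC survived the per-peak filters.
    counts = {}
    for kept in corrected:
        for voc_id in kept:
            counts[voc_id] = counts.get(voc_id, 0) + 1

    n_samples = len(samples)
    ubiquitous = sorted(v for v, c in counts.items() if c / n_samples >= min_presence)
    return ubiquitous, corrected

# Toy values for illustration only (not study data).
samples = [
    {"VOC_A": (12000.0, 2000.0, 95.0), "VOC_B": (3500.0, 500.0, 92.0)},
    {"VOC_A": (8000.0, 1000.0, 93.0), "VOC_B": (2500.0, 400.0, 96.0)},
    {"VOC_A": (9000.0, 1500.0, 85.0)},
]
ubiquitous, corrected = select_vocs(samples)
print(ubiquitous)  # ['VOC_A']
```

In this toy input, VOC_A passes the filters in two of three samples and is kept, while VOC_B passes in only one and is discarded.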

Algorithms for VOCs selection

The receiver operating characteristic (ROC) curve is a plot of test sensitivity on the y axis versus 1-specificity on the x axis [17]. It is an effective method for evaluating the performance of a qualitative diagnosis, namely a binary result that is either positive or negative for the disease. We evaluated the discriminating power of each VOC by the area under the ROC curve (AUC), which takes values between 0 and 1; an AUC of 1 indicates that the VOC separates the two groups perfectly, whereas an AUC of 0.5 corresponds to chance-level discrimination. Furthermore, a t-test was employed to determine whether a VOC differed significantly between LC patients and other people. For dichotomous problems, logistic regression assesses the likelihood of falling into one of the outcome categories based on a set of predictors [18]. Herein, binary logistic regression was applied to establish an LC diagnosis model based on the VOCs selected in the preceding analysis. Random forests (RF), a powerful machine learning classifier with high accuracy, were used to optimize the diagnosis model [19]. Moreover, RF allows variable importance to be determined by mean decrease accuracy (MDA), which can be used to reduce the number of variables for convenient sensing in clinical applications without sacrificing the accuracy of the model.
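To make the AUC concrete, the sketch below computes it for a single VOC using the standard rank interpretation: the AUC equals the probability that a randomly chosen LC sample has a higher value than a randomly chosen non-LC sample, with ties counted as one half. The function name and the toy peak areas are assumptions for illustration, not study data.

```python
def auc(values_pos, values_neg):
    """AUC as P(positive > negative), counting ties as 0.5."""
    wins = 0.0
    for p in values_pos:
        for n in values_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(values_pos) * len(values_neg))

# Toy background-corrected peak areas for one VOC (illustrative only).
lc_values = [5.0, 7.0, 9.0]
non_lc_values = [1.0, 5.0, 6.0]
print(round(auc(lc_values, non_lc_values), 3))  # 0.833
```

An AUC below 0.5 under this convention simply means that lower values of the VOC are associated with LC.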

Bioinformatics analysis

To explore the biological source of VOCs in LC patients, we obtained RNA sequencing data for LC patients and healthy controls from TCGA. To identify differentially expressed genes (DEGs) more accurately, more healthy control cases are required for comparison. Hence, we downloaded the RNA-Seq gene expression data of all living cases available in the TCGA database.

The edgeR package for the R programming language was used to identify DEGs between cancer samples and healthy samples from the transcriptome data [20]. This classical method is based on the negative binomial distribution and includes empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. The false discovery rate (FDR), computed by the Benjamini and Hochberg (BH) method, was used to define the DEG threshold. Genes with an FDR less than 0.001 and a mean expression fold-change larger than 4 were identified as DEGs. DAVID 6.8 was used for gene enrichment and pathway analysis of the gene sets [21]. Pathway analysis provides a comprehensive functional annotation for understanding the biological significance of large gene lists.
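For readers unfamiliar with the BH adjustment mentioned above, the following sketch shows the standard procedure in plain Python. It is not the edgeR implementation, and the p-values are toy numbers chosen for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (illustrative only).
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(q, 3) for q in fdr])  # [0.02, 0.04, 0.04, 0.02]
```

A gene would then be retained only if its adjusted value falls below the chosen FDR cutoff (0.001 in this study) and it also meets the fold-change criterion.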

Results and Discussion

VOCs selection

According to the aforementioned screening method, the VOC profile of every participant was obtained, and the summary statistics are shown in Table 2. On average, no significant difference in the number of VOCs was observed among the LC, PNMD and healthy control groups. To discriminate LC patients, we divided all samples into an LC group and a non-LC group (including PNMD patients and healthy controls). After eliminating VOCs with low occurrence frequency, we obtained 174 ubiquitous VOCs and plotted an ROC curve for each one. Then, 70 VOCs with p<0.05 were selected for the t-test, and the 31 VOCs that differed significantly between the two groups were retained (Table S1). To evaluate the discriminating performance of the selected biomarkers, we used the 31 VOCs as independent variables to establish the diagnosis model by binary logistic regression; the ROC curve is shown in Figure 2a. The dataset was randomly divided into a training set (70%) and a validation set (30%). Based on the established model, the sensitivity, specificity and overall accuracy were 80.3%, 71.0% and 75.1%, respectively. Table 3 shows the discrimination results of the model for LC, PNMD and healthy controls. The established model distinguishes LC from healthy controls with high accuracy. However, it is unable to discriminate effectively between LC and PNMD because of very low specificity, in accordance with a previous study [9]. Because of the interference from PNMD patients, the diagnosis model needs further optimization for accurate LC diagnosis.
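The evaluation quantities used above can be sketched as follows: a random 70/30 split and the sensitivity, specificity and accuracy computed from predicted and true labels. The function names, the fixed seed and the toy labels are assumptions for illustration; the source does not describe how the split was implemented.

```python
import random

def split_train_validation(items, train_percent=70, seed=0):
    """Randomly split items into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels 1 (LC) and 0 (non-LC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Toy labels for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics["sensitivity"], metrics["accuracy"])  # 0.75 0.7

# 445 = 197 LC + 70 PNMD + 178 healthy controls, as reported in the text.
train, validation = split_train_validation(range(445))
print(len(train), len(validation))  # 311 134
```

Sensitivity is the fraction of LC samples classified as LC, and specificity is the fraction of non-LC samples classified as non-LC, which is why a model can score well against healthy controls yet poorly against PNMD patients.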

Diagnosis model optimization

Since the established diagnosis model cannot effectively discriminate LC patients from PNMD patients, more variables that explicitly relate to LC should be introduced into the model to increase its discrimination capability. CT scanning is commonly used in clinical practice to detect changes in the lung parenchyma, and the diagnostic accuracy of PET-CT for lung cancer has been reported as 93.5%, with a false positive rate of 6.5% [22]. For the model based on CT data only, the ROC curve is shown in Figure 2b, and the sensitivity, specificity and overall accuracy were 85.8%, 91.1% and 88.8%, respectively. In addition, LC is closely associated with the gender [23], age [24] and smoking status [25] of patients. Therefore, we established a new 35-variable diagnosis model by combining CT, gender, age, smoking status and the previous 31 VOCs through binary logistic regression. Consequently, the sensitivity, specificity and overall accuracy improved significantly to 89.3%, 89.5% and 89.4%, respectively. The ROC curve of the optimized LC diagnosis model is shown in Figure 2c; the AUC was 0.957, which means that this model has excellent power for discriminating the LC group from the non-LC group.

Developing a sensor for monitoring exhaled breath that can replace GCMS is of great clinical significance. However, simultaneous detection of multiple substances is a major challenge for sensor development and application. Hence, reducing the number of input variables while maintaining high accuracy was imperative in the model optimization. The 35 variables were used to establish a random forest LC diagnosis model with 10 nodes and 600 decision trees. Then, the top five variables, selected with a threshold of MDA>10 (Table 4), were used to re-establish the diagnosis model. The ROC curve is shown in Figure 2d, and the sensitivity, specificity and overall accuracy of the model for LC diagnosis were 88.7%, 90.1% and 89.4%, respectively. Thus, the five-variable LC diagnosis model established by RF has accuracy similar to that of the 35-variable model. The final optimized model for LC diagnosis is therefore based on three selected VOCs, CT data and age. The greatly reduced number of variables in this model enables convenient sensor development for clinical LC screening.
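The idea behind mean decrease accuracy can be illustrated with a simplified, model-agnostic permutation sketch: shuffle one variable at a time and record how much the accuracy drops. This is not the random forest implementation used in the study, which evaluates each tree on its out-of-bag samples and may scale the result, so the raw values below are not comparable with the MDA>10 threshold. All names and data are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for r, y in zip(rows, labels) if predict(r) == y) / len(rows)

def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop after permuting each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_predict(row):
    """Hypothetical classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0

# Toy data: feature 0 is informative, feature 1 is noise (illustrative only).
rows = [[0.1, 5.0], [0.2, 3.0], [0.3, 9.0], [0.4, 1.0],
        [0.6, 4.0], [0.7, 8.0], [0.8, 2.0], [0.9, 7.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
importances = mean_decrease_accuracy(toy_predict, rows, labels)
print(importances[1])  # 0.0, because the classifier ignores feature 1
```

Variables whose permutation barely changes the accuracy are candidates for removal, which is the rationale for reducing the model to a handful of inputs.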

In Table 5, we summarize the discrimination accuracy of the three lung cancer diagnosis models for LC vs. NLC. First, the 31-variable model performs worst because of the interference from PNMD patients. Second, the 35-variable model, with LC-relevant variables added, has excellent power for discriminating the LC group from the non-LC group. Finally, the 5-variable model has similar accuracy to the 35-variable model.

The source of VOCs in vivo

VOCs detected in exhaled breath are generated by in vivo metabolism. Because VOCs are poorly soluble in blood, they pass into the breath through gas exchange in the lungs, where they can be measured. In Table 6, the three selected VOCs are divided into two groups, and their main biological reaction processes and related enzymes are presented.

Alkylbenzenes are considered to arise from exogenous influences, including smoking, alcohol and pollution. These exogenous substances leak into the cytoplasm and cause peroxidative damage to proteins, fatty acids and DNA. Most LC patients have a long smoking history, and toluene is increased in the breath of smoking patients relative to nonsmokers [26]. The defense mechanisms of the human body can eliminate exogenous substances through the cytochrome p450, glutathione S-transferase, sulfotransferase and N-acetyltransferase enzyme systems [27].

Alkanes are generated by the oxidative stress response of polyunsaturated fatty acids (PUFA) in cellular membranes. This process proceeds by a free radical chain reaction mechanism. In LC patients, cytochrome p450 enzymes, which catalyze the oxidation of organic substances, are involved in the emission of VOCs via hydroxylation [28]. Pentane and ethane have been widely used as sensitive and noninvasive indicators of lipid peroxidation in vivo [29].

Identification of DEGs in LC and enrichment analysis

To explore the metabolic changes between LC patients and healthy controls, we compared gene expression levels between 347 LC samples and 22 normal samples from TCGA. Statistical analysis with edgeR in R (the R code can be found in the supplementary materials) yielded 1159 DEGs. We set strict cutoffs, a fold-change larger than 4 and an FDR less than 0.001, to obtain highly significant genes that distinguish the two groups. Table S2 lists the genes whose expression is significantly higher or lower in LC patients than in normal samples. Up-regulated genes outnumbered down-regulated genes about twenty-fold (1106 and 53, respectively). Figure S1 presents the volcano plot, which shows the distribution of logFC (log fold-change) and FDR for all genes, with up-regulated genes in red and down-regulated genes in green.
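The cutoffs above translate into a simple classification rule of the kind used to colour a volcano plot. The sketch assumes that logFC is a base-2 logarithm (the edgeR convention) and that "fold-change larger than 4" applies in both directions, i.e. |log2 FC| > 2; the function name and example values are hypothetical.

```python
import math

def classify_gene(log2_fc, fdr, fold_cutoff=4.0, fdr_cutoff=0.001):
    """Label a gene as 'up', 'down' or 'not significant'."""
    # Assumption: the fold-change cutoff is symmetric on the log2 scale.
    if fdr >= fdr_cutoff or abs(log2_fc) <= math.log2(fold_cutoff):
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Hypothetical (log2 fold-change, FDR) pairs, not taken from the study.
print(classify_gene(3.1, 1e-5))   # up
print(classify_gene(-2.5, 1e-4))  # down
print(classify_gene(1.5, 1e-6))   # not significant
```

With the reported counts, the ratio of up-regulated to down-regulated genes is 1106/53, or roughly 21 to 1.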

To understand the roles that the DEGs play in the human body, we performed KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and gene ontology (GO) term enrichment analysis using DAVID to identify the biological processes enriched in DEGs. The top ten ranked KEGG pathways were closely associated with lung cancer, notably alcoholism, nicotine addiction and metabolism of xenobiotics by cytochrome P450 (Table S3). For the GO analysis, the 30 significant terms are shown in Figure 3. These GO terms are involved in general cancer-related processes such as DNA replication and repair, phosphorylation and immune response.
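Enrichment analysis asks whether a pathway or GO term contains more DEGs than expected by chance. The sketch below uses a plain hypergeometric tail probability and a fold-enrichment ratio to illustrate the principle. DAVID itself reports an EASE score, a more conservative modification of the Fisher exact test, so these numbers would not reproduce DAVID output exactly. Function names and counts are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement."""
    max_k = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total

def fold_enrichment(n_background, n_in_term, n_degs, n_overlap):
    """Observed fraction of DEGs in the term divided by the background fraction."""
    return (n_overlap / n_degs) / (n_in_term / n_background)

# Toy counts: 20 background genes, 5 in the term, 4 DEGs, 3 of them in the term.
p = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p, 4))                  # 0.032
print(fold_enrichment(20, 5, 4, 3))  # 3.0
```

A small p-value together with a fold enrichment well above 1 indicates that the term is over-represented among the DEGs.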

Verification of VOCs through biological evidence

Various genes encode the key enzymes presented in Table 6, and the genes involved in the pathways that yield specific VOCs were all differentially expressed in lung cancer samples (Table S2). For example, SULT4A1, GSTA8P, GSTA9P, NAA11 and NAALADL2-AS2 encode sulfotransferases, glutathione S-transferases and N-acetyltransferases, respectively, which play important roles in the production of alkylbenzenes. These genes were all significantly up-regulated, with logCPM values of 4.6, 6.0, 5.6, 6.0 and 5.2, respectively. This means that the body's defense pathway is more active in LC patients, so that excess products are released into the extracellular environment and reach the lungs through the blood circulation. This is the reason why the exhaled breath of LC patients contains more alkylbenzenes such as 3-ethyltoluene and 1,2,3-trimethylbenzene.

Cytochrome p450 is an enzyme superfamily comprising various protein isoforms that use a wide range of molecules as substrates. These enzymes are encoded by a variety of genes, such as CYP1A2, CYP24A1, CYP26A1, CYP4F11, CYP4F3 and CYP2AB1P. In our analysis, CYP1A2 was down-regulated (logCPM=-4.4) and the other five genes were up-regulated, implying that these enzymes catalyze different kinds of reactions involved in LC. Cytochrome p450 oxidizes a variety of structurally unrelated compounds, including steroids, fatty acids and xenobiotics. For instance, CYP26A1 plays a key role in retinoic acid metabolism [30], CYP4F11 plays a key role in vitamin K catabolism by mediating omega-hydroxylation of vitamin K1 and K2 [31], and CYP4F3 catalyzes the omega-hydroxylation of leukotriene-B4 [32]. Importantly, alkanes and alkylbenzenes are metabolites of these reactions. When the CYP gene family is highly up-regulated, a large amount of VOCs such as methylcyclohexane would be generated in vivo. As for the down-regulated gene, CYP1A2 is involved in the metabolism of aflatoxin B1 and acetaminophen [33]. Therefore, down-regulation of CYP1A2 in vivo can reasonably be expected to slow the related metabolism. In other words, aflatoxin B1 and acetaminophen would accumulate in the body or be diverted to other metabolic pathways, which may generate VOCs such as 3-ethyltoluene and 1,2,3-trimethylbenzene, since all of these compounds contain a benzene ring. Consequently, alkylbenzene VOCs can be detected in the exhaled breath.


Figure 2: ROC curves of (a) the 31-variable model (Table S1), (b) the CT-data-only model, (c) the 35-variable model (Table S1 plus CT data, gender, age and smoking status) and (d) the 5-variable model (Table 4). The AUCs are 0.824, 0.911, 0.957 and 0.936, respectively


Figure 3: List of gene ontology terms analyzed by DAVID. The terms are ordered by fold-change value, which represents the statistical weight of DEGs in LC; black columns indicate the number of genes involved in each GO term


Figure S1: The volcano plot of all DEGs. This type of graph relates fold-change to FDR. Each point represents one gene. Red and green points denote up-regulated and down-regulated genes, respectively


Table 2: VOC numbers in exhaled breath of LC, PNMD and healthy control groups


1 Non-LC group
2 Healthy control
Table 3: The discrimination accuracy of the 31-VOC diagnosis model in LC vs. NLC, LC vs. PNMD and LC vs. HC


Table 4: Top five variables with MDA>10 in RF model

1 These 31 variables are shown in Table S1
2 These 35 variables are those in Table S1 plus gender, age, smoking status and CT data
3 These 5 variables are shown in Table 4
Table 5: The discrimination accuracy of the different lung cancer diagnosis models in LC vs. NLC


Table 6: Pathways that generate the selected VOCs and their relevant enzymes


Table S1: VOCs which had significant difference between LC group and Non-LC group


Table S2: The down-regulated genes in lung cancer


1 Relevant genes are represented by Entrez gene ID
Table S3: Dysregulated pathways in LC patients and their relevant DEGs

Conclusions

In conclusion, we selected a set of VOCs from exhaled breath samples by GCMS. We established an LC diagnosis model based on 31 selected VOCs and then significantly improved its sensitivity, specificity and overall accuracy to 89.3%, 89.5% and 89.4%, respectively, by adding the LC-relevant variables gender, age, smoking status and CT data, so that the model can distinguish LC patients from non-LC people (including PNMD patients and healthy controls). To support the development of a sensor that can be widely used in clinical practice, we established an optimized five-variable model with similar accuracy. Furthermore, to identify the source of the VOCs, LC-related metabolic pathways were obtained, and some of them were consistent with the biological processes that generate VOCs in vivo. Overall, we established two optimized LC diagnosis models and shed light on the relationship between LC and VOCs.

Acknowledgements

This research was supported by projects of the Natural Science Foundation of China (No. 31571004, 31627801). We thank all the volunteers who took part in this study.

Conflict of Interest

The authors declare they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Ethical approval for the collection of human exhaled breath was obtained from the institutional ethics review committee of Sir Run Run Shaw Hospital (No. 20070525 and ChiCTR-DCD-15007106).

References

  1. Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin. 2005 Mar-Apr;55(2):74-108.
  2. Hu J, Qian GS, Bai CX. Chinese consensus on early diagnosis of primary lung cancer (2014 version). Cancer. 2015 Sep;121:3157-3164.
  3. Chen X, Xu F, Wang Y, Pan Y, Lu D, et al. A study of the volatile organic compounds exhaled by lung cancer cells in vitro for breath diagnosis. Cancer. 2007 Aug;110(4):835-844.
  4. Wang Y, Hu Y, Wang D, Yu K, Wang L, et al. The analysis of volatile organic compounds biomarkers for lung cancer in exhaled breath, tissues and cell lines. Cancer Biomark. 2012;11(4):129-137.
  5. Gordon S, Szidon J, Krotoszynski B, Gibbons R, O’Neill H. Volatile organic compounds in exhaled air from patients with lung cancer. Clin Chem. 1985 Aug;31(8):1278-1282.
  6. Sakumura Y, Koyama Y, Tokutake H, Hida T, Sato K, et al. Diagnosis by volatile organic compounds in exhaled breath from lung cancer patients using support vector machine algorithm. Sensors. 2017 Feb;17(2):287.
  7. Hakim M, Broza YY, Barash O, Peled N, Phillips M, et al. Volatile organic compounds of lung cancer and possible biochemical pathways. Chem Rev. 2012 Nov;112(11):5949-5966.
  8. Lourenço C, Turner C. Breath analysis in disease diagnosis: methodological considerations and applications. Metabolites. 2014 Jun;4(2):465-498.
  9. Zou Y, Zhang X, Chen X, Hu Y, Ying K, et al. Optimization of volatile markers of lung cancer to exclude interferences of nonmalignant disease. Cancer Biomark. 2014;14(5):371-379.
  10. Balmain A, Gray J, Ponder B. The genetics and genomics of cancer. Nat Genet. 2003 Mar;33:238-244.
  11. Fidler MJ, Frankenberger C, Seto R, Lobato GC, Fhied CL, et al. Differential expression of circulating biomarkers of tumor phenotype and outcomes in previously treated non-small cell lung cancer patients receiving erlotinib vs. cytotoxic chemotherapy. Oncotarget. 2017 Apr;8(35):58108-58121.
  12. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011 Mar;144(5):646-674.
  13. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, et al. Cancer genome landscapes. Science. 2013 Mar;339(6127):1546-1558.
  14. Lin EW, Karakasheva TA, Lee DJ, Lee JS, Long Q, et al. Comparative transcriptomes of adenocarcinomas and squamous cell carcinomas reveal molecular similarities that span classical anatomic boundaries. PLoS Genet. 2017 Aug;13(8):e1006938.
  15. Todd J, McGrath EE. Chest X-ray mass in a patient with lung cancer! QJM: An International Journal of Medicine. 2010 Sep;104(10):903-904.
  16. Andrews T, Wallace W. Diagnosis and staging of lung and pleural malignancy - an overview of tissue sampling techniques and the implications for pathological assessment. Clinical Oncology. 2009 Aug;21(6):451-463.
  17. Park SH, Goo JM, Jo CH. Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J Radiol. 2004 Jan-Mar;5(1):11-18.
  18. Maroof DA. Binary logistic regression. In: Statistical methods in neuropsychology. Boston (MA): Springer; 2012. p. 67-75.
  19. Breiman L. Random forests. Machine Learning. 2001 Oct;45(1):5-32.
  20. McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012 May;40(10):4288-4297.
  21. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44-57.
  22. Feng M, Yang X, Ma Q, He Y. Retrospective analysis for the false positive diagnosis of PET-CT scan in lung cancer patients. Medicine. 2017 Oct;96(42):e7415.
  23. Kiyohara C, Ohno Y. Sex differences in lung cancer susceptibility: a review. Gend Med. 2010 Oct;7(5):381-401.
  24. Tomasetti C, Li L, Vogelstein B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science. 2017 Mar;355(6331):1330-1334.
  25. Lee PN, Forey BA, Coombs KJ. Systematic review with meta-analysis of the epidemiological evidence in the 1900s relating smoking to lung cancer. BMC Cancer. 2012 Sep;12:385.
  26. Kischkel S, Miekisch W, Sawacki A, Straker EM, Trefz P, et al. Breath biomarkers for lung cancer detection and assessment of smoking related effects - confounding variables, influence of normalization and statistical algorithms. Clin Chim Acta. 2010 Nov;411(21-22):1637-1644.
  27. Guengerich FP, Shimada T. Oxidation of toxic and carcinogenic chemicals by human cytochrome P-450 enzymes. Chem Res Toxicol. 1991 Jul-Aug;4(4):391-407.
  28. Fontana E, Dansette P, Poli SM. Cytochrome p450 enzymes mechanism based inhibitors: common sub-structures and reactivity. Curr Drug Metab. 2005 Oct;6(5):413-454.
  29. Terelius Y, Ingelman-Sundberg M. Metabolism of n-pentane by ethanol-inducible cytochrome P-450 in liver microsomes and reconstituted membranes. The FEBS Journal. 1986 Dec;161(2):303-308.
  30. White JA, Beckett-Jones B, Guo Y-D, Dilworth FJ, Bonasoro J, et al. cDNA cloning of human retinoic acid-metabolizing enzyme (hP450RAI) identifies a novel family of cytochromes P450 (CYP26). J Biol Chem. 1997 Jul;272(30):18538-18541.
  31. Edson KZ, Prasad B, Unadkat JD, Suhara Y, Okano T, et al. Cytochrome P450-dependent catabolism of vitamin K: ω-hydroxylation catalyzed by human CYP4F2 and CYP4F11. Biochemistry. 2013 Nov;52(46):8276-8285.
  32. Christmas P, Jones JP, Patten CJ, Rock DA, Zheng Y, et al. Alternative splicing determines the function of CYP4F3 by switching substrate specificity. J Biol Chem. 2001 Oct;276(41):38166-38172.
  33. Zhou H, Josephy PD, Kim D, Guengerich FP. Functional characterization of four allelic variants of human cytochrome P450 1A2. Arch Biochem Biophys. 2004 Feb;422(1):23-30.