Research Papers:
Comprehensive landscape of subtype-specific coding and non-coding RNA transcripts in breast cancer
Trung Nghia Vu1,*, Setia Pramana1,*, Stefano Calza1,2,*, Chen Suo1, Donghwan Lee1,3, Yudi Pawitan1
1Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE 17177 Stockholm, Sweden
2Department of Molecular and Translational Medicine, University of Brescia, 25125 Brescia, Italy
3Department of Statistics, Ewha Womans University, Seodaemun-gu, Seoul 120-750, South Korea
*These authors have contributed equally to this work
Correspondence to:
Yudi Pawitan, email: [email protected]
Keywords: breast cancer, RNA sequencing, subtype-specific isoforms, subtype co-expression, non-coding RNAs
Received: June 30, 2016 Accepted: August 24, 2016 Published: September 13, 2016
ABSTRACT
Molecular classification of breast cancer into clinically relevant subtypes helps improve prognosis and adjuvant-treatment decisions. The aim of this study is to better characterize the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms, including coding, long non-coding RNA and microRNA transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequencing data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then search the whole transcriptome systematically for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms specific to single subtypes. A total of 27% of the subtype-specific isoforms classify the intrinsic subtypes more accurately than their corresponding genes. We find three subtype-specific miRNAs and 707 subtype-specific long non-coding RNAs. The isoforms from long non-coding RNAs also separate the Luminal A and Luminal B subtypes well, with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both Normal-like and Basal subtypes, which are commonly considered to have distinct estrogen receptor (ER) status. Finally, analyses at the protein level reveal four subtype-specific proteins and two subtype co-expression proteins that successfully validate results from the isoform level.
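
The AUC values quoted above measure how well an expression score separates two subtypes. As a minimal sketch of that idea only, and not the authors' pipeline, the AUC for a single isoform can be computed as the fraction of (Luminal A, Luminal B) sample pairs in which the Luminal A sample has the higher expression, counting ties as one half. The function name `auc` and the expression values below are hypothetical and are not taken from the paper.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical isoform expression values, invented for illustration only.
luminal_a = [5.1, 6.3, 7.0, 6.8]
luminal_b = [2.0, 3.5, 5.5, 1.2]

isoform_auc = auc(luminal_a, luminal_b)
print(isoform_auc)  # 15 of the 16 pairs are ordered correctly: 0.9375
```

An AUC of 0.5 corresponds to no separation and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger cohorts.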

PII: 11998