Browsing by Author "Sousa, Lisete"
- Arrow plot and CA maps on microarray preprocessing methods
  Publication. Silva, Carina; Freitas, Adelaide; Roque, Sara; Sousa, Lisete
  Microarrays make it possible to monitor thousands of genes simultaneously, quantifying the abundance of transcripts under the same experimental condition at the same time. Among the various available array technologies, double-channel cDNA microarray experiments, which are the focus of this work, have arisen in numerous technical protocols associated with genomic studies. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it is difficult to provide general recommendations. This work proposes using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background correction methods and six normalization methods were combined, giving 36 preprocessing strategies, which were applied to a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
- Arrow plot and correspondence analysis maps for visualizing the effects of background correction and normalization methods on microarray data
  Publication. Silva, Carina; Freitas, Adelaide; Roque, Sara; Sousa, Lisete
  Among the various available array technologies, double-channel cDNA microarray experiments provide numerous technical protocols associated with functional genomic studies. The chapter begins by detailing the arrow plot, a recent graphical methodology for detecting differentially expressed (DE) genes, and briefly mentions the significance analysis of microarrays (SAM) procedure, which is, in contrast, quite well known. Next, it introduces correspondence analysis (CA) and explains how the resulting graphic can be interpreted. CA is then carried out in both class comparison and class prediction applications on the lymphoma (lym), lung (lun), and liver (liv) data sets. CA is applied to all three databases in order to obtain graphical representations of background correction (BC) and normalization (NM) profiles in a two-dimensional reduced space. Whenever possible, more than one preprocessing strategy should be applied to microarray data, and the results from the preprocessed data should be compared before drawing any conclusion or carrying out subsequent analysis.
- Arrow Plot: a new graphical tool for selecting up and down regulated genes and genes differentially expressed on samples subgroups
  Publication. Silva, Carina; Turkman, Maria Antónia Amaral; Sousa, Lisete
  Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments that involve molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, there can be genes differentially expressed on sample subgroups that are missed if the usual statistical approaches are used. In this paper we propose a new graphical tool that identifies not only up- and down-regulated genes but also genes with differential expression in different subclasses, which are usually missed by current statistical methods. The tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software.
  Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets, all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows different variances in the two samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically.
  Conclusion: Our results indicate that the arrow plot is a new, flexible and useful tool for the analysis of gene expression profiles from microarrays.
- Estatística em biologia molecular: o passado, o presente e o futuro (Statistics in molecular biology: the past, the present and the future)
  Publication. Sousa, Lisete; Silva, Carina
  We live in the most measurable era in history. In the petabyte era (1 petabyte = 1000 terabytes), the challenge is no longer storing data but making sense of it. Because this is the era of the data revolution, data analysis is becoming an integral part of several sciences. Molecular biology, for example, is no longer a science in which biologists study one gene at a time; it now produces thousands (now millions) of measurements per sample to analyze. Moreover, unlike DNA analysis, which is static, gene expression analysis is dynamic, since different genes are expressed in different tissues. The geneticist John Craig Venter used to sequence isolated organisms, but with the emergence of new technologies and of computers with large memory capacity, which allow the analysis of highly complex data, he moved on to studying entire ecosystems: sequencing the microorganisms of the ocean, since 2003, and of the air, since 2005. The complexity of the data is further increased by new technologies which, when they first appear, are still little explored and produce noisier data than earlier ones. This complexity and degree of variability make statistics an important and unequivocal contribution to the analysis. In fact, the role of statistics in molecular biology goes beyond mere intervention. It is an inseparable pillar of this science! Statistics has been earning its place in this new field, becoming an essential component of recognized merit.
- Impact of OVL variation on AUC bias estimated by non-parametric methods
  Publication. Silva, Carina; Turkman, Maria Antónia Amaral; Sousa, Lisete
  The area under the ROC curve (AUC) is the most commonly used index in ROC methodology for evaluating the performance of a classifier that discriminates between two mutually exclusive conditions. The AUC can take values between 0.5 and 1, where values close to 1 indicate that the classification model has high discriminative power. The overlap coefficient (OVL) between two density functions is defined as the common area under both functions. This coefficient is used as a measure of agreement between two distributions and takes values between 0 and 1, where values close to 1 indicate nearly complete overlap of the densities. These two measures were used to construct the arrow plot for selecting differentially expressed genes. A simulation study using the bootstrap method is presented in order to estimate the AUC bias and standard error using empirical and kernel methods. To assess the impact of OVL variation on the AUC bias, samples from various continuous distributions were simulated, considering different values for their parameters and fixed OVL values between 0 and 1. Samples of sizes 15, 30, 50, and 100, and 1000 bootstrap replicates for each scenario, were considered.
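
The two measures behind the arrow plot, and the bootstrap study described in the last entry, can be illustrated with a minimal Python sketch. It is not the authors' R implementation. The function names (`empirical_auc`, `histogram_ovl`, `bootstrap_auc`), the histogram-based OVL estimate, the bin count, and the choice to measure bias as the mean of the bootstrap replicates minus the original estimate are all assumptions made for illustration. The kernel-based estimators mentioned in the abstract are not covered.

```python
import random
import statistics


def empirical_auc(x, y):
    """Empirical (Mann-Whitney) AUC: share of pairs with x < y, ties count 0.5."""
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5
    return wins / (len(x) * len(y))


def histogram_ovl(x, y, bins=20):
    """Crude OVL estimate (assumption: shared equal-width bins).

    Sums, over the bins, the smaller of the two relative frequencies.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    if hi == lo:
        return 1.0  # all values identical: the samples overlap completely
    width = (hi - lo) / bins

    def freqs(sample):
        counts = [0] * bins
        for v in sample:
            idx = min(int((v - lo) / width), bins - 1)  # top edge goes to last bin
            counts[idx] += 1
        return [c / len(sample) for c in counts]

    return sum(min(a, b) for a, b in zip(freqs(x), freqs(y)))


def bootstrap_auc(x, y, n_boot=1000, seed=0):
    """Bootstrap the empirical AUC by resampling each group with replacement.

    Returns (auc_hat, bias, se), where bias is the mean of the replicates
    minus auc_hat and se is the standard deviation of the replicates.
    """
    rng = random.Random(seed)
    auc_hat = empirical_auc(x, y)
    reps = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        reps.append(empirical_auc(xb, yb))
    bias = statistics.fmean(reps) - auc_hat
    se = statistics.stdev(reps)
    return auc_hat, bias, se


# Illustrative data only: two simulated groups of size 30.
demo_rng = random.Random(1)
group_a = [demo_rng.gauss(0.0, 1.0) for _ in range(30)]
group_b = [demo_rng.gauss(1.0, 1.0) for _ in range(30)]
auc_hat, bias, se = bootstrap_auc(group_a, group_b)
ovl_hat = histogram_ovl(group_a, group_b)
print(f"AUC={auc_hat:.3f} bias={bias:.4f} se={se:.4f} OVL={ovl_hat:.3f}")
```

In an arrow-plot setting, each gene would contribute one (OVL, AUC) pair computed from its expression values in the two classes. The simulated samples here only stand in for such data.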