The ArrayExpress Archive is a public repository of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted directly by users or imported from the NCBI Gene Expression Omnibus (GEO). The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries, provided through ontology-enabled interfaces, include queries based on technology and on sample attributes such as disease, cell type and anatomy.
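An ontology-enabled query broadens a search term to cover everything classified beneath it. For example, a query for "epithelial cell" should also return samples annotated as "hepatocyte". The sketch below shows this query-expansion idea. The toy ontology fragment, the sample records and the function names are hypothetical. It does not describe how ArrayExpress actually implements its interfaces.

```python
from collections import deque

# Hypothetical toy ontology fragment: child term -> list of parent terms (is_a).
# Term names are illustrative only, not taken from any real ontology release.
PARENTS = {
    "hepatocyte": ["epithelial cell"],
    "epithelial cell": ["cell"],
    "neuron": ["cell"],
    "liver": ["organ"],
}

def descendants(term, parents):
    """Return the term itself plus every term that has it as an ancestor."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    # Breadth-first walk down the hierarchy.
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def query_samples(samples, term, parents):
    """Return IDs of samples whose cell-type annotation falls under the query term."""
    wanted = descendants(term, parents)
    return [s["id"] for s in samples if s["cell_type"] in wanted]
```

Suppose the samples are annotated "hepatocyte", "neuron" and "epithelial cell". A query for "epithelial cell" then returns the first and third samples, even though only one of them carries that exact label.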
Bgee is a dataBase for Gene Expression Evolution. It allows gene expression patterns to be compared automatically between species by mapping expression data onto anatomical ontologies and defining homology relationships between the anatomical structures of different species.
CaSNP is a database for storing and interrogating quantitative copy number alteration (CNA) data from SNP arrays, covering 34 cancer types in 104 studies. Given a region or gene of interest, CaSNP returns CNA information summarizing the frequencies of gain and loss and the averaged copy number for each study. It also provides links to download the data or visualize it in the UCSC Genome Browser. In addition, CaSNP displays a heatmap of the copy numbers estimated at each SNP marker around the query region across all studies, giving a more comprehensive visualization.
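The per-study summary described above reduces to counting gains and losses and averaging copy numbers. Here is a minimal sketch of that calculation. It assumes each sample's copy number has already been averaged over the SNP markers in the query region. The gain and loss thresholds around a diploid baseline of 2 are hypothetical, and CaSNP's actual cut-offs may differ.

```python
from statistics import mean

# Hypothetical thresholds for calling gain or loss (diploid baseline = 2).
GAIN_THRESHOLD = 2.5
LOSS_THRESHOLD = 1.5

def summarize_region(study_copy_numbers):
    """Summarize per-sample copy numbers over a query region, study by study.

    study_copy_numbers maps a study ID to a list of per-sample copy numbers,
    each assumed to be already averaged over the SNP markers in the region.
    """
    summary = {}
    for study, values in study_copy_numbers.items():
        if not values:
            continue  # skip studies with no samples covering the region
        n = len(values)
        gains = sum(1 for cn in values if cn > GAIN_THRESHOLD)
        losses = sum(1 for cn in values if cn < LOSS_THRESHOLD)
        summary[study] = {
            "gain_freq": gains / n,
            "loss_freq": losses / n,
            "mean_cn": mean(values),
        }
    return summary
```

Take a study with per-sample copy numbers 2.0, 3.0, 1.0 and 3.0. The sketch reports a gain frequency of 0.5, a loss frequency of 0.25 and a mean copy number of 2.25.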
COXPRESdb (coexpressed gene database) represents coexpression relationships among human and mouse genes. Upgrades include:

- a new coexpression measure, Mutual Rank, whose values are comparable across species;
- five additional animal species (rat, chicken, zebrafish, fly and nematode);
- different layers of omics data added to the integrated gene network.
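Mutual Rank is commonly defined as the geometric mean of two directional ranks. The first is the rank of gene B in gene A's list of coexpressed genes. The second is the rank of A in B's list. A value of 1 means each gene is the other's top partner. The sketch below computes it from toy expression profiles. Using Pearson correlation as the underlying coexpression score is an assumption here, and COXPRESdb's own pipeline may compute correlations differently.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_rank(profiles, gene, partner):
    """Rank of `partner` among all other genes ordered by correlation with `gene` (1 = best)."""
    scores = {
        other: pearson(profiles[gene], profiles[other])
        for other in profiles if other != gene
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(partner) + 1

def mutual_rank(profiles, a, b):
    """Geometric mean of the two directional coexpression ranks."""
    return math.sqrt(coexpression_rank(profiles, a, b) * coexpression_rank(profiles, b, a))
```

Because Mutual Rank is built from ranks rather than raw correlation values, it is less sensitive to how many samples or genes a dataset contains. That property is what makes it comparable between datasets.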
Cyclebase is an online resource of cell-cycle-related experiments. Its easy-to-use web interface facilitates visualization and download of genome-wide cell-cycle data and analysis results. Data from different experiments are normalized to a common timescale and complemented with key cell-cycle information and derived analysis results. These include cyclin-dependent kinase (CDK) substrates, predicted degradation signals and loss-of-function phenotypes from genome-wide screens. The web interface provides a single gene-centric graph summarizing the available cell-cycle experiments. Links to orthologous and paralogous genes are included to facilitate comparison of cell-cycle regulation across species.
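Normalizing experiments to a common timescale means expressing each sampling time as a position within the cell cycle rather than as minutes after release. The sketch below uses a simplified linear mapping onto a 0 to 100% scale. It is not Cyclebase's actual procedure. It assumes that each experiment's cycle start and period are already known, and both parameters are hypothetical inputs.

```python
def to_cell_cycle_percent(times_min, start_min, period_min):
    """Map sampling times (minutes) onto a 0-100% cell-cycle scale.

    start_min and period_min are per-experiment values assumed to be known
    in advance (for example, estimated from synchrony markers).
    """
    if period_min <= 0:
        raise ValueError("period must be positive")
    # Fraction of a full cycle elapsed since start_min, wrapped into [0, 100).
    return [((t - start_min) / period_min * 100.0) % 100.0 for t in times_min]
```

Consider two experiments sampled at different absolute times, for example one starting at 0 min and one at 10 min, both with a 60 min period. Under this mapping both land on the same 0% and 50% points, so their profiles can be plotted on a shared axis.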
EMAGE is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains are extracted from raw images and integrated spatially into a set of standard 3D virtual mouse embryos at different stages of development, which allows the data to be interrogated by spatial methods. An anatomy ontology is also used to describe sites of expression, so the data can be queried with text-based methods. Data coverage has been increased by sourcing data from a wider selection of journals.
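Once expression domains are mapped into the same standard embryo model, a spatial query can rank genes by how closely their domains overlap a region of interest. The sketch below illustrates this. It represents each domain as a set of voxel coordinates and uses the Jaccard index as the similarity score. Both choices are assumptions for illustration, and EMAGE's own spatial search may use a different measure.

```python
def jaccard(domain_a, domain_b):
    """Jaccard index between two expression domains given as sets of voxel coordinates."""
    if not domain_a and not domain_b:
        return 0.0
    return len(domain_a & domain_b) / len(domain_a | domain_b)

def rank_by_spatial_similarity(query, domains):
    """Rank genes by overlap of their mapped domain with the query region (best first).

    `domains` maps a gene name to its voxel set, assumed to be registered
    to the same standard embryo model and stage as `query`.
    """
    scores = {gene: jaccard(query, voxels) for gene, voxels in domains.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A gene whose domain exactly matches the query scores 1.0. A gene whose domain covers half of the query scores 0.5. A gene expressed elsewhere in the embryo scores 0.0.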
The ENCODE project aims to catalogue all functional elements in the human genome. The ENCODE Data Coordination Center (DCC) serves as the central repository for ENCODE data. The DCC holds high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. For each data set this includes:

- sequences with quality scores;
- alignments;
- signals calculated from the alignments;
- in most cases, element or peak calls calculated from the signal data.

Each data set is available for visualization and download via the UCSC Genome Browser. ENCODE data can also be retrieved through a metadata system that captures the experimental parameters of each assay.
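Peak calls are derived from signal data by finding genomic regions where the signal is elevated. The sketch below uses a deliberately naive thresholding approach: consecutive bins above a cut-off are merged into intervals. Production pipelines use dedicated peak callers with statistical models. The threshold, bin size and minimum peak width here are hypothetical parameters.

```python
def call_peaks(signal, threshold, bin_size=1, min_bins=1):
    """Merge runs of consecutive bins with signal above `threshold` into peaks.

    Returns half-open (start, end) intervals in base-pair coordinates,
    keeping only runs at least `min_bins` bins long.
    """
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_bins:
                peaks.append((start * bin_size, i * bin_size))
            start = None  # close the current run
    # Close a run that extends to the end of the signal.
    if start is not None and len(signal) - start >= min_bins:
        peaks.append((start * bin_size, len(signal) * bin_size))
    return peaks
```

Raising `min_bins` discards short, isolated spikes. This is a crude stand-in for the noise filtering that statistical peak callers perform.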
The Functional Annotation Of the Mammalian Genome (FANTOM) is a database for the transcriptional network that regulates macrophage differentiation. Its data come from cap analysis of gene expression (CAGE), which sequences mRNA 5' ends with a second-generation sequencer to quantify promoter activity even in the absence of gene annotation. Additional genome-wide experiments complement this setup. These include short RNA sequencing, microarray gene expression profiling of large-scale perturbation experiments, and ChIP-chip for epigenetic marks and transcription factors.
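CAGE quantifies promoter activity by counting the 5' ends of sequenced tags that fall within each promoter region. On the minus strand, a tag's 5' end is its last aligned base rather than its first. The sketch below shows this counting with a tags-per-million normalization. The tuple layouts, the promoter regions and the normalization by total tag count are assumptions for illustration. FANTOM's own clustering of tags into promoters is more elaborate.

```python
from collections import Counter

def five_prime_end(read):
    """5' end of an aligned tag given as (chrom, start, end, strand), 0-based half-open."""
    chrom, start, end, strand = read
    # On the minus strand the 5' end is the last aligned base.
    return chrom, (start if strand == "+" else end - 1), strand

def promoter_tpm(reads, promoters):
    """Count CAGE 5' ends per promoter and scale to tags per million.

    promoters: list of (name, chrom, start, end, strand), 0-based half-open.
    """
    if not reads:
        return {name: 0.0 for name, *_ in promoters}
    counts = Counter()
    for read in reads:
        chrom, pos, strand = five_prime_end(read)
        for name, p_chrom, p_start, p_end, p_strand in promoters:
            if chrom == p_chrom and strand == p_strand and p_start <= pos < p_end:
                counts[name] += 1
    total = len(reads)
    return {name: counts[name] / total * 1e6 for name, *_ in promoters}
```

Because only the 5' end position is used, promoter activity can be measured for transcription start sites that lie outside any annotated gene, as the paragraph above notes.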
Gene Expression Barcode is a database that provides reliable absolute measures of expression for most annotated genes across human and mouse tissue types, including diseased tissue. This is made possible by an algorithm that draws on the GEO and ArrayExpress public repositories to build statistical models. These models convert data from a single microarray into an expressed or unexpressed call for each gene. For selected platforms, users can upload their own data and obtain results.
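The key idea is that public data supply, for each gene, a model of what "unexpressed" intensity looks like. A single new array can then be scored gene by gene against that model. The sketch below is a simplified version of this idea. It compares each gene's log2 intensity to a per-gene unexpressed distribution and applies a z-score cut-off. The per-gene parameters, gene names and the cut-off are hypothetical and do not reproduce the Barcode's published models.

```python
# Hypothetical per-gene parameters of the "unexpressed" log2-intensity distribution,
# standing in for models that would be estimated from many public arrays.
NULL_MODEL = {
    "GENE_A": (4.0, 0.5),  # (mean, standard deviation)
    "GENE_B": (6.0, 1.0),
}

def barcode(sample, null_model, z_cutoff=5.0):
    """Return 1 (expressed) or 0 (unexpressed) for each gene on a single array.

    sample maps gene names to log2 intensities from one microarray.
    """
    calls = {}
    for gene, value in sample.items():
        mu, sd = null_model[gene]
        z = (value - mu) / sd  # distance above the unexpressed distribution
        calls[gene] = 1 if z > z_cutoff else 0
    return calls
```

With these toy parameters, an intensity of 7.0 for GENE_A lies 6 standard deviations above its unexpressed mean and is called expressed. An intensity of 8.0 for GENE_B lies only 2 above and is called unexpressed. This shows that the same raw value can mean different things for different genes.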
The Gene Expression Database (GXD) is a community resource of mouse developmental expression information. GXD integrates different types of expression data at the transcript and protein level and captures expression information from many different mouse strains and mutants. GXD places these data in the larger biological context through integration with other Mouse Genome Informatics (MGI) resources and interconnections with many other databases. Web-based query forms support simple or complex searches. All data are annotated and reviewed by GXD curators.
GeneSigDB is a manually curated database of gene expression signatures. It focuses on cancer, development and stem cell gene signatures, and was constructed from thousands of publications from which gene signatures were manually transcribed. Gene signatures are mapped to the genome to extract standardized lists of Ensembl gene identifiers. For each signature, GeneSigDB provides:

- the original gene signature;
- the standardized gene list;
- a fully traceable gene mapping history for each gene, from the original transcribed data table through to the standardized list of genes.

GeneSigDB release 3.0 (December 2010) contained over 2,000 gene signatures.
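A traceable mapping history records every step between an identifier as printed in a paper and its standardized gene ID. Unmapped entries are kept and flagged rather than silently dropped. The sketch below illustrates this bookkeeping. The two lookup tables and the placeholder Ensembl-style IDs are hypothetical, standing in for a real annotation resource.

```python
# Hypothetical lookup tables; placeholder IDs are NOT real Ensembl identifiers.
ALIAS_TO_SYMBOL = {"p53": "TP53", "HER2": "ERBB2"}
SYMBOL_TO_ENSEMBL = {"TP53": "ENSG_PLACEHOLDER_0001", "ERBB2": "ENSG_PLACEHOLDER_0002"}

def standardize(signature):
    """Map original identifiers to standardized IDs, recording each mapping step.

    Returns (genes, history): a de-duplicated list of standardized IDs, and
    one list of (step, value) pairs per original identifier.
    """
    genes, history = [], []
    for original in signature:
        steps = [("original", original)]
        symbol = ALIAS_TO_SYMBOL.get(original, original)
        if symbol != original:
            steps.append(("alias->symbol", symbol))
        ensembl = SYMBOL_TO_ENSEMBL.get(symbol)
        steps.append(("symbol->ensembl", ensembl if ensembl else "unmapped"))
        if ensembl and ensembl not in genes:
            genes.append(ensembl)
        history.append(steps)
    return genes, history
```

Take "p53" as an example. Its history shows the alias resolving to "TP53" and then to a standardized ID. An unknown identifier ends in an explicit "unmapped" step, so the history can be audited back to the original table.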
The Gene Expression Atlas is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive of Functional Genomics Data. A simple interface allows the user to query for differential gene expression either (i) by gene names or attributes such as Gene Ontology terms, or (ii) by biological conditions, e.g. diseases, organism parts or cell types.
H-DBAS is a specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. RNA-Seq tag information is correlated with the AS exons and splice junctions. A total of 148,376,598 RNA-Seq tags were generated from RNAs extracted from cytoplasmic, nuclear and polysome fractions. A comparative genomics viewer lets users examine the evolutionary turnover of AS empirically.
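Relating RNA-Seq tags to splice junctions amounts to counting, per subcellular fraction, how many tags span each junction. Junctions that share a donor site can then be compared to see which acceptor is preferred. The sketch below illustrates this. The tag representation, a fraction name plus an optional (chromosome, donor, acceptor) junction, is an assumption and does not reflect H-DBAS's actual data format.

```python
from collections import defaultdict

def junction_support(tags):
    """Count tags supporting each splice junction, per subcellular fraction.

    Each tag is (fraction, junction), where junction is (chrom, donor, acceptor)
    or None for tags that do not span a junction.
    """
    support = defaultdict(lambda: defaultdict(int))
    for fraction, junction in tags:
        if junction is not None:
            support[junction][fraction] += 1
    return {j: dict(f) for j, f in support.items()}

def donor_usage(support, fraction):
    """For junctions sharing a donor site, each junction's share of that donor's tags in one fraction."""
    totals = defaultdict(int)
    for (chrom, donor, _acceptor), counts in support.items():
        totals[(chrom, donor)] += counts.get(fraction, 0)
    return {
        j: counts.get(fraction, 0) / totals[(j[0], j[1])]
        for j, counts in support.items()
        if totals[(j[0], j[1])] > 0
    }
```

Comparing donor usage between the nuclear and cytoplasmic fractions is one simple way to see whether an alternative acceptor is used differently in each fraction.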
JASPAR is an open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. The database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family.
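JASPAR profiles are distributed as position frequency matrices (base counts per position). A common way to use such a profile is to convert it into a log-odds position weight matrix and then score windows of a DNA sequence. The sketch below does this. The pseudocount of 0.8, the uniform background and the toy matrix are assumptions for illustration rather than values prescribed by JASPAR. Only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a position frequency matrix to log2-odds weights.

    pfm maps each base to a list of counts, one per motif position.
    """
    background = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # Pseudocount, split by background frequency, avoids log(0).
            p = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(p / background[b]))
    return pwm

def scan(sequence, pwm):
    """Score each window of an uppercase ACGT sequence (forward strand only).

    Returns (start, score) for the highest-scoring window.
    """
    width = len(pwm["A"])
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

As an example, take a toy profile whose consensus is "ACG". Scanning "TTACGTT" with it places the best hit at position 2, the only window that matches the consensus.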
The Gene Expression Omnibus (GEO) database stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies.