The Array Clone Information Database (ACID) is a searchable resource for information about human, mouse, and rat cDNA clones. For each clone, ACID provides the assigned UniGene cluster(s), the clone's location in the full-length transcript, its assigned Gene Ontology terms, and its position in the genome assembly.
The Alternative Splicing Gallery (ASG) takes an identifier such as an EnsEMBL gene ID or a RefSeq ID as input and provides a graph that maps splice events to transcript information. Users can also view Gene Ontology (GO) information for the record, select one or more exons, and download the resulting sequence. ASG also links out to other alternative splicing databases such as ProSplicer.
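Selecting exons and downloading the resulting sequence amounts to concatenating the chosen exon intervals. The sketch below illustrates that idea only; it is not ASG's implementation. It assumes 0-based, half-open exon coordinates on a plain genomic string and a strand flag, and the name `splice_exons` is hypothetical.

```python
from typing import Sequence, Tuple

# Complement table for DNA bases (upper- and lowercase, N kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def splice_exons(genome: str, exons: Sequence[Tuple[int, int]], strand: str = "+") -> str:
    """Concatenate selected exons (hypothetical helper, 0-based half-open coords).

    Exons are joined in genomic order; on the minus strand the joined
    sequence is reverse-complemented so it reads 5'->3' along the transcript.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    for start, end in exons:
        if not 0 <= start < end <= len(genome):
            raise ValueError(f"invalid exon ({start}, {end})")
    joined = "".join(genome[s:e] for s, e in sorted(exons))
    if strand == "-":
        joined = joined.translate(_COMPLEMENT)[::-1]
    return joined
```

For example, `splice_exons("AAACCCGGGTTT", [(0, 3), (6, 9)])` skips the middle `CCC` segment and returns `"AAAGGG"`.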
AUGUSTUS is a eukaryotic gene prediction tool. It can integrate extrinsic evidence, e.g. from RNA-Seq, ESTs, or proteomics, but can also predict genes ab initio. The PPX extension to AUGUSTUS can take a protein multiple sequence alignment as input to find new members of the family in a genome. AUGUSTUS can be run through a web interface or downloaded and run locally.
CaSNP is a database for storing and interrogating quantitative copy number alteration (CNA) data from SNP arrays, covering 34 cancer types in 104 studies. Given a region or gene of interest, CaSNP returns CNA information that summarizes the frequencies of gain and loss and the averaged copy number for each study, and provides links to download the data or visualize it in the UCSC Genome Browser. For a more comprehensive view, CaSNP also displays a heatmap of the copy numbers estimated at each SNP marker around the query region across all studies.
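The per-study summary described above (frequency of gain, frequency of loss, averaged copy number) can be sketched as follows. This is only an illustration, not CaSNP's method: the gain and loss thresholds, the input of one region-averaged copy number per sample, and the names `summarize_study` and `StudySummary` are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

# Hypothetical calling thresholds; CaSNP's actual rules are not described here.
GAIN_THRESHOLD = 2.5
LOSS_THRESHOLD = 1.5


@dataclass
class StudySummary:
    study: str
    gain_freq: float
    loss_freq: float
    mean_copy_number: float


def summarize_study(
    study: str,
    copy_numbers: Sequence[float],
    gain: float = GAIN_THRESHOLD,
    loss: float = LOSS_THRESHOLD,
) -> StudySummary:
    """Summarize one study, given one region-averaged copy number per sample."""
    if not copy_numbers:
        raise ValueError("study has no samples")
    n = len(copy_numbers)
    gains = sum(1 for cn in copy_numbers if cn >= gain)    # samples called as gain
    losses = sum(1 for cn in copy_numbers if cn <= loss)   # samples called as loss
    return StudySummary(study, gains / n, losses / n, mean(copy_numbers))
```

With four samples `[2.0, 3.1, 1.2, 2.6]`, two exceed the gain threshold and one falls below the loss threshold, giving gain and loss frequencies of 0.5 and 0.25.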
The Cis-regulatory Element Annotation System (CEAS) is a resource for ChIP-chip analysis that retrieves repeat-masked genomic sequences, calculates GC content, plots evolutionary conservation, maps nearby genes, and identifies enriched transcription factor binding site (TFBS) motifs.
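As a rough illustration of computing GC content on repeat-masked sequence, the sketch below counts G and C among unmasked A/C/G/T bases. It is not CEAS's code. It assumes hard-masked repeats appear as `N` and, optionally, soft-masked repeats as lowercase letters. The function name `gc_content` is hypothetical.

```python
def gc_content(seq: str, skip_soft_masked: bool = False) -> float:
    """Fraction of G/C among unmasked A/C/G/T bases (hypothetical helper).

    Hard-masked bases ('N') and any other non-ACGT characters are ignored.
    With skip_soft_masked=True, lowercase (soft-masked) bases are ignored too.
    Returns 0.0 if no countable bases remain.
    """
    counted = 0
    gc = 0
    for base in seq:
        if skip_soft_masked and base.islower():
            continue
        b = base.upper()
        if b in ("A", "C", "G", "T"):
            counted += 1
            if b in ("G", "C"):
                gc += 1
    return gc / counted if counted else 0.0
```

Excluding masked bases keeps repeats from skewing the estimate, for example `gc_content("GCNNNNAT")` is 0.5 because only G, C, A, and T are counted.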
COSMIC curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136,000 coding mutations in almost 542,000 tumour samples; of the 18,490 genes documented, 4803 (26%) have one or more mutations. Full curation of the scientific literature is available for 83 major cancer genes and 49 fusion gene pairs. BioMart allows more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus. COSMIC integrates many diverse types of mutation information and is building much closer links with Ensembl and other data resources.
CREME (Cis-Regulatory Module Explorer for the human genome) is a tool for identifying and visualizing cis-regulatory modules for a given set of genes that are potentially co-expressed or co-regulated. It takes a list of accession numbers as input and reports the common modules, grouping genes from the list by the modules found in their promoter regions.
The database of differentially expressed proteins in human cancers (dbDEPC) collects curated cancer proteomics data, provides information on protein-level expression changes, and supports exploration of protein profile differences among cancers. dbDEPC currently contains 1803 proteins differentially expressed in 15 cancers, curated from 65 mass spectrometry (MS) experiments in peer-reviewed publications.
The duplicated gene nucleotide variants database (dbDNV) (http://goods.ibms.sinica.edu.tw/DNVs/) promotes accurate variation annotation. In addition to flat-file downloads, users can explore gene-related duplications and their associated DNVs through DGL and DNV searches, respectively. dbDNV also contains 304,110 DNV-coupled SNPs; using the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates.
DNannotator is a tool that performs de novo annotation of SNPs, STSs, and exons. It also allows user-defined annotations to be migrated onto different versions of genomic sequences (with a size limit of 30 kb).
The ENCODE project aims to catalogue all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) serves as the central repository for ENCODE data. The DCC holds a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion, and others. It includes sequences with quality scores, alignments, signals calculated from the alignments, and, in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser. ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay.
The data can be accessed through the Ensembl user interface (for both visualisation and data mining), providing cross-species integration throughout Ensembl's comparative genomics resources.