ASPicDB provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256,939 protein variants from 17,191 multi-exon genes have been extensively annotated with machine learning tools, providing information on protein type (globular or transmembrane), localization, and the presence of PFAM domains, signal peptides, GPI-anchor propeptides, and transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE tags and of polyA signals and/or polyA sites, which mark transcription initiation and termination sites, respectively. Retrieval can be carried out at the gene, transcript, exon, protein or splice site level, allowing the selection of data sets that fulfil one or more criteria set by the user.
AUGUSTUS is a eukaryotic gene prediction tool. It can integrate evidence, e.g. from RNA-Seq, ESTs and proteomics, but can also predict genes ab initio. The PPX extension to AUGUSTUS can take a multiple sequence alignment of protein sequences as input to find new members of the protein family in a genome. AUGUSTUS can be run through a web interface, or downloaded and run locally.
The Bioinformatics Resource for Oral Pathogens (BROP) contains tools for the genomics of oral pathogens, including Genome Viewer, GOAL (genome-wide ORF alignment), an oral pathogen microarray database, an Entrez counter, oral pathogen-specific BLAST, and a codon usage database.
The duplicated gene nucleotide variants database (dbDNV) (http://goods.ibms.sinica.edu.tw/DNVs/) promotes accurate variation annotation. Aside from the flat file download, users can explore the gene-related duplications and the associated DNVs through DGL and DNV searches, respectively. In addition, dbDNV contains 304,110 DNV-coupled SNPs. With the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates.
The Ensembl project provides high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish.
The Frequency of INherited Disorders database (FINDbase) records frequencies of causative genetic variations worldwide. Database records include the population and ethnic group or geographical region, the disorder name and the related gene, accompanied by links to any related external resources, and the genetic variation together with its frequency in that population. Other features include: (i) the systematic collection and thorough documentation of population- and ethnic group-specific allele frequencies for markers in genes of pharmacogenomic interest from different classes of drug-metabolizing enzymes and transporters, representing 150 populations and ethnic groups worldwide; (ii) new data querying and visualization tools for the expanded FINDbase data collection that facilitate querying large data sets and visualizing the results; and (iii) the establishment of the first database journal, by affiliating FINDbase with the Human Genomics and Proteomics journal.
Gene Set based Analysis of Polymorphisms (GeSBAP) applies gene set analysis to the evaluation of genome-wide association studies. Gene set analysis tests for association at the level of modules of functionally related genes instead of testing each gene individually.
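To make the idea of testing a module of functionally related genes concrete, here is a minimal sketch of one common form of gene set analysis, an over-representation test based on the hypergeometric distribution. This is an illustrative stand-in, not necessarily the statistic GeSBAP uses; the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_set, n_significant, n_overlap):
    """Upper-tail hypergeometric p-value for gene set over-representation.

    Hypothetical helper, not part of GeSBAP.
    n_genes       -- total number of genes tested in the study
    n_in_set      -- number of those genes that belong to the gene set
    n_significant -- number of genes called associated with the trait
    n_overlap     -- number of associated genes that fall in the gene set

    Returns P(X >= n_overlap) when n_significant genes are drawn at random
    without replacement from n_genes.
    """
    total = comb(n_genes, n_significant)
    upper = min(n_in_set, n_significant)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Assumed toy numbers: 20,000 genes, a 100-gene module,
    # 500 associated genes, 10 of which lie in the module.
    print(enrichment_p_value(20000, 100, 500, 10))
```

A small p-value suggests the module contains more associated genes than expected by chance. In practice, tools of this kind also have to map SNPs to genes and correct for testing many gene sets, which this sketch leaves out.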
The Gibbs Motif Sampler identifies motifs (conserved regions) in DNA or protein sequences. It can be applied to the detection of transcription factor binding sites (TFBS).
GLUE, PEDEL, and DRIVeR are tools for estimating completeness and diversity in randomized protein-encoding libraries; they are useful for guiding library design and for analyzing results. GLUE Including Translation (GLUE-IT) finds the expected amino acid completeness of libraries. PEDEL-AA calculates amino acid statistics for libraries generated by error-prone PCR (epPCR).
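As a rough illustration of what "expected completeness" means for a randomized library, the sketch below uses the standard Poisson sampling approximation under the simplifying assumption that all variants are equally likely. The real tools handle more detailed cases (for example, codon-to-amino-acid translation in GLUE-IT); the function names here are hypothetical and are not the tools' API.

```python
import math


def expected_distinct(library_size, n_variants):
    """Expected number of distinct variants in a library.

    Assumes `library_size` clones are sampled independently from
    `n_variants` equiprobable variants (Poisson approximation).
    """
    return n_variants * (1.0 - math.exp(-library_size / n_variants))


def expected_completeness(library_size, n_variants):
    """Expected fraction of all possible variants present in the library."""
    return expected_distinct(library_size, n_variants) / n_variants


def library_size_for_completeness(target, n_variants):
    """Library size needed to reach a target expected completeness (0 < target < 1)."""
    return -n_variants * math.log(1.0 - target)


if __name__ == "__main__":
    # Assumed example: three NNK-randomized codons give 32**3 possible DNA variants.
    n_variants = 32 ** 3
    print(expected_completeness(100_000, n_variants))
    print(library_size_for_completeness(0.95, n_variants))
```

Under this approximation, a library as large as the number of possible variants is expected to cover only about 63% of them, and roughly threefold oversampling is needed for 95% coverage, which is why such estimates matter when planning library size.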
The Gramene database has become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape, in addition to several species of rice. Gramene has an Ensembl genome browser and hosts a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database.
The HGNC approves a unique gene name and symbol for each known human gene. The HGNC Database is searchable and contains all approved symbols. For each symbol, if known, the database associates gene location, aliases, previous symbols and links out to sequence data and other databases.