Ensembl Genomes is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the vertebrate genomes available in Ensembl. Data types incorporated include annotation of protein-coding and non-protein-coding genes, cross-references to external resources, and high-throughput experimental data (e.g. data from large-scale studies of gene expression and polymorphism, visualised in their genomic context). Additionally, extensive comparative analyses have been performed, both within defined clades and across the wider taxonomy, and the resulting sequence alignments and gene trees can be accessed through the site.
A suite of tools for immunological research. EpiToolKit offers a variety of prediction methods, which may be run simultaneously, for predicting MHC class I and class II ligands and minor histocompatibility antigens. The influence of sequence polymorphisms or mutations on potential T-cell epitopes may also be examined.
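Examining how a mutation affects potential epitopes starts from the peptides that actually change. The sketch below is a minimal illustration, not EpiToolKit's implementation: it enumerates every peptide of length k that overlaps a single amino-acid substitution and returns wild-type/mutant pairs, which are the candidates whose predicted MHC binding could differ. The function name, the 0-based position convention and the default k = 9 (a common MHC class I ligand length) are assumptions for illustration.

```python
def overlapping_peptides(protein: str, pos: int, alt: str, k: int = 9) -> list[tuple[str, str]]:
    """Return (wild_type, mutant) k-mer pairs covering 0-based position `pos`.

    Hypothetical helper: models a single amino-acid substitution only.
    """
    if not 0 <= pos < len(protein):
        raise ValueError("position outside protein")
    if len(alt) != 1:
        raise ValueError("alt must be a single residue")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    # A k-mer starting at s covers pos when s <= pos <= s + k - 1,
    # and it must also fit inside the protein (s <= len - k).
    first = max(0, pos - k + 1)
    last = min(pos, len(protein) - k)
    return [
        (protein[s:s + k], mutant[s:s + k])
        for s in range(first, last + 1)
    ]
```

For a 16-residue toy protein and a substitution at position 8, this yields eight overlapping 9-mer pairs; each pair could then be passed to whichever binding predictor is in use.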
GeMprospector is designed to find cross-species genetic marker candidates in legumes and grasses. It automates PCR primer design based on multiple sequence alignments of user-submitted ESTs and their homologues in legume or grass sequence databases.
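To illustrate why alignments help with cross-species primer design, one simple approach is to scan for stretches that are identical and gap-free across all aligned homologues, since primers placed there are more likely to anneal in every species. This is a minimal sketch under that assumption, not GeMprospector's algorithm: the function name and the default window length of 20 nt are hypothetical, and real primer design also checks melting temperature, GC content and self-complementarity, which are omitted here.

```python
def conserved_windows(alignment: list[str], length: int = 20) -> list[tuple[int, str]]:
    """Return (start, sequence) for windows identical in all rows and gap-free.

    Hypothetical helper; assumes uppercase, equal-length aligned sequences
    with '-' marking gaps.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    # A column is conserved when every row has the same non-gap base.
    conserved = [
        len({row[i] for row in alignment}) == 1 and alignment[0][i] != "-"
        for i in range(width)
    ]
    hits = []
    run = 0  # length of the current run of conserved columns
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= length:
            start = i - length + 1
            hits.append((start, alignment[0][start:i + 1]))
    return hits
```

Overlapping windows are all reported, so a downstream step would still need to pick non-overlapping forward and reverse primer sites.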
A disease gene mining browser for association studies. GenoWatch is a real-time, batch-mode pipeline for SNPs and short tandem repeat polymorphisms that extracts current information from public-domain resources such as NCBI, UniProt, KEGG and GO, helping users select appropriate disease candidate genes.
The Gene Expression Pattern Analysis Suite (GEPAS) is a collection of tools for the analysis of microarray data. GEPAS includes tools for data pre-processing, clustering, differential gene expression analysis, predictor building, array CGH analysis and functional annotation. A new pipeline module automates sequential analysis steps.
The H-Invitational Database (H-InvDB) is a comprehensive annotation resource of human genes and transcripts. The latest release of H-InvDB (release 6.2) provides annotation for 219,765 human transcripts in 43,159 human gene clusters, based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relations to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: 'Navigation search', the 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs.
The Immuno Polymorphism Database (IPD) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. IPD consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of Major Histocompatibility Complex sequences from different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines.
MIRU-VNTRplus allows users to analyze genotyping data from their Mycobacterium tuberculosis strains, either alone or in comparison with the reference database of strains. The web server also includes tools for searching for similar strains, phylogenetic analysis and mapping of geographic information.
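A similar-strain search over MIRU-VNTR data can be illustrated by treating each strain as a vector of repeat copy numbers, one per locus, and counting the loci at which two strains differ. This is a categorical distance: copy numbers are compared as labels, not magnitudes. The sketch below is a minimal illustration under that assumption; the function names, the four-locus toy profiles and the strain names are hypothetical, and the server's own scoring and thresholds may differ.

```python
def locus_mismatches(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of loci at which two copy-number profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def categorical_distance(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Fraction of loci that differ (0.0 = identical profiles)."""
    return locus_mismatches(a, b) / len(a)


def similar_strains(
    query: tuple[int, ...],
    reference: dict[str, tuple[int, ...]],
    max_mismatches: int = 1,
) -> list[str]:
    """Reference strains within max_mismatches loci of query, closest first."""
    scored = [(locus_mismatches(query, p), name) for name, p in reference.items()]
    # Sorting (distance, name) tuples orders by distance, then alphabetically.
    return [name for d, name in sorted(scored) if d <= max_mismatches]
```

Real MIRU-VNTR typing schemes use more loci than this toy example, but the comparison works the same way for any fixed set of loci.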
MouseIndelDB is an integrated database resource containing thousands of previously unreported mouse genomic indel (insertion and deletion) polymorphisms ranging from approximately 100 nt to 10 kb in size. The database currently includes polymorphisms identified from our alignment of whole-genome shotgun sequence traces from four laboratory mouse strains, mapped against the reference C57BL/6J genome using GMAP. They can be queried locally by chromosomal coordinates, nearby gene names or other genomic feature identifiers, or in bulk using categories including mouse strain(s), polymorphism class(es) and chromosome number. Query results are presented either as a custom track on the UCSC mouse genome browser or in tabular format.
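The two query modes described above can be sketched with a simple in-memory record type: a region query returns indels overlapping a chromosomal interval, and a bulk query filters by strain, polymorphism class and chromosome. Results are then written as minimal four-column BED lines, a format UCSC custom tracks accept. This is an illustrative sketch, not MouseIndelDB's schema: the `Indel` fields, the 0-based half-open coordinate convention, the class labels and the strain names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # 0-based, exclusive
    strain: str  # hypothetical strain label
    kind: str    # hypothetical class label, e.g. "insertion" or "deletion"


def in_region(indels: list[Indel], chrom: str, start: int, end: int) -> list[Indel]:
    """Indels overlapping the half-open interval [start, end) on chrom."""
    return [i for i in indels if i.chrom == chrom and i.start < end and start < i.end]


def bulk_filter(indels, strains=None, kinds=None, chroms=None) -> list[Indel]:
    """Filter by any combination of categories; None means 'no restriction'."""
    return [
        i for i in indels
        if (strains is None or i.strain in strains)
        and (kinds is None or i.kind in kinds)
        and (chroms is None or i.chrom in chroms)
    ]


def to_bed(indels: list[Indel]) -> str:
    """Minimal 4-column BED text (chrom, start, end, name)."""
    return "\n".join(f"{i.chrom}\t{i.start}\t{i.end}\t{i.strain}_{i.kind}" for i in indels)
```

A production database would index by chromosome and position rather than scanning a list, but the overlap test (`start < other_end and other_start < end`) is the same.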