Literature: Text Mining
Links to tools for more complex searching and for making connections between text and other scientific information. This section focuses specifically on text such as scientific publications, article abstracts, or NCBI GenBank records.
BioLit: BioLit is a web server resource that integrates scientific publications with existing biological databases. To create these links, BioLit searches the full text of each article for metadata such as database identifiers and ontology terms.
bioNMF: A web-based tool that uses non-negative matrix factorization (NMF) to extract new information from multi-dimensional biological data sets.
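As a rough illustration of the factorization that NMF-based tools build on, the sketch below factors a small non-negative matrix V into two non-negative matrices W and H using the standard multiplicative update rules. It is not bioNMF's implementation; the function names, the toy matrix, and the parameter defaults are all assumptions made for this example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x rank) times H (rank x m), all entries >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update H element-wise: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update W element-wise: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5

# Toy data (made up): rows could be genes, columns could be samples.
V = [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 3, 4],
     [0, 0, 4, 3]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

Each column of W can be read as a latent factor and each row of H as that factor's weight across the columns of V. Real data sets would normally call for an optimized numerical library rather than pure Python lists.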
BIOSMILE: BIOSMILE is a web-based NCBI PubMed search tool. Users enter keywords, and BIOSMILE retrieves the matching PubMed abstracts.
botXminer: botXminer is a literature search tool for BotDB. BotDB contains only citations for articles with 'botulinum' or 'tetanus' anywhere in their text. botXminer is an interface to this subset of the complete MEDLINE XML files, loaded into an Oracle XML DB.
CARGO: CARGO (Cancer And Related Genes Online) is a portal that uses widgets to let users mine literature with iHOP, retrieve disease information from OMIM, visualize SNPs in 3D, query protein interactions, and view summarized gene annotation for cancer-related human genes.
Chilibot: Chilibot searches the PubMed literature database for specific relationships between proteins, genes, or keywords. The results are returned as a graph.
CoPub: Useful in microarray data analysis, CoPub is a text mining tool that detects biomedical terms co-occurring in abstracts with a list of input genes (human, rat, or mouse). CoPub also displays differentially expressed genes and over-represented keywords as a network to make their relationships easier to see.
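The co-occurrence idea behind this kind of tool can be sketched in a few lines. The example below counts how often each gene name appears in the same abstract as each keyword. It is a simplified illustration, not CoPub's method: it uses raw counts rather than a statistical over-representation test, and the function names, placeholder gene names, and toy abstracts are all invented for the example.

```python
import re
from collections import Counter
from itertools import product

def tokens(text):
    """Lowercased set of alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cooccurrence(abstracts, genes, keywords):
    """Count abstracts in which a gene and a keyword both appear."""
    counts = Counter()
    for text in abstracts:
        found = tokens(text)
        found_genes = [g for g in genes if g.lower() in found]
        found_terms = [k for k in keywords if k.lower() in found]
        # Every (gene, keyword) pair present in this abstract counts once.
        counts.update(product(found_genes, found_terms))
    return counts

# Placeholder abstracts and gene names, made up for illustration.
abstracts = [
    "GENE1 expression increased under hypoxia.",
    "GENE1 and GENE2 were both associated with hypoxia and apoptosis.",
    "GENE2 knockdown reduced apoptosis.",
]
counts = cooccurrence(abstracts, ["GENE1", "GENE2"], ["hypoxia", "apoptosis"])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Matching on whole tokens avoids counting a gene symbol that merely appears inside a longer word, but it also misses multi-word terms and synonyms, which a production tool would need to handle.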
E3Miner: E3Miner is a web-based text mining tool that extracts and integrates comprehensive knowledge about ubiquitin-protein ligases (E3s) and their underlying mechanisms. It combines E3 data from the published literature with data from biological databases.
FABLE: Fast Automated Biomedical Literature Extraction (FABLE) mines the biomedical literature for information about human genes and proteins. Users can find articles mentioning a gene of interest (Article Finder), generate a list of genes associated with one or more keywords (Gene Lister), or use a local mirror of the UCSC Genome Browser with a literature track (LitTrack).
G-SESAME: G-SESAME is a suite of online tools for measuring the semantic similarity of Gene Ontology (GO) terms and the functional similarity of gene products, and for mining the GO database.
Gendoo: Gene Disease Features Ontology-based Overview System (Gendoo) is a web tool for visualizing disease feature profiles generated by assigning MeSH vocabulary for associated drugs, biological phenomena, and anatomy to OMIM data. This approach helps in interpreting the molecular and clinical aspects of -omic data.
GoGene: GoGene performs high-throughput text mining to complement gene annotation. GoGene supports searching for genes via PubMed, Entrez Gene, and BLAST.
GPSDB: Gene and Protein Synonym DataBase (GPSDB) is a collection of gene and protein names organized by species. It can be used to search for a given gene or protein name, retrieve all synonyms for that entity, and query MEDLINE with a set of user-selected terms.
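Synonym-based querying of this kind can be approximated by hand: join the chosen synonyms into one OR query and send it to a literature search service. The sketch below builds such a query for NCBI's public E-utilities `esearch` endpoint. It does not use GPSDB itself, the synonym list is supplied by the caller rather than looked up, and the helper names `build_term` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities search endpoint (not part of GPSDB).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(synonyms):
    """Join synonyms into one OR query restricted to title and abstract."""
    quoted = ['"{}"[Title/Abstract]'.format(s) for s in synonyms]
    return "(" + " OR ".join(quoted) + ")"

def esearch_url(synonyms, retmax=20):
    """Build an esearch URL that would return matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": build_term(synonyms),
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)

# Example synonym set chosen by the user; no request is sent here.
url = esearch_url(["TP53", "p53"])
print(url)
print(parse_qs(urlparse(url).query)["term"][0])
```

The example only constructs the URL, so it runs offline. Fetching it, for instance with `urllib.request.urlopen`, is subject to NCBI's usage guidelines.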
iHOP: iHOP (Information Hyperlinked over Proteins) lets researchers explore a network of gene and protein interactions based on the published scientific literature. For each gene searched, iHOP reports sentences from abstracts that associate it with other genes, links out to the full abstracts, and reports experimental evidence for the interactions, if available. You can also select sentences to create and visualize your own gene model.
LitInspector: A literature search tool that provides gene homonym mining within the PubMed database. Search terms are highlighted in the results. LitInspector also performs signal transduction pathway mining using a manually curated database of pathway names, pathway components, and pathway keywords.
LitMiner: LitMiner is a literature data mining tool. It annotates key terms in article abstracts and then applies statistical co-citation analysis to the annotated terms to predict relationships between genes, compounds, diseases and phenotypes, and tissues and organs.
MARBL: MARBL is an open-source package for indexing the text components of GenBank records and the NLM article abstracts associated with them. A few demonstrations of the package are also available on the MARBL website.
Medical Acronym Finder: This acronym database provides free access to medical and biological acronyms. It contains more than 100,000 acronyms, and users can contribute by rating the quality of entries. The database is generated from MEDLINE data.
MedKit: MedKit is a helper application for text mining the MEDLINE abstract database. It allows random sampling of abstracts and downloads of more than 10,000 MEDLINE abstracts in XML form. It also includes Java modules (query, sample, fetch, and parse) that can be easily integrated into other text-mining systems.
MedlineRanker: The MedlineRanker web server lets users flexibly rank MEDLINE records for a topic of interest without requiring expert knowledge.
MedMiner: MedMiner can be used to select genes from a microarray set based on GeneCards information. Using the selected genes, one can then search PubMed abstracts with known gene synonyms and other user-specified search parameters. The PubMed search can also be run independently of a microarray gene set. Results are grouped by a set of relational keywords.
NLProt: NLProt is a tool for finding protein names in natural language text. This data-mining method is useful for extracting the UniProt IDs of proteins from research articles to build custom datasets or databases.
PIE: The Protein Interaction information Extraction system (PIE) is a configurable web server for extracting protein-protein interactions from the literature. It uses both the co-occurrence of proteins in text and predefined phrase patterns for protein-protein interactions within a machine learning framework.
PLAN2L: A web-based tool that integrates text mining and information extraction techniques to provide access to information useful for analyzing genetic, cellular, and molecular aspects of Arabidopsis thaliana.
PolySearch: PolySearch lets users run comprehensive associative queries of the form 'given X, find all Ys', where X and Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs, or metabolites. PolySearch also identifies, highlights, and ranks the relevant abstracts, paragraphs, or sentences.
PosMed: To help prioritize candidate genes discovered in a linkage analysis, Positional Medline (PosMed) runs a full-text search of documents for a query word and ranks positional cloning candidate genes by direct and indirect inference from the hit documents. PosMed currently supports prioritization of positional cloning candidate genes in human, mouse, rat, and Arabidopsis thaliana.
PubFinder: PubFinder is a tool that facilitates searching PubMed abstracts. The user chooses a set of abstracts representative of the subject area of interest. PubFinder then uses words from the selected abstracts to find other papers likely to belong to the same subject area.
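The general idea of ranking papers by words drawn from a hand-picked set of abstracts can be sketched as follows. This is a simplified stand-in, not PubFinder's actual scoring: it weights each word by the log ratio of its smoothed document frequency in the selected set versus a background set, then sums the weights of the words in a candidate abstract. All function names and toy texts are assumptions made for the example.

```python
import math
import re
from collections import Counter

def words(text):
    """Lowercased alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())

def word_weights(selected, background):
    """Weight words by how much more often they occur in the selected set."""
    sel = Counter(w for t in selected for w in set(words(t)))
    bg = Counter(w for t in background for w in set(words(t)))
    weights = {}
    for w, n in sel.items():
        # Add-one smoothing keeps the ratio finite for unseen words.
        p_sel = (n + 1) / (len(selected) + 2)
        p_bg = (bg[w] + 1) / (len(background) + 2)
        weights[w] = math.log(p_sel / p_bg)
    return weights

def score(text, weights):
    """Sum the weights of the distinct words in a candidate abstract."""
    return sum(weights.get(w, 0.0) for w in set(words(text)))

# Toy texts, made up for illustration.
selected = ["kinase signaling in yeast", "kinase cascade and signaling"]
background = ["ribosome assembly in yeast",
              "ribosome structure and function",
              "membrane transport in yeast"]
weights = word_weights(selected, background)

candidates = ["a kinase cascade controls signaling", "ribosome transport in yeast"]
ranked = sorted(candidates, key=lambda t: score(t, weights), reverse=True)
print(ranked)
```

Words that are common in both sets, such as "in" or "yeast" here, receive weights near or below zero, so they contribute little to the ranking even without an explicit stop-word list.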
PubGene: A searchable literature network of human genes with tools for gene expression analysis. Choose between the free public service and the commercial package.
SENT: SENT is a text mining web server. It uses non-negative matrix factorization to identify topics in the scientific articles associated with an input list of genes.
XplorMed: XplorMed is a tool that summarizes MEDLINE search results by subject and lets you navigate the abstracts interactively.