Thursday, May 31, 2007

CutDB: a proteolytic event database

The Burnham Institute for Medical Research offers several databases, including PMAP CutDB. CutDB is a database of proteases and their proteolytic events, including predicted events, drawn from experiments and the literature, at http://cutdb.burnham.org. It can be searched by a wide range of fields, listed to the right. Some proteases, such as furin, link to extensive lists of events, while others, such as griselysin, have only one entry.
  • Clicking on the name in [Protease_definition] may lead to an entry in the PMAP-Proteases database.
  • Clicking on the [Substrate definition] leads to its NCBI Protein entry.
  • Clicking on [Structure] links may crash your web browser - whether they work seems to depend heavily on MSIE and its default settings.
  • Clicking on [Details] leads to an entry with substrate/structure, cut-site, cell line and original citation.



Tuesday, May 29, 2007

KEGG LIGAND

KEGG LIGAND is an alternative to the NCBI databases, covering the "molecular building blocks of life in the chemical space." Its interlinking is simpler and more up-front than NCBI's -- if not as complete. The Reaction database is easy to use. Compare diethylene glycol in NCBI and LIGAND.

Article
Susumu Goto, Takaaki Nishioka, and Minoru Kanehisa.
LIGAND: chemical database of enzyme reactions.
Nucleic Acids Res. 2000 January 1; 28(1): 380–382.

Thursday, May 24, 2007

NCBI course - Principles of PubChem

http://www.ncbi.nlm.nih.gov/Class/PubChem/course.html
ftp://ftp.ncbi.nih.gov/pub/PowerTools/PubChem/Docs/handout.pdf
http://www.ncbi.nlm.nih.gov
http://pubchem.ncbi.nlm.nih.gov
http://www.mli.nih.gov/mlsmr/index.php

What is PubChem?
A public repository of electronic representations of small molecules and associated bioactivity assay data
- new program -- link chem informatics to bio-informatics
- A component of the NIH Molecular Libraries RoadMap
- Part of the NCBI Entrez search and linking system
- A system of four components: molecular libraries
--PubChem Substance DB
--PubChem Compound DB
--PubChem BioAssay DB
--PubChem Structure Search / tool like blast / vast

http://nihroadmap.nih.gov/ --> Grants
compound repository (MLSMR)
Molecular Libraries Small Molecule Repository
Molecular Libraries Screening Centers Network (MLSCN)
predictive ADMET

The National Center for Biotechnology Information
What does NCBI do?
Accepts submissions of primary data.
Develops tools to analyze these data.
Uses these tools to create derivative databases based on the primary data.
Provides free search, linking, and retrieval of data, mainly through the Entrez system.

entrez - text / seq - blast / protein stru - vast / sm molec struc - pubchem

pubchem Types of Databases
=Primary Databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, SNP, GEO, PubChem Substance and BioAssay
=Derivative Databases
Built from primary data
Content controlled by third party (NCBI)
Examples: RefSeq, RefSNP, GDS, PubChem Compound

PubChem Databases
substance = real chemicals / non redundant
bioassay = experimental

PC Substance Record
structure display / subID = sid + [compound id=cid] / link to depositor / chem nomenclature
? (iupac names from ncbi)

Non-uniformity in PC Substance - diff ways to draw a chemical
The Bizarre / non-standard in PC (pubchem) Substance (chamomile tea, grapefruit)

PubChem Compound

Standardize Structures
Verify Chemical Data
Atom description (label, element)
Functional group clean-up
Atom valence verification to prevent nonsense structures
“Normalize” and “Standardize”
Valence-Bond canonicalize (for Tautomer invariance)
Aromaticity detection and self-consistency
Stereochemistry detection
Explicit hydrogen assignment
Structural Representations
2D Coordinate generation
Images created
Structures that fail to standardize…
Have no records in PC Compound
Cannot be searched by structure

Stereoisomers in PC Compound (chiral sugars)

PubChem Compound continued
- Calculate Properties and Links
Nomenclature
IUPAC http://www.iupac.org/
SMILES & SMARTS
InChI
- Structural Information
Calculate & store “Fingerprints”
Calculate & link to similar structures (90% level)
- Physical Properties
Molecular Formula
Molecular Weight
Number of H-bonds donor/acceptor sites
XLogP value
Lipinski value (bioavailability)
Number of Rotatable bonds
- Links to NCBI Database Records
Structures (MMDB records) http://130.14.29.110/Structure/MMDB/mmdb.shtml
Protein sequences (from Structure links)
Genes (from Protein links)
- Links to MeSH Terms through IUPAC name
("believe it or not, but people read every article and assign mesh to them" ... :-)
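A rough sketch of how the "similar structures (90% level)" links could be computed from fingerprints. The notes do not name the similarity measure, so the Tanimoto coefficient here is an assumption (it is a common choice for bit fingerprints), and the fingerprints are made-up sets of "on" bit positions, not real PubChem data.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits in either."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def is_similar(fp_a: Set[int], fp_b: Set[int], threshold: float = 0.90) -> bool:
    """True when the pair reaches the similarity threshold (90% by default)."""
    return tanimoto(fp_a, fp_b) >= threshold


# Hypothetical fingerprints, for illustration only.
fp_query = set(range(20))          # bits 0..19 are on
fp_close = set(range(19)) | {42}   # shares 19 of those bits
fp_far = set(range(10))            # shares only 10 bits

print(is_similar(fp_query, fp_close))  # 19 shared / 21 total
print(is_similar(fp_query, fp_far))    # 10 shared / 20 total
```

Storing each fingerprint as a set keeps the arithmetic readable; a real system would more likely use packed bit vectors.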

PC Compound Record - all the data, most complete
1 CID / bioactivity / links to substances
2 MeSH Links - use pubchem to do chem medline searches!
3 Calculated Properties
vendors / descriptors
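The "Lipinski value" among the calculated properties refers to Lipinski's rule of five. Below is a minimal sketch of that check using the usual thresholds (molecular weight at most 500, logP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors, with one violation tolerated). The class and function names are hypothetical and the property values are illustrative, not pulled from PubChem records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Properties:
    molecular_weight: float
    xlogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: Properties) -> List[str]:
    """Return the rule-of-five criteria that the compound violates."""
    checks = [
        ("molecular weight > 500", p.molecular_weight > 500),
        ("logP > 5", p.xlogp > 5),
        ("H-bond donors > 5", p.h_bond_donors > 5),
        ("H-bond acceptors > 10", p.h_bond_acceptors > 10),
    ]
    return [name for name, violated in checks if violated]


def passes_rule_of_five(p: Properties) -> bool:
    """The rule is commonly stated as allowing at most one violation."""
    return len(lipinski_violations(p)) <= 1


# Illustrative values only (a small drug-like molecule and a large one).
small = Properties(molecular_weight=180.2, xlogp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Properties(molecular_weight=1202.6, xlogp=7.5, h_bond_donors=5, h_bond_acceptors=12)

print(passes_rule_of_five(small), lipinski_violations(small))
print(passes_rule_of_five(large), lipinski_violations(large))
```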

Handling Mixtures
SID / CID / links to unique components & their cid's

PC BioAssay Record
Tables - active etc / overlap made non redundant

BioAssay Protocol
methods / procedures - no std's / text explanations & links to web

PubChem integration in Entrez
-What is Entrez?
-System of 31 linked databases
-Text search engine
-Tool for finding biologically linked data
-Data retrieval engine
-Virtual workspace for manipulating large datasets
-Free public access

Entrez review
Fields
chemical[synonym] (all)
chemical[completesynonym] (exact)
Atom abbrev[element]
[sourcename]
[filter] = structure, rules
lipinski[filter]
[pharmaction] = mesh
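The field tags above can also be used outside the web page. This sketch builds a field-tagged query and the matching URL for NCBI's E-utilities esearch service, without sending it. The field and filter names are copied from these notes and may not match what Entrez accepts today; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; "pccompound" is the Entrez name for PubChem Compound.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term: str, field: str) -> str:
    """Attach an Entrez field tag, quoting multi-word terms."""
    if " " in term:
        return f'"{term}"[{field}]'
    return f"{term}[{field}]"


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request URL."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Field names taken from the notes above; treat them as assumptions.
query = " AND ".join([tag("aspirin", "synonym"), tag("lipinski", "filter")])
url = esearch_url("pccompound", query)

print(query)
print(url)
```

Pasting the printed URL into a browser returns an XML list of matching record IDs, which is the scripted equivalent of typing the query into the search box.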

pubchem Search page
Entrez Limits page, very diff from medline
Details
Preview/Index
Entrez History

Display - Downloading Reports
- property report - one line
- BULK -- pubchem download - long records, goes to a URL for a week

Linking in Entrez
- hard = biol / chem
- soft = computed, algorithm links

PubChem Links
related struct / assays / literature (pmc = free) / other entrez db

Linking in Bulk
Use DISPLAY link for list --> "pubchem bioassays" ...

The PubChem FTP Site
ftp://ftp.ncbi.nih.gov/pubchem

Programming Tools

PubChem Help -- excellent

Monday, May 21, 2007

NCBI New Database -- Protein Clusters

Protein Clusters is a collection of related protein sequences (clusters). These clusters consist of Reference Sequence (ie, RefSeq = comprehensive, integrated, non-redundant sets of sequences) proteins which are encoded by prokaryotic and chloroplastic plasmids and genomes.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=proteinclusters

Friday, May 18, 2007

NCBI Genes and Disease Page

This resource is aimed at consumer health needs, but its general information about genetic diseases might be useful to student researchers. It has splendid links to other NCBI resources such as OMIM.
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=gnd.TOC&depth=2

An amusing NCBI taxonomy page lists DNA from extinct critters:
http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=extinct

Thursday, May 17, 2007

Seminar - May 23 - noon-2:00

The Genomics Shared Service presents:
Empowering Genomic Discoveries with Agilent Microarray Platform

Agilent Technologies is at the forefront of integrated genomics, offering solutions for gene expression profiling, miRNA, chromosomal copy number detection (CGH), transcription factor binding (ChIP-chip), methylation, alternative splicing and informatics. Learn how advancements with Agilent’s inkjet printing flexibility and free online design tools can be leveraged to further enable the research community at the University of Arizona.

Speaker: Christopher Hopkins, Ph.D. Agilent Technologies
Location: Kiewit Auditorium, Arizona Cancer Center
Date: Wednesday, May 23, 2007
Time: Noon - 2 p.m., LUNCH WILL BE PROVIDED.

Come see what the latest addition to the Genomics Shared Service can do
for your research.

Seminar - May 24 at 8am

The College of Medicine invites you to attend the special seminar of Dr.
Lisa Rimsza, who will be visiting the University of Arizona as a candidate
for the Head of Pathology. Please join us for this exciting presentation!

COLLEGE OF MEDICINE SPECIAL SEMINAR
Thursday, May 24: 8:00 - 9:00am
College of Medicine, room 8403
Translational Research in Lymphoma: Using Gene Expression Profiling to
Find and Expose "Immunostealth"

Tuesday, May 15, 2007

NCBI: Microbial Genome Notes - morning

Hand-out
ftp://ftp.ncbi.nih.gov/pub/minicourses/CURRENT/HO/PDF/Microbial.pdf
Primary page
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
NCBI -- http://www.ncbi.nlm.nih.gov/ -- Instructors
Susan Dombrowski / Wayne ....
Help
blast-help@ncbi.nlm
301-496-2475

Outline
Entrez / pubmed & MyNCBI / BLAST / spec resources
Use
Entrez >> subject / text
BLAST >> sequence / pcr reaction -- http://www.ncbi.nlm.nih.gov/BLAST/
Entrez
PubMed / taxonomy (lineage) / VAST / BLAST / Phylogeny
interlinked - related sequences btw db
neighboring - inside a db
30 db -- new Protein Clusters (UniGene for protein)
Groups -- nucleotides / pubmed / taxonomy / proteins
-- Search _ all[filter]_ = current content of db
Types of db
primary db (orig data, controlled by submitter)
derived db (blt from primary, controlled by ncbi)
Searching
Help manual in entrez books - http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpbook.TOC
All db / cross db -- http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
default = search everywhere
Tabs = limit (fields, preview index)
[orgn] = organism field
entry Number = code
XM = predicted (model) mRNA in RefSeq
icon -
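In RefSeq, the prefix of an accession number encodes the record type, and XM_ marks a predicted (model) mRNA. A small lookup sketch follows; the prefix meanings are the standard RefSeq ones, while the function name and the example accessions are made up for illustration.

```python
# Common RefSeq accession prefixes and what they denote.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted (model) mRNA",
    "XR": "predicted (model) non-coding RNA",
    "XP": "predicted (model) protein",
}


def accession_kind(accession: str) -> str:
    """Describe an accession by its RefSeq prefix, if it has one."""
    prefix, sep, _rest = accession.partition("_")
    if sep and prefix.upper() in REFSEQ_PREFIXES:
        return REFSEQ_PREFIXES[prefix.upper()]
    return "not a RefSeq-style accession"


# Made-up accession strings, for illustration only.
for acc in ["XM_000001.1", "NP_000002.1", "AB000003"]:
    print(acc, "->", accession_kind(acc))
```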
MyNCBI
SDIs / "google alerts"
BLAST - http://www.ncbi.nlm.nih.gov/BLAST/
"basic local alignment search tool"
- very basic BLAST! >> no scoring, no statistics
- local, isolated, surprising regions of similarity
- breaks seq into words
- hits become "seed" for alignment extension
- search algorithms .. smith-waterman - "local alignment"
- global alignment - end to end, over the full length of both sequences
- local alignment - finds pockets of similarity inside longer sequences, wherever they occur .... http://en.wikipedia.org/wiki/Local_alignment
- word size in nuc = exact match
- word size in proteins -- flexible matching (exact or "neighborhood words" - amino acid residues that are similar / biochem equiv)
NR = non redundant protein = default db
LIMITS - organism
To exclude = _all[filter] NOT mammals_
Algorithm parameters - higher Expect Value, lower stringency
short sequence strategy ....
long sequence strategy (eval high
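The "breaks seq into words" and "seed" steps noted above can be sketched in a few lines. This is a toy version of the exact-match nucleotide case only: it does no scoring, no extension and no statistics, and it leaves out the protein "neighborhood words". The function names, the word size of 4 and the sequences are chosen for illustration, not BLAST defaults.

```python
from typing import Dict, List, Tuple


def words(seq: str, size: int) -> List[Tuple[int, str]]:
    """All overlapping words of the given size, with their start positions."""
    return [(i, seq[i:i + size]) for i in range(len(seq) - size + 1)]


def seed_hits(query: str, subject: str, size: int) -> List[Tuple[int, int]]:
    """Exact word matches as (query position, subject position) pairs."""
    index: Dict[str, List[int]] = {}
    for pos, word in words(subject, size):
        index.setdefault(word, []).append(pos)
    return [
        (qpos, spos)
        for qpos, word in words(query, size)
        for spos in index.get(word, [])
    ]


# Toy sequences, for illustration only.
hits = seed_hits("GATTACA", "TTGATTACAGG", size=4)
print(hits)
```

Hits that share the same offset (subject position minus query position) lie on one diagonal, which is where an alignment extension would start.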
Precomputed services
- nuc or prot = related sequences
- blast link = Blink (like related sequences)
- transcript clusters = UniGene
- protein homologs = Homologene
DBS
Nucleotide
-- Links / Related == can Sort Related Sequences - choose how
Genbank - No protein sequences
Protein >> BLink = related prot seques
Genome >> book links, many seque pub as texts
-- shotgun: http://www.ncbi.nlm.nih.gov/Genbank/wgs.html
-- browse genome trees
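To make the local alignment idea from the BLAST notes concrete, here is a minimal Smith-Waterman scoring sketch. It returns only the best local score, not the alignment itself, and the match, mismatch and gap values are arbitrary assumptions rather than anything BLAST uses. The floor at zero is what makes the alignment local: a poorly matching stretch resets the score instead of dragging it down, which a global, end-to-end method does not allow.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # first column is always zero
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at zero: a local alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Toy sequences: the query sits as a "pocket" inside the longer subject.
print(smith_waterman_score("GATTACA", "TTGATTACAGG"))
print(smith_waterman_score("AAAA", "TTTT"))
```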

Bits
GC = guanine-cytosine content (GC-content): the fraction of bases in a sequence that are G or C, a characteristic of the genome of any given organism -- http://en.wikipedia.org/wiki/GC-content
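GC-content is easy to compute for any sequence: count the G and C bases and divide by the number of bases. A minimal sketch, assuming plain A/C/G/T sequences and skipping anything else (such as N); the function name and test sequences are made up.

```python
def gc_content(seq: str) -> float:
    """Fraction of A/C/G/T bases that are G or C (other characters are skipped)."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(base in "GC" for base in bases) / len(bases)


# Toy sequences, for illustration only.
print(gc_content("GATTACA"))
print(gc_content("atgc"))
```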

Ideas
MyNCBI set up for Bio5 researchers
(Elvis lives .. short sequence strategy)

Monday, May 14, 2007

NCBI workshops on campus - May 2007

5/15
Entrez Search Engine and the BLAST : Microbial Genomes resources
5/23
Exploring 3D Molecular Structures Using NCBI Tools
5/24
NCBI PubChem Workshop

http://biotech.arl.arizona.edu/education/events/2007/ncbi_courses.php

Friday, May 11, 2007

Molecule of the Day

Molecule of the Day provides something to chat about with faculty and students. Many of the daily postings are amusing.

I enjoy Pharyngula, too, but it's much more opinionated.

Thursday, May 10, 2007

COM seminar

The College of Medicine invites you to attend the special seminar of Dr.
Christopher Corless, who will be visiting the University of Arizona as a
candidate for the Head of Pathology. Please join us for this exciting
presentation!

COLLEGE OF MEDICINE SPECIAL SEMINAR
Friday, May 11, 8:00 - 9:00am
College of Medicine, room 2117

Wednesday, May 9, 2007

University of Arizona Bio5 Librarians

Events and ideas of interest to the group.