Thursday, May 24, 2007

NCBI course - Principles of PubChem

http://www.ncbi.nlm.nih.gov/Class/PubChem/course.html
ftp://ftp.ncbi.nih.gov/pub/PowerTools/PubChem/Docs/handout.pdf
http://www.ncbi.nlm.nih.gov
http://pubchem.ncbi.nlm.nih.gov
http://www.mli.nih.gov/mlsmr/index.php

What is PubChem?
A public repository of electronic representations of small molecules and associated bioactivity assay data
- New program linking cheminformatics to bioinformatics
- A component of the NIH Molecular Libraries RoadMap
- Part of the NCBI Entrez search and linking system
- A system of four components: molecular libraries
--PubChem Substance DB
--PubChem Compound DB
--PubChem BioAssay DB
--PubChem Structure Search (a tool analogous to BLAST / VAST)

http://nihroadmap.nih.gov/ --> Grants
Compound repository: Molecular Libraries Small Molecule Repository (MLSMR)
Molecular Libraries Screening Centers Network (MLSCN)
Predictive ADMET

The National Center for Biotechnology Information
What does NCBI do?
Accepts submissions of primary data.
Develops tools to analyze these data.
Uses these tools to create derivative databases based on the primary data.
Provides free search, linking, and retrieval of data, mainly through the Entrez system.

Search by data type: text - Entrez / sequence - BLAST / protein structure - VAST / small-molecule structure - PubChem

Types of Databases (NCBI and PubChem)
=Primary Databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, SNP, GEO, PubChem Substance and BioAssay
=Derivative Databases
Built from primary data
Content controlled by third party (NCBI)
Examples: RefSeq, RefSNP, GDS, PubChem Compound

PubChem Databases
Substance = real chemicals as deposited / Compound = non-redundant structures
BioAssay = experimental results

PC Substance Record
Structure display / substance ID (SID) plus linked compound ID (CID) / link to depositor / chemical nomenclature
? (IUPAC names from NCBI)

Non-uniformity in PC Substance - different ways to draw the same chemical
The bizarre and non-standard in PC (PubChem) Substance (e.g., chamomile tea, grapefruit)

PubChem Compound

Standardize Structures
Verify Chemical Data
Atom description (label, element)
Functional group clean-up
Atom valence verification to prevent non-sense structures
“Normalize” and “Standardize”
Valence-Bond canonicalize (for Tautomer invariance)
Aromaticity detection and self-consistency
Stereochemistry detection
Explicit hydrogen assignment
Structural Representations
2D Coordinate generation
Images created
Structures that fail to standardize…
Have no records in PC Compound
Cannot be searched by structure
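To make the "atom valence verification" step concrete, here is a minimal sketch that flags atoms whose summed bond order exceeds an allowed maximum. The valence table, the bond-list representation, and the function name invalid_atoms are illustrative assumptions, not PubChem's actual standardization code, which also has to handle charges, aromaticity, and hypervalent cases.

```python
from typing import Dict, List, Tuple

# Hypothetical maximum valences for a few neutral elements (simplified).
MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2, "S": 6, "Cl": 1}

# A bond is (atom_index_a, atom_index_b, bond_order).
Bond = Tuple[int, int, int]


def invalid_atoms(elements: List[str], bonds: List[Bond]) -> List[int]:
    """Return indices of atoms whose summed bond order exceeds MAX_VALENCE."""
    totals = [0] * len(elements)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order
    bad = []
    for i, element in enumerate(elements):
        limit = MAX_VALENCE.get(element)
        # Elements missing from the table are skipped, not rejected, in this sketch.
        if limit is not None and totals[i] > limit:
            bad.append(i)
    return bad


if __name__ == "__main__":
    # Methane with explicit hydrogens: every atom is within its valence.
    methane = ["C", "H", "H", "H", "H"]
    print(invalid_atoms(methane, [(0, i, 1) for i in range(1, 5)]))  # []
    # A "pentavalent carbon": the kind of nonsense structure to reject.
    bad_c = ["C", "H", "H", "H", "H", "H"]
    print(invalid_atoms(bad_c, [(0, i, 1) for i in range(1, 6)]))  # [0]
```

A structure that returns a non-empty list here would, in the scheme described above, fail standardization and get no PC Compound record.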

Stereoisomers in PC Compound (chiral sugars)

PubChem Compound continued
- Calculate Properties and Links
Nomenclature
IUPAC http://www.iupac.org/
SMILES & SMARTS
InChI
- Structural Information
Calculate & store “Fingerprints”
Calculate & link to similar structures (90% similarity threshold)
- Physical Properties
Molecular Formula
Molecular Weight
Number of hydrogen-bond donor/acceptor sites
XLogP value
Lipinski value (bioavailability)
Number of Rotatable bonds
- Links to NCBI Database Records
Structures (MMDB records) http://130.14.29.110/Structure/MMDB/mmdb.shtml
Protein sequences (from Structure links)
Genes (from Protein links)
- Links to MeSH Terms through IUPAC name
("believe it or not, people read every article and assign MeSH terms to it" ... :-)
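The "fingerprints" and "similar structures (90% level)" items can be illustrated with a Tanimoto score on binary fingerprints, a common choice for this kind of structure neighboring. This is a sketch under that assumption: the fingerprints below are hypothetical sets of "on" bit positions, not real PubChem fingerprints, and the CIDs are made up.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A & B| / |A | B| over 'on' bits; 0.0 when both fingerprints are empty."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def similar(query: FrozenSet[int], library: Dict[int, FrozenSet[int]],
            threshold: float = 0.90) -> List[int]:
    """IDs of library fingerprints scoring at or above the threshold."""
    return sorted(cid for cid, fp in library.items()
                  if tanimoto(query, fp) >= threshold)


if __name__ == "__main__":
    query = frozenset(range(10))
    library = {
        101: frozenset(range(10)),           # identical -> 1.0
        102: frozenset(range(10)) | {10},    # 10/11 -> about 0.909
        103: frozenset(range(5, 15)),        # 5/15 -> about 0.333
    }
    print(similar(query, library))  # [101, 102]
```

Storing fingerprints once and comparing them with a cheap set operation is what makes precomputed "similar structure" links feasible across a large compound collection.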

PC Compound Record - all the data, most complete
1 CID / bioactivity / links to substances
2 MeSH Links - use PubChem to do chemistry-based MEDLINE searches!
3 Calculated Properties
vendors / descriptors
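Several of the calculated properties listed above (molecular formula, molecular weight, and the Lipinski rule-of-five check) are simple arithmetic once atom counts and descriptors are known. The sketch below assumes Hill-order formulas, rounded standard atomic weights for a small subset of elements, and the usual rule-of-five cut-offs (MW <= 500, logP <= 5, H-bond donors <= 5, acceptors <= 10, at most one violation). The descriptor values in the example are illustrative inputs, not PubChem's computed values.

```python
from typing import Dict

# Rounded standard atomic weights (g/mol) for a few elements.
ATOMIC_WEIGHT: Dict[str, float] = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def hill_formula(counts: Dict[str, int]) -> str:
    """Hill order: C, then H, then the rest alphabetically (all alphabetical if no C)."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    # A count of 1 is written without a number.
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


def molecular_weight(counts: Dict[str, int]) -> float:
    return sum(ATOMIC_WEIGHT[e] * n for e, n in counts.items())


def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def passes_rule_of_five(mw: float, logp: float, hbd: int, hba: int) -> bool:
    return lipinski_violations(mw, logp, hbd, hba) <= 1


if __name__ == "__main__":
    caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
    mw = molecular_weight(caffeine)
    print(hill_formula(caffeine), round(mw, 2))  # C8H10N4O2 194.19
    # Illustrative descriptor inputs for the rule-of-five check.
    print(passes_rule_of_five(mw, logp=-0.1, hbd=0, hba=3))  # True
```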

Handling Mixtures
SID / CID / links to unique components & their CIDs
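In SMILES, a "." separates disconnected components, so a mixture record can be split into its parts before each part is linked to its own CID. This sketch only deduplicates identical spellings; the function name is hypothetical, and real processing would standardize each component first so that differently drawn copies of the same component also collapse.

```python
from typing import List


def unique_components(smiles: str) -> List[str]:
    """Split a SMILES string on '.' and keep each distinct component once, in order."""
    seen: List[str] = []
    for part in smiles.split("."):
        if part and part not in seen:
            seen.append(part)
    return seen


if __name__ == "__main__":
    # Sodium sulfate drawn as three disconnected ions.
    print(unique_components("[Na+].[Na+].[O-]S(=O)(=O)[O-]"))
    # ['[Na+]', '[O-]S(=O)(=O)[O-]']
```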

PC BioAssay Record
Tables - active, inactive, etc. / overlapping results made non-redundant
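One way to picture "overlap made non-redundant" is collapsing substance-level results to one outcome per compound, since several SIDs can map to the same CID. The row layout, outcome labels, and the precedence rule (active over inconclusive over inactive) are all assumptions for illustration; real BioAssay tables use their own columns and codes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical row layout: (SID, CID, outcome).
Row = Tuple[int, int, str]

# Assumed precedence when several substances map to one compound.
RANK = {"inactive": 0, "inconclusive": 1, "active": 2}


def outcome_per_cid(rows: List[Row]) -> Dict[int, str]:
    """Collapse substance-level results to one outcome per compound (CID)."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for _sid, cid, outcome in rows:
        seen[cid].add(outcome)
    # Keep the highest-ranked outcome observed for each CID.
    return {cid: max(outs, key=RANK.__getitem__) for cid, outs in seen.items()}


if __name__ == "__main__":
    rows = [
        (1, 100, "inactive"), (2, 100, "active"),
        (3, 200, "inactive"),
        (4, 300, "inconclusive"), (5, 300, "inactive"),
    ]
    print(outcome_per_cid(rows))
    # {100: 'active', 200: 'inactive', 300: 'inconclusive'}
```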

BioAssay Protocol
Methods / procedures - no standards / text explanations & links to the web

PubChem integration in Entrez
-What is Entrez?
-System of 31 linked databases
-Text search engine
-Tool for finding biologically linked data
-Data retrieval engine
-Virtual workspace for manipulating large datasets
-Free public access

Entrez review
Fields
chemical[synonym] (all)
chemical[completesynonym] (exact)
Atom abbreviation[element]
[sourcename]
[filter] = structure, rules
lipinski[filter]
[pharmaction] = MeSH pharmacological action
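The same field-tagged queries can be sent to Entrez programmatically through NCBI's E-utilities. The esearch endpoint and the database name pccompound are real; whether each field tag above (for example lipinski[filter]) is still accepted today is an assumption, and the helper name esearch_url is hypothetical. The sketch only builds the URL and does not fetch it.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pccompound", retmax: int = 20) -> str:
    """Build an ESearch URL for a field-tagged Entrez query (not fetched here)."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes brackets and turns spaces into '+'.
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("aspirin[synonym] AND lipinski[filter]"))
```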

PubChem search page
Entrez Limits page, very different from MEDLINE's
Details
Preview/Index
Entrez History

Display - Downloading Reports
- property report - one line
- BULK -- PubChem Download: long records, results posted at a URL for a week

Linking in Entrez
- hard = biol / chem
- soft = computed, algorithmic links

PubChem Links
related structures / assays / literature (PMC = free full text) / other Entrez databases

Linking in Bulk
Use DISPLAY link for list --> "pubchem bioassays" ...
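Bulk linking from compounds to bioassays can also be requested through the E-utilities ELink endpoint, with pccompound as the source database and pcassay as the target. This is a sketch: the helper name and example IDs are hypothetical, and it only builds the request URL rather than calling NCBI.

```python
from typing import Iterable
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(ids: Iterable[int], dbfrom: str = "pccompound",
              db: str = "pcassay") -> str:
    """Build an ELink URL linking a list of IDs from one Entrez DB to another."""
    # Comma-joined IDs are sent as a single 'id' parameter.
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(str(i) for i in ids)}
    return ELINK + "?" + urlencode(params)


if __name__ == "__main__":
    print(elink_url([2244, 3672]))
```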

The PubChem FTP Site
ftp://ftp.ncbi.nih.gov/pubchem

Programming Tools

PubChem Help -- excellent
