Tuesday, May 15, 2007

NCBI: Microbial Genome Notes - morning

Hand-out
ftp://ftp.ncbi.nih.gov/pub/minicourses/CURRENT/HO/PDF/Microbial.pdf
Primary page
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
NCBI -- http://www.ncbi.nlm.nih.gov/ -- Instructors
Susan Dombrowski / Wayne ....
Help
blast-help@ncbi.nlm
301-496-2475

Outline
Entrez / PubMed & MyNCBI / BLAST / spec resources
Use
Entrez >> subject / text
BLAST >> sequence / PCR reaction -- http://www.ncbi.nlm.nih.gov/BLAST/
Entrez
PubMed / taxonomy (lineage) / VAST / BLAST / Phylogeny
interlinked - related sequences btw db
neighboring - inside a db
30 db -- new Protein Clusters (UniGene for protein)
Groups -- nucleotides / pubmed / taxonomy / proteins
-- Search _all[filter]_ = current content of db
Types of db
primary db (orig data, controlled by submitter)
derived db (built from primary, controlled by NCBI)
Searching
Help manual in entrez books - http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpbook.TOC
All db / cross db -- http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
default = search everywhere
Tabs = limit (fields, preview index)
[orgn] = organism field
entry Number = code
XM = predicted (model) mRNA RefSeq, not experimentally confirmed
icon -
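A minimal sketch of how the field tags above combine into one Entrez search term. The E-utilities ESearch endpoint and the helper names `build_term` and `esearch_url` are assumptions (the course only used the web forms); the code builds the URL and fetches nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the notes).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(text, organism=None, exclude=None):
    """Combine a base term with an [orgn] limit and an optional NOT clause."""
    parts = [text]
    if organism:
        parts.append(f'"{organism}"[orgn]')
    term = " AND ".join(parts)
    if exclude:
        term = f"({term}) NOT {exclude}"
    return term

def esearch_url(db, term, retmax=20):
    """Return the ESearch URL for a query; no request is sent here."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

# Everything currently in the db, limited to one organism.
term = build_term("all[filter]", organism="Escherichia coli")
print(term)
print(esearch_url("nucleotide", term))
```

The same `build_term("all[filter]", exclude="mammals")` call gives the NOT form used for excluding organisms.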
MyNCBI
SDIs / "google alerts"
BLAST - http://www.ncbi.nlm.nih.gov/BLAST/
"basic local alignment search tool"
- very basic BLAST! >> no scoring, no statistics
- local, isolated, surprising regions of similarity
- breaks seq into words
- hits become "seed" for alignment extension
- search algorithms .. Smith-Waterman - "local alignment"
- global alignment - end to end, over the full length of both sequences
- local alignment - finds pockets (subregions) of similarity rather than aligning whole sequences .... http://en.wikipedia.org/wiki/Local_alignment
- word size in nuc = exact match
- word size in proteins -- flexible matching (exact or "neighborhood words" - amino acid residues that are similar / biochem equiv)
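A rough sketch of the two ideas in the list above: breaking sequences into words to find exact-match seeds (the nucleotide case), and scoring a local alignment with Smith-Waterman. This is not BLAST itself. The function names, the word size of 4 and the match/mismatch/gap scores are illustrative assumptions, and protein "neighborhood words" are not handled.

```python
def words(seq, w):
    """All overlapping words of length w, paired with their start offsets."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]

def seed_hits(query, subject, w=4):
    """Exact word matches as (query_pos, subject_pos) pairs; these are the seeds."""
    index = {}
    for j, word in words(subject, w):
        index.setdefault(word, []).append(j)
    return [(i, j) for i, word in words(query, w) for j in index.get(word, [])]

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Flooring at 0 is what makes the alignment local: a bad stretch
            # resets the score instead of dragging down a later good region.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(seed_hits("ACGTAC", "TTACGTTT"))
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A global alignment would drop the floor at 0 and score the sequences end to end, so the unrelated flanks would count against the shared ACGT region.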
NR = non redundant protein = default db
LIMITS - organism
To exclude = _all[filter] NOT mammals_
Algorithm parameters - higher Expect Value, lower stringency
short sequence strategy ....
long sequence strategy (eval high
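A small sketch of what "higher Expect Value, lower stringency" means in practice: raising the threshold lets weaker hits through. The hit names and E-values below are made up for illustration, and `filter_hits` is a hypothetical helper, not part of any NCBI tool.

```python
# Hypothetical hits as (subject id, E-value); the values are invented.
hits = [("hit_a", 1e-50), ("hit_b", 3e-8), ("hit_c", 0.02), ("hit_d", 4.5)]

def filter_hits(hits, expect):
    """Keep the names of hits whose E-value is at or below the Expect threshold."""
    return [name for name, evalue in hits if evalue <= expect]

strict = filter_hits(hits, expect=1e-5)  # low Expect = high stringency
loose = filter_hits(hits, expect=10)     # high Expect = low stringency
print(strict)
print(loose)
```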
Precomputed services
- nuc or prot = related sequences
- blast link = Blink (like related sequences)
- transcript clusters = UniGene
- protein homologs = Homologene
DBS
Nucleotide
-- Links / Related == can Sort Related Sequences - choose how
GenBank - No protein sequences
Protein >> BLink = related protein sequences
Genome >> book links, many sequences published as texts
-- shotgun: http://www.ncbi.nlm.nih.gov/Genbank/wgs.html
-- browse genome trees

Bits
GC=guanine-cytosine content (GC-content) is a characteristic of the genome of any given organism -- http://en.wikipedia.org/wiki/GC-content
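GC-content is simple to compute from a sequence. A minimal sketch, assuming a plain DNA string and counting only A, C, G and T (ambiguity codes such as N are ignored); `gc_content` is an illustrative name.

```python
def gc_content(seq):
    """Fraction of G and C bases among the A, C, G, T bases in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Return 0.0 for an empty or all-ambiguous sequence instead of dividing by zero.
    return gc / total if total else 0.0

print(gc_content("ATGCGCGC"))
```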

Ideas
MyNCBI set up for Bio5 researchers
(Elvis lives .. short sequence strategy)
