notes-biotech
DNA stands for deoxyribonucleic acid. Chromosomes are made of DNA, and the DNA sequence carries the genes.
"Phenotype" simply refers to an observable trait. "Pheno" comes from the Greek for "to show" and shares its root with the word "phenomenon".
A person's genotype is their unique sequence of DNA (or genes). More specifically, this term is used to refer to the two alleles a person has inherited for a particular gene.
Phenotype is the detectable expression of this genotype, for example black hair or a purple flower.
In biology, homologous chromosomes are paired chromosomes. They essentially have the same: gene sequence, loci (gene position), centromere location, and chromosomal length.
Mitosis is normal cell division, in which a diploid somatic (body, hair, etc.) cell duplicates itself into two diploid cells. Both parent and daughter cells contain 2*23 chromosomes.
Meiosis is a special type of cell division of germ cells (not normal somatic cells!) in sexually reproducing organisms that produces the gametes, such as sperm or egg cells. The 46 chromosomes in the source cell yield 23 chromosomes in each child cell.
This happens in two stages:
* In the first stage, a cell with 2*23 chromosomes splits into two cells of 23 chromosomes each.
* The second stage is like mitosis: each of those two cells divides again, separating the duplicated copies of each chromosome.
* You start with a single diploid germ cell and end up with 4 haploid cells!
An aromatic molecule or compound is one that has special stability and properties due to a closed loop of electrons. Not all molecules with ring (loop) structures are aromatic.
The orbitals are numbered as: 1s 2s 2p(x) 2p(y) 2p(z) 3s 3p*, etc:
Subshell    Max Electrons
-------------------------
s           2
p           6
d           10
f           14
g           18
h           22
i           26
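The values in this table follow the standard rule that a subshell with angular momentum quantum number l holds 2(2l + 1) electrons. A minimal sketch of that rule; the function name `subshell_capacity` and the constant `SUBSHELL_LETTERS` are illustrative choices, not part of the original notes:

```python
# Subshell letters ordered by angular momentum quantum number l = 0, 1, 2, ...
SUBSHELL_LETTERS = "spdfghi"


def subshell_capacity(letter: str) -> int:
    """Return the maximum number of electrons in a subshell: 2 * (2l + 1)."""
    l = SUBSHELL_LETTERS.index(letter)  # raises ValueError for unknown letters
    return 2 * (2 * l + 1)


if __name__ == "__main__":
    for letter in SUBSHELL_LETTERS:
        print(letter, subshell_capacity(letter))
```

Each subshell has 2l + 1 orbitals, and each orbital holds two electrons of opposite spin, which is where the factor of 2 comes from.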
See the Aufbau diagram for the order in which the orbitals are filled. The capacity of each shell is:
Shell                    Capacity of Shell
------------------------------------------
1s                       2
2s 2p                    8
3s 3p 3d                 18
4s 4p 4d 4f              32
5s 5p 5d 5f 5g           50
6s 6p 6d 6f 6g 6h        72
7s 7p 7d 7f 7g 7h 7i     98
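The shell capacities above can be checked by summing the subshells each shell contains: shell n has subshells l = 0 to n - 1, each holding 2(2l + 1) electrons, which adds up to 2n². A small sketch, with `shell_capacity` as an illustrative name:

```python
def shell_capacity(n: int) -> int:
    """Sum the capacities 2 * (2l + 1) of the subshells l = 0 .. n-1 in shell n."""
    return sum(2 * (2 * l + 1) for l in range(n))


if __name__ == "__main__":
    for n in range(1, 8):
        # The closed form 2 * n**2 should agree with the subshell sum.
        print(n, shell_capacity(n), 2 * n**2)
```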
Tissue is a group of cells that have a similar structure and function together as a unit. A nonliving material, called the intercellular matrix, fills the spaces between the cells. The amount of this material varies from tissue to tissue. Human tissue is also described more broadly as an organ, a part of a human body, or any substance extracted from a human body. Examples of tissues and what they secrete:
Tissue                   Secretion
-------------------------------------------------
Thyroid                  thyroid hormones
Breast                   milk
Salivary gland           saliva
Tear ducts               tears
Exocrine pancreas        digestive enzymes
Islets of Langerhans     insulin, glucagon
Stomach epithelium       acid, intrinsic factor
Tissues respond to a variety of stimuli. For example, the thyroid responds to thyroid-stimulating hormone (TSH) released by the pituitary gland, which promotes the division of thyroid epithelial cells and the release of thyroid hormones.
Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment.
Cytokines are a broad and loose category of small proteins (~5–25 kDa) important in cell signaling. Cytokines are peptides and cannot cross the lipid bilayer of cells to enter the cytoplasm.
Information extraction, including named entity recognition (NER)
Automatic question answering (QA) systems
Text summarization
Text generation: understand a database and write a human-readable text analysis.
Machine Translation
Sentiment Analysis
Morphology (word stems), syntax, semantics, pragmatics (context) and discourse (paragraphs).
Typical sentences read:
(Gene|Protein) (MolecularFunction) (Gene|Protein). For example: Pax-3 gene activated MyoD1 gene.
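A sentence template of this kind can be matched with a regular expression. The sketch below is only an illustration: the two small word lists and the function name `extract_relation` are made up for the example, and a real system would draw on curated gene, protein and verb vocabularies rather than a hard-coded list.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
ENTITIES = ["Pax-3", "MyoD1"]
FUNCTIONS = ["activated", "inhibited", "phosphorylated"]

_entity = "|".join(re.escape(e) for e in ENTITIES)
_function = "|".join(re.escape(f) for f in FUNCTIONS)

# (Gene|Protein) (MolecularFunction) (Gene|Protein)
PATTERN = re.compile(
    rf"(?P<subject>{_entity})\s+(?:gene|protein)\s+"
    rf"(?P<function>{_function})\s+"
    rf"(?:the\s+)?(?P<object>{_entity})\s+(?:gene|protein)",
    re.IGNORECASE,
)


def extract_relation(sentence: str):
    """Return (subject, function, object) if the sentence fits the template, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.group("subject"), match.group("function"), match.group("object")


if __name__ == "__main__":
    print(extract_relation("Pax-3 gene activated MyoD1 gene."))
```

Pattern matching like this is brittle against paraphrase and passive voice, so it is best treated as a baseline rather than a complete extraction method.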
The ten most-studied genes are TP53, TNF, EGFR, VEGFA, APOE, IL6, TGFB1, MTHFR, ESR1 and AKT1.
Both chromosomes of a pair carry the same genes. Sometimes there are slight variations in these genes. These variations occur in less than 1% of the DNA sequence. The variant forms of a gene are called alleles. If only one of the two alleles is abnormal and it is recessive, no disease develops. If both alleles are abnormal, or if the abnormal allele is dominant, then disease develops.
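The rule in the last two sentences can be written as a small truth function. This is a deliberately simplified single-gene model; the function name `develops_disease` and its parameters are illustrative assumptions, and real inheritance is complicated by factors such as penetrance and multiple interacting genes.

```python
def develops_disease(first_abnormal: bool, second_abnormal: bool,
                     abnormal_is_dominant: bool) -> bool:
    """Apply the simple dominant/recessive rule to one pair of alleles."""
    if abnormal_is_dominant:
        # One abnormal copy is enough when the abnormal allele is dominant.
        return first_abnormal or second_abnormal
    # A recessive abnormal allele causes disease only when both copies are abnormal.
    return first_abnormal and second_abnormal


if __name__ == "__main__":
    # Carrier of a recessive abnormal allele: no disease.
    print(develops_disease(True, False, abnormal_is_dominant=False))
    # One dominant abnormal allele: disease.
    print(develops_disease(True, False, abnormal_is_dominant=True))
```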
Almost all diseases have a genetic component! This can be classified as:
Genes which are not well understood are called dark genes.
Pathogenesis is the process by which an infection leads to disease. Pathogenic mechanisms of viral disease include (1) implantation of virus at the portal of entry, (2) local replication, (3) spread to target organs (disease sites), and (4) spread to sites of shedding of virus into the environment.
Types of pathogenesis include microbial infection, inflammation, malignancy and tissue breakdown. For example, bacterial pathogenesis is the process by which bacteria cause infectious illness.
A diagnosis is an identification of a disease via examination. What follows is a prognosis, which is a prediction of the course of the disease as well as the treatment and results.
A drug target is a protein in the body (a functional biomolecule) that is intrinsically associated with a particular disease process and that can be addressed by a drug (a biologically active compound) to produce a desired therapeutic effect.
Various kinds of drug targets are:
1. G-proteins: Also known as guanine nucleotide-binding proteins, they act as
molecular switches inside cells and transmit signals from a variety of
stimuli outside a cell to its interior. G-protein coupled receptors
(GPCRs), the receptors that signal through G-proteins, are a diverse
family of receptors found in a huge range of tissues throughout the body.
They account for about 5% of the genome, and 45% of all our drugs target
them. There are around 5000 potential targets of this kind, but only
around 400 distinct proteins are targeted now.
Note: Ligands that bind to and activate nuclear receptors include lipophilic substances such as endogenous hormones, vitamins A and D, and xenobiotic hormones. Because the expression of a large number of genes is regulated by nuclear receptors, ligands that activate these receptors can have profound effects on the organism.
A cell line is a set of cells grown in a laboratory from a single plant or animal cell. Cell lines can either be based on primary cells – for example muscle or fat cells – or on stem cells. The work involves growing stem cell lines and examining their surface molecules.
The branches of science known informally as omics are various disciplines in biology whose names end in the suffix -omics, such as genomics, proteomics, metabolomics, metagenomics, phenomics and transcriptomics.
Pharos is the user interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program funded by the National Institutes of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of KMC is to develop a comprehensive, integrated knowledge base for the Druggable Genome (DG) to illuminate the uncharacterized and/or poorly annotated portion of the DG, focusing on three of the most commonly drug-targeted protein families: G-protein coupled receptors (GPCRs), ion channels and protein kinases.
The Pharos interface provides facile access to most data types collected by the KMC. Given the complexity of the data surrounding any target, efficient and intuitive visualization has been a high priority, to enable users to quickly navigate & summarize search results and rapidly identify patterns. A critical feature of the interface is the ability to perform flexible search and subsequent drill down of search results. Underlying the interface is a GraphQL API that provides programmatic access to all KMC data, allowing for easy consumption in user applications.
The Unified Medical Language System (UMLS) is a compendium of many controlled vocabularies in the biomedical sciences.
Numerous sources for disease definitions and data models currently exist, including HPO, OMIM, SNOMED CT, ICD, PhenoDB, MedDRA, MedGen, ORDO, DO, GARD, etc., but they are not accurate. UMLS helps but is not precise.
Mondo’s development is coordinated with the Human Phenotype Ontology (HPO), which describes the individual phenotypic features that constitute a disease. Like the HPO, Mondo provides a hierarchical structure which can be used for classification or “rolling up” diseases to higher level groupings. It provides mappings to other disease resources, but in contrast to other mappings between ontologies, Mondo precisely annotates each mapping using strict semantics, so that we know when two disease names or identifiers are equivalent or one-to-one, in contrast to simply being closely related.
The Ontology Lookup Service (OLS) is a repository for biomedical ontologies.
OLS is developed and maintained by the Samples, Phenotypes and Ontologies Team (SPOT) at EMBL-EBI.
Example Diseases to understand ontology:
See https://rarediseases.org/ (NORD: National Organization for Rare Disorders).
See Also: Orphanet Rare disease Ontology http://www.orphadata.org/cgi-bin/index.php#ontologies
Autoimmune encephalitis
Online Mendelian Inheritance in Man
gtex homologene expression lincs
protein
-- Fetch the protein id for each symbol
SELECT DISTINCT sym AS symbol, id AS protein_id FROM protein WHERE sym IN ...
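One way to fill the IN list safely is to generate one placeholder per symbol and pass the symbols as query parameters. The sketch below uses an in-memory SQLite table as a stand-in for the real database; the helper name `fetch_protein_ids`, the toy rows and their ids are made up for illustration, and only the `protein` table name and the `sym` and `id` columns come from the query above.

```python
import sqlite3


def fetch_protein_ids(conn, symbols):
    """Map each gene symbol to its protein id using a parameterized IN clause."""
    placeholders = ", ".join("?" for _ in symbols)
    query = (
        "SELECT DISTINCT sym AS symbol, id AS protein_id "
        f"FROM protein WHERE sym IN ({placeholders})"
    )
    return dict(conn.execute(query, list(symbols)).fetchall())


# Toy stand-in for the real database; the ids are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id INTEGER PRIMARY KEY, sym TEXT)")
conn.executemany(
    "INSERT INTO protein (id, sym) VALUES (?, ?)",
    [(1, "TP53"), (2, "EGFR"), (3, "AKT1")],
)

if __name__ == "__main__":
    print(fetch_protein_ids(conn, ["TP53", "AKT1"]))
```

Binding the symbols as parameters, rather than pasting them into the SQL string, avoids quoting problems and SQL injection. Other database drivers use a different placeholder syntax, so the `?` marker is specific to this SQLite sketch.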
phenotype
mpo (mouse phenotype)
pathway
SELECT DISTINCT id_in_source AS reactome_id, name FROM pathway WHERE pwtype = 'Reactome';
SELECT DISTINCT SUBSTR(id_in_source, 6) AS kegg_pathway_id, name AS kegg_pathway_name FROM pathway WHERE pwtype = 'KEGG';
SELECT DISTINCT
    clinvar.protein_id
FROM
    clinvar
JOIN
    clinvar_phenotype ON clinvar.clinvar_phenotype_id = clinvar_phenotype.id
JOIN
    clinvar_phenotype_xref ON clinvar_phenotype.id = clinvar_phenotype_xref.clinvar_phenotype_id
WHERE
    clinvar_phenotype_xref.source = 'OMIM'
    AND clinvar.clinical_significance != 'Uncertain significance'
GA/kNN identified 10 genes (ANTXR2, STK3, PDK4, CD163, MAL, GRAP, ID3, CTSZ, KIF1B and PLXDC2) whose coordinate pattern of expression was able to identify 98.4% of discovery cohort subjects correctly (97.4% sensitive, 100% specific)
Consider 100 people interviewing for 10 posts, of whom 10 are really good candidates. The interview process selects 10 people: 7 good candidates and 3 bad candidates. What are the sensitivity, specificity and accuracy?
Sensitivity = True Positive / (True Positive + False Negative)
            = Correctly Selected / (Correctly Selected + Mistakenly Rejected)
            = Correctly Selected / Total Excellent Candidates
            = 7 / 10 = 70% sensitivity
Specificity = True Negative / (True Negative + False Positive)
            = Correctly Rejected / (Correctly Rejected + Mistakenly Selected)
            = Correctly Rejected / Total Poor Candidates
            = 87 / 90 = 96.67% specificity
Note: Sensitivity is low while specificity is high because there are
many negatives and we are still able to reject most of them.
Accuracy = (True Positive + True Negative) /
           (True Positive + False Positive + True Negative + False Negative)
         = (Correctly Selected + Correctly Rejected) /
           (Correctly Selected + Mistakenly Selected + Correctly Rejected
            + Mistakenly Rejected)
         = (Correctly Selected + Correctly Rejected) / Total Candidates
         = (7 + 87) / 100 = 94.0%
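The three formulas can be checked with a few lines of code. This is a minimal sketch using the interview numbers (7 correctly selected, 3 mistakenly selected, 87 correctly rejected, 3 mistakenly rejected); the function name `confusion_metrics` is an illustrative choice.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of good candidates that were selected
    specificity = tn / (tn + fp)   # share of poor candidates that were rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    sens, spec, acc = confusion_metrics(tp=7, fp=3, tn=87, fn=3)
    print(f"{sens:.1%} {spec:.2%} {acc:.1%}")  # prints "70.0% 96.67% 94.0%"
```

Because poor candidates outnumber good ones nine to one, accuracy is dominated by the true negatives. Calling `confusion_metrics(tp=0, fp=0, tn=90, fn=10)` for a process that rejects everyone still gives 90% accuracy with 0% sensitivity.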
Notes:
For the example above, consider various cases:
Case              Sensitivity   Specificity   Accuracy
Reject All        0%            100%          90%
Select All Bad    0%