History
During the development of OSCAR, the need arose for a program to convert identified chemical names into connection tables. As no open source effort then offered broad coverage of organic nomenclature, Peter Murray-Rust and Joe Townsend began work on such a program. This work was continued by Peter Corbett, culminating in the creation of a system broadly similar to the current incarnation (Corbett and Murray-Rust 2006). In 2008 Daniel Lowe took over development of the project as part of his PhD. During this time the range of supported nomenclature was expanded substantially, and the parser was improved to cope with the complexity of the grammar of chemical names (Lowe et al. 2011). A comprehensive description of OPSIN, its algorithms and its performance (as of mid-2012) is given in his PhD thesis (Lowe 2012). Development of OPSIN is ongoing, with more nomenclature continuing to be added (Lowe et al. 2013).
Examples of Supported Nomenclature
Nomenclature | Examples |
---|---|
Alk/ane/ene/yne | hexane<br>hex-1-ene<br>hex-1-yne |
Heteroatom chains | tetrasilane<br>tetrasiloxane |
Cyclised chains | cyclohexane<br>cyclotriborazane |
Trivial acids and derivatives | maleic acid<br>maleamic acid<br>maleamide<br>maleimide |
Hantzsch-Widman rings | 1,3-oxazole |
Spiro compounds | spiro[4.5]decane<br>pentaspiro[2.0.2<sup>4</sup>.1.1.2<sup>10</sup>.0.2<sup>13</sup>.1<sup>8</sup>.2<sup>3</sup>]octadecane<br>1H,1'H-2,2'-spirobi[naphthalene]<br>2λ<sup>6</sup>,2',2''-spiroter[[1,3,2]benzodioxathiole]<br>1'H,1''H,2H,8'H-1,2':7',2''-dispiroter[naphthalen]-1'-one<br>spiro[1,2-benzodithiole-3,2'-[1,3]benzodithiole] |
von Baeyer systems | pentacyclo[13.7.4.3<sup>3,8</sup>.0<sup>18,20</sup>.1<sup>13,28</sup>]triacontane |
Hydro/dehydro | 2,3-dihydropyridine<br>1,2-didehydrobenzene |
Indicated hydrogen | 1H-benzoimidazole<br>phosphinin-2(1H)-one |
Heteroatom replacement | 3-aza-pentane<br>3-azonia-pentane<br>3-azanylia-pentane<br>3-azanida-pentane<br>3-azanuida-pentane |
Specification of charge: ium/ide/ylium/uide | azanium<br>boranuide |
Multiplicative nomenclature | ethylenediaminetetraacetic acid<br>3,3'-[ethane-1,2-diylbis(oxy)]bis{4-[2-(furan-3-yloxy)ethoxy]furan} |
Conjunctive nomenclature | 1,3,5-benzenetriacetic acid |
Fused ring systems | imidazo[4,5-d]pyridine<br>phenothiazino[3',4':5,6][1,4]oxazino[2,3-i]benzo[5,6][1,4]thiazino[3,2-c]phenoxazine<br>phenanthro[4,5-bcd:1,2-c']difuran |
Simple bridges | 2,3-methanoindene<br>3,4-methylenedioxy-β-methoxyphenethylamine<br>3,4-epoxy-3,4-dihydrophenanthrene |
Ring assemblies | biphenyl<br>2,2':6',2''-terpyridine |
Prefix functional replacement | peroxybenzoic acid |
Infix functional replacement | benzoperoxoic acid |
Lambda convention | λ<sup>5</sup>-phosphane |
Perhalogenation | perchloro-3,4-dimethylenecyclobutene |
Radicofunctional nomenclature | acetals, acids, alcohols, amides, anhydrides, anilides, azetidides, azides, bromides, chlorides, cyanates, cyanides, esters, di/tri/tetra esters, ethers, fluorides, fulminates, glycol ethers, glycols, hemiacetals, hemiketals, hydrazides, hydrazones, hydrides, hydroperoxides, hydroxides, imides, iodides, isocyanates, isocyanides, isoselenocyanates, isothiocyanates, ketals, ketones, lactams, lactims, lactones, mercaptans, morpholides, oxides, oximes, peroxides, piperazides, piperidides, pyrrolidides, selenides, selenocyanates, selenoketones, selenols, selenosemicarbazones, selenones, selenoxides, selones, semicarbazones, sulfides, sulfones, sulfoxides, sultams, sultims, sultines, sultones, tellurides, telluroketones, tellurones, tellurosemicarbazones, telluroxides, thiocyanates, thioketones, thiols and thiosemicarbazones |
Amino acids and derivatives | glycinol<br>L-2-aminobutyric acid<br>L-alanyl-L-glutaminyl-L-arginyl-O-phosphono-L-seryl-L-alanyl-L-proline |
Nucleosides, nucleotides and their esters | adenosine 5'-(tetrahydrogen triphosphate) |
Steroids including alpha/beta stereochemistry | (3β)-cholest-5-en-3-ol |
Open-chain monosaccharides | 4-amino-4,6-dideoxy-3-C-methyl-2-O-methyl-L-mannose |
Cyclised monosaccharides and glycosides | β-D-ribofuranose<br>β-D-fructofuranosyl α-D-glucopyranosyl-(1->4)-α-D-glucopyranoside |
Deoxy and anhydro | 5-acetamido-2,7-anhydro-3,5-dideoxy-D-glycero-α-D-galacto-non-2-ulopyranosonic acid |
Basic inorganic support | aluminium(3+) chloride<br>mercury(II) chloride |
Isotope specification | (<sup>2</sup>H<sub>6</sub>)propan-2-one<br>acetone-d6<br>hexadeuterioacetone |
R/S stereochemistry | (1R,3S)-3-amino-3-methylcyclohexanecarboxylic acid |
E/Z stereochemistry | (2Z)-but-2-ene |
cis/trans indicating relative stereochemistry on rings | cis-1,4-dimethylcyclohexane |
Structure-based polymer names | poly(2,2'-diamino-5-hexadecylbiphenyl-3,3'-diyl) |
References
[1] Peter Corbett and Peter Murray-Rust. High-Throughput Identification of Chemistry in Life Science Texts. Lecture Notes in Computer Science. 2006, 4216, 107-118. DOI: 10.1007/11875741_11
[2] Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, Robert C. Glen. Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling. 2011, 51 (3), 739-753. DOI: 10.1021/ci100384d
[3] Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012. Available from Apollo.
[4] Daniel M. Lowe, Peter Murray-Rust, Robert C. Glen. OPSIN: Taming the Jungle of IUPAC Chemical Nomenclature. 6th Joint Sheffield Conference on Chemoinformatics, 2013.
Libraries
OPSIN utilises dk.brics.automaton to provide a deterministic finite state automaton for parsing chemical names. Woodstox is used as the XML framework for reading resource files and writing CML (Chemical Markup Language). JNI-InChI is used to generate InChIs. JUnit and Mockito are used for testing. The web interface is powered by Restlet, with the Indigo toolkit used for 2D coordinate generation and depiction.
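As a rough illustration of why a finite state automaton suits chemical names (which are largely concatenations of known fragments), here is a minimal, self-contained Java sketch that hand-builds a tiny deterministic automaton over two assumed fragment lists. The class `ToyNameDfa` and its stem and suffix lists are hypothetical and invented for this example; the sketch does not use dk.brics.automaton, and it is not OPSIN's grammar, which also has to handle locants, brackets, stereodescriptors and much more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy deterministic finite automaton (DFA) recognising names of the form
// <chain-length stem><saturation suffix>, e.g. "hexane" or "propyne".
// Hypothetical illustration: not OPSIN's grammar and not the dk.brics.automaton API.
public class ToyNameDfa {
    // transitions.get(s) maps an input character to the next state from state s
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final int start;
    private final int accept;

    public ToyNameDfa(List<String> stems, List<String> suffixes) {
        start = newState();
        int afterStem = newState();
        accept = newState();
        for (String stem : stems) {
            addToken(start, stem, afterStem);
        }
        for (String suffix : suffixes) {
            addToken(afterStem, suffix, accept);
        }
    }

    private int newState() {
        transitions.add(new HashMap<>());
        return transitions.size() - 1;
    }

    // Spells 'token' as a path of transitions from state 'from' to state 'to',
    // reusing states for shared prefixes (e.g. "prop" and "pent") so every
    // state has at most one successor per character.
    private void addToken(int from, String token, int to) {
        int state = from;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean last = i == token.length() - 1;
            Integer next = transitions.get(state).get(c);
            if (next == null) {
                next = last ? to : newState();
                transitions.get(state).put(c, next);
            } else if ((next == to) != last) {
                // One token is a prefix of another; this toy builder does not handle that.
                throw new IllegalArgumentException("overlapping token: " + token);
            }
            state = next;
        }
    }

    // Runs the DFA: one table lookup per character, no backtracking.
    public boolean accepts(String name) {
        int state = start;
        for (int i = 0; i < name.length(); i++) {
            Integer next = transitions.get(state).get(name.charAt(i));
            if (next == null) {
                return false; // no transition for this character: reject
            }
            state = next;
        }
        return state == accept;
    }

    public static void main(String[] args) {
        ToyNameDfa dfa = new ToyNameDfa(
                List.of("meth", "eth", "prop", "but", "pent", "hex"),
                List.of("ane", "ene", "yne"));
        for (String name : List.of("hexane", "propyne", "ethene", "hexanol", "cyclohexane")) {
            System.out.println(name + " -> " + dfa.accepts(name));
        }
    }
}
```

Running `main` prints `hexane -> true`, `propyne -> true`, `ethene -> true`, `hexanol -> false` and `cyclohexane -> false`. Each character costs a single map lookup, so recognition time is linear in the length of the name. The toy does not check whether a stem/suffix combination is chemically meaningful, which a real parser would have to do.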
OPSIN's developers use YourKit to profile and optimise code.
YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.
License and Warranty
OPSIN is licensed under the MIT License.
OPSIN is made available in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.