History
During the development of OSCAR, the need arose for a program to convert identified chemical names to connection tables. As no open source effort offered broad coverage of organic nomenclature, Peter Murray-Rust and Joe Townsend started work on such a program. Peter Corbett continued this work, culminating in a system broadly similar to the current incarnation (Corbett and Murray-Rust 2006). In 2008 Daniel Lowe took over development of the project as part of his PhD, during which time the range of supported nomenclature has been expanded substantially and the parser has been improved to cope with the complexity of the grammar of chemical names (Lowe et al. 2011).
Examples of Supported Nomenclature
| Nomenclature | Examples |
|---|---|
| Alk/ane/ene/yne | hexane hex-1-ene hex-1-yne |
| Heteroatom chains | tetrasilane tetrasiloxane |
| Cyclised chains | cyclohexane cyclotriborazane |
| Trivial acids and derivatives | maleic acid maleamic acid maleamide maleimide |
| Hantzsch-Widman rings | 1,3-oxazole |
| Spiro compounds | spiro[4.5]decane pentaspiro[2.0.24.1.1.210.0.213.18.23]octadecane 1H,1'H-2,2'-spirobi[naphthalene] 2λ6,2',2''-spiroter[[1,3,2]benzodioxathiole] 1'H,1''H,2H,8'H-1,2':7',2''-dispiroter[naphthalen]-1'-one spiro[1,2-benzodithiole-3,2'-[1,3]benzodithiole] |
| von Baeyer systems | pentacyclo[13.7.4.33,8.018,20.113,28]triacontane |
| Hydro/dehydro | 2,3-dihydropyridine 1,2-didehydrobenzene |
| Indicated hydrogen | 1H-benzoimidazole phosphinin-2(1H)-one |
| Heteroatom replacement | 3-aza-pentane 3-azonia-pentane 3-azanylia-pentane 3-azanida-pentane 3-azanuida-pentane |
| Specification of charge: ium/ide/ylium/uide | azanium boranuide |
| Multiplicative nomenclature | ethylenediaminetetraacetic acid 3,3'-[ethane-1,2-diylbis(oxy)]bis{4-[2-(furan-3-yloxy)ethoxy]furan} |
| Conjunctive nomenclature | 1,3,5-benzenetriacetic acid |
| Fused ring systems | imidazo[4,5-d]pyridine phenothiazino[3',4':5,6][1,4]oxazino[2,3-i]benzo[5,6][1,4]thiazino[3,2-c]phenoxazine phenanthro[4,5-bcd:1,2-c']difuran |
| Simple bridges | 2,3-methanoindene 3,4-methylenedioxy-β-methoxyphenethylamine 3,4-epoxy-3,4-dihydrophenanthrene |
| Ring assemblies | biphenyl 2,2':6',2''-terpyridine |
| Prefix functional replacement | peroxybenzoic acid |
| Infix functional replacement | benzoperoxoic acid |
| Lambda convention | λ5-phosphane |
| Radicofunctional nomenclature | acids, acetals, alcohols, amides, anhydrides, azides, bromides, chlorides, cyanates, cyanides, esters, di/tri/tetra esters, ethers, fluorides, fulminates, glycols, glycol ethers, hemiacetals, hemiketals, hydrazones, hydroperoxides, hydrazides, imides, iodides, isocyanates, isocyanides, isoselenocyanates, isothiocyanates, ketals, ketones, lactams, lactims, lactones, selenocyanates, thiocyanates, selenols, thiols, mercaptans, oxides, oximes, peroxides, selenides, selenones, selenoxides, selones, selenoketones, selenosemicarbazones, semicarbazones, sulfides, sulfones, sulfoxides, sultams, sultims, sultines, sultones, tellurides, telluroketones, tellurosemicarbazones, tellurones, telluroxides, thioketones and thiosemicarbazones |
| Amino acids and derivatives | glycinol L-alanyl-L-glutaminyl-L-arginyl-O-phosphono-L-seryl-L-alanyl-L-proline |
| Nucleosides, nucleotides and their esters | adenosine 5'-(tetrahydrogen triphosphate) |
| Steroids including alpha/beta stereochemistry | (3β)-cholest-5-en-3-ol |
| Open-chain monosaccharides | 4-amino-4,6-dideoxy-3-C-methyl-2-O-methyl-L-mannose |
| Simple cyclised monosaccharides | β-D-ribofuranose |
| Basic inorganic support | aluminium(3+) chloride mercury(II) chloride |
| R/S stereochemistry | (1R,3S)-3-amino-3-methylcyclohexanecarboxylic acid |
| E/Z stereochemistry | (2Z)-but-2-ene |
| cis/trans indicating relative stereochemistry on rings | cis-1,4-dimethylcyclohexane |
| Structure-based polymer names | poly(2,2'-diamino-5-hexadecylbiphenyl-3,3'-diyl) |
References
[1] Peter Corbett and Peter Murray-Rust. High-Throughput Identification of Chemistry in Life Science Texts. Lecture Notes in Computer Science. 2006, 4216, 107-118. DOI: 10.1007/11875741_11
[2] Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, Robert C. Glen. Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling. 2011, 51 (3), 739-753. DOI: 10.1021/ci100384d
Libraries
OPSIN utilises dk.brics.automaton to provide the deterministic finite state automaton used to parse chemical names. XOM serves as the XML framework for reading in resource files and for holding information about how a name was parsed. JNI-InChI is used to generate InChIs. JUnit and Mockito are employed for testing. The web interface is powered by Restlet, with the Indigo toolkit used for 2D coordinate generation and depiction.
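To give a feel for what parsing names with a finite state automaton involves, here is a minimal sketch using only the Java standard library. It is not OPSIN's grammar and does not use the dk.brics.automaton API: the class `ToyNameAutomaton`, its methods and the tiny token set (six stems, an optional numeric locant, three suffixes) are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy automaton; OPSIN's real grammar is far larger.
class ToyNameAutomaton {
    // transitions.get(state) maps an input character to the next state.
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final int start = newState();
    private final int accept = newState();

    ToyNameAutomaton() {
        int afterStem = newState();
        int locantStart = newState();
        int inLocant = newState();
        int afterLocant = newState();
        for (String stem : new String[] {"meth", "eth", "prop", "but", "pent", "hex"}) {
            addLiteral(start, stem, afterStem);
        }
        // Unlocanted names: stem followed directly by a suffix, e.g. "hexane".
        for (String suffix : new String[] {"ane", "ene", "yne"}) {
            addLiteral(afterStem, suffix, accept);
        }
        // Locanted names: stem, "-", digits, "-", then "ene" or "yne".
        addLiteral(afterStem, "-", locantStart);
        for (char d = '0'; d <= '9'; d++) {
            if (d != '0') {
                transitions.get(locantStart).put(d, inLocant); // no leading zero
            }
            transitions.get(inLocant).put(d, inLocant);
        }
        addLiteral(inLocant, "-", afterLocant);
        for (String suffix : new String[] {"ene", "yne"}) {
            addLiteral(afterLocant, suffix, accept);
        }
    }

    private int newState() {
        transitions.add(new HashMap<>());
        return transitions.size() - 1;
    }

    // Adds a character path for `literal` from `from` to `to`, sharing any
    // existing prefix. Assumes no literal is a prefix of another literal
    // added from the same state, which holds for the toy token set above.
    private void addLiteral(int from, String literal, int to) {
        int state = from;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            boolean last = (i == literal.length() - 1);
            Integer next = transitions.get(state).get(c);
            if (next == null) {
                next = last ? to : newState();
                transitions.get(state).put(c, next);
            }
            state = next;
        }
    }

    // Runs the automaton over the name one character at a time.
    boolean accepts(String name) {
        int state = start;
        for (char c : name.toCharArray()) {
            Integer next = transitions.get(state).get(c);
            if (next == null) {
                return false; // no transition: the name is not in the toy language
            }
            state = next;
        }
        return state == accept;
    }

    public static void main(String[] args) {
        ToyNameAutomaton automaton = new ToyNameAutomaton();
        for (String name : new String[] {"hexane", "hex-1-ene", "hex-ene", "cyclohexane"}) {
            System.out.println(name + " -> " + automaton.accepts(name));
        }
    }
}
```

Each character consumes exactly one transition, so recognition takes time linear in the length of the name. A real name parser additionally has to record which tokens were matched, which this sketch leaves out.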
License and Warranty
OPSIN is licensed under the Artistic License v2.0.
OPSIN is made available in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

