Simplified molecular-input line-entry system facts for kids

Quick facts for kids: SMILES
Filename extension: .smi
Internet media type: chemical/x-daylight-smiles
Type of format: chemical file format

SMILES generation algorithm for ciprofloxacin: break cycles, then write as branches off a main backbone

The Simplified Molecular-Input Line-Entry System (SMILES) is a special way to write down the structure of chemicals. It uses short text codes to describe molecules. Think of it like a secret code for chemicals!

These SMILES codes can be read by most chemistry drawing programs (called molecule editors). The programs then turn the codes back into 2D pictures or 3D models of the molecules.

The first SMILES system was created in the 1980s. It has been updated and made better since then. In 2007, a new version called OpenSMILES was made. This version is open for everyone to use and improve.

What is SMILES?

SMILES is a way to describe how atoms are connected in a molecule. It uses a single line of text. This makes it easy for computers to understand and store chemical information.

It's like writing a recipe for a molecule. Each letter and symbol tells you something important.

How did SMILES start?

The first SMILES system was started by David Weininger in the 1980s. He worked at a lab in Duluth, Minnesota. The Environmental Protection Agency (EPA) helped pay for this early work.

Other groups, like Daylight Chemical Information Systems, later improved SMILES. In 2007, the Blue Obelisk group created "OpenSMILES." This version is free for anyone to use and change.

Other ways to write chemical structures in a line exist. These include WLN, ROSDAL, and SLN.

In 2006, IUPAC introduced another system called InChI. It is a standard way to write chemical structures. SMILES is often easier for people to read than InChI. It also works with many computer programs.

Understanding SMILES Terms

The word SMILES can mean the system itself. It can also mean a single line of code for a molecule. Usually, you can tell what it means from how it's used.

Terms like "canonical" and "isomeric" can be a bit confusing. They describe different features of SMILES codes. They are not opposites.

What is Canonical SMILES?

Many different SMILES codes can describe the same molecule. For example, CCO, OCC, and C(O)C all mean ethanol.

To make things simpler, special computer rules were made. These rules create only one unique SMILES code for each molecule. This unique code is called the canonical SMILES.

These rules first turn the SMILES into a computer picture of the molecule. Then, another rule creates a unique SMILES code from that picture. Canonical SMILES helps organize molecules in databases. It makes sure each molecule has only one entry.
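The idea that different codes can describe the same molecule can be tried out with a small sketch. The Python below is only an illustration, not a real canonical SMILES program. The names parse_simple and fingerprint are made up for this example, and it only understands plain atoms, single bonds, and branches. The "fingerprint" is a rough summary that cannot tell every pair of molecules apart. Real programs use much more careful rules.

```python
import re

# Assumption: only unbracketed atoms, implied single bonds and branches.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")

def parse_simple(smiles):
    """Turn a very simple SMILES code into (atoms, bonds)."""
    atoms, bonds, stack = [], [], []
    prev = None  # index of the atom the next atom will bond to
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)      # remember the branch point
            i += 1
        elif ch == ")":
            prev = stack.pop()      # go back to the branch point
            i += 1
        else:
            m = ATOM.match(smiles, i)
            if m is None:
                raise ValueError(f"unsupported character {ch!r}")
            atoms.append(m.group())
            if prev is not None:
                bonds.append((prev, len(atoms) - 1))
            prev = len(atoms) - 1
            i = m.end()
    return atoms, bonds

def fingerprint(smiles):
    """Order-independent summary: every atom with its sorted neighbours."""
    atoms, bonds = parse_simple(smiles)
    near = [[] for _ in atoms]
    for a, b in bonds:
        near[a].append(atoms[b])
        near[b].append(atoms[a])
    return sorted((atoms[k], tuple(sorted(near[k]))) for k in range(len(atoms)))

# The three ethanol codes give the same summary.
print(fingerprint("CCO") == fingerprint("OCC") == fingerprint("C(O)C"))  # True
```

Dimethyl ether (COC) has the same atoms as ethanol but a different summary, because its atoms are connected differently.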

What is Isomeric SMILES?

SMILES can also show how atoms are arranged in 3D space. This includes things like how groups are placed around a central atom. It also shows the shape of double bonds.

These details cannot be shown by just listing connections. SMILES codes that include this 3D information are called isomeric SMILES. They help tell apart different forms of the same molecule.

How SMILES Works

SMILES uses a few simple rules to build its codes.

Atoms in SMILES

Atoms are shown by their usual element symbols. For example, [Au] means gold.

You don't need square brackets for the common atoms B, C, N, O, P, S, F, Cl, Br, and I, as long as they:

  • have no charge,
  • have the normal number of hydrogen atoms attached,
  • are the most common isotope, and
  • are not chiral centers (special 3D points).

All other elements need brackets. Their charges and hydrogens must be shown. For example, water can be O or [OH2]. Hydrogen can also be written separately, like [H]O[H] for water.

If brackets are used, H is added if hydrogens are present. The number of hydrogens follows if more than one. A + means a positive charge, and - means a negative charge. For example, [NH4+] is ammonium. For more than one charge, you can use a number or repeat the sign. So, titanium(IV) can be [Ti+4] or [Ti++++].
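The bracket rules above can be sketched as a small Python reader. This is a simplified illustration, not a full SMILES parser. The name parse_bracket_atom and the dictionary keys are made up for this example, and it only covers an optional isotope number, the element symbol, an optional @ or @@, hydrogens, and charge.

```python
import re

# Assumption: covers only the parts of a bracket atom described in this article.
BRACKET = re.compile(
    r"\[(?P<isotope>\d+)?"            # optional mass number, e.g. 14
    r"(?P<symbol>[A-Z][a-z]?|[a-z])"  # element symbol, e.g. N, Ti, or aromatic n
    r"(?P<chiral>@@?)?"               # optional @ or @@
    r"(?:H(?P<hcount>\d*))?"          # optional H, then how many
    r"(?P<charge>\+\d+|-\d+|\++|-+)?\]"
)

def parse_charge(text):
    """'+', '++++', '+4' and '-' style charges all become one number."""
    if not text:
        return 0
    sign = 1 if text[0] == "+" else -1
    digits = text[1:]
    if digits.isdigit():
        return sign * int(digits)   # a sign followed by a number
    return sign * len(text)         # a repeated sign

def parse_bracket_atom(text):
    m = BRACKET.fullmatch(text)
    if m is None:
        raise ValueError(f"not a bracket atom this sketch understands: {text!r}")
    h = m.group("hcount")
    return {
        "isotope": int(m.group("isotope")) if m.group("isotope") else None,
        "symbol": m.group("symbol"),
        "hydrogens": 0 if h is None else int(h or 1),
        "charge": parse_charge(m.group("charge")),
    }

print(parse_bracket_atom("[NH4+]"))
print(parse_bracket_atom("[Ti+4]")["charge"] == parse_bracket_atom("[Ti++++]")["charge"])  # True
```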

Bonds in SMILES

Bonds connect atoms. They are shown using symbols like . - = # $ : / \.

  • Single bonds between simple atoms are usually just implied by placing atoms next to each other. For example, ethanol is usually CCO. You can write C-C-O, but it's not needed.
  • Double bonds use =, triple bonds use #, and quadruple bonds use $. For example, carbon dioxide (CO2) is O=C=O. Hydrogen cyanide (HCN) is C#N.
  • A . means two parts are not bonded. For example, sodium chloride in water is [Na+].[Cl-].
  • Aromatic bonds (special bonds in rings) can use :.
  • Single bonds next to double bonds can use / or \. These show the 3D shape (stereochemistry).
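As a rough illustration of the bond symbols, the Python sketch below lists the written-out bond orders in a SMILES code and splits a code at the dots. The names BOND_ORDER, explicit_bond_orders, and split_parts are made up for this example. It skips anything inside square brackets, because a - there means a charge, not a bond. It does not count implied single bonds or the :, / and \ symbols.

```python
# Bond symbols from the list above and the bond order each one stands for.
BOND_ORDER = {"-": 1, "=": 2, "#": 3, "$": 4}

def explicit_bond_orders(smiles):
    """Return the order of every bond symbol written outside brackets."""
    orders = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_ORDER:
            orders.append(BOND_ORDER[ch])
    return orders

def split_parts(smiles):
    """A dot separates parts that are not bonded to each other."""
    return smiles.split(".")

print(explicit_bond_orders("O=C=O"))   # [2, 2]
print(explicit_bond_orders("C#N"))     # [3]
print(split_parts("[Na+].[Cl-]"))      # ['[Na+]', '[Cl-]']
```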

Rings in SMILES

Ring structures are shown by breaking the ring at some point. Then, numbers are added to show where the ring connects.

For example, cyclohexane is C1CCCCC1. The 1s show where the ring closes. For a second ring, you use 2. Decalin can be C1CCCC2C1CCCC2.

You can use any numbers for rings. You can even reuse numbers after a ring is closed. For example, bicyclohexyl is usually C1CCCCC1C2CCCCC2.

If a ring needs a two-digit number, use % before it. So, C%12 means ring number 12. You can also add a bond type (like =) before the ring number. For example, cyclopropene is usually C1=CC1, but if the double bond is the one that closes the ring, it can be written C=1CC1.
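The ring numbers can be followed with a short Python sketch. It is an illustration only: the name ring_closures is made up, atoms are simply counted from 0 in the order they are written, and bond symbols and parentheses are ignored. It reports which two atoms each ring number joins, and it handles reused numbers and the % form.

```python
import re

# One token is a bracket atom, a plain atom, a ring number, or any other symbol.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|.")

def ring_closures(smiles):
    """Return (opening atom, closing atom) pairs, one per ring number used."""
    open_rings = {}   # ring number -> index of the atom that opened it
    pairs = []
    atom_index = -1
    for tok in TOKEN.findall(smiles):
        if tok[0] == "%" or tok.isdigit():
            number = int(tok.lstrip("%"))
            if number in open_rings:
                # second time we see the number: the ring closes here
                pairs.append((open_rings.pop(number), atom_index))
            else:
                open_rings[number] = atom_index
        elif tok[0] == "[" or tok[0].isalpha():
            atom_index += 1
    if open_rings:
        raise ValueError(f"ring number(s) never closed: {sorted(open_rings)}")
    return pairs

print(ring_closures("C1CCCCC1"))          # [(0, 5)]
print(ring_closures("C1CCCC2C1CCCC2"))    # [(0, 5), (4, 9)]
print(ring_closures("C1CCCCC1C1CCCCC1"))  # [(0, 5), (6, 11)]
```

In the last line the number 1 is used twice, which works because the first ring is already closed when the second one opens.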

Aromaticity in SMILES

Aromatic rings, like benzene, can be written in a few ways:

  • Using alternating single and double bonds, like C1=CC=CC=C1.
  • Using the aromatic bond symbol :, like C:1:C:C:C:C:C1.
  • Most commonly, by using lowercase letters for the atoms (b, c, n, o, p, s).

When using lowercase letters, bonds between aromatic atoms are assumed to be aromatic. So, benzene is c1ccccc1. Pyridine is n1ccccc1.

If an aromatic nitrogen has a hydrogen, it's written as [nH]. So, imidazole is n1c[nH]cc1.

If aromatic atoms are connected by a single bond, like in biphenyl, the single bond must be shown: c1ccccc1-c2ccccc2.

Visualization of 3-cyanoanisole as COc(c1)cccc1C#N.

Branching in SMILES

Branches are parts of a molecule that stick off the main chain. They are shown with parentheses (). For example, propionic acid is CCC(=O)O. Fluoroform is FC(F)F.

The first atom inside the parentheses and the first atom after the parentheses are both connected to the same branch point. If the branch starts with a bond symbol, that symbol goes inside the parentheses, as in C(=O).

You can write branches in any order. For example, bromochlorodifluoromethane can be FC(Br)(Cl)F or BrC(F)(F)Cl. It's usually easiest to read if the simpler branch comes first.

Ring-closing bonds do not need parentheses. This can make SMILES codes shorter. For example, toluene is usually Cc1ccccc1, not c1cc(C)ccc1.
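One common way to read branches is to keep a stack of branch points: push when a ( opens and pop when a ) closes. The Python sketch below shows that idea under simple assumptions. The names bonds_with_branches and neighbours are made up for this example, and only plain atoms, the bond symbols - = #, and parentheses are handled (no rings or brackets).

```python
import re

ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")
ORDER = {"-": 1, "=": 2, "#": 3}

def bonds_with_branches(smiles):
    """Return (atoms, bonds); each bond is (atom index, atom index, order)."""
    atoms, bonds, stack = [], [], []
    prev, order = None, 1
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)       # branch starts: remember the branch point
            i += 1
        elif ch == ")":
            prev = stack.pop()       # branch ends: return to the branch point
            i += 1
        elif ch in ORDER:
            order = ORDER[ch]        # applies to the next bond only
            i += 1
        else:
            m = ATOM.match(smiles, i)
            if m is None:
                raise ValueError(f"unsupported character {ch!r}")
            atoms.append(m.group())
            new = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, new, order))
            prev, order = new, 1
            i = m.end()
    return atoms, bonds

def neighbours(smiles, symbol="C"):
    """Sorted neighbours of the first atom with the given symbol."""
    atoms, bonds = bonds_with_branches(smiles)
    centre = atoms.index(symbol)
    return sorted(atoms[b] if a == centre else atoms[a]
                  for a, b, _ in bonds if centre in (a, b))

print(bonds_with_branches("CCC(=O)O")[1])
# [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)]
print(neighbours("FC(Br)(Cl)F") == neighbours("BrC(F)(F)Cl"))  # True
```

In propionic acid, atom 2 is the branch point: it is bonded to the O inside the parentheses (a double bond) and to the O after them (a single bond).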

Stereochemistry in SMILES

trans-1,2-difluoroethylene

SMILES can show the 3D arrangement of atoms. This is called stereochemistry.

For double bonds, / and \ show the direction of single bonds next to the double bond. For example, F/C=C/F shows trans-1,2-difluoroethylene. Here, the fluorine atoms are on opposite sides of the double bond. F/C=C\F shows cis-1,2-difluoroethylene. Here, the fluorines are on the same side.

These symbols always come in groups of at least two. The first symbol can be anything. So, F\C=C\F is the same as F/C=C/F.
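For codes laid out like the examples above, with one group written before the first carbon and one after the second, the rule can be sketched in a few lines of Python. The name geometry is made up for this example, and the sketch assumes exactly that simple X/C=C/Y layout. Other layouts, such as branches on the double-bond carbons, need more careful handling.

```python
import re

# Finds the slash before and the slash after a plain C=C.
STEREO = re.compile(r"([/\\])C=C([/\\])")

def geometry(smiles):
    """Return 'trans', 'cis', or 'unspecified' for a simple X/C=C/Y code."""
    m = STEREO.search(smiles)
    if m is None:
        return "unspecified"
    # Same symbol on both sides means opposite sides of the double bond.
    return "trans" if m.group(1) == m.group(2) else "cis"

print(geometry(r"F/C=C/F"))   # trans
print(geometry(r"F\C=C\F"))   # trans
print(geometry(r"F/C=C\F"))   # cis
```

The first two lines give the same answer because only the relationship between the two symbols matters, not which one comes first.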

L-Alanine

For tetrahedral carbon atoms (atoms with four different groups attached), @ or @@ are used. Imagine looking at the central carbon from the first bond. The other three groups can be arranged clockwise or counter-clockwise.

  • @@ means clockwise.
  • @ means counter-clockwise.

For example, the amino acid alanine can be N[CH](C)C(=O)O. The common form, L-Alanine, is N[C@@H](C)C(=O)O. If you look from the nitrogen-carbon bond, the hydrogen, methyl, and carboxylate groups appear clockwise.

The order of branches matters for stereochemistry. If you swap two groups, you must change the @ or @@ symbol.
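The swap rule can be checked with a small Python sketch. The name reorder_chirality and the group labels are made up for this example. It assumes the four groups have different labels. It counts how many pairs of groups have traded places: an odd count flips the symbol and an even count keeps it.

```python
def reorder_chirality(symbol, old_order, new_order):
    """Return '@' or '@@' after writing the same groups in new_order."""
    if sorted(old_order) != sorted(new_order):
        raise ValueError("both orders must list the same groups")
    positions = [old_order.index(group) for group in new_order]
    # Count pairs that are now out of order (the parity of the reordering).
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    if inversions % 2 == 0:
        return symbol
    return "@" if symbol == "@@" else "@@"

# L-Alanine as N[C@@H](C)C(=O)O lists the groups in this order:
written = ["N", "H", "CH3", "COOH"]

# Swapping the last two groups flips the symbol,
# which matches writing the molecule as N[C@H](C(=O)O)C.
print(reorder_chirality("@@", written, ["N", "H", "COOH", "CH3"]))  # @
```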

Isotopes in SMILES

Isotopes are atoms of the same element with different numbers of neutrons. In SMILES, they are shown with a number before the atom symbol. This number is the mass of the isotope. For example, carbon-14 in benzene is [14c]1ccccc1. Deuterochloroform is [2H]C(Cl)(Cl)Cl.

SMILES Examples

Molecule | Structure | SMILES formula
Dinitrogen | N≡N | N#N
Methyl isocyanate (MIC) | CH3−N=C=O | CN=C=O
Copper(II) sulfate | Cu²⁺ SO₄²⁻ | [Cu+2].[O-]S(=O)(=O)[O-]
Vanillin | (picture: molecular structure of vanillin) | O=Cc1ccc(O)c(OC)c1 or COc1cc(C=O)ccc1O
Melatonin (C13H16N2O2) | (picture: molecular structure of melatonin) | CC(=O)NCCC1=CNc2c1cc(OC)cc2 or CC(=O)NCCc1c[nH]c2ccc(OC)cc12
Nicotine (C10H14N2) | (picture: molecular structure of nicotine) | CN1CCC[C@H]1c2cccnc2
Glucose (β-D-glucopyranose) (C6H12O6) | (picture: molecular structure of glucopyranose) | OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1

Some molecules need ring numbers above 9. One example is cephalostatin-1, a large molecule with 13 rings that was found in a small sea animal from the Indian Ocean:

Molecular structure of cephalostatin-1

Here is its SMILES code:

CC(C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO

Notice the % sign before numbers like %10. This is used for ring numbers greater than 9.

SMILES Extensions

SMARTS

SMARTS is like an advanced version of SMILES. It helps find specific patterns within molecules. It uses many SMILES symbols but also has "wildcard" symbols. These wildcards can stand for any atom or bond. This helps search for parts of molecules in databases.

When you search using SMARTS, the SMILES and SMARTS codes are first turned into computer pictures. Then, the computer looks for matching parts in these pictures.

SMIRKS

SMIRKS is a way to describe chemical reactions. It shows how molecules change during a reaction. The basic way to write it is REACTANT>AGENT>PRODUCT. You can leave parts blank or list many molecules separated by a dot (.).
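The REACTANT>AGENT>PRODUCT layout can be taken apart with a few lines of Python. This is only a sketch of the layout described above, not a SMIRKS reader. The name split_reaction is made up, and the reaction text in the example was written for illustration (acetic acid and ethanol making ethyl acetate and water, with an acid as the agent).

```python
def split_reaction(text):
    """Split 'REACTANT>AGENT>PRODUCT' into three lists of molecule codes."""
    parts = text.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' symbols")
    # Dots separate molecules; a blank part becomes an empty list.
    reactants, agents, products = (
        [code for code in part.split(".") if code] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}

print(split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O"))
# {'reactants': ['CC(=O)O', 'OCC'], 'agents': ['[H+]'], 'products': ['CC(=O)OCC', 'O']}
```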

BigSMILES

SMILES works well for small molecules. But many materials are very large molecules called macromolecules. These are too big to easily write SMILES for. BigSMILES is a new version of SMILES that aims to describe these huge molecules more easily.

Converting SMILES

SMILES codes can be turned back into 2D pictures using special computer rules. Sometimes, this conversion can have a few different results. To get 3D models, computers use methods that find the most stable shape of the molecule.

Many free programs and websites can convert SMILES codes for you.

See also

  • SMILES arbitrary target specification (SMARTS), an extension of SMILES for finding patterns in molecules
  • International Chemical Identifier (InChI), another standard way to describe chemicals
  • Chemistry Development Kit, software for 2D layout and converting SMILES
  • OpenBabel, JOELib, OELib (software for converting chemical formats)