Protein structure prediction facts for kids

Protein structure prediction is like solving a puzzle to figure out the three-dimensional (3D) shape of a protein. Imagine a long chain of beads; these beads are called amino acids. Proteins are made from these chains. The way this chain folds up into a specific 3D shape is super important for what the protein does.

Scientists want to predict this 3D shape just by knowing the order of the amino acids in the chain. This is a big challenge in computational biology. It's important for things like making new medicines or designing new enzymes (which are proteins that speed up chemical reactions).

Since 1994, there's been a special event called CASP (Critical Assessment of Structure Prediction). It happens every two years to check how well different methods predict protein structures. There's also a project called CAMEO3D that continuously checks how well online tools predict protein structures.

Understanding Protein Shapes

Proteins are long chains of amino acids linked together. Think of amino acids as building blocks. The way these chains twist and turn gives proteins their unique 3D shapes. This twisting is possible because some of the bonds along the chain can rotate.

Inside a protein, different parts of the chain can form special patterns. These patterns are called "secondary structures." The main types are alpha helices and beta sheets. In these patterns, parts of the protein chain connect with each other using tiny bonds called hydrogen bonds. This helps the protein fold up efficiently.

Alpha Helices: Spiral Shapes

An alpha-helix with hydrogen bonds (yellow dots)

The alpha-helix is a common spiral shape found in proteins. It's like a coiled spring.

Each turn of the spiral has about 3.6 amino acids.
Hydrogen bonds form between amino acids that are four positions apart, holding the spiral together.
Alpha helices often sit on the surface of proteins. The side of the helix facing inside the protein usually has amino acids that don't like water (hydrophobic). The side facing out, towards water, has amino acids that do like water (hydrophilic).
Certain amino acids, like alanine and leucine, are often found in alpha helices. Others, like proline, can break or bend a helix.
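
The numbers above can be turned into a small drawing aid called a helical wheel. The sketch below is only an illustration: it assumes exactly 3.6 amino acids per turn (so each step turns 100 degrees), and the set of water-avoiding amino acids is a short, hand-picked example rather than an official list.

```python
# Toy helical wheel: which way does each amino acid point around the spiral?
# Assumption: 3.6 amino acids per turn, so each step turns 360 / 3.6 = 100 degrees.
DEGREES_PER_RESIDUE = 100

# Illustrative (incomplete) set of water-avoiding amino acids, one-letter codes.
HYDROPHOBIC = set("AILMFVW")


def helical_wheel(sequence):
    """Return (letter, angle in degrees, is_hydrophobic) for every position."""
    wheel = []
    for i, aa in enumerate(sequence):
        angle = (i * DEGREES_PER_RESIDUE) % 360
        wheel.append((aa, angle, aa in HYDROPHOBIC))
    return wheel


def hbond_pairs(length):
    """Pairs of positions four apart, which hydrogen bonds hold together."""
    return [(i, i + 4) for i in range(length - 4)]


# Example with a made-up sequence (L = leucine, K = lysine, E = glutamate).
for aa, angle, greasy in helical_wheel("LKELLKKLLEELLKKL"):
    side = "avoids water" if greasy else "likes water"
    print(f"{aa} points to {angle:3d} degrees and {side}")
```

If most of the water-avoiding letters land at similar angles, that side of the helix probably faces the inside of the protein.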

Beta Sheets: Flat, Pleated Shapes

Beta sheets are another common protein shape. They are like flat, pleated ribbons.

They form when different parts of the protein chain lie next to each other.
Hydrogen bonds connect these nearby parts of the chain.
The chains can run in the same direction (parallel) or opposite directions (antiparallel).
Beta sheets are harder to predict than alpha helices, because the parts of the chain that pair up can be far apart in the sequence.

Loops: Flexible Connectors

Some parts of a protein don't form neat alpha helices or beta sheets. These parts are often called "loops."

Loops connect alpha helices and beta sheets.
They are usually found on the outside surface of proteins.
Because they are on the surface, changes to the amino acids in loops are often tolerated more easily.
Loops can also be important parts of a protein's "active site," where it does its job.

How Proteins Are Grouped

Proteins can be grouped based on how similar their shapes are or how similar their amino acid sequences are.

Shape Similarity: Scientists compare the 3D arrangements of alpha helices and beta sheets.
Sequence Similarity: This was the first way proteins were grouped. It looks at how similar the order of amino acids is.

Interestingly, two proteins with very different amino acid sequences might still fold into a similar shape. A protein's sequence can also change a lot over time (evolution) while its basic shape stays the same.
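
Sequence similarity is often measured as "percent identity": the share of positions where two lined-up sequences have the same amino acid. Here is a minimal sketch, assuming the two sequences have already been aligned to the same length and that "-" marks a gap. The function name and the example sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions (ignoring gaps) with the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only positions where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Made-up example: two short aligned sequences.
identity = percent_identity("MKTAYIAK", "MKSAYLAK")
print(f"{identity:.1f}% identical")
```

A score like this is what lies behind the "more than 50% identical" rule of thumb for protein families in the term list.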

Important Terms for Protein Groups

Here are some terms used to describe how proteins are related:

Active site: A special spot on a protein where it connects with other molecules to do its job. Proteins with different sequences can have the same active site.
Architecture: How the secondary structures (like helices and sheets) are arranged in a protein's 3D shape.
Fold (topology): A type of architecture where the connecting loops are also similar.
Class: A way to group protein parts (domains) based on what kind of secondary structures they have. For example, mainly-alpha, mainly-beta, or a mix of alpha and beta.
Core: The inner part of a folded protein, usually made of alpha helices and beta sheets. This part is often very stable.
Domain: A section of a protein chain that can fold into its own 3D shape, often doing a specific job. A protein can have several domains.
Family: A group of proteins that are very similar (more than 50% identical) in their amino acid sequence. They usually have similar jobs. If you know the structure of one protein in a family, you can often predict the structure of others.
Motif (sequence): A short, common pattern of amino acids found in different proteins. It often helps with a specific function.
Motif (structural): A small combination of secondary structures that fold into a specific 3D shape, like a helix connected to a loop and then another helix.
Primary structure: Just the simple, linear order of amino acids in a protein chain.
Secondary structure: The local shapes formed by the protein chain, like alpha helices and beta sheets.
Superfamily: A larger group of protein families that are distantly related, meaning they might have a common evolutionary origin but less sequence similarity than a family. They still share some basic structural features.
Tertiary structure: The overall 3D shape of a single protein chain, formed by how its secondary structures fold and pack together.
Quaternary structure: The 3D shape formed when two or more separate protein chains (subunits) come together to work as one big protein.
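
To make the "Class" idea concrete, here is a rough sketch that guesses a class from a string of secondary-structure letters (H for helix, E for beta strand, C for anything else). The 15% cutoff and the function name are assumptions chosen for this example; real classification databases use their own, more careful rules.

```python
def guess_class(secondary_structure, cutoff=0.15):
    """Guess a class from the share of helix (H) and strand (E) letters.

    The cutoff is a hypothetical value picked only for this illustration.
    """
    length = len(secondary_structure)
    if length == 0:
        raise ValueError("need at least one position")
    helix_share = secondary_structure.count("H") / length
    strand_share = secondary_structure.count("E") / length
    if helix_share >= cutoff and strand_share >= cutoff:
        return "mix of alpha and beta"
    if helix_share >= cutoff:
        return "mainly-alpha"
    if strand_share >= cutoff:
        return "mainly-beta"
    return "little secondary structure"


print(guess_class("CCHHHHHHHHCCCHHHHHHHCC"))
print(guess_class("CEEEEECCCEEEEECCEEEEC"))
```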

Predicting Secondary Structure

The amino acids in a protein can be analyzed to predict its secondary, tertiary and quaternary structure.

Predicting secondary structure means trying to figure out which parts of a protein's amino acid sequence will form alpha helices, beta sheets, or turns. Scientists compare these predictions to known protein structures to see how accurate they are.

Early methods in the 1960s and 1970s were at best about 60-65% accurate. Since the 1980s, computer programs using "artificial neural networks" (a type of machine learning) have been used. These programs learn from many known protein structures.

Modern methods are much better, reaching up to 80% accuracy. They often use information from many similar protein sequences at once. This high accuracy helps scientists predict the overall 3D shape of proteins.
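
Accuracy figures like these usually come from a simple count: at how many positions does the predicted letter match the real one? Scientists often call this the three-state accuracy, or Q3. A minimal sketch, assuming each position is labelled H (helix), E (beta strand) or C (anything else), with made-up example strings:

```python
def q3_accuracy(predicted, observed):
    """Percent of positions where the predicted state matches the real one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


predicted = "HHHHCCEEEE"  # what a program guessed (made up)
observed = "HHHCCCEEEC"   # what the real structure shows (made up)
print(q3_accuracy(predicted, observed))  # prints 80.0
```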

How Predictions Got Better

Chou-Fasman method: One of the first methods. It looked at how often each amino acid appeared in different secondary structures. It was about 50-60% accurate.
GOR method: This method was more advanced. It considered not just individual amino acids but also their neighbors. It was about 65% accurate.
Machine Learning: A big leap forward came with using artificial neural networks. These programs learn patterns from huge databases of known protein structures. Programs like PSIPRED and JPRED are based on neural networks and are over 70% accurate.
Support Vector Machines (SVMs): Another machine learning technique that's good at finding turns in proteins.

Scientists are always trying to improve these predictions by adding more information, like how much of a protein is exposed to water.
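
The counting idea behind the Chou-Fasman method can be sketched in a few lines. This is a simplified illustration, not the published method: it learns helix "propensities" from a tiny made-up training set, then labels a position H when the average propensity of it and its neighbours is above 1. The window size and the training data are assumptions for this example.

```python
from collections import Counter


def helix_propensity(sequences, structures):
    """propensity = (share of an amino acid inside helices) / (its overall share)."""
    total, in_helix = Counter(), Counter()
    for seq, states in zip(sequences, structures):
        for aa, state in zip(seq, states):
            total[aa] += 1
            if state == "H":
                in_helix[aa] += 1
    n_all, n_helix = sum(total.values()), sum(in_helix.values())
    if n_helix == 0:
        raise ValueError("training data contains no helix positions")
    return {aa: (in_helix[aa] / n_helix) / (total[aa] / n_all) for aa in total}


def predict_helix(sequence, propensity, window=3):
    """Label each position H or C from the average propensity of a small window."""
    half = window // 2
    labels = []
    for i in range(len(sequence)):
        neighbours = sequence[max(0, i - half): i + half + 1]
        # Amino acids never seen in training get a neutral score of 1.0.
        score = sum(propensity.get(aa, 1.0) for aa in neighbours) / len(neighbours)
        labels.append("H" if score > 1.0 else "C")
    return "".join(labels)


# Tiny made-up training set: sequences with their known states (H or C).
train_sequences = ["AALLGPGP", "ALAGGPLA"]
train_structures = ["HHHHCCCC", "HHHCCCHH"]
propensity = helix_propensity(train_sequences, train_structures)
print(predict_helix("AALGPP", propensity))
```

Looking at neighbours, not just single amino acids, is the step that the GOR method added. Real programs learn from thousands of known structures instead of two short examples.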

Predicting Tertiary Structure (Overall 3D Shape)

Predicting the full 3D shape of a protein is very important today. We have tons of protein sequence data from projects like the Human Genome Project. But figuring out their actual 3D shapes in the lab (using methods like X-ray crystallography) is slow and expensive.

Predicting the 3D shape is super hard because there are so many possible ways a protein chain can fold. It's like trying to find one specific grain of sand on a huge beach!
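
A quick calculation shows how fast the number of possible shapes grows. The sketch below assumes, purely for illustration, that every amino acid can take just 3 different positions and that a computer could test one trillion shapes per second. Both numbers are assumptions, not measurements.

```python
def count_shapes(n_residues, shapes_per_residue=3):
    """Number of possible chain shapes if every residue picks independently."""
    return shapes_per_residue ** n_residues


shapes = count_shapes(100)            # a small protein of 100 amino acids
tests_per_second = 10 ** 12           # assumed speed: one trillion per second
seconds_per_year = 60 * 60 * 24 * 365
years_needed = shapes / tests_per_second / seconds_per_year

print(f"possible shapes: {shapes:.2e}")
print(f"years to test them all: {years_needed:.2e}")
```

Even with these generous assumptions, trying every shape one by one would take far longer than the age of the universe, so prediction programs need clever shortcuts.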

Before Making the Model

Often, a protein is first split into smaller parts called "domains." Each domain is like a mini-protein that folds on its own. Then, the predicted shapes of these individual domains are put together to form the final 3D structure of the whole protein.

Ab initio Protein Modeling: From Scratch

Ab initio (meaning "from the beginning") methods try to build protein models without directly using existing structures. They try to figure out the shape based on the basic rules of physics and chemistry.

These methods need huge amounts of computer power.
Projects like Folding@home use many home computers working together to help with these complex calculations.
Even with supercomputers, it's still very hard to predict the structure of larger proteins from scratch.
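
To get a feeling for "from scratch" folding, here is a toy model that is much simpler than what real ab initio programs do. The chain is laid out on a flat grid, each bead is either H (avoids water) or P (likes water), and the program tries every possible path to find the one where the most H beads touch. This is a sketch of the classic "HP lattice model", used here as a stand-in. It is not how Folding@home or other real tools work.

```python
# Toy "from scratch" folding on a 2D grid (HP lattice model).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def count_contacts(sequence, path):
    """Count pairs of H beads that touch on the grid but are not chain neighbours."""
    index_at = {position: i for i, position in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips direct chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


def best_fold(sequence):
    """Try every non-overlapping path and return (path, number of H-H contacts)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    best_path, best_score = None, -1

    def grow(path, occupied):
        nonlocal best_path, best_score
        if len(path) == len(sequence):
            score = count_contacts(sequence, path)
            if score > best_score:
                best_path, best_score = list(path), score
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # two beads cannot share a grid square
                path.append(nxt)
                occupied.add(nxt)
                grow(path, occupied)
                path.pop()
                occupied.remove(nxt)

    grow([(0, 0)], {(0, 0)})
    return best_path, best_score


path, score = best_fold("HPHPPHHPHH")  # made-up 10-bead chain
print("best number of H-H contacts:", score)
print("grid positions:", path)
```

Because the program checks every path, the work grows very quickly with chain length, which hints at why real proteins need so much computer power.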

Using Evolution to Predict 3D Contacts

Scientists noticed that if two amino acids in a protein are important for its function, they might change together over time (coevolve). If one changes, the other might change to keep the protein stable.

In 2011, a new method called EVfold showed that these coevolved amino acids could actually help predict the protein's 3D shape.
This method needs a lot of similar protein sequences (over 1,000) to work well.
It can even predict the shapes of proteins that cross cell membranes, which are very hard to study in the lab.
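
One simple way to spot positions that change together is a score called mutual information. The sketch below uses it on a tiny made-up alignment. Note the swap: EVfold itself uses a more advanced statistical model to separate real contacts from chance patterns. Mutual information is only a simpler stand-in to show the idea.

```python
from collections import Counter
from math import log2


def mutual_information(column_a, column_b):
    """How strongly the letters in two alignment columns depend on each other (in bits)."""
    n = len(column_a)
    count_a, count_b = Counter(column_a), Counter(column_b)
    count_pairs = Counter(zip(column_a, column_b))
    score = 0.0
    for (a, b), n_ab in count_pairs.items():
        p_ab = n_ab / n
        score += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return score


def top_coevolving_pairs(alignment, how_many=3):
    """Score every pair of columns and return the highest-scoring ones."""
    columns = list(zip(*alignment))
    scores = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            scores.append((mutual_information(columns[i], columns[j]), i, j))
    scores.sort(reverse=True)
    return scores[:how_many]


# Made-up alignment of four related sequences. Columns 1 and 3 change together:
# K always appears with E, and R always appears with D.
alignment = ["AKGEL", "ARGDL", "AKGEV", "ARGDV"]
for score, i, j in top_coevolving_pairs(alignment, how_many=1):
    print(f"columns {i} and {j} change together (score {score:.2f} bits)")
```

With only four sequences a pattern like this could easily be a coincidence, which is one reason the real method needs so many sequences.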

Comparative Protein Modeling: Using Known Shapes

Comparative protein modeling uses already known protein structures as starting points. This works because even though there are millions of different proteins, they tend to fold into a limited number of basic 3D shapes. It has been suggested that there are only around 2,000 different "folds."

These methods have two main types:

Homology modeling: This is based on the idea that if two proteins have similar amino acid sequences, they will likely have very similar 3D shapes. If you know the structure of one protein, you can use it as a template to predict the structure of a similar one.
Protein threading: This method takes a protein sequence and tries to "thread" it onto a database of known 3D structures. It checks how well the sequence fits each known shape to find the best match.
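
The threading idea can be sketched with a very simple scoring rule. In this toy version, each known shape is described only by which positions are buried (B) or exposed (E), and a sequence fits well when its water-avoiding amino acids land on buried positions. The template names, the patterns and the scoring rule are all made up for illustration. Real threading programs use much richer scores and allow gaps.

```python
# Illustrative (incomplete) set of water-avoiding amino acids, one-letter codes.
HYDROPHOBIC = set("AILMFVWC")


def threading_score(sequence, burial_pattern):
    """+1 for each position where the amino acid suits its spot, -1 otherwise."""
    if len(sequence) != len(burial_pattern):
        raise ValueError("this toy version needs equal lengths (no gaps)")
    score = 0
    for aa, spot in zip(sequence, burial_pattern):
        greasy = aa in HYDROPHOBIC
        buried = spot == "B"
        score += 1 if greasy == buried else -1
    return score


def best_template(sequence, templates):
    """Return the name of the template shape that the sequence fits best."""
    return max(templates, key=lambda name: threading_score(sequence, templates[name]))


# Hypothetical template library: B = buried inside, E = exposed to water.
templates = {"fold_one": "BBEEBBEE", "fold_two": "EEBBEEBB"}
print(best_template("LIKDVAES", templates))  # prints fold_one
```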

Modeling Side-Chain Shapes

After the main protein chain is folded, the "side chains" (parts of the amino acids that stick out) also need to find their best positions.

Scientists use "rotamer libraries" which are collections of common and stable shapes for these side chains.
Computer programs try to find the best combination of these side-chain shapes to make the protein as stable as possible.
This is especially important for the inner "core" of the protein, where side chains are tightly packed.
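
Picking side-chain shapes is a search problem: try combinations and keep the one with the lowest energy. The sketch below does this by brute force. Everything in it is hypothetical: the tiny "rotamer library", the energy numbers, and the rule that two neighbouring side chains pointing the same way bump into each other. Real programs use measured rotamer libraries and smarter searches.

```python
from itertools import product

# Hypothetical toy rotamer library: (direction, energy) choices per side-chain type.
ROTAMER_LIBRARY = {
    "small": [("left", 0.0), ("right", 0.5)],
    "big": [("left", 0.0), ("right", 1.0), ("up", 2.0)],
}


def total_energy(choice, clash_penalty=5.0):
    """Sum each rotamer's own energy, plus a penalty when neighbours point the same way."""
    energy = sum(rotamer_energy for _, rotamer_energy in choice)
    for (dir_a, _), (dir_b, _) in zip(choice, choice[1:]):
        if dir_a == dir_b:  # toy rule standing in for a real clash check
            energy += clash_penalty
    return energy


def pack_side_chains(side_chain_types):
    """Try every combination of rotamers and return the lowest-energy one."""
    options = [ROTAMER_LIBRARY[kind] for kind in side_chain_types]
    return min(product(*options), key=total_energy)


best = pack_side_chains(["small", "big", "small"])
print([direction for direction, _ in best], "energy:", total_energy(best))
```

Trying every combination only works for a handful of side chains, because the number of combinations multiplies with each one added.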

Quaternary Structure: Proteins Working Together

When two or more separate protein chains come together to form a complex, predicting their combined 3D shape is called "protein-protein docking." If you know the individual protein structures, you can use computer methods to figure out how they fit together.

Software for Prediction

There are many computer programs and tools for predicting protein structures. They use different methods like homology modeling, threading, and ab initio approaches.

Recently, "deep learning" (a type of AI) has been very successful.
Some well-known programs include I-TASSER, HHpred, and AlphaFold.
In 2021, AlphaFold was reported to be the best at predicting protein structures.

Knowing a protein's structure often helps us understand what it does. For example, collagen is a protein that forms long, fiber-like chains, which explains why it's important for strength in our bodies.

AI Methods: AlphaFold and Beyond

AlphaFold was one of the first artificial intelligence (AI) programs to predict the full 3D structures of proteins with high accuracy. It was created by Google's DeepMind.

AlphaFold uses a special type of artificial neural network to directly predict the 3D coordinates of all atoms in a protein, just from its amino acid sequence.
It can make predictions very quickly, in minutes or hours, depending on the protein's size.

AlphaFold2, introduced in 2020, was even more accurate, predicting structures almost as well as experiments could. Other new AI tools like RoseTTAFold and OmegaFold have followed. These AI methods are now being used to help understand genomes (all the genetic material in an organism) by predicting the shapes of many proteins.

The European Bioinformatics Institute and DeepMind have also created a huge database, the AlphaFold Protein Structure Database, which contains millions of predicted protein structures.

Checking Automatic Prediction Servers

The CASP experiment, held every two years since 1994, is a big test for protein structure prediction methods. It checks both human-made predictions and automated computer servers.
CAMEO3D is another project that continuously checks how well automated protein structure prediction servers work. It publishes results weekly on its website.