The Molecular Biology of the E484K Mutation

The molecular basis of how E484K is an immune escape mutation

May 10, 2021

THE MOLECULAR BIOLOGY OF THE E484K MUTATION

10 May 2021/Monday/300pm CDT

“Molecular biology is concerned particularly with the forms of biological molecules and with the evolution, exploitation and ramification of these forms in the ascent to higher and higher levels of organization. Molecular biology is predominantly three-dimensional and structural—which does not mean, however, that it is merely a refinement of morphology. It must at the same time inquire into genesis and function.”

-William Thomas Astbury, 1951

High resolution version of today’s pandemic graphic: https://drive.google.com/file/d/1TTq5We47xlrpd3WLmThSsBpfaG9r7R1_/view?usp=sharing

BACKGROUND

Each time the virus infects a new person, it's a prime opportunity to mutate. It's like pulling the handle on a slot machine. More than likely the odds are there's no payout for the virus, but the chance exists. Infecting thousands of people each day is like the virus getting to pull a slot machine handle thousands of times each day. The chance of payout for the virus increases. When the virus gets a payout from that metaphorical slot machine, a new variant emerges.
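The slot machine picture can be put into rough numbers. The sketch below is only an illustration: it assumes every infection is an independent "pull" with the same small chance of producing an advantageous variant, and the per-infection chance used in the demo is a made-up value, not a measured one.

```python
def chance_of_payout(p_per_infection: float, infections: int) -> float:
    """Probability of at least one 'payout' across independent infections.

    Assumes each infection has the same small, independent chance of
    producing an advantageous variant (a simplifying assumption).
    """
    if not 0.0 <= p_per_infection <= 1.0:
        raise ValueError("p_per_infection must be between 0 and 1")
    if infections < 0:
        raise ValueError("infections must be non-negative")
    # P(at least one payout) = 1 - P(no payout at all) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_per_infection) ** infections


if __name__ == "__main__":
    p = 1e-7  # hypothetical per-infection chance, for illustration only
    for n in (1_000, 100_000, 10_000_000):
        print(f"{n:>12,} infections -> {chance_of_payout(p, n):.4%} chance of a payout")
```

Whatever the true per-infection chance is, the point of the sketch is the shape of the curve: more infections always means a higher chance of at least one payout.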

Since the start of this pandemic, COVID has had several “big payouts” on the mutation slot machine.

The first one was a mutation designated D614G that arose a year ago. This mutation had the effect of increasing the contagiousness of the virus by making the spike protein more stable in a configuration that facilitated more rapid infection. Because it favored increased transmission, this mutation is now present in the vast majority of viral samples isolated from patients worldwide. D614G is present in every variant of interest and variant of concern we are dealing with now.

The second “big payout” was N501Y, which likely emerged in variants last summer. This mutation had the effect of increasing how well the receptor binding domain (RBD) of the COVID viral spike could “dock” or bind with the host ACE2 receptor- this is the receptor we have on our own cells in many organ systems of the body that allows the COVID virus to hijack our cellular machinery for its own purposes. Because N501Y increased how well the spike RBD could bind with the host ACE2 receptor, it made any variant with the N501Y mutation much more contagious. The surges seen in Europe and in Michigan earlier this spring were due to the variant B.1.1.7 having the N501Y mutation.

The third, and in many ways the biggest, of the mutation “payouts” is E484K, the subject of my post today. This mutation has the effect of dampening our immune response to the virus, what we call “immune escape”. Because this mutation can affect how our immune system detects and responds to COVID infection, we consider E484K a “Mutation of Concern” and any variant that arises with E484K is tracked closely.

Several variants have the E484K mutation- B.1.351 which emerged in South Africa last summer, both P.1 and P.2 which emerged in Brazil last spring, B.1.525 which emerged in Nigeria last summer, and B.1.526 which emerged in New York last fall. We are now also seeing the E484K mutation starting to appear in B.1.1.7 which is currently the dominant variant in the United States.

AMINO ACIDS 101

Before we talk about the E484K mutation in detail, it’s important to understand amino acids which are the basic building blocks of every protein- proteins in turn are the building blocks of all life on Earth. Proteins provide structure and function to all living things. Insulin is a protein. Hemoglobin which transports oxygen in our blood is a protein. Actin and myosin are fiber-like proteins that are the basis of the action of our muscles.

The genome of an organism encodes for all the proteins it needs. As humans, our proteins are encoded on DNA in our chromosomes. For the SARS-CoV-2 (COVID) virus, its proteins are encoded on RNA.

DNA has four letters- A (adenine), C (cytosine), G (guanine) and T (thymine).

RNA has four letters as well- but instead of T (thymine), it has U (uracil).

Every letter is called a nucleotide. Nucleotides are the individual letters of the genetic code whether it’s DNA or RNA. The genomic sequence of the COVID virus (SARS-CoV-2) is about 30,000 nucleotides long. The human genomic sequence is over 3 BILLION nucleotides long.

A string of three nucleotides is called a codon. A codon codes for a particular amino acid. For example, AAA or AAG encode for the amino acid Lysine.

There are twenty amino acids, each one encoded by one or more particular codons. As the genetic code is translated, amino acids are assembled into chains- a chain of amino acids is a protein.
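To make the codon idea concrete, here is a minimal sketch of reading an RNA message three letters at a time. It uses the standard genetic code; the function name and the short example sequences are my own illustrations, not sequences quoted from any particular genome.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks a stop signal rather than an amino acid.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Read an RNA string codon by codon and return one-letter amino acids."""
    rna = rna.upper()
    protein = []
    for start in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]
        if amino_acid == "*":  # stop signal: the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AAAAAG"))     # KK  (AAA and AAG both encode lysine)
    print(translate("GUUGAAGGU"))  # VEG (valine, glutamic acid, glycine)
```

The table has 64 codons for only twenty amino acids plus stop signals, which is why most amino acids can be spelled more than one way.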

All twenty amino acids have the same basic structure. At one end is a positively-charged amine group and at the other end is a negatively charged carboxylic acid group. The amine group is attracted to the carboxylic acid group on another amino acid- think of these two groups as like the couplings on a railcar- the negatively charged end is attracted to the positively charged end of the next amino acid. A chemical reaction allows the amine end to bind to the carboxylic acid of the next amino acid in sequence. That bond is called a peptide bond and water is the by-product of that reaction that links amino acids to each other.

The side chain of each amino acid is what differentiates them and the properties of the amino acid side chain determine the biochemical behavior of that particular amino acid.

All twenty amino acids can be written “shorthand” in one of two ways. For Glutamic Acid, it can be shortened to “Glu” or the even shorter “E”. Lysine can be shortened to “Lys” or the even shorter “K”.

We designate mutations in the COVID genetic code based upon what amino acid got swapped out and where. In the case of E484K, “E” tells us what the original amino acid was before the change- in this case, glutamic acid. “484” tells where this change occurred, in position 484 which happens to be in the spike receptor binding domain. “K” tells us the new amino acid in position 484- in this case, lysine has replaced glutamic acid.
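The naming scheme is regular enough to take apart with a few lines of code. This is a minimal sketch covering only simple one-for-one substitutions like the ones in this post; the function and class names are hypothetical, and the lookup table lists only the amino acids mentioned here.

```python
import re
from typing import NamedTuple

# One-letter codes for the amino acids named in this post (not all twenty).
AMINO_ACID_NAMES = {
    "D": "aspartic acid",
    "E": "glutamic acid",
    "G": "glycine",
    "K": "lysine",
    "N": "asparagine",
    "Y": "tyrosine",
}


class Substitution(NamedTuple):
    original: str     # amino acid before the change
    position: int     # position in the protein
    replacement: str  # amino acid after the change


_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'E484K' into its three parts."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    original, position, replacement = match.groups()
    return Substitution(original, int(position), replacement)


def describe(label: str) -> str:
    """Spell out a substitution label in words."""
    sub = parse_substitution(label)
    old = AMINO_ACID_NAMES.get(sub.original, sub.original)
    new = AMINO_ACID_NAMES.get(sub.replacement, sub.replacement)
    return f"{label}: {old} at position {sub.position} replaced by {new}"


if __name__ == "__main__":
    for label in ("D614G", "N501Y", "E484K"):
        print(describe(label))
```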

THE CENTRAL DOGMA OF MOLECULAR BIOLOGY

Francis Crick (who, along with James Watson, discovered the double helix structure of DNA) first described the “Central Dogma of Molecular Biology” in 1957. It is an explanation of how genetic information flows within a living organism.

Our genetic information is stored on our DNA. The vast majority of our genes are encoded on the DNA that lies inside the nucleus of each and every cell in the body. There are some exceptions to this but they aren’t relevant to COVID molecular biology.

Information from the DNA is transcribed to mRNA, or “messenger RNA”. The mRNA then exits the nucleus of the cell and the cell’s internal machinery then takes the mRNA and translates it into a sequence of amino acids. A chain of amino acids is a protein and proteins are the basic building blocks of all life on Earth. Some proteins are very simple, like insulin. Other proteins are very complex and have multiple subunits that are built separately then assembled into larger proteins. Antibodies, for example, have multiple subunits that are then assembled into a complete antibody.

Genetic information flow, except in very particular circumstances NOT present in COVID infection or vaccination, goes DNA -> mRNA -> proteins.

MUTATIONS ARE A DYNAMIC PART OF EVOLUTION

Keeping in mind how information flows from the genome to proteins, a mutation is a change in the genetic code or genome of the virus. That can result in a change in the amino acid sequence, and a protein is just a complex long chain of amino acids. A change in one amino acid can have big impacts depending upon WHERE in the protein structure it’s located. If it’s in a non-critical part of the spike, then nothing changes in how the spike works and that’s a functionally silent (neutral) mutation. If it’s in a critical part of the spike but makes the spike non-functional or not work as well, then that’s a deleterious mutation and those viruses struggle to propagate further. But if it makes the spike work better, then that offers an evolutionary advantage to the virus and viruses with that mutation tend to become dominant.
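Whether a change in the genetic code alters the protein at all can be checked at the level of a single codon. The sketch below uses a deliberately partial codon table, just enough to compare a few one-letter changes to the glutamic acid codon GAA; the codons are chosen for illustration and the function name is hypothetical. Note that it only says whether the amino acid changes, not whether that change helps or hurts the virus.

```python
# Partial codon table: only the codons needed for this illustration.
PARTIAL_CODON_TABLE = {
    "GAA": "E", "GAG": "E",  # glutamic acid
    "AAA": "K", "AAG": "K",  # lysine
    "UAA": "*", "UAG": "*",  # stop signals
}


def classify_codon_change(before: str, after: str) -> str:
    """Label a codon change by its effect on the encoded amino acid."""
    old_amino_acid = PARTIAL_CODON_TABLE[before]
    new_amino_acid = PARTIAL_CODON_TABLE[after]
    if old_amino_acid == new_amino_acid:
        return "synonymous"  # same amino acid, protein unchanged
    if new_amino_acid == "*":
        return "nonsense"    # premature stop, protein cut short
    return "missense"        # a different amino acid is swapped in


if __name__ == "__main__":
    print(classify_codon_change("GAA", "GAG"))  # synonymous
    print(classify_codon_change("GAA", "AAA"))  # missense (E becomes K)
    print(classify_codon_change("GAA", "UAA"))  # nonsense
```

Each of those three examples differs from GAA by a single nucleotide, which shows how one changed letter can leave a protein untouched, swap one amino acid, or cut the protein short.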

Back in the early 1800s there was a light colored moth that lived in the forests of Britain. As the nation industrialized during the Victorian period, soot collected on the trees and the light colored moth became easy to spot and thus get eaten by birds. A mutation in the color gene of the moth created some moths that were dark. Because of evolutionary pressure, the dark colored version of the moth became predominant.

The same thing is happening with the COVID virus. Mutations that give it an advantage become predominant. In the case of the E484K mutation, it gives the virus the advantage of being able to *partially* escape our immune system response.

E484K showing up independently in multiple variants is convergent evolution in action. The evolutionary pressure over time is favoring variants that have the E484K mutation. I will have a later posting on the epidemiology of the E484K mutation. Stay tuned!

E484K AT THE MOLECULAR LEVEL

In the middle part of today's graphic, I have drawn simplified chemical structures for Positions 483, 484, and 485 of the COVID spike receptor binding domain (RBD). On the left is E484 or what we sometimes call the "wild type", meaning this virus does NOT have the E484K mutation. On the right is the same three amino acid positions but for any variant that has the E484K mutation.

"E484" refers specifically to any variant that lacks the E484K mutation.

"K484" refers specifically to any variant that has the E484K mutation.

If you refer to my COVID variants table (see any of the posts in the links at the end of this article that discuss variant overviews), I have E484K in red for any variant that has this mutation.

I have simplified the chemical structure diagram- the amine "end" is in light blue, the carboxylic acid "end" is in light red. Where those two overlap is the peptide bond. The side chain is in light green. The black line shows the carbon backbone of the side chain- any place there is an angle or a junction in that black line, that's a carbon unless otherwise labelled.

For both situations, Position 483 is the amino acid valine, Position 485 is the amino acid glycine.

In those variants that do NOT have E484K, Position 484 is glutamate (or glutamic acid)- its carbon backbone if you stretched it out would be four carbons long and the side chain is negatively charged.

In those variants that do have E484K, Position 484 is lysine- its carbon backbone if you stretched it out would be five carbons long and the side chain is positively charged.

You can see at the molecular level, there's quite a bit of difference between the glutamate in E484 and the lysine in K484. Keep that structural difference in mind as we move to the third section of my graphic.

IMPACT OF E484K ON THE ELECTROSTATIC CHARGE OF THE RBD

A big tip of the hat goes to Jürgen Bosch PhD who kindly provided these diagrams and explained to me the implications of E484K at the molecular level. Today's post wouldn't be possible without his help. A big THANK YOU to Jürgen!

Let's look at the first upper left diagram in this section. It's basically a surface map of the electrical charge of the receptor binding domain of the COVID (SARS-CoV-2) virus. The more intense the red color, the greater the negative charge on that spot. The more intense the blue color, the greater the positive charge on that spot. Any areas that are neutrally charged are going to be in white.

Remember, an amino acid's side chain determines its chemical characteristics in the overall protein structure. In this case, the side chain determines the overall charge of that amino acid.

In E484, the glutamic acid (or glutamate) at Position 484 causes a large patch of negative charge on the RBD.

Now compare that diagram with the one in the upper right of this section, that's the same surface map of electrostatic charge, only this time it's K484. Notice in the location of Position 484 that big patch of negative charge in E484 has now been replaced by a big patch of positive charge.

This is the result of the negatively charged amino acid glutamate being swapped out for the positively charged amino acid lysine.
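The size of that charge flip can be tallied with simple bookkeeping. This is a rough sketch using whole-number side-chain charges near neutral pH; real charges depend on the local environment inside the folded protein, and the function names are hypothetical.

```python
# Simplified whole-number side-chain charges near neutral pH (about 7.4).
# Residues not listed (including histidine, which is mostly uncharged at
# that pH) are treated as neutral.
SIDE_CHAIN_CHARGE = {
    "D": -1,  # aspartic acid
    "E": -1,  # glutamic acid
    "K": +1,  # lysine
    "R": +1,  # arginine
}


def side_chain_charge(residue: str) -> int:
    """Approximate side-chain charge of a single one-letter residue."""
    return SIDE_CHAIN_CHARGE.get(residue.upper(), 0)


def net_side_chain_charge(segment: str) -> int:
    """Sum of side-chain charges over a stretch of residues."""
    return sum(side_chain_charge(residue) for residue in segment)


def charge_shift(original: str, replacement: str) -> int:
    """Change in side-chain charge when one residue replaces another."""
    return side_chain_charge(replacement) - side_chain_charge(original)


if __name__ == "__main__":
    wild_type = "VEG"  # Positions 483-485 without the mutation
    mutant = "VKG"     # the same positions with E484K
    print(net_side_chain_charge(wild_type))  # -1
    print(net_side_chain_charge(mutant))     # 1
    print(charge_shift("E", "K"))            # 2
```

Going from -1 to +1 is a swing of two full charge units at a single position, which is why the patch on the surface map changes color completely instead of just fading.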

Now look at the green diagram in the lower left. Same part of the COVID spike, the receptor binding domain, only this time it's showing the "skeleton" based on the carbon backbones of the amino acids in the protein's sequence. The yellow box marks out the area that monoclonal antibodies recognize. These sorts of areas that our immune system's antibodies recognize and target are called epitopes. An epitope is a target that our immune system has been trained, either via prior infection or vaccination, to recognize and home in on.

That epitope that contains Position 484 is about 600 square Angstroms in size.

Oh, wait, what's an Angstrom? It's a unit of length at the molecular scale. One Angstrom is one hundred millionth of a centimeter.
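For a feel of the scale, the conversions can be worked through in a few lines. This is a small sketch with hypothetical helper names; the only numbers taken from the text are the Angstrom definition and the 600 square Angstrom epitope area.

```python
import math

ANGSTROMS_PER_CM = 1e8          # one Angstrom is one hundred millionth of a cm
SQ_ANGSTROMS_PER_SQ_NM = 100.0  # 1 nm = 10 Angstroms, so 1 nm^2 = 100 A^2


def angstroms_to_cm(length_in_angstroms: float) -> float:
    """Convert a length from Angstroms to centimeters."""
    return length_in_angstroms / ANGSTROMS_PER_CM


def square_angstroms_to_square_nm(area_in_sq_angstroms: float) -> float:
    """Convert an area from square Angstroms to square nanometers."""
    return area_in_sq_angstroms / SQ_ANGSTROMS_PER_SQ_NM


if __name__ == "__main__":
    epitope_area = 600.0  # square Angstroms, from the text above
    print(square_angstroms_to_square_nm(epitope_area))  # 6.0
    # Side of a square with the same area, to give a sense of width
    print(f"{math.sqrt(epitope_area):.1f} Angstroms per side")
```

So the epitope covers about 6 square nanometers, the same area as a square roughly 24.5 Angstroms on a side.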

Looking back at the electrostatic surface maps of both E484 and K484, you can see that the electrostatic charge contributed by the amino acid at Position 484 covers a significant portion of that particular epitope that our immune system recognizes. Our immune system, trained either by prior infection or vaccination or augmented by monoclonal antibody therapy, is geared to see that epitope with a big negatively charged patch. What do you think happens when it is presented with that same epitope but with a big positively charged patch (and a bigger amino acid, since lysine has a carbon backbone one carbon longer than glutamate)?

Our own antibodies, primed by prior infection or by vaccination, or monoclonal antibodies, engineered to target that epitope, won't latch on to that target as well, thereby dampening the overall immune response.

This is the mechanism of immune escape by variants that have the E484K mutation.

The last diagram in the lower right shows the COVID RBD in green interacting with the human ACE2 host receptor in white. Note the location of Position 484. It's off to the side. How does targeting this epitope stop infection?

The human IgG antibody is about 4 times the size of the ACE2 receptor. So imagine a BIG Y-shaped multi-unit protein latched on to the epitope where Position 484 is located. The antibody is so much larger that it obstructs the part of the RBD that binds with the ACE2 receptor. We call that "steric inhibition" or "steric hindrance". IgG binding to that epitope sterically blocks the receptor binding domain.

TAKE HOME MESSAGES

1/ E484K is an immune escape mutation present in the following variants: B.1.351, P.1, P.2, B.1.525, and B.1.526. This mutation is starting to appear in the dominant variant in the United States, B.1.1.7.

2/ At the molecular level, E484K results in a negatively charged glutamate (or glutamic acid) with a four carbon backbone being swapped out for positively charged lysine with a five carbon backbone.

3/ The E484K mutation results in a significant change of the electrostatic properties of a key epitope on the COVID spike receptor binding domain.

4/ E484K is not a complete immune escape mutation- while some monoclonal antibody therapies and some vaccines are affected, most of the vaccines and other monoclonal antibody therapies are still effective against variants with this mutation.

5/ But any degree of immune escape raises the possibility of breakthrough infection in someone vaccinated, re-infection in someone previously infected, or prolonged/more severe infection with the potential for an increased fatality rate.

PARTING THOUGHTS

The biggest concern we face is with variants that have both the mutations that make them more contagious and E484K- variants B.1.351 and P.1 meet these criteria of concern. We are also dealing with B.1.1.7, the dominant variant in the United States currently, acquiring the E484K mutation.

Our best chance at limiting the emergence of new mutations and new variants is to limit the opportunities for mutation with each new infected individual. We have to absolutely control spread and limit new infections. Our best defense is still vaccination. Non-pharmacologic interventions at the public level like mask mandates, social distancing measures and capacity restrictions as well as improved ventilation of indoor spaces buy us crucial time to get as many people vaccinated as possible.

Don't give away your shot. There are millions of people in a lot of places worldwide that dearly wish to be in your position to have such ready access to COVID vaccinations.

RELEVANT PAST POSTS (FACEBOOK LINKS)

IMMUNOLOGY 101: THE IMMUNE SYSTEM- REPOST/UPDATE (05 May 2021): https://www.facebook.com/jp.j.santiago/posts/10218801066954703

QUICK UPDATE: AMERICAN INDIANS/ALASKA NATIVES LEAD THE WAY ON VACCINATIONS (29 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218763919866049

QUICK UPDATE: BY THE NUMBERS (27 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218750780817581

COVID VARIANTS OVERVIEW AND UPDATE (22 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218722676514991

NEW COVID VARIANT OF INTEREST: B.1.617 (18 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218699298170547

CASE SURGES CAN RESULT IN COVID VARIANTS (12 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218657527966318

THE OVERVIEW & STATUS OF COVID VARIANTS (9 April 2021): https://www.facebook.com/jp.j.santiago/posts/10218637120816152

MAYO CLINIC GRAND ROUNDS: COVID TESTING (12 March 2021): https://www.facebook.com/jp.j.santiago/posts/10218453381862793

QUICK UPDATE: THE COVID FAMILY TREE (01 February 2021): https://www.facebook.com/jp.j.santiago/posts/10218211054244754

COVID MUTANTS: VARIANTS OF CONCERN (31 January 2021): https://www.facebook.com/jp.j.santiago/posts/10218204381877949

COVID MOLECULAR BIOLOGY: THE SPIKE (6 January 2021): https://www.facebook.com/jp.j.santiago/posts/10218019683340601

Ramblings of a Free-Range Primary Care Physician
