Learning the Language of Viral Evolution and Escape

“Learning another language is not only learning different words for the same things, but learning another way to think about things.”

-Flora Lewis, American journalist

The all-out war between the SARS-CoV-2 virus and human-designed mRNA vaccines has escalated as the virus mutates into several related variants. A virus mutating in ways that let it evade the human immune system is known as viral escape, and this ingenious viral maneuver can wreak havoc on antiviral and vaccine design and development.

A group from the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) brilliantly modeled viral escape, working with colleagues from other MIT departments. They used language models originally developed for human natural language.

The authors draw a clever parallel. A virus that mutates into a related form the host immune system no longer recognizes is like a sentence that keeps its grammatical structure while changing its meaning. Viral escape is modeled by characterizing two key dimensions of natural language at the same time. Semantic change corresponds to viral antigenic change, which renders the immune system much less effective. Grammaticality corresponds to viral fitness: the virus's ability to infect the host and replicate, also known as replication capacity. In short, semantic change maps to antigenic variation, while grammaticality maps to viral fitness.
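To make the two axes concrete, here is a toy Python sketch. The function names, embedding vectors, and probabilities are hypothetical and invented for illustration. Measuring semantic change as an L1 distance between embeddings is an assumption of this sketch. It should not be read as a description of the study's exact implementation.

```python
# Toy version of the two axes. All numbers are invented for illustration; in a
# real setting they would come from a trained protein language model.

def semantic_change(z_original, z_mutant):
    """Distance between two sequence embeddings (L1 norm, an assumption here)."""
    if len(z_original) != len(z_mutant):
        raise ValueError("embeddings must have the same dimension")
    return sum(abs(a - b) for a, b in zip(z_original, z_mutant))


def grammaticality(position_probs, mutant_residue):
    """Model probability of the substituted residue at the mutated position.

    position_probs maps each amino acid to the model's probability for that
    position given the rest of the sequence; residues absent from the map get 0.
    """
    return position_probs.get(mutant_residue, 0.0)


z_wild_type = [0.2, -1.0, 0.5]   # hypothetical embedding of the original protein
z_mutant = [0.6, -0.2, 0.5]      # hypothetical embedding after one substitution
probs_at_site = {"K": 0.52, "E": 0.31, "R": 0.17}  # hypothetical model output

print(semantic_change(z_wild_type, z_mutant))   # roughly 1.2 (floating point)
print(grammaticality(probs_at_site, "E"))       # 0.31
```

A mutation that moves the embedding a long way but receives a tiny probability would count as a big change in "meaning" that is unlikely to be viable.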

The bridge from natural language to viral evolution is novel and fascinating in its own right, and the technical side of the modeling is every bit as captivating. The neural language model uses a bidirectional long short-term memory (BiLSTM) architecture to learn semantic structure and grammaticality. A BiLSTM reads each protein sequence both forward and backward. A constrained semantic change search (CSCS) then predicts viral escape by looking for mutations that preserve fitness while changing how the virus appears antigenically to the host.
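Reading a sequence in both directions matters because a residue's plausibility depends on what surrounds it on each side. The Python sketch below is deliberately much simpler than a BiLSTM. It is a hypothetical count-based model (`NeighbourContextModel` is an illustrative name, not from the study) that estimates a residue's probability from its immediate left and right neighbours. This is one crude way to score grammaticality. The training sequences are toy strings, not real viral proteins.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class NeighbourContextModel:
    """Toy bidirectional context model (NOT a BiLSTM).

    Estimates p(residue | left neighbour, right neighbour) by counting over
    training sequences, with add-one smoothing over the 20 standard amino acids.
    A BiLSTM would summarise the entire left and right context; this sketch only
    looks one residue in each direction.
    """

    def __init__(self, sequences):
        self.counts = defaultdict(Counter)
        for seq in sequences:
            padded = "^" + seq + "$"  # "^" and "$" mark the sequence ends
            for i in range(1, len(padded) - 1):
                self.counts[(padded[i - 1], padded[i + 1])][padded[i]] += 1

    def prob(self, seq, i, residue):
        """Smoothed probability of `residue` at position i of seq (0-based)."""
        padded = "^" + seq + "$"
        context = (padded[i], padded[i + 2])  # left and right neighbours of seq[i]
        counts = self.counts.get(context, Counter())
        total = sum(counts[a] for a in AMINO_ACIDS)
        return (counts[residue] + 1) / (total + len(AMINO_ACIDS))


model = NeighbourContextModel(["MKVLA", "MKILA", "MKVLG"])
print(model.prob("MKVLA", 2, "V"))  # 3/23: V seen twice between K and L
print(model.prob("MKVLA", 2, "I"))  # 2/23: I seen once in the same context
```

Swapping in a real bidirectional neural model would replace the counting step, but the output keeps the same shape: a probability for each possible residue at a position, given the context on both sides.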

The viral protein “language” model combines fitness and semantic change to predict escape sequences with the profile most advantageous to the virus: high fitness but altered antigenic characteristics. In other words, a mutation that scores high on both semantic change and grammaticality is much more likely to cause escape.
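The "high on both" criterion can be sketched in a few lines of Python. As we understand the published method, candidates are scored by combining the rank of semantic change with the rank of grammaticality, with a weight β on the latter. The rank-sum form, the default `beta=1.0`, and the helper names `ranks`, `cscs_scores`, and `top_candidates` are all assumptions of this sketch. The mutations and scores are made up.

```python
def ranks(values):
    """Ascending ranks starting at 0; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            result[order[pos]] = (start + end) / 2
        start = end + 1
    return result


def cscs_scores(semantic_change, grammaticality, beta=1.0):
    """Rank-sum score: rank(semantic change) + beta * rank(grammaticality)."""
    if len(semantic_change) != len(grammaticality):
        raise ValueError("score lists must have the same length")
    sem_ranks = ranks(semantic_change)
    gram_ranks = ranks(grammaticality)
    return [s + beta * g for s, g in zip(sem_ranks, gram_ranks)]


def top_candidates(mutations, semantic_change, grammaticality, k=1, beta=1.0):
    """Return the k mutations with the highest combined score."""
    scores = cscs_scores(semantic_change, grammaticality, beta)
    ranked = sorted(zip(mutations, scores), key=lambda t: t[1], reverse=True)
    return [m for m, _ in ranked[:k]]


mutations = ["K2E", "V3I", "L4P", "A5G"]   # hypothetical substitutions
sem = [0.9, 0.2, 1.5, 0.4]                  # hypothetical semantic change
gram = [0.30, 0.40, 0.01, 0.05]             # hypothetical grammaticality

print(cscs_scores(sem, gram))               # [4.0, 3.0, 3.0, 2.0]
print(top_candidates(mutations, sem, gram)) # ['K2E']
```

With these invented numbers, L4P has the largest semantic change but very low grammaticality. V3I is highly grammatical but barely changes the embedding. K2E scores well on both axes and comes out on top. Using ranks rather than raw values keeps the two axes on a comparable scale even though one is a distance and the other a probability.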

To test how general the approach is, the investigators analyzed three viral surface proteins that mediate binding to host cells: influenza A hemagglutinin, HIV-1 envelope glycoprotein, and the now well-known SARS-CoV-2 spike glycoprotein.

This work, with its analogy between viral escape and natural language, forms an innovative conceptual nexus between two seemingly unrelated domains: viral evolution and natural language. Future therapeutic strategies could perhaps include a mosaic vaccine covering most, if not all, of the likeliest mutations in the viral escape portfolio.

The full article can be read here
