<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Assay]]></title><description><![CDATA[Deep dives into protein engineering, enzyme design, and scaling gene editing - from the Mandrake Bio team]]></description><link>https://research.mandrake.bio</link><image><url>https://substackcdn.com/image/fetch/$s_!lq34!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f04631-063b-43c9-8e28-b6a5f678ccbf_279x279.png</url><title>The Assay</title><link>https://research.mandrake.bio</link></image><generator>Substack</generator><lastBuildDate>Mon, 25 May 2026 11:00:45 GMT</lastBuildDate><atom:link href="https://research.mandrake.bio/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mandrake Bio]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mandrakebio@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mandrakebio@substack.com]]></itunes:email><itunes:name><![CDATA[Mandrake Bio]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mandrake Bio]]></itunes:author><googleplay:owner><![CDATA[mandrakebio@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mandrakebio@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mandrake Bio]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[We don't even design binders!!]]></title><description><![CDATA[We're a gene editor design company in Bangalore. Last month we entered a protein binder design competition as a weekend side-project and ended up winning it. 
12000+ designs and only 1 strong binder.]]></description><link>https://research.mandrake.bio/p/we-dont-even-design-binders</link><guid isPermaLink="false">https://research.mandrake.bio/p/we-dont-even-design-binders</guid><dc:creator><![CDATA[Mandrake Bio]]></dc:creator><pubDate>Tue, 28 Apr 2026 14:16:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hzd9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iu6P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iu6P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 424w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 848w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 1272w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!iu6P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png" width="1178" height="662" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:1178,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121490,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/195732539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iu6P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 424w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 848w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 1272w, https://substackcdn.com/image/fetch/$s_!iu6P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb568ccd9-4551-4c06-b348-9de9b71bca38_1178x662.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>We are a gene editor design company. We work on de novo gene editors to enable a step change in gene editing and to make possible the breakthroughs we&#8217;ve been hearing about for years. We do not, as a company, design protein binders.</p><p>A few weekends ago we entered the GEM &#215; Adaptyv RBX1 binder design competition. We submitted three different approaches across separate accounts (we just wanted to test our wild approaches) and ran the whole thing as a side project alongside our main gene editing work. 
This served as a stress-test of our internal AI X biophysical platform on a problem class we don&#8217;t normally touch.</p><p>All three of our submissions (21 designs) were selected by <a href="https://www.adaptyvbio.com/">Adaptyv</a> for the wet-lab round. 20 of the 21 designs expressed cleanly at the BLI step. One bound - at K<sub>D</sub> = 26 nM, the only Strong-classified binder in the entire 322-design competition.</p><p>The pipeline that produced it is called <strong>ORBIT</strong> (Oracle-Reseeded Binder design with Interface Targeting). Surprisingly, ORBIT outperformed teams including Pacesa Lab - the authors of BindCraft, the framework that inspired a range of hallucination-based protein design approaches such as Mosaic (and ORBIT itself!!).<br><br>The other teams, all focused on binder design, reached Medium binding at best. These are serious teams running serious methods. They work on binder design as their primary research. We don&#8217;t. But the result shows that the biophysics approach we built for enzyme design generalizes.</p><p>This is the overview. The full technical writeup - methodology, code, ablation logs, design trajectories, candidate structures - is coming in a companion post on <a href="https://research.mandrake.bio/">The Assay</a>, and we&#8217;re releasing everything publicly so the design community can pull it apart, reproduce it, and improve on it. 
More on that below.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;a22284be-30a6-4f93-9a4b-a68e80791799&quot;,&quot;duration&quot;:null}"></div><p style="text-align: center;"><em>A gorgeous visualization of RBX1 hugging our binder</em></p><h2>Why we entered</h2><p>We&#8217;re building an AI X Physics-first protein design platform to design de novo gene editing enzymes. For the field, the challenge is how to search a large design space with as little compute as possible. We&#8217;re a fairly young company that&#8217;s just started out, and we are setting up our wet lab as we speak. This competition was a great way for us to validate some of our hypotheses and see how they stand up against actual wet lab results.</p><p>The Adaptyv competition was the cleanest version of that benchmark we&#8217;d seen. RBX1 partly overlaps with our actual work - metal coordination, partial disorder, multi-domain stability - but is otherwise a different problem class. If our platform could produce a competitive design with weekend-scale effort and three approaches in parallel, that would tell us something. If it couldn&#8217;t, that would tell us something too.</p><p>We submitted three approaches across separate accounts because we wanted experimental signal across methods, not a single bet. ORBIT was one of the three. The other two were also novel and explored a diverse search space. They get their own writeup. 
The three-approach result is itself the more interesting story for us internally: most binder design pipelines fail on RBX1 - including two of our own.</p><h2>Why RBX1 broke most methods</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tiB_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tiB_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 424w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 848w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 1272w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tiB_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif" width="936" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:936,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:118130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/195732539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tiB_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 424w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 848w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 1272w, https://substackcdn.com/image/fetch/$s_!tiB_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ebd2f3b-9216-4885-94fa-2128928908a7_936x684.gif 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">RBX1 has a large IDP region and three zinc ions in complex</figcaption></figure></div><p>RBX1 is a 108-residue subunit of the Cullin-RING E3 ubiquitin ligase (CRL) complex. Disrupting its function inhibits CRL-mediated protein degradation, which is therapeutically relevant in cancers where CRL activity is dysregulated. The biology aside, what makes it a brutal computational target is the protein itself.</p><p>Three things stack against you. First, an intrinsically disordered N-terminal region that floods standard interface-confidence metrics with noise unrelated to binding. Second, a C-terminal RING-H2 finger that coordinates three Zn&#178;&#8314; ions in a cross-brace arrangement - geometry no AI model can infer from sequence alone. 
Third, no useful evolutionary relatives, so sequence-based methods have nothing to lean on.</p><p>The fold-prediction baselines tell you how hard this was: ESMFold pLDDT 0.40 (uninformed), Boltz-2 0.63 (marginal), Protenix without template 0.60. Most de novo binder pipelines start with a structure prediction; if you can&#8217;t fold the target cleanly, the entire pipeline downstream inherits the noise.</p><p>We used Protenix as the starting point because it gave the best starting prediction of the models we compared. But this still wasn&#8217;t good enough. So the first useful intervention was the obvious one.<br>Feed Protenix the 2LGV NMR structure as a template, with three Zn&#178;&#8314; ions explicitly placed. pLDDT jumped from 0.60 to 0.86. That single change unlocked everything downstream. For metal-coordinated, constraint-heavy folds, ab initio is a tax you pay for no reason. The template encodes physics - zinc coordination geometry - that no model trained on sequence and general structure can recover. Use it.</p><h3>ORBIT: three ideas, layered</h3><p>ORBIT is built on the BindCraft framework that <a href="https://x.com/MartinPacesa">Pacesa</a> Lab developed - backpropagating through a structure prediction model to optimize a sequence PSSM (position-specific scoring matrix). We use Protenix as the structure prediction model (the JAX version that <a href="https://escalante.bio/">Escalante Bio</a> built for their <a href="https://github.com/escalante-bio/mosaic">Mosaic</a> pipeline) and SolubleMPNN as the inverse folding oracle. Most of what we built sits on top of it.</p><ol><li><p><strong>Search the right regions of design space, and stop searching the wrong ones.</strong> Standard differentiable hallucination starts from a uniform PSSM and runs a single long trajectory from there. Most of that compute is spent finding the right neighborhood; only a small fraction is spent refining within it. 
We changed both halves of that equation.</p><p>We turned the optimizer into a multi-stage iterative search. Run a brief hallucination to get a provisional structure. Hand that structure to SolubleMPNN to extract the sequence prior consistent with it. Restart the optimizer from that informed prior. The MPNN sampling temperature is a control knob - low temperatures give a sharp, decisive prior; higher temperatures preserve exploration - and we ran an ablation, then allocated seed volume by what won. The optimizer is no longer doing one long descent from random initialization; it&#8217;s doing structured restarts from oracle-informed basins.</p><p>We also stopped treating each run as a single function call returning a single answer. A differentiable optimization run is a <em>trajectory</em> through sequence-distribution space. Two consequences. First, the optimizer&#8217;s final state isn&#8217;t necessarily its best discrete sequence - the argmax projection can be excellent at some intermediate step and worse at the end. We track the best hard sequence along the trajectory and harvest from there. Second, most bad trajectories fail in recognizable ways early. A small classifier trained on intermediate trajectory features catches failures at step 60 of a 165-step run with 97.7% accuracy and zero false negatives at the conservative threshold. Together: dead runs stop consuming budget, live ones get harvested for their best discrete output rather than their final one, and high-variance branches - the ones with the highest peaks - become affordable to scale.</p></li><li><p><strong>A force-field for the interface.</strong> Most binder design pipelines treat the target surface as uniform - bind anywhere. Some allow weighted contact targeting in distogram space, which biases <em>where</em> contacts form but doesn&#8217;t shape <em>how</em> the binder gets pulled there. 
We built a <em>gravitational binding loss</em> that operates in coordinate space directly: high-value target residues exert a continuous attractive field on the binder, with the strength of the pull modulated by per-residue importance. The framework extends naturally to repulsive zones and exclusion regions. RBX1 was the first target we built it for; the abstraction is general.</p></li><li><p><strong>A programmable biophysical layer.</strong> We added a regularization layer that operates on the soft PSSM during optimization - soft pressure toward foldability margin, compositional balance, controlled net charge, hydrophobicity profile, manufacturability. The structural model is no longer the only judge of what the optimizer should care about. The optimizer feels biophysical intent throughout the trajectory, not just as a downstream filter. This is the part of the system that&#8217;s genuinely platform-shaped: switch the regularizers, and the same optimization machinery steers toward a different biochemical profile. The work it did on RBX1 is one configuration of a more general capability.</p></li></ol><p>Beyond these three research ideas, there&#8217;s a longer engineering stack underneath that we use day-to-day in our pipeline, including <a href="https://sassafras13.github.io/GumbelSoftmax/">Gumbel-Softmax sampling</a>, slow PSSM sharpening, cosine-decayed steering schedules, noise-robust interface scoring, and a few other things. 
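</p><p>The trajectory-harvesting idea from the first item above (track the best hard sequence along a run, and stop dead runs early) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ORBIT&#8217;s code: <code>harvest_best</code>, <code>argmax_sequence</code>, <code>peaked_row</code> and the toy scorer are hypothetical names, the scorer stands in for whatever structure-model metric a real pipeline would use, and a fixed score threshold at one checkpoint stands in for the trained failure classifier.</p>
<pre>
```python
from typing import Callable, Iterable, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A soft PSSM here is one row of 20 probabilities per binder position.
Pssm = List[List[float]]


def argmax_sequence(pssm: Pssm) -> str:
    """Project a soft PSSM onto its most likely discrete (hard) sequence."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in pssm
    )


def harvest_best(
    trajectory: Iterable[Pssm],
    score: Callable[[str], float],
    check_step: int,
    min_score_at_check: float,
) -> Tuple[Optional[str], float, int]:
    """Walk a trajectory and keep the best hard sequence seen so far.

    Returns (best_sequence, best_score, steps_run). The run is abandoned at
    check_step if the best score is still below min_score_at_check; this
    threshold is a hypothetical stand-in for a learned failure classifier.
    """
    best_seq, best_score, steps_run = None, float("-inf"), 0
    for step, pssm in enumerate(trajectory, start=1):
        steps_run = step
        seq = argmax_sequence(pssm)
        current = score(seq)
        if current > best_score:
            best_seq, best_score = seq, current
        if step == check_step and min_score_at_check > best_score:
            break  # dead run: stop spending budget on it
    return best_seq, best_score, steps_run


def peaked_row(letter: str, p: float = 0.9) -> List[float]:
    """Toy PSSM row: probability p on one residue, the rest spread evenly."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return [p if aa == letter else rest for aa in AMINO_ACIDS]


def toy_score(seq: str) -> float:
    """Placeholder scorer (assumption): counts L and E residues."""
    return float(seq.count("L") + seq.count("E"))


toy_trajectory = [
    [peaked_row("A"), peaked_row("A")],  # step 1 projects to "AA"
    [peaked_row("L"), peaked_row("E")],  # step 2 projects to "LE"
    [peaked_row("L"), peaked_row("A")],  # step 3 projects to "LA"
]

best_seq, best_score, steps_run = harvest_best(
    toy_trajectory, toy_score, check_step=2, min_score_at_check=0.5
)
print(best_seq, best_score, steps_run)  # prints: LE 2.0 3
```
</pre>
<p>On the toy trajectory the best hard sequence appears at step 2 and the final step is worse, which is the case that harvesting along the trajectory is meant to catch. In a real run the checkpoint rule would be replaced by a classifier over intermediate trajectory features.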
</p><p>The full technical writeup will go up on <a href="http://research.mandrake.bio">The Assay</a> in a detailed post soon.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kX9E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kX9E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 424w, https://substackcdn.com/image/fetch/$s_!kX9E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 848w, https://substackcdn.com/image/fetch/$s_!kX9E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 1272w, https://substackcdn.com/image/fetch/$s_!kX9E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kX9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif" width="1108" height="686" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:686,&quot;width&quot;:1108,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:529834,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/195732539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!kX9E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 424w, https://substackcdn.com/image/fetch/$s_!kX9E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 848w, https://substackcdn.com/image/fetch/$s_!kX9E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!kX9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca060d5-62f8-4639-8e91-60b5d77e3a7b_1108x686.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Red: RBX1 | Blue: The winning binder</figcaption></figure></div><p>One important point to note: past Adaptyv data showed that model metrics such as ipTM and pLDDT correlate only weakly with actual binder performance, so every design that scored high on our metrics had to pass a strict &#8216;vibe check&#8217; 
by our protein designer, Surabhi, to make the final list.</p><h2><strong>The result</strong></h2><p><em>Our winning binder</em> is a 100-residue, 11.8 kDa de novo binder. It expressed cleanly, folded independently as a stable monomer (pLDDT 0.975), and bound RBX1 at K<sub>D</sub> = 2.6 &#215; 10&#8315;&#8312; M (26 nM) by bio-layer interferometry.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hzd9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hzd9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 424w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 848w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 1272w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hzd9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif" width="1108" height="686" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:686,&quot;width&quot;:1108,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:344735,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/195732539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hzd9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 424w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 848w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 1272w, https://substackcdn.com/image/fetch/$s_!hzd9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F721db1d8-a9db-4af6-8d73-ae02cba075ba_1108x686.gif 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>Adaptyv&#8217;s classification:</strong> <strong>Strong</strong>. <em><strong>The only one in that category across all 322 designs in the competition.</strong></em></p><p><em>As Martin Pacesa says - <strong>Helices are all you need :p</strong></em></p><p>The next-best binder came in at 85 nM - <em><strong>a 3.3&#215; gap</strong></em> in absolute affinity, but within the same order of magnitude. The cleaner reading is the categorical one: ours was the only design Adaptyv classified as Strong, the next eight binders were Medium (170 nM to 570 nM), and 313 of 322 designs did not bind the target at all. 
That hit-rate is the more honest summary of where the field is on this kind of target - partially-disordered, metal-coordinated, no useful structural homologs. Designing binders for proteins that don&#8217;t sit still is genuinely unsolved.</p><p>Which is also the framing we want to put on the result. One Strong binder on RBX1 is not &#8220;binder design for IDPs is solved.&#8221; It&#8217;s a single data point that the right integration choices - template guidance, structure-informed reseeding, weighted epitope steering, multi-model gating - can find affinity in design space that earlier configurations missed. That space is large. Most of it is still unmapped.</p><h2>What&#8217;s coming: open release</h2><p>We engineered a bunch of things across the pipeline. Code, ablations, and a detailed end-to-end writeup of exactly what we did and why are coming in a follow-up post on <a href="https://research.mandrake.bio/">The Assay</a> - subscribe there if you want it in your inbox when it lands.</p><p>The competition&#8217;s full dataset is already open under ODC-ODbL on Proteinbase; ORBIT will sit alongside it for anyone to pull apart, reproduce, and improve on.</p><p>If you have built or are building a binder design pipeline and want to compare notes - or if you find something in our methodology you&#8217;d push back on - please. That&#8217;s the point.</p><h2><strong>What&#8217;s next</strong></h2><p>The hardest protein design problems we work on at Mandrake aren&#8217;t binders - they&#8217;re enzymes whose function depends on dynamic conformational states no current structure prediction model handles cleanly. RBX1 was a useful stress-test, but the open problem we actually care about is generating designs that get state-dependent function right on the first try. That&#8217;s where the next wave of methods has to land. 
We&#8217;re working on it.</p><p><em>Thanks to the GEM Workshop and Adaptyv Bio for organizing a competition that takes hard targets seriously, and to Adaptyv&#8217;s experimental team for fast, careful BLI work. Particular thanks to the Pacesa Lab for BindCraft, and to Escalante Bio for Mosaic, which inspired the differentiable hallucination core of ORBIT.</em></p><p><em>If you read this and have something you&#8217;d push back on - or if you&#8217;re an AI researcher, physicist, or protein engineer interested in this kind of work - we are hiring.<br>Reach us at ai@mandrake.bio</em></p><p><em>Mandrake Bio: foundational protein engineering for next-generation gene editors. Bengaluru.</em></p>]]></content:encoded></item><item><title><![CDATA[Protein Language Models: Fluent, but clueless]]></title><description><![CDATA[These models have learned grammar. 
They have not learned physics.]]></description><link>https://research.mandrake.bio/p/protein-language-models-fluent-but</link><guid isPermaLink="false">https://research.mandrake.bio/p/protein-language-models-fluent-but</guid><dc:creator><![CDATA[Aryan Chandak]]></dc:creator><pubDate>Sat, 18 Apr 2026 07:33:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0TO-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0TO-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0TO-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 424w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 848w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 1272w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0TO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png" width="1456" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2416905,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0TO-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 424w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 848w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 1272w, https://substackcdn.com/image/fetch/$s_!0TO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9a8e02a-7145-46e9-b547-aaf2a44ad3ac_2244x744.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There&#8217;s a beautiful idea at the heart of modern protein AI: that the language of evolution is, at some level, just that - a language. Amino acids are letters. Proteins are sentences. And if you train a large enough model on enough of these sentences, maybe it learns to speak fluently.</p><p>This idea is not crazy. It has produced some genuinely remarkable results. But here&#8217;s the thing: fluency is not understanding. 
And in biology, the gap between the two can cost you months of wet-lab time.</p><p>At Mandrake, we use autoregressive protein language models (PLMs) every day: ProGen2, ZymCTRL, and the ecosystem built around them. They are core tools in our protein engineering pipeline, and the whole field is leaning on them. For good reason: they work. But they work within a comfort zone that is narrower than most people realize, and the edges are sharp.</p><p>So we stress-tested them. Not on standard benchmarks, but on the questions that actually matter for protein engineering. What we found was a consistent, somewhat unsettling picture that we felt compelled to share.</p><p>This is that story.</p><div><hr></div><h2>First: What Even Is a Protein Language Model?</h2><p>Bear with me for a moment - this will pay off.</p><p>A protein is, at its core, a string of amino acids. There are 20 of them (think of them as an alphabet), strung together in chains that can run anywhere from a few dozen to several thousand residues long. This 1D string then folds, through a remarkable process driven by physics and chemistry, into a precise 3D shape. And that shape determines everything: whether a protein catalyses a reaction, binds a drug, edits a genome, or does nothing at all.</p><p>For decades, figuring out the relationship between sequence (the string) and function (what it does) was one of biology&#8217;s grand unsolved problems. Then, around 2020, researchers noticed something: the statistical structure of protein sequences looks a lot like the statistical structure of text. 
Both have local patterns (amino acid motifs, like grammatical phrases), long-range dependencies (contacts between distant residues, like subject-verb agreement), and a vast training corpus (hundreds of millions of known protein sequences from nature, i.e. evolution&#8217;s 4-billion-year experiment in writing).</p><p>So they did the obvious thing: they took the transformer architecture from NLP&#8212;the same fundamental engine behind GPT, BERT, and their descendants&#8212;and trained it on proteins instead of text.</p><p>The results were impressive. Models like ESM-2, ProGen2, and ESM-C learned to assign high probability to sequences that look protein-like, generate novel sequences that fold into real structures, and even predict the effect of mutations on protein stability. ESMFold, built on ESM-2&#8217;s representations, can predict a protein&#8217;s 3D structure in seconds&#8212;a task that once required months of crystallography.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G-jW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G-jW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 424w, https://substackcdn.com/image/fetch/$s_!G-jW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 848w, 
https://substackcdn.com/image/fetch/$s_!G-jW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!G-jW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G-jW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png" width="1456" height="661" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:661,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:639546,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G-jW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 424w, 
https://substackcdn.com/image/fetch/$s_!G-jW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 848w, https://substackcdn.com/image/fetch/$s_!G-jW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!G-jW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39ce891c-5c51-47dc-87ea-ff012c3b88fb_2286x1038.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Two broad families emerged:</p><ul><li><p><strong>Autoregressive models (like ProGen2):</strong> Read proteins left-to-right, one amino acid at a time. They generate sequences the same way GPT generates text&#8212;conditioning each new token on everything that came before. Good for generation. Blind to the future.</p></li><li><p><strong>Masked language models (like ESM-2):</strong> Trained by randomly masking residues and predicting the masked positions from both left and right context. Bidirectional. Better for understanding the whole sequence. Less naturally suited for generation.</p></li></ul><p>Both families are now widely used in protein engineering. Both have been heralded as transformative tools. Both are also, as we&#8217;re about to discuss, surprisingly fragile in ways the standard benchmarks completely miss.</p><h2>These models have learned grammar. They have not learned physics</h2><p>What we mean by grammar: which amino acids tend to follow which, what motifs are common, how families cluster in sequence space, what an RT domain looks like versus a zinc-finger.</p><p>What we mean by physics: how proteins fold into 3D structures, how residues that are far apart in sequence interact when the chain collapses into shape, and how functional constraints&#8212;an active site here, a binding interface there&#8212;compose into a working machine.</p><p>Grammar is impressive. Grammar got us to sequences that look right and fold into plausible shapes. But in protein engineering, &#8220;plausible&#8221; isn&#8217;t enough. You need &#8220;functional.&#8221; And function lives in physics.</p><p>We ran four experiments to probe exactly where the grammar ends and the physics doesn&#8217;t begin. 
Let&#8217;s look at what we did and what we found.</p><div><hr></div><h2>Experiment 1: 1D Grammar, Zero 3D Awareness</h2><p>This is the question at the heart of everything.</p><p>When ProGen2 generates a protein one amino acid at a time, left to right, does it have any internal picture of the 3D shape it&#8217;s building? Or is it pattern-matching sequences with no awareness of what those sequences do in physical space?</p><p>Here&#8217;s why this matters. A protein sequence is a 1D string, but the actual object is 3D. Two amino acids that are 250 positions apart in the sequence can be physically touching in the folded protein. These are called <strong>long-range 3D contacts</strong>, and they are not optional details. They determine whether the protein folds correctly, whether the active site is shaped right, whether the thing actually works.</p><p>To test whether ProGen2 sees these contacts, we looked at its <strong>attention patterns</strong>. In a transformer, attention is the mechanism by which the model decides, at each position, which other positions to look at. If ProGen2 genuinely understands 3D structure, then when it&#8217;s generating residue 280, it should be paying special attention to residue 30&#8212;because in the folded protein, they&#8217;re physically touching, even though they&#8217;re 250 positions apart in the sequence.</p><p>One subtlety: there are two reasons two residues might attend to each other. First, they could be in <strong>structural contact</strong> (physically touching in 3D). Second, they could have <strong>co-evolved</strong> (mutations at one position tend to be compensated by mutations at the other, across evolutionary history). These two signals are correlated but not identical. We specifically wanted the structural signal. So we applied a standard technique called APC (Average Product Correction) to remove the evolutionary background noise. 
What remains, if anything, should be pure structural awareness.</p><p>This is the same principle behind ESMFold: if attention genuinely encodes 3D contacts, you can fold a protein from attention patterns alone. <em><a href="https://www.science.org/doi/10.1126/science.ade2574">ESM-2 can do this</a></em>. The question was whether ProGen2 can do anything similar.</p><p>We tested 150 non-redundant protein structures (X-ray resolution &#8804; 2.0&#197;, 100&#8211;500 amino acids), across 38,286 contact/decoy pairs. For each protein, we asked: after removing evolutionary background, can ProGen2&#8217;s attention distinguish real 3D contacts from decoys at the same sequence distance?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5oIn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5oIn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 424w, https://substackcdn.com/image/fetch/$s_!5oIn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 848w, https://substackcdn.com/image/fetch/$s_!5oIn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5oIn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5oIn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png" width="1456" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:354546,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5oIn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 424w, https://substackcdn.com/image/fetch/$s_!5oIn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 848w, 
https://substackcdn.com/image/fetch/$s_!5oIn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!5oIn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa80dba-4216-421d-85b5-1359e05459e2_4800x1800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Left: Cohen&#8217;s d effect size for True vs Decoy contact discrimination, by 1D sequence distance. 
ProGen2 (salmon) never reaches the 0.2 &#8220;small effect&#8221; threshold. ESM-2 (blue) peaks at 0.51, a medium effect. Right: statistical significance. ESM-2&#8217;s signal is overwhelming; ProGen2&#8217;s last two bins are not even statistically significant</figcaption></figure></div><p><strong>The results across 38,286 contact/decoy pairs:</strong></p><ul><li><p>ProGen2 AUC: <strong>0.527</strong> - barely above coin flip (0.50)</p></li><li><p>ProGen2 max Cohen&#8217;s d: <strong>0.184</strong> - below the threshold for even a &#8220;small&#8221; effect</p></li><li><p>Beyond ~100 residues of sequence separation: <strong>ProGen2 is statistically indistinguishable from random</strong> (p = 0.468 and p = 0.223 for the last two bins)</p></li></ul><p>For comparison, we ran the exact same analysis on ESM-2 (a bidirectional masked model with 12 layers, matched to ProGen2's depth for a fair architectural comparison):</p><ul><li><p>ESM-2 AUC: <strong>0.611</strong></p></li><li><p>ESM-2 Cohen&#8217;s d peaks at <strong>0.52</strong> - a genuine medium effect</p></li><li><p>Every single distance bin is statistically significant, including the hardest ones (p &lt; 10&#8315;&#185;&#8304; at 150&#8211;500 residues apart)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bqLS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bqLS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 424w, 
https://substackcdn.com/image/fetch/$s_!bqLS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 848w, https://substackcdn.com/image/fetch/$s_!bqLS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!bqLS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bqLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png" width="1456" height="397" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:397,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1679166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!bqLS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 424w, https://substackcdn.com/image/fetch/$s_!bqLS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 848w, https://substackcdn.com/image/fetch/$s_!bqLS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!bqLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83c31426-8bfc-4ac5-b34e-1ffc22b4a705_6600x1800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Left: ProGen2 scatter (True vs Decoy Z-scores). Middle: ESM-2 scatter. Right: Z-score gap (True minus Decoy) LOWESS curves overlaid. ProGen2's gap (red) crosses zero around 150 residues and goes negative. ESM-2's gap (blue) stays positive across the entire range</figcaption></figure></div></li></ul><p>The layer-wise analysis makes the difference even starker. ESM-2 concentrates its structural signal in its final layers: Layer 11 shows Z-score gaps of 1.51, 1.71, and 1.45 across the first three distance bins. It builds a progressive, deep structural representation. ProGen2, despite having 27 layers, never achieves anything comparable.</p><p>This doesn't mean ProGen2 learned nothing. It learned excellent 1D grammar: it knows which amino acids tend to follow which, which motifs are common, and what local secondary structure looks like. But 3D contacts, the thing that actually determines whether a protein folds and functions? Essentially invisible to its causal attention.</p><p>There&#8217;s a paradox buried here worth pausing on. You might think: &#8220;Of course - causal attention reads left-to-right, so at any point it is working with partial information.&#8221; But that&#8217;s not actually the explanation.</p><p>Look at the short-range bins: at 10&#8211;20 residues apart, ProGen2 actually shows <em>some</em> signal. It performs best when it has the <em>least</em> context. The paradox is at long range. When the model is evaluating a contact at position 500, it has already seen 499 residues, nearly the entire protein. 
It has more than enough context to have built a representation of the protein&#8217;s overall architecture. And yet it performs <em>worse</em> than at short range, eventually becoming indistinguishable from random.</p><p><em><strong>The problem isn&#8217;t that causal attention sees too little of the protein. The problem is that even when it sees almost all of it, it hasn&#8217;t encoded 3D structural relationships into its representations. More context doesn&#8217;t help because the model never learned to use context for spatial reasoning in the first place.</strong></em></p><div><hr></div><h2>Experiment 2: The Copy Bias Trap</h2><p>This one disturbed us the most.</p><p>Take any protein sequence - call it ABC. Now duplicate it: ABCABC. Feed this to ProGen2 and measure how confident the model is at predicting each amino acid.</p><p>On the first half (the original), the model behaves normally: uncertain at many positions, per-position perplexity values of 15&#8211;25. Expected. Proteins are complex.</p><p>On the second half - the repeated copy - perplexity <strong>collapses to approximately 1.0</strong>. The model predicts every single amino acid with near-perfect confidence. Not because it understands the protein&#8217;s structure or function. 
Because it saw the exact same tokens earlier in the context window and simply copied them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GuKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 424w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 848w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 1272w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png" width="1456" height="585" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:361665,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GuKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 424w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 848w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 1272w, https://substackcdn.com/image/fetch/$s_!GuKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3110d9f-84cf-4c96-9c12-58ac2cdaf5c9_4211x1691.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Per-position perplexity for 6 real proteins and 6 random amino acid strings. Blue bars: original sequence. Red bars: repeated copy. The copy mechanism collapses perplexity from 19-27 down to 1.0-2.0.</figcaption></figure></div><p>Here&#8217;s the part that should alarm you: <strong>this works on random gibberish too.</strong> We generated random amino acid strings - no biology, no evolutionary signal, no structural logic - and the collapse is actually <em>worse</em> than for real proteins. It&#8217;s pure copy-paste.</p><p>This mechanism is the likely culprit behind the high rates of mode collapse we see during candidate generation in our own pipelines.</p><p>We then ran a follow-up: give ProGen2 a real protein followed by just the first 25% of a repeat, and let it generate freely. 
<strong>Would it produce a novel protein, or lock into a copy loop?</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tAo5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tAo5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 424w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 848w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 1272w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tAo5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png" width="1456" height="938" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:938,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:568073,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tAo5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 424w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 848w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 1272w, https://substackcdn.com/image/fetch/$s_!tAo5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5551859-30aa-47d0-a9fc-28ef1edbd645_3953x2547.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Top: the experimental setup; the prompt is a full protein plus the start of a repeat. Bottom: across 7 diverse proteins (129-240 aa), ProGen2 copies the input sequence character-for-character in an infinite loop. 5 out of 7 proteins show 100% exact match, even with stochastic sampling.</figcaption></figure></div><p>Across 7 diverse proteins (129&#8211;240 amino acids), 5 out of 7 produced a <strong>100% character-for-character copy</strong>, <strong>looping indefinitely</strong>. Even with stochastic sampling (top-p = 0.95, temperature = 0.8) - which should introduce randomness - the copy mechanism dominates. The model stops being a protein engineer and becomes a photocopy machine.</p><p>This isn&#8217;t a ProGen2-specific quirk. Kantroo et al. 
(arXiv:2504.17068) tested this across ESM2-650M, ESM2-8M, ProGen2-M, CARP-640M, and LC-PLM-1.4B. All of the transformer-based models, masked (ESM2) and autoregressive (ProGen2) alike, show the same collapse. Convolutional architectures (CARP) only collapse for repeats shorter than ~70 residues; BiMamba-based models (LC-PLM) degrade gradually without the catastrophic cliff.</p><p>This is an <strong>attention-specific architectural vulnerability</strong>, and it has a well-understood mechanistic origin. The NLP interpretability literature calls the responsible circuit an <em>induction head</em> - a specific pattern in transformer attention layers that detects and continues repeated token sequences (Olsson et al., 2022). In text, these circuits make transformers good at in-context learning. In proteins, the same circuits become a liability: they detect any repeated pattern and lock onto it, overriding everything the model knows about biology.</p><div><hr></div><h2>Experiment 3: DMS Variant Scoring</h2><p>To make the comparison concrete in a practical task, we benchmarked ProGen2-base (764M parameters) against ESM-C (600M parameters) on Deep Mutational Scanning (DMS) variant effect prediction across 7 standard assays.</p><p>DMS experiments work like this: take a protein, systematically mutate every position to every other amino acid, and measure what happens to function. The resulting dataset tells you, empirically, whether each single mutation helps, hurts, or is neutral. A good PLM should be able to predict this from sequence alone.</p><p>For ProGen2, scoring a variant means computing the log-likelihood of both the wild-type and mutant sequences, then taking the difference. A mutation at position 42 changes the log-probability at position 42 and all downstream positions&#8212;but the log-probabilities at positions 1&#8211;41 remain identical. The effect ripples forward. 
Never backward.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FfQw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FfQw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 424w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 848w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 1272w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FfQw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png" width="1456" height="567" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:567,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:493890,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://research.mandrake.bio/i/194584278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FfQw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 424w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 848w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 1272w, https://substackcdn.com/image/fetch/$s_!FfQw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1120fd5f-22b2-47ec-83e0-10af4f8b4575_4762x1855.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Left: Spearman correlation between predicted and measured fitness for 7 DMS assays. ESM-C (blue) beats ProGen2 (red) on every single assay. Right: the delta; ESM-C's advantage ranges from +0.028 (HIS7) to +0.268 (GAL4). Mean advantage: +0.136.</figcaption></figure></div><p>ESM-C outperforms ProGen2 on every assay, with a mean Spearman difference of +0.136. The most striking data point: ProGen2 achieves a Spearman correlation of -0.003 on GFP. Essentially random. Now, GFP is arguably the best-studied protein in all of biology - the field has been staring at it for decades. So this isn&#8217;t a case of an obscure protein with sparse training data. 
It&#8217;s a clean signal that something architectural is going on, and it&#8217;s worth understanding why.</p><p>The answer, we think, is directionality. When ProGen2 scores a mutation, its causal architecture means the effect ripples forward - but never backward. The mutated residue itself is scored using only the residues before it, so downstream packing partners and distal contacts that determine whether the mutation is tolerated are invisible at the mutation site. Bidirectional models like ESM-C see both sides. It&#8217;s not that ProGen2 is broken; it&#8217;s that the autoregressive training objective, so powerful for generation, creates a structural blind spot for variant scoring. The failure mode is specific, it&#8217;s explainable, and that&#8217;s actually useful.</p><p>One important caveat: DMS benchmark performance is heavily influenced by how well a protein family is represented in training data. High performance can reflect memorization as much as genuine understanding. The bioRxiv preprint &#8220;<a href="https://www.biorxiv.org/content/10.1101/2024.10.03.616542v1">Protein Language Model Fitness Is a Matter of Preference</a>&#8221; makes this case convincingly. DMS is our gold standard, and even it is more fragile than we&#8217;d like - <em>but that&#8217;s a story for another post.</em></p><div><hr></div><h1>Evals Evals Evals</h1><p>Everything described above could have been caught earlier, if we had the right evals.</p><p>The field currently evaluates PLMs on:</p><ul><li><p><strong>Perplexity</strong> &#8212; how well does the model predict the next amino acid? A 1D string metric. Tells you nothing about 3D structure, robustness, or compositionality.</p></li><li><p><strong>DMS fitness prediction</strong> &#8212; can the model rank single-point mutations? As we showed, it is heavily influenced by the training data distribution and rewards memorization.</p></li></ul><p>These failure modes call for different evals and benchmarks: Does this model collapse when given a trivially duplicated sequence? 
Can it distinguish real 3D structural contacts from decoys? Can it compose two functional constraints it hasn&#8217;t seen together? Does it degrade gracefully or fail catastrophically at distribution edges?</p><p>Our contact discrimination experiment is, in a sense, a benchmark that didn&#8217;t exist before we built it. ProGen2&#8217;s AUC of 0.527, barely above the 0.5 of a coin flip, would be immediately recognized as a failure on any standard ML benchmark. In protein AI, it has gone largely unnoticed.</p><p>There&#8217;s also a subtler issue: over-optimizing for perplexity can actively hurt. Perplexity rewards the statistically most likely, evolutionary-consensus sequence. But real functional proteins need <em>energetic frustration</em>&#8212;sub-optimal packing, dynamic loops, flexible active sites. Pushing perplexity down drives models toward hyper-stable, catalytically inert consensus proteins. <em><strong>You get sequences that look maximally protein-like and do maximally little.</strong></em></p><div><hr></div><h2>What This Actually Means</h2><p>We want to be precise about what we&#8217;re claiming and what we&#8217;re not. As protein design gets easier and easier, knowing the failure modes is what separates successful campaigns from weeks wasted in the wet lab.</p><p><strong>We&#8217;re not saying these models are useless.</strong> We use them at Mandrake. They generate high-quality sequences within well-characterized families. Our adapter-based pipeline produces sequences with 98&#8211;99% target Pfam identities and meaningful structural diversity. That&#8217;s real, and it matters.</p><p><strong>We&#8217;re saying they are brittle.</strong> They work when the problem fits neatly inside the training distribution. They fail&#8212;sometimes catastrophically&#8212;the moment you step outside that envelope. A trivial duplication trick collapses their predictions. They can&#8217;t see 3D contacts. They can&#8217;t compose functional constraints. 
And our benchmarks don&#8217;t catch any of this.</p><p>We are in the <strong>BERT-era of protein AI</strong>. These models have learned grammar&#8212;which amino acids follow which, what motifs look like, how families cluster in sequence space. They have not learned physics&#8212;how proteins fold, how residues interact across 3D space, how functional constraints compose into a working machine.</p><p>The next generation of protein AI needs to solve these problems. Whether through better architectures (state-space models and convolutional models show genuine promise), better training objectives (structure-aware pretraining), or better benchmarks that test for the right capabilities&#8212;the path forward starts with acknowledging where we actually are.</p><p>Not broken. But brittle.</p><p>Grammar without physics.</p>]]></content:encoded></item></channel></rss>