The vision for Lexicon Bio

9 Apr

I founded Lexicon Bio last year with the aim of building a small molecule platform for precise transcriptional editing. Having now raised a round and generated data for the early programmes, I wanted to spell that vision out. Everything Lexicon does is founded on the following three assumptions:

Assumption 1:

A good a priori estimate of the robustness of the causal chain from drug action to the end state you care about is a critical determinant of discovery programme success

Assumption 2:

For a large and important class of diseases, control of transcription is the biologically relevant end state

Assumption 3:

Control of transcription in humans is very hard

Causality in drug discovery

All drug discovery programmes are a test of the causal link between drug action X and desired outcome Y. In hindsight, the causal chain is all that matters. In practice, estimating its robustness before testing the hypothesis directly in humans is extraordinarily difficult. There is often an ocean of noise sitting between X and Y which gets compressed into a neat diagram. When we are right about the causal link, remarkable medicines can be discovered. A good example of drug discovery with robust causality is a class of targets called G-protein coupled receptors (GPCRs). These make up <5% of the human proteome (around 800 in total), yet an astonishing 30 - 50% of FDA-approved drugs act on them. Part of the reason for this is that, in many cases, the causal chain from the receptor to the end state is extremely short, in terms of either total steps or time. GPCRs emerged as useful signal transduction nodes for when biology needs an immediate response. They control your response to exercise, to danger, to pleasure, and even to satiety. These systems are designed not to be buffered - for a large subset of GPCRs, signal = event.

If we understand the biology of the GPCR (an enormous if), we can develop very strong predictors of what will happen in a human trial. Finding binders of the isolated protein and measuring the effect in a lab really can translate smoothly into the clinic. Of course this can go wrong: GPCRs might not have the same effect in all tissue types, and off-target toxicity and narrow therapeutic windows still kill many GPCR programmes. However, the broader point stands: all other things being equal, a drug programme with a robust, legible causal chain will have a higher likelihood of success in the long run (1).

The paradigm described above has another underlying assumption, that we care about deeply understanding what is going on. Here we are taking a reductionist approach to drug discovery. There are approaches which forgo understanding (at least initially) for the sake of causality. Phenotypic drug discovery uses (hopefully) robust cellular models of the disease of interest and screens unbiased libraries to identify entities which move the diseased cells towards a healthier state. Enormously successful drugs (Vertex’s CFTR correctors for example) have been discovered by establishing causal models of a disease of interest and taking an unbiased approach to screening. Unfortunately, it’s not always feasible to set up a robust cellular model of the disease you care about, with a highly causal readout, and run a phenotypic screen. Even when you are able to, finding hits with real activity and optimising them is an enormous challenge, so phenotypic assays alone can’t solve the problem of bringing shorter causal chains to drug discovery.

Transcription as a relevant end state

For many diseases, we don’t have the option of finding a GPCR which is responsible for the aberrant biology or taking a (robust) phenotypic approach. For these diseases, often what matters is the tangled base layer of biology - transcription. To an extremely high degree of fidelity, every cell in your body contains the same DNA. We have hundreds (possibly thousands) of unique cell types because what matters is not the DNA itself, but which bits are being actively transcribed. It is true that there is a very large class of diseases (cancers) with a very high mutational burden, i.e. the underlying DNA has been changed, but even here the end result is still an altered transcriptional profile. KRAS mutations result in hyperactive proliferative programmes where the ultimate control mechanism is still a change in which genes are being transcribed. The most frequently mutated gene in cancer, TP53, encodes a transcription factor (p53). Loss of p53 function results in a shift in transcriptional programming, and genes responsible for cell death in response to DNA damage are semi-permanently switched off. Mutations are a mechanism by which transcription is altered; the disease phenotype is still ultimately a transcriptional phenomenon.

If, then, we could find a way to programmably alter transcription, perhaps we could design therapeutics with much stronger causal links to the biology that really matters in these diseases. Unfortunately, transcription is not at all like GPCR signalling. It is a mess of interacting nodes and pathways, massively dependent on context and irreducibly complex. Transcription factors, the endogenous regulators responsible for precise transcriptional control, are notoriously hard to target directly. They often exist in a disordered state, not presenting suitable binding sites for our best intracellular drugs, small molecules. This flexibility is part of their adaptability and functionality. By interacting with different binding partners in different contexts, biology can encode one-to-many functionality and build complex organisms from a small number of parts (2).

Transcriptional editors

Programmable direct editing of transcription (i.e. therapeutics designed to deliver a specific, first-order effect on transcription of specific genes) is relatively challenging to achieve in vitro, let alone in humans. The best tools we have for doing this in vitro are protein constructs. In many cases these are modified fusions of CRISPR systems and an endogenous protein, which together target a specific region of DNA and recruit a transcriptional regulator, like a DNA methyltransferase, to activate or repress transcription at that region. This is extremely modular, very powerful, and making good progress in clinical trials. However, as with the vast majority of protein constructs, these are expensive and extremely difficult to deliver into humans, especially if you want them to work outside of the liver or in a poorly perfused tissue like a solid tumour.

Putative small molecule transcriptional editors have the opposite set of problems. Small molecules can achieve broad tissue distribution (even into the CNS) and they are incredibly cheap to make. However, they are awful at selectively binding to precise DNA sequences, and using them to target (most) transcription factors has historically been incredibly difficult. Intrinsically disordered proteins don’t occupy a single state with the sort of binding site a small molecule needs to form a stable, high affinity complex. Even for the subset of transcription factors we can drug with small molecules, we have almost no control over the specifics of how the drug works. The impact of ligand binding has to be determined empirically: you can’t simply turn a transient inhibitor of target X into an activator, or recruit a different effector to shut the target down for months instead of hours. This leaves two problems to solve before small molecules can become programmable transcriptional editors: scope of target, and programmability.

Targeting transcription factors

Perhaps counter-intuitively, programmability is the easier problem to solve. As several literature examples now show, you can take a ligand for a transcription factor (or indeed any chromatin-bound protein of interest) and design a bifunctional small molecule which recruits an effector of your choice (3). This has been used to convert the transcriptional repressor BCL6 into a transcriptional activator, to supercharge mutant p53, and to temporarily shut down the oestrogen receptor. Bifunctional molecules which recruit a transcriptionally active enzyme to DNA are actually not a new idea. A paper titled “Transforming ligands into transcriptional regulators: building blocks for bifunctional molecules” was published in 2011 (see the A. K. Mapp paper in the further reading section below). It explicitly outlines some of the use cases above at a high level, including the specific bromodomains and repressor complexes used.

As the examples above demonstrate, and as with a lot of biology, proximity is a central control mechanism for protein activity, and transcription is another area which might benefit enormously from the development of bifunctional small molecules. For any given mark on chromatin, there are relatively few protein effectors which control the presence, absence, or interpretation of that particular signal, far fewer than there are adaptors used by the protein degradation system (E3 ligases). To me this seems fairly intuitive. Any two random stretches of chromatin are likely, on average, to be structurally more similar to each other than two randomly selected proteins. As the stereoelectronics of the substrates become more similar, proximity begins to dominate as a control mechanism. If you get these effectors near chromatin for long enough, they are likely to function as nature intended.

This intuition I think extends further. Part of the reason intrinsically disordered transcription factors are so difficult to target with small molecules is that they have many low energy states they can flit between, rather than a single stable 3D structure. They already pay a significant energetic cost to preorganise a highly flexible domain into the right shape to form a new protein-protein interaction, meaning that these are often highly specific, but energetically weak interactions. The complexes formed can be fairly transient; forming, performing a function, and falling apart relatively quickly. This transience is part of the buffering built into the system, but it doesn’t mean that these interactions aren’t meaningful. The end state can propagate, with changes to chromatin spreading, or creating new signalling surfaces for other complexes to bind onto. From a reductionist perspective, this makes them incredibly difficult to target because the wider context is everything. You simply cannot measure the effect you care about outside of a fully functioning native cellular system. However, their capacity to form new interactions with downstream effects that substantially outlast the lifetime of the complex means that disordered transcription factors may well be far easier to exploit with therapeutic intent than we currently think.

Direct measurement of transcriptional readouts

If transcription is the end state you care about for a given disease, then being able to measure it is essential. For a long time, large scale transcriptomics has been helpful and informative, but rarely project critical and certainly not cheap, scalable, or accurate enough to drive a drug discovery programme forward, especially not as a key endpoint for optimisation. However, quantitative measurements of gene transcription have collapsed in cost and improved substantially in accuracy. Apples-to-apples comparisons are hard, but acquiring the data is probably two orders of magnitude cheaper than it was, the compute required for processing is now widely available, and the tools we have for running this at high throughput are incomparably better. A typical multiplexed, DRUG-seq style workflow now costs on the order of $25 - 35 per well including all the laboratory and instrument time. This means that for the same price as an IC50 curve for a compound vs a single protein, you can generate a decent estimate of the behaviour of every single gene in a cell line in response to treatment. This fundamentally changes the modalities and optimisation pathways that are practical in a drug discovery setting.
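
As a rough illustration of what a per-well transcriptional readout involves, here is a minimal Python sketch. It is not Lexicon's pipeline or the published DRUG-seq analysis: the function names, the pseudocount, and the gene labels are all hypothetical, and the only number taken from the text is the $25 - 35 per well estimate. It shows two things: what a plate costs at that price, and how raw counts from a treated well might be turned into a log2 fold change against a control well after simple counts-per-million normalisation.

```python
import math

# Per-well cost range quoted in the text (USD, lab and instrument time included).
COST_PER_WELL_USD = (25, 35)


def plate_cost(wells: int, cost_range: tuple = COST_PER_WELL_USD) -> tuple:
    """Return the (low, high) cost in USD of profiling `wells` wells."""
    low, high = cost_range
    return wells * low, wells * high


def counts_per_million(counts: dict) -> dict:
    """Scale raw per-gene read counts so that each well sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("well has no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated: dict, control: dict, pseudocount: float = 1.0) -> dict:
    """Per-gene log2(treated / control) on CPM-normalised counts.

    The pseudocount (an illustrative choice, not a recommendation) avoids
    division by zero for genes with no reads in one of the wells.
    """
    t = counts_per_million(treated)
    c = counts_per_million(control)
    return {
        gene: math.log2((t.get(gene, 0.0) + pseudocount) / (c[gene] + pseudocount))
        for gene in c
    }
```

At the quoted price, plate_cost(384) returns (9600, 13440), i.e. roughly $10k - 13k for a full 384-well plate. A single-ratio fold change like this is only a first look; a real programme would use replicate wells and a proper statistical model before treating any gene-level change as a signal to optimise against.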

If assumptions 1 and 2 hold (causality is essential, transcription is what matters) then high throughput transcriptomics data could be an enormous shift in how we design and deliver medicines. It’s difficult to overstate how important this is going to become over the next few decades, particularly as systems able to reason over this quantity of data become widely embedded in discovery workflows.

To return to the assumptions at the start: a good a priori estimate of causality in drug discovery matters enormously, transcription is the relevant end state of a large subset of diseases, and programmably altering transcription is hard. We can add to this chain of reasoning: bifunctional small molecules might solve the historical challenges of programmability and delivery, and cheap transcriptomics now makes the effect we care about measurable. Lexicon exists to alter transcription in humans, and the first step is bifunctional small molecules. The early programmes focus on targets where the underlying biology is well understood but the modalities we have available haven’t caught up yet. Oncology, regenerative medicine, and immunology are all within immediate scope. In the longer term, the ceiling is extremely high. We should be brave enough to aim for small molecules which are dosed once a year - a gene therapy in a pill. The building blocks to make this happen are not decades away; they exist now and are waiting to be combined in the right way by the right teams. If transcription is the relevant endpoint for a disease, then Lexicon wants to build a programme in that space.

Footnotes

1 - There are plenty of examples where serendipity has delivered remarkable classes of drugs. However, clinical failure rates tell you that we are wrong a lot more often than we are right.

2 - Intrinsically disordered proteins (and disordered regions of proteins) correlate with organism complexity - for example, chordates have a far higher proportion of these domains than archaea.

3 - To note a potentially obvious issue, you need to make sure that the small molecule doesn’t displace the transcription factor from the binding site you care about. This is a design strategy issue, not a fundamental blocker.

Further reading

Open access GPCR review - https://www.nature.com/articles/s41392-024-01803-6

Transcription factors review - https://www.cell.com/cell/fulltext/S0092-8674(18)30106-5

Intrinsically disordered proteins review (paywall) - https://www.nature.com/articles/s41580-023-00673-0

A. K. Mapp paper (paywall) - https://pubs.rsc.org/en/content/articlelanding/2011/cs/c1cs15050b

BCL6 activators - https://www.biorxiv.org/content/10.1101/2025.03.14.643404v1.full.pdf

Mutant TP53 activators - https://pmc.ncbi.nlm.nih.gov/articles/PMC11565735/

ER repressors - https://www.biorxiv.org/content/10.1101/2025.10.22.680877v1.full.pdf

DRUG-seq - https://www.nature.com/articles/s41467-018-06500-x