DC Science Series #7

DC Science Series #7 - Angelina Yurchenko (DC4)

Welcome to the SYNSENSO DC Science Series. In this blog post, Angelina shares her research interests with us. Enjoy!

Optimizing Cell-Free Systems: Bayesian Optimization with Transfer Learning

Cell-free synthetic biology enables rapid prototyping and characterization of biological systems without the constraints of maintaining living cells [1]. However, optimizing these systems, whether maximizing protein yields, improving biosensor sensitivity, or enhancing metabolic pathway flux, requires navigating a complex design space of buffer conditions, component concentrations, DNA ratios, and reaction temperatures. Traditional optimization approaches typically rely on one-factor-at-a-time (OFAT) experiments, grid search over selected parameters, or design-of-experiments methods such as fractional factorial designs [2]. While these methods have proven useful, they become experimentally intractable as the number of parameters increases, often requiring hundreds of experiments to adequately explore the design space.
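To make the scaling concrete, here is a small Python sketch that counts the runs needed for a full grid and for an OFAT screen. The function names and the choice of five levels per factor are illustrative assumptions, not figures from the cited studies.

```python
def full_grid_runs(levels_per_factor: int, n_factors: int) -> int:
    """Runs needed to test every combination of levels (full grid)."""
    return levels_per_factor ** n_factors


def ofat_runs(levels_per_factor: int, n_factors: int) -> int:
    """One baseline run, then each factor varied alone over its other levels."""
    return 1 + n_factors * (levels_per_factor - 1)


# Hypothetical screen: 5 levels per factor, for 2 to 6 factors
for k in range(2, 7):
    print(k, full_grid_runs(5, k), ofat_runs(5, k))
```

With five levels per factor, four factors already require 625 grid runs. The OFAT count stays small (17 runs) only because it never tests factors in combination.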

Bayesian optimization addresses this challenge by building a probabilistic model of the system that is refined with each experiment and used to choose the next conditions to test, balancing the exploration of uncertain regions against the refinement of promising ones [3]. When combined with transfer learning, which leverages historical data from previously optimized related systems (for example, data from expressing a homologous protein, or optimization results from a closely related cell extract), this approach can substantially reduce the number of experiments needed [4-6].

Standard Bayesian Optimization

In Bayesian optimization (Image 2), we begin with a limited set of experimental measurements (green samples). A surrogate model, typically a Gaussian process [7], learns the underlying landscape of our objective function across the design space. The acquisition function then determines where to sample next. In simple terms, it scores each untested point by how promising it looks: points in high-performing regions score well (exploitation), as do points where we’re very uncertain about performance (exploration). This balance ensures we don’t just refine known good conditions but also discover potentially better regions we haven’t explored yet.
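As a rough illustration of how an acquisition function scores untested points, the sketch below uses the upper confidence bound (UCB), one common choice among several. The predicted means and uncertainties are invented numbers standing in for the output of a surrogate model, and the weight `beta` is an assumed setting.

```python
import numpy as np

# Hypothetical surrogate predictions at four untested conditions
# (normalized yield; values invented for illustration).
pred_mean = np.array([0.80, 0.50, 0.30, 0.55])
pred_std = np.array([0.02, 0.05, 0.30, 0.10])


def upper_confidence_bound(mean, std, beta=2.0):
    """Score = predicted performance + beta * predictive uncertainty."""
    return mean + beta * std


scores = upper_confidence_bound(pred_mean, pred_std)
ranking = np.argsort(scores)[::-1]  # candidate indices, best score first
print(scores, ranking)
```

In this toy case the most uncertain condition (index 2) ranks first and the condition with the highest predicted yield (index 0) ranks second, so both exploration and exploitation are rewarded. A smaller `beta` would shift the ranking toward exploitation.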

This iterative cycle continues until the results stop improving or the experimental budget is exhausted. Compared to traditional methods like grid search, which tests points uniformly regardless of what we’ve learned, or OFAT approaches, which ignore parameter interactions, Bayesian optimization adapts its search strategy based on all available data, typically achieving good performance in 20 to 50 experiments rather than hundreds [8,9].
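The cycle can be sketched in a few lines of Python. This is a minimal one-parameter toy under stated assumptions, not the workflow used in the cited studies: `run_experiment` is a hypothetical stand-in for a wet-lab measurement, the Gaussian process is a bare-bones zero-mean implementation with a fixed RBF kernel, and the acquisition rule is an upper confidence bound with an assumed weight of 2.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Zero-mean GP regression: posterior mean and std at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    alpha = np.linalg.solve(K, K_s)          # K^-1 K_s
    mean = alpha.T @ y_train
    var = 1.0 - np.sum(K_s * alpha, axis=0)  # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def run_experiment(x):
    """Hypothetical stand-in for a measurement (true optimum at x = 0.7)."""
    return np.exp(-((x - 0.7) ** 2) / 0.02)


candidates = np.linspace(0.0, 1.0, 201)  # normalized design parameter
x_obs = np.array([0.1, 0.5, 0.9])        # initial experiments
y_obs = run_experiment(x_obs)

for _ in range(10):                      # assumed budget of 10 further runs
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    next_x = candidates[np.argmax(mean + 2.0 * std)]  # UCB acquisition
    x_obs = np.append(x_obs, next_x)
    y_obs = np.append(y_obs, run_experiment(next_x))

best_x = x_obs[np.argmax(y_obs)]
print(best_x, y_obs.max())
```

Real campaigns involve several parameters, measurement noise, and fitted kernel hyperparameters, which libraries such as BoTorch [3] handle. The loop structure (fit, score, measure, repeat) stays the same.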

Transfer Learning in Bayesian Optimization

Transfer learning builds on standard Bayesian optimization by exploiting structural similarities between related problems (Image 2). Consider a scenario where we’ve previously optimized GFP expression in an E. coli-based cell extract (the source system, shown in green). When we move to optimizing a related protein, such as mKate (the target system, shown in orange), we can leverage that prior knowledge rather than starting from scratch.

The algorithm uses the source data to shape its initial expectations about the target system. As new experiments on the target system are performed, the model learns the relationship between the source and target systems, determining how much to rely on transferred knowledge versus new observations. This approach has been particularly effective when the source and target share underlying biochemical mechanisms [5,6].
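One simple way to let source data shape the initial expectations is to use a model fitted to the source system as the prior mean for the target, and to fit a second Gaussian process only to the differences observed in new target experiments. The sketch below illustrates that idea. It is an assumed, simplified variant and not necessarily the method of [4-6]: all data are synthetic, the names are hypothetical, and unlike multi-task models it does not learn an explicit similarity weight between source and target.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)


def gp_mean(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf_kernel(x_test, x_train) @ np.linalg.solve(K, y_train)


# Synthetic source campaign (e.g. an earlier, densely sampled optimization)
x_source = np.linspace(0.0, 1.0, 11)
y_source = np.exp(-((x_source - 0.6) ** 2) / 0.05)   # optimum near 0.6

# Synthetic target system: only two experiments so far, optimum shifted to 0.7
x_target = np.array([0.2, 0.7])
y_target = np.exp(-((x_target - 0.7) ** 2) / 0.05)

candidates = np.linspace(0.0, 1.0, 101)


def source_prior(x):
    """Expectation for the target before any target data: the source model."""
    return gp_mean(x_source, y_source, x)


# Fit a GP to what the source model gets wrong on the target measurements
residuals = y_target - source_prior(x_target)
transfer_mean = source_prior(candidates) + gp_mean(x_target, residuals, candidates)

# For comparison: a model that ignores the source data entirely
scratch_mean = gp_mean(x_target, y_target, candidates)
print(candidates[np.argmax(transfer_mean)], candidates[np.argmax(scratch_mean)])
```

Far from any target measurement the residual term shrinks toward zero, so the prediction falls back on the source model. Near target measurements the new observations dominate, which mirrors the shift from transferred knowledge to new data described above.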

Practical Benefits

This combined approach changes how we think about optimization in cell-free systems. Instead of treating each new system as an independent challenge requiring extensive screening, we can systematically build on accumulated knowledge. Researchers working on iterative design cycles, such as refining circuits, scaling between extract preparations, or adapting systems for different applications, stand to benefit most. The reduction in required experiments translates directly into faster development timelines and more efficient resource use, ultimately allowing us to tackle more ambitious synthetic biology projects.

Bibliography

[1] Silverman et al., Cell-free gene expression: an expanded repertoire of applications, Nat Rev Genet 21, 2020
[2] Gilman et al., Statistical Design of Experiments for Synthetic Biology, ACS Synthetic Biology, 2021
[3] Balandat et al., BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization, Advances in Neural Information Processing Systems 33, 2020
[4] Bai et al., Transfer Learning for Bayesian Optimization: A Survey, arXiv, 2023
[5] Sedgwick et al., Transfer learning Bayesian optimization for competitor DNA molecule design for use in diagnostic assays, Biotechnology and Bioengineering, 2024
[6] Zhu et al., AI-driven high-throughput droplet screening of cell-free gene expression, Nature Communications, 2025
[7] Rasmussen et al., Gaussian Processes for Machine Learning, MIT Press, 2006
[8] Pandi et al., A versatile active learning workflow for optimization of genetic and metabolic networks, Nature Communications, 2022
[9] Borkowski et al., Large scale active-learning-guided exploration for in vitro protein production optimization, Nature Communications, 2020

Text by Angelina Yurchenko, DC4. To find out more about Angelina, visit her profile.