Computational design of biocatalysts
Proteins are versatile nanostructured biomacromolecules that nature uses for a multitude of functions: as highly active and selective catalysts, as efficient nanomachines, or as nanostructured materials with superior mechanical, electrical, or optical properties. Although in principle any protein can be conveniently produced by chemical DNA synthesis and expression of the respective gene, the application of proteins in white and red biotechnology is still limited to naturally occurring proteins and their variants. While in most cases we understand how a single mutation changes the biochemical and biophysical properties of a protein, we are only at the very beginning of understanding the general relationship between sequence, structure, and function. A deep understanding of this relationship would enable us to predict the function of a protein from its sequence and to design, ab initio, the sequence of a protein with desired properties and functions.
The design of biocatalytic reaction systems is highly complex because protein sequence space is vast and enzymatic properties depend both on the protein sequence and on the reaction conditions. Given the large number of parameters, systematic parameter studies and Design of Experiments strategies have had only limited success. However, the exponentially growing volume of protein sequence data, techniques such as high-throughput experimentation, data mining, machine learning, and simulation, and access to high-performance computing resources will enable an engineering approach to biocatalysis.
In our research, we combine comprehensive data mining with extensive molecular simulations to gain a deeper understanding of sequence-function relationships and to lay the basis for designing efficient biocatalytic systems from first principles. Typical research questions are: How can we take advantage of the rapidly increasing amount of protein sequence data? How can we predict the catalytic activity and the substrate scope of an enzyme from its sequence? How can we identify promising candidates in (meta)genomic data? How can we model the biochemical properties of enzymes and of complex reaction mixtures from first principles? How can we bridge the scales between microscopic system properties and macroscopic enzyme kinetics?
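One common way to connect a microscopic quantity from simulation to macroscopic kinetics is transition state theory: the Eyring equation converts an activation free energy into a turnover number k_cat, which can then enter a Michaelis-Menten rate law. The sketch below assumes a transmission coefficient of 1 and uses made-up inputs (a 60 kJ/mol barrier, K_M = 0.5 mM, 1 µM enzyme); the function names are illustrative. It shows the scale bridge in its simplest form and is not a description of our own multiscale models.

```python
import math

# Physical constants in SI units (k_B and h are exact SI values; R = N_A * k_B)
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_kcat(delta_g_act_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Turnover number k_cat (1/s) from an activation free energy via the Eyring equation.

    Assumption: transmission coefficient = 1 (plain transition state theory).
    """
    delta_g = delta_g_act_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    return (K_B * temperature_k / H) * math.exp(-delta_g / (R * temperature_k))


def michaelis_menten_rate(s: float, kcat: float, km: float, e_total: float) -> float:
    """Initial rate v = k_cat * [E]_0 * [S] / (K_M + [S]); all concentrations in one unit."""
    return kcat * e_total * s / (km + s)


if __name__ == "__main__":
    # Hypothetical barrier, e.g. taken from a simulated free-energy profile
    kcat = eyring_kcat(60.0)
    print(f"k_cat ~ {kcat:.0f} 1/s")
    # Hypothetical K_M = 0.5 mM and enzyme concentration 1 uM (= 1e-3 mM)
    for s in (0.1, 0.5, 5.0):
        v = michaelis_menten_rate(s, kcat, km=0.5, e_total=1e-3)
        print(f"[S] = {s} mM -> v = {v:.3g} mM/s")
```

Because the barrier enters exponentially, an error of about 5.7 kJ/mol in the simulated free energy already changes k_cat by roughly a factor of ten at room temperature, which is why accurate free energies matter so much when bridging the scales.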
We perform molecular dynamics simulations with the software package OpenMM on our in-house computer cluster and on the infrastructure provided by HLRS, bwForCluster BinAC, and bwUniCluster. To study sequence-function relationships and to identify new enzymes, we build and systematically analyze protein family databases using our BioCatNet database system.
A major challenge in all research projects in biocatalysis is the limited reproducibility and reusability of experimental and simulation data. Therefore, we contribute to the development of standardized exchange formats for biocatalytic data (EnzymeML) and of an open workflow platform for molecular simulations (Simulation Foundry) to make experimental and simulation data FAIR (findable, accessible, interoperable, and reusable).
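To illustrate what machine-readable biocatalytic data means in practice, here is a minimal sketch of a kinetic experiment stored as a typed record and serialized to JSON. The class names and fields are hypothetical and deliberately simplified; this is not the EnzymeML schema, which defines its own data model and tooling. The point is only that reaction conditions, units, and time-course data travel together and can be read back without loss.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Measurement:
    """Hypothetical time course: product concentration (mM) at each time point (s)."""
    time_s: list[float]
    product_mM: list[float]

    def __post_init__(self) -> None:
        if len(self.time_s) != len(self.product_mM):
            raise ValueError("time_s and product_mM must have the same length")


@dataclass
class KineticExperiment:
    """Hypothetical record; units are encoded in the field names."""
    enzyme_id: str
    substrate_name: str
    temperature_K: float
    pH: float
    initial_substrate_mM: float
    enzyme_uM: float
    measurement: Measurement
    metadata: dict = field(default_factory=dict)


def to_json(exp: KineticExperiment) -> str:
    # asdict() recursively converts the nested Measurement as well
    return json.dumps(asdict(exp), indent=2, sort_keys=True)


def from_json(text: str) -> KineticExperiment:
    data = json.loads(text)
    data["measurement"] = Measurement(**data["measurement"])
    return KineticExperiment(**data)


if __name__ == "__main__":
    exp = KineticExperiment(
        enzyme_id="enzyme_X",          # placeholder identifier
        substrate_name="substrate_Y",  # placeholder substrate
        temperature_K=303.15,
        pH=7.5,
        initial_substrate_mM=10.0,
        enzyme_uM=0.1,
        measurement=Measurement(time_s=[0.0, 60.0, 120.0], product_mM=[0.0, 0.8, 1.5]),
        metadata={"buffer": "hypothetical"},
    )
    text = to_json(exp)
    print(text)
    assert from_json(text) == exp  # lossless round trip
```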
Protein lysine methylation is a post-translational modification introduced by protein lysine methyltransferases (PKMTs). Currently, the rules that determine how PKMTs select their substrates are not well understood.
In collaboration with international partners, we develop a bioinformatics platform based on Jupyter and PostgreSQL to generate and analyze protein family databases, to explore sequence-function relationships, and to search for and design promising novel enzyme candidates.
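As a rough illustration of the relational side of such a platform, the sketch below stores families and their sequences in two linked tables and queries them from Python. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL; the schema (tables family and protein_sequence), the accessions, and the sequences are hypothetical placeholders, not the BioCatNet data model. Real position-specific comparisons would use positions in a multiple sequence alignment rather than raw sequence indices.

```python
import sqlite3

# Hypothetical, simplified schema: one family has many sequences
SCHEMA = """
CREATE TABLE family (
    family_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE protein_sequence (
    sequence_id INTEGER PRIMARY KEY,
    family_id   INTEGER NOT NULL REFERENCES family(family_id),
    accession   TEXT NOT NULL UNIQUE,
    organism    TEXT,
    aa_sequence TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO family (family_id, name) VALUES (?, ?)",
        [(1, "family_A"), (2, "family_B")],
    )
    conn.executemany(
        "INSERT INTO protein_sequence (family_id, accession, organism, aa_sequence) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "SEQ001", "organism_1", "MKTAYIAKQR"),  # placeholder sequences
            (1, "SEQ002", "organism_2", "MKTAHIAKQR"),
            (2, "SEQ003", "organism_1", "MSLLTEVETP"),
        ],
    )
    conn.commit()
    return conn


def family_sizes(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Number of sequences per family, largest family first."""
    return conn.execute(
        """
        SELECT f.name, COUNT(s.sequence_id) AS n
        FROM family AS f
        LEFT JOIN protein_sequence AS s ON s.family_id = f.family_id
        GROUP BY f.family_id, f.name
        ORDER BY n DESC, f.name
        """
    ).fetchall()


def residue_at(conn: sqlite3.Connection, family: str, position: int) -> list[tuple[str, str]]:
    """Residue at a 1-based sequence position for every member of a family."""
    return conn.execute(
        """
        SELECT s.accession, substr(s.aa_sequence, ?, 1)
        FROM protein_sequence AS s
        JOIN family AS f ON f.family_id = s.family_id
        WHERE f.name = ?
        ORDER BY s.accession
        """,
        (position, family),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(family_sizes(db))
    print(residue_at(db, "family_A", 5))
```

Keeping sequences and annotations in a relational database lets the same SQL queries be reused from Jupyter notebooks as a family grows, instead of re-parsing sequence files for every analysis.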
The way we currently conduct biocatalytic research and development is still limited by the low reproducibility of experimental results, the limited scalability of experimentation, and restricted access to data.
Driven by computational power that is still growing exponentially, machine learning has found its way into more and more applications. However, successful machine learning approaches require large amounts of high-quality, structured, and machine-readable data. Data has therefore been described as the new oil of the digital economy.