Bioinformatics group

Computational design of biocatalysts

Proteins are versatile nanostructured biomacromolecules that nature uses for a multitude of functions: as highly active and selective catalysts, as efficient nanomachines, or as nanostructured materials with superior mechanical, electrical, or optical properties. Although in principle any protein can be conveniently produced by chemical DNA synthesis and expression of the respective gene, the application of proteins in white and red biotechnology is still limited to naturally occurring proteins and their variants. While in most cases we understand how a single mutation changes the biochemical and biophysical properties of a protein, we are only at the very beginning of understanding the general relationship between sequence, structure, and function. A deep understanding of this relationship would enable us to predict the function of a protein from its sequence, and to design ab initio the sequence of a protein with desired properties and functions.

The design of biocatalytic reaction systems is highly complex because protein sequence space is vast and enzymatic properties depend both on the protein sequence and on the reaction conditions. Because of the large number of parameters, systematic parameter studies and Design of Experiments strategies have had limited success. However, the exponentially increasing volume of protein sequence data, techniques such as high-throughput experimentation, data mining, machine learning, and simulation, and access to high-performance computing resources will enable an engineering approach to biocatalysis.

In our research, we combine comprehensive data mining with extensive molecular simulations to gain a deeper understanding of sequence-function relationships and as a basis for designing efficient biocatalytic systems from first principles. Typical research questions are: How can we take advantage of the rapidly increasing amount of protein sequence data? How can we predict the catalytic activity and the substrate scope of an enzyme from its sequence? How can we identify promising candidates in (meta-)genomics data? How can we model the biochemical properties of enzymes and of complex reaction mixtures from first principles? How can we bridge the scales between microscopic system properties and macroscopic enzyme kinetics?
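To make the last question concrete, one textbook route (transition state theory, not necessarily the approach used in our projects) links a microscopic activation free energy, for example one estimated from simulation, to a turnover number via the Eyring equation, k = (k_B T / h) exp(-ΔG‡ / RT). That turnover number can then enter a macroscopic Michaelis-Menten rate law. The minimal sketch below assumes a transmission coefficient of 1, and all numerical inputs are hypothetical.

```python
import math

# Physical constants in SI units
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def eyring_rate_constant(dg_act_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Rate constant in 1/s from an activation free energy (kJ/mol) via the Eyring equation.

    Assumes a transmission coefficient of 1, as in the simplest form of transition state theory.
    """
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-dg_act_kj_mol * 1000.0 / (R * temperature_k))


def michaelis_menten_rate(kcat: float, enzyme_conc: float, km: float, substrate_conc: float) -> float:
    """Initial rate v0 = kcat * [E] * [S] / (Km + [S]); all concentrations must share one unit."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


if __name__ == "__main__":
    # Hypothetical inputs: a 70 kJ/mol barrier, 1 uM enzyme, Km = [S] = 1 mM
    kcat = eyring_rate_constant(70.0)
    v0 = michaelis_menten_rate(kcat, enzyme_conc=1e-6, km=1e-3, substrate_conc=1e-3)
    print(f"kcat = {kcat:.3g} 1/s, v0 = {v0:.3g} M/s")
```

Because the barrier enters exponentially, an error of about 5.7 kJ/mol (RT ln 10 at room temperature) in ΔG‡ changes the predicted rate by roughly a factor of ten. This sensitivity is one reason why bridging from simulation to kinetics is difficult.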

We perform molecular dynamics simulations with the software package OpenMM on our in-house computer cluster and on the infrastructure provided by HLRS, bwForCluster BinAC, and bwUniCluster. To study sequence-function relationships and to identify new enzymes, we build and systematically analyze protein family databases using our BioCatNet database system.

A major challenge in all research projects in biocatalysis is the limited reproducibility and reusability of experimental and simulation data. We therefore contribute to the development of standardized exchange formats for biocatalytic data (EnzymeML) and of an open workflow platform for molecular simulations (Simulation Foundry), so that experimental and simulation data become FAIR (findable, accessible, interoperable, and reusable).

Ongoing Projects

Designing aqueous deep eutectic mixtures by data-integrated simulation

In collaboration with Niels Hansen (University of Stuttgart), we develop molecular dynamics simulation workflows to model thermophysical properties (density, viscosity, solubility) of deep eutectic solvents.
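Density is the simplest of these properties to extract from a simulation. In a constant-pressure (NPT) run, the box volume fluctuates, so the density is typically averaged over trajectory frames, each frame contributing the total molecular mass divided by the instantaneous box volume. The minimal sketch below assumes a cubic box and uses a choline chloride/urea/water mixture purely as an illustration. The component names, molecule counts, and box edges are hypothetical inputs, not results. Viscosity and solubility require more involved methods (for example Green-Kubo integrals or free-energy calculations), which are not shown here.

```python
from typing import Dict, Sequence

AVOGADRO = 6.02214076e23  # 1/mol

# Molar masses in g/mol for an illustrative choline chloride/urea/water mixture
MOLAR_MASS = {"choline_chloride": 139.62, "urea": 60.06, "water": 18.015}


def box_density(counts: Dict[str, int], box_edge_nm: float) -> float:
    """Mass density (g/cm^3) of a cubic simulation box.

    counts maps a component name in MOLAR_MASS to its number of molecules in the box.
    """
    mass_g = sum(n * MOLAR_MASS[name] for name, n in counts.items()) / AVOGADRO
    volume_cm3 = (box_edge_nm * 1e-7) ** 3  # 1 nm = 1e-7 cm
    return mass_g / volume_cm3


def mean_density(counts: Dict[str, int], box_edges_nm: Sequence[float]) -> float:
    """Average the instantaneous density over trajectory frames (one box edge per frame)."""
    densities = [box_density(counts, edge) for edge in box_edges_nm]
    return sum(densities) / len(densities)


if __name__ == "__main__":
    # Hypothetical 1:2 choline chloride/urea mixture with added water, and made-up box edges
    composition = {"choline_chloride": 100, "urea": 200, "water": 300}
    frames = [3.56, 3.57, 3.55, 3.56]
    print(f"mean density = {mean_density(composition, frames):.3f} g/cm^3")
```

Averaging per-frame densities, rather than dividing the mass by the mean volume, mirrors how density is usually reported from NPT trajectories. For small volume fluctuations, the two estimates are practically identical.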

Understanding and designing substrate specificity of methyltransferases by molecular dynamics simulations and experiment

Protein lysine methylation is a post-translational modification that is introduced by protein lysine methyltransferases (PKMTs). Currently, the rules that determine how PKMTs select their substrates are not well understood.

PAZy, a protein family database on plastics-active enzymes

In collaboration with Wolfgang Streit (University of Hamburg), we develop a database on plastics-degrading enzymes and apply it to find and design novel active enzymes.

PyEED (Python Enzyme Engineering Database)

In collaboration with international partners, we develop a bioinformatics platform based on Jupyter and PostgreSQL to generate and analyse protein family databases, to explore sequence-function relationships, and to search and design promising novel enzyme candidates.

EnzymeML - a data exchange format for biocatalysis and enzymology

The way we currently do biocatalytic research and development is still limited by the low reproducibility of experimental results, the limited scalability of experimentation, and restricted access to data.

FAIR research data management in chemistry

Driven by computational power that is still increasing exponentially, machine learning has made its way into more and more applications. However, large amounts of high-quality, structured, and machine-readable data are a prerequisite for successful machine learning approaches. Data has therefore been described as "the new oil" of the digital economy.

Integration of sequence and reaction data for the design and engineering of SAM-dependent enzymes

In this project, an integrated computational platform for analysing data on the sequence, structure, and function of enzymes will be developed and applied to design tailored S-adenosylmethionine (SAM)-dependent enzymes with altered substrate profiles. The project is part of the DFG-funded Forschungsgruppe FOR 5596 "Unfolding the potential of S-adenosylmethionine-dependent enzyme chemistry".