Research Article

Semisupervised Gaussian Process for Automated Enzyme Search

School of Chemistry, University of Manchester, Manchester M13 9PL, U.K.
Manchester Institute of Biotechnology, University of Manchester, Manchester M13 9PL, U.K.
iSSB, Institute of Systems and Synthetic Biology, CNRS, University of Évry-Val-d’Essonne, 91000 Évry, France
SYNBIOCHEM Centre, Manchester Institute of Biotechnology, University of Manchester, Manchester M13 9PL, U.K.
MICALIS Institute, INRA, 78352 Jouy en Jossas, France
ACS Synth. Biol., Article ASAP
DOI: 10.1021/acssynbio.5b00294
Publication Date (Web): March 23, 2016
Copyright © 2016 American Chemical Society

Abstract

Synthetic biology is today harnessing the design of novel and greener biosynthesis routes for the production of added-value chemicals and natural products. The design of novel pathways often requires a detailed selection of enzyme sequences to import into the chassis at each of the reaction steps. To address such design requirements in an automated way, we present here a tool for exploring the space of enzymatic reactions. Given a reaction and an enzyme, the tool provides a probability estimate that the enzyme catalyzes the reaction. Our tool first considers the similarity of a reaction to known biochemical reactions with respect to signatures around their reaction centers. Signatures are defined based on chemical transformation rules by using extended connectivity fingerprint descriptors. A semisupervised Gaussian process model associated with the similar known reactions then provides the probability estimate. The Gaussian process model uses information about both the reaction and the enzyme in providing the estimate. These estimates were validated experimentally by applying the Gaussian process model to a newly identified metabolite in Escherichia coli in order to search for the enzymes catalyzing its associated reactions. Furthermore, we show with several pathway design examples how the ability to assign probability estimates to enzymatic reactions can assist in bioengineering applications, providing experimental validation for our proposed approach. To the best of our knowledge, the proposed approach is the first application of Gaussian processes dealing with biological sequences and chemicals. The use of a semisupervised Gaussian process framework is also novel in the context of machine learning applied to bioinformatics. However, the ability of an enzyme to catalyze a reaction depends on the affinity between the substrates of the reaction and the enzyme. This affinity is generally quantified by the Michaelis constant KM. Therefore, we also demonstrate the use of Gaussian process regression to predict KM given a substrate-enzyme pair.
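
To make the regression idea concrete, the following is a minimal sketch of Gaussian process regression over fingerprint-like descriptors. It is not the authors' implementation. The Tanimoto kernel, the zero-mean prior, the noise level, the representation of each substrate-enzyme pair as a single set of integer feature indices, and all toy values are assumptions chosen purely for illustration; the names tanimoto, cholesky, solve_cholesky and gp_predict are hypothetical helpers.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature indices (assumed kernel)."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise=0.1):
    """Posterior mean and variance of a zero-mean GP at one test point."""
    n = len(train_x)
    # Kernel matrix over training points, with noise variance on the diagonal.
    K = [[tanimoto(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_cholesky(L, train_y)
    k_star = [tanimoto(test_x, x) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_cholesky(L, k_star)
    var = tanimoto(test_x, test_x) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


# Toy data (invented): each set stands in for the combined descriptors of one
# substrate-enzyme pair; each target stands in for a centered log10(KM) value.
train_x = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
train_y = [-1.0, -0.5, 2.0]

mean, var = gp_predict(train_x, train_y, {1, 2, 4})
print("predicted value:", mean, "predictive variance:", var)
```

In this sketch a query that shares no features with any training example falls back to the prior (mean 0, variance 1), which is one reason a probabilistic model is attractive here: the predictive variance signals when a substrate-enzyme pair lies far from the known data.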

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssynbio.5b00294.

  • Figure S1. Mean receiver operating characteristic curves for the 6 individual clusters formed using the top level E.C. number. Figure S2. Mean receiver operating characteristic curves for the 12 individual clusters formed using DBSCAN. (PDF)

Received 21 December 2015
Published online 23 March 2016