The project aims to develop, validate and commercialize new physicochemical and metabolic prediction approaches based on LINGO methods (LINGO, an Efficient Holographic Text Based Method to Calculate Biophysical Properties and Intermolecular Similarities, J. Chem. Inf. Model. 2005, 45, 386-393) and to integrate them into a new lead discovery software package. The package will be delivered to the pharmaceutical, biotechnology, chemical and related industries as desktop and client-server software applications and as Software as a Service. LINGOs have been shown to predict phase partition data (logP, logS), pharmacological parameters (logBB), multi-target activity and other ADME-Tox relevant properties.
In addition, the consortium has the ability to acquire new experimental datasets and to use them to develop novel prediction models. These models will allow the prediction of properties for which no prediction tools are currently available on the market and which are of very high commercial value, such as:
- Blood-Brain Partition Coefficient
- Free Brain Fraction
- Human Serum Albumin Affinity Constant
- Alpha-1-Acid Glycoprotein Affinity Constant
- Membrane Affinity Constant
- Percent Plasma Protein Binding
- Free Plasma Fraction
The architecture to be used in the development will allow searches and property calculations to run at very high speed, so that libraries containing millions of compounds can be handled more effectively. Unlike the commonly used graph-based fragment-decomposition methods, LINGOs use overlapping text fragments of the linear SMILES string as descriptors. The subgraph-matching problems underlying graph-based algorithms are NP-complete and therefore slow; LINGO methods linearise the problem, remove this bottleneck and do not require the time-consuming generation of 2D or 3D chemical structures. LINGO-based compound similarity measurements retrieve bioisosteres, i.e., compounds with similar pharmacological profiles, with very high precision. LINGO methods have been proven to work for very large datasets and large numbers of queries. They provide a straightforward way of defining compound-intrinsic descriptors, or potential information attributes, that are compatible both with state-of-the-art large-scale statistical analyses, including PLS, decision and regression trees and random forests, and with neighborhood-based predictions for small datasets. LINGO methods can thus generate prediction models from large-scale data (PLS, trees, forests) or from just a single reference compound (neighborhood-based). LINGO methods are perfectly suited to machine learning: they easily incorporate new data and allow both local and global improvement of the models.

There is a real need in the pharmaceutical, biotechnology, chemical and related industries for new predictive tools that have been evaluated by experts against relevant experimental data. The structure and nature of the collaboration will allow all the theoretical advances postulated to be translated into easy-to-use, easily modified software applications and to be confirmed through experimentation.
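To make the text-fragment descriptor concrete, the sketch below splits a SMILES string into overlapping four-character LINGOs, compares two compounds with a LINGOsim-style similarity (for each LINGO in the union of both profiles, 1 - |N_A - N_B| / (N_A + N_B), averaged over the union, following the formulation commonly reported for the cited 2005 paper), and makes a neighborhood-based prediction from the most similar reference compounds. It is a minimal sketch, not the project's implementation. The fragment length q = 4, the normalisation (Cl to L, Br to R, every digit to 0 so that ring-closure numbering does not matter), the similarity-weighted mean and all function names are assumptions made for illustration. Note that no 2D or 3D structure is ever built; everything operates on the SMILES text.

```python
import re
from collections import Counter


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping q-character substrings (LINGOs) of a SMILES string."""
    # Assumed normalisation: two-letter halogens become single characters and every
    # digit becomes '0', so ring-closure numbering does not change the profile.
    # (This simple rule also rewrites isotope and charge digits.)
    s = re.sub(r"\d", "0", smiles.replace("Cl", "L").replace("Br", "R"))
    if len(s) < q:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def lingo_similarity(a: Counter, b: Counter) -> float:
    """LINGOsim-style score in [0, 1]: mean of 1 - |Na - Nb| / (Na + Nb) over the union of LINGOs."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    # Counter returns 0 for missing keys, and Na + Nb > 0 for every key in the union.
    return sum(1.0 - abs(a[k] - b[k]) / (a[k] + b[k]) for k in union) / len(union)


def predict_from_neighbors(query: str, references, k: int = 3) -> float:
    """Similarity-weighted mean property of the k reference compounds most similar to the query.

    references: iterable of (smiles, property_value) pairs (hypothetical input format).
    """
    qp = lingo_profile(query)
    scored = sorted(
        ((lingo_similarity(qp, lingo_profile(smi)), value) for smi, value in references),
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in scored)
    if total_weight == 0:
        raise ValueError("no reference compound shares a LINGO with the query")
    return sum(sim * value for sim, value in scored) / total_weight
```

For example, phenol (`c1ccccc1O`) and toluene (`Cc1ccccc1`) share four of their six distinct LINGOs with equal counts, so `lingo_similarity` scores them at about 0.67. With `k=1` and a single reference compound, `predict_from_neighbors` simply returns that compound's property value, which corresponds to the single-reference case described above; a larger `k` averages over a neighborhood.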
While some of the models require supercomputer clusters (available at Origenis), possibly in combination with GPU methods, for their generation, desktop computers are sufficient to apply the models to large query sets. The properties to be incorporated into the predictive package range from molecular descriptors, such as the permeability of molecules, to more complex interactions, such as susceptibility to certain types of metabolism. Ultimately, end users will be offered a single environment in which they will be able to predict, evaluate and redesign the molecules under investigation.