Le Guilloux, V., Colliandre, L., Bourg, S., Guenegou, G., Dubois-Chevalier, J. and Morin-Allory, L.
Journal of Chemical Information and Modeling 51 (8) 1762-1774
High-throughput screening (HTS) is a well-established technology that can test up to several million compounds in a few weeks. Despite these appealing capabilities, limited resources and high costs may restrict the number of molecules screened, making diversity analysis a method of choice for designing and prioritizing screening libraries. With a constantly increasing number of molecules available for screening, chemical space has become a key concept for visualizing, analyzing, and comparing chemical libraries. In this first article, we present a new method to build delimited reference chemical subspaces (DRCS). A set of 16 million screening compounds from 73 chemical providers has been gathered, resulting in a database of 6.63 million standardized and unique molecules. These molecules have been used to create three DRCS, each based on a different set of chemical descriptors. For each space, a robust principal component analysis model has been obtained and used to project the molecules onto a reduced two-dimensional space that can be visualized. The distinctive feature of our approach is that each reduced space is delimited by a representative contour that encompasses a very large proportion of the molecules and reflects the overall shape of the space. The methodology is illustrated by mapping and comparing various chemical libraries. Several tools used in these studies are made freely available, thus enabling any user to compute DRCS matching specific requirements.
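The general idea of projecting a reference set onto two principal components, delimiting the occupied region, and then mapping another library into it can be sketched as follows. This is only an illustration under stated assumptions and not the authors' implementation: it uses ordinary PCA computed with NumPy rather than a robust PCA, it approximates the contour by the densest cells of a 2D histogram that together hold a chosen share of the molecules (the 95% coverage and the 50-bin grid are arbitrary choices), and the descriptor matrices are synthetic random data. The function names `fit_pca_2d`, `project`, `density_contour_mask`, and `fraction_inside` are hypothetical.

```python
import numpy as np


def fit_pca_2d(X):
    """Fit an ordinary (non-robust) PCA on autoscaled descriptors, keeping 2 components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # assumes no descriptor is constant (std > 0)
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:2].T


def project(X, model):
    """Project descriptors into the 2D space defined by a fitted model."""
    mean, std, W = model
    return ((X - mean) / std) @ W


def density_contour_mask(scores, bins=50, coverage=0.95):
    """Mark the densest grid cells that together hold at least `coverage` of the points."""
    H, xedges, yedges = np.histogram2d(scores[:, 0], scores[:, 1], bins=bins)
    counts = np.sort(H.ravel())[::-1]          # cell counts, densest first
    cumulative = np.cumsum(counts)
    k = np.searchsorted(cumulative, coverage * H.sum())
    mask = H >= counts[k]                      # ties at the threshold are included
    return mask, xedges, yedges


def fraction_inside(scores, mask, xedges, yedges):
    """Fraction of projected points that fall in cells marked by `mask`."""
    nx, ny = mask.shape
    ix = np.searchsorted(xedges, scores[:, 0], side="right") - 1
    iy = np.searchsorted(yedges, scores[:, 1], side="right") - 1
    # Points lying exactly on the last edge belong to the last bin.
    ix[scores[:, 0] == xedges[-1]] = nx - 1
    iy[scores[:, 1] == yedges[-1]] = ny - 1
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    inside = np.zeros(len(scores), dtype=bool)
    inside[valid] = mask[ix[valid], iy[valid]]
    return inside.mean()


# Synthetic stand-ins for descriptor matrices (rows: molecules, columns: descriptors).
rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 6)) @ rng.normal(size=(6, 6))
library = 0.5 * reference[:500]  # hypothetical, more compact library

model = fit_pca_2d(reference)
ref_scores = project(reference, model)
mask, xedges, yedges = density_contour_mask(ref_scores, bins=50, coverage=0.95)

# Map the second library into the reference space and compare its coverage.
lib_scores = project(library, model)
ref_fraction = fraction_inside(ref_scores, mask, xedges, yedges)
lib_fraction = fraction_inside(lib_scores, mask, xedges, yedges)
print(f"reference inside delimited region: {ref_fraction:.3f}")
print(f"library inside delimited region:   {lib_fraction:.3f}")
```

Fitting the model once on the reference set and reusing it for every other library is what makes the projected libraries comparable: all of them share the same axes and the same delimited region.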