Y. An, T.F. Hughes, D.J. Giesen, A. Chandrasekaran, M.A.F. Afzal, H.S. Kawk, K. Leswing, K. Marshall, T. Robertson, M.D. Halls
Schrödinger, Inc.,
United States
Keywords: high-throughput simulation, machine learning, reinforcement learning, OLED, HTMs, optoelectronic properties
Summary:
Hole-transporting materials (HTMs) are a critical class of organic semiconductors, required for the fabrication of a variety of state-of-the-art display and semiconductor devices. Although there are a number of ways to generate ideas for new compounds in silico, such as exhaustive R-group enumeration, core hopping, and early structure-based de novo design algorithms, they share a common limitation: the generated compounds often lack the desired properties, such as activity, stability, and others that depend on the application area. A recent advance in drug discovery, in which Recurrent Neural Networks (RNNs) operating on SMILES representations of molecules were used to generate predicted actives against dopamine receptor type 2, has pointed to a new path for molecular de novo design that targets specific properties. In this work, we started with 370 known OLED materials collected from commercial catalogs and used a Genetic Algorithm (GA) to generate a library of over a million new compounds. This library serves as the unexplored chemical space for novel HTMs. From it, 10,000 compounds were sampled for optoelectronic property calculations using Quantum Mechanics (QM) tools from Schrödinger’s Materials Science Suite. The computed values were used to train quantitative structure-property relationship (QSPR) models, one for each property. By combining a prior network trained on the potential HTM chemical space with a QSPR model, the de novo design algorithm generates compounds with predicted property values. QM optoelectronic property calculations are then used to validate the final generated novel compounds. We hope to demonstrate that, by carefully defining the applicable chemical space and providing accurate physics-based property calculations for a large number of compounds, our data-driven approach, aided by advanced machine learning techniques, can be extended to many different domains to systematically discover novel materials with targeted properties.
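The summary does not state how the prior network and the QSPR model are combined. As a minimal sketch, assuming an augmented-likelihood scheme of the kind used in the RNN-based drug discovery work mentioned above, the QSPR prediction can be turned into a score that shifts the prior's log-likelihood for a generated SMILES string, and the generating agent is then trained toward that shifted value. All function names, the Gaussian scoring shape, and the weight sigma below are illustrative assumptions, not details taken from this work.

```python
import math


def scale_score(predicted, target, tolerance):
    """Map a QSPR-predicted property value to a score in (0, 1].

    Assumption: a Gaussian-shaped desirability centered on the target value;
    the score is 1.0 when the prediction equals the target.
    """
    return math.exp(-0.5 * ((predicted - target) / tolerance) ** 2)


def augmented_log_likelihood(prior_log_likelihood, score, sigma):
    """Shift the prior's log-likelihood of a SMILES string by sigma * score.

    sigma (an assumed hyperparameter) controls how strongly the property
    score outweighs the prior's notion of plausible chemistry.
    """
    return prior_log_likelihood + sigma * score


def agent_loss(agent_log_likelihood, prior_log_likelihood, score, sigma):
    """Squared difference between the augmented and the agent log-likelihood.

    Minimizing this pulls the agent toward sequences that the prior finds
    plausible and that the QSPR model scores well.
    """
    target = augmented_log_likelihood(prior_log_likelihood, score, sigma)
    return (target - agent_log_likelihood) ** 2
```

In this sketch a compound whose predicted property is far from the target receives a score near zero, so the agent is simply pulled back toward the prior, while an on-target compound has its likelihood raised by up to sigma.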