Review of Artificial Intelligence and Machine/Deep Learning Technologies: Constraints, Opportunities, State of the Art, and Challenges

  • Hugo G. Machado, Universidade Federal de Goiás (UFG)
  • Kleber Mundim
Keywords: machine learning, chemistry, artificial neural networks


The use of machine learning algorithms has grown exponentially in scientific research, driven largely by recent advances in deep learning techniques. Here, we discuss applications of these algorithms in chemistry and other areas of science, with a focus on artificial neural networks. These networks can automate every stage of the machine learning pipeline, including the classification and prediction of chemical properties. We provide a historical overview of the development of these algorithms, from the 1940s to the present day, highlighting applications in areas such as drug development, materials science, and autonomous analysis techniques. Key aspects of these algorithms are discussed in detail. In addition, we address the process of molecular vectorization, essential for handling chemical data, and examine selected molecular featurizers in particular. In conclusion, we offer a comprehensive view of the applications of machine learning algorithms in chemistry, together with their limitations and the challenges associated with their implementation, underscoring their transformative potential when used responsibly and ethically.
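The molecular vectorization step mentioned above can be illustrated with a minimal toy sketch: mapping a SMILES string to a fixed-length vector of element counts. This is an assumption-laden simplification for illustration only; real featurizers used in the literature (e.g., extended-connectivity fingerprints or Coulomb matrices) encode far richer structural information. The element vocabulary and helper name `featurize` are hypothetical choices, not part of the original article.

```python
import re
from collections import Counter

# Toy "molecular vectorization" (illustrative only): turn a SMILES
# string into a fixed-length numeric vector of element counts, i.e.
# something a neural network could consume. Aromatic (lowercase)
# atoms and most two-letter elements are deliberately ignored here.

ELEMENTS = ["C", "N", "O", "S", "Cl", "F"]  # hypothetical fixed vocabulary

def featurize(smiles: str) -> list[int]:
    # Match two-letter symbols first so "Cl" is not miscounted as "C".
    tokens = re.findall(r"Cl|Br|[CNOSF]", smiles)
    counts = Counter(tokens)
    return [counts.get(el, 0) for el in ELEMENTS]

# Ethanol (CCO): two carbons, one oxygen
print(featurize("CCO"))          # [2, 0, 1, 0, 0, 0]
# Chloroform (C(Cl)(Cl)Cl): one carbon, three chlorines
print(featurize("C(Cl)(Cl)Cl"))  # [1, 0, 0, 0, 3, 0]
```

Fixed-length vectors like these are what allow heterogeneous molecules to share one network input layer; the featurizers discussed in the review serve the same role with much more chemical fidelity.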


How to Cite
Machado, H. G., & Mundim, K. (2023). Revisão das Tecnologias de Inteligência Artificial e Machine/Deep Learning: Restrições, Oportunidades, Estado da Arte e Desafios. Revista Processos Químicos, 16(32), 9-22.