A Review of Artificial Intelligence and Machine/Deep Learning Technologies: Constraints, Opportunities, State of the Art, and Challenges
Abstract
The use of machine learning algorithms has grown exponentially in scientific research, driven in particular by recent advances in deep learning techniques. Here, we discuss applications of these algorithms in chemistry and in other areas of science, with a focus on artificial neural networks. These networks can automate every stage of the machine learning workflow, including the classification and prediction of chemical properties. We provide a historical overview of the development of these algorithms, from the 1940s to the present day, highlighting applications in areas such as drug development, materials science, and autonomous analysis techniques. Key aspects of these algorithms are discussed in detail. In addition, we address the process of molecular vectorization, which is essential for handling chemical data, and examine some molecular featurizers in particular. In conclusion, we offer a comprehensive overview of the applications of machine learning algorithms in chemistry, together with their limitations and the challenges associated with their implementation, highlighting their transformative potential when used responsibly and ethically.
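To make the molecular vectorization step mentioned above concrete, the minimal sketch below converts a molecule into a fixed-length numerical vector using an extended-connectivity (Morgan) fingerprint, one common class of molecular featurizer. It assumes the open-source RDKit library is available; the SMILES string, fingerprint radius, and bit-vector length are illustrative choices, not values taken from this review.

```python
# A minimal sketch of molecular vectorization with RDKit (assumed installed).
# A Morgan fingerprint maps a molecule to a fixed-length bit vector that a
# neural network can consume as input features.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def vectorize(smiles: str, radius: int = 2, n_bits: int = 2048) -> np.ndarray:
    """Convert a SMILES string into a Morgan-fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp, dtype=np.float32)

# Illustrative usage: featurize aspirin before training a classifier.
x = vectorize("CC(=O)Oc1ccccc1C(=O)O")
print(x.shape, int(x.sum()))  # (2048,) and the number of set bits
```

The radius controls how far each atom's circular environment extends, and the bit-vector length trades off collision rate against input size; both are tunable hyperparameters of the featurization rather than fixed constants.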