2.3 Summary

This chapter began with a brief account of the basic concepts of artificial intelligence and its historical development. It then turned to machine learning, which in recent years has made substantial progress in both theoretical research and engineering applications, introducing its fundamental concepts and elements as well as the main types of learning algorithms, before discussing the principal machine learning models in greater depth. In covering these models, the chapter strove to span the field as comprehensively as possible while focusing on the models that have had the greatest practical impact at each stage of machine learning's development, such as support vector machines, neural networks, recurrent neural networks, generative adversarial networks, and deep reinforcement learning. From this discussion one can see how AI technologies, exemplified by machine learning and knowledge graphs, have developed and found wide application, and that behind these technologies lies the long-term effort of researchers in the field. At the same time, we should be keenly aware that today's AI technologies, centered on deep learning, are not yet comparable to "human intelligence." Deep learning differs greatly from the way humans learn and requires large amounts of labeled data. Despite its great successes, deep learning is not yet a general-purpose intelligence capable of solving a broad range of complex problems; rather, it is a collection of techniques, each of which solves a single problem.

(俞祝良)
