A classic Microsoft-recommended case: Batch scoring of Spark models

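Azure's comeback helped Microsoft overtake Apple in the fourth quarter of 2018 and reclaim the title of the world's most valuable company. The king returns! Satya won back Wall Street's heart with "cloud first", and Microsoft's stock has quadrupled since he took over as CEO. Of course, it is also Azure that put Satya on a pedestal!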

Azure Databricks is everywhere in Azure's recommended cases. Please don't ask me what Databricks is! Tonight, I'm sharing one AI architecture: Batch scoring of Spark models on Azure Databricks

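This scenario is very common. An asset-heavy industrial business needs to cut production costs and increase uptime, which means reducing unexpected mechanical failures. So it can take the IoT data collected from its machines and build a machine learning model that predicts when maintenance is needed. That way, maintenance and repairs happen before a failure occurs, and the equipment keeps running (and making money) for longer! The whole flow runs on Databricks/Spark!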

- Ingest: Ingest the data from the external data store onto an Azure Databricks data store.

- Train: Train a machine learning model by transforming the data into a training data set, then building a Spark MLlib model. MLlib consists of the most common machine learning algorithms and utilities, optimized to take advantage of Spark's data scalability capabilities.

- Score: Apply the trained model to predict (classify) component failures by transforming the data into a scoring data set. Score the data with the Spark MLlib model.

- Store results: Store results on the Databricks data store for post-processing consumption.
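
To make the four stages above concrete, here is a minimal sketch of the same ingest / train / score / store flow. It is not the code from the Azure reference notebooks: it uses plain Python with a one-feature threshold classifier as a stand-in for a Spark MLlib model, so it runs anywhere without a cluster. The sample telemetry, the column names (`machine_id`, `vibration`, `failed`), and the function names are all hypothetical.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical labeled telemetry (failed: 1 = the component failed).
RAW_CSV = """machine_id,vibration,failed
m1,0.20,0
m2,0.30,0
m3,0.90,1
m4,1.10,1
"""

# Hypothetical new, unlabeled telemetry to score in batch.
NEW_CSV = """machine_id,vibration
m5,0.25
m6,1.00
"""


def ingest(text):
    """Stage 1: load raw records from the external store (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def train(rows):
    """Stage 2: fit a threshold halfway between the two class means.

    Stand-in for building a Spark MLlib model on a training data set.
    """
    healthy = [float(r["vibration"]) for r in rows if r["failed"] == "0"]
    failed = [float(r["vibration"]) for r in rows if r["failed"] == "1"]
    return {"threshold": (mean(healthy) + mean(failed)) / 2}


def score(model, rows):
    """Stage 3: apply the trained model to every record in the batch."""
    return [
        {
            "machine_id": r["machine_id"],
            "predicted_failure": int(float(r["vibration"]) > model["threshold"]),
        }
        for r in rows
    ]


def store(results):
    """Stage 4: persist results for later consumption (here, a JSON string)."""
    return json.dumps(results)


model = train(ingest(RAW_CSV))
stored = store(score(model, ingest(NEW_CSV)))
print(stored)
```

On Databricks, each function would presumably become a Spark DataFrame or MLlib step, but the shape of the flow stays the same: training and scoring are separate passes, so the trained model can be reused on every new batch.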

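The GitHub repo has Jupyter/IPython notebooks: just open them, tweak a few things, and they run! How this author must love the red bricks of the "brick factory"! A whole screen full of bricks, a tribute to the classic tank game Battle City!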