npj Comput. Mater.: Cluster-based descriptors enable extrapolation with machine learning





Fig. 1 Interpolation results of XGB, the best prediction model in the interpolation problems, for predicting the thermoelectric properties of the 5205 observations in the ESTM dataset.



Fig. 2 The overall process of SIMD to generate the material representations for input tabular data of the materials.



Fig. 3 The overall process of SIMD to generate the system identified features of the input chemical composition in the transfer learning environments.

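The team of Gyoung S. Na and Prof. Hyunju Chang at the Korea Research Institute of Chemical Technology has broken new ground on three fronts: public datasets, the application of machine learning methods, and the extrapolation problem. They first built a public dataset of 5205 experimental observations, covering 880 unique thermoelectric materials and five experimentally measured thermoelectric properties, including the figure of merit (ZT).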

Fig. 4 Confusion matrices of XGBd and SXGBd in the high-throughput screening to discover high-ZT (≥1.5) thermoelectric materials from unknown material groups.

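They then compared the predictions of five machine learning algorithms and found that XGB achieved R2 values above 0.9 for four of the thermoelectric properties. The same comparison exposed how poorly the models extrapolate (R2 below 0.2). To address this, they proposed a new material descriptor. Materials in the dataset that have different dopants but similar host compositions are identified and grouped into one cluster, and the relevant physical and chemical information is extracted from each cluster to form the system-identified material descriptor (SIMD), which is then fed to the machine learning model as input. With this descriptor, false positives in high-throughput screening fell by more than 50%, and in extrapolative prediction of ZT for thermoelectric materials that took no part in training, the R2 value rose markedly from 0.13 to 0.71.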
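The clustering idea described above can be illustrated with a minimal Python sketch. It is not the authors' SIMD implementation: the dopant threshold, the function names, the choice of cluster statistics, and the toy ZT values are all assumptions made for illustration. The sketch treats any element with a small molar amount as a dopant, uses the remaining elements as the host-system key, and attaches simple statistics of each cluster to its members.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev

# Assumption: elements with a molar amount below this value count as dopants.
DOPANT_THRESHOLD = 0.1


def parse_formula(formula):
    """Parse a flat formula such as 'Bi0.5Sb1.5Te3' into {element: amount}."""
    amounts = defaultdict(float)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[element] += float(number) if number else 1.0
    return dict(amounts)


def host_key(formula):
    """Return the host system as a sorted tuple of the non-dopant elements."""
    composition = parse_formula(formula)
    return tuple(sorted(el for el, amt in composition.items()
                        if amt >= DOPANT_THRESHOLD))


def cluster_features(records):
    """Group (formula, zt) records by host system and attach cluster statistics."""
    clusters = defaultdict(list)
    for formula, zt in records:
        clusters[host_key(formula)].append(zt)

    features = {}
    for formula, _ in records:
        key = host_key(formula)
        zts = clusters[key]
        features[formula] = {
            "host": key,
            "cluster_size": len(zts),
            "cluster_mean_zt": mean(zts),
            "cluster_std_zt": pstdev(zts),
        }
    return features


# Toy records: the ZT values are invented placeholders, not measurements.
toy_records = [
    ("Bi0.5Sb1.5Te3", 0.8),
    ("Ag0.01Bi0.5Sb1.5Te3", 1.0),
    ("Ti0.02Bi0.5Sb1.5Te3", 1.2),
    ("PbTe", 0.5),
]
toy_features = cluster_features(toy_records)
```

In this toy setup the undoped, Ag-doped, and Ti-doped Bi0.5Sb1.5Te3 entries share one host key, so all three receive the same cluster-level features. In a real workflow, such statistics would have to be computed from the training data only, so that information about held-out materials does not leak into the descriptor.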


Fig. 6 Experimentally measured and predicted ZTs of Ag- and Ti-doped Bi0.5Sb1.5Te3 materials.


The paper was recently published in npj Computational Materials 8:214 (2022). Its English title and abstract are given below; the paper PDF is freely accessible.

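Written by: Haohong Chen (陈昊鸿), Associate Researcher at the Shanghai Institute of Ceramics, Chinese Academy of Sciences, working on transparent optical functional materials and related computational simulation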

A public database of thermoelectric materials and system-identified material representation for data-driven discovery

Gyoung S. Na & Hyunju Chang 

Thermoelectric materials have received much attention as energy harvesting devices and power generators. However, discovering novel high-performance thermoelectric materials is challenging due to the structural diversity and complexity of the thermoelectric materials containing alloys and dopants. For the efficient data-driven discovery of novel thermoelectric materials, we constructed a public dataset that contains experimentally synthesized thermoelectric materials and their experimental thermoelectric properties. For the collected dataset, we were able to construct prediction models that achieved R2-scores greater than 0.9 in the regression problems to predict the experimentally measured thermoelectric properties from the chemical compositions of the materials. Furthermore, we devised a material descriptor for the chemical compositions of the materials to improve the extrapolation capabilities of machine learning methods. Based on transfer learning with the proposed material descriptor, we significantly improved the R2-score from 0.13 to 0.71 in predicting experimental ZTs of the materials from completely unexplored material groups.
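The gap between the interpolation and extrapolation scores comes from how the test set is chosen. A minimal sketch of the extrapolation setting, assuming each observation carries a material-group label, is to hold out one whole group at a time and score predictions on it with the standard coefficient of determination, R2 = 1 - SS_res / SS_tot. The function names and the grouping scheme below are illustrative assumptions, not the paper's evaluation code.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all values in y_true are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_group_out_splits(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group.

    Every observation of the held-out group goes to the test set, so the
    model never sees that material group during training.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for group, test_indices in by_group.items():
        train_indices = [i for i in range(len(groups)) if groups[i] != group]
        yield group, train_indices, test_indices
```

A random train/test split, by contrast, usually places differently doped variants of the same host material on both sides of the split, which is the interpolation setting. Holding out whole groups removes that overlap and therefore gives a stricter test.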


