Posts by Collection



Computational prediction of new auxetic materials

Published in Nature Communications, 2017

Auxetics comprise a rare family of materials that manifest negative Poisson’s ratio, which causes an expansion instead of contraction under tension. Most known homogeneously auxetic materials are porous foams or artificial macrostructures and there are few examples of inorganic materials that exhibit this behavior as polycrystalline solids. It is now possible to accelerate the discovery of materials with target properties, such as auxetics, using high-throughput computations, open databases, and efficient search algorithms. Candidates exhibiting features correlating with auxetic behavior were chosen from the set of more than 67 000 materials in the Materials Project database. Poisson’s ratios were derived from the calculated elastic tensor of each material in this reduced set of compounds. We report that this strategy results in the prediction of three previously unidentified homogeneously auxetic materials as well as…

Recommended citation: Dagdelen, J., Montoya, J., de Jong, M., & Persson, K. (2017). Computational prediction of new auxetic materials. Nature Communications, 8(323).
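
The abstract above says Poisson's ratios were derived from each material's calculated elastic tensor. As a minimal sketch of one common way to do this, the snippet below computes an isotropic (polycrystalline) Poisson's ratio from a 6x6 Voigt stiffness matrix using Voigt-Reuss-Hill averaging. The averaging scheme, the function names `vrh_poisson_ratio` and `isotropic_stiffness`, and the toy input are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def vrh_poisson_ratio(c_voigt):
    """Isotropic Poisson's ratio from a 6x6 Voigt stiffness matrix (VRH average)."""
    c = np.asarray(c_voigt, dtype=float)
    s = np.linalg.inv(c)  # compliance matrix in Voigt notation

    c_diag = c[0, 0] + c[1, 1] + c[2, 2]
    c_off = c[0, 1] + c[1, 2] + c[0, 2]
    c_shear = c[3, 3] + c[4, 4] + c[5, 5]
    s_diag = s[0, 0] + s[1, 1] + s[2, 2]
    s_off = s[0, 1] + s[1, 2] + s[0, 2]
    s_shear = s[3, 3] + s[4, 4] + s[5, 5]

    # Voigt (uniform strain) bounds use the stiffness matrix.
    k_voigt = (c_diag + 2 * c_off) / 9
    g_voigt = (c_diag - c_off + 3 * c_shear) / 15

    # Reuss (uniform stress) bounds use the compliance matrix.
    k_reuss = 1 / (s_diag + 2 * s_off)
    g_reuss = 15 / (4 * s_diag - 4 * s_off + 3 * s_shear)

    # Hill average of the two bounds.
    k = (k_voigt + k_reuss) / 2
    g = (g_voigt + g_reuss) / 2

    return (3 * k - 2 * g) / (6 * k + 2 * g)


def isotropic_stiffness(lam, mu):
    """Toy helper (hypothetical): Voigt stiffness of an isotropic solid from Lame constants."""
    c = np.zeros((6, 6))
    c[:3, :3] = lam
    c[np.arange(3), np.arange(3)] = lam + 2 * mu
    c[np.arange(3, 6), np.arange(3, 6)] = mu
    return c


if __name__ == "__main__":
    # A negative first Lame constant gives a negative (auxetic) Poisson's ratio.
    print(vrh_poisson_ratio(isotropic_stiffness(-0.2, 1.0)))
```

In this picture a material is homogeneously auxetic when the averaged value is negative, which requires 3K < 2G for bulk modulus K and shear modulus G.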

Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows

Published in Computational Materials Science, 2017

We introduce atomate, an open-source Python framework for computational materials science simulation, analysis, and design with an emphasis on automation and extensibility. Built on top of open source Python packages already in use by the materials community such as pymatgen, FireWorks, and custodian, atomate provides well-tested workflow templates to compute various materials properties such as electronic band structure, elastic properties, and piezoelectric, dielectric, and ferroelectric properties. Atomate also enables the computational characterization of materials by providing workflows that calculate X-ray absorption (XAS), electron energy loss (EELS), and Raman spectra. One of the major features of atomate is that it provides both fully functional workflows and reusable components that enable one to compose complex materials science workflows that use a diverse set of computational tools…

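Recommended citation: Mathew, K., Montoya, J. H., Faghaninia, A., Dwarakanath, S., Aykol, M., Tang, H., … (2017). Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Computational Materials Science, 139, 140–152.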

Unsupervised word embeddings capture latent knowledge from materials science literature

Published in Nature, 2019

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases, which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing, which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure–property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.

Recommended citation: Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., … Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95–98.
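
The abstract above describes recommending materials for functional applications from word embeddings alone. One simple way such a recommendation could be sketched is to rank candidate material words by the cosine similarity of their vectors to the vector of an application word. The ranking rule, the function names, and the two-dimensional toy vectors below are all assumptions for illustration; real embeddings would come from a model trained on the literature and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates, embeddings):
    """Return (candidate, score) pairs sorted from most to least similar to the query word."""
    query_vec = embeddings[query]
    scored = [(name, cosine_similarity(embeddings[name], query_vec)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy vectors with placeholder names; not real data.
toy_embeddings = {
    "application": [1.0, 0.0],
    "material_A": [0.9, 0.1],
    "material_B": [0.0, 1.0],
    "material_C": [0.5, 0.5],
}

if __name__ == "__main__":
    candidates = ["material_A", "material_B", "material_C"]
    for name, score in rank_by_similarity("application", candidates, toy_embeddings):
        print(f"{name}: {score:.3f}")
```

Candidates near the top of such a ranking that have never been reported for the application would be the ones flagged for further study.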

Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature

Published in Journal of Chemical Information and Modeling, 2019

The number of published materials science articles has increased manyfold over the past few decades. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw text of published articles onto structured database entries that allow for programmatic querying. To this end, we apply text mining with named entity recognition (NER) for large-scale information extraction from the published materials science literature. The NER model is trained to extract summary-level information from materials science documents, including inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier achieves an accuracy (f1) of 87%, and is applied to information extraction from 3.27 …

Recommended citation: Weston, L., Tshitoyan, V., Dagdelen, J., Kononova, O., Trewartha, A., Persson, K. A., … Jain, A. (2019). Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature. Journal of Chemical Information and Modeling, 59(9), 3692–3702.


Review of Energy Generation Trends in the United States.


As part of the 2015 Paris Climate Agreement, the Obama Administration pledged that the United States would reduce its carbon emissions by 25% before 2025. To meet this target, Obama and the US Environmental Protection Agency (EPA) proposed a set of regulations known as the Clean Power Plan (CPP), which laid out new efficiency standards for fossil-fuel power plants and policies designed to expand the country’s low- and zero-carbon power generation capacity. As a result of regulations like the CPP combined with market pressure from inexpensive natural gas, the role of coal in US electricity generation has declined significantly over the last few years. While coal accounted for almost half of all electricity generated in the United States in 2007, less than one-third of US electricity is generated by burning coal today. As the number of US coal power plants decreased from 639 in 2005 to 416 in 2016, the number of people employed in the coal industry also shrank. Whereas about 250,000 people were employed in the coal industry in the 1980s, that figure currently stands at 100,000. It was argued that any jobs lost due to closed coal mines, closed coal-fired power plants, and frozen construction of new plants would be replaced by jobs created through the expansion of wind and solar projects. The EPA estimated that the CPP would create more than 80,000 new jobs to replace those lost in the coal sector during the transition to cleaner power sources, but many people remain skeptical about whether this will in fact be the case. This presentation will look at the roles of coal, wind, and solar power as sources of energy and their impact on carbon emissions and the American jobs landscape in the coming decades.

Computational Investigation of Poisson’s Ratio and its Relationship to Crystal Structure.


A material’s Poisson’s ratio describes the magnitude of the transverse strain that results when it undergoes a tensile strain. In anisotropic crystals, Poisson’s ratio can be generalized through the elastic tensor to give the single-direction Poisson’s ratio for a strain in any specific crystallographic direction. In a few rare cases, a crystalline material can possess extreme directional Poisson’s ratios and exhibit surprising properties as a result. One example of this phenomenon is the appearance of an overall negative average Poisson’s ratio in the polycrystalline bulk. Such materials have been shown to have exceptional mechanical properties and are enabling advancements in technologies like high-precision sensors, tougher ceramics, and impact-resistant composites. While only a few hundred elastic tensors have been measured experimentally to date, new computational methods now enable researchers to calculate the elastic tensors of thousands of materials at a time. In this talk (or poster), I discuss how we can use high-throughput computational screening methods, based on new descriptors for similarity between crystal structures and combined with open materials databases like the Materials Project database, to identify materials that are likely to exhibit unusual mechanical properties (such as negative Poisson’s ratio). I then show how the mechanical properties of these materials can be further investigated by computing the elastic tensor via density functional theory calculations to yield a complete topographical picture of Poisson’s ratio for a given crystalline material.
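
As a rough illustration of the single-direction Poisson's ratio mentioned above, the sketch below evaluates ν(n, m) = −S_ijkl n_i n_j m_k m_l / (S_pqrs n_p n_q n_r n_s), where n is the loading direction, m is a perpendicular transverse direction, and S is the fourth-rank compliance tensor. The function names and the toy cubic stiffness values are hypothetical and chosen only for illustration; directions are given in Cartesian coordinates.

```python
import numpy as np

# Map tensor index pairs (i, j) to Voigt indices 0..5.
VOIGT = {(0, 0): 0, (1, 1): 1, (2, 2): 2,
         (1, 2): 3, (2, 1): 3, (0, 2): 4, (2, 0): 4, (0, 1): 5, (1, 0): 5}


def compliance_tensor(c_voigt):
    """Full 3x3x3x3 compliance tensor from a 6x6 Voigt stiffness matrix."""
    s_voigt = np.linalg.inv(np.asarray(c_voigt, dtype=float))
    s = np.zeros((3, 3, 3, 3))
    for (i, j), a in VOIGT.items():
        for (k, l), b in VOIGT.items():
            # Voigt compliances carry a factor of 2 for each shear index.
            factor = (1 if a < 3 else 2) * (1 if b < 3 else 2)
            s[i, j, k, l] = s_voigt[a, b] / factor
    return s


def directional_poisson(c_voigt, n, m):
    """Poisson's ratio for loading along n, measured along a perpendicular direction m."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    m = np.asarray(m, dtype=float) / np.linalg.norm(m)
    if abs(np.dot(n, m)) > 1e-8:
        raise ValueError("n and m must be perpendicular")
    s = compliance_tensor(c_voigt)
    s_transverse = np.einsum("ijkl,i,j,k,l->", s, n, n, m, m)
    s_longitudinal = np.einsum("ijkl,i,j,k,l->", s, n, n, n, n)
    return -s_transverse / s_longitudinal


def cubic_stiffness(c11, c12, c44):
    """Toy helper (hypothetical): Voigt stiffness matrix of a cubic crystal."""
    c = np.zeros((6, 6))
    c[:3, :3] = c12
    c[np.arange(3), np.arange(3)] = c11
    c[np.arange(3, 6), np.arange(3, 6)] = c44
    return c


if __name__ == "__main__":
    c = cubic_stiffness(3.0, 1.0, 2.0)  # made-up values in arbitrary units
    print(directional_poisson(c, [1, 0, 0], [0, 1, 0]))   # positive along the cube axes
    print(directional_poisson(c, [1, 1, 0], [1, -1, 0]))  # negative for this direction pair
```

Sweeping n and m over the unit sphere with a function like this is one way to build the directional map of Poisson's ratio described in the abstract.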

Natural Language Processing for Materials Discovery


The majority of all materials data is currently scattered across the text, tables, and figures of millions of scientific publications. We present recently developed natural language processing and machine learning techniques to extract materials knowledge by textual analysis of the abstracts of several million journal articles. We describe our use of Word2Vec to map words in our corpus to vector representations, which we then use as inputs to named entity recognition (NER) classifiers to extract materials, structures, properties, applications, synthesis methods, and characterization techniques from the abstracts in our database. With this information, we have created new tools for materials literature review, such as searching within chemical systems, filtering articles by experiment/theory, summarizing the known attributes of a material, or finding materials similar to a target. Furthermore, we report how these techniques can be used not only to automatically summarize existing knowledge, but also to enable new ways of discovering novel materials such as thermoelectrics or ion conductors by revealing previously undiscovered relationships between materials and their properties.

Driving Materials Innovation with Natural Language Processing


The majority of all materials data is currently scattered across the text, tables, and figures of millions of scientific publications. In my talk, I will present the work of our team at Lawrence Berkeley National Laboratory on the use of natural language processing (NLP) and machine learning techniques to extract and discover materials knowledge through textual analysis of the abstracts of several million journal articles. With this data we are exploring new avenues for materials discovery and design, such as how functional materials like thermoelectrics can be identified by using only unsupervised word embeddings for materials. To date, we have used advanced techniques for named entity recognition to extract more than 100 million mentions of materials, structures, properties, applications, synthesis methods, and characterization techniques from our database of over 3 million materials science abstracts. With this data, we are developing machine learning tools for autonomously building databases of materials-properties data extracted from unstructured materials text. Finally, my talk will also feature a sneak peek into the public-facing website and API we have developed to make this data freely available to the materials research community.


Introduction to Computational Materials Science

Graduate Course, University of California, Berkeley, Department of Materials Science & Engineering, 2019

Introduction to computational materials science. Development of atomic scale simulations for materials science applications. Application of kinetic Monte Carlo, molecular dynamics, and total energy techniques to the modeling of surface diffusion processes, elastic constants, ideal shear strengths, and defect properties. Introduction to simple numerical methods for solving coupled differential equations and for studying correlations.