A Coding Implementation for Building and Analyzing Crystal Structures Using Pymatgen for Symmetry Analysis, Phase Diagrams, Surface Generation, and Materials Project Integration

In this tutorial, we explore the pymatgen library for computational materials science in Python. We begin by constructing crystal structures for silicon, sodium chloride, and a LiFePO₄-like material, and then inspect their lattice parameters, densities, and compositions. We analyze symmetry using space-group detection, examine atomic coordination environments, and apply oxidation-state decorations to capture the structures' chemistry. We then generate supercells, perturb atomic positions, compute distance matrices, and cut a surface slab. Along the way, we simulate X-ray diffraction patterns, construct a simple phase diagram, and show how disordered alloy structures can be approximated by ordered configurations. Finally, we extend the workflow to molecule analysis, CIF export, and optional querying of the Materials Project database, illustrating how pymatgen can serve as a toolkit for materials modeling and data analysis.

!pip -q install pymatgen mp-api spglib

import os
import json
import warnings
import sys

warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from pymatgen.core import Lattice, Structure, Molecule
from pymatgen.core.surface import SlabGenerator
from pymatgen.core.composition import Composition
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
from pymatgen.analysis.local_env import CrystalNN
from pymatgen.analysis.diffraction.xrd import XRDCalculator
from pymatgen.analysis.phase_diagram import PDEntry, PhaseDiagram
from pymatgen.transformations.standard_transformations import (
    SupercellTransformation,
    OrderDisorderedStructureTransformation,
    OxidationStateDecorationTransformation,
)
from pymatgen.io.cif import CifWriter

print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)

# Some pymatgen releases do not expose __version__, so fall back to package metadata.
try:
    import pymatgen
    print("pymatgen:", pymatgen.__version__)
except Exception:
    import importlib.metadata
    print("pymatgen:", importlib.metadata.version("pymatgen"))

def line():
    print("=" * 100)

def header(title):
    line()
    print(title)
    line()

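header("1. BUILD EXAMPLE STRUCTURES")

# Two Si atoms on a simple cubic lattice: a compact toy cell,
# not the full 8-atom diamond-cubic (Fd-3m) conventional cell.
si = Structure(
    Lattice.cubic(5.431),
    ["Si", "Si"],
    [[0, 0, 0], [0.25, 0.25, 0.25]],
)

# Na at the corner and Cl at the body centre of a simple cubic cell
# (a CsCl-type arrangement), used here as a small stand-in for rock salt.
nacl = Structure(
    Lattice.cubic(5.64),
    ["Na", "Cl"],
    [[0, 0, 0], [0.5, 0.5, 0.5]],
)

# One formula unit on an orthorhombic lattice with approximate positions;
# real olivine LiFePO4 has four formula units per cell.
li_fe_po4 = Structure(
    Lattice.orthorhombic(10.33, 6.01, 4.69),
    ["Li", "Fe", "P", "O", "O", "O", "O"],
    [
        [0.0, 0.0, 0.0],
        [0.5, 0.5, 0.5],
        [0.1, 0.25, 0.2],
        [0.22, 0.04, 0.28],
        [0.72, 0.54, 0.78],
        [0.31, 0.66, 0.12],
        [0.81, 0.16, 0.62],
    ],
)

for name, s in [("Si", si), ("NaCl", nacl), ("LiFePO4-like", li_fe_po4)]:
    print(f"{name}: formula={s.composition.formula}, sites={len(s)}, volume={s.volume:.3f} Å^3")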

We begin by installing the required libraries. We initialize the environment, verify package versions, and define helper functions to organize the output. We then construct example crystal structures such as silicon, NaCl, and a LiFePO₄-like structure and print their basic structural properties.
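To see where the printed densities come from, the arithmetic can be reproduced by hand: density is the mass of the atoms in the cell divided by the cell volume. The sketch below is plain Python, independent of pymatgen; the helper name `cell_density` and the rounded atomic mass are assumptions made for illustration. It compares the full 8-atom diamond-cubic silicon cell with a 2-atom cubic cell of the same edge length, like the one built above.

```python
# Hypothetical helper for illustration; not part of pymatgen.
AMU_IN_GRAMS = 1.66053907e-24   # grams per atomic mass unit
ANGSTROM3_IN_CM3 = 1.0e-24      # cm^3 per cubic angstrom
SI_ATOMIC_MASS = 28.0855        # amu, rounded standard atomic weight of Si


def cell_density(n_atoms, atomic_mass_amu, volume_A3):
    """Return the density in g/cm^3 of n identical atoms in a cell of the given volume."""
    mass_g = n_atoms * atomic_mass_amu * AMU_IN_GRAMS
    return mass_g / (volume_A3 * ANGSTROM3_IN_CM3)


a = 5.431            # cubic cell edge in angstrom, as used in the tutorial
volume = a ** 3      # cell volume in cubic angstrom

full_cell = cell_density(8, SI_ATOMIC_MASS, volume)  # diamond-cubic conventional cell
toy_cell = cell_density(2, SI_ATOMIC_MASS, volume)   # two atoms in the same cube

print(f"8-atom cell: {full_cell:.3f} g/cm^3")
print(f"2-atom cell: {toy_cell:.3f} g/cm^3")
```

The 8-atom value lands close to the commonly quoted 2.33 g/cm^3 for silicon, while the 2-atom cube gives a quarter of that, so a density reported for a compact cell of this kind is best read as a property of that simplified cell.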

header("2. BASIC INTROSPECTION")

for name, s in [("Si", si), ("NaCl", nacl), ("LiFePO4-like", li_fe_po4)]:
    print(f"\n{name}")
    print("Reduced formula:", s.composition.reduced_formula)
    print("Density:", round(s.density, 4), "g/cm^3")
    print("Lattice parameters (a, b, c):", tuple(round(x, 4) for x in s.lattice.abc))
    print("Angles (alpha, beta, gamma):", tuple(round(x, 4) for x in s.lattice.angles))
    print("First site:", s[0])

header("3. SPACE GROUP AND SYMMETRY ANALYSIS")

for name, s in [("Si", si), ("NaCl", nacl), ("LiFePO4-like", li_fe_po4)]:
    # symprec is the distance tolerance (in angstrom) used when matching symmetry operations.
    sga = SpacegroupAnalyzer(s, symprec=0.1)
    print(f"\n{name}")
    print("Space group symbol:", sga.get_space_group_symbol())
    print("Space group number:", sga.get_space_group_number())
    print("Crystal system:", sga.get_crystal_system())
    print("Lattice type:", sga.get_lattice_type())
    print("Primitive sites:", len(sga.find_primitive()))
    print("Conventional sites:", len(sga.get_conventional_standard_structure()))

We examine the structures in greater detail by inspecting their formulas, densities, lattice parameters, and site information. We then perform a symmetry analysis using SpacegroupAnalyzer to determine space-group symbols, crystal systems, and lattice types. Through this step, we gain insight into the crystallographic symmetry and structural characteristics of the materials.
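The crystal system reported alongside a space group follows directly from the space-group number, because the International Tables numbering groups the 230 space groups into seven contiguous ranges. The sketch below shows that lookup in plain Python; `crystal_system_from_number` is a hypothetical helper written for illustration, not a pymatgen function.

```python
# Hypothetical helper; the ranges follow the International Tables numbering.
CRYSTAL_SYSTEM_RANGES = [
    (1, 2, "triclinic"),
    (3, 15, "monoclinic"),
    (16, 74, "orthorhombic"),
    (75, 142, "tetragonal"),
    (143, 167, "trigonal"),
    (168, 194, "hexagonal"),
    (195, 230, "cubic"),
]


def crystal_system_from_number(sg_number):
    """Map an international space-group number (1-230) to its crystal system."""
    for first, last, name in CRYSTAL_SYSTEM_RANGES:
        if first <= sg_number <= last:
            return name
    raise ValueError(f"space-group number out of range: {sg_number}")


# 62 (Pnma) is olivine LiFePO4, 225 (Fm-3m) is rock salt, 227 (Fd-3m) is diamond.
for number in (2, 62, 225, 227):
    print(number, crystal_system_from_number(number))
```

Comparing the number printed by `get_space_group_number()` against these reference values is a quick way to check whether a hand-built cell really has the symmetry of the material it is meant to represent.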

header("4. LOCAL ENVIRONMENT WITH CRYSTALNN")

cnn = CrystalNN()

def summarize_neighbors(structure, label):
    print(f"\n{label}")
    # Report at most the first four sites to keep the output short.
    for i, site in enumerate(structure[:min(4, len(structure))]):
        try:
            nn_info = cnn.get_nn_info(structure, i)
            species = [str(x["site"].specie) for x in nn_info]
            weights = [round(float(x["weight"]), 3) for x in nn_info]
            print(f"Site {i} {site.species_string}: CN={len(nn_info)}, neighbors={species}, weights={weights}")
        except Exception as e:
            print(f"Site {i} {site.species_string}: neighbor analysis failed -> {e}")

summarize_neighbors(si, "Si")
summarize_neighbors(nacl, "NaCl")

header("5. OXIDATION STATE DECORATION")

oxi_transform = OxidationStateDecorationTransformation(
    {"Li": 1, "Fe": 2, "P": 5, "O": -2, "Na": 1, "Cl": -1, "Si": 0}
)

nacl_oxi = oxi_transform.apply_transformation(nacl.copy())
lfp_oxi = oxi_transform.apply_transformation(li_fe_po4.copy())

print("NaCl species with oxidation states:", [str(site.specie) for site in nacl_oxi])
print("LiFePO4-like species with oxidation states:", [str(site.specie) for site in lfp_oxi])

We analyze the local atomic environments using the CrystalNN coordination analysis algorithm. We identify neighboring atoms for selected sites and evaluate their coordination numbers and weights. We then decorate the structures with oxidation states to better represent the chemical environment.
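A simple sanity check on any oxidation-state assignment is charge neutrality: the oxidation states, weighted by how many atoms of each element are present, should sum to zero. The sketch below does that bookkeeping in plain Python for the oxidation states used above; `net_charge` and the plain-dict compositions are illustrative assumptions, not pymatgen objects.

```python
def net_charge(composition, oxidation_states):
    """Sum oxidation state times atom count over all elements in the composition."""
    return sum(count * oxidation_states[element] for element, count in composition.items())


# Oxidation states taken from the tutorial's decoration step.
oxidation_states = {"Li": 1, "Fe": 2, "P": 5, "O": -2, "Na": 1, "Cl": -1}

lifepo4 = {"Li": 1, "Fe": 1, "P": 1, "O": 4}
nacl_formula = {"Na": 1, "Cl": 1}
fepo4 = {"Fe": 1, "P": 1, "O": 4}  # delithiated formula, still using Fe(2+)

print("LiFePO4:", net_charge(lifepo4, oxidation_states))
print("NaCl:", net_charge(nacl_formula, oxidation_states))
print("FePO4 with Fe(2+):", net_charge(fepo4, oxidation_states))
```

The non-zero result for FePO4 shows why a single element-to-charge mapping does not transfer between compounds: iron would need to be Fe(3+) there for the formula to balance.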

header("6. MAKE SUPERCELLS")

# A diagonal scaling matrix repeats the cell twice along each lattice vector.
si_super = SupercellTransformation([[2, 0, 0], [0, 2, 0], [0, 0, 2]]).apply_transformation(si.copy())
nacl_super = SupercellTransformation([[2, 0, 0], [0, 2, 0], [0, 0, 2]]).apply_transformation(nacl.copy())

print("Si supercell sites:", len(si_super), "formula:", si_super.composition.formula)
print("NaCl supercell sites:", len(nacl_super), "formula:", nacl_super.composition.formula)

header("7. PERTURB STRUCTURE AND COMPUTE DISTANCE MATRIX")

si_perturbed = si_super.copy()
# Shift site 0 by a small Cartesian vector (in angstrom).
si_perturbed.translate_sites([0], [0.01, -0.005, 0.012], frac_coords=False)

dm = si_perturbed.distance_matrix

print("Distance matrix shape:", dm.shape)
print("First 5 distances from site 0:", np.round(dm[0][:5], 4))

header("8. GENERATE A SURFACE SLAB")

# With in_unit_planes=False, the slab and vacuum sizes are given in angstrom.
slabgen = SlabGenerator(
    initial_structure=si,
    miller_index=(1, 1, 1),
    min_slab_size=8.0,
    min_vacuum_size=12.0,
    center_slab=True,
    in_unit_planes=False,
)

slabs = slabgen.get_slabs()
slab = slabs[0]

print("Number of generated slabs:", len(slabs))
print("Chosen slab formula:", slab.composition.formula)
print("Chosen slab sites:", len(slab))
print("Chosen slab lattice:", tuple(round(x, 3) for x in slab.lattice.abc))

We expand the crystal structures into larger supercells to study periodic structures at a larger scale. We apply a small perturbation to atomic positions and compute the resulting distance matrix to analyze structural changes. We also generate a surface slab from the silicon crystal to demonstrate how surface structures can be modeled.
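Two pieces of arithmetic sit behind this step: the number of sites in a supercell is the original count multiplied by the determinant of the scaling matrix, and distances in a periodic structure are measured to the nearest periodic image. The sketch below illustrates both in plain Python for a cubic cell; `det3` and `min_image_distance` are hypothetical helpers, and the wrap-each-component shortcut is only valid for orthogonal cells.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def min_image_distance(frac_a, frac_b, a):
    """Nearest-image distance between two fractional positions in a cubic cell of edge a."""
    squared = 0.0
    for fa, fb in zip(frac_a, frac_b):
        delta = fb - fa
        delta -= round(delta)          # wrap the difference into [-0.5, 0.5]
        squared += (delta * a) ** 2
    return math.sqrt(squared)


scaling = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
sites_in_supercell = 2 * det3(scaling)   # two sites in the original cell

a = 5.431
d_pair = min_image_distance([0, 0, 0], [0.25, 0.25, 0.25], a)
d_wrapped = min_image_distance([0, 0, 0], [0.9, 0, 0], a)

print("Supercell sites:", sites_in_supercell)
print("Distance (0,0,0)-(1/4,1/4,1/4):", round(d_pair, 4))
print("Distance (0,0,0)-(0.9,0,0) via nearest image:", round(d_wrapped, 4))
```

The last distance comes out as 0.1 of the cell edge rather than 0.9, because the nearest copy of the second site lies across the cell boundary.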

header("9. XRD SIMULATION")

xrd = XRDCalculator(wavelength="CuKa")

pattern_si = xrd.get_pattern(si, two_theta_range=(10, 90))
pattern_nacl = xrd.get_pattern(nacl, two_theta_range=(10, 90))

plt.figure(figsize=(12, 4))
plt.vlines(pattern_si.x, [0], pattern_si.y, linewidth=1.5)
plt.xlabel(r"2$\theta$ (degrees)")
plt.ylabel("Intensity")
plt.title("Simulated XRD Pattern: Si")
plt.show()

plt.figure(figsize=(12, 4))
plt.vlines(pattern_nacl.x, [0], pattern_nacl.y, linewidth=1.5)
plt.xlabel(r"2$\theta$ (degrees)")
plt.ylabel("Intensity")
plt.title("Simulated XRD Pattern: NaCl")
plt.show()

header("10. SIMPLE PHASE DIAGRAM")

# Each PDEntry pairs a composition with a total energy for that composition;
# the values below are hand-entered numbers for this tutorial.
entries = [
    PDEntry(Composition("Li"), 0.0),
    PDEntry(Composition("Fe"), 0.0),
    PDEntry(Composition("P"), 0.0),
    PDEntry(Composition("O2"), 0.0),
    PDEntry(Composition("Li2O"), -6.0),
    PDEntry(Composition("FeO"), -4.2),
    PDEntry(Composition("Fe2O3"), -10.5),
    PDEntry(Composition("P2O5"), -15.0),
    PDEntry(Composition("Li3PO4"), -18.5),
    PDEntry(Composition("FePO4"), -12.2),
    PDEntry(Composition("LiFePO4"), -16.9),
]

pdg = PhaseDiagram(entries)

target = [e for e in entries if e.composition.reduced_formula == "LiFePO4"][0]

e_above_hull = pdg.get_e_above_hull(target)
decomp, e_hull = pdg.get_decomp_and_e_above_hull(target)

print("Target entry:", target.composition.reduced_formula)
print("Energy above hull:", round(float(e_above_hull), 6), "eV/atom")

print("Decomposition products:")
for k, v in decomp.items():
    print("  ", k.composition.reduced_formula, ":", round(float(v), 6))

We simulate X-ray diffraction patterns for silicon and NaCl using pymatgen’s diffraction tools. We visualize the diffraction peaks to understand how the crystal structure influences the XRD pattern. We then construct a simple thermodynamic phase diagram and calculate the stability of LiFePO₄ relative to competing phases.
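The peak positions in a simulated pattern follow from Bragg's law: for a cubic lattice the plane spacing is d = a / sqrt(h² + k² + l²), and a reflection appears at 2θ = 2·asin(λ / 2d). The sketch below evaluates that in plain Python for a few reflections of diamond-cubic silicon. The function name `two_theta_cubic` is hypothetical, and the Cu Kα wavelength of 1.54184 Å is an assumed average value. It gives positions only; intensities and systematic absences depend on the atomic basis and are not modeled here.

```python
import math

CU_KA_WAVELENGTH = 1.54184  # angstrom; assumed average Cu K-alpha wavelength


def two_theta_cubic(a, hkl, wavelength=CU_KA_WAVELENGTH):
    """Return the Bragg angle 2-theta in degrees for plane (h, k, l) of a cubic lattice.

    Returns None when the reflection cannot be reached at this wavelength.
    """
    h, k, l = hkl
    d_spacing = a / math.sqrt(h * h + k * k + l * l)
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        return None
    return math.degrees(2.0 * math.asin(sin_theta))


a_si = 5.431
for hkl in [(1, 1, 1), (2, 2, 0), (3, 1, 1)]:
    print(hkl, round(two_theta_cubic(a_si, hkl), 2))
```

Comparing these angles with the `x` values of a computed pattern is a useful cross-check that the lattice parameter and wavelength are what was intended.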

header("11. DISORDERED STRUCTURE -> ORDERED APPROXIMATION")

# A single site shared 50/50 by Cu and Au describes a random alloy.
disordered = Structure(
    Lattice.cubic(3.6),
    [{"Cu": 0.5, "Au": 0.5}],
    [[0, 0, 0]],
)

disordered.make_supercell([2, 2, 2])

print("Disordered composition:", disordered.composition)

try:
    # Oxidation states are added because the ordering step ranks candidates electrostatically.
    disordered_oxi = disordered.copy()
    disordered_oxi.add_oxidation_state_by_element({"Cu": 1, "Au": 1})

    ordered_transform = OrderDisorderedStructureTransformation()

    ordered_candidates = ordered_transform.apply_transformation(
        disordered_oxi,
        return_ranked_list=3,
    )

    for idx, cand in enumerate(ordered_candidates):
        s = cand["structure"].copy()
        s.remove_oxidation_states()
        print(f"Ordered candidate {idx+1}: formula={s.composition.formula}, sites={len(s)}")

except Exception as e:
    print("Ordering step skipped due to transformation issue:", e)

header("12. MOLECULE SUPPORT")

water = Molecule(
    ["O", "H", "H"],
    [
        [0.0, 0.0, 0.0],
        [0.7586, 0.0, 0.5043],
        [-0.7586, 0.0, 0.5043],
    ],
)

print("Water formula:", water.composition.formula)
print("Water center of mass:", np.round(water.center_of_mass, 4))
print(
    "O-H bond lengths:",
    round(water.get_distance(0, 1), 4),
    round(water.get_distance(0, 2), 4),
)

header("13. CIF EXPORT")

output_dir = "/content/pymatgen_tutorial_outputs"
os.makedirs(output_dir, exist_ok=True)

si_cif = os.path.join(output_dir, "si.cif")
nacl_cif = os.path.join(output_dir, "nacl.cif")
slab_cif = os.path.join(output_dir, "si_111_slab.cif")

CifWriter(si).write_file(si_cif)
CifWriter(nacl).write_file(nacl_cif)
CifWriter(slab).write_file(slab_cif)

print("Saved:", si_cif)
print("Saved:", nacl_cif)
print("Saved:", slab_cif)

header("14. DATAFRAME SUMMARY")

rows = []

for name, s in [
    ("Si", si),
    ("NaCl", nacl),
    ("LiFePO4-like", li_fe_po4),
    ("Si slab", slab),
]:
    sga = SpacegroupAnalyzer(s, symprec=0.1)
    rows.append(
        {
            "name": name,
            "formula": s.composition.reduced_formula,
            "sites": len(s),
            "volume_A3": round(s.volume, 4),
            "density_g_cm3": round(float(s.density), 4),
            "spacegroup": sga.get_space_group_symbol(),
            "sg_number": sga.get_space_group_number(),
        }
    )

df = pd.DataFrame(rows)
print(df)

header("15. OPTIONAL MATERIALS PROJECT API ACCESS")

mp_api_key = None

# In Colab, try the secrets store first.
try:
    from google.colab import userdata
    mp_api_key = userdata.get("MP_API_KEY")
except Exception:
    pass

# Otherwise fall back to an environment variable.
if not mp_api_key:
    mp_api_key = os.environ.get("MP_API_KEY", None)

if mp_api_key:
    try:
        from pymatgen.ext.matproj import MPRester

        with MPRester(mp_api_key) as mpr:
            mp_struct = mpr.get_structure_by_material_id("mp-149")
            summary_docs = mpr.summary.search(
                material_ids=["mp-149"],
                fields=[
                    "material_id",
                    "formula_pretty",
                    "band_gap",
                    "energy_above_hull",
                    "is_stable",
                ],
            )

        print("Fetched mp-149 from Materials Project")
        print("Formula:", mp_struct.composition.reduced_formula)
        print("Sites:", len(mp_struct))

        if len(summary_docs) > 0:
            doc = summary_docs[0]
            print(
                {
                    "material_id": str(doc.material_id),
                    "formula_pretty": doc.formula_pretty,
                    "band_gap": doc.band_gap,
                    "energy_above_hull": doc.energy_above_hull,
                    "is_stable": doc.is_stable,
                }
            )

    except Exception as e:
        print("Materials Project API section skipped due to runtime/API issue:", e)

else:
    print("No MP_API_KEY found. Skipping live Materials Project query.")
    print("In Colab, add a secret named MP_API_KEY or set os.environ['MP_API_KEY'].")

header("16. SAVE SUMMARY JSON")

summary = {
    "structures": {
        "Si": {
            "formula": si.composition.reduced_formula,
            "sites": len(si),
            "spacegroup": SpacegroupAnalyzer(si, symprec=0.1).get_space_group_symbol(),
        },
        "NaCl": {
            "formula": nacl.composition.reduced_formula,
            "sites": len(nacl),
            "spacegroup": SpacegroupAnalyzer(nacl, symprec=0.1).get_space_group_symbol(),
        },
        "LiFePO4-like": {
            "formula": li_fe_po4.composition.reduced_formula,
            "sites": len(li_fe_po4),
            "spacegroup": SpacegroupAnalyzer(li_fe_po4, symprec=0.1).get_space_group_symbol(),
        },
    },
    "phase_diagram": {
        "target": target.composition.reduced_formula,
        "energy_above_hull_eV_atom": float(e_above_hull),
    },
    "files": {
        "si_cif": si_cif,
        "nacl_cif": nacl_cif,
        "slab_cif": slab_cif,
    },
}

json_path = os.path.join(output_dir, "summary.json")

with open(json_path, "w") as f:
    json.dump(summary, f, indent=2)

print("Saved:", json_path)

header("17. FINAL NOTES")

print("Tutorial completed successfully.")
print("Artifacts are saved in:", output_dir)
print("You can now extend this notebook to parse VASP outputs, query MP at scale, or build defect/workflow pipelines.")

We demonstrate how a disordered alloy structure can be approximated by generating ordered candidates. We analyze molecular structures, export crystal structures as CIF files, and summarize their properties in a pandas DataFrame. We optionally connect to the Materials Project API and save a structured JSON summary of the analysis results.
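The molecular quantities printed above are straightforward geometry on the Cartesian coordinates. As a rough cross-check, the sketch below recomputes the O-H bond lengths and adds the H-O-H angle in plain Python, using the same water coordinates as the tutorial; `angle_deg` is a hypothetical helper, and `math.dist` requires Python 3.8 or later.

```python
import math

# Cartesian coordinates in angstrom, copied from the tutorial's water molecule.
oxygen = (0.0, 0.0, 0.0)
hydrogen_1 = (0.7586, 0.0, 0.5043)
hydrogen_2 = (-0.7586, 0.0, 0.5043)


def angle_deg(center, p, q):
    """Angle p-center-q in degrees, from the dot product of the two bond vectors."""
    v1 = [a - c for a, c in zip(p, center)]
    v2 = [a - c for a, c in zip(q, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return math.degrees(math.acos(dot / norm))


bond_1 = math.dist(oxygen, hydrogen_1)
bond_2 = math.dist(oxygen, hydrogen_2)
hoh_angle = angle_deg(oxygen, hydrogen_1, hydrogen_2)

print("O-H bond lengths:", round(bond_1, 4), round(bond_2, 4))
print("H-O-H angle:", round(hoh_angle, 2), "degrees")
```

Both bonds come out equal because the two hydrogens are mirror images in x, which is a quick check that the coordinates were entered symmetrically.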

In conclusion, we built a complete workflow for exploring and analyzing materials structures using pymatgen. We demonstrated how to construct and manipulate crystal structures, analyze symmetry and local environments, and perform common computational materials science tasks, such as supercell generation, surface slab preparation, and diffraction simulation. We also showed how to evaluate thermodynamic stability through a simple phase diagram and handle disordered structures by generating ordered approximations. In addition, we exported structures to standard CIF files, summarized key properties in tabular form, and optionally connected to the Materials Project database to retrieve real materials data. Through these steps, we saw how pymatgen provides an integrated framework that allows us to move from structure generation to analysis and data integration within a single Python environment.

The post A Coding Implementation for Building and Analyzing Crystal Structures Using Pymatgen for Symmetry Analysis, Phase Diagrams, Surface Generation, and Materials Project Integration appeared first on MarkTechPost.