01. Title: Simulation Methods as Data Generators
02. Learning objectives
- Choose a simulation method for a target property, separate ML labels from metadata, and build a leakage-aware baseline.
03. Why simulations dominate materials data generation
- Controlled mapping from assumptions to outputs.
04. Recap from Unit 1
- Design space and validity.
05. Simulation as a map from assumptions to data
- Inputs, solver, outputs, metadata.
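A minimal sketch of one simulation run stored as a single data record, keeping inputs, solver settings, outputs and provenance together. The class and field names (`SimulationRecord`, `settings`, `outputs`, `provenance`, `is_comparable`) are hypothetical and do not come from any particular database schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimulationRecord:
    """One simulation run viewed as a data point (hypothetical schema)."""
    structure_id: str                 # input: which structure was simulated
    method: str                       # solver family, e.g. "DFT", "MD", "MC", "FEM"
    settings: dict[str, Any]          # assumptions: functional, force field, boundary conditions, ...
    outputs: dict[str, float]         # quantities that may later become ML targets
    provenance: dict[str, str] = field(default_factory=dict)  # code version, date, source

    def is_comparable(self, other: "SimulationRecord") -> bool:
        """Labels are only directly comparable if method and settings match."""
        return self.method == other.method and self.settings == other.settings


# Toy values for illustration only.
a = SimulationRecord("s1", "DFT", {"functional": "PBE"}, {"energy_per_atom": -5.1})
b = SimulationRecord("s2", "DFT", {"functional": "SCAN"}, {"energy_per_atom": -5.4})
print(a.is_comparable(b))  # False: different functionals
```

Keeping `settings` next to `outputs` is what later lets a database user refuse to mix labels produced under different assumptions.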
06. Length and time scales in materials modeling
- Continuum to atomistic to electronic.
07. FEM outputs
- Stress, strain, fields, constitutive response.
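To make stress, strain and constitutive response concrete, here is a minimal 1D finite-element sketch. It assumes a linear-elastic bar (Hooke's law, σ = Eε) clamped at one end with an axial point load F at the other; the function name `bar_fem` and all parameter values are illustrative. For this load case linear elements reproduce the exact solution, so every element should report a stress of F/A.

```python
def bar_fem(length: float, n_elem: int, E: float, A: float, F: float):
    """Clamped 1D elastic bar with end load F, meshed with linear elements.

    Units: E in Pa, A in m^2, F in N, length in m.
    Returns nodal displacements and per-element strains and stresses.
    """
    h = length / n_elem
    k = E * A / h                       # element stiffness
    n = n_elem + 1
    K = [[0.0] * n for _ in range(n)]   # global stiffness matrix
    for e in range(n_elem):             # assemble the 2x2 element matrices
        i, j = e, e + 1
        K[i][i] += k
        K[j][j] += k
        K[i][j] -= k
        K[j][i] -= k
    f = [0.0] * n
    f[-1] = F                           # point load at the free end

    # Clamp node 0 by dropping its row and column, then Gaussian elimination.
    Kr = [row[1:] for row in K[1:]]
    fr = f[1:]
    m = len(fr)
    for p in range(m):
        for r in range(p + 1, m):
            factor = Kr[r][p] / Kr[p][p]
            for c in range(p, m):
                Kr[r][c] -= factor * Kr[p][c]
            fr[r] -= factor * fr[p]
    ur = [0.0] * m
    for p in reversed(range(m)):        # back substitution
        s = fr[p] - sum(Kr[p][c] * ur[c] for c in range(p + 1, m))
        ur[p] = s / Kr[p][p]
    u = [0.0] + ur

    strains = [(u[e + 1] - u[e]) / h for e in range(n_elem)]
    stresses = [E * eps for eps in strains]   # constitutive law: sigma = E * eps
    return u, strains, stresses


u, eps, sig = bar_fem(length=1.0, n_elem=4, E=200e9, A=1e-4, F=1e4)
print(sig)  # each entry should be close to F / A = 1e8 Pa
```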
08. MD outputs
- Trajectories, forces, diffusion observables.
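Diffusion observables are derived from trajectories rather than read off directly. A minimal sketch, assuming an unwrapped 3D trajectory stored as a list of frames (each frame a list of (x, y, z) positions) and the Einstein relation MSD(τ) ≈ 6Dτ at long times; the helper names are hypothetical.

```python
def mean_squared_displacement(traj, lag):
    """MSD at a given frame lag, averaged over atoms and time origins."""
    total, count = 0.0, 0
    for t0 in range(len(traj) - lag):
        for p0, p1 in zip(traj[t0], traj[t0 + lag]):
            total += sum((b - a) ** 2 for a, b in zip(p0, p1))
            count += 1
    return total / count


def diffusion_coefficient(traj, lag, dt):
    """Einstein relation in 3D: D = MSD / (6 * tau), with tau = lag * dt."""
    return mean_squared_displacement(traj, lag) / (6.0 * lag * dt)


# Toy trajectory: one atom stepping +1 in x per frame. This is ballistic, not
# diffusive, and is used only to check the arithmetic.
traj = [[(float(t), 0.0, 0.0)] for t in range(10)]
print(mean_squared_displacement(traj, lag=2))  # 4.0
```

In practice D is estimated from the slope of MSD versus τ over the linear (diffusive) regime, not from a single lag.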
09. MC outputs
- Thermodynamic sampling and phase averages.
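Phase averages in MC come from sampling states with Boltzmann weights. A minimal Metropolis sketch, assuming a toy 1D harmonic energy E(x) = ½kx² in reduced units (k_B = 1), where the exact canonical average ⟨x²⟩ = T/k is known and serves as a check. Real MC studies of alloys or magnets sample lattice configurations instead, but the accept/reject rule for symmetric trial moves is the same. The function name is hypothetical.

```python
import math
import random


def metropolis_average(energy, observable, beta, n_steps, step=1.0, x0=0.0, seed=0):
    """Estimate the canonical average <observable> with Metropolis Monte Carlo.

    Samples x with probability proportional to exp(-beta * energy(x)) in 1D.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)      # symmetric trial move
        e_new = energy(x_new)
        # Accept with probability min(1, exp(-beta * dE)).
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
        total += observable(x)                    # a rejected move re-counts the old state
    return total / n_steps


k_spring, T = 1.0, 0.5
est = metropolis_average(lambda x: 0.5 * k_spring * x * x, lambda x: x * x,
                         beta=1.0 / T, n_steps=200_000)
print(est)  # should be near the exact value T / k_spring = 0.5
```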
10. DFT outputs
- Energies, forces, band-related quantities.
11. Cost vs accuracy vs scale
- Why no single simulator dominates all tasks.
12. Hidden bias from simulation choices
- Functionals, force fields, boundary conditions.
13. What becomes an ML target
- Labels, constraints, and proxy observables.
14. What remains metadata
- Provenance needed for trust and reuse.
15. Simulation consistency vs physical accuracy
- A consistent, reproducible method can still be systematically off from experiment.
16. Which method for which property
17. Failure mode: mismatched fidelity
- Labels computed at a fidelity that does not match the question being asked.
18. Failure mode: missing provenance
19. Bridge to databases
- Why records need method metadata.
20. Bridge to Week 3
- Atomistic and electronic simulations in detail.
21. Bridge to Week 4
- Stability and continuum outputs as ML context.
22. Feature leakage risks
- Label proxies and duplicates.
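One cheap leakage check is to look for the same composition appearing more than once, for example a compound imported from two sources or written as `Fe4O6` and `Fe2O3`. A minimal sketch, assuming simple formula strings without parentheses or hydrates and integer counts; the helper names are hypothetical. Matching compositions is only a first filter, since polymorphs share a formula but are genuinely different structures.

```python
import re
from collections import defaultdict
from functools import reduce
from math import gcd


def reduced_formula(formula: str) -> str:
    """Canonical reduced formula with elements sorted alphabetically, e.g. 'Fe4O6' -> 'Fe2O3'."""
    counts = defaultdict(int)
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    divisor = reduce(gcd, counts.values())
    parts = []
    for el in sorted(counts):
        n = counts[el] // divisor
        parts.append(el if n == 1 else f"{el}{n}")
    return "".join(parts)


def find_duplicates(formulas):
    """Map each reduced formula occurring more than once to its indices."""
    groups = defaultdict(list)
    for i, f in enumerate(formulas):
        groups[reduced_formula(f)].append(i)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(find_duplicates(["Fe2O3", "Fe4O6", "NaCl", "ClNa", "TiO2"]))
# {'Fe2O3': [0, 1], 'ClNa': [2, 3]}
```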
23. Train/val/test with structure families
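A grouped split keeps every member of a structure family in the same partition, so the test set measures transfer to unseen families rather than interpolation between near-copies. A minimal sketch, assuming each sample already carries a family label (for example a prototype name); how families are defined is a modelling choice, and the function name is hypothetical.

```python
import random


def grouped_split(groups, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole groups to train/val/test; returns three lists of sample indices.

    groups[i] is the family label of sample i. Fractions apply to the number
    of groups, not samples.
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    split_of = {}
    for pos, g in enumerate(unique):
        split_of[g] = "train" if pos < n_train else "val" if pos < n_train + n_val else "test"
    out = {"train": [], "val": [], "test": []}
    for i, g in enumerate(groups):
        out[split_of[g]].append(i)
    return out["train"], out["val"], out["test"]


families = ["perovskite", "perovskite", "spinel", "rocksalt", "rocksalt", "wurtzite", "spinel"]
train, val, test = grouped_split(families, fractions=(0.5, 0.25, 0.25))
```

Because fractions count groups, a few very large families can make sample counts per partition uneven; it is worth printing both.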
24. Distribution shift in crystal data
25. Target examples
- Bandgap, formation energy, stability.
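Formation energy is itself derived from simulation outputs, which is one reason its values depend on the method: it is the compound's total energy minus the energies of its elements in their reference states, normalised per atom, ΔE_f = (E_total − Σ n_i μ_i) / Σ n_i. A minimal sketch, assuming total and elemental reference energies (μ_i, per atom) come from the same calculation setup; the placeholder elements A and B and all numbers are made up.

```python
def formation_energy_per_atom(e_total, composition, mu):
    """(E_total - sum_i n_i * mu_i) / sum_i n_i.

    e_total: total energy of the cell (eV); composition: {element: atoms in cell};
    mu: reference energy per atom of each element (eV/atom), from the same method.
    """
    n_atoms = sum(composition.values())
    e_ref = sum(n * mu[el] for el, n in composition.items())
    return (e_total - e_ref) / n_atoms


# Made-up numbers for a hypothetical AB2 cell, not real data.
print(formation_energy_per_atom(-20.0, {"A": 1, "B": 2}, {"A": -4.0, "B": -6.0}))
# about -1.333 eV/atom: lower in energy than the separated elements
```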
26. Physical constraints in predictions
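A common constraint is that a predicted band gap cannot be negative. A minimal sketch of two ways to enforce it, assuming a regression model that outputs unconstrained real numbers in eV: clipping after prediction, or a softplus transform as the final output step. Function names are hypothetical.

```python
import math


def softplus(z: float) -> float:
    """Smooth map from any real number to a positive value: log(1 + e^z)."""
    # Numerically stable form for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def constrain_bandgap(raw_predictions, mode="clip"):
    """Enforce the physical constraint E_g >= 0 on raw model outputs (eV)."""
    if mode == "clip":
        return [max(0.0, y) for y in raw_predictions]   # post-hoc projection
    if mode == "softplus":
        return [softplus(y) for y in raw_predictions]  # built into the output step
    raise ValueError(f"unknown mode: {mode}")


print(constrain_bandgap([-0.3, 0.0, 1.2]))  # [0.0, 0.0, 1.2]
```

Softplus never returns exactly zero, which matters because metals have E_g = 0; one option is to pair it with a separate metal/non-metal classifier.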
27. Error analysis by structure class
- Beyond aggregate metrics.
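A minimal sketch of per-class error analysis, assuming each test sample carries a class label such as its crystal system; the function name and the toy values are illustrative. Reporting the count next to each MAE matters because small classes give noisy estimates.

```python
from collections import defaultdict


def mae_by_group(y_true, y_pred, groups):
    """Mean absolute error per group, with the number of samples in each group."""
    errs = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errs[g].append(abs(t - p))
    return {g: (sum(e) / len(e), len(e)) for g, e in errs.items()}


# Toy values: the aggregate MAE (0.55) hides that one class is much worse.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.1, 3.0, 2.0]
groups = ["cubic", "cubic", "cubic", "hexagonal"]
print(mae_by_group(y_true, y_pred, groups))
# cubic: about 0.067 over 3 samples; hexagonal: 2.0 over 1 sample
```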
28. Uncertainty in structure-property models
- Confidence-aware decisions.
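One simple way to turn uncertainty into a decision is to train several models and treat their disagreement as a signal. A minimal sketch, assuming an ensemble has already produced predictions for one candidate; the threshold `max_std` and the "trust"/"verify" labels are hypothetical, and ensemble spread is only a proxy for true error.

```python
from statistics import mean, stdev


def ensemble_decision(member_preds, max_std):
    """Combine ensemble predictions for one candidate (needs at least 2 members).

    Returns (mean, std, decision): candidates whose spread exceeds max_std are
    flagged for verification (e.g. a new simulation) instead of being trusted.
    """
    mu = mean(member_preds)
    sigma = stdev(member_preds)          # sample standard deviation across members
    decision = "trust" if sigma <= max_std else "verify"
    return mu, sigma, decision


print(ensemble_decision([1.02, 0.98, 1.00], max_std=0.1)[2])  # trust
print(ensemble_decision([0.2, 1.5, 0.9], max_std=0.1)[2])     # verify
```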
29. Outliers and anomaly handling
- Discovery vs data errors.
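A minimal sketch of flagging label outliers for inspection, using the median-absolute-deviation (MAD) "modified z-score" with the commonly used 0.6745 factor and 3.5 cut-off as a rule of thumb. The function name and values are illustrative; flagged points should be checked (discovery or data error?), not deleted automatically.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Indices whose robust z-score 0.6745 * (x - median) / MAD exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


print(flag_outliers([-1.2, -1.1, -1.3, -1.0, -1.2, 4.0]))  # [5]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter and can hide itself.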
30. Data provenance importance
- Reproducibility and trust.
31. FAIR perspective (light)
- Findable, Accessible, Interoperable, Reusable: reuse-oriented data practice.
32. Minimal baseline workflow
- Parse, featurize, split, train, evaluate.
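The five steps can be sketched end-to-end in plain Python. This assumes simple formula strings built from placeholder elements A, B and C, made-up labels, element-fraction features, and a 1-nearest-neighbour regressor compared against a predict-the-mean baseline. Every function name is hypothetical; a real study would use a materials library for parsing and an ML library for models.

```python
import random
import re


def parse(formula):
    """Parse step: 'A2B' -> {'A': 2, 'B': 1} (no parentheses supported)."""
    comp = {}
    for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        comp[el] = comp.get(el, 0) + (int(n) if n else 1)
    return comp


def featurize(comp, elements):
    """Featurize step: fraction of each element in a fixed element list."""
    total = sum(comp.values())
    return [comp.get(el, 0) / total for el in elements]


def split(n, test_frac=0.25, seed=0):
    """Split step: random index split (swap in a grouped split for family-level tests)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_frac * n))
    return idx[n_test:], idx[:n_test]


def train_knn(X, y):
    """Train step: 1-nearest-neighbour regressor that stores the training set."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def mae(y_true, y_pred):
    """Evaluate step: mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy labels for placeholder compositions, for illustration only.
data = [("A2B", -1.0), ("AB", -1.2), ("AB2", -0.9), ("A", 0.0), ("B", 0.0),
        ("AC", -0.5), ("A2C", -0.4), ("BC", -0.7)]
elements = ["A", "B", "C"]
X = [featurize(parse(f), elements) for f, _ in data]
y = [label for _, label in data]
train_idx, test_idx = split(len(data))
model = train_knn([X[i] for i in train_idx], [y[i] for i in train_idx])
baseline = sum(y[i] for i in train_idx) / len(train_idx)   # predict-the-mean baseline
y_test = [y[i] for i in test_idx]
print("1-NN MAE:", mae(y_test, [model(X[i]) for i in test_idx]))
print("mean baseline MAE:", mae(y_test, [baseline] * len(test_idx)))
```

Reporting the trivial baseline next to the model is what shows whether the features carry any signal at all.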
33. Metrics choice by target type
- Regression vs classification.
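Metric choice follows the target type. A minimal sketch in plain Python, assuming a continuous target (e.g. formation energy) for the regression metrics and a binary stable/unstable label for the classification metrics; the toy inputs only illustrate how the metrics can disagree.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE and RMSE; RMSE weights large errors more heavily."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"MAE": mae, "RMSE": rmse}


def classification_metrics(y_true, y_pred, positive=True):
    """Accuracy, precision and recall for a binary label such as 'stable'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"accuracy": acc, "precision": precision, "recall": recall}


print(regression_metrics([0.0, 0.0, 0.0, 0.0], [0.1, -0.1, 0.1, 1.0]))
# MAE about 0.325, RMSE about 0.51: the single large error dominates RMSE
print(classification_metrics([True, False, False, False], [False, False, False, False]))
# accuracy 0.75 but recall 0.0: with rare positives, accuracy alone is misleading
```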
34. Model card for materials task
35. Common failure mode #1
- Overfit to narrow chemistry.
36. Common failure mode #2
- Hidden duplicates and other train/test leakage.
37. Common failure mode #3
- Domain shift across databases.
39. Case sketch: crystal subset study
40. Case sketch: split comparison
- Random vs grouped outcomes.
42. Link to upcoming MG Unit 3
- Feature engineering transition.
43. Exercise scaffold: task setup
44. Exercise scaffold: parsing step
45. Exercise scaffold: feature table
46. Exercise scaffold: split + model
47. Exercise scaffold: diagnostics
- One bias/leakage analysis.
48. Exam-oriented key statements