Materials Genomics
Unit 4: From Classical Descriptors to Learned Representations

Prof. Dr. Philipp Pelz

FAU Erlangen-Nürnberg

02. Learning objectives and expected outputs

03. Recap from previous unit and dependency map

04. Why this unit matters for materials discovery

05. Reading map and chapter anchors

06. Descriptor purpose: encode chemistry/structure into ML-ready vectors

07. Composition descriptors: elemental statistics and stoichiometric moments

08. Magpie-style feature families and what they capture

09. matminer descriptor ecosystem and practical tradeoffs

10. Structure descriptors: radial/angle summaries and coordination stats

11. Process descriptors as context variables in PSPP pipelines

12. Physical invariances required in descriptor design

13. Scaling and normalization across heterogeneous feature groups

14. Interpretability strengths of hand-crafted features

15. Where classical descriptors fail: nonlocal interactions

16. Feature sparsity and high-dimensional curse in materials spaces

17. Correlation and multicollinearity in descriptor tables

18. Filter, wrapper, and embedded feature selection strategies

19. Feature leakage introduced during preprocessing

20. Target-aware descriptor engineering pitfalls

21. Data regime analysis: small-data vs medium-data descriptor choices

22. When to keep descriptor baselines as scientific controls

23. Transition criteria to learned representations

24. Representation learning objective: task performance + transferability

25. Autoencoder intuition as nonlinear descriptor learner

26. Latent feature semantics vs black-box embeddings

27. Descriptor robustness under domain shift

28. Descriptor uncertainty and error propagation

29. Ablation studies for descriptor family contribution

30. Interpreting feature importance under correlated inputs

31. Domain knowledge injection into descriptor construction

32. Computational cost of descriptor pipelines

33. Reproducible featurization templates and data cards

34. Case: descriptor baseline for bandgap prediction

35. Case: descriptor baseline for stability screening

36. Failure mode: over-engineering without split discipline

37. How this unit feeds graph-based representations in Unit 5

38. Advanced note: From Classical Descriptors to Learned Representations concept extension 33

39. Advanced note: From Classical Descriptors to Learned Representations concept extension 34

40. Advanced note: From Classical Descriptors to Learned Representations concept extension 35

41. Advanced note: From Classical Descriptors to Learned Representations concept extension 36

42. Advanced note: From Classical Descriptors to Learned Representations concept extension 37

43. Advanced note: From Classical Descriptors to Learned Representations concept extension 38

44. Advanced note: From Classical Descriptors to Learned Representations concept extension 39

45. Exercise setup and dataset definition

46. Exercise task 1 (pipeline core)

47. Exercise task 2 (comparison/ablation)

48. Exercise task 3 (failure analysis)

49. Exam-oriented key statements

50. Summary, next-unit bridge, and references
