Content validity evidence for a personality inventory

LLMs-assisted psychometrics

Authors

DOI:

https://doi.org/10.15448/1980-8623.2025.1.47225

Keywords:

artificial intelligence, psychological assessment, psychometrics

Abstract

Large Language Models (LLMs) represent a significant advance in Natural Language Processing (NLP). This study investigates their use in gathering content-based validity evidence for a new instrument assessing the Big Five personality factors. Items for the new instrument were generated by ChatGPT and analyzed semantically by Gemini, together with the human-written items of the BFI-2. Two analyses were used: item classification via prompt, with the model simulating an expert judge, and Exploratory Factor Analysis of item embeddings obtained via API, an approach proposed here as new to psychometrics. Results showed semantic convergence for neuroticism, agreeableness, openness, and conscientiousness items, but greater dispersion for extraversion items. Semantic convergence was also observed between LLM-generated and human-created items (content-convergent validity). We conclude that LLMs show considerable potential to contribute to the gathering of content-based validity evidence.
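The idea of semantic convergence among item embeddings can be illustrated with a minimal sketch. This is not the authors' analysis: the study applies Exploratory Factor Analysis to embeddings returned by an API, whereas the code below only compares mean cosine similarity within and between intended factors. The three-dimensional vectors, the item labels, and the function names are all hypothetical and chosen purely for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: (intended factor, item source, embedding).
# Real embeddings would come from an embedding API and have hundreds
# of dimensions; these values are made up for the example.
ITEMS = [
    ("neuroticism", "llm", [0.9, 0.1, 0.0]),
    ("neuroticism", "human", [0.8, 0.2, 0.1]),
    ("extraversion", "llm", [0.1, 0.9, 0.1]),
    ("extraversion", "human", [0.2, 0.8, 0.0]),
]


def convergence_summary(items):
    """Return (mean within-factor, mean between-factor) cosine similarity."""
    within, between = [], []
    for (f1, _, v1), (f2, _, v2) in combinations(items, 2):
        (within if f1 == f2 else between).append(cosine(v1, v2))
    return sum(within) / len(within), sum(between) / len(between)


within_mean, between_mean = convergence_summary(ITEMS)
print(f"mean within-factor similarity:  {within_mean:.3f}")
print(f"mean between-factor similarity: {between_mean:.3f}")
```

In this simplified reading, items converge semantically when the within-factor mean is clearly higher than the between-factor mean. A factor whose items are more dispersed, as the abstract reports for extraversion, would show a smaller gap between the two.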

Author Biographies

José Maurício Haas Bueno, Federal University of Pernambuco (UFPE), Recife, Pernambuco, Brazil.

Holds a doctorate and is affiliated with the Federal University of Pernambuco.

Ricardo Primi, São Francisco University (USF), Campinas, São Paulo, Brazil.

Holds a doctorate and is affiliated with São Francisco University.

Emanuel Duarte de Almeida Cordeiro, Southwest Bahia State University (UESB), Vitória da Conquista, Bahia, Brazil.

Holds a doctorate and is affiliated with Southwest Bahia State University.

Ana Deyvis Santos Araújo Jesuíno, Federal University of Maranhão (UFMA), São Luís, Maranhão, Brazil.

She holds a doctorate and is affiliated with the Federal University of Maranhão.

Monalisa Muniz, Federal University of São Carlos (UFSCar), São Carlos, São Paulo, Brazil.

PhD, affiliated with the Federal University of São Carlos.

Ana Paula Porto Noronha, São Francisco University (USF), Campinas, São Paulo, Brazil.

She holds a doctorate and is affiliated with São Francisco University.

Published

2025-12-19

How to Cite

Haas Bueno, J. M., Primi, R., Duarte de Almeida Cordeiro, E., Deyvis Santos Araújo Jesuíno, A., Muniz, M., & Porto Noronha, A. P. (2025). Content validity evidence for a personality inventory: LLMs-assisted psychometrics. Psico, 56(1), e47225. https://doi.org/10.15448/1980-8623.2025.1.47225
