Content-based validity of a personality inventory
LLM-assisted psychometrics
DOI:
https://doi.org/10.15448/1980-8623.2025.1.47225
Keywords:
artificial intelligence, psychological assessment, psychometrics
Abstract
Large Language Models (LLMs) represent an advance in Natural Language Processing (NLP). This study investigates the use of LLMs to obtain content-based validity evidence in the assessment of the Big Five personality factors. The items of the new instrument were created by ChatGPT and semantically analyzed by Gemini, together with the items of the BFI-2 (created by humans). The analysis combined item classification via prompt (with the model acting as an expert judge) and exploratory factor analysis of the item embeddings (obtained via API), a new psychometric approach. The results showed semantic convergence for neuroticism, agreeableness, openness, and conscientiousness, but greater dispersion among the extraversion items. Semantic convergence was also observed between the items created by the LLM and those created by humans (convergent content validity). We conclude that LLMs show good potential to contribute to the process of obtaining content validity evidence.
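The idea of checking semantic convergence between LLM-written and human-written items through their embeddings can be illustrated with a minimal sketch. It is not the study's procedure: the study reports an exploratory factor analysis of embeddings obtained from an API, whereas this sketch substitutes a simpler nearest-centroid comparison by cosine similarity. The three-dimensional vectors, the item names, and the helper functions are all hypothetical and invented for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_trait(vector, centroids):
    """Return the trait whose centroid is most similar to the vector."""
    return max(centroids, key=lambda trait: cosine(vector, centroids[trait]))


# Hypothetical embeddings of human-written reference items, grouped by trait.
# These numbers are made up; they are NOT data from the study.
reference_items = {
    "extraversion": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "neuroticism": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}

# Hypothetical embeddings of LLM-written items, with the trait each item
# was intended to measure.
new_items = {
    "item_a": ([0.85, 0.15, 0.05], "extraversion"),
    "item_b": ([0.05, 0.90, 0.10], "neuroticism"),
    "item_c": ([0.30, 0.60, 0.10], "extraversion"),
}

centroids = {trait: centroid(vecs) for trait, vecs in reference_items.items()}

# Assign each new item to the semantically closest trait.
assignments = {
    name: nearest_trait(vec, centroids) for name, (vec, _) in new_items.items()
}

# Proportion of new items that land on their intended trait.
agreement = sum(
    assignments[name] == intended for name, (_, intended) in new_items.items()
) / len(new_items)

for name, (_, intended) in new_items.items():
    print(f"{name}: intended={intended}, assigned={assignments[name]}")
print(f"agreement={agreement:.2f}")
```

In this toy setup, an item that lands on a different trait than intended plays the role of the dispersion the abstract describes for extraversion: its wording sits closer, semantically, to another factor's items.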
License
Copyright 2025 José Maurício Haas Bueno, Ricardo Primi, Emanuel Duarte de Almeida Cordeiro, Ana Deyvis Santos Araújo Jesuíno, Monalisa Muniz, Ana Paula Porto Noronha

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.




