LLM-Assisted Content Analysis
A systematic procedure for integrating large language models into content analysis while maintaining methodological rigour and the researcher's interpretive responsibility.
Preface
This is an English adaptation of the original Spanish guide Análisis de contenido asistido por LLM. Guía paso a paso para la investigación educativa by Javier Vidal (EVORI Group, University of León). The translation was produced with LLM assistance and reviewed by the author in accordance with Principle P6. The AiRISES case study examples draw on data from Spanish educational forums; they are left in their original form as authentic illustrations of the methodology, and readers from other national contexts are encouraged to substitute equivalent examples from their own systems. Table 7 (the lexical dictionary) has been adapted to a generic anglophone educational framework; a note in the table explains this adaptation. The warm, direct tone of the original has been preserved intentionally — it may feel less formal than typical academic writing.
The English translation of this guide was produced with LLM assistance (Claude Sonnet, May 2026), used for initial drafting, reformulation, and terminology search. All translation decisions — including the adaptation of educational terminology to anglophone contexts, the handling of culturally specific examples, and the editorial tone — were taken and reviewed by the author. No LLM has replaced the author's judgement. The translation function has been instrumental, similar to an advanced editing tool, under the author's supervision and responsibility at all times.
After much consideration, and given the mixture of enthusiasm and exhaustion that academic research tends to produce, I decided to write this guide with the aim of reducing the exhaustion without diminishing the enthusiasm. I thought it might help you to have a document that accompanies you step by step through LLM-assisted content analysis (what people generally call "the ChatGPT thing"). I have written it in a direct, practical style — more working handbook than academic treatise. If you need academic references on content analysis, there are hundreds of them, both textbooks and research articles that apply the method. You will also find that this analytical approach is used across many scientific fields — it is worth looking at examples from areas close to your own research topic.
Although I have been doing content analysis since my doctoral thesis (1995), this guide is the product of my experience over the past three years in the AiRISES project and everything I have been learning — sometimes in front of a screen, sometimes over coffee with colleagues. Through trials, errors, meetings and occasional moments of inspiration (rare, but glorious), we found a way of working with LLMs that combines the rigour of analysis with the possibilities these new tools offer. This is only the beginning. I think I should put a date on this document — one never knows when a new LLM version will come along that renders half of it obsolete. Whether I manage to keep it updated is another matter (I doubt it).
My intention is that this guide should accompany you step by step through the content analysis process, remind you that the process has an internal logic — even when it does not seem like it — and allow you to move forward confidently while maintaining your own judgement and research perspective. The procedure described must be adapted to your type of study, the size of your corpus and your research questions, and always requires your decision-making to adjust, review and reinterpret each phase. In this sense, the guide proposes a flexible framework that will support your analysis without replacing your methodological reflection or your interpretive responsibility — let alone the analysis of implications and improvement actions, which are indispensable in educational research.
I hope you find here not only the steps to follow, but also the feeling that this is doable, and moreover, that you can do it very well.
If it is useful, let me know. And of course, please send me any suggestions for improvements that would benefit others who might use it. I have already incorporated some.
In Part V, Section 12.4, I say that something must be done. In keeping with my own recommendations and as an example of good practice, I include this declaration.
In preparing this guide I used support from several LLMs — specifically ChatGPT, NotebookLM, Gemini and Claude (in their most up-to-date versions as of March 2026). They were used as assistance tools at various stages of the process. Their use was limited to drafting, textual reformulation, searching for alternative expressions, generating examples, supporting content structuring and presentation formatting. All conceptual, methodological and editorial decisions were made by me, and I critically reviewed each proposal generated. The model did not intervene in the essential content and, in no case, did it replace my judgement as a researcher. Its function was instrumental, similar to an advanced editing tool, always under my supervision and responsibility.
A note on language. In the original Spanish guide I deliberately combined three registers depending on context. In the English version, the direct, personal tone is preserved throughout. In definitions, principles and academic formulations I use generic neutral language ("the researcher"). When addressing you directly with instructions or warnings, I use the second person ("you"), which in English carries no gender marking. This is a deliberate editorial choice intended to make the guide feel like a working companion rather than a textbook.
Good luck.
Javier Vidal
Foundations
Introduction
Some context. Content analysis assisted by large language models (LLMs) has become a tool of growing interest in educational research. The availability of systems such as ChatGPT, Gemini or Claude — capable of processing large volumes of text and generating summaries, categories and explanations — makes it possible to optimise processes that are traditionally laborious and demand considerable time for reading, coding and comprehension. From the outset, however, it is important to make clear that the use of LLMs is a methodological support tool that does not replace the researcher's responsibility. This guide takes a combined approach: (a) conducting a traditional content analysis grounded in methodological rigour, transparency and critical examination of results, while (b) incorporating the LLM as technical support to accelerate tasks and organise information — without, I insist, replacing the researcher's judgement or interpretation.
The analytical procedure presented in this document is the direct result of work developed in the project Application of Artificial Intelligence in the Analysis of Informal Social Networks for Guidance in Higher Education (AiRISES), funded by the Spanish Ministry of Science and Innovation (PID2021-125405NB-I00), of which María José Vieira and I were principal investigators. Its development is grounded in more than three years of continuous research, combining automated analysis techniques and expert review to address rigorously a broad and complex corpus of messages from informal forums. This procedure is the final synthesis of the knowledge accumulated by everyone involved in the project, both the research team — Camino Ferreira, Agustín Rodríguez-Esteban, Diego González-Rodríguez and Alba González-Moreira — and the working team — Estela Mayor-Alonso, Yaiza Viñuela, María Álvarez-Godos, Ainhoa Martínez-Rodríguez and Héctor González-Mayorga. The final version presented here would not have been possible without their contributions, which have constantly enriched the approach, the understanding of the object of study and the robustness of the method.
Purpose of the guide
The main purpose of this guide is to offer you a detailed, step-by-step procedure for conducting LLM-assisted content analysis in educational research. It combines conceptual foundations, practical guidance, examples and ready-to-use instructions, so that you can adapt the process to your own study, whether you are working with a large corpus of hundreds of documents or a brief survey with only a few responses.
The ultimate aim is for you to be able to conduct a rigorous, reproducible and ethically informed content analysis, integrating the LLM as an expert assistant that speeds up the technical work, but without in any case replacing your own judgement or interpretive responsibility (I will be somewhat insistent about this).
Who it is for
The guide is intended for people like you — early-career researchers, educators conducting research or innovation studies, and teams wanting to integrate LLMs into their workflows. It assumes no programming knowledge or prior experience with LLMs. It does, however, assume familiarity with the fundamental concepts of content analysis. It may be useful to consult a textbook or introductory document before continuing.
Scope and limits of LLMs in content analysis
LLMs can be powerful allies in supporting content analysis, but their usefulness depends on controlled use accompanied by the researcher's own judgement. They are particularly good at summarising, organising, grouping, rewriting and suggesting, but they do not replace the contextual, theoretical and ethical understanding of the person conducting the analysis. They may introduce biases inherited from their training data, over-interpret information or invent details. They therefore require precise instructions and constant supervision.
It is important to understand their capabilities and limitations clearly.
| Dimension | Key elements |
|---|---|
| What an LLM does well | Summarising large volumes of text; identifying preliminary patterns or tentative themes; suggesting initial category structures; coding texts into organised matrices; drafting syntheses and reports. |
| What must not be delegated to the LLM | Final interpretation of findings; attribution of deep or contextual meaning; autonomous construction of categories without human supervision; theoretical judgement and conceptual relevance; ethical responsibility in data handling. |
| Critical aspects to monitor | Circular or tautological reasoning; over-interpretations or unjustified inferences; excessive generalisations; inconsistencies in coding between similar fragments; lack of documentation that compromises the transparency of the process. |
In summary, LLM-assisted analysis is always an interactive process: the model accelerates repetitive and structural tasks; the researcher interprets, validates and decides.
Rationale for using LLMs in educational research
In educational research, the data collected tends to be abundant, varied and rich in nuance: open-ended questionnaires, interviews, classroom diaries, student reflections, tutor reports, portfolios, focus group transcripts, teaching plans, or even virtual forum messages. Analysing all of this manually is time-consuming and can slow down the exploratory phases.
This is where LLMs add clear value: they allow processes to be accelerated without sacrificing rigour, as long as interpretive control remains in the researcher's hands.
Nevertheless, it is essential to remember that models can make mistakes, over-summarise or introduce inferences not present in the data. LLMs do not understand the educational, institutional or social context beyond what they are given, so their contribution always requires critical verification.
Remember that even if the LLM hands you an impeccable table, the one who signs the article is you. If the model invents something and you do not catch it, you are the one who will be upset when the reviewers ask about it, and the machine will not bat an eyelid. Worse still, it will tell you: "I'm sorry, you're right. I was wrong." The LLM accelerates the analysis process; the researcher interprets. That is the central principle. Here are a few examples.
Example 1. Open-ended student response (questionnaire)
Original data (Year 10 student):
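“Me gusta cuando trabajamos en grupos pequeños porque así puedo preguntar sin vergüenza. En clase grande me pierdo enseguida.” [“I like it when we work in small groups because that way I can ask questions without feeling embarrassed. In a big class I get lost straight away.”]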
What the LLM can do here:
identify preliminary patterns (comfort, participation, emotional climate),
suggest tentative categories: participation, peer support, barriers in large-group settings,
extract relevant verbatim quotations without rewriting.
What it must not do:
interpret that the student “has low self-esteem”,
suggest causes not mentioned,
infer personal characteristics.
Example 2. Comments in an institutional forum (university)
Student message on Moodle:
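“A veces no entiendo bien qué pide cada práctica porque las instrucciones están en sitios distintos. Me ayudaría tener todo unificado.” [“Sometimes I do not really understand what each practical assignment asks for because the instructions are in different places. It would help me to have everything in one place.”]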
What the LLM can do:
identify needs for clarity, materials management, and student experience,
suggest a category such as: demands for teaching organisation,
suggest representative quotations.
What it must not do:
deduce that the teacher “does not plan well”,
assert that this affects academic performance.
Formal aspects of the guide
First, throughout this guide I use the term LLM (Large Language Model) in preference to the broader term AI (Artificial Intelligence). The reason is that the assisted content analysis described here is specifically based on generative language models — such as ChatGPT, Claude or Gemini — capable of processing text, generating responses and applying complex analytical criteria. These models are part of the broader field of artificial intelligence, but the term AI is excessively generic and encompasses very different technologies (computer vision, expert systems, robotics, predictive analytics). By contrast, LLM precisely identifies the specific tool involved in content analysis, makes it easier to describe the process in technical terms, and avoids conceptual confusion. For this reason, and in order to maintain terminological rigour and methodological clarity, this guide uses the term LLM to refer to the assistant agent throughout all phases of the analysis.
Second, this guide was developed with reference to the capabilities of language models available in December 2025. Although LLMs evolve rapidly, the procedure described here is intended to be useful and durable, because it is grounded in the classical principles of content analysis and in structured interaction with the model. It is foreseeable that future improvements to LLMs (greater stability in applying analytical criteria, reduction of errors and more robust reasoning capabilities) will increase the validity and reliability of results, gradually reducing the need for direct human supervision at each step. Nevertheless, the researcher's interpretive judgement will remain indispensable, and this guide provides a methodological foundation both for current capabilities and for those that will develop in the future.
Third, this guide includes a series of instructions designed to guide LLM use during content analysis. To distinguish them from the explanatory text, these indications appear highlighted in single-line boxes, easy to identify and to copy directly when needed. Each box contains a specific action you can ask the model to perform, or a rule the model must follow. I prefer to call them instructions, although everyone refers to them as prompts. A prompt can be "good morning" — that is, something that triggers a response from the LLM. Here I focus only on the instructions we want the model to follow. I will therefore use both terms interchangeably, with the same meaning. They will take the format of this example:
If you detect inconsistencies, biases or impossible tasks, flag the problem before continuing.
Let me take this opportunity to clarify that these are examples, proposals. The leading experts on the subject say that prompting is as much an art as a science. Experimenting to find what works best is therefore almost a necessity.
Bear in mind that what I explain here can be used in its entirety or partially, depending on the research approach and methodological decisions you make. I have structured the guide so that each chapter can be read independently, without omitting warnings I consider essential (which means you will encounter some of them repeated).
Cross-cutting operative principles (mandatory throughout)
These principles apply to all phases (construction, refinement, coding, synthesis and reporting). Do not forget them. I state them here to help you calibrate your expectations about what an LLM can and cannot do to assist you. As I noted above, I will revisit them in each section.
Foundations of content analysis
This chapter presents the conceptual principles underpinning the LLM-assisted content analysis procedure. Although I assume you already have a basic understanding of what content analysis is, I offer a summary to situate you in the methodological framework that will be applied in the following chapters.
Definitions and key concepts
Content analysis
Content analysis is a systematic analytical method for identifying, organising and interpreting meanings present in textual data. In educational research it is used to analyse:
open-ended questionnaire responses,
interviews,
educational forums, emails or chats,
portfolios or diaries,
learning evidence,
other types of documents containing open text.
The aim is to transform textual data into organised information and, from there, into well-grounded interpretation. The data (corpus) acquire that status only when they are interpreted within a coherent and stable framework.
Units of analysis
The first decision is which unit of analysis to use — that is, the textual unit that will be analysed: complete responses to a survey question, individual sentences, paragraphs, or semantic units defined by the researcher (e.g., complete ideas). Example of a forum message (reproduced literally, spelling uncorrected):
ID03216 - Hola quisiera saber donde es en terminos relativos , ingenieria tecnica industrial, menos dura, numero de egresados donde puede verse de cada universidad o alguna informacion en relacion al tema
Categories
Each unit of analysis may contain one or more categories: labels or concepts representing patterns, ideas or themes identified in the text. Examples: academic performance, vocational family, university access, employment…
Coding
To identify categories in units of analysis, we use a process called coding. There are two levels of application in open texts: (a) internal coding by segment (sentences, semantic units, paragraphs) and (b) document-level coding when a category functions as an attribute or variable of the case (for example, classifying the document by the educational level of the respondent, profile, or other analytical variables). Examples:
Coding a message segment: identifying the phrase 'I'm not sure whether to do the Higher VET course' and assigning it the code Higher VET.
Coding a document: analysing a complete message to determine, using linguistic indicators, whether the author is male, female or neutral (user gender).
Themes (or dimensions)
Categories may share common elements. In that case, we create groupings that integrate several categories and express a central idea or significant pattern. Categories should be the final level of a hierarchical structure that may have more than two levels of grouping, which we can call themes or dimensions. Examples:
Temporal dimension: the theme "Stages", grouping hierarchically lower categories such as: Pre-university, University entrance exam, University and Employment.
Content dimension: the theme "Attitudinal aspects", integrating the categories Emotions (fear, hope), Intentions and Perceived difficulty.
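For readers who keep their category system in a file or script, the hierarchy described above can be sketched as a nested structure. This is only an illustration: the names `CATEGORY_SYSTEM` and `category_paths` are hypothetical, the three-level layout (dimension, theme, category) is one possible reading of the two examples, and the sub-codes of Emotions (fear, hope) are left out for brevity.

```python
# Hypothetical representation of the two example groupings above.
# Structure: dimension -> theme -> list of categories (the final level).
CATEGORY_SYSTEM = {
    "Temporal": {
        "Stages": [
            "Pre-university",
            "University entrance exam",
            "University",
            "Employment",
        ],
    },
    "Content": {
        "Attitudinal aspects": [
            "Emotions",
            "Intentions",
            "Perceived difficulty",
        ],
    },
}


def category_paths(system):
    """Return one (dimension, theme, category) tuple per category."""
    return [
        (dimension, theme, category)
        for dimension, themes in system.items()
        for theme, categories in themes.items()
        for category in categories
    ]


paths = category_paths(CATEGORY_SYSTEM)
for dimension, theme, category in paths:
    print(f"{dimension} > {theme} > {category}")
```

Writing the system down this way makes it easy to paste the full list of categories, with their parent themes, into an instruction for the model.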
Types of analysis: inductive, deductive and mixed
Inductive
In the coding process we can use an inductive strategy, in which we create categories from what we find in the text.
This strategy is useful when exploring new or little-studied phenomena, when there is no prior theoretical framework, or when the aim is to capture the content of the text without imposing prior interpretations.
An LLM is useful here for suggesting initial patterns.
Deductive
We can also use a deductive strategy, in which we apply prior categories derived from theories, existing research, conceptual frameworks, rubrics or previous questionnaires.
This strategy is useful when the aim is to test, compare or apply prior theories.
An LLM acts as an assistant for applying the predefined category system to the units of analysis.
Mixed
The mixed approach combines both strategies: it applies predefined categories while allowing new ones to emerge from the data.
This is the most common approach in complex studies, and an LLM can help adjust and enrich the system.
Risks
In the inductive approach, the main risk when using LLMs is over-inflating initial proposals and accepting poorly grounded categories; in the deductive approach, the danger is forcing data to fit pre-established theoretical frameworks; in mixed approaches, both risks coexist and require especially careful vigilance.
Here is a summary table.
| Criterion | Inductive | Deductive | Mixed |
|---|---|---|---|
| Starting point | The data | Prior theoretical framework | Theory and data |
| Category system | Emerge progressively from the corpus | Defined before the analysis | Initial categories that are revised and adjusted |
| Relationship with theory | Theory is constructed from the data | Theory guides the analysis | Theory orientates, but is revised in light of the data |
| Degree of openness | Very high | Low | Medium–high |
| Main advantages | Discovers unforeseen topics; high sensitivity to context | Greater conceptual control; facilitates comparability | Balance between rigour and openness; high adaptability |
| Risks or limits | Initial dispersion; unstable categories | Forcing the data; losing relevant information | Requires greater methodological control |
| Typical use in educational research | Exploratory studies, analysis of student experiences | Programme evaluation, application of theoretical models | Applied educational research (very common) |
| Role of the LLM | Supports identification of preliminary patterns, thematic exploration and initial category proposals, always under human review | Systematically applies a previously defined category system and facilitates consistent corpus coding | Supports both initial exploration and the refinement and adjustment of categories, facilitating rapid iterations between theory and data |
Units of analysis and text segmentation
The quality of the analysis depends largely on the precise definition of the unit of analysis.
| Type | Description | When to use it? |
|---|---|---|
| Complete response | Each response is coded as a whole | Short or very direct questionnaires |
| Sentence | Segment delimited by punctuation | Long but structured responses |
| Semantic unit | Complete idea regardless of length | Interviews, complex narrative |
| Paragraph | Extensive textual blocks | Institutional documents or long reflections |
| Turn of speech | Participant intervention | Focus groups |
| Diary entry | Text of a day or diary entry | Diaries |
LLMs work well at any of these levels, but require explicit instructions for each case. It is also useful to distinguish whether you will be:
coding within each document (segments/meaning units),
coding the complete document when the category is used as an attribute/variable of the case (e.g., presence of a theme, type of document, respondent profile, etc.).
Examples of instructions:
Use the sentence as the unit of analysis and do not divide into smaller units.
Identify meaning units within each response.
Treat each complete response as a case and assign variables/attributes at document level.
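If you prefer to segment the text yourself before handing it to the model, the "sentence" unit from the table above can be approximated in a few lines. This is a rough sketch under stated assumptions: the function name `split_sentences` and the sub-identifiers (R2.1, R2.2) are hypothetical, and splitting on punctuation will mis-handle abbreviations such as "e.g.", so the result still needs a quick review.

```python
import re


def split_sentences(response_id, text):
    """Split a response into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Number each sentence so it can be traced back to its response.
    return [
        (f"{response_id}.{n}", part)
        for n, part in enumerate(parts, start=1)
        if part
    ]


units = split_sentences(
    "R2",
    "I use ICT, but I feel I need more training. The school lacks resources.",
)
for unit_id, sentence in units:
    print(unit_id, sentence)
```

Semantic units, by contrast, cannot be delimited mechanically; they depend on your judgement or on a carefully instructed model.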
The role of the researcher in interpretation
Although LLMs can suggest categories, summarise, code and synthesise, they do not understand the educational context in the deep sense required for rigorous analysis. The researcher contributes:
sensitivity to institutional and socio-cultural context,
theoretical knowledge,
judgement for assessing relevance,
reflective capacity for interpretation,
critical review of the model's work.
The principle you must always follow is P1: the LLM proposes and you decide.
Differences between manual and LLM-assisted analysis
Manual and assisted analysis share methodological principles but differ in pace and resources. Compared to manual analysis, LLM-assisted analysis presents the following:
Advantages
Reduction of initial reading time.
Rapid identification of preliminary patterns.
Automatic generation of category proposals.
Orderly coding in tables.
Assistance in drafting summaries and results reports.
Consistency in handling large volumes of text.
Limitations
Risk of over-interpretation.
Categories that are too general if not adjusted.
Mechanical coding without nuance.
Lack of sensitivity to contextual details.
Requires constant supervision and cross-checking.
Recommendation: LLM-assisted analysis does not replace manual analysis, but optimises and makes it more efficient. Best practice consists of using the LLM to accelerate tasks, provide alternatives and generate structure, while you retain interpretive control.
Here is a summary table.
| Dimension | Manual analysis | LLM-assisted analysis |
|---|---|---|
| Speed | Slow with large corpora | High even with large volumes |
| Scalability | Limited by human time | High |
| Consistency | May vary between sessions | High if the instructions are stable |
| Risk of biases | Explicit human biases | Human + model biases |
| Transparency | High if well documented | Requires explicit documentation |
| Researcher's role | Codes and interprets | Interprets, validates and decides |
| Type of tasks | Interpretive and technical | Accelerated technical tasks + human supervision |
You now have the conceptual map. You know what content analysis is, how an LLM functions in that process, and what type of analysis you will be conducting. Now it is time to get practical: before asking the model anything, you need to prepare your data. If this phase is done well, everything else flows. If done poorly, sooner or later you will have to come back here. On to Part II.
Preparation
Data preparation
- Define the research questions and study objectives. → Ch. 1.1
- Decide the analysis approach: inductive, deductive or mixed. → Ch. 2.2
- Identify the type of corpus to be analysed and the appropriate unit of analysis. → Ch. 2.3
The quality of LLM-assisted analysis depends initially on the quality of the data the model receives. This chapter explains how to collect, process and organise texts before asking an LLM to analyse them. Adequate preparation reduces errors, improves consistency and allows better use of the model's capabilities.
It is important to avoid discovering data errors during the analysis process. If they arise, you will typically need to go back, correct them and redo part of the work — costing you time and, very likely, your composure. Invest time in cleaning your texts thoroughly at the outset.
Data collection: common sources in educational research
In educational research it is common to work with different types of text. Before starting the analysis, it is worth reviewing their relevance, their relationship to the research questions, and considering whether contextual information should be incorporated.
LLMs can work with any type of text, but remember that in educational research the most common materials are:
Open-ended questionnaire responses (frequent in studies of perceptions, competencies, satisfaction…).
Semi-structured interviews or focus groups (transcribed).
Learning diaries and student reflections.
Institutional reports and documents (e.g., school plans, innovation reports).
Messages on educational platforms (forums, Moodle, Teams, chats…).
Student written work (essays, portfolios, comments).
Each data type requires specific decisions:
What will the unit of analysis be?
Will it be segmented into sentences, paragraphs, semantic units?
Are additional contextual data needed?
Does it need to be anonymised?
Cleaning and anonymisation
Before using an LLM it is essential to:
A. Remove personal or sensitive information.
names of students or teachers,
educational institutions,
phone numbers, addresses,
direct identifiers,
health data, and
other specially protected information.
If the presence of certain data is analytically relevant (for example, a person's role), it is advisable to replace that data with labels, such as Teacher A, Student 3 or School X.
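The label replacement described above can be partly automated when you already know which names appear in the corpus. The following is a minimal sketch, not a complete anonymisation tool: the lookup `LABELS`, the invented names in it and the phone pattern are all assumptions, and anything not listed will pass through untouched, so a manual read-through remains necessary.

```python
import re

# Hypothetical lookup prepared by the researcher: real name -> neutral label.
# The names below are invented for illustration only.
LABELS = {
    "Marta Ruiz": "Teacher A",
    "Pablo": "Student 3",
    "IES Example": "School X",
}

# Rough pattern for 9-digit phone numbers, optionally split by spaces
# (an assumption; adapt it to the formats present in your corpus).
PHONE = re.compile(r"\b\d{3}\s?\d{3}\s?\d{3}\b")


def pseudonymise(text, labels=LABELS):
    """Replace known identifiers with labels and mask phone numbers."""
    # Longest names first, so a full name is replaced before a shorter one.
    for name in sorted(labels, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", labels[name], text)
    return PHONE.sub("[PHONE]", text)


sample = "Marta Ruiz told Pablo to call 612 345 678 at IES Example."
print(pseudonymise(sample))
```

Keep the lookup table in a secure place and out of anything you upload: it is the key that links labels back to real people.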
B. Correct errors that hinder the model's reading
It is not necessary to correct everything, but you should correct what might impede comprehension, such as duplicated text, incomplete lines, page breaks or irrelevant formatting.
C. Standardise the format
While there is no single standard here, these are some format recommendations: plain text or simple table, without colours or indentation, using standard characters.
Your Ethical Data Safety Traffic Light.
Before uploading any data to the platform, take a moment to check which colour applies to you. Remember that privacy is a mandatory principle (P5):
Recommended formats
Models work well with numbered lists, delimited text blocks (commas, tabs, etc.), simple tables and clear paragraphs. It is preferable to avoid complex formats and to prioritise well-structured plain text. Numbering responses (R1, R2, R3…) facilitates traceability during coding.
Example of a data block:
| ID | RESPONSE |
|---|---|
| R1 | I think digital skills are important for motivating students. |
| R2 | I use ICT, but I feel I need more training. |
| R3 | The school does not have sufficient technological resources. |
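
A data block like the one above can be produced from a raw list of responses with a short script. This is a sketch under assumptions: the function names `prepare_block` and `to_table` are hypothetical, and treating identical texts as duplicates is a decision you should take (and document) yourself, since two respondents may legitimately give the same answer.

```python
def prepare_block(raw_responses):
    """Strip whitespace, drop blanks and exact duplicates, and number the rest."""
    seen = set()
    rows = []
    for response in raw_responses:
        # Collapse line breaks and repeated spaces into single spaces.
        text = " ".join(response.split())
        if not text or text in seen:
            continue
        seen.add(text)
        rows.append((f"R{len(rows) + 1}", text))
    return rows


def to_table(rows):
    """Render the numbered rows as a simple pipe-delimited table."""
    lines = ["| ID | RESPONSE |", "|---|---|"]
    lines += [f"| {rid} | {text} |" for rid, text in rows]
    return "\n".join(lines)


raw = [
    "I use ICT,  but I feel I need\nmore training.",
    "",
    "The school does not have sufficient technological resources.",
    "I use ICT, but I feel I need more training.",
]
rows = prepare_block(raw)
print(to_table(rows))
```

Numbering happens after cleaning, so the identifiers stay consecutive and each one points to exactly one text.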
Block-based analysis management
Most models work better when excessive loads are avoided. Rather than sending the entire corpus at once, you can work in blocks (for example, 50–200 short responses or one complete interview). Use clear names for each block. For example:
Block A: perceptions of digital resources
Block B: teacher training
If you load too much data at once, the model may become saturated, lose context, generate overly general categories or forget previous instructions. Working in parts, the LLM better maintains context, reduces length-related issues and allows the category system to be refined incrementally.
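When the corpus is already numbered, splitting it into blocks of a fixed size takes only a few lines. The sketch below assumes rows stored as (ID, text) pairs and a hypothetical helper `split_into_blocks`; the block size of 50 is simply the lower end of the range suggested above, and size-based blocks are a complement to, not a substitute for, the thematic block names shown earlier.

```python
def split_into_blocks(rows, block_size=50):
    """Split a list of (ID, text) rows into consecutive blocks of at most block_size."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


# 120 placeholder responses, numbered R1..R120.
rows = [(f"R{n}", f"response text {n}") for n in range(1, 121)]
blocks = split_into_blocks(rows, block_size=50)

for index, block in enumerate(blocks, start=1):
    print(f"Block {index}: {block[0][0]} to {block[-1][0]} ({len(block)} responses)")
```

Recording the first and last ID of each block, as the loop does, gives you a simple log of what was sent to the model in each session.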
Special cases: short responses, noise and multilingual data
Very short responses, noise, multilingual data and poor spelling may require specific decisions: marking them as non-codeable, asking the model to translate or to correct only spelling without altering the content. These decisions must be documented, as they affect interpretation.
A. Very short responses
Example: “Yes”, “No” or “It depends” may be insufficient or irrelevant responses for the research. In that case, one possible solution is:
Detect non-codeable or insufficient responses (such as “Yes”, “No”, “It depends”) and group them in a general category called “Minimal responses”, without assigning them any thematic interpretation.
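The same screening can be done before the data reach the model. This is a minimal sketch, assuming a hypothetical list `MINIMAL_ANSWERS` and a one-word threshold; it only marks candidates for your review and does not decide anything, since a single word can occasionally be a substantive answer.

```python
# Hypothetical list of minimal answers; extend it for your own corpus and language.
MINIMAL_ANSWERS = {"yes", "no", "it depends", "sí", "si", "depende"}


def is_minimal(text, max_words=1):
    """Flag a response that is a known minimal answer or has very few words."""
    normalised = text.strip().strip(".!?¡¿").lower()
    return normalised in MINIMAL_ANSWERS or len(normalised.split()) <= max_words


rows = [
    ("R1", "Yes."),
    ("R2", "I use ICT, but I feel I need more training."),
    ("R3", "It depends"),
    ("R4", ""),
]
minimal = [rid for rid, text in rows if is_minimal(text)]
print(minimal)
```

Blank responses are flagged as well, because they contain no words at all.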
B. Noise or irrelevant texts
Isolated phrases, blank responses, duplicates, etc. may appear. In these cases, remove them before analysis or ask the model to flag them as non-analysable.
C. Multilingual data
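If your corpus contains more than one language (not very common), the LLM can process it, but it is advisable to state this explicitly and to say whether the texts should be analysed in their original language or translated first, with instructions such as: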
This dataset contains responses in English, French and Welsh.
Do not translate anything and analyse each response in its original language.
First translate the entire corpus into English literally and without interpreting. Do not alter the content.
D. Poor spelling
Models usually handle this well, but if it affects comprehension you should give instructions such as:
Rewrite these responses correcting only spelling and punctuation.
Configuring the LLM for content analysis
- Collect, clean and anonymise the data. → Ch. 3
- Apply the ethical safety traffic light before loading data into external platforms. → Ch. 3.2
- Organise the corpus in numbered blocks in a format readable by the LLM. → Ch. 3.3
Correctly configuring a language model is one of the most critical steps in the process. The quality of the final analysis depends largely on how the model is prepared, what instructions it receives, and what conceptual restrictions are established from the outset.
This chapter provides a set of principles and instructions for configuring ChatGPT, Gemini, Claude or other LLMs in a way orientated to educational research.
The LLM's “analytical role”
LLMs do not possess intentionality or deep understanding; they function through statistical prediction (though it may not seem so). For the model to act as a methodological ally, it must be assigned a role — in this case, that of expert assistant in content analysis in educational research, careful with interpretation, respectful of the data, and attentive to requesting clarifications when faced with ambiguity.
The role must include:
competence in content analysis,
attention to methodological instructions,
emphasis on not inventing,
obligation to request clarification when faced with ambiguity,
respect for the language of the corpus, and criteria of transparency and traceability.
The role can be repeated at the start of each session or recalled periodically for greater consistency.
Master prompt: what to include
It is advisable to provide context and action instructions. The master prompt defines the working framework: approach (inductive, deductive or mixed), response style, restrictions (do not invent, do not generalise beyond the corpus) and protocols for flagging inconsistencies. This instruction should be used at the start of each session or phase of the analysis.
The master prompt is the initial configuration that orientates the entire conversation. It can be used with ChatGPT, Gemini and Claude, for example, without significant changes.
Tell the model:
A. The role (as seen in 4.1)
Act as an expert assistant in content analysis in educational research. Your role is to help me explore, categorise and synthesise textual data, following criteria of academic rigour. Do not invent information. Request clarifications when necessary.
B. The methodological framework
We will work through an inductive/deductive/mixed content analysis.
Follow the exploration, categorisation and synthesis steps that I will indicate.
C. The response style
The response style should be clear/structured/academic but not excessively technical/technical/with examples/with direct quotations/…
D. Restrictions
Do not make inferences that are not grounded in the data.
If something cannot be determined, state it.
Do not generalise beyond the corpus provided.
E. Error control
If you detect inconsistencies, biases or impossible tasks, flag the problem before continuing.
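Because the master prompt has to be repeated in every session, it can help to keep it as a small template rather than retyping it. The following Python sketch assembles the five parts (A to E) into a single text. The function `build_master_prompt`, its parameters and the exact wording of the framework sentence are illustrative assumptions, not a fixed formula; edit the text to match your own study.

```python
APPROACHES = {"inductive", "deductive", "mixed"}

def build_master_prompt(approach: str, style: str) -> str:
    """Assemble a master prompt from role, framework, style, restrictions and error control."""
    if approach not in APPROACHES:
        raise ValueError(f"approach must be one of {sorted(APPROACHES)}")
    sections = [
        # A. Role
        "Act as an expert assistant in content analysis in educational research. "
        "Your role is to help me explore, categorise and synthesise textual data, "
        "following criteria of academic rigour. Do not invent information. "
        "Request clarifications when necessary.",
        # B. Methodological framework
        f"The content analysis approach is {approach}. "
        "Follow the exploration, categorisation and synthesis steps that I will indicate.",
        # C. Response style
        f"The response style should be {style}.",
        # D. Restrictions
        "Do not make inferences that are not grounded in the data. "
        "If something cannot be determined, state it. "
        "Do not generalise beyond the corpus provided.",
        # E. Error control
        "If you detect inconsistencies, biases or impossible tasks, "
        "flag the problem before continuing.",
    ]
    return "\n\n".join(sections)

master_prompt = build_master_prompt("inductive", "clear, structured, with direct quotations")
print(master_prompt)
```

Saving the generated text alongside your other project files also gives you an exact copy of the instruction used in each session.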
Version management and conversational memory.
LLMs function through threads or sessions. It is advisable to separate sessions by stage (exploration, categories, coding, synthesis), repeat the master prompt when opening a new thread, and always note which version of the category system is being used. Saving key outputs (category versions, tables) is essential for traceability.
For example, create separate sessions by phase:
Session 1: initial exploration
Session 2: developing categories
Session 3: coding
Session 4: synthesis and reporting
Rules for avoiding loss of context:
Repeat the master prompt at the start of each session.
State the version of the category system.
We will work with the category system version 2.0 [attach].
Load only the data needed for each task.
Save key outputs manually:
category list,
coding tables,
previous summaries.
Maintain organised folders and file names:
Categorias_v1.docx
Categorias_v2.docx
Codificación_BloqueA.xlsx
Researcher-to-researcher advice: your research log.
To comply with the traceability principle (P3), do not rely solely on the chat memory. I suggest opening a simple document where you note:
| Field | Explanation | Usage example |
|---|---|---|
| Date and Model | Record the exact day of the session and the specific LLM version. This is crucial because model capabilities evolve rapidly. | 14 March, ChatGPT-5. |
| Session description | Note the context of the conversation "thread". It is important to know whether the model has memory of previous steps or whether you have started "from scratch", to avoid accumulated biases. | Continuing the analysis from where I left off. / Starting the analysis of document ID23 from scratch. |
| Key instructions | Document any change or nuance you have introduced in your master prompt. If you paste the exact instruction here, you will be able to replicate the result months later. | I modified the instruction by adding “identify duplicated ideas”. |
| Observations | Note qualitative impressions about the model's performance or conceptual blocks. This will help you interpret the data without desperately trying to remember what happened in that session. | Today the model is being particularly creative. / It seems to have got stuck on the TEACHING category. |
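A word-processor document is enough for this log, but if you prefer a structured file, the same four fields can be kept as rows in a CSV. The sketch below is one possible layout, assuming Python's standard `csv` module; the column names and the helper `append_log_entry` are hypothetical. It writes to an in-memory buffer so that it runs as-is; in practice you would open a local file in append mode instead.

```python
import csv
import io
from datetime import date

# Column names mirror the table above; "Date and Model" is split in two.
FIELDS = ["date", "model", "session_description", "key_instructions", "observations"]

def append_log_entry(stream, entry: dict, write_header: bool = False) -> None:
    """Write one research-log row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(entry)

buffer = io.StringIO()  # stand-in for open("research_log.csv", "a", newline="")
append_log_entry(
    buffer,
    {
        "date": date.today().isoformat(),
        "model": "ChatGPT-5",
        "session_description": "Starting the analysis of document ID23 from scratch",
        "key_instructions": "Added: identify duplicated ideas",
        "observations": "It seems to have got stuck on the TEACHING category",
    },
    write_header=True,
)
print(buffer.getvalue())
```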
Sources of error and how to mitigate them
Typical errors include over-interpretation, creation of excessively general categories, loss of instructions and inconsistency between responses. To mitigate them, clear restrictions are formulated, rules are recalled periodically, and the model is asked to review its own coherence.
Main problems and suggestions for mitigating them.
A. Over-interpretation or “hallucination”. Models may generate unjustified inferences.
Do not make inferences that are not grounded in the data. If you cannot determine something, state it.
B. Overly general categories. Instructions: request inclusion/exclusion criteria and textual examples.
C. Loss of instructions. Ask the model to repeat the master prompt periodically.
D. Inconsistency between responses. Instructions: ask the model to review its own coherence.
Check whether any categories have been applied inconsistently.
Identify contradictions and propose adjustments.
E. Ambiguity in units of analysis. Instructions: explicitly indicate which unit to use.
F. Mixed languages. Instructions: give clear guidance on whether to translate or not.
Here is a summary table.
| Source of error | Description | Risks | Mitigation strategy |
|---|---|---|---|
| Ambiguous instructions | Unclear or incomplete prompts | Inconsistent coding | Draft explicit, reusable instructions |
| Over-interpretation | Inferences not present in the data | Loss of validity | Prohibit external interpretations |
| Hallucinations | Introduction of non-existent content | Invalid results | Require exact verbatim quotations |
| Vague categories | Unclear definitions | Overlaps | Refine definitions and criteria |
| Lack of human control | Accepting results without review | Systematic errors | Review samples and versions |
| Model variability | Variation between runs | Low reproducibility | Document prompts and versions |
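Requiring verbatim quotations only protects you if the quotations are actually checked against the corpus. This check is mechanical and can be done outside the model. The following Python sketch shows one way to do it, assuming the corpus is held as a dictionary of response IDs and texts; the function names and the sample data are invented for illustration. Matching ignores case, extra whitespace and curly versus straight quotation marks, which is a design choice you may want to tighten.

```python
import re

def normalise(text: str) -> str:
    """Lowercase, unify quotation marks and collapse whitespace."""
    text = text.replace("“", '"').replace("”", '"').replace("’", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def check_quotations(quotations: list[str], corpus: dict[str, str]) -> dict[str, list[str]]:
    """Map each quotation to the IDs of the responses that contain it.

    An empty list means the quotation was not found anywhere and
    should be treated as a possible hallucination.
    """
    normalised_corpus = {rid: normalise(text) for rid, text in corpus.items()}
    report = {}
    for quote in quotations:
        needle = normalise(quote)
        report[quote] = [
            rid for rid, text in normalised_corpus.items() if needle and needle in text
        ]
    return report

# Invented sample data for illustration only.
corpus = {
    "R1": "We lack time to prepare digital resources.",
    "R2": "Training was useful but too short.",
}
quotes_from_model = ["lack time to prepare", "teachers feel abandoned"]
report = check_quotations(quotes_from_model, corpus)
print(report)
```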
Using projects or persistent workspaces (ChatGPT Projects, NotebookLM, etc.)
Specific working environments exist within LLMs themselves, such as ChatGPT Projects or NotebookLM, which allow you to organise documents, notes and conversations in one place permanently.
In the context of LLM-assisted content analysis, these spaces can function almost like a digital research notebook, where you can bring together:
the corpus (open-ended responses, interviews, documents);
methodological documents (design, analysis protocol, inclusion/exclusion criteria, etc.);
the various versions of the category system;
notes of analytical decisions, records of changes and reflections;
exported coding tables or interim summaries.
Their use can be very helpful when the analysis is complex, extended over time, or involves different types of documents. However, they also introduce risks and limitations that are important to understand and make explicit in your research. Your decision whether to use them or not should be based on an analysis of these advantages and disadvantages for your specific project.
Advantages
a) Centralising study information
A project allows you to bring together in a single environment:
the corpus (for example, all open-ended questionnaire responses),
reference documents (key articles, theoretical framework, analysis criteria),
the methodological guide being followed,
previous reviews and versions of the category system.
This reduces the usual scatter of loose files (Word, PDF, spreadsheets) and makes it easier for the model to “see” the full set of relevant documents when it interacts with the researcher.
b) Maintaining context between sessions
In a “normal” conversation with the model, context is lost or fragmented when the session is closed or the message limit is exceeded. Projects or notebooks, by contrast, retain:
the interaction history,
attached documents,
and often a persistent summary of key content.
This makes it possible to resume the analysis days or weeks later without having to explain everything from scratch, which is very practical for those who have to combine research with teaching or other tasks.
c) Reducing repetition of instructions
When working without projects, you must repeat in each session:
the study context,
the research questions,
the type of analysis (inductive, deductive, mixed),
the model's role (assistant in content analysis),
the “do not invent” warnings, etc.
In a project, much of this information can be set at the start, and the model has it available as a recurring reference. This saves time and reduces the risk of forgetting an important instruction.
d) Digital “research notebook” function
A project can be used as a methodological record space, where the following are stored:
decisions about category merges and splits;
versions (v1.0, v2.0, v3.0) of the category system;
justifications for changes;
discarded explorations, but documented.
Although this does not replace the formal record in your own document (for example, a research diary or the method section in your research report), it can complement it and serve as a draft or working repository.
e) Semantic search within documents
Some environments allow you to search for relevant fragments within uploaded documents, not only by keyword, but by meaning. This can help to:
quickly locate verbatim quotations that exemplify a category;
check whether a topic actually appears in the corpus or only in the summary;
review how a specific concept is used in different parts of the material.
In content analysis terms, this function can speed up the collection of evidence to justify findings.
f) Support for iterative and complex analyses
In extensive studies (for example, several waves of data collection, or several groups of participants), the project can contain logical subfolders (by block, phase, group), which makes it easier to maintain a global view of the analysis and allows the LLM to navigate between related materials without the researcher having to load them one by one in each message.
Disadvantages and risks
a) Privacy and data protection risks
Uploading real educational corpora (student responses, teacher data, institutional data…) to an external platform carries significant risks:
possible processing of data by the provider (OpenAI, Google, etc.);
storage on external servers;
potential breach of data protection regulations, if appropriate measures are not taken.
This requires:
rigorously anonymising data before uploading;
reviewing the platform's privacy policy and terms of use;
following the ethical and legal standards of the institution and the research project.
If the sensitivity level is high (for example, data about minors, health, or vulnerable situations), it may be more prudent not to use these types of environments, or to restrict use to already anonymised and highly aggregated materials.
b) False sense of “intelligent tracking”
Having a “Project” with many documents can create the impression that the model has a kind of deep, stable memory of the analysis, when in reality:
it still operates through predictive text generation (it will always give you a response),
it may forget relevant details,
and it may not always respect the logical sequence of analytical decisions.
You could end up relying too heavily on the idea that “the project already knows everything” and stop checking methodological coherence. It is important to remember that the project is an organisational aid, not a substitute for your reasoning.
c) Technological dependency and access problems
If the entire analysis (decisions, versions, evidence) is concentrated in a single online project, any of the following can seriously affect continuity of work:
a technical incident,
a change in service terms,
loss of access to the account.
For this reason, it is crucial to export key information (category systems, tables, summaries) periodically to local documents under the researcher's control. In general, it is advisable, necessary and, I insist, essential to make daily backups of everything related to your research (or your work, or your personal things). I suppose there is no need to labour the point, but ask anyone who makes these backups because they learned the hard way (myself included). Learn it the easy way.
d) Reproducibility limitations
In academic terms, reproducibility is affected because:
other researchers will not be able to replicate exactly the same project environment;
models may change version;
LLM results may vary over time.
This does not invalidate their use, but requires greater care in documenting the procedure, instructions, model versions and the core content of the project.
e) Risk of a methodological “black box”
If much of the reasoning and refinement takes place within the project environment and is not explicitly recorded in other documents, part of the analytical process may become opaque (or wholly dependent on screenshots, internal logs, etc.).
This runs counter to the principles of transparency and verifiability in research, so the project must not replace the research diary or the formal method section in your report.
Recommendation
Projects or persistent workspaces can be very useful as an organisational support tool when the analysis requires centralising the corpus, saving versions of the category system, maintaining coherence between sessions, and enabling rapid searches within uploaded materials. Their use is especially advisable when work extends over long periods, involves multiple documents, or requires easy retrieval of previous decisions.
However, these environments should not be used as the primary repository for the analysis or as a substitute for the formal methodological record. Keep following the recommendations above: periodically export key analytical products (categories, tables, summaries) and record the most important methodological decisions in writing, outside the project environment. Persistent workspaces can complement the researcher's work, but cannot replace the documentary records required by content analysis.
And, very importantly, remember that before using a persistent workspace it is worth asking three questions:
Is the corpus completely anonymised?
Will the research extend over a long period with multiple documents?
Does the institution authorise the use of these platforms?
If any answer is negative, it is preferable to limit their use or work with other options.
With the data prepared and the model configured, you are ready to begin the analysis proper. From here, the work becomes more iterative: you will explore, propose categories, refine them and start again as many times as necessary. This is normal. This is how good content analysis works, with or without an LLM. Part III accompanies you through each of those cycles.
Analysis
Initial exploratory analysis
- Configure the LLM with the master prompt and assign it the analytical role. → Ch. 4
- Prepare the data divided into blocks so that you can send them progressively. → Ch. 3.4
- Note: At this stage you do not yet need a category system. You are just exploring.
Exploratory analysis is the first systematic engagement with your data — the corpus. Its aim is not to build final categories but to understand the terrain, identify possible patterns and guide the categorisation phase. This phase benefits particularly from the speed and synthesis that LLMs offer.
How to present data to the LLM
The way data are introduced affects the quality of the analysis. Data blocks should be clearly delimited, using stable numbering and start/end markers. It is useful to briefly indicate what the block is about and what is expected of the model (for example, "I want an exploratory analysis, with no definitive categories yet").
Some recommendations:
Use clear, clean blocks
R1: [response]
R2: [response]
R3: [response]
Include a stable, consistent header
Below is a set of responses about [topic].
Analyse only this block.
Delimit the text. Using clear delimiters, such as triple angle brackets or triple quotation marks, helps the model identify the corpus:
<<< R1[response] / R2[response]>>>
Avoid multiple or ambiguous instructions. For example, do not ask for summaries, categories and coding simultaneously.
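The recommendations above (stable numbering, a consistent header, explicit delimiters and a single task) can be applied in the same way every time with a small formatting helper. This is a minimal Python sketch under stated assumptions: the function `format_block`, its parameters and the `Task:` line are hypothetical, and the `<<<` and `>>>` markers simply follow the delimiter example given above.

```python
def format_block(topic: str, responses: list[tuple[str, str]], task: str) -> str:
    """Build one clearly delimited data block to paste into the chat.

    responses is a list of (response_id, text) pairs, e.g. ("R1", "...").
    Only one task is stated per block, to avoid ambiguous instructions.
    """
    lines = [
        f"Below is a set of responses about {topic}.",
        "Analyse only this block.",
        f"Task: {task}",
        "<<<",  # start marker
    ]
    lines += [f"{response_id}: {text}" for response_id, text in responses]
    lines.append(">>>")  # end marker
    return "\n".join(lines)

# Invented sample responses for illustration only.
block = format_block(
    topic="digital resources",
    responses=[("R1", "text one"), ("R2", "text two")],
    task="exploratory analysis, with no definitive categories yet",
)
print(block)
```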
Requesting summaries, patterns and possible themes
In the exploratory phase, the model is asked to synthesise the main ideas, identify between five and ten preliminary patterns (depending on the type of data) and provide representative quotations. This gives a first overview of the corpus without yet committing to a stable category system.
In this phase it is recommended to begin with simple descriptive tasks:
Here is a set of responses about [topic]. For a preliminary exploratory analysis I want a general summary of the main ideas (5–7 lines).
[Wait for response]
Now I want between 5 and 10 preliminary themes or patterns, with brief verbatim examples that represent each pattern. Do not generate definitive categories, only patterns.
Quick identification of interpretive angles
In addition to the summary and patterns, it is useful to ask the model to identify:
recurring concerns,
positive/negative attitudes,
facilitating factors and barriers,
unexpected elements, questions that the data raise.
Identify:
- facilitators,
- barriers,
- dominant emotions,
- expressed needs,
- minority but relevant ideas.
This allows the analysis to be orientated towards interpretive dimensions without fixing categories yet.
Possible errors at this stage
LLMs can make errors at this stage if they do not receive adequate guidance.
First, the model commonly confuses emerging patterns with definitive categories. This happens because LLMs tend to organise information in a structured way, even when that structure has not yet been validated. To avoid this, it is important to specify explicitly that preliminary observations are sought, not consolidated categories.
A second problem is the tendency to over-generalise. Models, having worked with large training corpora, tend to propose broad or abstract assertions. The most effective way to control this risk is always to require that any assertion be accompanied by exact verbatim quotations from the corpus, so that the analysis remains anchored in real data.
Third, externally sourced interpretations may appear that are not justified by the data. LLMs, by their generative nature, may incorporate inferences based on common associations or habitual linguistic patterns that are not present in the corpus being analysed. This problem can be mitigated by explicitly indicating that the model must not interpret beyond what is clearly expressed in the responses.
Fourth, themes or analytical dimensions of different scales may be mixed. For example, the model may combine individual emotions with institutional barriers, generating patterns that lack conceptual coherence. To prevent this type of confusion, it is advisable to ask that patterns be organised at a clearly defined analytical level (for example: personal, institutional, pedagogical, etc.).
Finally, LLMs tend to suggest causal relationships even when these are not supported by the data. These inferences tend to arise because models are trained to complete plausible narratives that do not necessarily reflect the content of the corpus. To avoid this problem, it should be expressly indicated that unjustified causal inferences are prohibited and that only elements appearing explicitly in the analysed texts should be described.
Before closing your laptop each day (and getting some rest), so that you can pick up the work tomorrow without surprises:
Backup. Have you exported the latest table to a local Excel or Word file?
Chat link. Save the link or PDF of the current conversation.
Review your research log. Have you noted all your decisions?
You can now end your work session with peace of mind.
The error: treating preliminary patterns as definitive categories.
Why this happens: LLMs tend to structure information even when asked only to explore.
How to avoid it: explicitly state in the prompt that you do not yet want definitive categories. Treat the proposals as provisional hypotheses.
Building the category system
- Complete the exploratory analysis and prepare a list of preliminary patterns or themes. → Ch. 5
- Confirm the analysis approach (inductive, deductive or mixed). → Ch. 2.2
- If your analysis is deductive: have the theoretical framework or prior category system ready to provide to the LLM.
The category system is the heart of content analysis. At this stage, preliminary patterns are transformed into an organised analytical structure useful for coding and interpretation. LLMs can accelerate this process, but the researcher must exercise rigorous control to ensure clarity, relevance and coherence.
Academic criteria for a good category system
With or without an LLM, a category system must meet a series of criteria that guarantee its validity, analytical utility and internal and external coherence.
Relevance to the research objective.
The system must be directly linked to the study's questions and purposes. Categories should not be generic but relevant to what is to be understood or explained.
Exhaustiveness.
The set of categories must allow all significant topics present in the data to be classified. An exhaustive system avoids gaps and ensures the analysis captures the diversity of the corpus.
Exclusiveness.
Categories must be conceptually distinct and mutually exclusive, so that each one represents a clearly differentiated meaning. This does not mean that a complete response can be linked to only one category when coding is carried out by segments: a single response may contain several segments, and each segment may correspond to a different category.
Similarly, when categories are used as document- or case-level attributes or variables (for example, presence or absence of a topic, response type, analytical profile of the respondent), exclusiveness applies to the definition of the categories, not to the number of attributes a single document may present. The same case may have several attributes, provided these are clearly defined and do not overlap conceptually.
In both uses, the fundamental point is that categories must not compete with one another for the same type of content. A well-constructed category system avoids redundancies, reduces dispersion and facilitates consistent and comparable interpretations.
Conceptual clarity.
Each category must have a precise name, a clear definition and criteria that guide its application. The elements included in each category must share that definition and function as a cohesive concept, not a heterogeneous collection of ideas. This clarity facilitates reproducibility and reduces ambiguities in interpretation.
Grounding in textual examples.
Using representative corpus fragments as membership examples helps to delimit what does and does not belong to each category. Examples strengthen the comprehension, transparency and validity of the system.
Inductive: generating categories from the data
From an inductive approach, the LLM can propose emerging categories from the corpus. The researcher reviews those proposals, clarifies ambiguous terms, unifies duplicates and adapts the system to the theoretical and contextual reality of the study.
LLMs can help generate preliminary lists of categories, groupings, meaning summaries or operational descriptors.
From the following data block, generate an initial category system.
For each category include:
- short name,
- clear description,
- inclusion criteria,
- exclusion criteria,
- 2 or 3 representative verbatim quotations.
Do not generate higher-order themes yet.
Deductive: applying pre-existing categories
When working with prior conceptual frameworks, the model is used to apply already-defined categories. It is provided with the category list and asked to classify each response, always with explicit justification based on the text. In deductive analysis, categories come from prior literature, theoretical models, rubrics, questionnaires, etc.
These are the predefined categories.
Review them briefly and confirm that you understand them.
[CATEGORIES HERE]
Then classify each response into one or more categories, always with textual justification.
The LLM must learn the system, not create it.
A special case of deductive coding: using lexical dictionaries (controlled lexicon)
Within the deductive approach, a particular situation may arise in which category assignment does not require complex semantic interpretation, but can be supported by the explicit presence of certain terms or expressions. These are cases in which certain lexical elements act as unambiguous indicators of a previously defined category, which allows the implementation of deductive coding based on ad hoc dictionaries or lexicons.
This procedure constitutes a special case of the deductive approach, since it departs from a closed, theoretically validated category system defined with clear operational criteria. Unlike inductive coding, it does not seek to identify emerging themes or expand the category system, but rather to operationalise existing categories through explicit decision rules.
In the framework of LLM-assisted content analysis, the use of lexical dictionaries should not be understood as a purely lexical analysis or as a substitute for semantic interpretation, but as a complementary mechanism that is especially suited to certain structural dimensions of the content. In the AiRISES project, this approach was applied, for example, to dimensions such as educational stages or levels, where the explicit mention of terms such as “universidad”, “bachillerato”, “Educación Secundaria” or “Formación Profesional de Grado Superior” allows a direct and robust assignment of the corresponding category, with a minimal margin of interpretive ambiguity. Think of it as a "safe shortcut": if "Bachillerato" appears in the text, the category is "Bachillerato", with no further deliberation needed.
The lexicon is built from the already-defined category system, associating with each category a finite set of terms, variants and equivalent expressions. This resource acts as a formalised deductive rule, so that the appearance of one of the dictionary terms activates the coding of the corresponding category. The selection of terms is based on domain expert knowledge, corpus review, and, where appropriate, previous iterations of LLM-assisted analysis.
From a methodological standpoint, this strategy fulfils a dual function. On one hand, it increases coding precision for categories where indicators are explicit and normative, reducing the risk of false negatives. On the other, it provides a contrast criterion for evaluating coding carried out by LLMs or human coders, facilitating the detection of inconsistencies, systematic omissions or problems in operational definitions.
It is worth emphasising that this special case of deductive coding must not be applied indiscriminately. Its use is appropriate only when there is a clear and stable correspondence between terms and categories. For more interpretive dimensions — such as attitudes, emotions, valuations or intentions — LLM-assisted semantic coding is more appropriate. Consequently, the approach proposed in this guide advocates a hybrid model, in which lexicon-based deductive coding and AI-assisted semantic interpretation are combined strategically, according to the nature of each analytical dimension.
The following table presents a simplified example of a lexicon used for deductive coding of the dimension educational stage or level. In this case, each category is associated with a small set of terms and expressions whose presence in the text is considered an unambiguous indicator of the corresponding category. The lexicon is built from the previously defined category system (educational levels) and acts as a formal decision rule, so that the appearance of any of the listed terms automatically activates category coding, without the need for additional semantic interpretation.
| Category / dimension | Operational definition | Lexicon terms and expressions (non-exhaustive) |
|---|---|---|
| Lower Secondary | Explicit references to lower secondary education (e.g., Key Stage 3/4 in England, Years 7–11), regardless of the narrative context of the message. | secondary school, lower secondary, Year 7, Year 8, Year 9, Year 10, Year 11, KS3, KS4, middle school, junior high |
| Upper Secondary / Sixth Form | Direct mentions of upper secondary education or the equivalent pre-university stage. | sixth form, A levels, AS levels, Year 12, Year 13, upper secondary, senior school, Highers (Scotland), advanced higher |
| VET / Technical education | Explicit references to vocational education and training or technical qualifications, without specifying level. | VET, vocational, technical college, further education, FE college, BTec, T level, apprenticeship, vocational qualification |
| Intermediate VET | Direct mentions of intermediate-level vocational qualifications or programmes. | Level 2 VET, intermediate apprenticeship, BTec First, Foundation apprenticeship, vocational Level 2 |
| Higher VET | Direct mentions of higher-level vocational qualifications or programmes. | Level 3 VET, Level 4/5, higher apprenticeship, BTec National, T level, HNC, HND, degree apprenticeship |
| University / Higher Education | Explicit references to university or higher education, regardless of qualification type. | university, uni, college, higher education, HE, campus, degree course, undergraduate |
| Bachelor's degree | Direct mentions of undergraduate degree-level study. | bachelor's, bachelor degree, BA, BSc, BEng, LLB, honours degree, undergraduate degree, first degree |
| Postgraduate | Explicit references to postgraduate study. | master's, MA, MSc, MBA, PhD, doctorate, postgrad, postgraduate, graduate school |
| Note: This table is an adaptation of the original Spanish lexicon, which was built around regulated Spanish educational levels (ESO, Bachillerato, FP, etc.). Researchers should construct their own lexicon based on the educational system relevant to their study context. |
The presence of any term included in the lexicon automatically activates coding of the corresponding category. This procedure is applied only to dimensions in which terms function as unambiguous indicators, and does not exclude the possibility of a single message being coded in multiple categories.
If you want the LLM to apply this dictionary automatically and rigidly, you can use the following instruction, which will save you a great deal of manual review time:
Act as a technical assistant. I am providing you with a lexicon (dictionary of terms) linked to a deductive category system.
Your decision rule is simple: if any of the lexicon terms appears in the text, automatically assign the corresponding category. Do not carry out deep semantic interpretation; keep strictly to the explicit presence of the words.
[INSERT LEXICON TABLE HERE, like the example in Table 7]
Return the results in a coding table showing: response ID, detected terms and assigned category.
A piece of advice: before applying the lexicon to the full corpus, test it with 10 or 20 responses. If you see the model getting confused (for example, someone says "my brother is at university" but they are actually talking about their own VET experience), adjust the prompt so that it distinguishes the subject's context.
Applying these categories might give rise to a table like the following, where 1 indicates the presence of the category in the unit of analysis (for example, in a message).
| ID | Secondary Ed. | Upper Secondary | VET | Intermediate VET | Higher VET | University | Bachelor's degree | Postgraduate |
|---|---|---|---|---|---|---|---|---|
| DOC1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
| DOC2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| DOC3 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
| DOC4 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| DOC5 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| DOC6 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| DOC7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| DOC8 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| DOC9 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
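If you prefer to apply the lexicon rule outside the chat, the decision rule described above (any lexicon term present means the category is coded as 1) can be sketched in a few lines of Python. This is only an illustration: the `LEXICON` excerpt, the `code_with_lexicon` function and the two sample messages are hypothetical, not part of the original study.

```python
import re

# Hypothetical excerpt of a lexicon: category -> list of indicator terms.
LEXICON = {
    "Upper Secondary": ["sixth form", "A levels", "Year 12"],
    "VET": ["vocational", "apprenticeship", "FE college"],
    "University": ["university", "uni", "higher education"],
}

def code_with_lexicon(text, lexicon):
    """Return {category: (0 or 1, [matched terms])} for one unit of analysis."""
    result = {}
    for category, terms in lexicon.items():
        found = [
            term for term in terms
            # \b word boundaries keep "uni" from matching inside "united".
            if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        ]
        result[category] = (1 if found else 0, found)
    return result

# Hypothetical messages, invented for illustration only.
messages = {
    "DOC1": "I left sixth form and started an apprenticeship.",
    "DOC2": "My brother is at university, but I chose a vocational route.",
}

for doc_id, text in messages.items():
    coded = code_with_lexicon(text, LEXICON)
    print(doc_id, {category: flag for category, (flag, _) in coded.items()})
```

Note that DOC2 is coded under University even though the author is talking about a brother. A purely lexical rule cannot tell whose experience is being described, which is exactly why the pilot test on 10 or 20 responses matters.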
Mixed: template + emerging categories
In many studies, pre-existing categories are combined with new ones emerging from the data. The model can add categories when it detects content that does not fit the template, explaining why and proposing definitions. The mixed approach combines the best of both strategies.
Use this category system as the base template.
[CATEGORIES HERE]
Add new categories if ideas appear that do not fit.
For each added category:
- explain why it does not fit the prior categories,
- provide its definition,
- include a textual example.
This approach is common in complex studies or those with multiple data sources. In any case, it is advisable to also use the inductive strategy to ensure that the proposed system is exhaustive or that the corpus suggests modifications to existing frameworks. Both situations are methodologically relevant.
How to ask the LLM for clear descriptions, inclusions, exclusions and examples
To avoid vague categories, the LLM should be asked to provide for each category a precise description, inclusion and exclusion criteria and textual examples. The total number of categories is also controlled to keep them manageable. Models tend to generate overly general categories if not guided.
Reformulate each category so that it meets:
1) Clear, unambiguous name.
2) Precise description with a single central idea.
3) Inclusion criteria.
4) Exclusion criteria.
5) Real textual examples from the corpus.
If any category is not consistent, propose adjustments.
Useful tips:
Ask it to avoid vague terms ("motivation", "attitude", "resources", …) left unspecified.
Request explicit exclusions: Include X but not Y.
Restrict the number of categories (e.g., between 6 and 12).
Comparison between manually and LLM-generated systems
AI-generated systems tend to be more general; those constructed by researchers tend towards greater specificity. Combining both approaches allows the LLM's processing and synthesis capability to be exploited without sacrificing conceptual depth.
The LLM tends to:
generate more general categories,
group under broad concepts,
propose more neutral names.
The researcher tends to:
generate more specific categories,
qualify conceptual differences,
adjust the system to the theoretical framework.
Suggested process:
The LLM generates a first version (v1.0).
It produces a category draft from the initial corpus.
The researcher reviews and adjusts.
Refines the system: removes redundancies, merges categories, clarifies definitions and adds criteria.
The LLM generates the revised version (v2.0).
Integrates the modifications and reorganises the system according to the instructions.
The researcher validates the functioning.
Verifies that the categories are clear, distinct and applicable to the corpus.
A second LLM carries out a critical review (recommended).
Offers advantages, problems and improvements of the category system, acting as an independent reviewer.
The error: accepting overly general categories without exclusion criteria.
Why this happens: LLMs propose broad terms without specifying what is excluded.
How to avoid it: demand inclusion and exclusion criteria, as well as real textual examples from the corpus.
Refining and validating the category system
- Generate a category system version 1.0 (with LLM, manually or combining both). → Ch. 6
- Verify that each category has at least a name and provisional description.
- ⚠ Warning: do not code the full corpus until you have finished this chapter. System validation comes before mass coding.
Once the initial version of the category system has been constructed, it is essential to review and improve it before using it to code the full corpus. This phase, often underestimated, is where the quality, coherence and analytical utility of the system are ensured. Do not skip it. We need to seek external coherence (i.e., that one category is not so similar to another that you end up flipping a coin to decide which to use). LLMs can help detect inconsistencies and propose improvements, but the final validation must always be yours.
Merging, splitting and level of abstraction
During refinement, overly broad, redundant or ambiguous categories are detected. The decision is made whether they should be split, merged or renamed. The aim is to achieve a consistent and useful category system.
This is category system version 1.0.
Analyse its internal and external coherence and identify:
1. Categories that are too broad.
2. Redundant or overlapping categories.
3. Ambiguous names.
4. Differences in conceptual level.
5. Unnecessary or irrelevant categories.
Provide concrete proposals for improvement.
Internal and external coherence
The category system must function as a logical conceptual map.
Internal coherence - Each category must have a clear central idea, well-defined criteria, precise boundaries and consistent textual examples.
External coherence - Categories must be clearly distinct from one another, must not overlap or contradict each other, and must have a comparable level of generality.
Review the category system and answer:
- Which categories are poorly delimited?
- Which categories are too similar?
- Which categories should be grouped under a higher-level theme?
- Which categories lack clear examples?
- Suggest justified improvements.
LLM assistance for detecting ambiguities
The LLM can be asked to act as a methodological critic, identifying conceptual weaknesses and proposing alternative groupings. It is also possible to carry out stress tests by applying the system to particularly complex fragments. Models can detect problems that escape the researcher due to their proximity to and familiarity with the data.
Request critical analysis.
Act as a methodological critic.
What conceptual weaknesses do you find in this category system?
Provide specific observations based on definitions, criteria and examples.
Request conceptual alternatives.
Propose 2 or 3 alternative ways of grouping these categories.
Request stress tests.
Evaluate this system by applying it to the following 5 segments.
Identify where the definitions fail and explain why.
These tests are used to check whether the categories really work.
Example evolution: version 1.0 → 2.0 → 3.0
Version 1.0 (initial inductive AI)
Lack of training
Scarcity of resources
Motivation to learn
Use of technology
Institutional support
Problems detected:
Category 4 too broad.
Categories 1 and 3 very similar but different.
Lack of precision in descriptors.
Version 2.0 (after human review + LLM)
Personal training gaps
Insufficient technological resources
Individual disposition towards digital learning
Institutional limitations for integrating technology
Self-training strategies and peer support
Improvements in v2.0:
More specific categories.
Clear differentiation between personal and institutional level.
Version 3.0 (final refined system)
Personal level
Personal training gaps
Individual disposition towards learning
Self-training strategies
Institutional level
Insufficient technological resources
Organisational limitations for integrating technology
Advantages of v3.0:
Consistent conceptual level.
Clarity in the boundaries between categories.
Greater utility for coding and interpretation.
Documenting methodological decisions
It is advisable to generate a decision record that captures how the system has evolved, what changes have been made and why. The LLM can help draft this record from the various versions. Traceability is essential for ensuring the credibility of the analysis.
It is recommended to record:
initial version of the system (v1.0),
LLM critiques,
human review,
improved version (v2.0),
reasons for merges or splits,
final criteria,
final version used (v3.0).
You can ask the LLM:
Generate a decision record based on these reviews.
Include versions, changes and methodological justifications.
This record will be useful for:
the research method,
appendices,
peer review,
triangulation with other analysts.
The error: skipping the refinement and coding directly with version 1.0.
Why this happens: the initial system almost always has overlaps and imprecise definitions.
How to avoid it: apply the system to 5–10 real segments before using it at scale.
Assisted coding
- Validate and refine the category system to version 2.0 or above. → Ch. 7
- Verify that each category has a name, definition, inclusion criteria, exclusion criteria and textual examples.
- Organise the data into numbered, anonymised blocks. → Ch. 3.4
- Prepare the master prompt for the coding session. → Ch. 4.2
We arrive at the part that usually gives us the most headaches: coding. But do not worry — this is where the LLM will genuinely save you those hours of staring at a table until the words lose their meaning. Coding is the process by which categories are assigned to units of analysis. This normally involves coding text segments within each document, but in some studies the complete document is also coded when certain categories operate as case attributes/variables (for example, presence/absence of a topic, document type, or analytical characteristics of the respondent).
With the help of an LLM, this process can be organised much more quickly and in an orderly fashion, provided the category system is well-defined and the instructions are very precise. Part II explains how to prepare the data — you should review this now. Before coding, texts are organised into numbered blocks and the model is reminded which version of the category system to use. Blocks should not be excessively large, to avoid saturating the LLM's context window.
This section explains how to request table-format coding, how to validate coding quality and how to avoid frequent errors.
How to request clear, justified coding tables
Result tables typically include columns for text, category/categories and justification. The researcher specifies the desired format and whether assigning multiple categories to the same fragment is allowed. Emphasis is placed on justification being based on exact quotations.
Code each response according to category system version 2.0.
Return the output as a table with these columns:
1. Text (brief summary of the response, or the complete response)
2. Assigned category or categories
3. Textual justification based on segments of the corpus
Do not invent quotations. If a response does not fit any category, say so.
[Optional: require a single category]
Assign only the most relevant category to each response.
[Optional: allow multiple categories]
A response may include several categories if this is justified.
If, instead of coding segments, I need to code attributes/variables at document level, return a case-by-case table (or matrix) in which each row is a response/document and the columns are attributes (for example: presence of X = 1/0), briefly stating the criterion for marking 1.
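When the model returns its coding as a pipe-delimited table, you will usually want it as structured rows before saving or comparing it. The following is a minimal sketch, assuming the table uses `|` as the column separator and that no cell contains a literal `|`. The function name `parse_markdown_table` and the sample output are hypothetical.

```python
def parse_markdown_table(table_text):
    """Turn a pipe-delimited table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model adds around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |---|---|
        if header is None:
            header = cells  # first table row is taken as the header
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows

# Hypothetical model output, invented for illustration only.
llm_output = """
Here is the coding table:
| ID | Category | Justification |
|---|---|---|
| R1 | Lack of training | "nobody taught us how to use it" |
| R2 | No category | (no fit) |
"""

for row in parse_markdown_table(llm_output):
    print(row["ID"], "->", row["Category"])
```

Rows whose number of cells does not match the header are silently dropped in this sketch; in real use it is safer to count them and inspect them by hand.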
How to manage large volumes of data
Remember that, in studies with many cases, you work in blocks (see Part II, section 3.4) and save the partial tables. Each block should be saved as a separate file.
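If your corpus lives in a spreadsheet or a list, splitting it into numbered blocks and saving each block separately can be scripted. Here is a minimal sketch, assuming each response is an `(id, text)` pair; the block size of 50, the file naming scheme and the function names are hypothetical choices, not a requirement of the method.

```python
import csv
from pathlib import Path

def split_into_blocks(responses, block_size=50):
    """Yield (block_number, rows) pairs; rows are (response_id, text) tuples."""
    for start in range(0, len(responses), block_size):
        yield start // block_size + 1, responses[start:start + block_size]

def save_blocks(responses, out_dir, block_size=50):
    """Write each block to its own CSV file and return the paths created."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, rows in split_into_blocks(responses, block_size):
        path = out_dir / f"block_{number:03d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "text"])  # header row
            writer.writerows(rows)
        paths.append(path)
    return paths

# Hypothetical corpus of 120 anonymised responses.
corpus = [(f"R{i}", f"response text {i}") for i in range(1, 121)]
for number, rows in split_into_blocks(corpus, block_size=50):
    print(f"block {number}: {len(rows)} responses")
```

Calling `save_blocks(corpus, "blocks")` would then write one file per block into a `blocks` folder, which also gives you the backup copies the guide keeps insisting on.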
It is important to maintain a stable methodological context by recalling key instructions and avoiding changing criteria mid-process. To this end, a permanent context must be established. It is suggested to always repeat:
Master prompt
Category system (current version)
Unit of analysis
Output format
Periodically emphasise:
Do not generate new categories.
Do not infer meanings not present in the data.
Remember to keep every instruction you give saved in an external document (and I take this opportunity to insist: make backups).
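One way to make sure the permanent context is never forgotten is to assemble it programmatically, so that every block is sent with exactly the same preamble. The sketch below only builds a text string; it does not call any model. The function name `build_block_prompt`, the section titles and the sample texts are hypothetical.

```python
def build_block_prompt(master_prompt, category_system, unit_of_analysis,
                       output_format, block_text):
    """Prepend the same permanent context to every block sent to the model."""
    sections = [
        ("MASTER PROMPT", master_prompt),
        ("CATEGORY SYSTEM (CURRENT VERSION)", category_system),
        ("UNIT OF ANALYSIS", unit_of_analysis),
        ("OUTPUT FORMAT", output_format),
        # Reminders the guide suggests repeating periodically.
        ("REMINDERS", "Do not generate new categories.\n"
                      "Do not infer meanings not present in the data."),
        ("DATA BLOCK", block_text),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)

# Hypothetical values, invented for illustration only.
prompt = build_block_prompt(
    master_prompt="You are assisting with a content analysis.",
    category_system="v2.0: 1) Personal training gaps; 2) Insufficient resources",
    unit_of_analysis="The complete response.",
    output_format="Table with columns: ID, Category, Justification.",
    block_text="R1: Nobody taught us how to use the platform.",
)
print(prompt)
```

Saving the returned string alongside each block's output also gives you a record of exactly what the model was told.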
The error: not specifying the unit of analysis in the LLM instruction.
Why this happens: without explicit instruction, the LLM freely decides how to segment.
How to avoid it: always include the unit of analysis in the prompt and check it during manual review.
Validation and quality control
- Code at least one block of the corpus with the LLM. → Ch. 8
- Export the coding tables to local files before continuing.
- ⚠ Warning: manual review is mandatory. Do not skip this chapter even if the LLM has generated tables that look impeccable.
LLM-assisted coding does not, in any case, eliminate the need for human control or inter-rater agreement mechanisms. As in traditional content analysis, the quality of the category system and coding is strengthened when the extent to which different judges or coders agree when applying the categories is verified. In this context, judges may be:
one or several human researchers,
one or several LLMs configured as coders,
a combination of both.
Although LLMs can speed up the process, methodological validation still requires three types of control: manual review, assisted double coding and cross-review between models.
Here is a summary of what I will explain below.
Manual review of assisted coding
Even when coding has been carried out with LLM support, it is essential that you manually review a sufficiently large sample of the material (for example, between 10% and 20% of the corpus, or more in sensitive studies).
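To avoid choosing the review sample by eye (and unconsciously picking the easy cases), you can draw it at random with a fixed seed so that the selection can be reported and repeated. This is a minimal sketch; the function name `draw_review_sample`, the default fraction of 15% and the seed value are hypothetical choices.

```python
import math
import random

def draw_review_sample(unit_ids, fraction=0.15, seed=2025):
    """Return a reproducible random sample of unit IDs for manual review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    unit_ids = list(unit_ids)
    if not unit_ids:
        return []
    size = max(1, math.ceil(len(unit_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the sample can be documented
    return sorted(rng.sample(unit_ids, size))

# Hypothetical corpus of 200 coded responses.
ids = [f"R{i}" for i in range(1, 201)]
sample = draw_review_sample(ids, fraction=0.25)
print(len(sample), "units selected for manual review")
```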
You must review:
The coherence of the chosen category - Assess whether the assigned category really fits the meaning of the fragment and the defined category system (name, definition, inclusion and exclusion criteria).
The suitability of the verbatim quotation used as justification - Check that the quotation selected by the LLM is representative of the fragment and genuinely serves to justify the assigned category.
The absence of invented content - Confirm that the model has not added words, nuances or examples that do not appear in the original corpus. Any synthesis or reformulation must be traceable to real data.
This manual review allows detection of systematic LLM error patterns (for example, a tendency to over-generalise, always apply the same catch-all category, or introduce subtle interpretations not in the text). If a very high number of errors is detected, or few but very important ones, try to understand why, return to square one and correct the process from the start.
But do not wait to stumble across an error. As the person ultimately responsible for the analysis (P1), be proactive about limiting hallucinations — send this challenge to the model to verify it is not inventing things (P2):
Review the coding table you have just generated. Identify whether any verbatim quotations have been summarised or altered. If you have "hallucinated" or invented any nuance not in the original text, acknowledge it now so I can correct it.
If the model apologises and corrects a quotation, you will have improved the validity of your analysis before it reaches the final report.
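Asking the model to confess is useful, but you can also check quotations mechanically: a verbatim quotation should appear, character for character, in the original response. The sketch below flags any quotation that does not. It assumes the coding table has been loaded as a list of rows with `id` and `quote` fields and that the corpus is a dictionary keyed by response ID. These names are hypothetical, and the comparison deliberately ignores differences in case, whitespace and quote style.

```python
import re

def normalise(text):
    """Collapse whitespace, unify quote characters and lowercase the text."""
    text = text.replace("“", '"').replace("”", '"').replace("’", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def find_unverified_quotes(coding_rows, corpus):
    """Return rows whose quotation does not appear in the original response."""
    problems = []
    for row in coding_rows:
        source = normalise(corpus.get(row["id"], ""))
        quote = normalise(row["quote"]).strip('"')
        if not quote or quote not in source:
            problems.append(row)
    return problems

# Hypothetical corpus and coding rows, invented for illustration only.
corpus = {
    "R1": "Nobody taught us how to use the platform.",
    "R2": "We share tips\nbetween colleagues.",
}
coding_rows = [
    {"id": "R1", "quote": "nobody taught us how to use the platform"},
    {"id": "R2", "quote": "“We share tips between colleagues”"},
    {"id": "R2", "quote": "management never supports us"},
]

for row in find_unverified_quotes(coding_rows, corpus):
    print("Check by hand:", row["id"], "-", row["quote"])
```

A flagged row is not proof of invention (the model may have abridged the text), but every flagged row deserves a look against the original.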
Double coding assisted by the same LLM
A useful internal control strategy consists of asking the same LLM for two alternative codings of the same data block, with identical methodological instructions but with some small variation (for example, in role), and then comparing the results. This logic is analogous to human double coding when two coders are asked to apply the same category system independently.
Some options:
Ask for a coding with a more conservative approach, for example:
Recode this block using a more conservative approach.
Assign only one category per response.
Ask for coding with strict criteria:
Code this block applying strict criteria.
Do not code anything that is not explicitly expressed in the text.
Comparing the initial coding with the conservative or strict coding allows you to:
identify responses where the model hesitates or changes its criteria,
detect categories applied too loosely,
refine category definitions if ambiguities are observed.
From the standpoint of inter-rater agreement, these two codings can be treated as if they came from two different judges (LLM-1 in standard mode and LLM-1 in conservative mode), and agreement indicators can then be calculated (for example, percentage of agreement or agreement statistics such as Kappa).
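The simplest of these indicators, percentage of agreement, takes only a few lines to compute, and the same pass can list the units where the two codings differ. This sketch assumes one category per response and that each coding is a dictionary from response ID to category; the function name and the sample data are hypothetical.

```python
def percent_agreement(coding_a, coding_b):
    """Return (share of shared units coded identically, list of disagreements)."""
    shared = sorted(set(coding_a) & set(coding_b))
    if not shared:
        raise ValueError("the two codings have no units in common")
    disagreements = [unit for unit in shared if coding_a[unit] != coding_b[unit]]
    agreement = 1 - len(disagreements) / len(shared)
    return agreement, disagreements

# Hypothetical codings of the same block in two modes.
standard = {"R1": "Resources", "R2": "Training",
            "R3": "Motivation", "R4": "Resources"}
conservative = {"R1": "Resources", "R2": "Training",
                "R3": "No category", "R4": "Resources"}

agreement, to_review = percent_agreement(standard, conservative)
print(f"{agreement:.0%} agreement; units to review: {to_review}")
```

The list of disagreements is often more informative than the percentage itself, because it points to the responses where a category definition is being stretched.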
Discussing coding decisions with the model
The LLM can also be used as a judge that explains its decisions. In practice, this means asking the model to justify why it has applied a specific category and to re-evaluate doubtful decisions.
Why did you assign category X to response R7?
Which elements of the text justify this category and not another?
Do you detect any response that is coded inconsistently with the definition of category X?
This type of dialogue serves to:
make visible the implicit criteria the model is using,
check whether those criteria match the formal definition of the category,
adjust or reformulate the instructions if deviations are detected.
Although the model does not reason like a human, these explanations can help the researcher identify blind spots, biases or misunderstandings.
In addition, to avoid over-reliance and ensure you remain in charge, do not accept the model's first response. Test it:
I know you assigned category 'A' to this block. Now, act as an external critical evaluator and argue why the same data could fit category 'B'. What nuances would we be ignoring if we stayed with your first option?
This will force you to reflect on the coherence of your categories and to decide with much greater certainty.
Cross-review using multiple LLMs (assisted triangulation)
A particularly interesting way of approaching inter-rater agreement in an LLM-assisted environment is to use several different models as if they were independent judges. It is like asking another model for a second opinion, as one might consult a colleague from another department. For example:
use ChatGPT for a first coding,
use Claude to critically review that coding,
use Gemini to check whether there are significant discrepancies.
This strategy functions as a form of assisted triangulation:
If several different LLMs consistently agree when applying a clear category system, confidence in the stability of the system and the robustness of the coding increases.
If, on the other hand, the models frequently disagree, this may indicate problems in the category definitions, the clarity of instructions or the corpus structure itself.
From a methodological standpoint, this cross-review allows each LLM to be treated as an additional judge, similar to what is done with multiple human coders.
This implies analysing the degree of agreement between different codings, which is assessed using classic inter-rater agreement indicators (percentage of agreement, Kappa, etc.).
In an LLM context, these can be applied to:
two codings by the same model with different instructions,
human coding vs. LLM coding (one or several models),
several codings by different LLMs (which would allow complete automation of the process).
In the context of this guide, the most important thing is to understand that:
it is possible to treat the codings of different LLMs and of human researchers as comparable data sources,
calculating these indicators strengthens the credibility of the category system and the coding process,
low levels of agreement are a signal that category definitions, model instructions or even the study design need to be revised (and, unfortunately, the process restarted).
Taken together, the combination of manual review, assisted double coding, discussion with the model and cross-review using multiple LLMs, supported by classic inter-rater agreement indicators, allows the quality standards of content analysis to be transferred to LLM-assisted analysis.
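As an illustration of one such indicator, Cohen's kappa corrects the observed agreement between two judges for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for every pair of judges, whether human or LLM. It assumes a single category per unit and that all judges coded the same units in the same order; the judge names and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(labels_a)
    # Observed agreement: share of units given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal proportions per label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six units by three judges.
judges = {
    "human": ["A", "A", "B", "B", "C", "A"],
    "llm_1": ["A", "A", "B", "C", "C", "A"],
    "llm_2": ["A", "B", "B", "B", "C", "A"],
}

for (name_a, a), (name_b, b) in combinations(judges.items(), 2):
    print(f"{name_a} vs {name_b}: kappa = {cohen_kappa(a, b):.2f}")
```

For multiple categories per unit, or for more than two judges summarised in a single figure, other statistics are more appropriate; this sketch covers only the pairwise, single-label case.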
Partial automation of analysis in advanced stages
Once the category system has been constructed, reviewed and validated, you can consider partially automating some phases of the analysis, especially the coding of large volumes of data and the generation of preliminary summaries.
By automation I mean the creation of workflows that execute various tasks systematically and repeatably. For example, tasks that can be automated include initial data cleaning, table generation, file organisation, spreadsheet reading, splitting a corpus into blocks, converting file formats, extracting text fragments, consolidating coding tables and, where appropriate, interaction with an LLM to apply a category system or generate summaries. Automation thus consists of chaining steps that are traditionally carried out manually so that a digital tool performs them in a stable and controlled way. For your level of technical knowledge, fortunately, many of these tasks can be automated using applications that do not require programming, such as ChatGPT Automations or Make, which allow processes to be designed through visual interfaces. The reference to these tools corresponds to the landscape at the time of writing (December 2025), although these functionalities are likely to become increasingly accessible and easy to use, given recent developments in the sector. Automation should not be understood as a replacement for traditional content analysis, but as an efficiency mechanism applied only after verifying that the model applies categories coherently and consistently. Overall, it allows technical work to be accelerated and time freed up for interpretation, which remains the researcher's responsibility.
Necessary conditions
We have already mentioned all phases and precautions, but it is worth recalling here the requirements for automating a process in a way that provides the greatest possible assurance of result quality. For automation to be methodologically valid, at a minimum the following must be met:
a) Stable, well-defined category system
Categories must have a clear name, precise description, inclusion/exclusion criteria and representative examples.
At least one round of LLM-assisted refinement (version 2.0 or 3.0) must have been completed.
b) Prior reliability tests
Manually review a broad sample of the corpus (10–20%).
Compare two codings generated at different moments.
Verify that there are no invented elements or inferences beyond the text.
c) Relative corpus homogeneity
Automation works better when the texts:
belong to the same type (all short responses, all interviews…),
refer to the same educational phenomenon,
follow a comparable style and structure.
d) Clear definition of the unit of analysis
Before automating mass coding, you must establish whether you are coding:
the complete response,
the sentence,
the semantic unit.
e) Mandatory quality controls
Even with automation:
periodic manual samples must be reviewed,
contradictions or anomalies must be recorded,
instructions must be refined if biases appear.
What can be automated
Many things can be automated. Here are some examples:
Mass block-based coding: the model can process hundreds of responses in batches of 50–100, applying the established category system.
Generating coding tables in different formats: plain-text table, CSV, Markdown or Excel.
Summaries by category or theme: for example, summary of all units falling within category X or detection of cross-cutting patterns.
Automatic detection of inconsistencies: an automation can check whether a category is being used unevenly, whether the quotations justify the assignment, and whether there are empty or overloaded categories.
Extraction of representative quotations: the LLM can automatically collect the most frequent quotations, the most intense ones, those expressing contradictions.
Cross-review with another model: in complex automations you can code with ChatGPT, review with Claude and check for agreement.
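As an example of the inconsistency checks in this list, the following sketch counts how often each category is used and flags categories that are empty, overloaded, or not part of the validated system at all (a sign that the model has invented one). The threshold of 50% for "overloaded", the function name and the sample data are hypothetical.

```python
from collections import Counter

def category_usage(assignments, all_categories, overload_share=0.5):
    """Flag unused, over-used and unknown categories in a set of codings."""
    # Count each category at most once per unit of analysis.
    counts = Counter(cat for cats in assignments.values() for cat in set(cats))
    n_units = len(assignments)
    empty = [c for c in all_categories if counts[c] == 0]
    overloaded = [c for c in all_categories
                  if n_units and counts[c] / n_units > overload_share]
    unknown = sorted(set(counts) - set(all_categories))
    return {"counts": dict(counts), "empty": empty,
            "overloaded": overloaded, "unknown": unknown}

# Hypothetical validated system and codings.
categories = ["Resources", "Training", "Motivation"]
assignments = {
    "R1": ["Resources"],
    "R2": ["Resources", "Training"],
    "R3": ["Resources"],
    "R4": ["Other"],
}

report = category_usage(assignments, categories)
print("Empty:", report["empty"])
print("Overloaded:", report["overloaded"])
print("Not in the system:", report["unknown"])
```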
Risks and warnings
Despite its advantages, automation involves significant risks:
Losing nuances or exceptions.
Prioritising dominant patterns.
Reproducing systematic errors.
If the category system contains an ambiguity, automation multiplies the error rather than reducing it.
Over-representation of broad categories.
The LLM tends to lean towards very general categories if not continuously reminded of exclusion criteria.
Excessive dependence on automation.
The researcher may start acting automatically without reviewing critically.
This is why there must always be human verification, even when automation is high.
Automating analysis with ChatGPT Automations (workflow)
ChatGPT Automations currently allow (subject to change) the creation of workflows that execute repetitive content analysis tasks without continuous manual intervention. They are especially useful in advanced phases, when the category system has already been validated and the aim is to process large volumes of data while maintaining rigorous quality control. This section explains how to structure a workflow in ChatGPT to partially automate coding and synthesis, maintaining the principles of traditional content analysis. The specific implementation should be consulted in the application itself, as it may change at any time. Here is an example of the steps that might be automated in sequence:
Automation example: automated profile identification
Trigger: Receipt of the already-segmented corpus (for example, the 1,886 specific messages about Vocational Education).
Action 1 (Mass Classification): The system automatically administers a specialised prompt to each message to identify the author's gender (female, male or neutral) based on linguistic indicators such as predicative adjectives and pronouns.
Action 2 (Alert Generation): The workflow automatically marks as Neutral those messages with no clear gender indicators, preventing the model from making risky or invented inferences.
Action 3 (Integrated Quality Control): The automation sets aside a random sample (for example, 50 messages) for the researcher to carry out an expert review. If the error margin exceeds a threshold (in the real case it was 14%), the workflow stops for prompt refinement.
Action 4 (Data Consolidation): Once the prompt has been validated (reaching 100% agreement), the system processes the remaining corpus and exports the results to a file compatible with statistical software such as JASP or SPSS.
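The quality-control step in this workflow (Action 3) amounts to a simple gate: compare the model's codes with the researcher's review of a sample and stop if the error rate is too high. A minimal sketch of that logic follows. It is not how ChatGPT Automations implements anything; the function name `quality_gate`, the threshold value and the sample data are hypothetical, and the threshold should be whatever you have decided and documented for your own study.

```python
def quality_gate(llm_codes, reviewed_codes, max_error_rate):
    """Compare LLM codes with the researcher's review of a sample."""
    if not reviewed_codes:
        raise ValueError("the review sample is empty")
    # A unit counts as an error when the LLM code differs from the expert code.
    errors = [unit for unit, code in reviewed_codes.items()
              if llm_codes.get(unit) != code]
    error_rate = len(errors) / len(reviewed_codes)
    return {"error_rate": error_rate, "errors": errors,
            "proceed": error_rate <= max_error_rate}

# Hypothetical codes for four messages; the researcher corrected one of them.
llm_codes = {"M1": "female", "M2": "male", "M3": "neutral", "M4": "male"}
reviewed = {"M1": "female", "M2": "female", "M3": "neutral", "M4": "male"}

result = quality_gate(llm_codes, reviewed, max_error_rate=0.10)
if result["proceed"]:
    print("Sample passed: process the remaining corpus.")
else:
    print(f"Stop and refine the prompt (error rate {result['error_rate']:.0%}).")
```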
Recommendation
Partial automation of the analysis is a powerful and efficient tool, but should only be used after consolidating and validating the category system and verifying the model's stability. The steps can accelerate mass coding, quotation extraction and initial synthesis, but require constant quality controls: human review of samples, triangulation between models and detailed documentation of each step. Before automating, data must be completely anonymised and privacy risks carefully assessed. In no case does automation replace the researcher's analytical responsibility; it should serve only as a technical extension of traditional content analysis.
Once the system has been validated and the coding reviewed, the technical analysis is complete. What comes next is the part that most belongs to you: converting all that category structure into knowledge. The analytical synthesis and report are where your research perspective takes centre stage in a way the LLM can never replicate.
After completing the coding, the most interpretive and conceptual stage of the analysis arrives: analytical synthesis. I believe this is the most interesting stage, and without doubt the most useful from a knowledge-generation perspective. Its aim is to articulate the findings, identify relationships between categories, extract deep meanings and draft a coherent narrative that responds to the research questions. LLMs can help by generating synthesis drafts, identifying tensions and grouping patterns. However, as I insist once again, the final interpretation is the researcher's responsibility — it is your responsibility.
The error: accepting LLM tables without manual review.
Why this happens: the LLM can generate tables that are formally impeccable but contain altered or invented quotations.
How to avoid it: review at least 10–20% by comparing against the original corpus.
Synthesis and reporting
Analytical synthesis
- Validate the coding and consolidate all tables into a single file. → Ch. 9
- Confirm that the category system is in its final version.
- Remember: the LLM can generate drafts, but interpretation is your responsibility.
From the coding, thematic narratives are constructed that integrate categories and respond to the research questions. The LLM can propose drafts of these narratives. To this end, you can ask the LLM to:
Identify patterns between categories
From the category system and coding table, identify the main thematic findings and organise them into 3–6 themes.
Draft a synthesis
Draft an academic thematic analysis of 3–5 paragraphs, based solely on the information contained in the coded data. Include brief textual examples.
Compare themes to look for differences and similarities
Explain the relationships between these categories and how they group into broader themes. The key is to maintain fidelity to the corpus.
Integrating direct quotations
Including verbatim quotations strengthens the credibility of the analysis. The model can suggest options, but the researcher must verify their accuracy and relevance. It is not just about including quotations — it is about identifying where a quotation can be relevant.
Verbatim quotations must:
adequately represent the data,
strengthen the credibility of the analysis,
show nuances that synthesis alone would not capture.
Include 1–2 representative verbatim quotations per category or sub-theme. Do not invent content. Use only exact quotations from the corpus.
Good practices:
Avoid always selecting the same responses.
Combine brief and moderate-length quotations.
Very importantly, remove any personal details.
Relationships between categories
The synthesis must go beyond listing categories: it must show how they relate (support, tension, explicit causality, thematic hierarchy). The model can suggest these connections, which are then contrasted with the corpus and theory. LLMs can help identify relationships of the type:
causal (always with caution: they must be explicit),
conditional,
complementary,
contradictory,
hierarchical,
thematic.
Explain how these categories relate to one another. Indicate which support, contradict or form part of the same process.
Here, the LLM can detect, for example:
tensions between personal motivation and lack of resources,
contradictions between institutional policies and teacher perceptions,
differences between intentional discourses and actual practice.
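One simple way to contrast a relationship suggested by the model with the coded data is to count how often two categories are assigned to the same response. This is only a sketch under assumptions: the coding table is represented here as a hypothetical list of dictionaries, and the category names are invented. Co-occurrence says where to look, not what kind of relationship exists; that judgement remains with the researcher.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding table: each response with the categories assigned to it.
coded = [
    {"id": "R1", "categories": {"motivation", "lack_of_resources"}},
    {"id": "R2", "categories": {"motivation", "lack_of_resources"}},
    {"id": "R3", "categories": {"institutional_policy"}},
    {"id": "R4", "categories": {"motivation", "institutional_policy"}},
]


def co_occurrences(rows):
    """Count how many responses carry each pair of categories."""
    counts = Counter()
    for row in rows:
        # Sorting makes ("a", "b") and ("b", "a") the same pair.
        for pair in combinations(sorted(row["categories"]), 2):
            counts[pair] += 1
    return counts


pairs = co_occurrences(coded)
for (first, second), n in pairs.most_common():
    print(f"{first} + {second}: {n} responses")
```

Pairs that the model describes as related but that never co-occur in the coding table are good candidates for a closer reading of the corpus.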
Tensions, contradictions and exceptions
Cases that do not fit, contradictions and minority voices contribute important nuances. Explicitly asking the model to identify them helps avoid overly homogeneous conclusions. For this reason, a fundamental part of content analysis consists of identifying:
cases that do not fit,
internal contradictions,
discordant elements,
minority perspectives.
LLMs can help to find them:
Identify atypical, contradictory or minority cases within the corpus and explain why they are relevant to the analysis.
These cases help to:
avoid simplistic conclusions,
enrich the interpretation,
show diversity within the data.
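Alongside the prompt, a frequency count over the coding table can point to categories that appear rarely and therefore risk being lost in the synthesis. The sketch below assumes one category per coded unit and uses an arbitrary 10% threshold; both are illustrative assumptions, not part of the procedure.

```python
from collections import Counter


def rare_categories(assignments: list[str], max_share: float = 0.10) -> list[str]:
    """Return categories whose share of coded units is at or below max_share."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return sorted(cat for cat, n in counts.items() if n / total <= max_share)


# Hypothetical coding result for 20 units.
assignments = ["workload"] * 12 + ["training"] * 7 + ["family_pressure"] * 1

flagged = rare_categories(assignments)
print(flagged)
```

A rare category is not automatically a relevant one, but reading the responses behind each flagged category is a cheap safeguard against overly homogeneous conclusions.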
Reporting LLM use in the academic report
- Draft an analytical synthesis with the main findings. → Ch. 10
- Keep a log of all relevant methodological decisions.
- Suggestion: review the LLM use declaration in the preface as a reference for your report.
When content analysis has been LLM-assisted, the academic report (article, doctoral thesis, master's dissertation, undergraduate dissertation or technical report) must transparently include which tasks were carried out with LLM support and which were the researcher's responsibility. This transparency is an ethical, methodological and reproducibility requirement.
The research report must mention LLM use in the method, limitations, reproducibility and ethics sections, and in results and discussion only when the LLM has explicitly contributed to those parts. Below I indicate in detail where it must be mentioned, together with the justification and a brief writing example for each case.
| Report section | Mandatory? | Justification and ethical/methodological requirement |
|---|---|---|
| Introduction | Optional | Contextualises LLM use as part of the study's novelty or general framework. |
| Method | Mandatory | Describes LLM tasks, instructions (prompts), human supervision and quality control. |
| Results | Conditional | Necessary only if the LLM generated syntheses, thematic summaries or pattern identification. |
| Discussion | Optional | Reinforces transparency about the human interpretive role relative to technical support. |
| Limitations | Mandatory | Acknowledges model biases, probabilistic variability and technological dependencies. |
| Transparency/Appendices | Mandatory | Ensures reproducibility through inclusion of prompts and category system versions. |
| Ethical considerations | Mandatory | Details the protection, anonymisation and privacy of data uploaded to external platforms. |
Introduction (optional, in some cases only)
When to mention it — Only when LLM use is part of the study's novelty, the general methodological framework, or the justification of the work's relevance.
Justification — Allows contextualisation that the study draws on emerging assistance tools, without yet attributing a central methodological role to them.
Brief example: “This study incorporates assistance tools based on language models (LLMs) to support, not replace, certain processes of the content analysis.”
Method (mandatory)
When to mention it — Always.
This is the main place where the use must be described:
which tasks the LLM performed,
how it was instructed (instructions/prompt),
which decisions remained the researcher's responsibility,
and how the quality of the process was controlled.
Justification — Methodologically, the reader must be able to understand how categories were generated, how they were applied and what role the LLM played in each phase of the analysis. This information also allows evaluation of validity, biases and reproducibility.
Brief example: “The content analysis was carried out using an LLM-assisted procedure. The model was used to generate initial category proposals, apply the validated category system and draft preliminary syntheses. All review, adjustment and validation decisions were taken by the researcher. A 20% sample of the codings was reviewed manually, and a cross-review was carried out with alternative models.”
Results (depending on use)
When to mention it: Only if the LLM took part in generating thematic summaries, grouping quotations or identifying patterns.
Justification — Allows distinction between:
what comes directly from human analysis,
what was generated with LLM support,
and what controls were applied to avoid errors (hallucinations, false quotations, unjustified inferences).
Brief example: “The preliminary syntheses for each category were generated with LLM support and subsequently reviewed and adjusted manually to ensure fidelity to the corpus and coherence with the category system.”
Discussion (optional, brief)
When to mention it — When LLM use has influenced how findings are interpreted, or when it is relevant to explain how the model was prevented from introducing external inferences.
Justification — Serves to reinforce transparency and to show that the interpretive role remains human.
Brief example: “The interpretations presented were developed exclusively from human analysis; the LLM was used only as technical support to organise the data and generate preliminary drafts.”
Limitations (mandatory)
When to mention it — Always when LLMs are used.
The following must be explained:
technological dependencies,
possible model biases,
variability between runs,
risks of interpretations not grounded in the data.
Justification — It is an ethical and methodological requirement. It allows the reader to evaluate the robustness of the work.
Brief example: “The use of LLMs introduces limitations associated with possible model biases and with variation between runs, since the same prompt applied to the same data set may generate slightly different responses at different times, owing to the probabilistic nature of the model. These differences may affect the classification, categorisation or interpretation of the content. To minimise these risks, comparisons between models and systematic manual reviews were carried out.”
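Variability between runs can be made visible by coding the same block twice and comparing the results. The following sketch assumes each run is stored as a hypothetical dictionary from response identifier to assigned category; it reports the share of identical codings and lists the responses that deserve a manual look.

```python
def run_agreement(run_a: dict[str, str], run_b: dict[str, str]):
    """Compare two coding runs keyed by response id.

    Returns the share of responses coded identically and the ids that differ.
    """
    shared = sorted(run_a.keys() & run_b.keys())
    if not shared:
        raise ValueError("The two runs have no responses in common.")
    differing = [rid for rid in shared if run_a[rid] != run_b[rid]]
    return 1 - len(differing) / len(shared), differing


# Hypothetical output of the same prompt applied twice to the same block.
run_1 = {"R1": "workload", "R2": "training", "R3": "workload", "R4": "resources"}
run_2 = {"R1": "workload", "R2": "training", "R3": "resources", "R4": "resources"}

share, to_review = run_agreement(run_1, run_2)
print(f"Identical codings: {share:.0%}; review manually: {to_review}")
```

The figure obtained this way can be reported in the limitations section as a concrete indication of how stable the assisted coding was.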
Transparency and reproducibility (mandatory, may be integrated into the method or appendices)
When to mention it — Always.
The following must be included:
which prompts/instructions were used (or the most important ones),
which versions of the category system were provided to the LLM,
which quality controls were applied.
Justification — Allows other researchers to replicate the process or evaluate its credibility.
Brief example (in Method): “The prompts used for LLM-assisted coding and synthesis are included in Appendix 2, together with the final version of the category system and a description of the quality controls applied.”
Ethical considerations (mandatory)
When to mention it — Always, especially if data were uploaded to external platforms.
Justification — It must be explained how participant data were protected and how the LLM was prevented from processing identifiable information.
Brief example: “All data were anonymised before being processed by the LLMs, and platforms that do not reuse the information to train models were used.”
Ethics and quality
Good practice and ethics in LLM-assisted analysis
- Cross-cutting chapter: it applies from the start of the project through to submission of the report.
- If you read it at the end: use it to check that you have respected principles P1–P6 throughout the whole process.
- If you read it at the beginning: it will help you anticipate and plan ethical decisions from the study design stage.
The use of large language models (LLMs) in content analysis offers significant methodological opportunities, but also ethical, epistemological and practical risks. This chapter brings together fundamental principles to ensure responsible, rigorous and transparent use. Here I set out, in order, suggestions already mentioned throughout the text.
If by the time you read this ChatGPT et al. are already making our mid-morning coffee too, it does not matter; what I explain here about bias and interpretation will still be what distinguishes a researcher from someone who only knows how to copy and paste.
Model biases and researcher biases
LLMs are not neutral: they learn linguistic patterns from enormous data corpora that include social, cultural and political biases, so they inherit those biases and may favour certain perspectives or discourses. The researcher also has their own biases. Awareness of both is an ethical and methodological requirement.
Model biases
They may appear in:
interpretations that favour majority discourses,
invisibilisation of minority voices,
stereotypes about teachers or students,
oversimplified categorisations.
Researcher biases
The LLM can reinforce pre-existing human biases:
selectively interpreting responses,
accepting convenient categories,
over-trusting automated synthesis.
Recommendation:
Always ask the LLM to identify possible biases in its output:
Do you detect any bias or interpretation not supported by the data?
Limitations of using LLMs in content analysis
Models can over-interpret, invent details or simplify complex phenomena. They do not replace critical reflection or the negotiation of meanings between researchers. Among their most common limitations are:
Limited understanding: they identify linguistic patterns, but do not understand intentions, contexts or deep human meanings.
Risk of hallucinations: they may generate quotations, data or inferences not present in the analysed material.
Result inconsistencies: they may classify or interpret differently depending on the order, context or formulation of the request.
Analytical reductionism: they tend to simplify complex phenomena, prioritising dominant patterns, and may engage in cherry picking (partial selection of evidence that reinforces an interpretation while omitting nuances or minority cases).
Excessive dependence: there is a risk that the researcher delegates interpretation without maintaining sufficient critical control.
How to avoid over-reliance
To avoid over-relying on AI, it is recommended to contrast results with manual analysis, consult other researchers and maintain a critical attitude towards the model's proposals. Good practices:
Manually review a sample of the coding.
Compare the LLM version with a human coding.
Use several models (ChatGPT, Gemini, Claude) for triangulation.
Carry out periodic critical checks:
Explain why this category is the most appropriate.
What alternatives might be plausible?
Useful technique: “counter-interviewing the model”. This makes it possible to detect weaknesses in its reasoning.
Defend the opposite category to the one you applied and justify your position.
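When the LLM version is compared with a human coding, the comparison can be summarised with an agreement coefficient. The guide does not prescribe one; the sketch below uses Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, as one common choice, and assumes a single category per unit. The data are hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two codings of the same units (one category each)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both codings must cover the same, non-empty set of units.")
    n = len(coder_a)
    # Observed agreement: share of units given the same category by both.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight units.
human = ["A", "A", "B", "B", "A", "B", "A", "B"]
llm = ["A", "A", "B", "A", "A", "B", "B", "B"]

kappa = cohen_kappa(human, llm)
print(f"kappa = {kappa:.2f}")
```

A low value does not say who is right; it indicates that the category definitions or the instructions need to be reviewed before continuing.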
Responsible use and reproducibility
Reproducibility in AI-assisted analysis requires documenting prompts, model versions, methodological decisions and key outputs. Although model responses may vary, it is possible to offer a sufficient description of the process.
Good practices for reproducibility:
Save all versions of the category system.
Save the prompts used.
Record each revision made to the system.
Save coding tables by block.
Flag which parts of the analysis were carried out by the LLM.
Mandatory declarations in undergraduate/master's dissertations/doctoral theses/articles:
model version (ChatGPT 4.x, Gemini Pro, Claude 3, etc.),
instructions given,
role of the LLM at each stage,
measures taken to verify accuracy.
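The items in this declaration are easier to report if they are recorded at the moment each task is run. The following is a sketch of one possible log entry; the class name, the field names and the values are assumptions chosen for illustration, not a required format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class LLMUseRecord:
    """One log entry per LLM task (hypothetical structure)."""
    stage: str                    # e.g. exploration, categories, coding, synthesis
    model: str                    # model name and version as shown by the platform
    prompt: str                   # the instructions given
    llm_role: str                 # what the LLM did at this stage
    verification: str             # measures taken to verify accuracy
    category_system_version: str
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


record = LLMUseRecord(
    stage="coding",
    model="example-model-v1 (hypothetical)",
    prompt="Code this block according to category system version 2.0.",
    llm_role="First assisted coding of block 3",
    verification="Manual review of a sample; all quotations checked",
    category_system_version="2.0",
)

# One JSON object per line is easy to append to a plain-text log file.
line = json.dumps(asdict(record), ensure_ascii=False)
print(line)
```

Keeping one line per task means the log can later be filtered by stage when writing the Method and Limitations sections.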
How to document the process
In the doctoral thesis or article (etc.), it must be described how AI was used, at which stages, under what controls and with what limitations. This transparency increases the credibility of the work. Documentation is key to transparency and academic rigour. A typical section may include (as a checklist of information to be included):
Technical description:
which model(s) were used,
why they were selected,
at which phases they participated.
Operational detail:
master prompts,
units of analysis,
coding procedures,
justification of methodological decisions.
Verification strategies:
manual review,
triangulation,
contrast with literature.
Example of a transparent declaration:
“The LLM was used to generate preliminary category proposals, carry out an initial assisted coding and draft syntheses. All final analytical decisions, including merging and delimiting categories, were taken by the researcher. All verbatim quotations were reviewed manually and the internal coherence of the category system was validated.”
Having reached this point, you have the complete procedure. What follows is not more theory: it is a set of tools so that you can put it to use, namely checklists and ready-to-copy prompts.
The appendices provide practical materials that can be reused directly in future analyses. They function as quick-reference tools and complementary methodological support for the reader.
Additional resources
Appendices
Checklist: peace of mind
Before you start
I have clearly defined the study objective and research questions.
I have decided whether the analysis will be inductive, deductive or mixed.
I have determined the type of corpus (surveys, interviews, forums, institutional documents).
During the analysis
The LLM has received clear, stable instructions.
Verbatim quotations have been used to justify categories.
I have manually reviewed corpus samples.
When finishing
The category system is coherent and stable.
Validation strategies have been applied.
The process is documented transparently.
Data preparation and cleaning checklist
The data are complete and free of duplicates.
Empty or non-codeable responses have been removed.
Proper names, institutions and locations have been anonymised.
The texts are in a uniform format (table, plain text, document).
It has been decided how to treat minimal responses (“Yes”, “No”, “It depends”).
Changes made to the data have been documented.
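Several items on this checklist can be supported by a short script. The sketch below is one possible approach under stated assumptions: rows are hypothetical dictionaries with a respondent `id` and a `text`, a duplicate is the same text from the same respondent (so identical short answers from different people are kept), and minimal responses are flagged rather than removed.

```python
import re

MINIMAL = {"yes", "no", "it depends"}  # assumption: adapt to your own corpus


def prepare(rows: list[dict]):
    """Drop empty responses and repeated records; flag minimal responses.

    Returns the cleaned rows and a log of every change made.
    """
    seen, cleaned, log = set(), [], []
    for row in rows:
        text = re.sub(r"\s+", " ", row["text"]).strip()
        if not text:
            log.append(f"{row['id']}: removed (empty)")
            continue
        key = (row["id"], text.casefold())
        if key in seen:
            log.append(f"{row['id']}: removed (duplicate)")
            continue
        seen.add(key)
        minimal = text.casefold().rstrip(".!") in MINIMAL
        cleaned.append({"id": row["id"], "text": text, "minimal": minimal})
    return cleaned, log


# Hypothetical survey export.
rows = [
    {"id": "R1", "text": "We need more time."},
    {"id": "R2", "text": "   "},
    {"id": "R3", "text": "Yes"},
    {"id": "R1", "text": "We need more  time."},  # same record exported twice
    {"id": "R4", "text": "Yes"},                  # different respondent: kept
]

cleaned, log = prepare(rows)
print(len(cleaned), log)
```

The returned log is the raw material for documenting the changes made to the data.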
Checklist for defining units of analysis and coding type
I have clearly defined the unit of analysis (document, segment, sentence, semantic unit).
I have decided whether I will code:
segments within the document,
the complete document as a case,
both.
I have defined which categories function as document attributes/variables.
I have explicitly stated these decisions in the LLM instructions.
Category system construction checklist
Each category has a clear and precise name.
Each category has an explicit definition.
Inclusion and exclusion criteria exist.
The categories are conceptually distinct from one another.
The system is exhaustive with respect to the corpus.
Each category includes representative textual examples.
LLM-assisted coding checklist
The LLM has the definitive category system.
The coding includes exact verbatim quotations.
A clear coding table has been requested.
Minimal responses are marked as such.
A part of the corpus has been manually reviewed.
Inconsistencies have been corrected before continuing.
Validation and quality control checklist
Manual review of a sample has been carried out.
Double coding (human or assisted) has been applied.
The LLM has been asked to justify doubtful decisions.
A second LLM has been used as a critical reviewer.
Discrepancies have been reviewed and categories adjusted.
Validation decisions are documented.
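For the manual review of a sample, drawing the sample at random with a fixed seed makes the selection both unbiased and repeatable. This is a minimal sketch; the identifiers, the default share of 10% and the seed value are illustrative assumptions.

```python
import math
import random


def review_sample(unit_ids: list[str], share: float = 0.10, seed: int = 2024) -> list[str]:
    """Draw a reproducible random sample of coded units for manual review."""
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1.")
    # Round up so that small corpora still yield at least one unit.
    size = min(len(unit_ids), max(1, math.ceil(share * len(unit_ids))))
    return sorted(random.Random(seed).sample(unit_ids, size))


# Hypothetical identifiers for 50 coded responses.
ids = [f"R{i:03d}" for i in range(1, 51)]

sample = review_sample(ids)
print(sample)
```

Recording the seed and the share alongside the validation decisions lets another researcher draw exactly the same sample.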
Ethics and data protection checklist
Data have been anonymised before using the LLM.
The privacy policies of the platform used have been reviewed.
No sensitive or identifiable data have been uploaded.
Unnecessary files have been deleted after the analysis.
LLM use is declared in the academic report.
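A first pass of anonymisation can be scripted before the manual read-through. The sketch below replaces e-mail addresses, phone-like digit sequences and a researcher-supplied list of names; the patterns, labels and sample text are assumptions for illustration. Pattern-based redaction misses indirect identifiers, so it supports the manual check rather than replacing it.

```python
import re

# Deliberately simple patterns (assumptions): they will not catch every format.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"(?<!\w)\+?\d[\d\s-]{7,}\d(?!\w)")


def redact(text: str, known_terms: dict[str, str]) -> str:
    """Replace e-mails, phone-like numbers and listed names with labels."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    # Longest terms first, so a full name is replaced before any part of it.
    for term in sorted(known_terms, key=len, reverse=True):
        pattern = rf"\b{re.escape(term)}\b"
        text = re.sub(pattern, known_terms[term], text, flags=re.IGNORECASE)
    return text


# Hypothetical response and term list.
sample_text = "Ask Marta Ruiz (marta.ruiz@example.org, 987 654 321) at IES Rio Alto."
terms = {"Marta Ruiz": "[PERSON]", "IES Rio Alto": "[SCHOOL]"}

redacted = redact(sample_text, terms)
print(redacted)
```

The term list has to be built by someone who knows the data, which is one more reason why this step cannot be delegated entirely to a tool.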
Academic report checklist
LLM use is described in the Method section.
Final analytical decisions are human.
Results include verbatim quotations.
Limitations associated with LLM use are declared.
The ethical measures adopted are described.
The process is traceable and reproducible.
Methodological checklist
A final checklist to ensure all aspects have been covered:
Anonymised data
Master prompt defined
Versioned category system (min. v2.0)
Sample manually reviewed (min. 10%)
Methodological decisions documented
LLM use declared in method
Backups made
Acknowledged limits of the analyses
Ready-to-use prompt examples (by phase)
1. Master prompt (use at the start of each session)
Act as an expert assistant in content analysis for educational research.
Follow my instructions scrupulously.
Do not invent information.
If you detect ambiguity, ask for clarification immediately.
We will work in stages: exploration, categories, coding and synthesis.
2. Preliminary exploration
Here is a set of responses about [topic].
Carry out a preliminary analysis that includes:
1) General summary (5–7 lines)
2) 5–10 preliminary patterns or themes
3) Representative verbatim quotations
4) Emerging tensions or contradictions
Do not generate definitive categories.
3. Creating the category system (inductive)
Generate an initial category system based on the data.
For each category include:
- name
- clear description
- inclusion criteria
- exclusion criteria
- 2–3 verbatim quotations from the corpus
4. Refinement
Review this category system (version X).
Identify:
- overlaps
- broad or ambiguous categories
- redundancies
- differences in conceptual level
Propose an improved version and justify the changes.
5. Coding
Code this block according to category system version X.
Return the output as a table with:
[Text] – [Category(ies)] – [Justification with verbatim quotation]
Do not invent quotations. If a response does not fit, say so.
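If the corpus is coded in several blocks, assembling the coding prompt with a small function helps ensure that every block receives the same instructions and the same version of the category system. This is a sketch, not a prescribed tool: the function name, the structure of the category dictionaries and the sample data are assumptions.

```python
CODING_INSTRUCTIONS = (
    "Code this block according to category system version {version}.\n"
    "Return the output as a table with:\n"
    "[Text] – [Category(ies)] – [Justification with verbatim quotation]\n"
    "Do not invent quotations. If a response does not fit, say so."
)


def build_coding_prompt(categories: list[dict], version: str, block: list[str]) -> str:
    """Assemble one coding prompt from fixed instructions, the category
    system and the block of responses to be coded."""
    system = "\n".join(f"- {c['name']}: {c['definition']}" for c in categories)
    responses = "\n".join(f"{i}. {text}" for i, text in enumerate(block, start=1))
    return (
        CODING_INSTRUCTIONS.format(version=version)
        + f"\n\nCategory system (version {version}):\n{system}"
        + f"\n\nBlock to code:\n{responses}"
    )


# Hypothetical category system and block.
categories = [
    {"name": "Workload", "definition": "References to lack of time or excess of tasks."},
    {"name": "Training", "definition": "References to courses or professional development."},
]
block = ["We have no time for anything.", "The course arrived too late."]

prompt = build_coding_prompt(categories, "2.0", block)
print(prompt)
```

The text returned by `build_coding_prompt` is exactly what should be saved in the prompt log for that block.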
6. Analytical synthesis
Using the coding table and category system version X,
draft an academic thematic analysis that includes:
- main themes
- relationships between categories
- tensions or contradictions
- 1–2 representative quotations per sub-theme
Do not add information that is not present in the data.
7. Academic report
Draft an academic report that includes:
- brief introduction
- method
- results (by theme, with quotations)
- preliminary discussion
- limitations of the analysis and of the LLM use
Based exclusively on the data provided.