7 Best practices
There are multiple opinions regarding the role of LLMs in education. Regardless of the extent to which you use Large Language Models in your own teaching and learning, there are certain things to keep in mind so that they enhance, rather than replace, your thinking.
7.1 Context
One of the most critical challenges with LLM outputs is that they can be too generic, and therefore not very helpful. Providing enough context about the task is important for writing effective prompts and having effective conversations.
- Consider describing the background of the task: What is it for? Which course is it for? What is the expected level (frosh, senior, etc.)? What is your goal with it?
- Endow the LLM with a character or a role for the task:
  - you are a tutor for this course
  - you are a course assistant
  - you are an editor providing feedback
  - you are a student in this course
- Fine-tune the LLM for language or knowledge expertise: think about limiting the knowledge or references to the current course or any other prior course. Including a syllabus or course program can help give more context.
Fine-tuning in this sense (tailoring the conversation, not retraining the model) often means giving a specific context for every single prompt. Consider saving this basic context as a reference file in the LLM, or even as a text note that you can copy and paste every time you prompt something related to a particular class.
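As a minimal sketch of the "saved context" idea, the snippet below keeps the background for one class in a small object and prepends it to every task. The class name `CourseContext`, its fields, the helper `build_prompt`, and the example course are all hypothetical, chosen only for illustration; the same thing can be done with a plain text note.

```python
from dataclasses import dataclass


@dataclass
class CourseContext:
    """Reusable background for prompts about one course (illustrative fields)."""
    course: str
    level: str
    role: str
    goal: str
    references: str = ""  # optional: what the LLM should limit itself to

    def preamble(self) -> str:
        # Build the context block that precedes every task.
        lines = [
            f"You are {self.role}.",
            f"Course: {self.course}.",
            f"Expected level: {self.level}.",
            f"Goal: {self.goal}.",
        ]
        if self.references:
            lines.append(f"Limit your references to: {self.references}.")
        return "\n".join(lines)


def build_prompt(context: CourseContext, task: str) -> str:
    # Context first, then the specific request.
    return f"{context.preamble()}\n\nTask: {task}"


# Hypothetical example values; replace them with your own course details.
context = CourseContext(
    course="Introduction to Statistics",
    level="first-year undergraduate",
    role="a tutor for this course",
    goal="practice for the midterm",
    references="the course syllabus",
)
prompt = build_prompt(context, "Explain what a confidence interval is.")
print(prompt)
```

Keeping the context separate from the task makes it easy to reuse the same background across many prompts and to update it in one place when the course changes.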
7.2 Evaluation
Front-loading work in prompts is not the only way to make our interactions with LLMs more effective. It is also important to analyze their outputs and evaluate whether they meet our expectations.
Since LLMs have some level of agency (they make small decisions about what text to produce), it is useful to evaluate them as a supervisor would evaluate an assistant.
For this, it is important to have key rubric items that we can focus on while evaluating the LLM’s outputs:
| Item | Description |
|---|---|
| Compliance | Did the LLM generate what you were expecting? |
| Hallucinations | Are the facts used real? |
| Data | Did the LLM use the appropriate data or references? |
| Voice | Is the output given in the right voice, language, and terminology? |
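One way to apply the rubric consistently is to record a yes/no answer per item and list the items that still need attention. The sketch below is only an illustration: the function name `items_to_revisit` is hypothetical, and treating an unanswered item as "needs review" is an assumption of this example.

```python
from __future__ import annotations

# Rubric items and their guiding questions, taken from the table above.
RUBRIC = {
    "Compliance": "Did the LLM generate what you were expecting?",
    "Hallucinations": "Are the facts used real?",
    "Data": "Did the LLM use the appropriate data or references?",
    "Voice": "Is the output given in the right voice, language, and terminology?",
}


def items_to_revisit(answers: dict[str, bool]) -> list[str]:
    """Return rubric items answered 'no' or left unanswered, in rubric order."""
    # Assumption: an item with no recorded answer is treated as needing review.
    return [item for item in RUBRIC if not answers.get(item, False)]


# Example review of one output: facts were wrong and voice was never checked.
review = {"Compliance": True, "Hallucinations": False, "Data": True}
print(items_to_revisit(review))  # ['Hallucinations', 'Voice']
```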
7.3 TRUST Score
One of the most important aspects of LLM implementations is trust, which operates at several levels. From the architecture of the model and the data used for training to the way it produces outputs and the usefulness and truthfulness of its responses, it is important to have a holistic sense of trust in the implementation.
There are three stages that are relevant for this:
- Training
- Processing
- Evaluation
A simple way to assess your awareness and knowledge of your implementation is to compute what I call the TRUST score: Transparency, Risk, Usefulness, Safety, and Trust. Each stage has two components, and each component has a maximum number of points that can be assigned. These self-assigned points can help you assess your own knowledge and awareness of the whole LLM system and implementation that you are using.
| Stage | Dimension | Description |
|---|---|---|
| Training | Data (1 pt) | Evaluate the data sources, data quality, copyright and privacy of the data used for training. |
| Training | Footprint (1 pt) | Consider the environmental and labor impact of the LLM. |
| Processing | Explainability (3 pts) | How well does the user understand the output? |
| Processing | Privacy (3 pts) | Assesses data ownership, storage, and usage practices. |
| Evaluation | Assessment (9 pts) | Gauges the effectiveness and accuracy of the outputs. |
| Evaluation | Accountability (9 pts) | Ensures clear human responsibility at every stage of the process. |
The maximum possible TRUST score is 26 points.
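The maximum follows from the table: two dimensions at 1 point, two at 3 points, and two at 9 points give 2 × 1 + 2 × 3 + 2 × 9 = 26. A minimal sketch of the tally is shown below. The function name `trust_score` is hypothetical, and two details are assumptions of this example rather than part of the score's definition: each dimension receives between 0 and its maximum, and a dimension you leave out counts as 0.

```python
# Maximum points per dimension, taken from the table above.
MAX_POINTS = {
    "Data": 1,
    "Footprint": 1,
    "Explainability": 3,
    "Privacy": 3,
    "Assessment": 9,
    "Accountability": 9,
}


def trust_score(points: dict) -> int:
    """Sum self-assigned points, checking each against its maximum."""
    unknown = set(points) - set(MAX_POINTS)
    if unknown:
        raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
    total = 0
    for dimension, maximum in MAX_POINTS.items():
        value = points.get(dimension, 0)  # assumption: missing means 0
        if not 0 <= value <= maximum:
            raise ValueError(f"{dimension} must be between 0 and {maximum}")
        total += value
    return total


# Example self-assessment (illustrative values only).
my_points = {
    "Data": 1,
    "Footprint": 0,
    "Explainability": 2,
    "Privacy": 3,
    "Assessment": 6,
    "Accountability": 7,
}
print(trust_score(my_points))   # 19
print(sum(MAX_POINTS.values())) # 26
```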
This score can help you decide to what extent you need to consider a different LLM, reconsider which tasks to outsource, or learn more about the LLM system itself.
Here are some recommended actions based on the TRUST score that you obtain after your self-assessment:
| Level | Score | Action |
|---|---|---|
| High TRUST | > 18 pts | Implement the system while periodically reassessing if any dimensions have changed. |
| Moderate TRUST | Between 13 and 18 pts | Identify and address specific weaknesses in the lowest-scoring stages. Reassess before implementing the system. |
| Low TRUST | Between 5 and 12 pts | Review all stages of the system (training, processing, evaluation) for compliance with current policies and regulations. |
| Minimal TRUST | < 5 pts | Consider a system redesign and/or explore a different system. Consult with supervisors and search for alternatives. |
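The thresholds in the table can be written as a small lookup. The sketch below assumes whole-number scores between 0 and 26; the function name `trust_level` is hypothetical.

```python
def trust_level(score: int) -> str:
    """Map a TRUST score (0 to 26) to the level named in the table above."""
    if not 0 <= score <= 26:
        raise ValueError("score must be between 0 and 26")
    if score > 18:
        return "High TRUST"
    if score >= 13:
        return "Moderate TRUST"
    if score >= 5:
        return "Low TRUST"
    return "Minimal TRUST"


# Boundary checks: 18 is still Moderate, 12 is Low, 4 is Minimal.
for example in (19, 18, 12, 4):
    print(example, trust_level(example))
```

Checking the boundary values is worthwhile because the bands are stated with a mix of strict and inclusive limits: a score of exactly 18 falls under Moderate TRUST, not High TRUST.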