Recently, ChatGPT was born and achieved great success, making obscure abbreviations such as RLHF, SFT, IFT, CoT, etc. begin to appear in the discussion of the general public. What do these obscure acronyms actually mean? Why are they so important? We survey all relevant important papers to classify these works, summarize the work to date, and look forward to the follow-up work.

Let’s first look at the panorama of language model-based conversational agents. ChatGPT is not the first. In fact, many organizations released their own language model dialog agents (dialog agents) before OpenAI, including Meta’s BlenderBot, Google’s LaMDA, DeepMind’s Sparrow, and Anthropic’s Assistant (Anthropic’s Claude is partly based on Assistant continue to develop).

Some of these teams have also announced their plans to build open source chatbots and publicly shared their roadmaps (such as the LAION team’s Open Assistant), and others are sure to have something similar, but have yet to announce it.

The table below compares these AI chatbots based on details of whether they are publicly accessible, training data, model architecture, and evaluation direction. ChatGPT has no record of this information, so we instead used details from InstructGPT, an instruction-fine-tuned model from OpenAI that is believed to be the basis for ChatGPT.

LaMDABlenderBot 3SparrowChatGPT / InstructGPTAssistant
Is it publicly accessiblenocannolimitedno
base model
Pre-training corpus size (number of words)2.81T180B1.4Tunknown400B
Whether the model can
access the network
data size
Quality: 6.4K
Security: 8K
Authenticity: 4K
IR: 49K
20 NLP datasets ranging in size from 18K to 1.2Munknown12.7K (this is InstructGPT, ChatGPT may be more)150K+ LM generated data
artificial safety rules✖️✖️
evaluation standard1. Quality (Plausibility, Specificity, Interest)
2. Safety (Prejudice) 3. Authenticity
1. Quality (engagement, knowledge utilization)
2. Safety (toxicity, bias)
1. Straightening (helpful, harmless, correct)
2. Evidence (from the Internet)
3. Whether it violates the rules
4. Prejudice and stereotypes
5. Integrity
1. Straightening (helpful, harmless, truthful)
2. Prejudice
1. Straightening (helpful, harmless, honest)
2. Prejudice
Crowdsourcing platform for data annotationUS supplierAmazon MTurkunknownUpwork and Scale AISurge AI, Amazon MTurk, and Upwork

We observe that despite many differences in training data, models, and fine-tuning, there are some commonalities. A common goal of all chatbots mentioned above is “instruction following”, that is, following instructions specified by the user. For example, ask ChatGPT to write a poem about fine-tuning.

ChatGPT command example

From predicting text to following instructions

Often, the language modeling goals of the underlying model are not sufficient for the model to learn to follow the user’s instructions in a useful way. Model creators use the “Instruction Fine-Tuning (IFT)” method to achieve this goal. In addition to using classic NLP tasks such as sentiment analysis, text classification, summarization, etc. to fine-tune the model, the method is also very diverse. Demonstrate various written instructions and their output to the basic model, so as to realize the fine-tuning of the basic model. These command demonstrations consist of three main parts – command, input and output. Input is optional, and some tasks only require instructions, such as the example above using ChatGPT for open text generation. When an input is present, the input and output form an “instance”. A given directive can have multiple input and output instances. The following example (taken from Wang et al., ’22):

Instructions and Instance Examples

The training data for IFT is usually a collection of human-written instructions and instances generated with a language model bootstrap. When bootstrapping, the LM is first fed with some samples using the few-shot technique to hint it (as shown in the figure above), and then the LM is asked to generate new instructions, inputs, and outputs. Each round selects some of the human-written samples and some of the model-generated samples for the model. The contributions of humans and models to creating the dataset form a spectrum, see the figure below:

IFT spectrum

At one end of the spectrum are purely model-generated IFT datasets such as Unnatural Instructions (Honovich et al., ’22); at the other end are instructions crafted through extensive community effort such as Super-natural instructions (Wang et al., ’22). Work in between uses a small set of high-quality seed datasets, followed by bootstrapping to generate a final dataset, such as Self-Instruct (Wang et al., ’22). Another approach to curate datasets for IFT is to convert existing high-quality crowdsourced NLP datasets for various tasks (including prompts) into instructions using a unified schema or different templates. This line of work includes T0 (Sanh et al., ’22), the Natural instructions dataset (Mishra et al., ’22), FLAN LM (Wei et al., ’22) and OPT-IML (Iyer et al., ’22).

follow instructions safely

However, instruction-tuned LMs do not always generate helpful and safe response. Examples of this behavior include evasion by always giving unhelpful responses, such as “Sorry, I don’t understand.” or generating unsafe responses to user input on sensitive topics.To mitigate this behavior, model developers use Supervised Fine-tuning (SFT), to fine-tune underlying language models on high-quality human-annotated data to improve usefulness and harmlessness. For example, see the table below (taken from Appendix F of the Sparrow paper).

SFT and IFT are very closely linked. Instruction fine-tuning can be seen as a subset of supervised fine-tuning. In the recent literature, the SFT stage is often used to improve the security of the response, rather than following the IFT to improve the specificity of the command response. In the future, this classification and division should mature day by day to form a clearer usage scenario and methodology.

Manual Safety Rules

Google’s LaMDA is also fine-tuned on dialog datasets with secure annotations according to a set of rules (Appendix A of the paper). These rules are usually pre-defined and developed by the model creators and cover a wide range of topics including harm, discrimination, misinformation.

Fine-tuning the model

Meanwhile, OpenAI’s InstructGPT, DeepMind’s Sparrow, and Anthropic’s Constitutional AI use Human Feedback Reinforcement Learning (Reinforcement Learning From Human Feedback, RLHF) to fine-tune the model using labeled data based on human preferences. In RLHF, the model’s responses are ranked and annotated based on human feedback (eg, selecting text profiles based on human preferences). These labeled responses are then used to train a preference model, which is used to return a scalar reward to the RL optimizer. Finally, a dialogue agent is trained via reinforcement learning to mimic the preference model. For more details, please refer to our previous article on RLHF: The “hero” behind ChatGPT – RLHF technical details.

Chain-of-thought (CoT) Prompts (Wei et al., ’22) are a special case of instruction demonstrations that generate outputs by eliciting step-by-step reasoning by a dialog agent. Models fine-tuned using CoT use human-annotated instruction datasets with step-by-step inference.this is Let’s think step by step The origin of this famous tip. The example below is taken from Chung et al., ’22, instructions are highlighted in orange, inputs and outputs in pink, and CoT inferences in blue.

CoT Diagram

As described in Chung et al., ’22, models fine-tuned using CoT perform better on tasks involving common sense, arithmetic, and symbolic reasoning.

As shown in the work of Bai et al., ’22, CoT fine-tuning has also been shown to be very effective for innocuousness (sometimes better than RLHF), and for sensitive cues, the model does not avoid and generate “Sorry, I can’t answer this question” answer. See Appendix D of his paper for more examples.

Comparison of CoT and RLHF

main point

  1. Compared with the pre-training data, you only need a very small part of the data for instruction fine-tuning (hundreds of orders of magnitude);
  2. Supervised fine-tuning using human annotations to make model outputs safer and more useful;
  3. CoT fine-tuning improves the performance of models on tasks that require incremental thinking and makes them less avoidant on sensitive topics.

Further Work on Dialogue Agents

This blog summarizes a lot of existing work on making dialog agents useful. But there are still many open questions to be explored. We list some of them here.

  1. How important is RL in learning from human feedback? Can we gain RLHF performance by training with higher quality data in IFT or SFT?
  2. How does SFT+RLHF in Sparrow compare to using only SFT in LaMDA from a safety point of view?
  3. Given that we have IFT, SFT, CoT and RLHF, how much is pre-training necessary? How to compromise? What is the best base model (public and non-public) that one should use?
  4. Many of the models referenced in this article have been carefully designed through red-teaming, where engineers specifically search for failure modes and improve subsequent training (hints and methods) based on the problems that have been revealed. How do we systematically document the effects of these methods and reproduce them?

Ps If you find any information missing or incorrect in this blog, please let us know.


Rajani et al.,"What Makes a Dialog Agent Useful?", Hugging Face Blog, 2023.

BibTeX citations:

@article {rajani2023ift,
  author = {Rajani, Nazneen and Lambert, Nathan and Sanh, Victor and Wolf, Thomas},
  title = {What Makes a Dialog Agent Useful?},
  journal = {Hugging Face Blog},
  year = {2023},
  note = {},

Original English:

Translator: Matrix Yao (Yao Weifeng), an Intel deep learning engineer, works on the application of transformer-family models on various modal data and the training and reasoning of large-scale models.

#Interpretation #technical #points #ChatGPT #RLHF #IFT #CoT #red #blue #confrontation #Hugging #Face #News Fast Delivery

Leave a Comment

Your email address will not be published. Required fields are marked *