HuatuoGPT is an open-source Chinese medical large language model. It is trained on both real doctors’ replies and ChatGPT-generated replies, with the goal of letting the language model act like a doctor and provide rich, accurate consultations.

HuatuoGPT aims to give the language model a doctor-like ability to diagnose and to provide useful information. It does this by combining the “distilled data” generated by ChatGPT with responses written by real-world doctors, while keeping the interaction with users fluent and the content rich.

HuatuoGPT uses four different datasets, as follows:

  • Distilled Instructions from ChatGPT: This dataset is inspired by the Alpaca model’s method of creating instruction sets, and distills medical instructions from ChatGPT. Unlike previous work, this method also incorporates department and role information, generating instructions conditioned on a sampled department or role.
  • Real-world Instructions from Doctors: This dataset is derived from question-and-answer sessions between real doctors and patients. Physician responses are often concise and colloquial, so this method polishes them for readability.
  • Distilled Conversations from ChatGPT: This dataset is produced by having two ChatGPT models imitate a conversation between a doctor and a patient, with both given a shared conversational background.
  • Real-world Conversations with Doctors: This dataset is derived from conversations with real doctors, but the doctors’ responses are polished using a language model.
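
The department and role sampling mentioned for the distilled instructions can be pictured with a short sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the department list, the role list, the prompt wording, and the function name `build_instruction_prompt` are all hypothetical, the sketch samples both a department and a role for simplicity, and the actual call to ChatGPT is left out.

```python
import random

# Hypothetical examples; the departments and roles actually used are not given in this text.
DEPARTMENTS = ["cardiology", "dermatology", "pediatrics"]
ROLES = ["patient", "family member", "medical student"]


def build_instruction_prompt(rng: random.Random) -> dict:
    """Sample a department and a role, then build a prompt that asks a
    chat model to write one medical instruction from that perspective."""
    department = rng.choice(DEPARTMENTS)
    role = rng.choice(ROLES)
    prompt = (
        f"You are a {role} consulting the {department} department. "
        "Write one realistic medical question or instruction."
    )
    return {"department": department, "role": role, "prompt": prompt}


rng = random.Random(0)  # fixed seed so the sampling is reproducible
for _ in range(3):
    print(build_instruction_prompt(rng)["prompt"])
```

Keeping the sampled department and role alongside the prompt makes it possible to check later how evenly the generated instructions cover each specialty.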

Together, these datasets give the model a unified language pattern, a doctor’s diagnostic ability, and instruction-following capability.
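
One way to picture how four heterogeneous sources can feed a single model is to normalize them into one record format before training. The sketch below assumes a hypothetical schema (a `source` tag plus a list of `role`/`text` turns) and uses made-up placeholder records. It is not the format used by the HuatuoGPT authors.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "patient" or "doctor"
    text: str


def from_instruction(source: str, instruction: str, response: str) -> dict:
    """Wrap a single-turn instruction/response pair as a two-turn dialogue."""
    turns = [Turn("patient", instruction), Turn("doctor", response)]
    return {"source": source, "turns": turns}


def from_conversation(source: str, utterances: list[str]) -> dict:
    """Wrap a multi-turn dialogue, assuming the patient speaks first
    and the speakers strictly alternate."""
    roles = ("patient", "doctor")
    turns = [Turn(roles[i % 2], text) for i, text in enumerate(utterances)]
    return {"source": source, "turns": turns}


# Placeholder records, one per data source described above.
training_mix = [
    from_instruction("distilled_instructions", "<instruction>", "<ChatGPT answer>"),
    from_instruction("real_instructions", "<patient question>", "<polished doctor answer>"),
    from_conversation("distilled_conversations", ["<p1>", "<d1>", "<p2>", "<d2>"]),
    from_conversation("real_conversations", ["<p1>", "<polished d1>"]),
]

for record in training_mix:
    print(record["source"], len(record["turns"]))
```

Treating instructions as one-exchange dialogues means instruction and conversation data share a single shape, which is one plausible reading of how the sources could yield a unified language pattern.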

