Medical/Health - Specific Domains and Language Models: ChatDoctor (Part 1)
Tempered by a Thousand "Verticals": Specific Domains and Language Models
This series of articles sticks to a plain-language style, describing everything in terms as short, simple, and easy to understand as possible. It focuses on work that applies language models to specific (vertical) domains.
[Download] PDF version of the slides: https://github.com/createmomo/Open-Source-Language-Model-Pocket
Table of Contents:
1 Introduction
1.1 Power of Language Models
1.2 Questioning: Are You Sure About Specific Domains?
2 Essential: Domain-specific Training Data
[Medical/Health]
2.1 ChatDoctor (Part 1) (←)
2.1 ChatDoctor (Part 2)
2.1 ChatDoctor (Part 3)
2.2 MedicalGPT-zh
2.3 SoulChat
2.4 DoctorGLM
2.5 BenTsao
2.6 QiZhenGPT
2.7 HuaTuoGPT
2.8 BianQue
2.9 MedicalGPT
More (to be confirmed)
2 Essential: Domain-specific Training Data
Applying a language model to a specific domain requires thinking through at least two things:
- What specific tasks we want the language model to perform, and in what specific application scenarios it will be used
- How to obtain and use training data that satisfies those tasks/scenarios
Following this idea, we will walk through how some existing work addresses these two issues.
2.1 ChatDoctor (Part 1)
The authors of ChatDoctor found that the answers ChatGPT gives to healthcare questions are sometimes inaccurate, falling far short of a proper AI doctor.
(Beyond this, I also found that ChatGPT has a certain chance of simply refusing to answer medical/health questions.)
ChatDoctor's positioning/scenario: turn a language model into a reasonably qualified AI doctor capable of carrying out patient-doctor conversations.
Patients → describe their needs; ChatDoctor → provides decent-quality advice, diagnoses, medication suggestions, etc.
Model Training
1)Original LLaMA → Fine-tuned LLaMA
LLaMA is essentially a plain language model; its conversational-chat and instruction-following skills are not outstanding.
To strengthen these skills, ChatDoctor did not rush to fine-tune on patient-doctor conversation data right away. It first fine-tuned on general-domain instruction-following data, so that LLaMA acquires better conversational and instruction-following abilities.
2)Fine-tuned LLaMA → Final Fine-tuned LLaMA
Once step 1) is complete, the model is fine-tuned again on the prepared domain-specific data (patient-doctor dialogues). A minimal sketch of this two-stage recipe follows.
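To make the two-stage recipe concrete, here is a minimal sketch using Hugging Face Transformers. The model path, data file names, the "text" field, and all hyperparameters are illustrative assumptions, not the authors' exact setup (the paper's reported settings appear under Training Settings below).

```python
# Minimal two-stage fine-tuning sketch. Assumptions: each JSON file holds
# records with a single "text" field containing the full prompt + response;
# model and data paths are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

def fine_tune(model_path: str, data_file: str, output_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    if tokenizer.pad_token is None:  # LLaMA has no pad token by default
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_path)

    dataset = load_dataset("json", data_files=data_file, split="train")
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=3,
            per_device_train_batch_size=4,
            gradient_accumulation_steps=8,
            learning_rate=2e-5,
        ),
        train_dataset=dataset,
        # mlm=False gives the causal-LM objective (labels = shifted inputs).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir

# Stage 1: general instruction-following data -> chat/instruction skills
stage1 = fine_tune("path/to/llama-7b", "general_instructions.json", "llama-instruct")
# Stage 2: continue from the stage-1 checkpoint on patient-doctor dialogues
fine_tune(stage1, "patient_doctor_dialogues.json", "chatdoctor")
```

The key design point is simply the ordering: the second call starts from the checkpoint produced by the first, so the domain data refines a model that already follows instructions.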
Data
The natural question, then: where does this training data come from, and how was it obtained?
1) Find readily available data
Ready-made patient-doctor conversation data was obtained from an online medical consultation website (HealthCareMagic); these conversations are real, not synthetically generated.
2) Clean the data, manually and automatically
- Remove personally identifiable information of doctors and patients from the data
- Use auto-correction tools to fix grammatical errors

This yielded 100k dialogues usable for fine-tuning the model. A rough sketch of the PII-removal step is shown below.
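The paper does not spell out how the cleaning was implemented; as an illustration only, here is one simple way the PII-removal step could look, using regular expressions. The patterns and the `scrub_pii` helper are hypothetical, not the authors' code.

```python
# Hypothetical PII-scrubbing sketch: these regex patterns are illustrative
# and far from exhaustive; real de-identification needs much more care.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "URL": re.compile(r"https?://\S+"),
}

def scrub_pii(text: str) -> str:
    """Replace likely PII spans with placeholder tags like [EMAIL]."""
    for tag, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

print(scrub_pii("Contact Dr. Smith at smith@example.com or +1 555-123-4567"))
# -> "Contact Dr. Smith at [EMAIL] or [PHONE]"
```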
3) Test set for performance evaluation
To demonstrate that ChatDoctor indeed improves over ChatGPT in providing medical advice, this work also prepared a dataset that was never seen during training.
The core idea: feed the same questions from this dataset to both ChatDoctor and ChatGPT, then compare which of the two answers is better. We will cover the evaluation process in detail in a later part.
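As a sketch of that side-by-side protocol (the two `answer_*` callables are hypothetical stand-ins for whatever inference calls you use; judging happens afterwards):

```python
# Hypothetical side-by-side answer collection for later comparison.
import json

def collect_pairs(questions, answer_chatdoctor, answer_chatgpt, out_file):
    records = [
        {"question": q,
         "chatdoctor": answer_chatdoctor(q),
         "chatgpt": answer_chatgpt(q)}
        for q in questions
    ]
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

# The resulting JSON can then be scored by human raters or an automatic metric.
```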
Training Settings
- 6 × A100 GPUs
- 3 hours of training
- Batch size: 192
- 3 epochs
- Max sequence length: 512 tokens
Inference
Relying solely on the fine-tuned model's parameters to memorize medical knowledge and conversational patterns is unlikely to be enough.
During inference, the model's replies are better (more accurate, more reliable) if it has the opportunity to access external resources and the ability to extract knowledge from them that is closely related to the user's question.
ChatDoctor prepared two kinds of external resources it can consult: a disease-related knowledge base and Wikipedia.
The format of the disease knowledge base is shown in the figure below; it roughly contains the disease name, symptoms, further tests and measures that can be taken, available medications, and so on.
So how does ChatDoctor interact with these two kinds of external knowledge?
In this work, the interaction is fairly direct and simple, without text-embedding techniques, but it still offers useful reference value. A rough illustration of what such embedding-free retrieval could look like is sketched below.
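As an illustration only (the next part covers ChatDoctor's actual mechanism), here is one simple embedding-free approach: rank knowledge-base entries by keyword overlap with the user's question. The `DISEASE_KB` structure mirrors the fields described above, but the data, stopword list, and scoring are all assumptions.

```python
# Hypothetical embedding-free retrieval sketch: rank knowledge-base entries
# by how many question keywords appear in the entry text.
DISEASE_KB = [
    {"disease": "influenza",
     "symptoms": "fever cough sore throat muscle aches fatigue",
     "tests": "rapid influenza diagnostic test",
     "medications": "oseltamivir rest fluids"},
    {"disease": "migraine",
     "symptoms": "throbbing headache nausea sensitivity to light",
     "tests": "neurological exam",
     "medications": "triptans NSAIDs"},
]

STOPWORDS = {"i", "a", "the", "and", "have", "with", "my", "of", "to"}

def retrieve(question: str, kb=DISEASE_KB, top_k: int = 1):
    keywords = {w for w in question.lower().split() if w not in STOPWORDS}
    def score(entry):
        entry_words = set(" ".join(entry.values()).lower().split())
        return len(keywords & entry_words)
    return sorted(kb, key=score, reverse=True)[:top_k]

print(retrieve("I have a fever and a bad cough"))
# -> the influenza entry; its fields can then be inserted into the prompt.
```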
In the next post, we will describe the interaction with these external knowledge resources in more detail.
(To be continued)