InternGPT (iGPT for short), also known as InternChat (iChat for short), is a pointing-language-driven visual interactive system that lets you interact with ChatGPT by clicking, dragging, and drawing with a pointing device.

The name InternGPT stands for interaction, nonverbal, and ChatGPT. Unlike existing interactive systems that rely on pure language, iGPT integrates pointing instructions, which significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots on vision-centric tasks, especially in complex visual scenes.

In addition, iGPT uses an auxiliary control mechanism to improve the control capability of the LLM, and a large vision-language model called Husky is fine-tuned for high-quality multimodal dialogue (reaching 93.89% of GPT-4 quality as evaluated by ChatGPT-3.5-turbo).


How to use the main functions:

After the image is uploaded successfully, you can start a multimodal conversation with iGPT by sending a message such as "what is it in the image?" or "what is the background color of image?".

You can also manipulate, edit or generate pictures interactively, as follows:

  • Click anywhere on the image, then press the Pick button to preview the segmented region. You can also press the OCR button to recognize all the words present at a specific location;
  • To remove the masked region from the image, send a message like this: “remove the masked region”;
  • To replace the object in the masked region with another object, send a message like this: “replace the masked region with {your prompt}”;
  • To generate a new image, send a message like this: “generate a new image based on its segmentation describing {your prompt}”;
  • To create a new image from a doodle, press the Whiteboard button and draw on the whiteboard. After drawing, press the Save button and send a message like this: “generate a new image based on this scribble describing {your prompt}”.
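The message patterns listed above can be collected into a small helper for scripting or testing prompts. This is only a minimal sketch: the names `MESSAGE_TEMPLATES` and `build_message` are hypothetical and are not part of iGPT, and the templates simply copy the example messages, with `{prompt}` standing in for `{your prompt}`.

```python
# Hypothetical helper: the templates copy the example messages listed above.
# Neither the dictionary nor the function is part of iGPT itself.
MESSAGE_TEMPLATES = {
    "remove": "remove the masked region",
    "replace": "replace the masked region with {prompt}",
    "generate_from_segmentation": (
        "generate a new image based on its segmentation describing {prompt}"
    ),
    "generate_from_scribble": (
        "generate a new image based on this scribble describing {prompt}"
    ),
}


def build_message(action: str, prompt: str = "") -> str:
    """Return the chat message for an action, filling in the user's prompt."""
    if action not in MESSAGE_TEMPLATES:
        raise ValueError(f"unknown action: {action!r}")
    template = MESSAGE_TEMPLATES[action]
    # Actions whose template has a placeholder need a non-empty prompt.
    if "{prompt}" in template and not prompt.strip():
        raise ValueError(f"action {action!r} needs a non-empty prompt")
    return template.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(build_message("remove"))
    print(build_message("replace", "a red sports car"))
```

The resulting string would still have to be typed or pasted into the chat box after the corresponding click, Pick, or Whiteboard step. The helper only assembles the text.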

System overview:

Main functions

A) Remove masked objects

B) Interactive image editing

C) Image generation

D) Interactive Visual Question Answering

E) Interactive image generation

F) Video highlight commentary


Basic requirements

  • Linux
  • Python 3.8+
  • PyTorch 1.12+
  • CUDA 11.6+
  • GCC & G++ 5.4+
  • GPU Memory > 17G for loading basic tools (HuskyVQA, SegmentAnything, ImageOCRRecognition)
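Some of these requirements can be checked from Python before installing anything else. The following is a rough sketch, not a script shipped with iGPT: `parse_version`, `meets_minimum`, and `check_environment` are hypothetical names, "17G" is assumed to mean 17 GiB, only the first GPU is inspected, and the Linux and GCC/G++ requirements are not checked.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Turn a version string such as '1.12.1+cu116' into (1, 12, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return parse_version(found) >= parse_version(minimum)


def check_environment() -> dict:
    """Report which of the listed requirements appear to be satisfied."""
    results = {"python": sys.version_info[:2] >= (3, 8)}
    try:
        import torch
    except ImportError:
        results["pytorch"] = False
        return results
    results["pytorch"] = meets_minimum(torch.__version__, "1.12")
    cuda_version = torch.version.cuda  # None for CPU-only builds
    results["cuda"] = cuda_version is not None and meets_minimum(cuda_version, "11.6")
    if torch.cuda.is_available():
        # Assumption: "17G" is read as 17 GiB on the first visible GPU.
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        results["gpu_memory"] = total_bytes > 17 * 1024**3
    else:
        results["gpu_memory"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'not satisfied'}")
```

The reported CUDA version is the one PyTorch was built against, which may differ from the system toolkit.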

Install Python dependencies

pip install -r requirements.txt

