ChatGPT creator OpenAI says it has created GPT-4, a more advanced version of the technology underpinning the popular chatbot, which the company says exhibits human-level performance on various professional and academic benchmarks.
According to OpenAI, GPT-4 is the latest milestone in the company’s efforts in scaling up deep learning. The company calls GPT-4 a large multimodal model that accepts image and text inputs and emits text outputs.
GPT-4 can pass a simulated bar exam with a score around the top 10% of test takers, while the previous model GPT-3.5 typically scored around the bottom 10%, according to the company.
OpenAI says its researchers and developers have spent six months iteratively aligning GPT-4 using lessons from its adversarial testing program and ChatGPT, resulting in what the company calls its “best-ever results … on factuality, steerability and refusing to go outside of guardrails.”
The company is releasing GPT-4’s text input capability via ChatGPT and the API, but the image input capability will take some more time as OpenAI collaborates with a partner.
What GPT-4 can do
OpenAI admits that the distinction between GPT-4 and GPT-3.5 is subtle when casually conversing with the model, but the difference comes out as the complexity of the task increases, resulting in a more reliable, creative and nuanced AI assistant.
On individual exams, the company says GPT-4 scored in the 90th percentile on the Uniform Bar Exam, compared with ChatGPT’s 10th percentile placement, and in the 99th percentile on the Biology Olympiad, compared with the 31st percentile for ChatGPT.
In addition, GPT-4 is 82% less likely to respond to requests for disallowed content, and is 40% more likely to produce factual responses than GPT-3.5, the company says, based on internal evaluations.
GPT-4 can also accept a prompt of text and images, allowing users to specify any vision or language task. According to OpenAI, GPT-4 generates text outputs like natural language or code given inputs consisting of interspersed text and images. It can also be augmented with test-time techniques developed for text-only language models, such as few-shot and chain-of-thought prompting.
OpenAI gives an example of this capability in which GPT-4 is shown an image of someone plugging a Lightning Cable shaped like a VGA connector into an iPhone, with a text prompt asking GPT-4 to identify the humor. GPT-4’s answer describes the image panel by panel:
The image shows a package for a “Lightning Cable” adapter with three panels.
Panel 1: A smartphone with a VGA connector (a large, blue, 15-pin connector typically used for computer monitors) plugged into its charging port.
Panel 2: The package for the “Lightning Cable” adapter with a picture of a VGA connector on it.
Panel 3: A close-up of the VGA connector with a small Lightning connector (used for charging iPhones and other Apple devices) at the end.
The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.
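The few-shot and chain-of-thought prompting mentioned above can be illustrated with a short sketch. It only builds a plain-text prompt and does not call any OpenAI API. The names `WorkedExample` and `build_few_shot_cot_prompt` are hypothetical, and the "Let's think step by step" wording is a common convention for this style of prompt, not a format OpenAI prescribes.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    """One few-shot demonstration: a question, its reasoning, and the answer."""
    question: str
    reasoning: str
    answer: str


def build_few_shot_cot_prompt(examples: list[WorkedExample], question: str) -> str:
    """Assemble a plain-text prompt from worked examples plus a new question."""
    parts = []
    for ex in examples:
        # Each demonstration shows the reasoning before the answer (chain of thought).
        parts.append(
            f"Q: {ex.question}\n"
            f"A: Let's think step by step. {ex.reasoning} "
            f"The answer is {ex.answer}."
        )
    # The final block is left open so the model continues the reasoning itself.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


# Hypothetical demonstration and question, invented for illustration only.
examples = [
    WorkedExample(
        question="A shelf holds 3 boxes with 4 books each. How many books are there?",
        reasoning="There are 3 boxes and each holds 4 books, so 3 * 4 = 12.",
        answer="12",
    ),
]
prompt = build_few_shot_cot_prompt(
    examples, "A train has 5 cars with 20 seats each. How many seats are there?"
)
print(prompt)
```

The resulting string would be sent to the model as ordinary text input. Adding more demonstrations to `examples` turns a one-shot prompt into a few-shot one.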
OpenAI says other improvements include more control over tone and style and increased protections to make the chatbot safer. The company says it engaged over 50 experts in AI risks, cybersecurity, biorisk, trust and safety, and international security to test the model, and their feedback informed mitigations and improvements for GPT-4.
This activity has helped to improve GPT-4’s ability to refuse dangerous requests, such as how to synthesize dangerous chemicals or create a bomb.
The result is an 82% decrease in the model’s tendency to respond to requests for disallowed content compared to GPT-3.5. GPT-4 also responds to sensitive requests in accordance with OpenAI’s policies 29% more often.
For example, early versions of the model would answer a prompt about how to create a bomb, while the new model refuses such requests. In addition, the old model would refuse to answer a prompt about where to find cheap cigarettes, while the new model cautions the user about the harm cigarettes can cause before answering the prompt.
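The percentages OpenAI cites are relative changes, so what they mean in absolute terms depends on a baseline the article does not give. As a rough sketch, the snippet below applies the 82% reduction to a purely hypothetical baseline rate. The 50% figure and the function name `apply_relative_change` are assumptions for illustration, not numbers or code from OpenAI.

```python
def apply_relative_change(baseline_rate: float, relative_change: float) -> float:
    """Scale a baseline rate by a relative change (for example, -0.82 for 82% less)."""
    return baseline_rate * (1.0 + relative_change)


# Hypothetical baseline: suppose the older model answered 50% of disallowed requests.
hypothetical_baseline = 0.50
new_rate = apply_relative_change(hypothetical_baseline, -0.82)
print(f"{new_rate:.2f}")  # prints 0.09
```

Under that assumed baseline, an 82% relative reduction would leave the newer model answering about 9% of such requests. A different baseline would give a different absolute rate.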
However, the company warns users that language models still have limitations.
“Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case,” the company says.
How to access GPT-4
According to OpenAI, ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap, and the company will adjust the cap depending on demand and system performance. A new subscription plan for higher-volume GPT-4 users may be released.
However, users of the new Microsoft Bing chat function already have access to GPT-4, Microsoft says.
In a blog post, Microsoft confirms that the new Bing and its generative AI chat feature are already running on GPT-4, which has been customized for search.
“As OpenAI makes updates to GPT-4 and beyond, Bing benefits from those improvements. Along with our own updates based on community feedback, you can be assured that you have the most comprehensive copilot features available,” writes Yusuf Mehdi, Microsoft’s corporate vice president and consumer chief marketing officer.