We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Full paper available here: https://cdn.openai.com/papers/gpt-4.pdf
Three months later, I still find that:
a) Bing Chat has a lot of issues that the ChatGPTs (both 3.5 and 4) don't seem to suffer from nearly as much. For example, it often refuses to answer prompts that are pretty clearly harmless.
b) Bing Chat has a harder time than I expected when answering questions that you can answer by copy-and-pasting the question into Google and then copy-and-pasting the right numbers, sentence or paragraph from the first search result. (Meanwhile, I find that Bing Chat's search still works better than the search plugins for ChatGPT 4, which seem to still have lots of mundane technical issues.) Occasionally ChatGPT (even ChatGPT 3.5) gives better (more factual or relevant) answers "from memory" than Bing Chat gives by searching.
However, when I pose very reasoning-oriented tasks to Bing Chat (i.e., tasks that mostly aren't about searching on Google), and Bing Chat neither refuses to answer for some reason nor gets distracted by unrelated search results, it seems clear that Bing Chat is more capable than ChatGPT 3.5, while Bing Chat and ChatGPT 4 seem similar in their capabilities. I pose lots of tasks that I'm pretty sure aren't in the training data (in contrast to, e.g., variants of Monty Hall, which people seem to be very interested in), so I'm very confident that this improvement isn't primarily about memorization. So I totally buy that people who asked Bing Chat the right questions were justified in being very confident that Bing Chat is based on a newer model than ChatGPT 3.5.
Also:
>I've tried (with little success) to use Bing Chat instead of Google Search.
I do now use Bing Chat instead of Google Search for some things, but I still think Bing Chat is not really a game changer for search itself. My sense is that Bing Chat doesn't/can't comb through pages and pages of different documents to find relevant info, and that it also doesn't do one search to identify relevant search terms for a second search, etc. (Bing Chat seems to be restricted to a few (three?) searches per query.) For the most part it seems to enter obvious search terms into Bing Search and then give information based on the first few results (even if those don't really answer the question or are low quality). The much more important feature from a productivity perspective is the processing of the information it finds, such as turning the information on a given webpage into a bibtex entry or applying some method from Stack Exchange to the particularities of one's code.