For example, it didn’t get “In a room I have only 3 sisters. Anna is reading a book. Alice is playing a match of chess. What the third sister, Amanda, is, doing ?”
I didn't get this one either. When visualizing the problem, Alice was playing chess online, since in my experience this is how almost all games of chess are played. I tried to look for some sort of wordplay for the alliterative sister names or the strange grammar errors at the end of the question, but didn't get anywhere.
Yes, the question does not consider the possibility that Alice was playing chess against a non-human opponent or against an opponent that is not in the room.
[Editor’s Note: I forgot to cross-post this on Thursday, sorry about that. Note that this post does not cover Gemini 1.5, which was announced after I posted this. I will cover 1.5 later this week.]
We have now had a little over a week with Gemini Advanced, based on Gemini Ultra. A few reviews are in. Not that many, though, compared to what I would have expected, or what I feel the situation calls for. This is yet another case of there being an obvious thing lots of people should do, and almost no one doing it. Should we use Gemini Advanced versus ChatGPT? Which tasks are better for one versus the other?
I have compiled what takes I did see. Overall people are clearly less high on Gemini Advanced than I am, seeing it as still slightly to modestly behind ChatGPT overall. Despite that, I have not been tempted to switch back. I continue to think that Gemini Advanced is the better default for my typical queries. But your use cases and mileage may vary. I highly recommend trying your prompts in both places to see what works for you.
Impressions of Others
My impressions of Gemini Ultra last week were qualified but positive. I reported that I have switched to using Gemini Advanced as my default AI for the majority of queries. It has definite weaknesses, but for the things I actively want to do frequently, it offers a better experience. For other types of queries, I also use GPT-4, Claude, Perplexity and occasionally Phind.
I am not, however, as much a power user of LLMs as many might think. I mostly ask relatively simple queries, I mostly do not bother to use much prompt engineering beyond having customized ChatGPT, and I mostly don’t write code.
What I definitely did not do was look to make Gemini Advanced fail. I figured correctly that plenty of others would work on that. I was looking for how it could be useful, not how it could be shown to not be useful. And I was focused on mundane utility, nothing theoretical.
So I totally buy that both:
Here is a 20-minute video of AI Explained testing Gemini Ultra. He took the second approach, and found it wanting in many ways, but wisely says to test it out on your own workflow, and not to rely on benchmarks.
There is no incompatibility here.
The logical errors he finds are still rather embarrassing. For example, at (3:50): I own three cars today and sold two cars last year, and it says I own one car. Ouch.
Perhaps it turns out the heuristics that cause you to fail at questions like that are actually really good if you are being set up to succeed, and who cares what happens if the user actively sets you up to fail?
As in, the logical failure here is that Gemini has learned that people mention information because it matters, and in practice this shortcut is less stupid than it looks. It still looks pretty dumb.
Whereas when he sets it up to succeed at (5:25) by offering step-by-step instructions on what to do, Gemini Advanced gets it right reliably, while ChatGPT is only 50/50.
In practice, I would much rather reliably get the right answer when I am trying to get the right answer and am putting in the work than get the right answer when I am trying to trick the bot into getting it wrong.
Everyone has a different test. Here are some reactions:
I like Rcrsv’s test, and encourage others to try it as well, as it gives insight into what you care about most. Note that your chats are customized to GPT-4 a bit, so this test will be biased a bit against Gemini, but it is mostly fair.
Lord Per Maximium on Reddit did a comparison with several days of testing. I continue to be surprised that more people are not doing this. I love the detail here, and what is fascinating is that Advanced is still seen as below the old version on some capabilities, a sign that they perhaps launched earlier than they would have preferred, and also a clear sign that things should improve over time in those areas:
I’ve definitely seen a bunch of rather stupid-looking fails on logic.
I would take some trade-offs of less accuracy in exchange for better explanations. I want to learn to understand what I am doing, and know how to evaluate whether it is right. So it is a matter of relative magnitudes of difference.
This is Google, so I am really frustrated on this question. Gemini is both unable to properly search outside of a few specific areas and has the nasty habit of telling you how to do a thing rather than going ahead and doing it. If Gemini would actually be willing to execute on its own plans, that would be huge.
GPT-4-Turbo seems to be increasingly no-fun in such realms rather than improving, so it makes sense that Gemini wins there.
This stuff matters. Not having a cap at the back of your mind is relaxing even if you almost never actually hit the cap, and speed kills.
The refusals are pretty bad.
A commenter notes, and I have noticed this too, that while Gemini refuses more overall, Gemini is more willing to discuss AI-related topics in particular than GPT-4.
You can do a side-by-side comparison here, among other places.
I did two polls on which one is better, one in general and one for code. I usually include a 'see results' option, but you can see why, in this case, I didn't for code.
The consensus view is that GPT-4 is modestly better than Gemini for both purposes: a lot more people prefer GPT-4 than prefer Gemini, but a majority sees at least rough parity for Gemini. For now, it seems my preference is quirky. I am curious to run more polls after a month or two.
What is clear is that these two models are roughly on the same level. You can reasonably say GPT-4 is better than Gemini, but not in the sense that GPT-4 is better than GPT-3.5.
Pros and Cons
Some clear advantages of Gemini Advanced over ChatGPT:
Some clear disadvantages of Gemini Advanced versus ChatGPT right now:
Some things that could go either way or speak to the future: