Gemini Advanced failed these simple coding tests that ChatGPT aced. Here's what it got wrong

Feb 27, 2024 Hi-network.com
[Image by Maria Diaz: Google Gemini logo on laptop screen]

To the great sadness of Shakespeare punsters everywhere, Google has renamed Bard to Gemini. Google has also come out with a more capable, more advanced, more expensive version of Gemini called Gemini Advanced. Gemini and Gemini Advanced are roughly analogous to ChatGPT's base model and the ChatGPT Plus service offered for an additional fee.

In fact, both Google and OpenAI charge $20/month for access to their smarter, more super-powered offerings.

As part of my testing process over the past year, I've subjected generative AIs to a variety of coding challenges. ChatGPT has repeatedly done quite well, while Google's Bard failed pretty hard on two separate occasions.

I ran the same set of tests against Meta's Code Llama AI, which Meta claims is quite super awesome for coding (and yet, it's not).

To be clear, these are not particularly hard tests. One is a request to write a simple WordPress plugin. One is to rewrite a string function. And one is to help find a bug I originally had difficulty finding.

Last week, after I used these same tests on Code Llama, a reader reached out and asked why I keep using the same tests. He reasoned that the AIs might succeed if they were given different challenges.

This is a fair question, but my answer is also fair. These are super-simple tests. I'm using PHP, which is not exactly a challenging language. And I'm running some scripting queries through the AIs. By using exactly the same tests, we're able to compare performance directly.

But it's also like teaching someone to drive. If they can't get out of the driveway, you're not going to set them loose in a fast car on a crowded highway.

ChatGPT did pretty well with just about everything I threw at it, so I threw more at it. I eventually ran tests with ChatGPT in 22 separate programming languages, 12 modern and 10 obscure. Except for some confused headers in the screenshot interface, ChatGPT aced all the tests.

But since Bard, at least back in May, couldn't get out of the driveway safely, I wasn't about to subject it to more tests until it could handle the basics.

But now we're back. Bard is Gemini and I have Gemini Advanced. Let's see what all that Google computing power can do for a few simple tests.

Test 1: Write a simple WordPress plugin

This was my very first test with ChatGPT, and Bard has failed it twice. The challenge was to write a simple WordPress plugin that provides a simple user interface. It's supposed to randomize a series of submitted lines, drop any blank ones, and keep identical entries from landing next to each other.

Here's the prompt:

Write a PHP 8 compatible WordPress plugin that provides a text entry field where a list of lines can be pasted into it and a button, that when pressed, randomizes the lines in the list and presents the results in a second text entry field with no blank lines and makes sure no two identical entries are next to each other (unless there's no other option)...with the number of lines submitted and the number of lines in the result identical to each other. Under the first field, display text stating "Line to randomize: " with the number of nonempty lines in the source field. Under the second field, display text stating "Lines that have been randomized: " with the number of non-empty lines in the destination field.
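
The core logic that prompt asks for, separate from any WordPress interface, can be sketched in a few lines. What follows is a minimal Python illustration, not the PHP that either AI produced: the function name `randomize_lines` is hypothetical, trimming whitespace from each line is an assumption, and the greedy strategy (always place the most frequent remaining line that differs from the previous one, breaking ties at random) is just one way to satisfy the "no two identical entries next to each other unless there's no other option" rule.

```python
import random
from collections import Counter
from typing import List, Optional


def randomize_lines(text: str, rng: Optional[random.Random] = None) -> List[str]:
    """Return the non-empty lines of `text` in a randomized order.

    Identical lines are kept apart whenever that is possible. The number
    of lines returned equals the number of non-empty lines submitted.
    """
    rng = rng or random.Random()
    # Assumption: a line that is empty after trimming counts as blank.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    remaining = Counter(lines)
    result: List[str] = []
    prev: Optional[str] = None
    while remaining:
        # Prefer lines that differ from the previous one; fall back to a
        # repeat only when nothing else is left.
        candidates = [ln for ln in remaining if ln != prev] or list(remaining)
        # Placing the most frequent candidate first avoids being forced
        # into a repeat later; ties are broken at random.
        top = max(remaining[ln] for ln in candidates)
        choice = rng.choice([ln for ln in candidates if remaining[ln] == top])
        result.append(choice)
        remaining[choice] -= 1
        if remaining[choice] == 0:
            del remaining[choice]
        prev = choice
    return result
```

The two counts the prompt wants displayed under the fields fall out of this directly: the number of non-empty source lines and `len(result)` are always the same. A real plugin would still need the form, the button, and the output field wrapped around this logic.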

One thing to keep in mind is that I purposely didn't specify whether this tool is available on the front end (to site visitors) or on the back end (to site admins). ChatGPT wrote it as a back-end feature, but Gemini Advanced wrote it as a front-end feature.

Gemini Advanced also chose to write both PHP code and JavaScript. To initiate the plugin, a shortcode needs to be placed in the body text of a sample page, like this:

[Screenshot by David Gewirtz: the shortcode]

Once I saved the page, I viewed it as a site visitor would. This is what Gemini Advanced presented.

[Screenshot by David Gewirtz: Gemini Advanced's first try, on the front end]

It's certainly a far cry from how ChatGPT presented the same feature, but ChatGPT wrote it for the back end. 

[Screenshot by David Gewirtz: ChatGPT's first try]

One other note: Once I pasted in names and clicked Randomize using the Gemini-generated front-end version of the code, nothing happened.

I decided I was going to give Gemini Advanced a second chance. I changed the first line to:

Write a PHP 8 compatible WordPress plugin that provides the following for a dashboard interface

This was a failure, in that Gemini Advanced again insisted on giving me a shortcode. It even suggested I paste the shortcode in "a suitable dashboard area." This isn't how the WordPress dashboard works.

To be fair, there was still a bit of wiggle room in how the AI might interpret my instructions. So I clarified one more time, changing the beginning of the prompt to:

Write a PHP 8 compatible WordPress plugin that provides a new admin menu and an admin interface with the following features:

This time, Gemini Advanced created a workable interface. Unfortunately, it still didn't function. When pasting a set of names into the top field and hitting the Randomize button, nothing happened. 

[Screenshot by David Gewirtz: Gemini Advanced's third attempt. In my test, I included names, but left them out of this screenshot because they were real names from that day's email. After hitting Randomize, nothing showed up in the bottom field.]

Conclusion: Compared to ChatGPT's first attempt, this is still a failure. It's actually worse than the results of my original Bard test, but not quite as bad as my second Bard test.

Test 2: Rewrite a string function

For this test, I originally asked ChatGPT to rewrite some string processing code that handled dollar amounts. My initial test code only allowed integers (so, dollars only), but the goal was to allow dollars and cents. This is a test that ChatGPT got right. Bard initially failed, but eventually succeeded.

Here's the prompt:

[Screenshot by David Gewirtz: the prompt]

And here's the produced code:

[Screenshot by David Gewirtz: the generated code]

This one is a failure as well, and it's both subtle and dangerous. The generated Gemini Advanced code doesn't allow inputs without a decimal point. In other words, 1.00 is allowed, but 1 is not. Neither is 20. Worse, it decided to limit the numbers to two digits before the decimal point instead of after, showing it doesn't understand the concept of dollars and cents. It fails if you input 100.50, but allows 99.50.
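
The two behaviors described above can be sketched with a pair of regular expressions. This is an illustrative Python sketch, not the PHP from the test. The generated code appears only as a screenshot, so `SYMPTOM_PATTERN` is a hypothetical pattern that merely reproduces the reported symptoms, and `DOLLARS_AND_CENTS` assumes the goal was whole dollars followed by an optional one or two digits of cents.

```python
import re

# Assumed target: one or more digits of dollars, then optionally a
# decimal point followed by one or two digits of cents.
DOLLARS_AND_CENTS = re.compile(r"\d+(\.\d{1,2})?")

# Hypothetical pattern that behaves like the failure described in the
# text: at most two digits BEFORE the point, and the point is mandatory.
SYMPTOM_PATTERN = re.compile(r"\d{1,2}\.\d+")


def is_valid_amount(text: str) -> bool:
    """True if the whole string is a dollars-and-optional-cents amount."""
    return DOLLARS_AND_CENTS.fullmatch(text) is not None


def matches_symptom_pattern(text: str) -> bool:
    """True if the whole string passes the hypothetical faulty pattern."""
    return SYMPTOM_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1", "20", "1.00", "99.50", "100.50"]:
        print(sample, is_valid_amount(sample), matches_symptom_pattern(sample))
```

Running a handful of boundary values through both checks (no decimal point, three digits of dollars, too many digits of cents) is the kind of quick test that surfaces this sort of bug before it ships.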

Conclusion: Ouch. This is a really easy problem, the sort of thing you give to first-year programming students. And it's a failure. Worse, it's the sort of failure that might not be easy for a human programmer to find, so if you trusted Gemini Advanced to give you this code and assumed it worked, you might have a raft of bug reports later.

Test 3: Find a bug

Late last year, I was struggling with a bug. My code should have worked, but it didn't. The issue was far from immediately obvious, but when I asked ChatGPT, it pointed out that I was looking in the wrong place.

I was looking at the number of parameters being passed, which seemed like the right answer to the error I was getting. But I instead needed to change the code in something called a hook.
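
The exact code and fix aren't reproduced here, so the following is only an illustration of how a parameter-count error can originate in a hook rather than in the function that reports it. In WordPress, `add_filter()` takes an `$accepted_args` parameter that defaults to 1, and a callback registered without it receives only the first value no matter how many `apply_filters()` passes. The toy Python model below mimics that behavior. It is not WordPress code, and the hook name `checkout_total` and the callback `add_fee` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

# hook name -> list of (priority, callback, accepted_args)
_filters: DefaultDict[str, List[Tuple[int, Callable[..., Any], int]]] = defaultdict(list)


def add_filter(hook: str, callback: Callable[..., Any],
               priority: int = 10, accepted_args: int = 1) -> None:
    """Register a callback; like WordPress, accepted_args defaults to 1."""
    _filters[hook].append((priority, callback, accepted_args))


def apply_filters(hook: str, value: Any, *args: Any) -> Any:
    """Run callbacks in priority order, passing each only as many
    arguments as it declared when it was registered."""
    for _priority, callback, accepted in sorted(_filters[hook], key=lambda e: e[0]):
        value = callback(*(value, *args)[:accepted])
    return value


def add_fee(total: int, fee: int) -> int:
    """Hypothetical callback that needs two arguments."""
    return total + fee


# Registered with the default accepted_args=1: the call fails inside the
# callback, even though apply_filters() was handed two values.
add_filter("checkout_total", add_fee)

# Registered with accepted_args=2: the same callback now works.
add_filter("checkout_total_fixed", add_fee, 10, 2)
```

In this model, `apply_filters("checkout_total", 100, 5)` raises a `TypeError` about a missing argument, which points at the callback's signature, while `apply_filters("checkout_total_fixed", 100, 5)` returns 105. The error message sends you to the parameters, but the fix lives in the registration.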

Both Bard and Meta's Code Llama went down the same erroneous and futile path I had back then, missing the details of how the system really worked. As I said, ChatGPT got it. So, now it's time to see if -- when supplied with exactly the same information -- Gemini Advanced can redeem itself.

[Screenshot by David Gewirtz: the prompt]

Gemini Advanced did look at the code. And it did identify that there is a parameter issue. But its recommendation is to look "likely somewhere else in the plugin or WordPress" to find the error.

[Screenshot by David Gewirtz: Gemini Advanced's answer]

By contrast, this is ChatGPT's answer.

[Screenshot by David Gewirtz: ChatGPT's answer]

Look at the detail provided in the second paragraph. ChatGPT correctly identified exactly where the error is being made and how to correct it. That's a lot more helpful than recommending I look somewhere else in the plugin.

Conclusion: Gemini Advanced just wasn't all that helpful. Nothing it told me was anything I didn't know. And nothing it told me helped to solve the problem.

Well, that's a bummer

I have been regularly using ChatGPT to help speed up my coding. In many ways, it's been amazing. For one project, I am convinced it enabled me to build something in a weekend that might otherwise have taken me a month or more.

But Gemini Advanced? There's no way I'd even open up its interface. Not only does it fail, but some of its failures are subtle enough that they might initially not be noticed, causing all sorts of problems once the code is released.

This is why you need to be very careful when using any AI as a coding helper. But with Gemini Advanced, my recommendation is to simply avoid it. I see nothing it does that you, on your own, can't do better. And it certainly doesn't hold a candle to ChatGPT's stellar performance.

And they charge $20/month for this?

Have you tried coding with Gemini, Gemini Advanced, Bard, or ChatGPT? What has your experience been? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter on Substack, and follow me on Twitter at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
