I hope this is the correct category, since it’s a broader AI topic.
I recently tested ChatGPT’s “deep research” feature with what I assumed would be a challenging but not insurmountable task.
I had 2 websites that hosted episode lists of a cartoon show: one with the German episode numbers, titles, and plot synopses, the other with the English episode numbers, titles, and plot synopses.
Cartoons, especially ones without an overarching story where episodes are almost always standalone, were often released in a different order in German. Season 1 of the German release might contain a smattering of episodes from seasons 1, 2, 3, and 4 of the original English release, and so on.
I asked ChatGPT 4o, with deep research enabled, to create a mapping table between the German and English releases, with the columns “German episode number”, “German title”, “corresponding English episode number”, and “corresponding English title”.
I also mentioned that the title translations are not word-for-word, literal translations of the English titles, and that it should try to match episodes based on the plot synopses rather than just the titles.
After 10+ minutes of “deep research”, I got a mapping table that literally just mapped German episode 1 to English episode 1, German episode 2 to English episode 2, and so on.
I tried again, telling it to only handle German Season 2, since I had already mapped Season 1 manually, and this time I got hallucinated nonsense: episode numbers and titles that sounded plausible but didn’t actually exist.
Out of curiosity, I asked my 10-year-old nephew, whose English isn’t that great, to try it, and he matched 3 episodes successfully before getting too bored. So I’m assuming that most 14-15-year-old interns could do this task quite successfully.
Maybe I’m just getting jaded, or old and cranky, but OpenAI is advertising this “deep research” as an agentic capability that can handle complex tasks. Yet it completely fails on something that literal kids can do.
I’ve also tried using LLMs in a professional context. But I keep getting hallucinations when I ask for PowerShell or Azure CLI code snippets, so I have to double-check everything anyway, which means I’m usually faster just going to the first-party documentation and writing it myself.
The same goes for more complex sysadmin tasks: I don’t need it for the basic stuff, and for more complex issues it’s completely unreliable and less helpful than first-party documentation.
And when it comes to writing emails or general communication, I can always tell when a colleague or a customer is using one of these AI tools. It always sticks out and never sounds natural. And for technical documentation, again, I can’t trust it.
I’ve also seen AI-generated images show up in billboard advertisements where I live, and again, you can very much tell that they are AI-generated. It feels incredibly cheap, and it definitely won’t convince me to buy anything from these companies.
Am I not using the tools correctly? Am I missing something? I thought that all of the scenarios I described here should be doable and within scope for these tools, based on the claims and general hype around LLMs, but I’m struggling to get any actual use out of them.