Testing CODEX: The New Artificial Intelligence from OpenAI

If you’ve been following my recent livestream series, you’ll know I’ve been testing all sorts of AI tools: I’ve tried automation tools like n8n, done vibe coding with Cursor, and tested Claude. But none of that compares to what we did recently with Codex, OpenAI’s development tool.

 

 

 

1 - Using AI for work

 

Most of us have been using AI in one way or another for quite some time now. Many of us have relied on Copilot as an assistant in our IDE, and some of the more curious have used Cursor and learned to use it deliberately rather than at random, which helps a lot more when coding.

 

These sorts of tools are great for generating repetitive and mundane code. They don’t always do exactly what you ask, so you need to be cautious and always review the output, but in general they produce something acceptable that saves you time.

Beyond that, we can use LLMs to help us or to act as a rubber duck for daily programming.

So far, in my opinion, this has been the big use of AI for us as developers.

 

 

2 - ChatGPT and Codex 

 

In May 2025, OpenAI launched Codex, an artificial intelligence tool inside ChatGPT that can write code. The announcement trailer was attention-grabbing, as always, but I wanted to try it myself, so I checked whether it was available and found that it was initially only accessible to Pro members on the $200 subscription.

Still, you could join the waitlist if you had the $20 plan, so I did.

A few weeks later, I got a message that it was available, and a few days afterwards, I tested it live.

 

Here’s the live session -> https://youtube.com/live/_fHSCuFzZq4  

 

My first impression: it blew my mind. I didn’t ask the AI to recreate Uber or build some huge app. Instead, I wanted to gauge its capabilities with a regular task.

For this, I took my code from Distribt and gave it a typical task, the kind that could come up at any company.

If you’re not familiar with this project, it’s a personal project based on distributed systems. And if you want to learn how to build distributed systems, I have a book for sale!  

 

The task: “Modify the mail service to keep a state of sent messages.”

And from there came the following PR -> https://github.com/ElectNewt/Distribt/pull/44 

You can see it has a few issues, but they’re minor, and this was after just one round of code review.

Note: The issues shown in the PR were manually fixed in less than 5 minutes. 

 

In fact, the bug in the code is very simple to fix: it tries to access, from the Application layer, a class that it created in the API layer, so it doesn’t work.

That, plus a badly formed reference, so really just minor issues.
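
To make the layering issue concrete, here is a minimal sketch of the kind of change the task asks for. It is not the code from the PR, and it is written in Python for brevity rather than in the C# that Distribt uses. Every name in it (EmailStatus, EmailRecord, EmailService, get_status_endpoint) is hypothetical. The point is the direction of the dependency: the state type lives in the application layer, and the API layer refers to it, never the other way around.

```python
from dataclasses import dataclass
from enum import Enum


# --- Application layer (hypothetical): owns the types that describe state ---
class EmailStatus(Enum):
    PENDING = "pending"
    SENT = "sent"
    FAILED = "failed"


@dataclass
class EmailRecord:
    recipient: str
    subject: str
    status: EmailStatus = EmailStatus.PENDING


class EmailService:
    """Sends emails and keeps the state of each one in memory."""

    def __init__(self, sender):
        # sender is any callable(recipient, subject) that raises on failure
        self._sender = sender
        self._records = {}
        self._next_id = 1

    def send(self, recipient: str, subject: str) -> int:
        email_id = self._next_id
        self._next_id += 1
        record = EmailRecord(recipient, subject)
        self._records[email_id] = record
        try:
            self._sender(recipient, subject)
            record.status = EmailStatus.SENT
        except Exception:
            record.status = EmailStatus.FAILED
        return email_id

    def status_of(self, email_id: int) -> EmailStatus:
        return self._records[email_id].status


# --- API layer (hypothetical): depends on the application layer only ---
def get_status_endpoint(service: EmailService, email_id: int) -> dict:
    # The API layer maps the application-layer type to a response payload.
    return {"id": email_id, "status": service.status_of(email_id).value}
```

If EmailStatus were declared next to get_status_endpoint instead, EmailService would have to reach up into the API layer to use it, which is the kind of mistake described above.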

 

The development process went like this: 

First, you add the project you want to work on to Codex (in my case, Distribt). Once Codex has indexed it, you get a chat, just like ChatGPT, where you can ask questions and also ask it to carry out tasks:

[Image: Codex from OpenAI]

And you simply ask it to do the task, and it goes ahead and does it.

 

When it finishes, it gives you a summary of what it did, how long it took (in my case, 7 minutes for the first version) and provides a window where you can do a code review.

It’s a totally normal code review window. You can leave comments and, when you’re done, ask it to address them, just like when you’re working with a teammate.

 

As you can see in the image, there’s a “dotnet test” command not found message. That’s because I hadn’t configured the environment; Codex lets you set one up so that every command and tool you need is available. In my case, since I was just testing, it wasn’t strictly necessary. Also, not having it installed let me test its capabilities without outside help (the compiler).

 

So, after another 10 minutes, it came back with the code that is now in the GitHub PR linked above. Judge for yourself.

 

 

3 - Final thoughts on my first experiment with Codex

 

Honestly, it blew my mind, as I mentioned earlier in this post. I used to rely on AI for autocomplete and not much else. This functionality is wild: in my opinion, Codex has the skills of a junior or even mid-level developer.

 

It’s important to note that the Distribt code is well-designed, structured, and easy to read. It would be worth seeing how Codex handles a monolithic system with lots of complexity.

 

As someone who lives and breathes programming, this is the first time AI has actually worried me. I mean that seriously: it genuinely has the skills to replace people. Many of you will say that we humans will adapt to working this way, and sure, but the reality is that for a few years there simply won’t be junior positions, because AI does a better job than a junior dev. And without juniors today, in a few years we’ll be short of the senior devs able to review what the AI writes. Some of the errors it makes are about architecture or style, but others are more serious and can cause problems in real production applications. That’s why a senior human is still needed to review the code and make sure everything works as it should.

 

We’ll have to wait and see how this evolves in the coming months and years.

This post was translated from Spanish. You can see the original one here.
If there is any problem, you can add a comment below or contact me through the website's contact form.

© copyright 2025 NetMentor | All rights reserved
