Aiyush Gupta


Is my Career Over? Experiments with Codex

·Aug 11, 2021·



Code is the tissue and cells of the current world; digital monopolies exist online and both you and I are part of them. Codex is helping to bridge the gap between amateurs and hard-to-understand frameworks and languages… to some extent.

OpenAI has once again impressed the developer community 🚀. Less than 24 hours ago, they released an improved version of Codex that translates natural language into code. The API is in private beta; I have just received access to it and am thoroughly impressed with what it can do. You may have heard the name Codex before: it’s the model behind GitHub Copilot and is fluent in over a dozen languages. OpenAI is encouraging the developer community to integrate the API into existing products and fully exploit its capabilities.

When I watched this video yesterday, I was completely flabbergasted 😲. The Codex model is a direct descendant of GPT-3 and has been trained on billions of lines of code from multiple sources, chiefly GitHub’s public repositories. I mentioned earlier that Codex can work with over a dozen languages (JavaScript, Ruby, Perl, Go, TypeScript, Shell and so on), but it works best with Python. You may be asking yourself: isn’t this just GPT-3 rebranded to work only with code? Well, GPT-3’s skill is generating natural language in response to natural language, so it does not produce code well. Sure, we’ve seen the examples on Twitter of GPT-3 creating ReactJS components, but it only had 4KB of memory whereas Codex has 14KB, meaning it can take over 3 times as much contextual information into account. Codex lets developers across the globe take a simple, well-designed prompt and get back almost usable, readable and coherent code.

Another sample video

I’m almost certain that when tackling a larger project you organise your code first and break it down into smaller problems. This is called decomposition: the breaking down of a large problem into smaller ones. Smaller problems are easier to solve, are independent of the other issues and can be tested independently, and their solutions can then be combined to solve the full problem. This is where Codex excels the most. It shouldn’t be used on its own but rather as a companion to the developer. For instance, tomorrow (12 August) OpenAI is running a competition where Codex will be both companion and competitor to programmers across the world, to see who will excel the most.
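To make decomposition concrete, here is a minimal sketch of my own (not Codex output). It assumes a made-up task, finding the most frequent words in some text, and the function names `tokenize`, `count_words`, `top_words` and `most_frequent_words` are purely illustrative. Each piece is small enough to describe in a one-line prompt and to test on its own.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words, dropping surrounding punctuation."""
    words = (word.strip(".,!?;:\"'()").lower() for word in text.split())
    return [word for word in words if word]


def count_words(words):
    """Count how often each word occurs."""
    return Counter(words)


def top_words(counts, n):
    """Return the n most common (word, count) pairs."""
    return counts.most_common(n)


def most_frequent_words(text, n=3):
    """Combine the small steps to answer the full problem."""
    return top_words(count_words(tokenize(text)), n)


print(most_frequent_words("Code is code. Codex writes code!", n=1))
# prints [('code', 3)]
```

Because every step is its own function, a wrong answer from any one of them can be caught and fixed without touching the rest.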

So far I’ve mentioned that Codex can produce workable code. However, it is also capable of explaining, transpiling and refactoring code 😱. According to OpenAI themselves, none of the coding problems presented to Codex could be solved by GPT-3. This seems reasonable at first, since GPT-3 has little or no code in its dataset. The scientists at EleutherAI then trained GPT-J on the Pile (I wrote briefly about this in a previous article), an 800GB dataset that includes volumes of code from StackExchange and GitHub, and it successfully completed only 11.4% of the coding problems. If you would like to see it in action, please look at my previous article to see how it works and the source code on GitHub. This is where Codex hasn’t failed to impress: a mere 159GB of code (compared to the 800GB of the Pile) was used to train Codex, which solved 28% of the problems, and further fine-tuning allowed it to solve 38% of the problems it was presented with. GPT-3 is like a high school student who has some knowledge of almost every topic at an introductory level, whereas Codex is a university student who knows a single topic in great depth but still makes mistakes here and there. Maybe this analogy isn't the best, since the models don't actually understand anything, as discussed below, but I hope it makes the point.

You know what I mean

Does Codex Actually Understand? 🤔

One aspect of the model that even I tend to forget is that Codex doesn’t understand programming; it will never be as good as an amateur who has a grasp of what to learn next and how to solve problems. Like any language model, Codex just pieces together statistical patterns efficiently to find relationships and links between different problems. Professional developers will never see as much code as Codex has, yet they can outperform the model on countless levels. Remember: “a strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B”. Phew, I should be OK then.

Environmental Impacts 🌳

Like other generative models, Codex consumes energy for both training and inference. Training GPT-3-12B consumed hundreds of petaflop/s-days of compute, and fine-tuning it into Codex took a very similar amount. OpenAI trained the model on the Azure platform, which sources significant amounts of its energy from renewable sources. The researchers noted: “Looking more globally and long-term, the compute demands of code generation could grow to be much larger than Codex’s training if significant inference is used to tackle challenging problems.”


Legality ⚖️

FOSSA published an article, “Analysing the Legal Implications of GitHub Copilot”. Since Codex is the model used to build Copilot, I believe its conclusions can be extrapolated to Codex itself. This can be read here: . The researchers note that training AI on internet data has been identified as fair use, and claim Codex “rarely generates code that is identical to the contents of training data”. The < 0.1% of instances in which it did occur involved generated code built from common expressions and statements that appeared in the training dataset again and again. To conclude, even on GitHub (not including other platforms Codex was trained on): “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”


Representation (Bias) in the Generated Code 😈

It is extremely difficult to build a natural language dataset that is completely factual, and this introduces bias into the models trained on it. Researchers at OpenAI wrote: “we found that Codex can be prompted in ways that generate racist, denigratory, and otherwise harmful outputs as code comments”. Additionally, “code generation models raise further bias and representation issues beyond problematic natural language: Codex can generate code with structure that reflects stereotypes about gender, race, emotion, class, the structure of names, and other characteristics. Particularly in the context of users who might over-rely on Codex or use it without first thinking through project design, this issue could have significant safety implications, giving further motivation to discourage over-reliance.”


Using Codex 🙌

As mentioned above, I received access to the model this morning and have already tinkered with it a little. If you have any prompts you would like me to try out, please share them in the comments. I’ll start a repository on GitHub with my findings soon.

I decided to start off simple and progressively make my prompts more difficult to understand. Later this week I hope to post more experiments. If you would like to team up and create something, connect with me via LinkedIn or comment down below.

Asking a user for their name ❓


What I really love about this model is that it can produce a different output every time you run it. This isn't always the case, but it does seem to always produce meaningful comments alongside the code.

Here is the code generated from this one:

#get the name from the user
name = input("Please enter your name: ")

#say hello to the user
print("Hello {0}!".format(name))

When I reran with the exact same prompt, I received this:

name = input("What is your name? ")
print("Hello, {}!".format(name))

As you can see, the model creates two very similar outputs. The first one needed slight tinkering, but it did work, so it passes test 1.

Generating lists of random names

Here, I asked the model to generate two lists of random names and then combine them. What is becoming apparent is the GPT-3 nature of this model: I was super impressed that I could just say “random names” and it generated 20 in total.


If you want to try the code out here it is:

import random

names = ["John", "James", "David", "Adam", "Andrew", "Chris", "Sara", "Anne", "Rose", "Kim"]
surnames = ["Smith", "Baker", "Davis", "Miller", "Clark", "Brown", "Kelvin", "Anderson", "Lee", "Simpson"]

def random_names(names, surnames):
    full_names = []
    for name in names:
        for surname in surnames:
            full_names.append(name + " " + surname)
    return full_names

random_full_names = random_names(names, surnames)

What I love in particular about this model is its ability to indent the code. Other language models I have experimented with in the past all seemed to ignore that code needs to be indented correctly and left the indentation out. Also notice how it chooses meaningful parameter names and then passes identically named arguments when the function is called. This is a prime example of how it can maintain its context.
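One thing worth noticing in the generated code: it imports `random` but never uses it, and `random_names` pairs every first name with every surname (10 × 10 = 100 full names) rather than picking any at random. If random picks were what you wanted, a minimal sketch could look like the one below. This is my own code, not Codex output, and `pick_random_full_names` and its `seed` parameter are hypothetical names I made up for illustration.

```python
import random

# Shortened copies of the lists from the generated code above.
names = ["John", "James", "David", "Sara", "Anne"]
surnames = ["Smith", "Baker", "Davis", "Miller", "Clark"]


def pick_random_full_names(names, surnames, count, seed=None):
    """Return `count` full names, each built from a random first name and surname."""
    rng = random.Random(seed)  # passing a seed makes the picks repeatable
    return [f"{rng.choice(names)} {rng.choice(surnames)}" for _ in range(count)]


sample = pick_random_full_names(names, surnames, count=5, seed=42)
print(sample)
```

Using a local `random.Random` instance instead of the module-level functions keeps the picks reproducible when a seed is given, which makes the function easy to test.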

SQL

Now I'm moving away from Codex's best territory 🐍 and into the rocky waters of SQL. I usually use NoSQL databases like MongoDB, so I'm not accustomed to SQL syntax. A tool like this would be very helpful to someone like me when I need to get something done quickly.


I've not tested this output yet; however, all the syntax seems to be correct.
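One cheap way to go beyond "the syntax seems correct" is to run the generated SQL against a throwaway in-memory SQLite database using Python's built-in `sqlite3` module. The sketch below is an assumption-heavy illustration: the `users` table, its rows and the query are hypothetical stand-ins rather than the SQL Codex produced, and `run_query` is a helper name I made up. SQLite's dialect also differs from other databases, so a pass here is only a sanity check.

```python
import sqlite3


def run_query(setup_sql, query_sql):
    """Run query_sql against a throwaway in-memory database built by setup_sql."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.executescript(setup_sql)
        return connection.execute(query_sql).fetchall()
    finally:
        connection.close()


# Hypothetical schema and query, standing in for whatever the model generated.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO users (name, age) VALUES ('Ada', 36), ('Linus', 17), ('Grace', 45);
"""
QUERY = "SELECT name FROM users WHERE age >= 18 ORDER BY name;"

rows = run_query(SETUP, QUERY)
print(rows)
# prints [('Ada',), ('Grace',)]
```

A query with a syntax error raises `sqlite3.OperationalError`, so broken output fails loudly instead of silently.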

Translation

Above, I mentioned that this model can handle not only code generation but also code translation. This would be extremely helpful to someone learning a new programming language, or to someone who has written backend code in Python and then realised they don't actually need a server and a static website would work fine. What I haven't tested yet is its ability to use common APIs and frameworks; this will be an interesting experiment. Do you think it will be able to do it?


Buggy Python 🐍

Imagine... just imagine... if you and I never had to bang our heads on a table every time we missed a semicolon. I know that in my early days with Python IDLE, a feature like this would have been greatly appreciated. Imagine the possibilities.

Time Complexity

This is one of the few experiments I have done that a beginner would not know how to do themselves. Working out time complexity is a real problem for developers, and the fact that it was solved so simply is really quite astounding. A tool like this could be used by students learning about Big O notation.
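For anyone new to Big O notation, here is a small sketch of what is being measured. It is my own illustration, not the code from my experiment, and the function names are made up: one function touches each item once, the other compares every item with every item, and counting the steps shows how differently they grow.

```python
def count_linear_steps(items):
    """Visit each item once: the step count grows as O(n)."""
    steps = 0
    for _ in items:
        steps += 1
    return steps


def count_pairwise_steps(items):
    """Compare every item with every item: the step count grows as O(n^2)."""
    steps = 0
    for _ in items:
        for _ in items:
            steps += 1
    return steps


for n in (10, 100, 1000):
    data = range(n)
    print(n, count_linear_steps(data), count_pairwise_steps(data))
```

Making the input 10 times bigger makes the linear count 10 times bigger but the pairwise count 100 times bigger, which is exactly the difference between O(n) and O(n^2).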


Simple HTML

This is where its capabilities really started to sink in: from this simple prompt, a fully functional, deployable website was created. Imagine a longer prompt. If we kept adding more and more, it would be awesome to see a social media clone built using just prompts.



More complex HTML

Soon after trying the above, I wanted to see how it would handle images and slightly more advanced CSS. I asked it:

  1. Add a title of Hello World
  2. Make the text big
  3. Center the text
  4. Make it a gradient
  5. Add this image ""
  6. Animate the dog
  7. Add parallax


Here is the code if anyone wants to try it out:

<!DOCTYPE html>
    <title>Hello World</title>
            background-color: #000000;
            color: #FFFFFF;
            font-size: 50px;
            text-align: center;
            background: linear-gradient(to right, #FF0000, #FFFF00, #00FF00, #00FFFF, #0000FF, #FF00FF);
    <h1 class="gradient">Hello World</h1>
    <img src="" alt="dog" width="500" height="500">


As you can see in the GIF, it generated some random Python comments at the bottom and started telling me about Cascading Style Sheets in more detail. It would have been great if they were HTML comments.


No, my job isn't going anywhere any time soon 😅. Although Codex is thoroughly impressive, it struggles with increasingly complex prompts and sometimes the outputs don't actually work. The results above aren't cherry-picked, but I've only conducted my initial experiments on a small sample. If anyone wants to send me prompts to try out and share the results, feel free to do so. I'm also certain that if you started to code today, by the end of the first 2 months you would be able to create more advanced applications than Codex can.

I cannot stress this enough

👋 Thanks for reading, See you next time
