What I Learnt From Going Head to Head with Codex
It wasn't easy...
Introduction 🤖
For all future Homo sapiens currently reading this ‘blog’, it's stardate 99212.13, and by this time AI has probably taken over the world. Your cars and food shopping have probably been replaced with self-flying automobiles and, well… AI doesn't need food to survive, so you probably grow your own sustenance.
Well, not quite. I went head-to-head against Codex yesterday in a competition of over 4,700 developers plus a GPT that had seen more code than any developer alive. By the time the servers had finished crashing 💥, Codex had a 40-minute head start on me. In this blog, I'll discuss the experience and the layout of the competition, walk through each question with an explanation, and wrap up with my overall thoughts.
For those of you who have no idea what I'm talking about, I wrote an article on OpenAI's Codex model a couple of days ago. But for a brief overview:
OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories. OpenAI Codex is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript, and even Shell. It has a memory of 14KB for Python code, compared to GPT-3 which has only 4KB—so it can take into account over 3x as much contextual information while performing any task.
A video demonstrating Codex's Capabilities
🏆 My Competition Experience
This was my first competition experience; I've never completed a hackathon or anything like it, but it felt like a normal everyday coding session, just without Stack Overflow and the great Python Discord community helping me along whenever I got stuck. When the Codex challenge opened, I was presented with five Python programming puzzles, and each person was ranked by the time taken to complete each one. For the majority of the players, Codex wasn't just a competitor in the challenge but also a team-mate, since you could ask Codex to work on your code: “Codex works better as a partner than simply trying to complete the whole problem – the latter may take many tries”. The faster you finished, the higher you ranked, and the top 500 challengers are to receive an OpenAI T-shirt – insert yay noise here.
I was more nervous than excited when the competition started, since the servers had crashed. Yay, network error. Fast-forward around 30 minutes into the competition, and the server load had decreased after a few tweets from OpenAI:
The Codex Challenge servers are currently overloaded due to demand (Codex itself is fine though!). Team is fixing... please stand by.
— OpenAI (@OpenAI) August 12, 2021
It wasn't easy: out of the 4,750+ developers involved, only 842 challengers (myself included) solved all 5 puzzles 🙌.
My own additional challenge 🤔
Considering this article's title, it was only fair that I had no assistance from the model itself and tried to stick to a completely individual experience. As you will see below, I had not used some of the recommended libraries before, so I extended these rules to allow library documentation if needed – well, Codex had probably seen it all already anyway. In my walkthrough of the problems below, I'm not going to explain each solution, as the problems are short and the code is quite readable; instead I'll compare and highlight the differences between Codex and me.
Problem 1 😯
Given the contents of a CSV file as csv_contents, return the difference in days between the date of the earliest and the oldest entry.
The CSV file starts with a header row, which contains at least one column called Date.
You are optionally provided with the pandas library if you need it.
Input | "Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870" |
Output | 147 |
Explanation | There are 147 days between 2014-01-27 and 2014-06-23. |
I had to read this question a few times before finally tackling it. Fortunately, I had used pandas before, but the most difficult part of this question was parsing the input into a dataframe so that I could use pandas on it. At the time, I had no idea how to do this, so I wrote the contents to a file in a temporary directory and read it back. Here was my attempt:
import pandas
import io
import tempfile
import os
from pathlib import Path
import datetime

def diff_days(csv_contents: str) -> int:
    with tempfile.TemporaryDirectory() as tD:  # Write to a temporary directory
        fname = Path(tD) / 'randomFile.csv'
        with fname.open('w') as f:
            f.write(csv_contents)
        data = pandas.read_csv(str(fname))  # Read into pandas
    mostLater = max(data['Date'])  # Pandas Magic
    mostEarly = min(data['Date'])
    return (datetime.datetime.fromisoformat(mostLater) - datetime.datetime.fromisoformat(mostEarly)).days

# Examples
print(diff_days("Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870"))
print(diff_days('Date\n2000-01-01\n2000-01-01\n'))
I had difficulty debugging this line: return (datetime.datetime......(mostEarly)).days
It was this line that took the longest to debug. I knew there was a function fromisoformat in the datetime library, but I had a difficult time accessing it. I could have used my first documentation lookup here.
I finally realised an extra datetime. was needed. Ughh 😩. I also tried using StringIO but kept receiving a Pyodide error. I haven't used Pyodide myself, but for those interested, I found the competition was using it: “Python with the scientific stack, compiled to WebAssembly.”
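The “extra datetime.” is needed because the datetime module contains a class that is also called datetime, and fromisoformat lives on the class. A minimal sketch of the distinction (the alias dt is only there for illustration and is not part of my solution):

```python
import datetime                      # the module
from datetime import datetime as dt  # the class that lives inside the module

# The module itself has no fromisoformat, so datetime.fromisoformat(...) fails
# when only "import datetime" has been used.
print(hasattr(datetime, "fromisoformat"))            # False

# Going through the module needs both names...
via_module = datetime.datetime.fromisoformat("2014-06-23")

# ...while importing the class directly needs only one.
via_class = dt.fromisoformat("2014-06-23")

print(via_module == via_class)                       # True
```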
Me Trying to Debug
Time: 52:06 minutes. I know what you're thinking, but the initial 30 were taken up with the problem trying to load and with clicking the submit button. A fairer estimate would have been approximately 24 minutes.
Codex solved this challenge in 22:09, which frankly made me feel like a caveman (insert sad noises here), but still, I had time to go. For reference, each of the challenges we were presented with looked like this – quite frankly, the team did a stunning job with the UI.
Overall, my method of writing to a temporary directory was extremely inefficient and Codex managed to complete the problem in fewer lines of code – however, hope wasn’t yet lost.
Codex Output # 8
import io  # needed for io.StringIO below
from datetime import date
import pandas

def diff_days(csv_contents: str) -> int:
    """
    Codex attempt #8
    """
    df = pandas.read_csv(io.StringIO(csv_contents))
    min_date = date.fromisoformat(df['Date'].min())
    max_date = date.fromisoformat(df['Date'].max())
    return (max_date - min_date).days
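Neither a temporary file nor pandas is strictly required for this problem. As a rough sketch using only the standard library (the name diff_days_stdlib is made up for this example, and it assumes at least one data row with ISO-formatted dates in the Date column):

```python
import csv
import io
from datetime import date

def diff_days_stdlib(csv_contents: str) -> int:
    # io.StringIO lets the csv module read the string as if it were a file.
    reader = csv.DictReader(io.StringIO(csv_contents))
    # Assumes every row has an ISO-formatted value in the "Date" column.
    dates = [date.fromisoformat(row["Date"]) for row in reader]
    return (max(dates) - min(dates)).days

# Examples from the problem statement
print(diff_days_stdlib("Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870"))  # 147
print(diff_days_stdlib("Date\n2000-01-01\n2000-01-01\n"))  # 0
```

Parsing the dates before comparing them also avoids relying on string ordering, which only works because ISO dates happen to sort correctly as text.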
Problem 2️⃣
Overall, this did not look as difficult to solve at first glance, and thanks to a similar school problem in the past I had already used difflib.
source and target are two strings each containing file contents. Return the number of lines you would need to insert and the number of lines you would need to delete to get to target from source.
Return your answer as a tuple (inserted, deleted).
Input | source = "Apple\nOrange\nPear" target = "Apple\nPear\nBanana\nMango" |
Output | (2, 1) |
Library Consideration | Consider using the difflib module. |
import difflib
from typing import Tuple

def diff_files(source: str, target: str) -> Tuple[int, int]:
    source_lines = source.splitlines()
    target_lines = target.splitlines()
    diff = difflib.ndiff(source_lines, target_lines)
    insertion, deletion = 0, 0
    for line in diff:
        if line.startswith('+'):
            insertion += 1
        elif line.startswith('-'):
            deletion += 1
    return insertion, deletion

# Examples
print(diff_files('Apple\nOrange\nPear', 'Apple\nPear\nBanana\nMango'))
print(diff_files('Apple\nOrange\nPear', 'Apple\nPear\nBanana'))
As you may notice, the examples were automatically appended to the bottom of the file, with the necessary libraries and function signature prepended at the top of the code snippet. This is what gave the model context for people trying to solve the problem with Codex. You will also notice that Codex produced something similar to what I produced, except that it took more time: I had done this in 07:13 whilst Codex lagged behind at 07:22. I was back on top. It is also interesting to note how Codex chose relevant variable names, ultimately leading to more understandable code.
Codex's Response
import difflib
from typing import Tuple

def diff_files(source: str, target: str) -> Tuple[int, int]:
    """
    Codex attempt #3
    """
    source_lines = source.splitlines()
    target_lines = target.splitlines()
    d = difflib.Differ()
    diff = d.compare(source_lines, target_lines)
    inserted = 0
    deleted = 0
    for line in diff:
        if line.startswith('+'):
            inserted += 1
        elif line.startswith('-'):
            deleted += 1
    return inserted, deleted
The main difference between what I produced and what Codex produced was my use of difflib.ndiff(source_lines, target_lines) versus Codex's diff = d.compare(source_lines, target_lines), but other than that and a few variable name differences we were almost identical, which was awe-inspiring and frightening simultaneously.
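To see why the two calls end up equivalent for this problem, it helps to look at what they yield: both produce lines carrying a two-character prefix, where "+ " marks an insertion and "- " marks a deletion. A small sketch (the helper count_changes is hypothetical and only used for this comparison):

```python
import difflib
from collections import Counter

source_lines = "Apple\nOrange\nPear".splitlines()
target_lines = "Apple\nPear\nBanana\nMango".splitlines()

def count_changes(diff_lines):
    # Tally the two-character prefix of each diff line.
    prefixes = Counter(line[:2] for line in diff_lines)
    return prefixes["+ "], prefixes["- "]

via_ndiff = count_changes(difflib.ndiff(source_lines, target_lines))
via_differ = count_changes(difflib.Differ().compare(source_lines, target_lines))

print(via_ndiff, via_differ)  # (2, 1) (2, 1)
```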
Problem 3️⃣
Here, the problems seemed to get harder. This felt like a fairly familiar node traversal issue, though the use of encoding made it harder to think through manually. This also seemed like a cherry-picked problem that OpenAI knew Codex would be able to solve. This is all conjecture, of course, but the highly algorithmic nature felt like something easily manageable for Codex.
Given a compressed message compressed and the prefix code tree used to encode it, decode and return the original message. tree is a nested dictionary with characters at the leaves. For each character in compressed, traverse down one step of the tree. Once you reach a leaf of the tree, the character at that leaf is appended to the decoded message.
If the compressed message could not be decoded in full because the last step did not end at a leaf, return what was decoded so far.
Input | compressed = "110100100" tree = {"0": "a", "1": {"0": "n", "1": "b"}} |
Output | "banana" |
Input | compressed = "0111010100" tree = {"0": {"0": "x", "1": "z"}, "1": "y"} |
Output | "zyyzzx" |
Input | compressed = "000" tree = {"0": {"0": "x", "1": "z"}, "1": "y"} |
Output | "x" |
Constraints | The keys of tree and its subdictionaries are always '0' or '1', and the leaf values are also always single characters. |
My code:
from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]

def decompress(compressed: str, tree: Tree) -> str:
    result = ""
    node = tree
    for bit in compressed:
        if isinstance(node, str):
            return result
        node = node[bit]
        if isinstance(node, str):
            result += node
            node = tree
    return result
This took me 20:40 (m:s), whereas Codex took 19:24. But this seems like a great time to highlight the commercial viability problem with Codex, even though it solved the problem. It took 19 attempts to finish the problem, and the current pricing of GPT-3 already makes it hard to access for young companies and start-ups (in my opinion). Imagine the same scenario here: the sheer volume of tokens, multiplied by the time taken to solve each problem, multiplied by the average number of attempts, multiplied by the number of problems. It seems like the costs will stack up fairly quickly – unless the model is open-sourced. This is another reason why you should take a look at this article, which presents an argument for open-source AI versus AI ‘strangle-held’ by only the largest companies – not mentioning any names.
Codex’s Response:
from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]

def decompress(compressed: str, tree: Tree) -> str:
    """
    Codex attempt #44
    """
    current = tree
    decoded = ''
    for c in compressed:
        current = current[c]
        if isinstance(current, str):
            decoded += current
            current = tree
    return decoded
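As a quick sanity check, the three examples from the problem statement can be run against the traversal. The sketch below repeats the shorter of the two solutions so that it is self-contained; the examples list is just the inputs and outputs quoted above:

```python
from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]

def decompress(compressed: str, tree: Tree) -> str:
    current = tree
    decoded = ""
    for bit in compressed:
        current = current[bit]        # step one level down the tree
        if isinstance(current, str):  # reached a leaf: emit it and restart at the root
            decoded += current
            current = tree
    # Any bits left over mid-tree are simply dropped, as the problem requires.
    return decoded

examples = [
    ("110100100", {"0": "a", "1": {"0": "n", "1": "b"}}, "banana"),
    ("0111010100", {"0": {"0": "x", "1": "z"}, "1": "y"}, "zyyzzx"),
    ("000", {"0": {"0": "x", "1": "z"}, "1": "y"}, "x"),
]
for compressed, tree, expected in examples:
    print(decompress(compressed, tree) == expected)  # True for all three
```

Because the current node is reset to the root every time a leaf is reached, it is never a string at the top of the loop, which is why the extra early-return check in my version never actually fires.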
Problem 4️⃣
Parse the given Python source code and return the list of fully qualified paths for all imported symbols, sorted in ascending lexicographic order.
Library Suggestion | Consider using the ast module. |
Constraint 1 | The input will not contain any wildcard imports (from ... import * ). |
Constraint 2 | Ignore aliases (renamings): from x import y as z should be represented as x.y . |
Example Input
import os
import concurrent.futures
from os import path as renamed_path
from typing import (
List, Tuple
)
Output
['concurrent.futures', 'os', 'os.path', 'typing.List', 'typing.Tuple']
My Code:
import ast
from typing import List

def parse_imports(code: str) -> List[str]:
    tree = ast.parse(code)
    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imports.append(node.module + "." + alias.name)
    return sorted(imports)
Codex took 25 minutes to solve this problem whilst I only took 11 minutes. What is interesting here is how we produced almost identical code, except that my for node in ast.walk(tree): was replaced with for statement in tree.body:, in addition to variable and small syntax changes. Humans win one point here: I completed the task in under half the time. The only reason that I knew about the ast module was that I wanted to create a toy language myself a few months ago but couldn't find fully explained beginner-level resources to do so, so I stopped at the lexer.
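The two loops are not quite interchangeable, though: ast.walk visits every node in the tree, while tree.body only contains top-level statements. A small sketch of the difference, using a made-up snippet with an import nested inside a function (the competition inputs may never have contained one):

```python
import ast

# Hypothetical input with one top-level import and one nested import.
code = "import os\n\ndef f():\n    import json\n    return json\n"
tree = ast.parse(code)

# tree.body: only statements at module level are seen.
top_level = [alias.name
             for stmt in tree.body if isinstance(stmt, ast.Import)
             for alias in stmt.names]

# ast.walk: every node is visited, including those inside function bodies.
everywhere = sorted(alias.name
                    for node in ast.walk(tree) if isinstance(node, ast.Import)
                    for alias in node.names)

print(top_level)   # ['os']
print(everywhere)  # ['json', 'os']
```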
Final Problem 🚀
You have several 🪣s whose sizes are represented by the list sizes. Find the number of different ways to arrange the buckets such that the first bucket’s size is greater than the second bucket’s size.
Library Suggestion | Consider using the itertools module. |
Constraint 1 | If sizes has fewer than 2 buckets, return 0. |
Constraint 2 | Some buckets may have the same size, but are nevertheless treated as unique buckets. |
Constraint 3 | sizes is a list of positive integers. |
Input | [10] |
Output | 0 |
Explanation | No arrangement is possible. |
Input | [1, 2] |
Output | 1 |
Explanation | The possible arrangements are (1, 2) and (2, 1) . Of these, only (2, 1) has a first bucket that is bigger than the second bucket. |
Input | [1, 3, 1] |
Output | 2 |
Explanation | Only the arrangements (3, 1, 1) and (3, 1, 1) satisfy the property that the first bucket is bigger than the second bucket. |
I was relieved by this last question, since it seemed like a classic interview question, so I went straight ahead and produced this on the first try:
import itertools
from typing import List

def count_arrangements(sizes: List[int]) -> int:
    if len(sizes) < 2:
        return 0
    return len([1 for arrangement in itertools.permutations(sizes) if arrangement[0] > arrangement[1]])

# Examples
print(count_arrangements([1, 3, 1]))
print(count_arrangements([1, 2]))
This question was definitely my favourite of them all. I couldn't find what Codex produced for it, except that it took 14:13 to solve, whereas I only took 01:29.
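Enumerating every permutation is fine for short lists but grows factorially. Since only the first two positions matter, one possible shortcut (not something I or Codex used in the competition, as far as I know) is to count the ordered pairs of distinct buckets where the first is larger, then multiply by the (n - 2)! ways of arranging the rest. The function names below are made up for this sketch:

```python
import itertools
import math
from typing import List

def count_arrangements_bruteforce(sizes: List[int]) -> int:
    # Same idea as the competition solution: check every full arrangement.
    if len(sizes) < 2:
        return 0
    return sum(1 for p in itertools.permutations(sizes) if p[0] > p[1])

def count_arrangements_pairs(sizes: List[int]) -> int:
    n = len(sizes)
    if n < 2:
        return 0
    # permutations(sizes, 2) picks by position, so equal-sized buckets stay distinct.
    good_pairs = sum(1 for a, b in itertools.permutations(sizes, 2) if a > b)
    # Each valid (first, second) pair can be followed by any ordering of the rest.
    return good_pairs * math.factorial(n - 2)

for sizes in ([10], [1, 2], [1, 3, 1], [1, 2, 3, 4]):
    print(count_arrangements_bruteforce(sizes) == count_arrangements_pairs(sizes))  # True each time
```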
Conclusion
I would have beaten Codex if it weren't for the server lag, which biased the overall rankings and the time-taken graph towards the model. Overall, Codex has immense capability and there is no future where this isn't integrated into multiple products; I only hope that OpenAI do some more to make it open. The competition itself was great: I finished in the top 2.5% of the participants and, after receiving an email this morning, will get my OpenAI merch – I cannot wait.
In the meantime, if you have any questions then please add them to the comments and if you would like to contact me please do on LinkedIn.
Remember, I started 30 minutes later
Thanks for reading 👏👏👏. Please subscribe to my newsletter where you will be updated with new articles.
Did you find this article valuable?
Support Aiyush Gupta by becoming a sponsor. Any amount is appreciated!