Ooshimus.com

What I Learnt From Going Head to Head with Codex

It wasn't easy...

Introduction 🤖

For all future Homo sapiens currently reading this 'blog', it's stardate 99212.13, and by this time AI has probably taken over the world. Your cars and food shopping have probably been replaced by self-flying automobiles and, well... AI doesn't need food to survive, so you probably grow your own sustenance.

Well, not quite. Yesterday I went head-to-head against Codex. Once the servers had finished crashing 💥, and with Codex enjoying a 40-minute head start on me, I joined a competition of over 4,700 developers plus a GPT model that had seen more code than any developer alive. In this blog, I'll discuss the experience and the layout of the competition, then go through each question with an explanation, and wrap up with my overall thoughts.

For those of you who have no idea what I'm talking about, I wrote an article on OpenAI's Codex model a couple of days ago. For a brief overview:

OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories. OpenAI Codex is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript, and even Shell. It has a memory of 14KB for Python code, compared to GPT-3 which has only 4KB—so it can take into account over 3x as much contextual information while performing any task.

[Video: a demonstration of Codex's capabilities]

🏆 My Competition Experience

This was my first competition experience. I'd never completed a hackathon or anything like it, but it felt like a normal everyday coding session, just without Stack Overflow and the great Python Discord community helping me along whenever I got stuck. When the Codex Challenge opened, I was presented with five Python programming puzzles, and each person was ranked by the time taken to complete them. For the majority of players, Codex wasn't just a competitor in the challenge but also a teammate you could ask for help with your code: "Codex works better as a partner than simply trying to complete the whole problem – the latter may take many tries". The faster you finished, the higher you ranked, and the top 500 challengers were to receive an OpenAI T-shirt (insert yay noise here).

I was more nervous than excited when the competition started, since the servers had crashed. Yay, network error. Fast-forward around 30 minutes into the competition, when the server load had decreased after a few tweets from OpenAI:

The Codex Challenge servers are currently overloaded due to demand (Codex itself is fine though!). Team is fixing... please stand by.

— OpenAI (@OpenAI) August 12, 2021

It wasn't easy: out of the 4,750+ developers involved, only 842 challengers (myself included) solved all 5 puzzles 🙌.

My own additional challenge 🤔

Considering this article's title, it was only fair that I took no assistance from the model itself and stuck to a completely individual experience. As you will see below, I hadn't used some of the recommended libraries before, so I extended these rules to allow library documentation if needed (Codex had probably seen it all already anyway). In my walkthrough of the problems below, I won't explain each one in detail, as the problems are short and the code is quite readable; instead, I'll compare and highlight the differences between Codex and me.

Problem 1 😯

Given the contents of a CSV file as csv_contents, return the difference in days between the date of the earliest and the latest entry. The CSV file starts with a header row, which contains at least one column called Date. You are optionally provided with the pandas library if you need it.

Input: "Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870"
Output: 147
Explanation: There are 147 days between 2014-01-27 and 2014-06-23.

I had to read this question a few times before finally tackling it. Fortunately, I had used pandas before, but the most difficult part was parsing the input string into a DataFrame so that I could use pandas on it. At the time, I had no idea how to do this other than writing it to a file in a temporary directory and reading it back. Here was my attempt:

import pandas
import io
import tempfile
import os
from pathlib import Path
import datetime

def diff_days(csv_contents: str) -> int:
    with tempfile.TemporaryDirectory() as tD: # Write to a temporary directory
        fname = Path(tD) / 'randomFile.csv'
        with fname.open('w') as f:
            f.write(csv_contents)
        data = pandas.read_csv(str(fname)) # Read into pandas
        mostLater = max(data['Date']) # Pandas Magic 
        mostEarly = min(data['Date'])

    return (datetime.datetime.fromisoformat(mostLater) - datetime.datetime.fromisoformat(mostEarly)).days

# Examples
print(diff_days("Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870"))
print(diff_days('Date\n2000-01-01\n2000-01-01\n'))

The final return line, the one that subtracts the two fromisoformat results and takes .days, was what took the longest to debug. I knew there was a fromisoformat function in the datetime library, but I had a difficult time accessing it. I could have used my first documentation lookup here.

I finally realised an extra datetime. was needed: with import datetime, the name datetime refers to the module, and fromisoformat lives on the datetime class inside it. Ughh 😩. I also tried using StringIO but kept receiving a Pyodide error. I hadn't used Pyodide myself, but for those interested, I found the competition was using it: "Python with the scientific stack, compiled to WebAssembly."
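
For reference, here is a minimal sketch of the two ways to spell this, using the dates from the problem's example. The first is the module-qualified datetime.datetime.fromisoformat. The second imports the date class directly, which is enough here because the entries have no time component.

```python
import datetime
from datetime import date

# With "import datetime", the name datetime is the module, so the class
# inside it has to be spelled datetime.datetime
later = datetime.datetime.fromisoformat("2014-06-23")
earlier = datetime.datetime.fromisoformat("2014-01-27")
gap_via_module = (later - earlier).days

# Importing the date class directly avoids the double prefix; date is
# sufficient because the CSV entries carry no time of day
gap_via_date = (date.fromisoformat("2014-06-23") - date.fromisoformat("2014-01-27")).days

print(gap_via_module, gap_via_date)  # 147 147
```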

Time: 52:06. I know what you're thinking, but the initial 30 minutes were taken up by the problem trying to load and by clicking the submit button. A fairer estimate would be approximately 24 minutes.

Codex solved this challenge in 22:09, which frankly made me feel like a caveman (insert sad noises here), but I still had four problems left to catch up. For reference, each of the challenges we were presented with looked like this; quite frankly, the team did a stunning job with the UI.

[Screenshot: the Codex Challenge problem interface]

Overall, my method of writing to a temporary directory was extremely inefficient, and Codex managed to complete the problem in fewer lines of code. However, hope wasn't yet lost.

Codex's Output (attempt #8)

import io
from datetime import date

import pandas


def diff_days(csv_contents: str) -> int:
    """
    Codex attempt #8
    """
    df = pandas.read_csv(io.StringIO(csv_contents))
    min_date = date.fromisoformat(df['Date'].min())
    max_date = date.fromisoformat(df['Date'].max())
    return (max_date - min_date).days
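
Since pandas was only optional, the same in-memory idea also works with just the standard library. This is a minimal sketch, not something submitted in the challenge. It assumes every Date value is an ISO-format date. diff_days_stdlib is a hypothetical name, and the sketch is untested in the competition's Pyodide environment.

```python
import csv
import io
from datetime import date


def diff_days_stdlib(csv_contents: str) -> int:
    # Wrap the string in an in-memory file object so csv can read it
    # without writing anything to disk
    reader = csv.DictReader(io.StringIO(csv_contents))
    # Assumes each Date value looks like 2014-01-27
    dates = [date.fromisoformat(row["Date"]) for row in reader]
    return (max(dates) - min(dates)).days


print(diff_days_stdlib("Date,Price,Volume\n2014-01-27,550.50,1387\n2014-06-23,910.83,4361\n2014-05-20,604.51,5870"))  # 147
print(diff_days_stdlib("Date\n2000-01-01\n2000-01-01\n"))  # 0
```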

Problem 2️⃣

Overall, this one was not as difficult on first look, and having tackled a similar school problem in the past, I had already used difflib. The problem: source and target are two strings, each containing file contents. Return the number of lines you would need to insert and the number of lines you would need to delete to get to target from source.

Return your answer as a tuple (inserted, deleted).

Input: source = "Apple\nOrange\nPear", target = "Apple\nPear\nBanana\nMango"
Output: (2, 1)
Library consideration: Consider using the difflib module.

My code:

import difflib
from typing import Tuple


def diff_files(source: str, target: str) -> Tuple[int, int]:
    source_lines = source.splitlines()
    target_lines = target.splitlines()
    diff = difflib.ndiff(source_lines, target_lines)
    insertion, deletion = 0, 0
    for line in diff:
        if line.startswith('+'):
            insertion += 1
        elif line.startswith('-'):
            deletion += 1
    return insertion, deletion

# Examples
print(diff_files('Apple\nOrange\nPear', 'Apple\nPear\nBanana\nMango'))
print(diff_files('Apple\nOrange\nPear', 'Apple\nPear\nBanana'))

If you look closely, the examples were automatically appended to the bottom of the file, with the necessary imports and the function signature prepended at the top. This is what gave the model its context for people solving the problems with Codex. You'll also notice that Codex produced something similar to my solution, just slightly more slowly: I finished in 07:13, while Codex lagged behind at 07:22. I was back on top! What is also interesting is how Codex chose relevant variable names, ultimately leading to more understandable code.

Codex's Response

import difflib
from typing import Tuple


def diff_files(source: str, target: str) -> Tuple[int, int]:
    """
    Codex attempt #3
    """
    source_lines = source.splitlines()
    target_lines = target.splitlines()
    d = difflib.Differ()
    diff = d.compare(source_lines, target_lines)
    inserted = 0
    deleted = 0
    for line in diff:
        if line.startswith('+'):
            inserted += 1
        elif line.startswith('-'):
            deleted += 1
    return inserted, deleted

The main difference between my solution and Codex's was my use of difflib.ndiff(source_lines, target_lines) versus its diff = d.compare(source_lines, target_lines). ndiff is a convenience function that builds a Differ and calls its compare method, so this is essentially the same approach. Other than that and a few variable names, we were almost identical, which was awe-inspiring and frightening simultaneously.
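
To see why counting lines by their first character works, here is a small sketch that prints the raw delta for the example input. Each delta line starts with a two-character code. '  ' marks a line in both, '- ' a line only in source, '+ ' a line only in target, and '? ' an intraline hint, which neither solution counts.

```python
import difflib

source = "Apple\nOrange\nPear"
target = "Apple\nPear\nBanana\nMango"

# ndiff returns a Differ-style delta, one prefixed string per line
diff_lines = list(difflib.ndiff(source.splitlines(), target.splitlines()))
for line in diff_lines:
    print(repr(line))

# Count only "+ " and "- " lines; unchanged "  " and hint "? " lines are skipped
inserted = sum(1 for line in diff_lines if line.startswith("+ "))
deleted = sum(1 for line in diff_lines if line.startswith("- "))
print((inserted, deleted))  # (2, 1)
```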

Problem 3️⃣

Here, the problems seemed to get harder. This felt like a fairly familiar tree traversal problem, although the compressed binary encoding made it harder to reason about by hand. It also seemed like a cherry-picked problem that OpenAI knew Codex would be able to solve; this is all conjecture, of course, but its highly algorithmic nature felt easily manageable for Codex.

Given a compressed message compressed and the prefix code tree used to encode it, decode and return the original message. tree is a nested dictionary with characters at the leaves. For each character in compressed, traverse down one step of the tree. Once you reach a leaf of the tree, the character at that leaf is appended to the decoded message. If the compressed message could not be decoded in full because the last step did not end at a leaf, return what was decoded so far.

Input: compressed = "110100100", tree = {"0": "a", "1": {"0": "n", "1": "b"}}
Output: "banana"

Input: compressed = "0111010100", tree = {"0": {"0": "x", "1": "z"}, "1": "y"}
Output: "zyyzzx"

Input: compressed = "000", tree = {"0": {"0": "x", "1": "z"}, "1": "y"}
Output: "x"

Constraints: The keys of tree and its subdictionaries are always '0' or '1', and the leaf values are always single characters.

My code:

from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]


def decompress(compressed: str, tree: Tree) -> str:

    result = ""
    node = tree
    for bit in compressed:
        if isinstance(node, str):
            return result
        node = node[bit]
        if isinstance(node, str):
            result += node
            node = tree


    return result

This took me 20:40 (m:s), whereas Codex took 19:24. Although Codex solved it, this seems like a great time to highlight a problem with its commercial viability: it took 19 attempts to finish the problem. The current pricing of GPT-3 already makes it hard (in my opinion) for young companies and start-ups to access. Now imagine the same scenario here: the sheer volume of tokens, multiplied by the time taken to solve each problem, multiplied by the average number of attempts, multiplied by the number of problems. It seems like the costs would stack up fairly quickly, unless the model is open-sourced. This is another reason why you should take a look at this article, which presents an argument for open-source AI versus AI 'strangle-held' by only the largest companies (not mentioning any names).

Codex’s Response:

from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]


def decompress(compressed: str, tree: Tree) -> str:
    """
    Codex attempt #44
    """
    current = tree
    decoded = ''
    for c in compressed:
        current = current[c]
        if isinstance(current, str):
            decoded += current
            current = tree
    return decoded
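
A prefix code gives every character a unique root-to-leaf path, so the same tree can be used in the opposite direction. That is handy for generating extra test cases for decompress. This is a minimal sketch; code_table and compress are hypothetical helpers, not part of the challenge.

```python
from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]


def code_table(tree: Tree, prefix: str = "") -> Dict[str, str]:
    """Walk the prefix-code tree and map each leaf character to its bit path."""
    table: Dict[str, str] = {}
    for bit, child in tree.items():
        if isinstance(child, str):
            # Reached a leaf: the path so far is this character's code
            table[child] = prefix + bit
        else:
            # Internal node: keep descending with one more bit in the path
            table.update(code_table(child, prefix + bit))
    return table


def compress(message: str, tree: Tree) -> str:
    """Encode a message by concatenating each character's code."""
    table = code_table(tree)
    return "".join(table[ch] for ch in message)


tree = {"0": "a", "1": {"0": "n", "1": "b"}}
print(code_table(tree))  # {'a': '0', 'n': '10', 'b': '11'}
print(compress("banana", tree))  # 110100100
```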

Problem 4️⃣

Parse the given Python source code and return the list of fully-qualified paths for all imported symbols, sorted in ascending lexicographic order.

Library suggestion: Consider using the ast module.
Constraint 1: The input will not contain any wildcard imports (from ... import *).
Constraint 2: Ignore aliases (renamings): from x import y as z should be represented as x.y.

Example Input

import os
import concurrent.futures
from os import path as renamed_path
from typing import (
    List, Tuple
)

Output: ['concurrent.futures', 'os', 'os.path', 'typing.List', 'typing.Tuple']

My Code:

import ast
from typing import List


def parse_imports(code: str) -> List[str]:
    tree = ast.parse(code)
    imports = []

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imports.append(node.module + "." + alias.name)
    return sorted(imports)

Codex took 25 minutes to solve this problem, whilst I took only 11 minutes. What is interesting here is how we produced almost identical code, except that my for node in ast.walk(tree): was replaced with for statement in tree.body:, plus some variable names and small syntax changes. Humans win a point here: I completed the task in under half the time. The only reason I knew about the ast module was that I wanted to create a toy language a few months ago, but I could never find fully explained beginner-level resources, so I stopped at the lexer.
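
Those two loops are not quite equivalent. ast.walk visits every node in the tree, however deeply nested, while tree.body holds only the module's top-level statements. They agree on the challenge's example only because every import there is at module level. The sample source string below is invented for illustration.

```python
import ast

code = '''
import os

def load():
    import json  # an import nested inside a function body
    return json
'''

tree = ast.parse(code)

# ast.walk yields every node in the tree, including the nested import
walk_imports = [n for n in ast.walk(tree) if isinstance(n, (ast.Import, ast.ImportFrom))]

# tree.body only contains top-level statements: the import and the def
body_imports = [n for n in tree.body if isinstance(n, (ast.Import, ast.ImportFrom))]

print(len(walk_imports), len(body_imports))  # 2 1
```

Similarly, a relative import such as from . import x produces an ast.ImportFrom whose module attribute is None, so node.module + "." + alias.name would raise a TypeError. Presumably the challenge inputs never included one.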

Final Problem 🚀

You have several 🪣s whose sizes are represented by the list sizes. Find the number of different ways to arrange the buckets such that the first bucket’s size is greater than the second bucket’s size.

Library suggestion: Consider using the itertools module.
Constraint 1: If sizes has fewer than 2 buckets, return 0.
Constraint 2: Some buckets may have the same size, but are nevertheless treated as unique buckets.
Constraint 3: sizes is a list of positive integers.

Input: [10]
Output: 0
Explanation: No arrangement is possible.

Input: [1, 2]
Output: 1
Explanation: The possible arrangements are (1, 2) and (2, 1). Of these, only (2, 1) has a first bucket that is bigger than the second bucket.

Input: [1, 3, 1]
Output: 2
Explanation: Only the arrangements (3, 1, 1) and (3, 1, 1) satisfy the property that the first bucket is bigger than the second bucket (the two size-1 buckets are distinct, so swapping them counts as a different arrangement).

I was relieved by this last question, since it seemed like a classic interview question, so I went straight ahead and produced this on my first try:

import itertools
from typing import List


def count_arrangements(sizes: List[int]) -> int:
    if len(sizes)<2:
        return 0
    return len([1 for arrangement in itertools.permutations(sizes) if arrangement[0] > arrangement[1]])

# Examples
print(count_arrangements([1, 3, 1]))
print(count_arrangements([1, 2]))

This question was definitely my favourite of them all. I couldn't find what Codex produced for it, only that it took 14:13 to solve, whereas I took just 01:29.
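
Generating every permutation is fine for inputs this small, but the number of permutations grows factorially with the number of buckets. Here is a hedged counting alternative, not my submission and not known to be Codex's. Count the ordered pairs of distinct buckets where the first is strictly bigger than the second. Then multiply by (n - 2)!, the number of ways to order the remaining buckets. count_arrangements_fast is a hypothetical name.

```python
import bisect
import math
from typing import List


def count_arrangements_fast(sizes: List[int]) -> int:
    n = len(sizes)
    if n < 2:
        return 0
    # For each bucket placed first, count the buckets strictly smaller than it;
    # bisect_left on the sorted sizes returns exactly that count
    ordered = sorted(sizes)
    pairs = sum(bisect.bisect_left(ordered, s) for s in sizes)
    # Each valid (first, second) pair can be followed by any ordering
    # of the remaining n - 2 buckets
    return pairs * math.factorial(n - 2)


print(count_arrangements_fast([1, 3, 1]))  # 2
print(count_arrangements_fast([1, 2]))  # 1
```

Equal sizes never form a valid pair because bisect_left only counts strictly smaller values, which matches the rule that the first bucket must be bigger.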

Conclusion

I would have beaten Codex if it weren't for the lag at the start, which biased the overall rankings and the time-taken graph towards the model. Codex has immense capability, and I can't see a future where it isn't integrated into multiple products; I only hope that OpenAI does more to make it open. Overall, the competition was great: I finished in the top 2.5% of participants and, after receiving an email this morning, will be getting my OpenAI merch. I cannot wait.

In the meantime, if you have any questions, please add them to the comments, and if you would like to contact me, please do so on LinkedIn.

[Screenshot (13 August 2021): remember, I started 30 minutes later]

Thanks for reading 👏👏👏! Please subscribe to my newsletter to be updated when new articles are published.

 