GitHub Copilot is like a junior developer

A report on declining code quality since the introduction of AI coding tools was released this week (it can be found here). GitHub advertises Copilot with numbers like +55% faster coding, +46% code written, +75% more fulfilled, 20,000 organisations using the tool, and 92% of US-based developers in large companies using AI.

Some people started to wonder if those numbers were a good sign. It’s certainly positive to be able to write more and faster, but nobody seems to wonder about what is being written.

“Churned code” is defined as code that was uploaded and substantially changed within two weeks. Compared to the 2020 baseline, before the introduction of Copilot, churned code more than doubled, moving from 3.3% to 7.1%. That goes hand in hand with the growth of repeated code and a decreasing average code age. To me, that looks like what happens in a team when a new, underperforming member joins.

I have tried Copilot, and I sometimes play with a local Codellama for fun. AI is truly impressive: it can implement common functions and explain code clearly. However, it makes mistakes, and its level is comparable only to that of a junior eager to prove himself.

Let me illustrate this with a practical example. I’ll run the experiment with Codellama 7B with 4-bit quantisation. It is not the smartest cookie, but I will only give it trivial tasks, so it seems fair.

Prompt: “Generate a function in Python to return the first 10 numbers of the Fibonacci sequence.”

Response:

Here is a function in Python to return the first 10 numbers of the Fibonacci sequence:

def fibonacci(n):
  if n <= 1:
    return n
  else:
    return fibonacci(n-1) + fibonacci(n-2)
print(fibonacci(10)) # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

This function uses a recursive approach to calculate the Fibonacci sequence. It takes an integer `n` as input and returns a list of the first `n` numbers in the sequence. The base case is when `n` is less than or equal to 1, in which case we return the number itself. Otherwise, we calculate the previous two numbers in the sequence (using `fibonacci(n-1)` and `fibonacci(n-2)`) and add them together to get the current number in the sequence.

Note that this function will only work for small values of `n` (up to around 45), as the sequence grows very quickly and would require an enormous amount of memory to store all of its numbers.

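It all looks great; however, it’s not 100% correct. When executed, that code prints “55” and not “[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]” as stated in the comment. The explanation repeats the mistake: it says the function returns a list of the first `n` numbers, while it actually returns a single number.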
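For comparison, here is a minimal sketch of what the prompt actually asked for. The helper name `first_fibonacci` and the iterative approach are my own choices for illustration, not something the model produced; the `fibonacci` function is the model's code, unchanged apart from indentation.

```python
def fibonacci(n):
    # The model's function: returns the n-th Fibonacci number, not a list.
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)


def first_fibonacci(count):
    # Hypothetical helper: builds the list the prompt asked for, iteratively.
    numbers = []
    a, b = 0, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers


print(fibonacci(10))        # prints 55
print(first_fibonacci(10))  # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The list in the model's comment is what the second function prints, which suggests the model knew the expected answer but wrote code that computes something else.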

Another experiment. Prompt: “Explain the following python code: def fibonacci(n):\n if n <= 1:\n return n\n else:\n return fibonacci(n-1) + fibonacci(n-2)\n\nprint(fibonacci(10))”. (It’s the function generated in the previous step).

Response:

This code defines a function called `fibonacci` that takes an integer parameter `n`. The function returns the `n`-th number in the Fibonacci sequence, which is a series of numbers in which each number is the sum of the two preceding numbers (0 and 1, then 1 and 2, then 2 and 3, and so on).

Here's how the code works:

* The function starts by checking if `n` is less than or equal to 1. If it is, it returns `n`, because in this case the `fibonacci` sequence consists only of the number 0 (for `n = 0`) and the number 1 (for `n = 1`).

* If `n` is greater than 1, the function recursively calls itself with two arguments: `n-1` and `n-2`. These arguments are used to compute the previous two numbers in the Fibonacci sequence.

* The function then returns the sum of these two numbers. This is because each number in the Fibonacci sequence is the sum of the two preceding numbers.

For example, if we call the function with `n = 10`, it will return the 10th number in the Fibonacci sequence, which is equal to 55.

The `print` statement at the end of the code simply prints the result of calling the `fibonacci` function with `n = 10`.

Now, I’ll use the same prompt but I’ll rename the function because I suspect that the AI simply recognised “fibonacci”.

Prompt: “Explain the following python code: def add(n):\n if n <= 1:\n return n\n else:\n return add(n-1) + add(n-2)\n\nprint(add(10))”

Response:

This code defines a function called `add` that takes an integer parameter `n`. The function has two possible return values:

* If `n` is less than or equal to 1, the function returns `n` (i.e., it does not call itself recursively).

* If `n` is greater than 1, the function calls itself twice with arguments `n-1` and `n-2`, respectively, and then adds their return values together.

The function's base case is when `n` is less than or equal to 1, in which case it simply returns the value of `n`. The recursive case is when `n` is greater than 1, in which case it calls itself twice with arguments `n-1` and `n-2`, respectively. The function then adds the return values of these two recursive calls together to get the final result.

For example, if we call the function with the argument `10`, it will first call itself with arguments `9` and `8`, respectively, and add their return values together. Therefore, the function will return `16` when called with the argument `10`

Definitely incorrect: the code is the same as before, only renamed. Where does that 16 come from?
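Running the renamed function settles the question. The sketch below is just the code from the prompt, executed as-is; the extra `print` of `add(9)` and `add(8)` is my addition, to show the two recursive calls the model mentioned.

```python
def add(n):
    # Same body as the earlier fibonacci function; only the name differs.
    if n <= 1:
        return n
    else:
        return add(n-1) + add(n-2)


# The two recursive calls described in the model's explanation.
print(add(9), add(8))  # prints 34 21
print(add(10))         # prints 55
```

So the two calls return 34 and 21, and their sum is 55. No step of the computation produces 16.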

I repeated the experiment with obfuscated code, and the AI performs about as well as in the examples above: it gives an explanation that is close to correct, but not quite. That is certainly better than nothing in the case of obfuscated code, but for regular code it cannot be trusted.

I am sure that AI will beat humans at coding eventually, but today it’s like pair programming with a junior whose only expertise comes from tutorials. I will stick to classic rule-based tools like SonarQube for a bit longer.