Does the fact that ChatGPT can simply generate any algorithm mean that algorithm creation won’t be a problem for coders any longer? Can a coder suffer any negative effects from using ChatGPT?

ChatGPT makes it easy to reproduce lightly modified boilerplate that is occasionally wrong in subtle (or perhaps not so subtle) ways.

I asked ChatGPT how to read a line of arbitrary length from a stream in C. It generated the following:

#include <stdio.h>
#include <stdlib.h>

char *read_line(FILE *stream) {
    char *line = NULL;
    int line_size = 0;
    int line_capacity = 0;
    int c;

    while ((c = fgetc(stream)) != EOF && c != '\n') {
        if (line_size >= line_capacity) {
            line_capacity = line_capacity == 0 ? 64 : line_capacity * 2;
            char *new_line = realloc(line, line_capacity);
            if (new_line == NULL) {
                free(line);
                return NULL;
            }
            line = new_line;
        }
        line[line_size++] = (char)c;
    }

    if (line_size > 0 || c == '\n') {
        if (line_size >= line_capacity) {
            line_capacity = line_capacity == 0 ? 64 : line_capacity * 2;
            char *new_line = realloc(line, line_capacity);
            if (new_line == NULL) {
                free(line);
                return NULL;
            }
            line = new_line;
        }
        line[line_size++] = '\0';
    }

    return line;
}

Here’s how the code works:

line is a pointer to the dynamically allocated memory where the line will be stored.
line_size is the number of characters in the line.
line_capacity is the amount of memory currently allocated.
The loop reads characters from the stream one at a time until it reaches end of file or a newline character.
If the current line size is greater than or equal to the capacity, the capacity is doubled and the memory is reallocated.
The character is appended to the line, and line_size is incremented.
When the end of the line or stream is reached and the line is non-empty (or a newline was read), the null terminator is appended to the end of the line.
The function returns a pointer to the line.
If an error occurs, the function returns NULL instead of a pointer to the line.

To use this function, you can pass a pointer to the stream you want to read from as an argument:

int main() {
    char *line = read_line(stdin);
    if (line == NULL) {
        fprintf(stderr, "Error: could not read line\n");
        exit(EXIT_FAILURE);
    }
    printf("Read line: %s\n", line);
    free(line);
    return 0;
}

In this example, read_line is called with the argument stdin to read a line from the standard input stream. The line is printed to the console, and the dynamically allocated memory is released with free once it is no longer needed.

It appears to be sensible. Not fantastic, but also not awful.

The geometric growth code has a potential signed integer overflow, because line_capacity is an int that keeps doubling. And reading one character at a time isn’t exactly the height of efficiency, is it?

I asked it if it could use fgets() to write a more efficient version, and it complied. One step forward, two steps back, though:
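One way to take the signed overflow off the table is to track the capacity as size_t and refuse to double once doubling would wrap. The sketch below is only an illustration: next_capacity is a hypothetical helper name, and the starting size of 64 is borrowed from the generated code above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the next buffer capacity under geometric
 * growth, or 0 if doubling would overflow size_t. */
size_t next_capacity(size_t capacity) {
    if (capacity == 0) {
        return 64;              /* initial allocation, as in the code above */
    }
    if (capacity > SIZE_MAX / 2) {
        return 0;               /* doubling would wrap around; caller should fail */
    }
    return capacity * 2;
}
```

A caller would treat a return value of 0 as an allocation failure instead of passing a wrapped-around size to realloc().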

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *read_line(FILE *stream) {
    char *line = NULL;
    size_t line_capacity = 0;
    ssize_t line_size = 0;

    do {
        line_capacity += BUFSIZ;
        char *new_line = realloc(line, line_capacity);
        if (new_line == NULL) {
            free(line);
            return NULL;
        }
        line = new_line;
        if (fgets(line + line_size, (int)(line_capacity - line_size), stream) == NULL) {
            break;
        }
        line_size = strlen(line);
    } while (line[line_size - 1] != '\n');

    if (line_size == 0) {
        free(line);
        return NULL;
    }

    line[line_size - 1] = '\0';

    char *new_line = realloc(line, line_size);
    if (new_line == NULL) {
        free(line);
        return NULL;
    }
    line = new_line;

    return line;
}

A few faults:

Our nice geometric growth allocation is gone. We now have linear growth, which is O(n^2).
- But we’ve upgraded to size_t, so any overflow there will be unsigned rather than UB-inducing signed overflow.
To work out how much text fgets() read, it remeasures the entire string. Again, that’s O(n^2).
It weirdly keeps the string length in the non-standard ssize_t. strlen() returns size_t.
Why are we realloc’ing again at the end after we’ve successfully read the string? Just return it with a bit of extra pad at the end and avoid an extra trip through the memory allocator, an additional opportunity to fail allocation, and an extra potential memcpy().

Since we are now dividing n by BUFSIZ, you won’t actually feel this until someone puts a line roughly a megabyte long in your file.

But isn’t that exactly what DoS attacks are made of?
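For comparison, here is a minimal sketch of an fgets()-based reader that avoids the faults listed above: it grows the buffer geometrically, measures only the newly read chunk, keeps sizes in size_t, and skips the final shrinking realloc(). The name read_line_geometric and the initial capacity of 128 are my own choices for illustration, and the code assumes the input contains no embedded NUL bytes.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Returns a malloc'd line without its trailing newline,
 * or NULL on end of file with no data or on allocation failure.
 * The caller frees the result. */
char *read_line_geometric(FILE *stream) {
    char *line = NULL;
    size_t capacity = 0;
    size_t size = 0;

    for (;;) {
        /* Keep at least two free bytes: one character plus the terminator. */
        if (capacity - size < 2) {
            if (capacity > SIZE_MAX / 2) {       /* doubling would overflow */
                free(line);
                return NULL;
            }
            size_t new_capacity = capacity == 0 ? 128 : capacity * 2;
            char *new_line = realloc(line, new_capacity);
            if (new_line == NULL) {
                free(line);
                return NULL;
            }
            line = new_line;
            capacity = new_capacity;
        }

        /* fgets() takes an int, so clamp instead of casting blindly. */
        size_t room = capacity - size;
        int chunk = room > INT_MAX ? INT_MAX : (int)room;
        if (fgets(line + size, chunk, stream) == NULL) {
            break;                               /* end of file or read error */
        }

        size += strlen(line + size);             /* measure only the new chunk */
        if (size > 0 && line[size - 1] == '\n') {
            line[size - 1] = '\0';               /* strip the newline */
            return line;
        }
    }

    if (size == 0) {                             /* nothing was read at all */
        free(line);
        return NULL;
    }
    line[size] = '\0';                           /* last line had no newline */
    return line;
}
```

Because each chunk is measured once and the capacity doubles, the total work stays linear in the line length. An empty line comes back as an empty string, which keeps it distinct from the NULL returned at end of file.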

With either of these, you must free() the buffer when you’re done with it. Over time, that can fragment your heap. More often, what you really want is to reuse an input buffer. If you don’t need to keep copies, you can reuse the previous buffer rather than making a new allocation for each line.

If you ask ChatGPT to do that, it complies:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_LINE_CAPACITY 128

char *read_line(FILE *stream, char *line, size_t *line_capacity) {
    ssize_t line_size = 0;

    do {
        if (fgets(line + line_size, (int)(*line_capacity - line_size), stream) == NULL) {
            break;
        }
        line_size = strlen(line);
        if (line[line_size - 1] == '\n') {
            break;
        }
        *line_capacity *= 2;
        char *new_line = realloc(line, *line_capacity);
        if (new_line == NULL) {
            free(line);
            return NULL;
        }
        line = new_line;
    } while (1);

    if (line_size == 0) {
        return NULL;
    }

    line[line_size - 1] = '\0';

    char *new_line = realloc(line, line_size);
    if (new_line == NULL) {
        free(line);
        return NULL;
    }
    line = new_line;

    return line;
}

And it offers the following example code:

int main() {
    char *line = malloc(INITIAL_LINE_CAPACITY);
    size_t line_capacity = INITIAL_LINE_CAPACITY;

    while (read_line(stdin, line, &line_capacity) != NULL) {
        printf("Read line: %s\n", line);
    }

    free(line);
    return 0;
}

Once again, it’s a step forward and a step back.

We’re back to geometric growth, which is a positive. But that now puts this expression at risk of integer overflow: (int)(*line_capacity - line_size)
The argument line is passed by value while its companion line_capacity is passed by pointer.
- That requires the user to reassign line on return because realloc() can move the buffer.
- This introduces a significant opportunity to “hold it wrong” and fail to update line.
- In fact, ChatGPT itself “holds it wrong” in its own demo code.

while (read_line(stdin, line, &line_capacity) != NULL) {

If realloc() fails, ChatGPT frees the buffer but does not zero out the line capacity.

    *line_capacity *= 2;
    char *new_line = realloc(line, *line_capacity);
    if (new_line == NULL) {
        free(line);
        return NULL;
    }

Previous versions would bootstrap themselves from an empty allocation and zero capacity.
- This one doesn’t, requiring the caller to do that, introducing another opportunity to “hold it wrong.”
- ChatGPT “holds it wrong” subtly, failing to check that the initial malloc() succeeded.
- The read_line() code itself does not check whether the incoming buffer has any room or even a non-NULL pointer. That would come naturally, though, if it could bootstrap from a NULL pointer.
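To make those points concrete, here is one possible shape for a reusable-buffer interface, loosely in the spirit of POSIX getline(). It is a sketch under my own assumptions, not a drop-in fix: the name read_line_reuse and its return convention (1 for a line, 0 for end of file, -1 for allocation failure) are hypothetical. Both the buffer and its capacity are passed by pointer, the function bootstraps from a NULL buffer, and on failure the caller still owns a valid buffer.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API. *line may start as NULL (capacity is then treated as 0).
 * Returns 1 if a line was read, 0 on end of file with no data, and -1 on
 * allocation failure. The buffer always remains owned by the caller. */
int read_line_reuse(FILE *stream, char **line, size_t *capacity) {
    size_t size = 0;

    if (*line == NULL) {
        *capacity = 0;                    /* bootstrap from an empty buffer */
    }

    for (;;) {
        if (*capacity - size < 2) {       /* need room for a char plus NUL */
            if (*capacity > SIZE_MAX / 2) {
                return -1;                /* doubling would overflow */
            }
            size_t new_capacity = *capacity == 0 ? 128 : *capacity * 2;
            char *new_line = realloc(*line, new_capacity);
            if (new_line == NULL) {
                return -1;                /* old buffer is untouched and valid */
            }
            *line = new_line;             /* caller sees the moved buffer */
            *capacity = new_capacity;
        }

        size_t room = *capacity - size;
        int chunk = room > INT_MAX ? INT_MAX : (int)room;
        if (fgets(*line + size, chunk, stream) == NULL) {
            (*line)[size] = '\0';
            return size > 0 ? 1 : 0;      /* partial last line, or plain EOF */
        }

        size += strlen(*line + size);     /* measure only the new chunk */
        if (size > 0 && (*line)[size - 1] == '\n') {
            (*line)[size - 1] = '\0';     /* strip the newline */
            return 1;
        }
    }
}
```

A caller would start with char *line = NULL and size_t capacity = 0, loop while read_line_reuse(stdin, &line, &capacity) returns 1, and call free(line) once at the end. Because the function updates *line itself, there is no return value to forget to reassign.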

This really does resemble walking a recent college graduate through their first code review on the job. That kind of review takes a week or two of back and forth, because each new version fixes one issue while breaking another. Alternatively, a mentor could sit down and pair-program with the novice for a while.

I identified a few bugs in ChatGPT’s code. How many would you have managed to catch?

What percentage of those would lazy programmers who rely on ChatGPT and other code generators manage to catch?

I compared it to Sideshow Bob in a previous answer about ChatGPT, and that’s how I feel right now.


Published by Ithakuranimesh