One of the main differences between C and Java is the capability of C to perform low-level memory operations. This allows for great power and flexibility but is also a constant source of bugs. In that sense, Java is considered a safer programming language.
This post presents some examples of problems in C that would not happen with Java.
The following examples have been compiled and run in a Linux Virtual Machine:
    fjab@fjab-VirtualBox:~/newc$ uname -sr
    Linux 5.8.0-44-generic
    fjab@fjab-VirtualBox:~/newc$ gcc --version
    gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Variable overwriting in memory
The following snippet defines two global variables, var1 and var2:

    int var1 = 1;
    int var2 = 2;

    int main(int argc, char const *argv[])
    {
        return 0;
    }
Initialised global variables are stored in the data segment. On this little-endian machine, the 4-byte values of var1 and var2 are laid out in memory as the byte sequences 01 00 00 00 and 02 00 00 00, respectively. As the dump below shows, the two variables are stored at the consecutive memory addresses 0x4010 and 0x4014:
    fjab@fjab-VirtualBox:~/newc$ gcc overwrite_var.c -o overwrite_var
    fjab@fjab-VirtualBox:~/newc$ objdump -s -j .data overwrite_var

    overwrite_var:     file format elf64-x86-64

    Contents of section .data:
     4000 00000000 00000000 08400000 00000000  .........@......
     4010 01000000 02000000                    ........
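With this knowledge, it’s easy to get a pointer to var1 and use it to overwrite the content of the memory address corresponding to var2. Strictly speaking, writing through a pointer that has been moved past the end of var1 is undefined behaviour in C; it only works here because of how this particular build lays out the two variables: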
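    #include <stdio.h>

    int var1 = 1;
    int var2 = 2;

    int main(int argc, char const *argv[])
    {
        printf("var1=%d\n", var1);
        printf("var2=%d\n", var2);

        int *ptr = &var1;
        /* Advance the pointer by one int (4 bytes) and write there.
           In this build that address happens to be var2. */
        *(ptr += 1) = 3;

        printf("var2=%d\n", var2);
        return 0;
    }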
    fjab@fjab-VirtualBox:~/newc$ ./overwrite_var
    var1=1
    var2=2
    var2=3
Big variables in the stack
By default, local variables in C are allocated in the stack segment. The following example defines a local array of 3 million integers:
    #include <stdio.h>

    #define SIZE 3000000

    int main(int argc, char const *argv[])
    {
        int arr[SIZE] = {1};
        long sum = 0;

        for (size_t i = 0; i < SIZE; i++) {
            sum += arr[i];
        }

        printf("sum=%ld\n", sum);
        return 0;
    }
When the program runs, a segmentation fault occurs because the stack is too small to hold the array: 3,000,000 × 4 bytes = 12,000,000 bytes (about 11,719 KB), which exceeds the 8,192 KB stack limit reported by ulimit -s:
    fjab@fjab-VirtualBox:~/newc$ gcc big_stack.c -o big_stack
    fjab@fjab-VirtualBox:~/newc$ ./big_stack
    Segmentation fault (core dumped)
    fjab@fjab-VirtualBox:~/newc$ ulimit -s
    8192
This stack overflow would not happen with Java, as all objects, arrays included, are stored in the heap.
Big variables, big files
So what if the array is defined as a global variable instead?
    #include <stdio.h>

    #define SIZE 3000000

    int arr[SIZE] = {1};

    int main(int argc, char const *argv[])
    {
        long sum = 0;

        for (size_t i = 0; i < SIZE; i++) {
            sum += arr[i];
        }

        printf("sum=%ld\n", sum);
        return 0;
    }
This time the application runs successfully, but something else happens: the size of the executable has grown to 12 MB! The reason is that the content of the data segment is stored in the executable file:
    fjab@fjab-VirtualBox:~/newc$ gcc big_data.c -o big_data
    fjab@fjab-VirtualBox:~/newc$ ./big_data
    sum=1
    fjab@fjab-VirtualBox:~/newc$ ll big_data
    -rwxrwxr-x 1 fjab fjab 12016744 Mar 25 22:42 big_data*
    fjab@fjab-VirtualBox:~/newc$ size big_data
       text     data      bss      dec      hex  filename
       1644 12000616        8 12002268   b723dc  big_data
This would not happen with Java either.
More big variables
We can try something else: declaring the array as a global variable without initialisation.
    #include <stdio.h>
    #include <unistd.h>

    #define SIZE 3000000
    int arr[SIZE];

    int main(int argc, char const *argv[])
    {
        pid_t pid = getpid();
        printf("pid=%d\n", pid);

        long sum = 0;
        arr[0] = 1;

        for (size_t i = 0; i < SIZE; i++) {
            sum += arr[i];
        }

        printf("sum=%ld\n", sum);
        return 0;
    }
In this case, the variable is stored in the bss segment, and the size of the file does not increase. The executable does not need to hold the uninitialised data itself; it only records how much space is required.
    fjab@fjab-VirtualBox:~/newc$ gcc big_bss.c -o big_bss
    fjab@fjab-VirtualBox:~/newc$ ./big_bss
    sum=1
    fjab@fjab-VirtualBox:~/newc$ ll big_bss
    -rwxrwxr-x 1 fjab fjab 16728 Mar 25 22:51 big_bss*
    fjab@fjab-VirtualBox:~/newc$ size big_bss
       text     data      bss      dec      hex  filename
       1660      600 12000032 12002292   b723f4  big_bss
In theory, the memory for the array only materialises when the program runs. To test this, let’s run the program under a debugger (gdb), stop it at a breakpoint and examine the memory of the process:
    fjab@fjab-VirtualBox:~/newc$ gcc -g big_bss.c -o big_bss
    fjab@fjab-VirtualBox:~/newc$ gdb big_bss
    GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.

    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from big_bss...
    (gdb) b main
    Breakpoint 1 at 0x1169: file big_bss.c, line 8.
    (gdb) r
    Starting program: /home/fjab/newc/big_bss

    Breakpoint 1, main (argc=21845, argv=0x7ffff7fb4fc8 <__exit_funcs_lock>) at big_bss.c:8
    8       {
    (gdb) n
    9           pid_t pid = getpid();
    (gdb)
    10          printf("pid=%d\n", pid);
    (gdb)
    pid=7032
    12          long sum = 0;
    (gdb)
The program prints out the pid of the process, in this case 7032. Passing it to pmap gives the total amount of memory mapped by the process, about 14 MB, which is in line with the expected 12 MB corresponding to the array.
    fjab@fjab-VirtualBox:~/newc$ sudo pmap 7032 | tail -n 1
     total            14216K