29 November, 2014

GCC (GNU Compiler Collection)

A Brief History and Introduction to GCC

The original GNU C Compiler (GCC) was developed by Richard Stallman, the founder of the GNU Project. Richard Stallman founded the GNU Project in 1984 to create a complete Unix-like operating system as free software, to promote freedom and cooperation among computer users and programmers.
GCC, formerly short for "GNU C Compiler", has grown over time to support many languages such as C++, Objective-C, Java, Fortran and Ada. It is now referred to as the "GNU Compiler Collection". The mother site for GCC is http://gcc.gnu.org/.
GCC is a key component of "GNU Toolchain", for developing applications, as well as operating systems. The GNU Toolchain includes:
  1. GNU Compiler Collection (GCC): a compiler suite that supports many languages, such as C/C++, Objective-C and Java.
  2. GNU Make: an automation tool for compiling and building applications.
  3. GNU Binutils: a suite of binary utility tools, including the linker and assembler.
  4. GNU Debugger (GDB).
  5. GNU Autotools: A build system including Autoconf, Autoheader, Automake and Libtool.
  6. GNU Bison: a parser generator (similar to yacc).
GCC is portable and runs on many platforms. GCC (and the GNU Toolchain) is currently available on all Unixes. It has also been ported to Windows by MinGW and Cygwin. GCC can also act as a cross-compiler, producing executables for a platform other than the one it runs on.
The various GCC versions are:
  • In 1987, the first version of GCC was released.
  • In 1992, GCC version 2 was released, with support for C++.
  • In 2001, GCC version 3 was released, incorporating EGCS (Experimental GNU Compiler System), with improved optimization.
  • In 2005, GCC version 4 was released. As of July 2012, the latest release of GCC was 4.7.1.

1  Installing GCC


GCC (GNU Toolchain) is included in most Unixes. For Windows, you could install either MinGW GCC or Cygwin GCC. For instructions on how to install Cygwin GCC, you can refer to "How to Install Cygwin".
MinGW GCC
MinGW (Minimalist GNU for Windows) is a port of the GNU Compiler Collection (GCC) and GNU Binutils for use in Windows. It also includes MSYS (Minimal System), which basically provides a Bourne-style shell (bash).
Cygwin GCC
Cygwin is a Unix-like environment and command-line interface for Microsoft Windows. Cygwin is huge and includes most of the Unix tools and utilities. It also includes the commonly used Bash shell.
Two versions of GCC are installed, identified via gcc-3.exe and gcc-4.exe (and g++-3.exe and g++-4.exe). It also provides symlinks gcc.exe and g++.exe, which are linked to gcc-4.exe and g++-4.exe, respectively.
Versions
You can display the version of GCC via the --version option:
// Cygwin in bash shell
$ gcc --version
gcc (GCC) 4.5.3
 
$ gcc-3 --version
gcc-3 (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
 
// MinGW in CMD shell
> gcc --version
gcc (GCC) 4.6.2
 
> g++ --version
g++ (GCC) 4.6.2

More details can be obtained via the -v option, for example,

> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=d:/mingw/bin/../libexec/gcc/mingw32/4.6.2/lto-wrapper.exe
Target: mingw32
Configured with: ../gcc-4.6.2/configure --enable-languages=c,c++,ada,fortran,objc,obj-c++ 
  --disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgomp 
  --disable-win32-registry --enable-libstdcxx-debug --enable-version-specific-runtime-libs 
  --build=mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.6.2 (GCC)
Help
You can get the help manual via the --help option. For example,
> gcc --help
Man Pages
You can read the GCC manual pages (or man pages) via the man utility:
> man gcc
// or
> man g++
// Press space key for next page, or 'q' to quit.
Reading man pages under CMD or Bash shell can be difficult. You could generate a text file via:
> man gcc | col -b > gcc.txt
The col utility is needed to strip the backspace characters. (For Cygwin, it is available in the "Utils", "util-linux" package.)
Alternatively, you could look for online man pages, e.g., http://linux.die.net/man/1/gcc.
For MinGW, the GCC man pages are kept in "share\man\man1\gcc.1". For Cygwin, they are kept under "usr\share\man\man1".

2  Getting Started

The GNU C and C++ compilers are gcc and g++, respectively.
Compile/Link a Simple C Program - hello.c
Below is the Hello-world C program hello.c:
// hello.c
#include <stdio.h>
 
int main() {
    printf("Hello, world!\n");
    return 0;
}
To compile hello.c:
> gcc hello.c
  // Compile and link source file hello.c into executable a.exe
The default output executable is called "a.exe".
To run the program:
// Under CMD Shell
> a
// Under Bash or Bourne Shell - include the current path (./)
$ ./a
NOTES (for Bash Shell, Bourne Shell and Unixes):
  • In Bash or Bourne shell, the default PATH does not include the current working directory. Hence, you may need to include the current path (./) in the command. (The Windows CMD shell searches the current directory automatically, whereas Unix shells do not unless you add the current directory to the PATH explicitly.)
  • You may need to include the file extension, i.e., "./a.exe".
  • In some Unixes, the output file could be "a.out" or simply "a". Furthermore, you may need to assign executable file-mode (x) to the executable file "a.out", via command "chmod a+x filename" (add executable file-mode "+x" to all users "a+x").
To specify the output filename, use -o option:
> gcc -o hello.exe hello.c
  // Compile and link source file hello.c into executable hello.exe
> hello
  // Execute hello.exe under CMD shell
$ ./hello
  // Execute hello.exe under Bash or Bourne shell, specifying the current path (./)
NOTE for Unixes: In Unixes, you may omit the .exe file extension, and simply name the output executable hello. GCC normally marks the output file as executable; if it is not, assign executable file mode via the command "chmod a+x hello".
Compile/Link a Simple C++ Program - hello.cpp
// hello.cpp
#include <iostream>
using namespace std;
 
int main() {
   cout << "Hello, world!" << endl;
   return 0;
}
You need to use g++ to compile C++ programs, as follows. We use the -o option to specify the output file name.
> g++ -o hello.exe hello.cpp
   // Compile and link source hello.cpp into executable hello.exe
> hello
   // Execute under CMD shell
$ ./hello
   // Execute under Bash or Bourne shell, specifying the current path (./)
More GCC Compiler Options
A few commonly-used GCC compiler options are:
$ g++ -Wall -g -o Hello.exe Hello.cpp
  • -o: specifies the output executable filename.
  • -Wall: prints "all" warning messages (in practice, the most commonly useful ones).
  • -g: generates additional symbolic debugging information for use with the gdb debugger.
Compile and Link Separately
The above command compiles the source file into an object file and links it with other object files (system libraries) into an executable in one step. You may separate compiling and linking into two steps as follows:
// Compile-only with -c option
> g++ -c -Wall -g Hello.cpp
// Link object file(s) into an executable
> g++ -g -o Hello.exe Hello.o
The options are:
  • -c: compile into object file "Hello.o". By default, the object file has the same name as the source file with the extension ".o" (there is no need to specify the -o option). No linking with other object files or libraries is performed.
  • Linking is performed when the input files are object files ".o" (instead of source files ".cpp" or ".c"). GCC uses a separate linker program (called ld.exe) to perform the linking.
Compile and Link Multiple Source Files
Suppose that your program has two source files: file1.cpp and file2.cpp. You could compile both of them in a single command:
> g++ -o myprog.exe file1.cpp file2.cpp 
However, we usually compile each of the source files separately into an object file, and link them together at a later stage. In this case, a change in one file does not require re-compilation of the other files.
> g++ -c file1.cpp
> g++ -c file2.cpp
> g++ -o myprog.exe file1.o file2.o
Compile into a Shared Library
To compile and link C/C++ programs into a shared library (".dll" in Windows, ".so" in Unixes), use the -shared option. Read "Java Native Interface" for an example.

3  GCC Compilation Process


GCC compiles a C/C++ program into an executable in 4 steps. For example, a "gcc -o hello.exe hello.c" is carried out as follows:
  1. Pre-processing: via the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define).
    > cpp hello.c > hello.i
    The resultant intermediate file "hello.i" contains the expanded source code.
  2. Compilation: The compiler compiles the pre-processed source code into assembly code for a specific processor.
    > gcc -S hello.i
    The -S option specifies to produce assembly code, instead of object code. The resultant assembly file is "hello.s".
  3. Assembly: The assembler (as.exe) converts the assembly code into machine code in the object file "hello.o".
    > as -o hello.o hello.s
  4. Linking: Finally, the linker (ld.exe) links the object code with the library code to produce an executable file "hello.exe".
    > ld -o hello.exe hello.o ...libraries...
Verbose Mode (-v)
You can see the detailed compilation process by enabling the -v (verbose) option. For example,
> gcc -v hello.c -o hello.exe
Defining Macro (-D)
You can use the -Dname option to define a macro, or -Dname=value to define a macro with a value. The value should be enclosed in double quotes if it contains spaces.

4  Headers (.h), Static Libraries (.lib, .a) and Shared Libraries (.dll, .so)

Static Library vs. Shared Library
A library is a collection of pre-compiled object files that can be linked into your programs via the linker. Examples are the system functions such as printf() and sqrt().
There are two types of external libraries: static library and shared library.
  1. A static library has file extension of ".a" (archive file) in Unixes or ".lib" (library) in Windows. When your program is linked against a static library, the machine code of external functions used in your program is copied into the executable. A static library can be created via the archive program "ar.exe".
  2. A shared library has file extension of ".so" (shared objects) in Unixes or ".dll" (dynamic link library) in Windows. When your program is linked against a shared library, only a small table is created in the executable. Before the executable starts running, the operating system loads the machine code needed for the external functions - a process known as dynamic linking. Dynamic linking makes executable files smaller and saves disk space, because one copy of a library can be shared between multiple programs. Furthermore, most operating systems allow one copy of a shared library in memory to be used by all running programs, thus saving memory. The shared library code can be upgraded without the need to recompile your program.
Because of the advantages of dynamic linking, GCC, by default, links to the shared library if it is available.
You can list the contents of a library via "nm filename".
Searching for Header Files and Libraries (-I, -L and -l)
When compiling the program, the compiler needs the header files to compile the source code; the linker needs the libraries to resolve external references from other object files or libraries. The compiler and linker will not find headers/libraries outside the default search paths unless you set the appropriate options, which is not obvious for first-time users.
For each of the headers used in your source (via #include directives), the compiler searches the so-called include-paths for these headers. The include-paths are specified via the -Idir option (or the environment variable CPATH). Since the header's filename is known (e.g., iostream, stdio.h), the compiler only needs the directories.
The linker searches the so-called library-paths for libraries needed to link the program into an executable. The library-path is specified via the -Ldir option (uppercase 'L' followed by the directory path) (or the environment variable LIBRARY_PATH). In addition, you also have to specify the library name. In Unixes, the library libxxx.a is specified via the -lxxx option (lowercase letter 'l', without the prefix "lib" and the ".a" extension). In Windows, provide the full name such as -lxxx.lib. The linker needs to know both the directories and the library names. Hence, two options need to be specified.
Default Include-paths, Library-paths and Libraries
Try listing the default include-paths in your system used by the "GNU C Preprocessor" via "cpp -v":
> cpp -v
......
#include "..." search starts here:
#include <...> search starts here:
 d:\mingw\bin\../lib/gcc/mingw32/4.6.2/include             // d:\mingw\lib\gcc\mingw32\4.6.2\include
 d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include // d:\mingw\include
 d:\mingw\bin\../lib/gcc/mingw32/4.6.2/include-fixed       // d:\mingw\lib\gcc\mingw32\4.6.2\include-fixed
Try running the compilation in verbose mode (-v) to study the library-paths (-L) and libraries (-l) used in your system:
> gcc -v -o hello.exe hello.c
......
-Ld:/mingw/bin/../lib/gcc/mingw32/4.6.2                         // d:\mingw\lib\gcc\mingw32\4.6.2
-Ld:/mingw/bin/../lib/gcc                                       // d:\mingw\lib\gcc
-Ld:/mingw/bin/../lib/gcc/mingw32/4.6.2/../../../../mingw32/lib // d:\mingw\mingw32\lib
-Ld:/mingw/bin/../lib/gcc/mingw32/4.6.2/../../..                // d:\mingw\lib
-lmingw32     // libmingw32.a
-lgcc_eh      // libgcc_eh.a
-lgcc         // libgcc.a
-lmoldname
-lmingwex
-lmsvcrt
-ladvapi32
-lshell32
-luser32 
-lkernel32

Eclipse CDT: In Eclipse CDT, you can set the include paths, library paths and libraries by right-clicking on the project ⇒ Properties ⇒ C/C++ General ⇒ Paths and Symbols ⇒ Under tabs "Includes", "Library Paths" and "Libraries". The settings are applicable to the selected project only.

5  GCC Environment Variables

GCC uses the following environment variables:
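  • PATH: For searching the executables and, on Windows, the run-time shared libraries (.dll). (On Unixes, run-time shared libraries (.so) are located via LD_LIBRARY_PATH instead.)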
  • CPATH: For searching the include-paths for headers. It is searched after paths specified in -I<dir> options. C_INCLUDE_PATH and CPLUS_INCLUDE_PATH can be used to specify C and C++ headers if the particular language was indicated in pre-processing.
  • LIBRARY_PATH: For searching library-paths for link libraries. It is searched after paths specified in -L<dir> options.

6  Utilities for Examining the Compiled Files

For all the GNU utilities, you can use "command --help" to list the help menu; or "man command" to display the man pages.
"file" Utility - Determine File Type
The utility "file" can be used to display the type of object files and executable files. For example,
> gcc -c hello.c
> gcc -o hello.exe hello.o
 
> file hello.o
hello.o: 80386 COFF executable not stripped - version 30821
 
> file hello.exe
hello.exe: PE32 executable (console) Intel 80386, for MS Windows
"nm" Utility - List Symbol Table of Object Files
The utility "nm" lists symbol table of object files. For example,
> nm hello.o
00000000 b .bss
00000000 d .data
00000000 r .eh_frame
00000000 r .rdata
00000000 t .text
         U ___main
00000000 T _main
         U _printf
         U _puts
 
> nm hello.exe | grep printf
00406120 I __imp__printf
0040612c I __imp__vfprintf
00401b28 T _printf
00401b38 T _vfprintf
"nm" is commonly-used to check if a particular function is defined in an object file. A 'T' in the second column indicates a function that is defined, while a 'U' indicates a function which is undefined and should be resolved by the linker.
"ldd" Utility - List Dynamic-Link Libraries


Phew! I hope it helped you to at least some extent and do subscribe if you like what you read.

The next part of this tutorial is

GNU Make: A brief introduction

Till then, goodbye and happy coding!





