Designing Linker Scripts with GNU Linker
Whether you are writing a program for an embedded system or for a PC, the linker always works from a linker script, even if only implicitly. The default linker script generated by most IDE tools takes care of the different memory types and places input sections in the appropriate memory regions, based on the memory map of the selected microcontroller.
So why bother writing our own? There are situations where customizing the linker script is the only option, or at least makes things easier, for example:
- Memory types: Embedded systems usually have ROM, Flash and SRAM memories. These memories may further be distinguished by execute, read, write, latency and shareable attributes. A default linker script may not be able to address all possible requirements of an application.
- Programming Flash: Flash memory usually can't be programmed by code running from the same Flash memory. In that case, part of the code region has to be copied on the fly to another memory and executed from there, so its virtual and load memory addresses will differ.
- Functions with fixed addresses: In some applications, low-level driver functions may be part of a ROM and/or Flash memory region and exported to higher-level RTOS applications. These functions have fixed addresses, and those addresses need to be preserved across firmware upgrades.
Useful terms and definitions
Object files, Executable and output Binary
C and assembly source files are compiled into object files by the compiler, and these become the input to the linker along with any additional libraries. With a linker script we can control precisely how sections from the input object files are placed in the output object or executable file, and in which memory regions. From the executable file, the final binary can be extracted in various formats such as hex, mot or plain binary using the objcopy utility.
Section
Sections contain code, data or any other information inside object files. Each section has a name, contents and a size. By convention, executable instructions are stored in the .text section, initialized global variables in .data, uninitialized global variables in .bss and the vector table in .vectors. You will find variations on these conventions; the constraints on section names depend on the object file format used. Each section also has two addresses: the Virtual Memory Address (VMA) and the Load Memory Address (LMA).
Sections are described using the SECTIONS command, which contains output section descriptions. An output section description controls the location, type, alignment, order of placement, target memory region, program headers and fill pattern of an output section. Most of these parameters are optional; a simple form is shown below:
[Listing 1: simple form of an output section description]
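A minimal sketch of such an output section description, written to be consistent with the notes that follow. The file name startup.o and the symbols __text_start and __text_end are hypothetical, and a MEMORY command defining a FLASH region is assumed to exist elsewhere in the script.

```
SECTIONS
{
    .text :                          /* output section name */
    {
        __text_start = .;            /* symbol assignment, ends with ';' */
        startup.o (.text .rodata)    /* one named file, listed sections  */
        * (.text)                    /* .text of all remaining files     */
        __text_end = .;              /* '.' is the location counter      */
    } > FLASH                        /* target memory region; no AT, so LMA = VMA */
}
```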
A linker script can have at most one SECTIONS command, with as many output section descriptions as required. A few points to note:
- Input sections are described as an object file name followed by an optional list of section names in parentheses, separated by spaces. The wildcard '*' can be used in place of the file name to match all input object files given on the command line.
- Linker script symbols are accessible from C and vice versa, so their names should follow C identifier naming rules. Symbol assignments must be terminated by a semicolon.
- The dot '.' is a linker variable that holds the current location counter, i.e. the current VMA.
- The output section in the above example is placed in the FLASH memory region. The linker calculates its starting VMA from the end of the previous section or, if there is no previous section, from the beginning of the FLASH memory region, also taking any section alignment requirement into account.
- As no LMA is provided in the above example, the LMA is equal to the VMA.
VMA
The Virtual Memory Address of a section is the address at which the section resides when the program runs. For example, let the VMA of function foo() inside the .text section be 0x1000; when foo() is called, the program counter jumps to address 0x1000 and executes the first instruction of foo().
LMA
The Load Memory Address of a section is the address in memory where the section is loaded, programmed or flashed. Continuing with the above example, let the LMA of the .text section be 0x200 and the LMA of function foo() inside it be 0x240. Before this function can be called, some other piece of the program needs to copy the code of foo() from 0x240 to 0x1000.
If the LMA of a section is equal to its VMA, the memory region where the section is loaded is both readable and executable, as is the case for ROM and on-chip Flash memories.
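The foo() example can be sketched in a linker script as follows. This is only an illustration: it assumes foo() sits at offset 0x40 inside .text, so the section itself runs at 0xFC0, and the symbol names __text_load, __text_run and __text_size are hypothetical. ADDR, LOADADDR and SIZEOF are built-in linker script functions.

```
SECTIONS
{
    /* VMA 0xFC0, LMA 0x200: foo() at offset 0x40 then has
       VMA 0xFC0 + 0x40 = 0x1000 and LMA 0x200 + 0x40 = 0x240 */
    .text 0xFC0 : AT (0x200)
    {
        *(.text)
    }

    /* Symbols a copy routine could use (hypothetical names) */
    __text_load = LOADADDR(.text);   /* where the section is stored */
    __text_run  = ADDR(.text);       /* where the section executes  */
    __text_size = SIZEOF(.text);     /* number of bytes to copy     */
}
```

Startup code would copy __text_size bytes from __text_load to __text_run before calling foo().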
Symbols
Every symbol defined in a C program has a name, an address and allocated memory that holds the corresponding value. Symbols defined in C source code can be accessed in a linker script and vice versa. Symbols defined in a linker script, however, have only a name and an address; no memory is allocated to hold a value, so only the address can be obtained from C source code. Attempting to dereference a linker script symbol in C code may lead to incorrect results.
Symbols are stored in the symbol table inside object files. You can use objdump -t object_file to view the symbol table.
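A small fragment illustrating both directions, under stated assumptions: reset_handler is a hypothetical function defined in C, __stack_top is a hypothetical symbol defined only in the script, and the SRAM origin and length are made up. The C usage is shown in a comment.

```
/* A symbol defined in C, referenced from the script */
ENTRY(reset_handler)

MEMORY
{
    SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 8K
}

/* A symbol defined in the script: it has an address but no storage */
__stack_top = ORIGIN(SRAM) + LENGTH(SRAM);

/* On the C side, use only the address, never the "value":
 *     extern char __stack_top[];
 *     uintptr_t sp = (uintptr_t)__stack_top;
 */
```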
Memory Layout
The MEMORY command is used to specify the size, location and attributes of all memory blocks available in the target. Each memory block can be given read-only ('r'), read/write ('w') and executable ('x') attributes. The ORIGIN and LENGTH keywords specify the start address and the length of a memory region respectively; both can be expressions, but they must evaluate to constants. The suffixes 'K' and 'M' on a numeric constant scale it by 1024 and 1024*1024, i.e. kilobytes and megabytes.
When designing linker scripts it is good practice to split the script into a target-independent part, which contains the output section descriptions, and a target-dependent part, which maps the output sections to the required memory regions. This is achieved with the REGION_ALIAS function, whose first parameter is the alias name and whose second parameter is the memory region it refers to. Just by specifying a different set of memory aliases we can completely change the memory mapping of the output sections without modifying the section description commands, as shown below.
[Listing 2: memory layout with REGION_ALIAS]
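A minimal sketch of the target-dependent part, assuming a hypothetical device with 128K of Flash and 16K of SRAM. The region names, addresses and alias names are illustrative only.

```
MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 128K
    SRAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 16K
}

/* Target-dependent mapping: first parameter is the alias name,
   second is the memory region it stands for */
REGION_ALIAS("REGION_TEXT", FLASH);
REGION_ALIAS("REGION_DATA", SRAM);
REGION_ALIAS("REGION_BSS",  SRAM);
```

Output section descriptions then refer only to REGION_TEXT, REGION_DATA and REGION_BSS, so retargeting the script means editing just this block.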
Placing Sections
Listing 3 shows a sample linker script whose section placements are independent of the target memory layout. It is very generic and is designed for bare-metal programs, i.e. we can use it without including any standard library, which also means we have to write our own code to initialize the .data and .bss sections. You can download the sample source code to see how to write a simple bare-metal program with a linker script. You can compile the source code for ARM Cortex-M0 or Cortex-M3 and simulate it in GDB to better understand the program flow. To keep things simple there is no assembly startup code (it is not required), the vector table is written in C, and GCC's attribute feature is not used to define any sections.
Now, let us look into the script in some more detail:
- The ENTRY command points the linker to the first instruction in the program. Starting from the entry point, the linker can walk through the code graph, identify unreferenced sections and discard them from the output object file (with GNU ld this happens when section garbage collection is enabled via --gc-sections). The KEEP keyword is used to retain unreferenced sections that would otherwise be discarded by the linker. In the example, the vector_table.o (.rodata) section is never referenced in code, yet it contains the addresses of the interrupt service handlers that the CPU references externally.
- Notice that for the .data section both the >mem_region and AT>lma_region options are specified, as we want to store the initialized data after the end of the .text section and use it to initialize the SRAM addresses at runtime, before calling the application's main() function.
- The PROVIDE keyword defines a symbol in the linker script only when it is referenced and not defined in any other linked library or object file.
- Uninitialized non-static global variables may get stored in COMMON section, hence it is included in output .bss section.
[Listing 3: sample linker script with target-independent section placement]
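The points above can be sketched as follows. This is not the author's listing, only a sketch under stated assumptions: reset_handler, the region sizes, the alias names and all the __-prefixed symbols are hypothetical, and the leading '*' in the vector table file pattern is added so that a directory prefix on the object file name still matches.

```
ENTRY(reset_handler)                 /* hypothetical C function */

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 64K
    SRAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 8K
}
REGION_ALIAS("REGION_TEXT", FLASH);
REGION_ALIAS("REGION_DATA", SRAM);

SECTIONS
{
    .text :
    {
        KEEP(*vector_table.o (.rodata))   /* never referenced by code */
        *(.text*)
        *(.rodata*)
        . = ALIGN(4);
    } > REGION_TEXT

    .data :                               /* runs in SRAM, stored after .text */
    {
        __data_start = .;
        *(.data*)
        . = ALIGN(4);
        __data_end = .;
    } > REGION_DATA AT> REGION_TEXT
    __data_load = LOADADDR(.data);        /* source address for the copy */

    .bss :
    {
        __bss_start = .;
        *(.bss*)
        *(COMMON)                         /* uninitialized non-static globals */
        . = ALIGN(4);
        __bss_end = .;
    } > REGION_DATA

    /* Defined only if referenced and not defined elsewhere */
    PROVIDE(__stack_top = ORIGIN(SRAM) + LENGTH(SRAM));
}
```

Startup code would copy the bytes from __data_load into the range __data_start to __data_end, and zero the range __bss_start to __bss_end, before calling main().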
Overlaying Sections
Section overlaying may be used in applications where code is stored in an external non-executable memory, or where there is a fast SRAM. In either case an overlay manager is required to copy sections into and out of the overlay memory region at runtime. The OVERLAY command is used inside the SECTIONS command and its syntax is similar to that of an output section description. Sections inside the OVERLAY command have the same virtual memory address and consecutive load memory addresses. Specifying the overlay start address is optional; if it is not provided, the start address defaults to the current location counter.
In listing 4, it is assumed that some sort of overlay manager is in ROM and that application code is overlaid into OVERLAY_REGION, an alias. A dummy SRAM2 region is defined to load the overlay sections, but in an actual application it would be either an on-chip or an off-chip non-volatile memory. With the NOCROSSREFS keyword, the linker reports an error if there are any symbol references between sections with the same VMA. The linker automatically provides two symbols pointing to the load addresses of each section within an OVERLAY command: __load_start_section_name and __load_stop_section_name.
[Listing 4: overlaying sections with the OVERLAY command]
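A minimal sketch of an overlay setup along these lines. The region sizes, the object files task_a.o and task_b.o and the section names .ovl_a and .ovl_b are hypothetical. The OVERLAY is placed before .text so that the two task files are matched by the overlay first and not by the general *(.text) pattern.

```
MEMORY
{
    ROM   (rx)  : ORIGIN = 0x00000000, LENGTH = 32K
    SRAM1 (rwx) : ORIGIN = 0x20000000, LENGTH = 8K
    SRAM2 (rwx) : ORIGIN = 0x20002000, LENGTH = 8K   /* dummy load region */
}
REGION_ALIAS("OVERLAY_REGION", SRAM1);

SECTIONS
{
    /* Both sections share one VMA in OVERLAY_REGION; their LMAs are
       consecutive, starting at the beginning of SRAM2 */
    OVERLAY : NOCROSSREFS AT (ORIGIN(SRAM2))
    {
        .ovl_a { *task_a.o (.text) }
        .ovl_b { *task_b.o (.text) }
    } > OVERLAY_REGION

    /* Overlay manager and remaining code */
    .text : { *(.text) } > ROM
}

/* Provided automatically by the linker (characters that are not legal
 * in C identifiers, such as the leading dot, are dropped):
 *     __load_start_ovl_a, __load_stop_ovl_a
 *     __load_start_ovl_b, __load_stop_ovl_b
 */
```

An overlay manager would copy the bytes between __load_start_ovl_a and __load_stop_ovl_a to the shared VMA before calling into task_a.o, and do the same for .ovl_b when switching.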
When customizing a linker script, it is always good practice to generate a linker map file (for example with the linker's -Map option) and use it to cross-check section placement in the output ELF file.
Linker scripts support many more optional keywords and commands to meet all sorts of application requirements. Please refer to the GNU linker reference manual for details.