Saturday, 30 August 2014

Has the field of Computer Architecture really reached its peak?

I am no expert in the field of Computer Architecture, but just a student learning about it. On referring to some books and websites, I realized some of the historical advancements in this field and why these advancements aren't able to take the field forward:-

1. Dependencies:- In the golden era of Computer Architecture, the researchers did a great job in looking at the new techniques to speed up the processor. One of these was to solve the problem of data-dependencies by a variety of techniques that included data-forwarding, out-of-order execution, etc.
However, adding new functional units does not increase program performance anymore because dependencies between instructions prevent the hardware to implement their parallel execution.

2. Power Issues:- According to Moore's law, there has been an increase in the number of transistors on a chip. However, this has also increased the power consumption and the heat production.

3. High memory access-times:- Memory access time has not increased by similar amounts as the clock period which has led to load and store instructions taking 100s of cycles to execute.

4. Chip Bandwidth:- There has not been much development in increasing the number of pins that can be used on a processor; thus, limiting the bandwidth between memory and CPU.

All these above reasons have led the researchers looking at new techniques to use multiple cores on a single processor die.

Thursday, 28 August 2014

Some GCC optimizations

When I first started using GCC, I was impressed by the sheer intelligence of its optimization options (O, O1,O2,O3).  Take the following code as an example:-



The above code, when compiled with gcc with no optimizations takes about 30 ms to execute. However, when compiled with optimization '-O3', the execution takes 0 ms. 

Why, you ask?

This is because the compiler takes the value 1,000,000 and moves it to a register without executing the loop a million times. 
SOO COOOL!

Henceforth, I started looking at the types of optimization options the gcc compiler provides us with on https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. I have tried to explain a few below:-

1. 'fno-unwind-tables' :- 

Unwind tables are tables in an elf file consisting of information to unwind the call stack in the event of an exception. 
This optimization flag helps reduce the code size by deleting the unwind tables from the elf file. However, deleting such a section can have its own repercussions, such as backtrace() stops working etc.

2. 'fomit-frame-pointer' :-

This optimization flag helps reduce the code execution time by removing the idea of a frame pointer and using the FP register as just another register.
This flag was the one that most interested me as it questioned the idea of a frame pointer that I learnt at school. As far as I can recall, the FP is used for the following 2 functions:-

1. When exiting from a function, it is the value in the FP that is moved to the SP
2. The local variables are always referenced using the FP in assembly code. 

Now, on the otherhand, it is the SP itself that is used to reference the local variables in assembly code. That surely must be a job difficult to implement in the compiler. 

3.  'fprofile-arcs' :-
4. 'floop-optimize' :-


Monday, 18 August 2014

About Unwind tables

Before we start talking about what the contents of an unwind-table are, here are some of the questions I had when I first started investigating on unwind tables.

What are unwind-tables?

Unwind Tables are tables that are prepared by the compilers, consisting of information on how the call stack should be unwounded. (Unwounding refers to the removal of the stack frames on the call stack)

When are unwind tables used?
  • To unwind the call stack in the event of an exception
  • To unwind the call stack when using backtrace()
Where are the unwind-tables stored?

On the Intel machine, the unwind-tables are stored in the '.eh_frame' section of the ELF file, whereas on the ARM machine, the unwind-tables are stored in the '.EXIDX' and '.EXTAB' sections of the ELF file.

Now that we know a bit about unwind tables, let us take an example of what exactly is contained in the sections mentioned above.

P.S :- The example shown below is that for a virtual Linux environment on an ARM machine. I have used the arm-gcc compiler I installed using the instructions at http://www.cnx-software.com/2011/03/28/installing-linaro-arm-cross-toolchain-on-ubuntu/
I have not yet been able to figure the format of unwind tables on an Intel machine but it is something that I intend to do in the near future.

Moving on, Let's see what the unwind-table looks like for the C code below:


To compile and then disassemble the object file, run the commands below:


A part of the assembly code for the above C code is:



There are 2 steps involved in unwinding the call stack, i.e:-

1. Adding to the stack pointer the amount it was decreased by(to make space for the local variables)
2. Popping from the stack FP(Frame Pointer) and LR(the link register / R14) and putting LR's value into PC(the program counter)

Now let's compare the above 2 steps with what actually is in the unwind-tables by the following command:



By running the above, I got:



The above seems to be unwinding the stack exactly how we imagined it to be if, the following pointers are kept in mind:-

1. finish - the finish instruction puts the value in r14 into PC
2. VSP - The VSP stands for the Virtual Stack Pointer, which is essentially a copy of the actual Stack Pointer.

To understand more about the output, I would suggest reading the Documentation which explains in a whole lot detail the exact format of the uwind information.