How to write efficient C for ARM: 10 key points

Writing C in certain ways helps the compiler generate faster ARM code. Here are the key points related to performance:

1. Use signed and unsigned int for local variables, function parameters, and return values. This avoids type conversions and makes efficient use of ARM's 32-bit data manipulation instructions.

2. The most efficient loop form is a do-while loop that counts down to zero.

3. Unroll important loops to reduce loop overhead.

4. Do not rely on the compiler to optimize away repeated memory accesses. Pointer aliasing often prevents the compiler from performing this optimization.

5. Limit the number of function parameters to 4 if possible. If the function parameters are stored in registers, the function call will be much faster.

6. Arrange structure elements in order of size, from smallest to largest, especially when compiling in Thumb mode.

7. Do not use bit fields. Instead, use masks and logical operations.

8. Avoid division. Multiplication by a reciprocal can often be used instead.

9. Avoid unaligned data. If data may be unaligned, access it through a char * pointer.

10. Use inline assembly to take advantage of instructions or optimizations that the C compiler does not support.

First, optimizing the use of data types

1. Local variables

It is tempting to assume that a char takes up less register space or less ARM stack space than an int. Both assumptions are wrong on ARM. All ARM registers are 32-bit, and all stack entries are at least 32-bit. Moreover, if a loop counter i is declared as an 8-bit char, the compiler must implement i++ exactly: when i = 255, the increment has to wrap around to 0, which costs an extra instruction to truncate the result to 8 bits.
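As a small illustration of this point, the sketch below sums 64 words using an unsigned int loop counter rather than a char. The function name `checksum_words` and the fixed length of 64 are assumptions made for this example only.

```c
#include <stdint.h>

/* Hypothetical example: sum 64 32-bit words.
 * The counter i is an unsigned int, so it maps directly onto a 32-bit
 * register and needs no truncation to 8 bits after i++. */
uint32_t checksum_words(const uint32_t *data)
{
    unsigned int i;
    uint32_t sum = 0;

    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```

Declaring i as char would give the same result here, but the compiler would typically have to add an instruction per iteration to keep the counter within 8 bits.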

2. Function parameters

Although both the wide and the narrow function-call conventions have their merits, function parameters and return values of type char or short incur additional overhead, which reduces performance and increases code size. So even when you are passing 8-bit data, it is more efficient to use int for function parameters and return values.

To sum up:

1) For local variables held in registers, avoid char and short unless 8-bit or 16-bit modular arithmetic is required; use signed or unsigned int instead. Division is faster with unsigned types.

2) For arrays and global variables held in main memory, use the smallest data type that can hold the required values. This saves storage space. The ARMv4 architecture can load and store data of every width efficiently, provided the array is traversed by incrementing a pointer. For arrays of short, avoid indexing from the array base address, because the LDRH instruction does not support scaled register offset addressing.

3) Use explicit casts when reading arrays or global variables into local variables of a different type, or when writing local variables out to arrays or global variables of a different type. The casts make it clear that, for speed, a narrow type held in memory is being widened to a wider type in a register.

4) Avoid implicit or explicit type conversions in expressions, narrowing casts in particular, because they usually cost extra cycles. Conversions on loads and stores are generally free, because the load or store instruction performs the conversion itself.

5) Avoid char and short for function parameters and return values. Even if the parameter range is small, use int so that the compiler does not have to insert unnecessary type conversions.
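The summary above can be sketched in a few lines: the data stays narrow in memory, while everything that lives in a register is int-sized. The name `sum_samples` and its signature are hypothetical, chosen for this example.

```c
/* Hypothetical example: sum an array of 16-bit samples.
 * The array stays narrow (short) to save memory; the accumulator,
 * the counter, the parameter and the return value are all int-sized. */
int sum_samples(const short *samples, unsigned int n)
{
    int sum = 0;

    while (n != 0) {
        /* Explicit widening cast: a sign-extending halfword load
         * typically performs this conversion at no extra cost. */
        sum += (int)*samples++;
        n--;
    }
    return sum;
}
```

The array is walked with an incrementing pointer rather than a base-plus-index expression, in line with point 2.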

Second, C loop structures

On ARM, the overhead of a loop can be as low as two instructions:

A subtraction instruction, which performs loop subtraction counting, and sets the result condition flags;

A conditional branch instruction.

The key is that the loop should count down to zero rather than count up to some limit. Because the result of the decrement is already reflected in the condition flags, the instruction that compares with zero can be omitted. As long as i is not used as an array index, counting down causes no problems.

In summary, use i != 0 as the loop continuation condition for signed or unsigned loop counters. For a signed counter i, this saves one instruction compared with the condition i > 0.
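A minimal sketch of such a count-down loop follows. The function name `sum_countdown` is an assumption for illustration; the counter is used only for counting, and a pointer walks the array.

```c
#include <stdint.h>

/* Hypothetical example: sum n words, counting down to zero.
 * i is only a counter, not an index, so counting down is harmless. */
uint32_t sum_countdown(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;
    unsigned int i;

    for (i = n; i != 0; i--) {
        sum += *data++;
    }
    return sum;
}
```

With this form a compiler can usually implement the loop control as a flag-setting subtract followed by a conditional branch.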

To sum up:

1) Use loops that count down to zero, so that the compiler does not need to allocate a register for the termination value and the comparison with 0 can be omitted.

2) Use an unsigned loop counter with the continuation condition i != 0 rather than i > 0. This keeps the loop overhead to two instructions.

3) If you know in advance that the loop body executes at least once, a do-while loop is better than a for loop, because the compiler does not have to check whether the loop count is zero.

4) Unrolling important loops reduces loop overhead, but do not over-unroll. If the loop overhead is only a small part of the program's total run time, unrolling increases code size and hurts cache performance.

5) Try to make array sizes a multiple of 4 or 8, so that loops can be unrolled 2, 4, or 8 times without worrying about leftover array elements.
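Points 3 to 5 can be combined in one sketch: a do-while loop unrolled four times. The name `sum_unrolled4` is hypothetical, and the code assumes the caller guarantees that n is a nonzero multiple of 4.

```c
#include <stdint.h>

/* Hypothetical example: sum n words with the loop unrolled four times.
 * Assumption: n is a nonzero multiple of 4, so the do-while needs no
 * initial zero check and no clean-up loop for leftover elements. */
uint32_t sum_unrolled4(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;

    do {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
        n -= 4;
    } while (n != 0);
    return sum;
}
```

The loop overhead is now paid once per four elements. If n could be zero or not a multiple of 4, a guard and a remainder loop would be needed.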

Third, register allocation

Efficient register allocation: The number of local variables used by the inner loop of the function should be limited to a maximum of 12, so that the compiler can allocate these variables to ARM registers.

Fourth, function calls

The four-register rule: functions with four or fewer parameters are much more efficient than functions with more than four. With four or fewer arguments, the compiler can pass every argument in a register; with more than four, the caller and the callee must pass the remaining arguments on the stack.

If the function is very small and only a few registers are used, there are other ways to reduce the overhead of function calls. You can place the calling function and the called function in the same C file so that the compiler knows the code generated by the called function and uses it to optimize the calling function.

To sum up:

1) Try to limit functions to four parameters or fewer, which makes calls more efficient. Several related parameters can also be grouped into a structure and passed as a single pointer to that structure.

2) Put small called functions in the same source file as their callers, and define them before they are called. The compiler can then optimize the call or inline the small function.

3) Functions that are critical to performance can be inlined with the __inline keyword.
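As an illustration of point 1, the sketch below groups three related pointers into a structure so that a function needs three arguments instead of five. The `Queue` type and `queue_bytes` function are hypothetical names for this example.

```c
/* Hypothetical example: a queue descriptor groups related values so
 * that functions take one pointer instead of several scalar arguments. */
typedef struct {
    char *start;  /* first byte of the buffer */
    char *end;    /* one past the last byte of the buffer */
    char *ptr;    /* current insertion point */
} Queue;

/* Copy n bytes into the circular queue. Three arguments fit in
 * registers; passing start, end and ptr separately would need five. */
void queue_bytes(Queue *queue, const char *data, unsigned int n)
{
    char *ptr = queue->ptr;   /* cache the fields in locals */
    char *end = queue->end;

    while (n != 0) {
        *ptr++ = *data++;
        if (ptr == end) {
            ptr = queue->start;   /* wrap around without division */
        }
        n--;
    }
    queue->ptr = ptr;
}
```

For example, writing the six bytes "abcdef" into a four-byte queue leaves the buffer holding 'e', 'f', 'c', 'd' with the insertion point at index 2.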

Fifth, pointer aliasing

Definition: When two pointers point to the same object, they are called aliases of that object. A write through one pointer affects what is read through the other. Within a function, the compiler usually cannot tell which pointers are aliases and which are not.

Avoid pointer aliasing:

1) Do not rely on the compiler to eliminate common subexpressions that contain memory access. Instead, create a new local variable to hold the value of the expression. This ensures that the expression is only evaluated once.

2) Avoid taking the address of a local variable; otherwise, accesses to that variable will be less efficient.
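A minimal sketch of point 1, using a hypothetical function `timers_add`: the value behind the pointer is read once into a local variable instead of being dereferenced twice.

```c
/* Hypothetical example: add the value at *step to two timers.
 * Reading *step into a local once means the compiler does not have to
 * reload it after the store to *timer1, which might alias step. */
void timers_add(int *timer1, int *timer2, const int *step)
{
    int s = *step;   /* single load */

    *timer1 += s;
    *timer2 += s;
}
```

Note that this is only equivalent to dereferencing step twice when step does not actually alias timer1; the local variable makes that assumption explicit.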

Sixth, structure layout

There are two issues to consider when using structures on an ARM: the alignment of the structure's address boundaries and the total size of the structure.

Guidelines for laying out an efficient structure:

1) Place all 8-bit elements at the start of the structure;

2) Then place the 16-bit, 32-bit, and 64-bit elements, in that order;

3) Place all arrays and larger elements at the end of the structure;

4) If the structure is too large for a single instruction to reach all of its elements, group the elements into substructures. The compiler can then maintain pointers to the individual substructures.

To sum up:

Arrange structure elements by size, with the smallest first and the largest last. Avoid very large structures; replace them with a hierarchy of smaller structures. For better portability, manually add padding to structures used in an API, so that their layout does not depend on the compiler. Use enumeration types with care in API structures, because the size of an enum type is compiler-dependent.
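The effect of element ordering can be seen by comparing two layouts of the same four members. The structure names below are hypothetical, and the exact sizes depend on the compiler and ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with members in arbitrary order. */
struct unordered {
    uint8_t  a;
    uint32_t b;   /* usually preceded by 3 bytes of padding */
    uint8_t  c;
    uint16_t d;
};

/* The same members sorted from smallest to largest. */
struct ordered {
    uint8_t  a;
    uint8_t  c;
    uint16_t d;
    uint32_t b;
};
```

On a typical 32-bit ARM ABI, `sizeof(struct unordered)` is 12 bytes and `sizeof(struct ordered)` is 8, because the sorted layout needs no padding.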

Seventh, bit fields

Precautions:

1) Avoid bit fields; use #define or enum to define mask values instead;

2) Use the integer logical operations AND, OR, and XOR with masks to test, toggle, and set bits. These operations are highly efficient, and can test, toggle, or set several bit fields at once.
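The mask-based approach might look like the sketch below. The flag names and helper functions are assumptions for illustration, not part of any particular API.

```c
#include <stdint.h>

/* Hypothetical flag layout for a status word. */
enum {
    FLAG_A = 1u << 0,
    FLAG_B = 1u << 1,
    FLAG_C = 1u << 2
};

/* Set every bit in mask. */
uint32_t flags_set(uint32_t w, uint32_t mask)    { return w | mask; }

/* Clear every bit in mask. */
uint32_t flags_clear(uint32_t w, uint32_t mask)  { return w & ~mask; }

/* Toggle every bit in mask. */
uint32_t flags_toggle(uint32_t w, uint32_t mask) { return w ^ mask; }

/* Nonzero if all bits in mask are set. */
int flags_all_set(uint32_t w, uint32_t mask)     { return (w & mask) == mask; }
```

A call such as `flags_set(w, FLAG_A | FLAG_C)` updates two flags in a single operation, which separate bit-field assignments generally cannot do.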

Eighth, unaligned data and byte order (big-endian / little-endian)

Unaligned data and byte order (endianness) both complicate memory access and porting. You must consider whether array pointers are aligned and whether the ARM is configured as a big-endian or little-endian memory system.

To sum up:

1) Avoid unaligned data wherever possible;

2) Use the type char * for data that may lie on any byte boundary. Access the data by reading bytes and combine them with logical operations, so that the code depends neither on alignment nor on the ARM's configured byte order;

3) For fast access to unaligned structures, write different variants of the routine according to the pointer alignment and the processor byte order.
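Point 2 can be sketched as a byte-wise read. The helper name `read_le32` is hypothetical, and it assumes the stored data is in little-endian order.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from any
 * address. Byte loads have no alignment requirement, and the shifts
 * fix the byte order regardless of the processor's endianness. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

This is slower than a single aligned word load, so it is best kept for data whose alignment or byte order is genuinely unknown.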

Ninth, division

Older ARM cores, such as those based on the ARMv4 architecture, have no hardware divide instruction. When a division occurs in the code, the ARM compiler calls a C library function (__rt_sdiv for signed division, __rt_udiv for unsigned) to perform it. Many different division routines exist, tailored to different ranges of divisor and dividend.

To sum up:

1) Avoid division as much as possible. Ring buffers, for example, can be handled without division.

2) If division is unavoidable, try to take advantage of the fact that the division routine produces both the quotient n/d and the remainder n%d.

3) For repeated division by the same divisor d, calculate s = (2^k - 1)/d in advance. Division of a k-bit unsigned integer by d can then be replaced by a 2k-bit multiplication by s.

4) Use a power of 2 as the divisor where possible. The compiler automatically converts division by a power of 2 into a shift, most cheaply for unsigned values. When designing an algorithm, therefore, try to choose divisors that are powers of 2.

5) Taking remainders. Some typical modulo operations can be rewritten so that they avoid division.

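For example:

    typedef unsigned int uint;

    /* Increment modulo 60 using the remainder operator (calls a divide routine). */
    uint counter1(uint count)
    {
        return (++count % 60);
    }

can be converted to:

    /* Same result for count in the range 0..59, with no division. */
    uint counter2(uint count)
    {
        if (++count >= 60)
            count = 0;
        return count;
    }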
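Point 3 of the summary (division by a precomputed reciprocal) can be sketched for k = 32 as follows. The names `recip_init` and `recip_div` are hypothetical. Because the estimate from the multiplication can be one too small, this sketch adds a single correction step, and it returns the remainder as well, in the spirit of point 2.

```c
#include <stdint.h>

/* Hypothetical helper: precompute s = (2^32 - 1) / d once per divisor.
 * Assumption: d is nonzero. */
uint32_t recip_init(uint32_t d)
{
    return 0xFFFFFFFFu / d;
}

/* Divide n by d using the precomputed s. One 32x32 -> 64-bit multiply
 * gives an estimate that is either exact or one too small; a single
 * compare-and-correct step fixes it. The remainder is stored in *rem. */
uint32_t recip_div(uint32_t n, uint32_t d, uint32_t s, uint32_t *rem)
{
    uint32_t q = (uint32_t)(((uint64_t)n * s) >> 32);
    uint32_t r = n - q * d;

    if (r >= d) {
        q++;
        r -= d;
    }
    *rem = r;
    return q;
}
```

The one real division happens in `recip_init`; every later call to `recip_div` with the same d costs a multiply, a subtract, and a compare.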

Most ARM processors have no hardware floating-point support, which saves area and power in price-sensitive embedded applications. With the exception of the Floating Point Accelerator (FPA) on the ARM7500FE and the Vector Floating Point (VFP) hardware, the C compiler must provide floating-point support in software.

Tenth, inline functions and inline assembly

Inline functions can completely eliminate function-call overhead, and many compilers also allow inline assembly in C source code. Inline functions that contain assembly give the compiler access to ARM instructions and optimizations that it does not normally use.

The biggest benefit of inline functions and inline assembly is that they make accessible operations that are otherwise difficult to express in C. Inline functions are preferable to #define macros, because macros do not check the types of the function's parameters and return value.
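As a hedged illustration, the sketch below wraps a saturating addition in a type-checked inline function written in portable C99. The name `sat_add` is hypothetical. On ARM cores with the DSP extensions, the body is the kind of operation that could instead be implemented with a saturating-add instruction through the compiler's inline assembly syntax, which is compiler-specific and not shown here.

```c
#include <stdint.h>

/* Hypothetical example: 32-bit saturating add as an inline function.
 * Unlike a #define macro, the parameter and return types are checked
 * and each argument is evaluated exactly once. */
static inline int32_t sat_add(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* widen so the add cannot overflow */

    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```

Because the function is small and defined in the same translation unit as its callers, a compiler can normally inline it and remove the call overhead entirely.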
