I don't understand why peephole optimization is needed. Isn't the compiler already smart enough to optimize the code? Can you please give me some examples where peephole optimization is needed?
Why is peephole optimization done on assembly code but not on IR code?
Asked by Tauro (309 views)
Peepholes are often target-specific.
They may only make sense in terms of target registers (RTL), not IR.
For example, on x86: xor eax, eax instead of mov eax, 0. (What is the best way to set a register to zero in x86 assembly: xor, mov or and?). There'd be no reason to do this in IR, and doing it any earlier than the last moment (final code generation) would hide the fact that the value is zero from other optimizations. Doing that for any machine except x86 would be an anti-optimization (creating a false dependency). OTOH you don't want to leave it too late, or else you might not be able to reorder it ahead of something that sets FLAGS: xor itself writes FLAGS, so the zeroing has to go before e.g. a cmp whose result a later instruction still reads. Once it is stuck between the two, you are left with mov eax, 0 or some other fallback.
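The FLAGS constraint can be sketched as a toy peephole pass. This is only an illustration, not how GCC or LLVM implement it: the tuple-based instruction format, the `WRITES_FLAGS` / `READS_FLAGS` tables, and the function names are all assumptions made up for this sketch, and it only handles straight-line code.

```python
# Toy peephole: rewrite "mov reg, 0" into "xor reg, reg" only when FLAGS are dead.
# Instructions are hypothetical (opcode, operand, ...) tuples of strings.

# Simplified, incomplete tables (assumptions for this sketch).
WRITES_FLAGS = {"cmp", "test", "add", "sub", "xor"}
READS_FLAGS = {"sete", "setne", "cmove", "adc"}


def flags_live_after(code, i):
    """True if a later instruction reads FLAGS before anything overwrites them."""
    for op, *_ in code[i + 1:]:
        if op in READS_FLAGS:
            return True
        if op in WRITES_FLAGS:
            return False
    return False  # assumption: FLAGS are dead at the end of the block


def zeroing_peephole(code):
    """Return a new instruction list with safe mov-zero -> xor-zero rewrites."""
    out = []
    for i, insn in enumerate(code):
        op, *args = insn
        is_mov_zero = op == "mov" and len(args) == 2 and args[1] == "0"
        if is_mov_zero and not flags_live_after(code, i):
            # xor clobbers FLAGS, which is fine here because nobody reads them.
            out.append(("xor", args[0], args[0]))
        else:
            out.append(insn)
    return out
```

With the zeroing placed before the `cmp`, the rewrite fires; with the zeroing stuck between `cmp` and `sete`, the pass has to leave the `mov` alone, which is the "too late" case described above.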
Or as another example, x86 can multiply by 3, 5, or 9 using LEA, taking advantage of the 2-bit shift count and the add in 2-register addressing modes. It might be useful for an optimizer to know that this is an efficient building block and aim to refactor things into a multiply by 9, but actually converting a multiply by 10 into (x * 5) * 2 is not how you'd want to do it for targets where (x<<3) + (x<<1) is more efficient (x*10 = x*8 + x*2).
See imul vs. 2x lea, and how modern CPUs with fast imul make it only worth it to spend at most 2 instructions replacing a multiply, or only 1 if the bottleneck is throughput, not latency. Unless you can fold an addition into it like LEA can...
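The two ways of lowering a multiply by 10 can be modelled in a few lines. This is a rough sketch only: `lea_scale` is a hypothetical stand-in for an x86 `lea r, [x + x*scale]`, and the `LOWERINGS` table with its target names is invented for illustration, not taken from any real compiler.

```python
def lea_scale(x, scale):
    """Model of x86 `lea r, [x + x*scale]`; the scale factor can only be 1, 2, 4 or 8."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 addressing modes only scale the index by 1, 2, 4 or 8")
    return x + x * scale


def times10_lea(x):
    """x86-style lowering: (x * 5) * 2 using two cheap instructions."""
    t = lea_scale(x, 4)  # lea t, [x + x*4]  -> x * 5
    return t + t         # add t, t          -> x * 10


def times10_shift_add(x):
    """Shift-and-add lowering: x*10 = x*8 + x*2."""
    return (x << 3) + (x << 1)


# Hypothetical per-target choice, made late so the IR can keep a plain "x * 10".
LOWERINGS = {
    "x86": times10_lea,
    "shift_add_target": times10_shift_add,
}
```

Both functions compute the same value; the point is that which one is cheaper depends on the target, so the IR is better off keeping the multiply and letting the back end pick.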