Are you tired of hacking? Take some rest here.
Just help me out with my small experiment regarding memcpy performance.
After that, the flag is yours.
The full code is here (Link). In summary, the program reads 10 numbers, allocates memory of each size, and calls the slow_memcpy() and fast_memcpy() functions for each number to compare copy times. The flag string is at the end of the code.
The problem is that the program never reaches the end of the code because of an error. Debugging shows that the error occurs at the "movntps" instruction in the image above.
"movntps" requires its memory operand to be aligned on 16 bytes. However, the address of "dest" may not be aligned to 16 bytes, depending on the sizes entered. In other words, the operand address must be a multiple of 0x10: 0x~~~10, 0x~~~20, 0x~~~30, etc.
□ gcc -o memcpy memcpy.c -m32 -lm
* -m32 : Compile as a 32-bit binary. It requires the "gcc-multilib" and "libc6-dev-i386" packages.
* -lm : Link against the math library.
□ __asm__ : This indicates that what follows is inline assembly.
□ __volatile__ : The compiler keeps the inline assembly as the programmer wrote it instead of optimizing it away, so no bug is introduced by optimization (removed statements, etc.).
□ movntps : Stores packed single-precision floating-point values using a non-temporal hint
* Single-precision : Computing with the basic operand length of the machine (= word, 4 bytes) ↔ n-precision : computing with n times that length (double, quadruple, etc.).
* Floating-point : Expressing a value as significand * base^exponent
* Non-temporal hint : The store uses write combining and bypasses the cache, which minimizes cache pollution
* The memory operand must be aligned on a 16-byte (128-bit version) or 32-byte (VEX.256 encoded version) boundary, otherwise a general-protection exception (#GP) will be generated. The reference is here (Link).
□ movdqa : Moves an aligned double quadword
* When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary (128-bit version, as used here with xmm registers) or a general-protection exception (#GP) will be generated. The reference is here (Link).
□ 16(%1) : The address held in %1 (=dest) plus 16
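The 16-byte alignment requirement described in the notes above can be checked with a single bit mask. The following is a minimal sketch, not part of the challenge code; the helper name is_aligned and the sample addresses are made up for illustration.

```python
def is_aligned(address: int, boundary: int = 16) -> bool:
    """Return True when address is a multiple of boundary (a power of two)."""
    return address & (boundary - 1) == 0


# Hypothetical "dest" addresses, used only for illustration.
for address in (0x0804C010, 0x0804C018, 0x0804C020):
    status = "aligned" if is_aligned(address) else "not aligned (movntps would fault)"
    print(f"{address:#010x}: {status}")
```

An address passes the check only when its last hex digit is 0, which matches the 0x~~~10, 0x~~~20, 0x~~~30 pattern.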
I downloaded the full code and added the code above to check the "dest" address.
I found the following rules in the 32~64 range.
- Inputting 37~44 → malloc allocates 48 bytes.
- Inputting 45~52 → malloc allocates 56 bytes.
- Inputting 53~60 → malloc allocates 64 bytes.
In other words, malloc rounds "input value + 4" up to the next multiple of 8 bytes.
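The observed rule can be written as a small formula. This is only a model of the measurements listed here (the function name chunk_size is made up); other libc versions or 64-bit builds may round differently.

```python
def chunk_size(requested: int) -> int:
    """Bytes consumed by one allocation under the observed rule:
    round (requested + 4) up to the next multiple of 8."""
    return (requested + 4 + 7) // 8 * 8


# Reproduce the boundaries of the three observed ranges.
for requested in (37, 44, 45, 52, 53, 60):
    print(requested, "->", chunk_size(requested))
```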
Because the address of "dest" must end in 0, like 0x~~~10, 0x~~~20, 0x~~~30, I made the input list as below. Each line shows the address of the current "dest" → the address of the next "dest".
[8~16] 0x~~~08 → 0x~~~20 : 24 bytes are needed (input 13, which becomes 17 bytes)
[16~32] 0x~~~20 → 0x~~~40 : 32 bytes are needed (input 21, which becomes 25 bytes)
[32~64] 0x~~~40 → 0x~~~70 : 48 bytes are needed (input 37, which becomes 41 bytes)
[64~128] 0x~~~70 → 0x~~~C0 : 80 bytes are needed (input 69, which becomes 73 bytes)
[128~256] 0x~~~C0 → 0x~~~150 : 144 bytes are needed (input 133, which becomes 137 bytes)
13 → 21 → 37 → 69 → 133 → (next = previous*2 - 5) → 261 → 517 → 1029 → 2053 → 4101
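The list above can be replayed with a short simulation. This sketch assumes that consecutive allocations sit next to each other on the heap, that the first "dest" ends in 0x08, and that each allocation consumes "input + 4" rounded up to a multiple of 8. The function names are made up for illustration.

```python
def chunk_size(requested: int) -> int:
    """Observed rule: round (requested + 4) up to the next multiple of 8."""
    return (requested + 4 + 7) // 8 * 8


def input_sequence(first: int = 13, count: int = 10) -> list:
    """Build the inputs with next = previous * 2 - 5."""
    values = [first]
    while len(values) < count:
        values.append(values[-1] * 2 - 5)
    return values


def dest_offsets(sizes, start: int = 0x08) -> list:
    """Low-order part of each "dest" address, assuming adjacent allocations."""
    offsets = []
    offset = start
    for size in sizes:
        offsets.append(offset)
        offset += chunk_size(size)
    return offsets


sizes = input_sequence()
for size, offset in zip(sizes, dest_offsets(sizes)):
    print(f"input {size:5d} -> dest 0x~~~{offset:03X}, 16-byte aligned: {offset % 16 == 0}")
```

Under these assumptions only the first "dest" (input 13) is unaligned, and every input of 64 or more lands on a 16-byte boundary.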
In fact, for values less than 64 bytes the memory address does not need to be calculated because "slow_memcpy()" is executed, but I calculated it to derive the formula.
My Linux allocates memory starting at "0x~~~08", but pwnable.kr may allocate at a different address, so the error may still occur. In that case, modify the values below 64. This gradually shifts all of the following allocations.
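One way to find a replacement value is a small search. This sketch is only a model, not something verified against the pwnable.kr server. It changes just the first input, keeps the other nine inputs fixed, and reuses the "input + 4 rounded up to a multiple of 8" rule with adjacent allocations. The function names and the example start offsets are assumptions.

```python
def chunk_size(requested: int) -> int:
    """Observed rule: round (requested + 4) up to the next multiple of 8."""
    return (requested + 4 + 7) // 8 * 8


# The nine inputs after the first one, kept fixed in this sketch.
TAIL = [21, 37, 69, 133, 261, 517, 1029, 2053, 4101]


def first_input_for(start_offset: int, candidates=range(8, 17)):
    """Return the smallest first input (8~16) for which every input of 64 or
    more gets a 16-byte aligned "dest", or None if no candidate works."""
    for first in candidates:
        offset = start_offset
        aligned = True
        for size in [first] + TAIL:
            if size >= 64 and offset % 16 != 0:
                aligned = False
                break
            offset += chunk_size(size)
        if aligned:
            return first
    return None


print(first_input_for(0x08))  # first "dest" ends in 0x08, as on my Linux
print(first_input_for(0x00))  # hypothetical case: first "dest" ends in 0x00
```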