I have a working app which I need to speed up. I set up profiling (see here for details), which appears to report how much time each function takes, but I cannot find a way to discover anything about the time consumed in different sub-parts of functions.
I then inserted the keyword "inline" in the declarations of some frequently accessed small functions hoping for some speedup. But when I profiled again, I saw the same list of functions, including the ones I'd made inline. This made me suspicious as to whether the inline keyword had just been ignored.
I have a vague recollection that with some compilers the inline keyword was something that the compiler could optionally ignore, depending on things like the amount of memory available.
So is there some check I could do to confirm whether or not the "inline" keyword has actually done its job?
You could try:
examining the compiler's assembly or machine code output (whether disassembling or just checking for the function symbol with nm or whatever Android has), or stepping through with a debugger
using a compiler pragma/attribute to force inlining (if available, for example GCC has a function attribute always_inline), if your profiling results aren't affected then presumably the compiler was already inlining
checking your profiling docs to make sure that however you're doing profiling doesn't inhibit inlining
As you recalled, inline (and member functions defined inside their class, which are implicitly inline) are just hints for the compiler. Some people argue they're just convenient ways to manage One Definition Rule issues, but you'd have to check individual C++ compilers' code to see if the keyword is really that meaningless these days. The compiler might use all sorts of metrics to work out when to inline, including the optimisation flags in effect, the size of the out-of-line function, the number of calls to the function (e.g. if there's only one, why not inline even a large function?) etc.
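For a concrete check, here is a minimal sketch (assuming GCC or Clang; the helper name is made up) combining the two ideas above: force inlining with the always_inline attribute, then look for the symbol in the object file:

#include <stdio.h>

/* Hypothetical small helper; always_inline asks GCC/Clang to inline
   every call and to complain if that is not possible. */
static inline __attribute__((always_inline)) int add_one(int x)
{
    return x + 1;
}

int main(void)
{
    int total = 0;
    for (int i = 0; i < 1000; ++i)
        total = add_one(total);   /* call should disappear into the loop body */
    printf("%d\n", total);
    return 0;
}

After compiling with optimisation (e.g. gcc -O2 -c example.c), nm example.o showing no add_one symbol (and a debugger stepping straight over the call) suggests the call really was inlined; if your profiling numbers don't change either way, the compiler was probably inlining already.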
I am currently reading a research paper on obfuscation. Here's the portion of the paper that relates to my question.
"while current obfuscation schemes elevate some islets of static analysis, such as changing layout of the source code, changing control flow, and modifying data, they are easily exposed to reverse engineering analysis due to a lack of API concealment. Therefore, a quantitative evaluation scheme is needed to ensure that obfuscation is applied to an appropriate API with an adequate degree of resistance to reverse engineering."
"Obfuscating the API" probably means changing the names of identifiers, like class names, method names, field names, etc., to very undescriptive names, so that readers of your code wouldn't know what it is doing.
Proguard is such a tool. Here is a post I found that shows Proguard being used to obfuscate the private methods of a simple class. You can see how privateStaticMethod turned into a, and how the parameter names turned into paramString1 and paramString2.
By doing so, readers won't know what a does just by looking, because a tells them literally nothing about what the method actually does. The methods that a calls might also be obfuscated as b or c, which makes it even harder to know what your code is doing.
Reverse-engineering here refers to trying to figure out how obfuscated code looked originally. Obviously, changing the names of methods and parameters makes it harder to reverse-engineer than just changing control flow and code layout.
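As a rough, made-up illustration of what that renaming amounts to (a generic sketch in C, not actual Proguard output, which operates on Java classes):

/* Before obfuscation: the names describe the intent. */
int is_license_valid(const char *license_key, long expiry_timestamp);

/* After identifier obfuscation: same behaviour, names reveal nothing. */
int a(const char *s1, long l1);

The logic is untouched; only the names a reader would rely on are gone.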
I have a C library which I'm cross-compiling to use in Android & iOS apps.
It makes use of memcpy() and mktime() so I want to know if these functions are implicitly thread-safe when used in multi-threaded environments.
iOS apps compiled with modern Xcode and Android libraries compiled with a modern Android NDK both use Clang, an LLVM-based compiler.
I've reviewed the following questions, but have been unable to find a definitive answer:
Is memcpy process-safe?
Are functions in the C standard library thread safe?
POSIX requires of conforming implementations that all functions it standardizes be thread safe, with the exception of a relatively short list of functions. memcpy() and mktime() are both covered by POSIX, and neither is on the list of exceptions, so POSIX requires them to be thread safe (but read on).
Note well, however, that this is not a matter of the compiler used, but rather of the C library that supports your application. I recall Apple's C libraries being non-conforming in some areas. Nevertheless, there's nothing in particular about memcpy() and mktime() that makes them inherently risky from a thread safety perspective. That is, there's no reason to expect that they access any shared data, except any provided to them via their arguments.
And there's the rub. You can rely on memcpy() and mktime() not to, say, rely internally on static data, but POSIX's requirement for thread safety does not extend to working as documented in the face of data races you create through choice of arguments. Thus, for example, if two different threads call memcpy(), and the target region of one call overlaps either the source or target region of the other, then you need some flavor of synchronization between the threads.
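As a small sketch of that last point (hypothetical buffer and function names, assuming POSIX threads), two threads copying into overlapping regions of the same buffer need synchronization that you supply yourself:

#include <pthread.h>
#include <string.h>

static char shared_buf[256];                 /* shared copy target */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/* If two threads call this with overlapping [off, off+len) ranges, the
   lock is what keeps the result well defined; memcpy() alone offers no
   such guarantee. */
void copy_into_shared(const char *src, size_t off, size_t len)
{
    pthread_mutex_lock(&buf_lock);
    memcpy(shared_buf + off, src, len);
    pthread_mutex_unlock(&buf_lock);
}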
Whether memcpy() is thread-safe at all might be debatable.
I would say that memcpy() is indeed thread-safe: it doesn't rely on any (global) state that could be corrupted by multiple instances of memcpy() running at once. That doesn't mean, however, that there is some magic preventing a memory area which is concurrently the copy destination of several threads calling memcpy() from getting badly mangled; the copy process as a whole is not atomic. You would have to ensure atomicity yourself, e.g. with mutexes.
mktime() is trivially thread-safe, since it doesn't use static buffers, global state or the like. The man page mentions a few functions from that family that are not thread-safe (those have corresponding *_r functions), but mktime() is not among them.
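For example, a minimal sketch of the distinction (error handling omitted): the non-reentrant members of that family have *_r counterparts that write into a caller-supplied buffer, and mktime() itself only works on the struct tm you pass in:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now = time(NULL);

    struct tm tm_buf;                  /* caller-owned, no shared static buffer */
    localtime_r(&now, &tm_buf);        /* reentrant variant of localtime() */

    tm_buf.tm_mday += 1;               /* e.g. "same time tomorrow" */
    time_t tomorrow = mktime(&tm_buf); /* operates on the struct passed in */

    printf("%ld\n", (long)tomorrow);
    return 0;
}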
On Windows, we can call MyThread.WaitFor several times on the same thread; if the thread has already terminated there is no problem, the call will not raise any exception and returns immediately (normal behaviour).
On Android it's different: if we call MyThread.WaitFor twice, we get an exception on the second try with "No such process".
function TThread.WaitFor: LongWord;
{$ELSEIF Defined(POSIX)}
var
  X: Pointer;
  ID: pthread_t;
begin
  if FExternalThread then
    raise EThread.CreateRes(@SThreadExternalWait);
  ID := pthread_t(FThreadID);
  if CurrentThread.ThreadID = MainThreadID then
    while not FFinished do
      CheckSynchronize(1000);
  FThreadID := 0;
  X := @Result;
  CheckThreadError(pthread_join(ID, X));
end;
{$ENDIF POSIX}
The error arises because WaitFor sets FThreadID := 0, so of course any further call will fail.
I think it should be written like:
function TThread.WaitFor: LongWord;
{$ELSEIF Defined(POSIX)}
begin
  if FThreadID = 0 then exit;
  ...
end;
{$ENDIF POSIX}
What do you think? Should I open a bug request with Embarcadero?
The documentation for pthread_join says:
Joining with a thread that has previously been joined results in undefined behavior.
This explains why TThread takes steps to avoid invoking undefined behavior.
Is there a defect in the design? That's debatable. If we are going to consider the design of this class, let's broaden the discussion, as the designers must. A Windows thread can be waited on by multiple different threads. That's not the case for pthreads. The linked documentation also says:
If multiple threads simultaneously try to join with the same thread, the results are undefined.
So I don't think Embarcadero could reasonably implement the same behaviour on Posix platforms as already exists on Windows. For sure they could special case repeated waits from the same thread, as you describe. Well, they'd have to persist the thread return value so that WaitFor could return it. But that would only get you part way there, and wouldn't be very useful anyway. After all, why would you wait again from the same thread?
I suspect that FThreadID is set to 0 in an effort to avoid the undefined behaviour and fail in a more robust way. However, if multiple threads call WaitFor then there is a data race so undefined behaviour is still possible.
If we were trying to be charitable then we could view setting FThreadID to 0 as an attempt to fail in a predictable way rather than as a complete defence against misuse.
Leaving those specific details to one side, it is clear that if WaitFor is implemented by calling pthread_join then differing behaviour across platforms is inevitable. Embarcadero have tried to align the TThread implementations for each platform, but they cannot be perfectly equivalent because the platform functionality differs. Windows offers a richer set of threading primitives than pthreads.
If Embarcadero had chosen a different path they could have aligned the platforms perfectly but would have needed to work much harder on Posix. It is possible to replicate the Windows behaviour there, but this particular method would have to be implemented with something other than pthread_join.
Facing reality though, I think you will have to adapt to the different functionality of pthreads. In pthreads, the ability to wait on a thread is included merely as a convenience. You would do better to wait on an event or a condition variable instead, if you really do want to support repeated waits. On the other hand, you might simply rearrange your code to ensure you only wait once.
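As a rough sketch of that alternative in plain C (POSIX threads, made-up names), a finished flag plus a condition variable supports waiting any number of times, from any number of threads:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cond = PTHREAD_COND_INITIALIZER;
static bool            done      = false;

/* Called exactly once by the worker thread just before it exits. */
void signal_finished(void)
{
    pthread_mutex_lock(&done_lock);
    done = true;
    pthread_cond_broadcast(&done_cond);   /* wake every waiter */
    pthread_mutex_unlock(&done_lock);
}

/* May be called repeatedly, by any number of threads. */
void wait_for_finished(void)
{
    pthread_mutex_lock(&done_lock);
    while (!done)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);
}

The Delphi equivalent would be an event (e.g. TEvent) that the thread sets when it finishes, which any caller can then wait on as often as it likes.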
So, to summarise, you should probably raise an issue with Embarcadero, if there isn't one already. It is possible that they might consider supporting your scenario. And it's worth having an issue in the system. But don't be surprised if they choose to do nothing and justify that because of the wider platform differences that cannot be surmounted, and the extra complexity needed in the class to support your somewhat pointless use case. One thing I expect we can all agree on though is that the Delphi documentation for TThread.WaitFor should cover these issues.
So I'm trying to write some low-level code for Android, and my main concern is that I want to avoid ALL optimization by the JIT compiler (or anything else). After doing some research, the best approach seems to be to:
write Java bytecode by hand
convert it to a dex file using the "dx" command
run it on the device using the "dalvikvm" command (via adb shell) with the "-Xverify:none -Xdexopt:none" parameters specified
My question is: will this in fact avoid ALL optimization? The previous discussion here https://groups.google.com/forum/#!topic/android-platform/Y-pzP9z6xLw makes me unsure, and I can't 100% convince myself by reading the docs.
Any confirmation one way or the other is greatly appreciated.
Some of the instruction rewriting performed by dexopt cannot be disabled. For example, accesses to volatile long fields must be handled differently from access to long fields, and the specialization is handled by replacing the field-get instruction with a different instruction.
The optimizations performed by dexopt take the form of instruction replacement, usually some sort of "quickening" that allows the VM to do a little less work. All such optimizations are performed statically, ahead of time, not dynamically at run time, so you will get consistent behavior. Enabling the dexopt optimizations doesn't introduce unknowns, it just changes from one set of knowns to a different set of knowns.
The biggest source of variation is going to be Dalvik's JIT compiler, which you can disable with -Xint:fast. See this slightly outdated doc for notes on how to configure this system-wide.
I have an Android project with a native component. I'm using a third-party library where I suspect there's a bug with an uninitialized or un-reset variable. The same sequence of calls (which should be equivalent according to the interface definition) yields different results.
I've got the sources to the library, but I don't want to dig deep in them (it's really big and convoluted). Is there a way to leverage something like GDB to compare two runs of a piece of code - see if the variable state diverges at any point? It should not - the code is completely in-memory, no I/O or randomness there.