I am trying to decide how to design my app.
I have about 300 instances of a class just like this:
public class ParamValue {
protected String sValue = null;
protected short shValue = 0;
protected short mode = PARAM_VALUE_MODE_UNKNOWN;
/*
* ...
*/
}
I have an array of these instances. I can't find out whether these shorts really take 2 bytes each, or whether they take 4 bytes anyway.
And I need to pass the list of these objects via AIDL as a List&lt;Parcelable&gt;. Parcel has no readShort() or writeShort(); it can only work with int. So, to keep using short here too, I would have to manually pack my two shorts into one int, parcel it, and then unpack them again. That looks too obtrusive.
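For what it's worth, the packing round trip is only a couple of bit operations. A minimal sketch (the class and method names here are mine, not part of the Parcel API):

```java
// Hypothetical helper: pack two shorts into one int for Parcel I/O.
// One side calls parcel.writeInt(pack(...)); the other side unpacks
// the value returned by parcel.readInt().
class ShortPacking {
    static int pack(short hi, short lo) {
        // Mask to 16 bits before shifting so a negative 'lo' doesn't
        // smear sign bits over the high half.
        return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF);
    }
    static short unpackHigh(int packed) {
        return (short) (packed >>> 16);
    }
    static short unpackLow(int packed) {
        return (short) packed;
    }
}
```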
Could you please tell me how many bytes a short takes, and whether it makes sense to use short instead of int here?
UPDATE:
I updated my question for future readers.
So, I wrote a test app and found that in my case there's absolutely no reason to use short, because a short field takes the same space as an int field. But if I define an array of shorts like this:
protected short[] myValues = new short[2];
then it takes less space than an array of ints:
protected int[] myValues = new int[2];
Technically, in the Java language, a short is 2 bytes. Inside the JVM, though, short is a storage type, not a full-fledged primitive data type like int, float, or double. JVM registers always hold 4 bytes at a time; there are no half-word or byte registers. Whether the JVM ever actually stores a short in two bytes inside an object, or whether it is always stored as 4 bytes, is really up to the implementation.
This all holds for a "real" JVM. Does Dalvik do things differently? Dunno.
According to the Java Virtual Machine Specification, Sec. 2.4.1, a short is always exactly two bytes.
The Java Native Interface allows direct access from native code to arrays of primitives stored in the VM. A similar thing can happen in Android's JNI. This pretty much guarantees that a Java short[] will be an array of 2-byte values in either a JVM-compliant environment or in a Dalvik virtual machine.
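The "storage type" distinction is visible in the language itself: arithmetic on shorts is performed in int, so the result has to be cast back down. A small illustration:

```java
class ShortPromotion {
    static short add(short a, short b) {
        // a + b is promoted to int; without the cast this would not compile.
        return (short) (a + b);
    }
}
```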
Related
I recently had the task of performing a cross-selection operation on some collections, to find an output collection matching my criteria. (I will omit the custom logic because it is not needed.)
What I did was create a class that takes lists of elements as parameters, and then call a function inside that class that is responsible for processing those lists of data and returning a value.
Point is, I'm convinced I'm not doing the right thing, because writing a class holding hundreds of elements, taking lists as parameters, and returning another collection looks unconventional and awkward.
Is there a specific programming object or paradigm that allows you to process large numbers of large collections, maybe with a quite heavy custom selection/mapping logic?
I'm building for Android using Kotlin
First of all, when we talk about performance, there is only one right answer - write a benchmark and test.
About memory: a list of 1,000,000 unique Strings with an average size of 30 chars will take about 120 MB (i.e. 10^6 * 30 * 4, where the last factor is the assumed size of a character - let's say each is a 4-byte Unicode character). And please add 1-3% for overhead, such as object references. Therefore: if you only have hundreds of Strings, just load the whole data set into memory and use a list, because this is the fastest solution (synchronous, immutable, etc.).
If you can do streaming-like operations, you can use sequences. They are lazy, similar to Java Streams and .NET LINQ. Please check the example below; it requires a small amount of memory.
fun countOfEqualLinesOnTheSamePositions(path1: String, path2: String): Int {
    return File(path1).useLines { lines1 ->
        File(path2).useLines { lines2 ->
            lines1.zip(lines2)
                .count { (line1, line2) -> line1 == line2 }
        }
    }
}
If you can't store the whole data set in memory and you can't work with a stream-like schema, you may:
Rework the algorithm from single-pass to multiple-pass, where each pass is stream-like. For example, Huffman coding is a two-pass algorithm, so it can be used to compress 1 TB of data using a small amount of memory.
Store intermediate data on disk (this is more complex than this short answer can cover).
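The multiple-pass idea can be sketched like this (assuming the data source can be re-read for each pass, e.g. a file that is reopened):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPass {
    // Pass 1: stream through the data once, keeping only small counters.
    static Map<String, Integer> countFrequencies(Iterable<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) freq.merge(w, 1, Integer::sum);
        return freq;
    }

    // Pass 2: stream through again, using the counters from pass 1;
    // at no point is the whole data set materialized in memory.
    static List<String> keepFrequent(Iterable<String> words,
                                     Map<String, Integer> freq, int min) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (freq.getOrDefault(w, 0) >= min) out.add(w);
        }
        return out;
    }
}
```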
For additional optimizations:
To cover the case of merging a lot of parallel streams, please also consider Kotlin Flow. It allows you to work asynchronously and avoid blocking on IO. For example, this can be useful for merging ~100 network streams.
To keep a lot of non-unique items in memory, please consider caching/interning logic. It can save memory (but please benchmark first).
Try operating on ByteBuffers instead of Strings. You can get far fewer allocations (because you control the buffer's lifetime explicitly), but the code becomes much more complex.
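As a rough illustration of the ByteBuffer point (assuming UTF-8 text): the bytes live in one reusable buffer instead of in many separate String objects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferDemo {
    static String roundTrip(String s) {
        ByteBuffer buf = ByteBuffer.allocate(1024);   // one reusable buffer
        buf.put(s.getBytes(StandardCharsets.UTF_8));  // write bytes in
        buf.flip();                                   // switch to reading
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }
}
```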
I'm trying to wrap my head around the most efficient way to deal with arrays of indeterminate size as outputs of RS kernels. I would send the index of the last relevant array slot in the out allocation, but as I learned in the answer to my previous question, there's no good way to pass a global back to Java after kernel execution. I've decided to "zoom out" the process again, which led me to the pattern below.
For example, let's say we have an input allocation containing a struct (or structs) that contains two arrays of polar coordinates; something like set_pair below:
typedef struct polar_tag{
uint8_t angle;
uint32_t mag;
} polar;
typedef struct polar_set_tag{
uint8_t filled_slots;
polar coordinates[60];
} polar_set;
typedef struct set_pair_tag{
polar_set probe_set;
polar_set candidate_set;
} set_pair;
We want to find similar coordinate pairs between the sets, so we set up a kernel to decide which (if any) of the polar coordinates are similar. If they're similar, we load the pair into an output allocation that looks something like matching_set:
typedef struct matching_pair_tag{
uint8_t probe_index;
uint8_t candidate_index;
} matching_pair;
typedef struct matching_set_tag{
matching_pair pairs[120];
uint8_t filled_slots;
} matching_set;
Is creating allocations with instructions like "filled_slots" the most efficient (or only) way to handle this sort of indeterminate I/O with RS or is there a better way?
I think the way I would try to approach this is to do two passes.
For the 0-2 case:
Setup: for each coordinate, allocate an array to hold the max expected number of pairs (2).
Pass 1: run over coords, look for pairs by comparing the current item to a subset of other coords. Choose subset to avoid duplicate answers when the kernel runs on the other coord being compared.
Pass 2: Merge the results from #1 back into a list or whatever other data structure you want. Could run as an invokable if the number of coordinates is small.
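The "Pass 2" merge could look roughly like this on the Java side (a sketch with made-up names; each row is the fixed-size slot array one coordinate wrote in pass 1, plus its filled count):

```java
import java.util.ArrayList;
import java.util.List;

class MergePass {
    // results[i] is the fixed-size slot array written for coordinate i;
    // filled[i] says how many entries in results[i] are actually valid.
    static List<Integer> merge(int[][] results, int[] filled) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < results.length; i++) {
            for (int j = 0; j < filled[i]; j++) {
                out.add(results[i][j]);
            }
        }
        return out;
    }
}
```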
For the 0-N case:
This gets a lot harder. I'd likely do something similar to what's above but with the per-coord array sized for a typical number of pairs. For the (hopefully small) number of overflows, use atomics to reserve a slot in an overflow buffer. The catch here is I think most GPU drivers would not be very happy with the atomics today. Would run very well on the CPU ref.
There are a lot of ways to go about this. One important decision point revolves around how expensive the comparison is to find the points vs the cost of writing the result.
I am reading some papers on lock-free doubly linked lists.
In these papers, they store the address of the next or prev node and a flag in one word (int).
Is it because in 32-bit architectures all addresses are aligned on 4-byte boundaries, so every address is a multiple of 4?
And if that is the reason, is this code OK?
const int dMask = 1;
const int pMask = ~dMask;
int store(void* pPointer, bool pDel)
{
return reinterpret_cast<int>(pPointer) | (int)pDel;
}
void load(int pData, void** pPointer, bool* pDel)
{
*pPointer = reinterpret_cast<void*>(pData & pMask);
*pDel = pData & dMask;
}
And another question: is the above idea also correct on other platforms, such as Android mobile devices?
You're more or less correct. It's a common space optimization in very low level code. It is not portable. (You could make it slightly more portable by using intptr_t instead of int.)
Also, of course, the alignment only holds for pointers to more complex types; a char* won't necessarily be aligned. (The only times I've seen this used is in the implementation of memory management, where all of the blocks involved are required to be aligned sufficiently for any type.)
Finally, I'm not sure what the authors of the paper are trying to do, but the code you post cannot be used in a multithreaded environment, at least on modern machines. To ensure that a modification in one thread is seen in another thread, you need to use atomic types, or some sort of fence or membar.
Addresses in most 32-bit architectures are not restricted to 4-byte boundaries; memory is byte-addressable, but it is read in 4-byte (the typical word size) increments. Without seeing the code for this doubly linked list, it sounds like they are enforcing some rules for how the container will store data.
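The tag-bit round trip itself is just masking. Here is the same idea expressed with plain ints in Java (a sketch standing in for the C++ pointer code above, since Java has no raw pointers):

```java
class TaggedWord {
    static final int DEL_MASK = 1;
    static final int PTR_MASK = ~DEL_MASK;

    // addr is assumed 4-byte aligned, so its low bit is free for the flag.
    static int store(int addr, boolean del) {
        return addr | (del ? 1 : 0);
    }
    static int loadAddr(int word) {
        return word & PTR_MASK;
    }
    static boolean loadDel(int word) {
        return (word & DEL_MASK) != 0;
    }
}
```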
I've started to really like using C# and Java enums in my code for several reasons:
They are much more type-safe than integers, strings, or sets of boolean flags.
They lead to more readable code.
It's more difficult to set an enum to an invalid value than an int or string.
They make it easy to discover the allowed values for a variable or parameter.
Everything I've read indicates that they perform just as well as integers in C# and most JVMs.
However, the Android framework has numerous cases where flags of various types need to be passed around, but none of them seem to use enums. A couple of examples where I would think their use would be beneficial are Toast.LENGTH_SHORT / Toast.LENGTH_LONG and View.GONE, View.VISIBLE, etc.
Why is this? Do enums perform worse than simple integer values in Dalvik? Is there some other drawback I'm not aware of?
This answer is out of date as of March 2011.
Enums can be used on Froyo and up - according to this answer (Why was “Avoid Enums Where You Only Need Ints” removed from Android's performance tips?) from a member of the Android VM team (and his blog).
Previous Answer:
The official Android team recommendation is to avoid enums whenever you can avoid it:
Enums are very convenient, but unfortunately can be painful when size and speed matter. For example, this:
public enum Shrubbery { GROUND, CRAWLING, HANGING }
adds 740 bytes to your .dex file compared to the equivalent class with three public static final ints. On first use, the class initializer invokes the <init> method on objects representing each of the enumerated values. Each object gets its own static field, and the full set is stored in an array (a static field called "$VALUES"). That's a lot of code and data, just for three integers. Additionally, this:
Shrubbery shrub = Shrubbery.GROUND;
causes a static field lookup. If "GROUND" were a static final int, the compiler would treat it as a known constant and inline it.
Source: Avoid Enums Where You Only Need Ints
Integers are smaller, and require less overhead, something that still matters on mobile devices.
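The int-constant pattern the framework uses instead looks like this (a sketch modeled on constants like Toast.LENGTH_SHORT; the class name here is mine):

```java
class ToastLike {
    // Equivalent of a two-value enum, at the cost of type safety:
    // any int can be passed where a "duration" is expected.
    public static final int LENGTH_SHORT = 0;
    public static final int LENGTH_LONG = 1;
}
```

On current Android, the androidx @IntDef annotation can restore some compile-time checking for this pattern.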
A colleague of mine performed a small test regarding this situation. He auto-generated a class and an enum with the same number of constants; I believe he generated 30,000 entries.
The results were:
.class for the class was roughly 1200KB
.class for the enum was roughly 800KB
Hope this helps someone.
I'm planning to store my data in a binary format as a resource, read it into an int buffer, and basically pass it straight down to a native C++ function, which might cast it to a struct/class and work with it. No pointers, obviously, just ints and floats.
The question is - what kind of fixing up do I need to do? I suppose that I need to check ByteOrder.nativeOrder(), figure out if it's big endian or little endian, and perform byte-swapping if need be.
Other than that, floats are presumably guaranteed to be expected in IEEE 754 format? Are there any other caveats I'm completely overlooking here?
(Also - since I'm compiling using the NDK, I know what architecture it is already (ARMv7-A, in my case), so can I technically skip the endian shenanigans and just take the data the way it is?)
ARM supports both big and little endian. This will most probably be set by the OS, so it might be worthwhile checking this beforehand.
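On the Java side, ByteBuffer can do the check-and-swap for you. A sketch (assuming you know the byte order the resource was written in):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianRead {
    // Interpret the raw resource bytes using the order they were written
    // in; ByteOrder.nativeOrder() gives the order native code expects.
    static int readInt(byte[] raw, ByteOrder writtenOrder) {
        return ByteBuffer.wrap(raw).order(writtenOrder).getInt();
    }
}
```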
There is also the issue of padding to word size in a struct:
struct st
{
char a;
int b;
};
will have a sizeof of 8, not the expected 5 bytes. This is so that the int will be word aligned. Generally, align everything to 4 bytes, and consider using GCC's packed attribute (struct my_packed_struct __attribute__((__packed__))) as well. This will ensure that the layout of the struct is what you expect.
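If the struct is packed down to 5 bytes, the Java side has to emit exactly those 5 bytes. A sketch (assuming little-endian ARM):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PackedWriter {
    // Mirrors the packed struct: a 1-byte char followed by a 4-byte int,
    // with no padding byte in between.
    static byte[] write(byte a, int b) {
        ByteBuffer buf = ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(a);
        buf.putInt(b);
        return buf.array();
    }
}
```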
Alternatively, use the Android Emulator to generate the data file for you.