How to sufficiently hash an image to avoid collisions?

I want to use hashes to uniquely identify photos from an Android phone, so that I can answer queries such as "does the server have xyz?" and "fetch the image that hashes to xyz". I face these problems:
Hashing the whole image is likely to be slow, so I want to hash only the first few bytes of the image file rather than the whole file.
The first few bytes are insufficient because of composition, e.g. a user takes a photo of a scene, and then takes a second photo of the same scene after adding a paper clip at the bottom of the frame.
The first few bytes are insufficient to avoid hash collisions, i.e. they may cause mix-ups between users.
How many bytes must I hash from the image file to keep the chance of a mishap low? Is there a better indexing scheme?

As soon as you leave any bytes out of the hash, you give someone the opportunity to create (either deliberately or accidentally) a file that differs only at those bytes, and hence hashes the same.
How different this image actually looks from the original depends to some extent on how many bytes you leave out of the hash, and where. But you first have to decide which hash collisions you can tolerate (deliberate/accidental and major/minor); then you can think about how fast a hash function you can use, and how much data you need to include in it.
Unless you're willing to tolerate a "largeish block" of data changing, you need to include bytes from every "largeish block" in the hash. From the point of view of I/O performance this means you need to access pretty much the whole file, since reading even one byte will cause the hardware to read the whole block that contains it.
Probably the thing to do is start with "definitely good enough", such as an SHA-256 hash of the whole file. See how much too slow that is, then think about how to improve performance by the required percentage. For example if it's only 50% too slow you could probably solve the problem with a faster (less secure) hash but still including all the data.
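As a baseline for the "definitely good enough" option, here is a minimal sketch in Python of hashing the whole file with SHA-256, reading it in chunks so that memory use stays flat. The function name and the chunk size are assumptions made for illustration; on Android the same idea would normally be written with java.security.MessageDigest.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the whole file, read chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Timing this on a handful of real photos gives the number to compare any faster scheme against.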
You can work out the limit of how fast you can go with a less secure hash by implementing some completely trivial hash (e.g. XOR of all the 4-byte words in the file), and see how fast that runs. If that's still too slow then you need to give up on accuracy and only hash part of the file (assuming you've already done your best to optimize the I/O).
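The trivial XOR hash mentioned above could look roughly like this. It is a sketch only: the word order (little-endian) and the zero padding of a short tail are arbitrary choices of mine, and in Python the interpreter overhead will dominate, so the timing experiment is only meaningful when the same loop is written in Java or native code on the device.

```python
import struct


def xor32_of_stream(stream, chunk_size=1 << 16):
    """XOR together all little-endian 4-byte words read from a binary stream."""
    acc = 0
    tail = b""  # bytes left over when a read does not end on a word boundary
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        usable = len(data) - len(data) % 4
        for (word,) in struct.iter_unpack("<I", data[:usable]):
            acc ^= word
        tail = data[usable:]
    if tail:
        # Zero-pad the final partial word (an arbitrary convention).
        acc ^= int.from_bytes(tail.ljust(4, b"\x00"), "little")
    return acc
```

Usage would be something like `xor32_of_stream(open(path, "rb"))`. This is useless against deliberate collisions; its only job is to show how fast "touch every byte" can possibly be.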
If you're willing to tolerate collisions, then for most (all?) image formats, there's enough information in the header alone to uniquely identify "normal" photos. This won't protect you against deliberate collisions or against the results of image processing, but barring malice the timestamp, image size, camera model etc, together with even a small amount of image data will in practice uniquely identify every instance of "someone taking a photo of something". So on that basis, you could hash just the first 64-128k of the file (or less, I'm being generous to include the max size of an EXIF header plus some) and have a hash that works for most practical purposes but can be beaten if someone wants to.
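A sketch of the "hash only the header region" idea, assuming a 128 KiB prefix. Mixing the file size into the hash is my own addition rather than something the answer requires; it is free to compute and separates files that happen to share a prefix but differ in length.

```python
import hashlib
import os


def quick_photo_id(path, prefix_bytes=128 * 1024):
    """Hash the file size plus the first prefix_bytes of the file."""
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    digest.update(size.to_bytes(8, "big"))  # assumption: size is part of the ID
    with open(path, "rb") as f:
        digest.update(f.read(prefix_bytes))
    return digest.hexdigest()
```

Two files of the same length that differ only after the prefix will still collide, which is exactly the trade-off described above.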
Btw, unless done deliberately by a seriously competent photographer (or unless the image is post-processed deliberately to achieve this), taking two photos of the same scene with a small difference in the bottom right corner will not result in identical bytes at the beginning of the image data. Not even close, if you're in an environment where you can't control the light. Try it and see. It certainly won't result in an identical file when done with a typical camera that timestamps the image. So the problem is much easier if you're only trying to defend against accidents, than it is if you're trying to defend against deception.

I think the most efficient approach is to pick random bytes (selected in advance and kept static throughout) and calculate an XOR or some other simple hash of them; that should be good enough.
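One way to read this suggestion, as a rough sketch: fix a list of sample positions once (here as fractions of the file length, generated from a hard-coded seed) and hash only the bytes at those positions. The seed, the sample count and the use of SHA-256 instead of XOR are all assumptions of mine. Client and server must use the identical position list, so in practice it would be safer to store the list as a constant than to regenerate it.

```python
import hashlib
import os
import random

# Fixed once and shared by every client and the server (hypothetical values).
_rng = random.Random(12345)
SAMPLE_FRACTIONS = sorted(_rng.random() for _ in range(64))


def sampled_hash(path, fractions=SAMPLE_FRACTIONS):
    """Hash the file size plus one byte at each pre-selected relative position."""
    size = os.path.getsize(path)
    digest = hashlib.sha256(size.to_bytes(8, "big"))
    with open(path, "rb") as f:
        for frac in fractions:
            f.seek(int(frac * size))  # frac < 1.0, so the offset stays in range
            digest.update(f.read(1))
    return digest.hexdigest()
```

Keep in mind the point made in the other answer: each seek still makes the storage layer read a whole block, so 64 scattered bytes are not 64 bytes' worth of I/O.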

Related

Android - Encode info in image pixels

I have an image processing app, and I have to remember which photos have already been processed. In order to do so, I store EXIF metadata in the image file, but if the user decides to send the image over WhatsApp, etc., the metadata is lost, so I can't rely on this.
So I thought a solution would be to use some pixels in the image to remember whether the photo was processed or not: something like, if this pixel has this color and that other pixel has that other color, then it was processed.
I have tried to do so, but if the image gets scaled down, then my approach won't work.
Any ideas on how to do it?
Thank you.

This is a quite complicated topic and there's no silver bullet that'll help you. One pitfall is the fact that most image compression algorithms try to throw away data our eyes cannot see, and your watermarking algorithm would then have to make modifications that are visible to the human eye, which is probably not what you want.
Another thing to note is whatever you do, if you go with the route of storing data using pixels, you're going to have false positives.
What you could do, since your app already does image processing, is to embed your application's logo as a watermark into the image somewhere. Then you can attempt to find distinct features of your logo in the image, which is admittedly not very easy, but neither is the problem.
Again, this problem is already quite complicated so I'm sorry if this answer isn't adequate.

storing information in png and jpg

I have found a number of resources but nothing that has quite helped me with what I am looking for. I am trying to understand the .png and .jpg file formats well enough to be able to modify and/or read the exif or other meta data in the files, or to create my own meta data if possible.
I want to do this in the context of an Android application, so we can keep it there, but it is really not exclusive to that. I am trying to figure out how to do this using a simple input stream or byte array and go from there.
Android itself has to at least extract the RGB pixel information at some point when it creates a bitmap from the stream. I took a look at the BitmapFactory source to try and understand it, but I got lost somewhere after delving into the native files.
I assume the bitmaps are losing any exif/meta data in the files, based on my research. So I guess I want to break the input streams down into byte arrays and remove meta data. In .pngs I know there is no 'standard', but based on this page it seems there is some organization of the meta data you can store.
With all that said, I wouldn't mind just leaving exif/png standards behind and trying to store my own information in some sort of standardized way, but I need to know more about how the image readers identify the files as either jpg, png, etc., and then determine where the pixel information is located.
So I guess my first question is, has anyone done something similar to this before so that they can fill me in? If not, does anyone know of any good libraries that might be good for educational purposes in figuring out how to locate and extract this data?
Or even more basically, what is a good way to find meta data and/or the exif standard or even the rgb data programmatically using something like a byte array?

There are a lot of things to address in your question, but first I should clarify that when you say "Android itself has to at least extract the RGB pixel information," what you're referring to is the act of decompression, which is complicated in the case of JPEG, and non-trivial even for PNG. I think it would be very useful for you to read through the Wikipedia articles for JPEG and PNG before attempting to go any further (especially the sections on header, syntax, file structure, etc.).
That being said, you've got the right idea. It shouldn't be too difficult to read in the header of an image as a byte array/stream, make some changes, and replace the old file. A PNG file can be identified by the first 8 bytes, and there should be a similar way to identify a JPEG - I can't remember off the top of my head.
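To make the identification step concrete: a PNG file starts with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A, and a JPEG file starts with the two-byte SOI marker FF D8. A minimal sketch of sniffing the type from the first bytes (the function name is made up for illustration):

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG signature
JPEG_SOI = b"\xff\xd8"                # JPEG "start of image" marker


def sniff_image_type(header: bytes):
    """Return "png", "jpeg" or None based on the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SOI):
        return "jpeg"
    return None
```

Reading the first 8 bytes of the stream is enough to call it, e.g. `sniff_image_type(stream.read(8))`.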
To modify PNG meta data, you'll have to understand "chunks" - types/names, ordering, format, CRC, etc. The libpng website has some good resources for this, here's general PNG info, as well as chunk specifications. Make sure you don't forget to recalculate the CRC if you change anything.
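For illustration, here is a small sketch of the chunk layout described above: each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data (not the length). It walks the chunks, checks each CRC, and inserts a tEXt chunk just before IEND. The helper names are mine, and error handling for truncated files is left out.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        yield chunk_type, data, crc == zlib.crc32(chunk_type + data) & 0xFFFFFFFF
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text_chunk(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of the PNG with a tEXt chunk inserted before IEND."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    out = [PNG_SIGNATURE]
    for chunk_type, data, _ in iter_chunks(png):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"tEXt", payload))
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)
```

Because every chunk is re-serialized through `make_chunk`, the CRCs are recalculated automatically, which covers the warning about not forgetting the CRC.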
JPEG sections off a file using "markers," which are two bytes long and always start with FF. Exif is just a regular JPEG file with a more specific structure for meta data, and this seems like a reasonable introduction: Exif/TIFF
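A rough sketch of walking those markers: after the SOI marker (FF D8), each header segment is FF, a marker byte, and a 2-byte big-endian length that counts itself but not the marker. Exif data normally sits in an APP1 segment (FF E1) whose payload begins with "Exif" followed by two zero bytes. The function names are mine, and the sketch deliberately ignores fill bytes and stops at the first SOS marker (FF DA), where the compressed image data begins.

```python
import struct


def iter_jpeg_segments(jpeg: bytes):
    """Yield (marker_byte, payload) for header segments up to and including SOS."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        (length,) = struct.unpack_from(">H", jpeg, pos + 2)
        yield marker, jpeg[pos + 4:pos + 2 + length]
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            break
        pos += 2 + length


def find_exif(jpeg: bytes):
    """Return the TIFF-structured Exif block, or None if there is none."""
    for marker, payload in iter_jpeg_segments(jpeg):
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
    return None
```

What `find_exif` returns is the TIFF structure itself, so parsing individual tags is a separate step.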
There are probably libraries for Android/Java that conveniently take care of this for you, but I've never used any myself. A quick google search turns up this, and I'm sure there are many other options if you don't want to take the time to write a parser yourself.

How can I make it harder for players to hack game level data?

Note: this is in relation to android specifically, but the best answer might not be platform specific, hence the other tags.
Consider a game similar to angry birds: you have a bunch of levels. Each time you finish a level, the next level is available for play, but not before. How can I make it harder for players to hack my game files and unlock levels that shouldn't be available? Assume that progression data is stored locally.
My thoughts:
On android, all app files are stored in a folder that the user can only access if they have root access (by default, they never do, but it's usually very easy to get as long as you google a little). Right now, I am using an sqlite database that looks something like this:
LevelId = pk | UnlockStatus = int, 0 = locked, 1 = unlocked, 2 = completed with 1 star, ...
This is fine as long as the user doesn't have root or is not at all familiar with where app files are stored. If they have root however, this file is very easy to edit.
As far as I can tell, angry birds stores its level data in a .lua file, at least according to its name. I can find no text file or db file that contains level info. Opening this .lua in a text editor displays nothing but gibberish. I haven't tried a hex editor.
Using an sql table is very convenient. Is there an easy way to store the progression data in the sql table such that the user will have a harder time making sense of it? Ideally, it should also not be too time-consuming to implement. Being an offline game, I don't care THAT much if the player hacks it or not, so I'm looking for the best quality - implementation time trade-off. Theoretical answers that yield a lot more implementation time for considerably more quality are also appreciated however.

Your best bet would be saving data using some sort of encryption. In Android, SQLite doesn't offer encryption at the database level. However, you may encrypt the values (records) in the table and decrypt them after querying.
Another way could be saving your data as key/value pairs in some sort of text file (like the .lua example in Angry Birds) in internal or external memory and encrypting the file contents. Then decrypt it at run time and read your key/value pairs.
Tadaaa! problemo solved :)

Your data (game binaries/level data/high-score tables etc.) which is stored locally on the device (or eventually remotely) will always be accessible (and decodable) to someone who really wants to hack it and knows how to do it.
Each layer of security you add will only make it more difficult (and thus more time-consuming) for the hacker. Sometimes adding an additional layer of security even takes more time to implement on your side than it takes the hacker to understand it. (There are famous examples in computing history, one is the XBox IIRC).
Security by obscurity is what you are attempting when you encrypt the data in your case. This is not sufficient in the long term, especially when your project meets a big audience.

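A simple method: store different level data for each player, by appending a checksum that depends on the player's name. Note that this detects tampering rather than encrypting anything.
The code is only a few lines (Python here):
import hashlib
def md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()
# Save: append a checksum bound to the game and the player.
def save_level(playername, level_txt):
    return level_txt + "#" + md5("gamename" + playername + level_txt)
# Load: return the level data only if the checksum still matches.
def load_level(playername, saved):
    level_plaintxt, _, md5_level = saved.rpartition("#")
    if md5_level == md5("gamename" + playername + level_plaintxt):
        return level_plaintxt
    return None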
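The checksum scheme above only holds as long as nobody works out the recipe. A slightly sturdier variant of the same idea, as a sketch, uses HMAC-SHA256 with a secret key. The key value and function names here are hypothetical, and the key still ships inside the app, so this only raises the bar for casual editing of the save data.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-app-secret"  # hypothetical; still extractable from the APK


def _tag(player: str, level_txt: str) -> str:
    msg = (player + "\n" + level_txt).encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def sign_level(player: str, level_txt: str) -> str:
    """Append a keyed tag that binds the level data to this player."""
    return level_txt + "#" + _tag(player, level_txt)


def verify_level(player: str, saved: str):
    """Return the level data if the tag matches, otherwise None."""
    level_txt, _, tag = saved.rpartition("#")
    if hmac.compare_digest(tag, _tag(player, level_txt)):
        return level_txt
    return None
```

A save copied from another player, or one edited by hand, fails verification and can be treated as "no progress".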

Base64 Encoded images on Android/iOS

I'm looking for a way of obfuscating the images I store in my application and am currently considering Base64 encoding.
I need something with minimal overhead or, if possible, a performance boost over standard files on the file system.
Can someone comment on the feasibility of Base64-encoding the images (png) and subsequently using (decoding?) them on the target platforms?
Thanks.

What sort of attack are you trying to protect against? Base64 is reasonably easily recognizable and has a potentially-significant impact in terms of space (each image will take an extra 33% space).
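The 33% figure follows from how Base64 works: every 3 input bytes become 4 output characters, so the encoded size is 4 * ceil(n / 3). A quick sketch that checks this (the helper name is mine):

```python
import base64


def base64_size(raw_len: int) -> int:
    """Length of the padded Base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


raw = bytes(range(256)) * 12          # 3072 bytes of sample data
encoded = base64.b64encode(raw)       # 4096 characters
overhead = len(encoded) / len(raw) - 1
```

Here `overhead` comes out at one third, before counting any line breaks an encoder might add.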
Some sort of shifting XOR would be harder to spot just from the data, but it wouldn't be adequate protection for really significant assets.

I am sure you understand Base64 won't fool anyone who really wants to get your bitmap.
Jon Skeet is right: Base64 is nice for encoding binary data in a readable format but will not really help you here. An XOR against a password of yours will be faster, and won't add any size overhead.
If you really want to obfuscate your bitmaps, I suggest you store them in the "raw" resources folder. By doing this you will be able to keep the nice Android abstraction that handles different form factors (ldpi, hdpi, ...).
Extend the ImageView class to work directly with the R.raw.filename id and do the file reading, stream decoding, and bitmap creation there. By doing so, you will be able to roll back easily to the standard way of doing things if needed.
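A minimal sketch of the XOR-against-a-password idea. The same function both obfuscates and restores the data, and the output is exactly as long as the input. The function name and the sample key in the usage note are placeholders, and this is obfuscation only, not encryption.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

For example, `xor_with_key(png_bytes, b"secret")` hides the PNG signature, and applying the same call to the result gives the original bytes back.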

Be warned that you could run into memory issues when storing multiple bitmaps within an application memory in Android. OutOfMemoryErrors seem to be a recurring problem when dealing with bitmaps in android. Here is an example: outofmemoryerror-bitmap-size-exceeds-vm-budget-android

Please suggest a good way to protect application files in android 1.6?

I'm designing a big Android application, where there are XMLs to store temporary data, images captured by the camera, and other details. What is the best way to protect them from outside access, from the phone or from a PC? XMLs can be encrypted, and images too; however, there are times when they need to be accessed very often, and encrypting/decrypting is a very heavy operation. XML encryption is manageable, but images cause memory problems. Is there any alternative way, something at folder level?

Ok, so the "enemy" is the malicious user? If that's the case, there is very little you can do, especially on a root-ed phone. Essentially, since your application is the guest here, you can't really prevent your host from kicking you out.
However, there are a few things you can do to deter them from doing so. You can encrypt the XML and image, but as Macarse raised, the decryption key would have to be in the apk itself, or, if you contact a server to get the decryption key, it is possible for an advanced attacker to spoof a request which your server wouldn't be able to distinguish from real key requests. I'd advise against asking the server; it's too much hassle for little gain.
Another thing you can do is to devise a proprietary image format, so that no standard image editing tools can edit the image. However, an advanced attacker could still reverse engineer your image format and write a converter to a standard image format.
The third and most realistic thing you can do is to just not store the image on the phone. When you take a snap, immediately send it to the server, so you wouldn't need to mess with securely saving the image. An attacker can still intercept the network traffic as it is being sent, or they can tamper with your apk(!) such that the program would save a copy of the image to the phone. You can probably do some self-authenticating apk, but that's usually much more hassle than it's worth.
In short, there is little you can do against your host. It all depends on how valuable is the data you're securing, and how likely someone would spend that much time on trying to break your security, to get to the prize.
I'd say, just encrypt the image using a locally stored decryption key, unless you have a real reason to suspect that someone would spend their time to reverse engineer your code.

There is no truly secure way to protect assets.
If you store stuff in res/raw, it can't be read by other applications on a normal phone, but it can on a rooted one.
If you encrypt data, the decryption key will be available in your code. It is easy to get with the apk and apktool.
Perhaps you can do some of that and also obfuscate the code (the Android developer guides recommend ProGuard).
