Jsoup attribute selector returning empty

I am trying to get images from google
String url = "https://www.google.com/search?site=imghp&tbm=isch&source=hp&q=audi&gws_rd=cr";
org.jsoup.nodes.Document doc = Jsoup.connect(url).get();
Elements elements = doc.select("div.isv-r.PNCib.MSM1fd.BUooTd");
The image data is base64-encoded, so to get the actual image URL I first read the data-id attribute of each result. This part works:
for (Element element : elements) {
String id = element.attr("data-id");
Then I need to make a new connection to url + "#imgrc=" + id:
org.jsoup.nodes.Document imgdoc = Jsoup.connect(url+"#"+id).get();
When I inspect the page in the browser, the data I need is inside <div jsname="CGzTgf">, so I select the same element in Jsoup:
Elements images = imgdoc.select("div[jsname='CGzTgf']");
// further steps
}
But images is always empty and I am unable to find the error. I run this inside a new thread on Android. Any help will be appreciated.

It turns out that, the way you're doing it, you'll be looking in the wrong place entirely. The URLs are contained in a JavaScript <script> tag included in the response.
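A likely reason the second request in the question finds nothing new: everything after # in a URL is a fragment, which HTTP clients keep to themselves and never send to the server. The request for url + "#" + id therefore fetches the same search page again, and the <div jsname="CGzTgf"> seen in the browser is built afterwards by JavaScript. A minimal sketch with java.net.URI (FragmentDemo and requestTarget are illustrative names, not part of Jsoup):

```java
import java.net.URI;

class FragmentDemo {
    /** Returns the part of a URL that is actually requested from the server (fragment dropped). */
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath()
                + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/search?tbm=isch&q=audi#imgrc=abc123";
        // The fragment ("imgrc=abc123", a made-up id) stays on the client side.
        System.out.println("fragment: " + URI.create(url).getRawFragment());
        System.out.println("requested: " + requestTarget(url));
    }
}
```

Both the original URL and the URL with an id appended resolve to the same request target, which is why the second document contains nothing the first one did not.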
I've extracted the <script> tags and filtered for the relevant ones (those with a nonce attribute).
I then filter those tags for the one containing a specific function name AND a generic search string I expect to find (something that won't be in the other <script> tags).
Next, the value obtained needs to be stripped down to the JSON object, which contains about a hundred thousand arrays. I navigated this manually to pull out a subset of nodes containing the relevant URL nodes, then filtered that again to get a List<String> of the full URLs.
Finally, I've reused some code from an earlier solution here: https://stackoverflow.com/a/63135249/7619034 to download the images.
You'll also get some console output detailing which URL ended up under which file id. Files are named image_[x].jpg regardless of their actual format, so you may need to rework this a little (hint: take the file extension from the URL if one is provided).
import com.jayway.jsonpath.JsonPath;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
public class GoogleImageDownloader {
private static final int TIMEOUT = 30000;
private static final int BUFFER_SIZE = 4096;
public static final String RELEVANT_JSON_START = "AF_initDataCallback(";
public static final String PARTIAL_GENERIC_SEARCH_QUERY = "/search?q";
public static void main(String[] args) throws IOException {
String url = "https://www.google.com/search?site=imghp&tbm=isch&source=hp&q=audi&gws_rd=cr";
Document doc = Jsoup.connect(url).get();
// Response with relevant data is in a <script> tag
Elements elements = doc.select("script[nonce]");
String jsonDataElement = getRelevantScriptTagContainingUrlDataAsJson(elements);
String jsonData = getJsonData(jsonDataElement);
List<String> imageUrls = getImageUrls(jsonData);
int fileId = 1;
for (String urlEntry : imageUrls) {
try {
writeToFile(fileId, makeImageRequest(urlEntry));
System.out.println(urlEntry + " : " + fileId);
fileId++;
} catch (IOException e) {
e.printStackTrace();
}
}
}
private static String getRelevantScriptTagContainingUrlDataAsJson(Elements elements) {
String jsonDataElement = "";
int count = 0;
for (Element element : elements) {
String jsonData = element.data();
if (jsonData.startsWith(RELEVANT_JSON_START) && jsonData.contains(PARTIAL_GENERIC_SEARCH_QUERY)) {
jsonDataElement = jsonData;
// IF there are two items in the list, take the 2nd, rather than the first.
if (count == 1) {
break;
}
count++;
}
}
return jsonDataElement;
}
private static String getJsonData(String jsonDataElement) {
String jsonData = jsonDataElement.substring(RELEVANT_JSON_START.length(), jsonDataElement.length() - 2);
return jsonData;
}
private static List<String> getImageUrls(String jsonData) {
// Reason for doing this in two steps is debugging is much faster on the smaller subset of json data
String urlArraysList = JsonPath.read(jsonData, "$.data[31][*][12][2][*]").toString();
List<String> imageUrls = JsonPath.read(urlArraysList, "$.[*][*][3][0]");
return imageUrls;
}
private static void writeToFile(int i, HttpURLConnection response) throws IOException {
// try-with-resources closes both streams even if the copy fails part-way
try (InputStream inputStream = response.getInputStream();
FileOutputStream outputStream = new FileOutputStream("image_" + i + ".jpg")) {
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead;
// copy the HTTP response body into the file, one buffer at a time
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
}
}
System.out.println("File downloaded");
}
// Could use JSoup here but I'm re-using this from an earlier answer
private static HttpURLConnection makeImageRequest(String imageUrlString) throws IOException {
URL imageUrl = new URL(imageUrlString);
HttpURLConnection response = (HttpURLConnection) imageUrl.openConnection();
response.setRequestMethod("GET");
response.setConnectTimeout(TIMEOUT);
response.setReadTimeout(TIMEOUT);
response.connect();
return response;
}
}
I've used JsonPath for filtering the relevant nodes which is good when you only care about a small portion of the JSON and don't want to deserialise the whole object. It follows a similar navigation style to DOM/XPath/jQuery navigation.
Apart from this one library and Jsoup, the libraries used are very bog standard.
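The hint about file extensions could be handled along these lines. This is only a sketch: ImageFileNames and extensionFromUrl are hypothetical names that do not appear in the code above, and the rule of accepting only short alphanumeric suffixes is an assumption rather than anything Google guarantees.

```java
import java.net.URI;
import java.util.Locale;

class ImageFileNames {
    /** Returns the lower-cased extension found in the URL path, or the fallback if there is none. */
    static String extensionFromUrl(String url, String fallback) {
        String path;
        try {
            path = URI.create(url).getPath(); // the path excludes query string and fragment
        } catch (IllegalArgumentException e) {
            return fallback; // not a syntactically valid URI
        }
        if (path == null) {
            return fallback;
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return fallback; // no dot in the last path segment, or nothing after it
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption: real image extensions are short and alphanumeric.
        return ext.matches("[a-z0-9]{1,5}") ? ext : fallback;
    }

    public static void main(String[] args) {
        System.out.println(extensionFromUrl("https://example.com/cars/audi.PNG?w=200", "jpg"));
    }
}
```

In writeToFile the file name would then become "image_" + i + "." + extensionFromUrl(urlEntry, "jpg"), which means passing the URL into that method as well.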
Good Luck!

Related

How to cache a video in background in android?

I am building an Android application where a user can view a list of videos. The videos are grouped into channels. Once the user selects a channel, I want to cache all the videos in that channel so they can be played even when there is no internet connection.
If anyone understands how to cache a video without playing it, please help me understand how I can achieve this.
Right now I am only able to cache a video if it is played through some library.

I have found the following working solution for caching videos in the background (single or multiple) using the library below; no player or video view is needed. Use an AsyncTaskRunner so that it runs off the main thread.
Videocaching Lib
Add the following line to your Gradle file (on current Gradle versions, use implementation instead of compile):
compile 'com.danikula:videocache:2.7.0'
Since we only need to kick-start the prefetching, there is nothing to do inside the while loop.
Alternatively, we can collect the data in a ByteArrayOutputStream and write it to disk ourselves.
try {
URL url = new URL(cachingUrl(cachingUrl));
// Reading through the proxy URL is what makes the cache server store the video
try (InputStream inputStream = url.openStream()) {
byte[] buffer = new byte[1024];
while (inputStream.read(buffer) != -1) {
// nothing to do: reading to the end is enough
}
}
} catch (IOException e) {
e.printStackTrace();
}
Important setup code for the library:
Create a static instance in your Application class using the following code:
private HttpProxyCacheServer proxy;
public static HttpProxyCacheServer getProxy(Context context) {
Applications app = (Applications) context.getApplicationContext();
return app.proxy == null ? (app.proxy = app.newProxy()) : app.proxy;
}
private HttpProxyCacheServer newProxy() {
//return new HttpProxyCacheServer(this);
return new HttpProxyCacheServer.Builder(this)
.cacheDirectory(CacheUtils.getVideoCacheDir(this))
.maxCacheFilesCount(40)
.maxCacheSize(1024 * 1024 * 1024)
.build();
}
Write the following code in your activity to pass the URL:
public String cachingUrl(String urlPath) {
return Applications.getProxy(this).getProxyUrl(urlPath, true);
}
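The prefetch loop from this answer can also be wrapped in a small helper that runs on a background executor. This is a sketch in plain Java under stated assumptions: Prefetcher, drain and prefetch are hypothetical names, and a standard ExecutorService stands in for the AsyncTaskRunner mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class Prefetcher {
    /** Reads the stream to the end, discards the bytes, and returns how many were read. */
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n; // nothing else to do: reading is what triggers the caching
        }
        return total;
    }

    /** Opens the stream on a worker thread, drains it there, and closes it afterwards. */
    static Future<Long> prefetch(ExecutorService executor, Callable<InputStream> opener) {
        return executor.submit(() -> {
            try (InputStream in = opener.call()) {
                return drain(in);
            }
        });
    }
}
```

In an activity this might be called as Prefetcher.prefetch(executor, () -> new URL(cachingUrl(videoUrl)).openStream()) once per video in the selected channel, where videoUrl is a placeholder for each video's address. The returned Future reports the byte count, or the exception if the download failed.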

Get personal app code and display it

I am trying to get my app's own code and display it. For example, if button X starts a new activity, then a TextView displays the whole method.
So far I have only found how to display code in HTML format, from this question.
But is there a way to get the code out of my app? I think there are two ways:
An internal one, where the app gets the code itself
An external one, reading the Java file, then filtering it and getting the text of the method
Are there any ideas about that?
Thanks in advance

As others mentioned in the comments, the above is not currently possible. What I can suggest is shipping your application with its source code in the assets folder and using a helper function to extract a given method from the source at runtime (your second proposed approach). I have written example code, but it is in pure Java and needs to be ported to Android (a few lines).
NB: You may need to reformat the code after extraction, depending on your use case.
Hope it helps :)
The code for the helper method:
static String getTheCode(String classname ,String methodSignature ) throws FileNotFoundException {
//********************** A few lines of code below need changing when porting ***********//
// open the file; on Android it will be in the assets folder, not the user's home dir (don't forget the .java extension when porting)
File file = new File(System.getProperty("user.home") +"/"+ classname +".java");
// read the source token by token; on Android, use any reader that works on an asset stream
StringBuilder sourceBuilder = new StringBuilder();
try (Scanner scanner = new Scanner(file)) {
while(scanner.hasNext()) {
sourceBuilder.append(' ').append(scanner.next());
}
}
String source = sourceBuilder.toString();
//********************** The above code needs changing when porting **********//
// extract code using the method signature
methodSignature = methodSignature.trim();
source = source.trim();
//appending { to differentiate from argument as it can be matched also if in the same file
methodSignature = methodSignature+"{";
//making sure we find what we are looking for
methodSignature = methodSignature.replaceAll("\\s*[(]\\s*", "(");
methodSignature = methodSignature.replaceAll("\\s*[)]\\s*", ")");
methodSignature = methodSignature.replaceAll("\\s*[,]\\s*", ",");
methodSignature = methodSignature.replaceAll("\\s+", " ");
source =source.replaceAll("\\s*[(]\\s*", "(");
source = source.replaceAll("\\s*[)]\\s*", ")");
source = source.replaceAll("\\s*[,]\\s*", ",");
source = source.replaceAll("\\s+", " ");
if(!source.contains(methodSignature)) return null;
// trimming all text b4 method signature
source = source.substring(source.indexOf(methodSignature));
//getting last index, a methods ends when there are matching pairs of these {}
int lastIndex = 0;
int rightBraceCount = 0;
int leftBraceCount = 0;
char [] remainingSource = source.toCharArray();
for (int i = 0; i < remainingSource.length ; i++
) {
if(remainingSource[i] == '}'){
rightBraceCount++;
if(rightBraceCount == leftBraceCount){
lastIndex = (i + 1);
break;
}
}else if(remainingSource[i] == '{'){
leftBraceCount++;
}
}
return source.substring(0 ,lastIndex);
}
Example usage (the getTheCode method is static and lives in a class called GetTheCode):
public static void main(String... s) throws FileNotFoundException {
System.out.println(GetTheCode.getTheCode("Main", "private static void shoutOut()"));
System.out.println(GetTheCode.getTheCode("Main", "private static void shoutOut(String word)"));
}
Output:
private static void shoutOut(){ // nothing to here }
private static void shoutOut(String word){ // nothing to here }
NB: When starting your new activity, create a method, e.g.
private void myStartActivity(){
Intent intent = new Intent(MyActivity.this, AnotherActivity.class);
startActivity(intent);
}
Then in your onClick:
@Override
public void onClick(View v) {
myStartActivity();
myTextView.setText(GetTheCode.getTheCode("MyActivity","private void myStartActivity()"));
}
Update: ported the code to Android:
import android.content.Context;
import java.io.IOException;
import java.util.Scanner;
public class GetTheCode {
static String getTheCode(Context context, String classname , String methodSignature ) {
Scanner scanner = null;
String source = "";
try {
scanner = new Scanner(context.getAssets().open(classname+".java"));
while(scanner.hasNext()) {
source += " "+ scanner.next();
}
} catch (IOException e) {
e.printStackTrace();
return null;
}
scanner.close();
// extract code using the method signature
methodSignature = methodSignature.trim();
source = source.trim();
//appending { to differentiate from argument as it can be matched also if in the same file
methodSignature = methodSignature+"{";
//making sure we find what we are looking for
methodSignature = methodSignature.replaceAll("\\s*[(]\\s*", "(");
methodSignature = methodSignature.replaceAll("\\s*[)]\\s*", ")");
methodSignature = methodSignature.replaceAll("\\s*[,]\\s*", ",");
methodSignature = methodSignature.replaceAll("\\s+", " ");
source =source.replaceAll("\\s*[(]\\s*", "(");
source = source.replaceAll("\\s*[)]\\s*", ")");
source = source.replaceAll("\\s*[,]\\s*", ",");
source = source.replaceAll("\\s+", " ");
if(!source.contains(methodSignature)) return null;
// trimming all text b4 method signature
source = source.substring(source.indexOf(methodSignature));
//getting last index, a methods ends when there are matching pairs of these {}
int lastIndex = 0;
int rightBraceCount = 0;
int leftBraceCount = 0;
char [] remainingSource = source.toCharArray();
for (int i = 0; i < remainingSource.length ; i++
) {
if(remainingSource[i] == '}'){
rightBraceCount++;
if(rightBraceCount == leftBraceCount){
lastIndex = (i + 1);
break;
}
}else if(remainingSource[i] == '{'){
leftBraceCount++;
}
}
return source.substring(0,lastIndex);
}
}
Usage:
// the method now takes in context as the first parameter, the line below was in an Activity
Log.d("tag",GetTheCode.getTheCode(this,"MapsActivity","protected void onCreate(Bundle savedInstanceState)"));
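One limitation of the brace counting above is that a { or } inside a string or char literal is counted as well. A sketch of a matcher that skips such literals follows. BraceMatcher and endOfBlock are hypothetical names, and comments containing braces are still not handled.

```java
class BraceMatcher {
    /**
     * Returns the index just past the '}' that closes the first '{' at or after {@code from},
     * or -1 if the braces never balance. Braces inside string and char literals are ignored.
     */
    static int endOfBlock(String source, int from) {
        int depth = 0;
        char quote = 0; // 0 means "not inside a literal", otherwise the opening quote character
        for (int i = from; i < source.length(); i++) {
            char c = source.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. \" inside a string
                } else if (c == quote) {
                    quote = 0; // literal closed
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // literal opened
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
                if (depth < 0) {
                    return -1; // closing brace before any opening brace
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String source = "void a(){ String s = \"}\"; } void b(){}";
        System.out.println(source.substring(0, endOfBlock(source, 0)));
    }
}
```

In getTheCode, the final loop could be replaced by source.substring(0, BraceMatcher.endOfBlock(source, 0)) after checking for -1.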

Let's start with a broader overview of the problem:
Display App code
Press X button
Open new activity with a textview which displays the method
The goal is to do the following:
Viewing app method by extracting it and then building & running it.
There are several ways to run Java/Android code dynamically. The way I would personally do it is with DexClassLoader and reflection.
If you need more details, let me know. Here is what it'd do though:
View app method
Upon pressing X, launch intent with extra to new Activity
Parse and compile code dynamically and then run it with DexClassLoader and Reflection
Sources:
Sample file loading Java method from TerminalIDE Android App
Android Library I made for Auto-Updating Android Applications without needing the Play Store on non-root devices
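The reflection half of this approach can be sketched in plain Java. On Android the ClassLoader would be a DexClassLoader pointed at compiled dex code; here the current class loader stands in for it. ReflectionDemo, invokeStatic and shoutOut are hypothetical names, and compiling source code on the device is not covered.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    // Stand-in for a method that would normally live in dynamically loaded code.
    static String shoutOut() {
        return "hello";
    }

    /** Loads a class by name and invokes one of its static no-argument methods. */
    static Object invokeStatic(ClassLoader loader, String className, String methodName) {
        try {
            Class<?> type = loader.loadClass(className);
            Method method = type.getDeclaredMethod(methodName);
            method.setAccessible(true); // allow non-public methods
            return method.invoke(null); // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ReflectionDemo.class.getClassLoader();
        System.out.println(invokeStatic(loader, ReflectionDemo.class.getName(), "shoutOut"));
    }
}
```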

Load image from binary base64

EDIT: This is a bug in Android versions before 4.3. It relates to the libjpeg library in Android, which can't handle JPEGs with a missing EOF/EOI marker, or apparently with metadata/EXIF data that it doesn't like.
https://code.google.com/p/android/issues/detail?id=9064
ORIGINAL QUESTION:
I have an issue when loading an image in my app.
My endpoint sends JSON which contains a BASE64-encoded image. Depending on the REST call, these images can be PNG or JPG. Some of the JPG files suffer from an issue where the end-of-image (EOI) marker is missing at the end of the file. The PNG files work, and some JPG files work, but unfortunately a lot of the JPG files with this issue are present in the Oracle DB (stored as BLOBs). I don't have control of the DB.
I have been looking through Google bugs here:
https://code.google.com/p/android/issues/detail?id=9064
and here:
https://code.google.com/p/android/issues/detail?id=57502
The issue is also seen where the encoding is CMYK using a custom ICC profile.
Decoding the image the standard way returns null:
byte[] imageAsBytes = Base64.decode(base64ImageString, Base64.DEFAULT);
return BitmapFactory.decodeByteArray(imageAsBytes, 0, imageAsBytes.length);
According to the bug reports above, the built in JPG parser in Android is to blame.
I'm trying to figure out a workaround for my device, which is stuck on 4.2.2. I have no other option on this OS version.
I thought it might be a good idea to try and use an image loader library like Universal Image Loader, but it requires I either have the image stored locally, or stored on a URL. As I get the data in BASE64 from the REST server, I can't use this. An option is to support decodeByteArray in a custom class that extends BaseImageDecoder, as stated by the dev at the bottom here: https://github.com/nostra13/Android-Universal-Image-Loader/issues/209
Here's where I get stuck. I already have a custom image decoder that tries to handle the missing EOI marker in the JPG file, but I don't know how to edit it to add support for decodeByteArray.
Here is my CustomImageDecoder:
public class CustomImageDecoder extends BaseImageDecoder {
public CustomImageDecoder(boolean loggingEnabled) {
super(loggingEnabled);
}
@Override
protected InputStream getImageStream(ImageDecodingInfo decodingInfo) throws IOException {
InputStream stream = decodingInfo.getDownloader()
.getStream(decodingInfo.getImageUri(), decodingInfo.getExtraForDownloader());
return stream == null ? null : new JpegClosedInputStream(stream);
}
private class JpegClosedInputStream extends InputStream {
private static final int JPEG_EOI_1 = 0xFF;
private static final int JPEG_EOI_2 = 0xD9;
private final InputStream inputStream;
private int bytesPastEnd;
private JpegClosedInputStream(final InputStream iInputStream) {
inputStream = iInputStream;
bytesPastEnd = 0;
}
@Override
public int read() throws IOException {
int buffer = inputStream.read();
if (buffer == -1) {
if (bytesPastEnd > 0) {
buffer = JPEG_EOI_2;
} else {
++bytesPastEnd;
buffer = JPEG_EOI_1;
}
}
return buffer;
}
}
}
By the way, using the above custom class, I am trying to load my byte array like this:
byte[] bytes = Base64.decode(formattedB64String, Base64.NO_WRAP);
ByteArrayInputStream is = new ByteArrayInputStream(bytes);
String imageId = "stream://" + is.hashCode();
...
ImageLoader imageLoader = ImageLoader.getInstance();
imageLoader.displayImage(imageId, userImage, options);
and I get this error:
ImageLoader: Image can't be decoded [stream://1097215584_656x383]
Universal Image Loader does not allow the stream:// scheme, so I created a custom BaseImageDownloader class that allows it:
public class StreamImageDownloader extends BaseImageDownloader {
private static final String SCHEME_STREAM = "stream";
private static final String STREAM_URI_PREFIX = SCHEME_STREAM + "://";
public StreamImageDownloader(Context context) {
super(context);
}
@Override
protected InputStream getStreamFromOtherSource(String imageUri, Object extra) throws IOException {
if (imageUri.startsWith(STREAM_URI_PREFIX)) {
return (InputStream) extra;
} else {
return super.getStreamFromOtherSource(imageUri, extra);
}
}
}
So if anyone can help me create a better CustomImageDecoder that handles a BASE64 encoded string, or a byte[] containing an image so I can use decodeByteArray, I would be grateful!
Thank you.

UniversalImageLoader uses the following URI schemes to locate the files to decode:
"http://site.com/image.png" // from Web
"file:///mnt/sdcard/image.png" // from SD card
"file:///mnt/sdcard/video.mp4" // from SD card (video thumbnail)
"content://media/external/images/media/13" // from content provider
"content://media/external/video/media/13" // from content provider (video thumbnail)
"assets://image.png" // from assets
"drawable://" + R.drawable.img // from drawables (non-9patch images)
Your scheme, stream://, is not one of these.
Hope that helps.

Just to close this off:
The issue here is actually a bug in Android <4.3 where Android can't display images that either aren't closed properly (missing end bytes) or contain certain metadata that, for some reason, it doesn't like. I'm not sure what metadata this is, however. My issue was with JPEGs not being terminated properly.
The bug is fixed in Android 4.3 anyway.
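For the byte[] case asked about in the question, the same idea as JpegClosedInputStream can be applied directly to the decoded array before calling BitmapFactory.decodeByteArray. This is only a sketch: JpegRepair and ensureEndOfImage are hypothetical names, the check looks at the last two bytes only, and it does nothing about the metadata problem mentioned above.

```java
import java.util.Arrays;

class JpegRepair {
    /** Appends the JPEG end-of-image marker (FF D9) if the data is a JPEG that lacks it. */
    static byte[] ensureEndOfImage(byte[] data) {
        int n = data.length;
        // A JPEG starts with the start-of-image marker FF D8.
        boolean isJpeg = n >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8;
        boolean hasEoi = n >= 4 && (data[n - 2] & 0xFF) == 0xFF && (data[n - 1] & 0xFF) == 0xD9;
        if (!isJpeg || hasEoi) {
            return data; // PNGs and well-formed JPEGs are returned untouched
        }
        byte[] fixed = Arrays.copyOf(data, n + 2);
        fixed[n] = (byte) 0xFF;
        fixed[n + 1] = (byte) 0xD9;
        return fixed;
    }

    public static void main(String[] args) {
        byte[] truncated = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x01};
        System.out.println(ensureEndOfImage(truncated).length);
    }
}
```

The result would be passed on as BitmapFactory.decodeByteArray(fixed, 0, fixed.length); whether that satisfies the decoder on a given 4.2.2 device is untested here.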

Capture image from web page using regex

I am writing a simple program to capture image resources from a web page. The image items in the HTML look like:
CASE1:<img src="http://www.aaa.com/bbb.jpg" alt="title bbb" width="350" height="385"/>
or
CASE2:<img alt="title ccc" src="http://www.ddd.com/bbb.jpg" width="123" height="456"/>
I know how to handle either case separately, take the first one for example:
String CAPTURE = "<img(?:.*)src=\"http://(.*)\\.jpg\"(?:.*)alt=\"(.*?)\"(?:.*)/>";
DefaultHttpClient client = new DefaultHttpClient();
BasicHttpContext context = new BasicHttpContext();
Scanner scanner = new Scanner(client
.execute(new HttpGet(uri), context)
.getEntity().getContent());
Pattern pattern = Pattern.compile(CAPTURE);
while (scanner.findWithinHorizon(pattern, 0) != null) {
MatchResult r = scanner.match();
String imageUrl = "http://" +r.group(1)+".jpg";
String imageTitle = r.group(2);
//Do something with the image
}
The question is how to write the correct pattern to get all the image items from a web page source code which contains both CASE1 and CASE2? I only want to scan the page once.

Use jsoup
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
...
Document doc;
String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0";
try {
// need http protocol
doc = Jsoup.connect("http://domain.tld/images.html").userAgent(userAgent).get();
// get all images
Elements images = doc.select("img");
for (Element image: images) {
// get the values from img attribute (src & alt)
System.out.println("\nImage: " + image.attr("src"));
System.out.println("Alt : " + image.attr("alt"));
}
} catch (IOException e) {
e.printStackTrace();
}
Jsoup is an HTML parser. Its "jquery-like" and "regex" selector syntax is very easy to use and flexible enough to get whatever you want.
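If adding a parser is not an option, the two attribute orders from the question can still be handled in one pass by matching each <img> tag first and then pulling src and alt out of the tag text separately. This is a sketch only: ImgTagScanner and scan are hypothetical names, it assumes double-quoted attribute values, and regex-based HTML handling remains fragile compared with jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgTagScanner {
    private static final Pattern IMG_TAG = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern SRC = Pattern.compile("\\bsrc=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern ALT = Pattern.compile("\\balt=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns one {src, alt} pair per img tag that has a src; alt is "" when absent. */
    static List<String[]> scan(String html) {
        List<String[]> result = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(html);
        while (tag.find()) {
            String tagText = tag.group();
            Matcher src = SRC.matcher(tagText);
            if (!src.find()) {
                continue; // no src attribute: nothing to download
            }
            Matcher alt = ALT.matcher(tagText);
            result.add(new String[] {src.group(1), alt.find() ? alt.group(1) : ""});
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://www.aaa.com/bbb.jpg\" alt=\"title bbb\"/>"
                + "<img alt=\"title ccc\" src=\"http://www.ddd.com/bbb.jpg\"/>";
        for (String[] pair : scan(html)) {
            System.out.println(pair[0] + " : " + pair[1]);
        }
    }
}
```

Because the attributes are looked up independently inside each tag, their order no longer matters.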

android: declare huge string in strings.xml

I have a very large HTML string in my app.
When I use it in the code, everything is fine, but when I try to declare it in strings.xml, I get some errors. Is there a way to simply copy the string into strings.xml? Thank you

HTML and XML share the same basic tag syntax, so I do not believe you can store raw HTML in a string resource as-is. Why not save the HTML page and package it with the application?
Save the page as an HTML file in res > raw and then call this method:
String html = Utils.readRawTextFile(ctx, R.raw.rawhtml);
public static String readRawTextFile(Context ctx, int resId)
{
InputStream inputStream = ctx.getResources().openRawResource(resId);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int i;
try {
i = inputStream.read();
while (i != -1)
{
byteArrayOutputStream.write(i);
i = inputStream.read();
}
inputStream.close();
} catch (IOException e) {
return null;
}
return byteArrayOutputStream.toString();
}
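The method above reads one byte at a time and converts with the platform default charset. A buffered variant with an explicit charset might look like the sketch below. RawText and readAll are hypothetical names, and UTF-8 is an assumption about how the HTML file was saved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RawText {
    /** Reads the whole stream in 4 KB chunks and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Decode once at the end so multi-byte characters are never split across chunks.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

On Android it would be called with ctx.getResources().openRawResource(R.raw.rawhtml) inside a try-with-resources block so the stream is closed afterwards.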

The error may come from special characters such as @, double quotes, and single quotes. To overcome it, prefix each of them with a backslash (\) in strings.xml and the error is resolved.
If you assign the same string programmatically, you will hit a similar issue with double quotes:
String mString= "your huge string with " error";
Here, too, you overcome it by prefixing a backslash:
String mString= "your huge string with \" error";
