How do you scrape an image from a website using Flutter? - android

Hi, I'm trying to do a simple task: getting the img src URL from a website. I can't seem to do it; I've tried various Flutter packages and have now gone back to vanilla Flutter. This is my code:
onPressed: () async {
  http.Response response =
      await http.get(Uri.parse('https://tiktok.com/@$enteredUsername'));
  dom.Document document = parser.parse(response.body);
  final elements = document.getElementsByClassName('jsx-581822467');
  print(elements);
},
I'm simply trying to get the image URL from this website (tiktok.com):
I've looked at the page source, and it says the class name is 'jsx-581822467', but when I use that in the code it returns an empty list.
How can I simply get the URL of this profile picture, and of the other elements with the 'jsx' prefix in their class names?

I think I figured out what your problem is. The browser's inspector shows the HTML of a TikTok profile page as it looks after JavaScript has run; much of that markup is only generated once the page has loaded. If we download the content via http.get(), we get the raw HTML before JavaScript has made any changes.
To see the HTML exactly as your app receives it, type view-source: in front of the URL in your browser's address bar, or right-click on the page and click View Page Source.
Search for avatar-wrapper round. You won't find it, because the tag for the profile picture doesn't exist in the raw HTML yet.
Fortunately, the URL of the profile picture is already included elsewhere. Search for <meta property="og:image" content=". You will find exactly one hit, and the URL of the profile picture starts immediately after it.
Therefore, in my opinion, the easiest way to get the URL is:
Download the HTML.
Remove all text up to and including <meta property="og:image" content=".
All following characters up to the next " are the URL we are looking for.
Here is my code, which worked fine for me:
import 'package:http/http.dart' as http;

Future<String?> getProfileImageUrl(String username) async {
  // Download the content of the site.
  http.Response response =
      await http.get(Uri.parse('https://www.tiktok.com/@$username'));
  String html = response.body;
  // The HTML contains the following string exactly once.
  // The URL of the profile picture starts right after it.
  String needle = '<meta property="og:image" content="';
  int index = html.indexOf(needle);
  // indexOf() returns -1 if the needle does not occur in the HTML.
  // In that case the given username may be invalid.
  if (index == -1) return null;
  // Remove all characters up to the start of the text snippet that we want.
  html = html.substring(html.indexOf(needle) + needle.length);
  // Return all characters up to the first occurrence of '"'.
  return html.substring(0, html.indexOf('"'));
}
I hope that I could help you with my explanation.
Edit 1: General approach
View the page source to see the HTML of the page.
Search for the desired substring.
Select the 10 to 15 characters before it and check how often this string occurs earlier in the page.
If it occurs more than once, you must call html = html.substring(html.indexOf(needle) + needle.length); repeatedly, once for each occurrence up to and including the one you want.
Reload the page and check that it still works.
Now you have found your needle string.
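The needle-hunting steps above can be tried out on a saved copy of the page source before wiring them into an app. The following is only a sketch, written in Java (the language of the Android snippets elsewhere on this page) rather than Dart. The class and method names (NeedleSearch, countOccurrences, valueAfter) and the sample HTML are made up for illustration; Dart's String offers the same indexOf and substring operations.

```java
final class NeedleSearch {
    // Counts non-overlapping occurrences of needle in html.
    static int countOccurrences(String html, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = html.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Returns the text between the n-th occurrence (1-based) of needle
    // and the next '"', or null if there is no such occurrence.
    static String valueAfter(String html, String needle, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n is 1-based");
        }
        int from = 0;
        for (int i = 0; i < n; i++) {
            int index = html.indexOf(needle, from);
            if (index == -1) {
                return null;
            }
            from = index + needle.length();
        }
        int end = html.indexOf('"', from);
        return end == -1 ? null : html.substring(from, end);
    }

    public static void main(String[] args) {
        // Hypothetical page source; a real page would be pasted or loaded here.
        String html = "<meta property=\"og:image\" content=\"https://example.com/a.jpg\">";
        String needle = "<meta property=\"og:image\" content=\"";
        System.out.println(countOccurrences(html, needle));
        System.out.println(valueAfter(html, needle, 1));
    }
}
```

If countOccurrences reports more than one hit, either lengthen the needle until it is unique or pass the position of the hit you want as n.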

Related

All webview content printed in one page

Case: User should be able to view and print pdf
My solution: I am opening PDF inside Webview with the help of docs.google.com/gview. Below is my code
Set up Webview
string url = "http://www.africau.edu/images/default/sample.pdf";
string gview = $"https://docs.google.com/gview?embedded=true&url={url}";
mWebView.LoadUrl(gview);
Print PDF
var printMgr = (PrintManager)GetSystemService(PrintService);
printMgr.Print("print", mWebView.CreatePrintDocumentAdapter("print"), null);
Below is the screenshot. As you can see PDF loads just fine
Problem
When I print the PDF, all of its pages are printed on a single sheet of paper, as you can see below.
I would appreciate any suggestions, including a different library for displaying/printing PDFs, or suggestions in Java or Kotlin, which I can convert to C#.
I would not print the web page; I would print the PDF directly. When you print the web page, the print framework just sees one long web page and knows nothing about the PDF's page structure.
Use a custom print adapter instead. Rather than drawing a PDF to print, you can simply hand over the PDF you already have.
For details, see https://developer.android.com/training/printing/custom-docs.html

Http.get loads and parse website partially before getting HTML in Flutter

I am trying to do web parsing in Flutter. I want to grab all episode links and numbers from a certain website: https://www2.9anime.to/watch/black-clover-dub.2y44/0wql03
This is my code to parse the html:
var url = 'https://www2.9anime.to/watch/black-clover-dub.2y44/0wql03';
http.Response response = await http.get(Uri.parse(url));
dom.Document document = parse(response.body);
// '#servers-container' is an id selector, so it needs querySelectorAll;
// getElementsByTagName only matches tag names.
List<dom.Element> rapidvideoepisodelinks =
    document.querySelectorAll('#servers-container');
List<Map<String, dynamic>> rapidvideoepisodelinkMap = [];
for (var link in rapidvideoepisodelinks) {
  rapidvideoepisodelinkMap.add({
    /////////////////////some logic////////////////////
  });
}
var rapidvideoepisodejson = json.encode(rapidvideoepisodelinkMap);
rapidvideoepisodelist = (json.decode(rapidvideoepisodejson) as List)
    .map((data) => Rapidvideoepisodelist.fromJson(data))
    .toList();
setState(() {
  isLoading = false;
});
But the thing is, the episode content area takes a few seconds to load, and http.get fetches the page before this part has loaded. Because of this, I am unable to parse it completely: the area containing the episodes isn't there yet, so its HTML isn't parsed. Everything else works fine except for areas like this that take additional time to load.
Is there a way to solve this issue,
such as parsing the website only after it has completely loaded?
Any help is really appreciated.
Your reasoning is not quite right. The reason you cannot parse it is NOT a partial load. http.get fetches the HTML file, and that's all: you requested the HTML file and you got it. What you see in your browser is not that HTML file. Your browser first fetches the HTML file, then works out from it what else it should load, and then loads the JPG files, CSS files, JS scripts, etc.
The content you are trying to parse is produced by JavaScript running inside the browser. You cannot get it with http.get. I am not sure how to achieve what you want in Flutter. You may need some kind of headless ("pseudo") browser for Dart, if one exists, to load the URL and then parse the resulting HTML. You will never be able to do it with http.get alone: you do get the HTML file, but it is not the HTML you are actually looking for. I hope that makes sense.

How to clear html page before showing into a webview in Android?

I have the URL of a web page to be displayed in a WebView in my Android app. Before showing this page I want to strip some tags from its HTML (such as the header, footer, etc.) in order to show only a few pieces of information. How can I do it? I tried to solve the issue with jsoup, but I can't work out how to create the "new page" and pass it to the WebView. Can anybody help me?
EDIT
I removed the unneeded HTML using the jsoup library. Then, again with jsoup, I get the head and body content and finally show the "cleaned" web page with these lines:
headURL = doc.select("head").outerHtml();
bodyURL = doc.select("body").outerHtml();
webview.loadData( "<html>"+headURL+bodyURL+"</html>" , "text/html", "charset=UTF-8");
webview.setWebViewClient(new DisPlayWebPageActivityClient());
The view shows the new page but does not load the CSS files specified in the head (which has not been touched). Can anyone tell me why?
You can fetch the web page you want to display as a string, parse it and remove whatever you don't want, and then load that string as data in your WebView.
Something like:
String webContent = fetchPage(url);
String cleanedWebContent = cleanUp(webContent);
webView.loadData(cleanedWebContent, "text/html", "UTF-8");
Of course, you will need to implement fetchPage and cleanUp yourself, as they are not Android methods.
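One possible shape for those two helpers, using only the Java standard library, is sketched below. This is not the answer's own implementation: the class name PageCleaner is invented here, the page is assumed to be UTF-8, and the regex in cleanUp is a crude stand-in for a real HTML parser such as jsoup (it only removes header and footer blocks and does not handle nested tags of the same name).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class PageCleaner {
    // Downloads the page body as text. Assumes the page is UTF-8 encoded.
    // On Android this has to run off the main thread.
    static String fetchPage(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Removes <header>...</header> and <footer>...</footer> blocks.
    // Case-insensitive, and '.' also matches line breaks.
    static String cleanUp(String html) {
        return html.replaceAll("(?is)<(header|footer)\\b[^>]*>.*?</\\1\\s*>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input; a real app would pass fetchPage(url) here.
        String sample = "<html><body><header>menu</header><p>Hi</p></body></html>";
        System.out.println(cleanUp(sample));
    }
}
```

If the stylesheets in the head are referenced by relative URLs, loading the cleaned string with WebView.loadDataWithBaseURL and the original page URL as the base, instead of loadData, may let them resolve. Whether that applies to the page in the question is an assumption.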

Opening URL on Android web browser causes Google search

I'm having a slight problem opening a certain URL in the browser. First of all I use the following code to launch the browser:
Intent browserIntent = new Intent(Intent.ACTION_VIEW, Uri.parse(Globals.currentChatURL));
startActivity(Intent.createChooser(browserIntent, "Select browser:"));
Now if I set Globals.currentChatURL to something like http://www.google.com then it opens that site just fine. But my URL is a little more complicated as it contains multiple parameters which are all base64 encoded. Here is an example of how my URL looks:
http://webportal.mysite.com/ChatProgram/chat.php? intgroup=UFYyMA==&intid=UFYyMEZN&hg=Pw__&pref=user&en=U0NPVFQgTUlMTEFS&ee=cGF1bGdAbWFnbmF0ZWNoLmNvbQ==&eq=UFRWRkVI&ec=TUFHTkFURUNI
Now if I use my above code to try and launch this URL it brings me to the Google search page with the following message:
"Your search - http://URLabove ... did not match any documents"
Yet if I copy the URL and paste it into the address box it brings me to the right place. How can I fix this?? The whole point of this is to have the user click the button and the site to launch, not for the user to have to copy and paste the URL manually.
Any suggestions would be greatly appreciated.
Thanks a lot
There are unwanted equals signs in the query part of your HTTP URI. Such signs have a specific meaning as delimiters in the form &parameter=value.
These equals signs are padding (0, 1 or 2 characters) from your base64 encoding.
You can either
remove them, if your server-side base64 decoder can reconstruct the padding by itself, or
percent-encode them (along with all other reserved characters).
On Android you can percent-encode a value this way:
String value = URLEncoder.encode("annoying values with reserved chars &=#", "utf-8");
String url = "http://stackoverflow.com/search?q=" + value;
RFC 2396 is now obsolete (it was superseded by RFC 3986), but it is what Uri.parse() is based on, as stated in the documentation:
uriString: an RFC 2396-compliant, encoded URI
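Applied to the chat URL from the question, the percent-encoding option might look like the sketch below. The class and method names (ChatUrlBuilder, encode, buildUrl) are invented for illustration, and only a few of the question's parameters are shown; the only library call involved is java.net.URLEncoder.encode, as in the snippet above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class ChatUrlBuilder {
    // Percent-encodes one query component, so '=' becomes %3D.
    // URLEncoder does form encoding, so a space becomes '+'.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Appends the encoded name=value pairs to base, in insertion order.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(encode(entry.getKey()))
              .append('=')
              .append(encode(entry.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("intgroup", "UFYyMA==");
        params.put("intid", "UFYyMEZN");
        params.put("hg", "Pw__");
        System.out.println(
                buildUrl("http://webportal.mysite.com/ChatProgram/chat.php", params));
    }
}
```

With this, the value UFYyMA== is sent as UFYyMA%3D%3D, and the resulting string can be passed to Uri.parse() for the browser intent.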

Launch a background web search

I am trying to launch a web search using data input by a user. The data is entered through EditText boxes. Upon submission of the data, I would like my program to: 1) search for a specific web page based on the user input, 2) find specific elements on that page, 3) display the web page.
Here is an example:
User Input (in a non browser/webview page)
1) Store Name: Macy's 2) Zip Code: 77471
In the background my program will:
1) Find the Macy's website
2) Find the store nearest zip code 77471
3) Load the web page for the store nearest zip code 77471
Obviously there is a lot of error handling, exceptions, etc. that would go along with this. For the sake of keeping this example "easy", let's pretend that 1) the Macy's main page exists, 2) a separate page for the 77471 store exists, 3) there is a link to the 77471 store on the Macy's main page.
I have the code for getting the user input variables and I know how to launch the WebView. What I don't know how to do is search for the Macy's home page, then find the link I am looking for on the home page and navigate to it. Loading the WebView is not the problem; finding the data is.
Below is my current code. Right now it is set up so that the user navigates to the web page they are looking for, but I would rather handle the searching for them, if possible.
public void InitializeWebView() {
    portal = (WebView) findViewById(R.id.web_Portal);
    WebSettings settings = portal.getSettings();
    settings.setSavePassword(false);
    settings.setSaveFormData(false);
    settings.setJavaScriptEnabled(true);
    settings.setSupportZoom(true);
    portal.setWebViewClient(new WebViewClient() {
        @Override
        public boolean shouldOverrideUrlLoading(WebView view, String url) {
            // Keep navigation inside this WebView instead of launching a browser.
            view.loadUrl(url);
            return true;
        }
    });
}

public void searchAndShow(String store, String zip) {
    portal.loadUrl("http://www.google.com");
}
You can get search results in JSON format from Google using their API. Here is a nice example in Java. Just leave out the key parameter until you have a valid key.
