I am trying to make a giant PDF that contains all the information on one page, as there can be no breaks between the information in the document. It probably won't ever be printed, so the size of the PDF is not an issue. Using iText, the only way I have found is to create a page that is 14400px long (or about 5 m in A4 pages), but this leaves trailing white space if the document is shorter than expected (I don't ever see the document being longer than 14400px). This is my code so far:
private void pdfSave() {
    float pageWidth = 200f;
    float pageHeight = 1440f;
    Rectangle pageSize = new Rectangle(pageWidth, pageHeight);
    Document mDoc = new Document(pageSize);
    String mFileName = new SimpleDateFormat("ddMMyyyy_HHmmss",
            Locale.getDefault()).format(System.currentTimeMillis());
    String mFilePath = Environment.getExternalStorageDirectory() + "/" + "pdf_viewer" + "/" + mFileName + ".pdf";
    File dir = new File(mFilePath);
    if (!dir.exists()) {
        dir.getParentFile().mkdir();
    }
    try {
        PdfWriter.getInstance(mDoc, new FileOutputStream(mFilePath));
        mDoc.setMargins(10, 10, 10, 10);
        mDoc.open();
        String mText = mTextEt.getText().toString();
        mDoc.add(new Paragraph(mText, FontFactory.getFont(FontFactory.HELVETICA, 4, Font.BOLDITALIC)));
        mDoc.close();
    } catch (Exception e) {
        // Report any failure while writing the PDF.
        e.printStackTrace();
    }
}
Edit: I have tried using a crop box and a second pass, as suggested in a comment, but my app crashes on this line when I debug it:
Rectangle rect = getOutputPageSize(pageSize, reader, i);
We need to allow users of our mobile app to browse a magazine with an experience that is fast, fluid and feels native to the platform (similar to iBooks/Google Books).
Some features we need are thumbnails of the whole magazine and searching for specific text.
The problem is that our magazines are over 140 pages long and we can’t force our users to have to fully download the whole ebook/PDF beforehand. We need pages to be loaded asynchronously, that is, to let users start reading without having to fully download the content.
I studied PDFKit for iOS however I didn’t find any mention in the documentation about downloading a PDF asynchronously.
Are there any solutions/libraries to implement this functionality on iOS and Android?
What you're looking for is called linearization. According to this answer:
The first object immediately after the %PDF-1.x header line shall
contain a dictionary key indicating the /Linearized property of the
file.
This overall structure allows a conforming reader to learn the
complete list of object addresses very quickly, without needing to
download the complete file from beginning to end:
The viewer can display the first page(s) very fast, before the
complete file is downloaded.
The user can click on a thumbnail page preview (or a link in the ToC
of the file) in order to jump to, say, page 445, immediately after the
first page(s) have been displayed, and the viewer can then request all
the objects required for page 445 by asking the remote server via byte
range requests to deliver these "out of order" so the viewer can
display this page faster. (While the user reads pages out of order,
the downloading of the complete document will still go on in the
background...)
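As a rough illustration of the quoted requirement, a file can be tested for the /Linearized key near its start. This is only a heuristic sketch: the helper name looks_linearized and the sample byte strings are made up for the example, and it assumes the linearization dictionary sits within the first 1024 bytes, as the PDF specification expects.

```python
import re


def looks_linearized(head: bytes) -> bool:
    """Heuristic check on the first bytes of a PDF file (hypothetical helper)."""
    if not head.startswith(b"%PDF-"):
        return False
    # Assumption: the linearization dictionary appears near the start of the
    # file, so only the first 1024 bytes are inspected.
    return re.search(rb"/Linearized\s+[0-9]", head[:1024]) is not None


# Made-up file beginnings, for illustration only.
linearized = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 52000 /N 140 >>\nendobj\n"
plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(linearized), looks_linearized(plain))  # True False
```

Since only the head of the file is needed, a client could in principle run such a check on the first bytes it receives before deciding how to load the rest.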
You can use this native library to linearize a PDF.
However, I wouldn't recommend it, as rendering the PDFs won't be fast, fluid or feel native. For those reasons, as far as I know, there is no native mobile app that uses linearization. Moreover, you would have to create your own rendering engine for the PDF, as most PDF viewing libraries do not support linearization. What you should do instead is convert each individual page of the PDF to HTML on the server and have the client load pages only when required, then cache them. We will also save the PDF's plain text separately in order to enable search. This way everything will be smooth, as the resources will be lazy loaded. To achieve this, you can do the following.
Firstly, on the server end, whenever you publish a PDF, its pages should be split into HTML files as explained above. Page thumbnails should also be generated from those pages. Assuming that your server runs on Python with the Flask microframework, this is what you do.
from flask import Flask, request, send_file, jsonify
from werkzeug.utils import secure_filename
import os
from pyPdf import PdfFileWriter, PdfFileReader
import imgkit
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
import io
import sqlite3
from PIL import Image

app = Flask(__name__)

@app.route('/publish', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        f = request.files['file']
        filePath = "pdfs/" + secure_filename(f.filename)
        f.save(filePath)
        savePdfText(filePath)
        inputpdf = PdfFileReader(open(filePath, "rb"))
        for i in range(inputpdf.numPages):
            # Write page i to its own single-page PDF.
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            with open("document-page%s.pdf" % i, "wb") as outputStream:
                output.write(outputStream)
            # Convert the page to HTML first (by default pdf2htmlEX writes
            # document-page<i>.html), then render that HTML to an image.
            os.system("pdf2htmlEX --zoom 1.3 document-page%s.pdf" % i)
            imgkit.from_file("document-page%s.html" % i, "document-page%s.jpg" % i)
            saveThum("document-page%s.jpg" % i)
        return "published"
    return "POST a PDF in the 'file' field to publish it"

def saveThum(infile):
    size = 124, 124
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size, Image.LANCZOS)
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for '%s'" % infile)

def savePdfText(path):
    fp = open(path, 'rb')
    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    db = sqlite3.connect("pdfText.db")
    cursor = db.cursor()
    cursor.execute('create table if not exists pagesTextTables(id INTEGER PRIMARY KEY,pageNum TEXT,pageText TEXT)')
    db.commit()
    # Process each page contained in the document.
    pageNum = 1
    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)
        pageText = retstr.getvalue()
        # Reset the buffer so the next page's text is stored on its own.
        retstr.seek(0)
        retstr.truncate(0)
        cursor.execute('INSERT INTO pagesTextTables(pageNum,pageText) values(?,?)', (str(pageNum), pageText))
        db.commit()
        pageNum = pageNum + 1
    device.close()
    fp.close()
    db.close()

@app.route('/page', methods=['GET'])
def getPage():
    # GET parameters arrive in the query string, not in request.files.
    page_num = request.args.get('page_num', type=int)
    return send_file("document-page%s.html" % page_num, as_attachment=True)

@app.route('/thumb', methods=['GET'])
def getThum():
    page_num = request.args.get('page_num', type=int)
    return send_file("document-page%s.thumbnail" % page_num, as_attachment=True)

@app.route('/search', methods=['GET'])
def search():
    query = request.args.get('query', '')
    db = sqlite3.connect("pdfText.db")
    cursor = db.cursor()
    # Bind the search term as a parameter instead of concatenating it into the SQL.
    cursor.execute("SELECT pageNum, pageText FROM pagesTextTables WHERE pageText LIKE ?", ('%' + query + '%',))
    result = cursor.fetchall()
    db.close()
    return jsonify(queryResults=result)
Here is an explanation of what the Flask app is doing.
The /publish route is responsible for publishing your magazine: turning every page into HTML, saving the PDF's text to an SQLite db and generating thumbnails for those pages. I've used pyPdf to split the PDF into individual pages, pdf2htmlEX to convert the pages to HTML, imgkit to render that HTML to images and PIL to generate thumbnails from those images. Also, a simple SQLite db saves the pages' text.
The /page, /thumb and /search routes are self-explanatory. They simply return the HTML, thumbnail or search query results.
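To make the search step a little more concrete, here is a minimal sketch of the lookup on its own, run against an in-memory database with the same table layout as above. The helper names escape_like and search_pages and the sample rows are made up for the example; the only point is that the search term is bound as a parameter and that % and _ typed by the user are matched literally.

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so the term is matched literally."""
    return (term.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def search_pages(db, term):
    """Return the page numbers whose text contains term (hypothetical helper)."""
    pattern = "%" + escape_like(term) + "%"
    rows = db.execute(
        "SELECT pageNum FROM pagesTextTables "
        "WHERE pageText LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with made-up sample pages.
db = sqlite3.connect(":memory:")
db.execute("create table if not exists pagesTextTables("
           "id INTEGER PRIMARY KEY,pageNum TEXT,pageText TEXT)")
db.executemany(
    "INSERT INTO pagesTextTables(pageNum,pageText) values(?,?)",
    [("1", "Cover story: 100% cotton"), ("2", "Interview"), ("3", "cotton prices")],
)

print(search_pages(db, "cotton"))  # ['1', '3']
print(search_pages(db, "100%"))    # ['1']
```

Returning page numbers rather than whole rows keeps the response small; the client can then request just those pages.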
Secondly, on the client end you simply download the HTML page whenever the user scrolls to it. Let me give you an example for Android. Firstly, you'd want to create some utils to handle the GET requests:
public static byte[] GetPage(int mPageNum) throws IOException {
    return CallServer("page", "page_num", Integer.toString(mPageNum));
}
public static byte[] GetThum(int mPageNum) throws IOException {
    return CallServer("thumb", "page_num", Integer.toString(mPageNum));
}
private static byte[] CallServer(String route, String requestName, String requestValue) throws IOException {
    OkHttpClient client = new OkHttpClient.Builder()
            .connectTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .build();
    // These are GET requests, so pass the value as a query parameter.
    HttpUrl url = HttpUrl.parse("yourUrl/" + route).newBuilder()
            .addQueryParameter(requestName, requestValue)
            .build();
    Request request = new Request.Builder().url(url).get().build();
    // try-with-resources closes the response once the bytes are read.
    try (Response response = client.newCall(request).execute()) {
        return response.body().bytes();
    }
}
The helper utils above simply handle the queries to the server for you; they should be self-explanatory.
Next, you simply create a RecyclerView with a WebView ViewHolder, or better yet an AdvancedWebView, as it will give you more power for customization.
public static class ViewHolder extends RecyclerView.ViewHolder {
    private AdvancedWebView mWebView;
    public ViewHolder(View itemView) {
        super(itemView);
        mWebView = (AdvancedWebView) itemView;
    }
}
private class ContentAdapter extends RecyclerView.Adapter<YourFragment.ViewHolder> {
    @Override
    public ViewHolder onCreateViewHolder(ViewGroup container, int viewType) {
        return new ViewHolder(new AdvancedWebView(container.getContext()));
    }
    @Override
    public int getItemViewType(int position) {
        return 0;
    }
    @Override
    public void onBindViewHolder(ViewHolder holder, int position) {
        handlePageDownload(holder.mWebView);
    }
    private void handlePageDownload(AdvancedWebView mWebView){....}
    @Override
    public int getItemCount() {
        return numberOfPages;
    }
}
That should be about it.
I am sorry to say, but there is no library or SDK available that provides asynchronous page loading. It is next to impossible on a mobile device to open a PDF file without downloading the full file.
Solution:
I have already done R&D on this and met the same requirement in a project. I am not sure whether iBooks and Google Books use the mechanism below, but it works fine for your requirements.
Divide your PDF into n parts (e.g. if the PDF has 150 pages, each part contains 15 pages; this takes some effort on the web end).
Once the first part has downloaded successfully, display it to the user while the other parts download asynchronously.
After all parts of the PDF file have been downloaded, use the code below to merge them.
How to Merge PDF file
UIGraphicsBeginPDFContextToFile(oldFile, paperSize, nil);
//Get document access of the pdf from which you want the pages (opened once, before the loop)
CGPDFDocumentRef newDocument = CGPDFDocumentCreateWithURL ((CFURLRef) newUrl);
for (pageNumber = 1; pageNumber <= count; pageNumber++)
{
    UIGraphicsBeginPDFPageWithInfo(paperSize, nil);
    //Get graphics context to draw the page
    CGContextRef currentContext = UIGraphicsGetCurrentContext();
    //Flip and scale context to draw the pdf correctly
    CGContextTranslateCTM(currentContext, 0, paperSize.size.height);
    CGContextScaleCTM(currentContext, 1.0, -1.0);
    //Get the page you want
    CGPDFPageRef newPage = CGPDFDocumentGetPage (newDocument, pageNumber);
    //Drawing the page
    CGContextDrawPDFPage (currentContext, newPage);
}
//Clean up once all pages are drawn; clearing newUrl inside the loop would break the next iteration
CGPDFDocumentRelease(newDocument);
newDocument = nil;
newUrl = nil;
UIGraphicsEndPDFContext();
Reference: How to merge PDF file.
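For the first step (dividing the PDF on the web end), the page ranges of the parts can be worked out like this. This is just a sketch in Python: split_ranges is a made-up helper name, and 150 pages with 15 pages per part is the example from the steps above.

```python
def split_ranges(page_count, pages_per_part):
    """Return 1-based inclusive (first, last) page ranges, one per part."""
    if page_count < 0 or pages_per_part <= 0:
        raise ValueError("page_count must be >= 0 and pages_per_part > 0")
    return [(first, min(first + pages_per_part - 1, page_count))
            for first in range(1, page_count + 1, pages_per_part)]


parts = split_ranges(150, 15)
print(len(parts), parts[0], parts[-1])  # 10 (1, 15) (136, 150)
```

If the page count is not a multiple of the part size, the last part is simply shorter, e.g. split_ranges(140, 15) ends with (136, 140).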
Update:
The main advantage of this mechanism is that the logic remains the same on both Android and iOS devices.
I am trying to convert an iOS application to Android, but I only started learning Java a few days ago. I'm trying to get a value from a tag inside HTML.
Here is my Swift code:
if let url = NSURL(string: "http://www.example.com/") {
    let htmlData: NSData = NSData(contentsOfURL: url)!
    let htmlParser = TFHpple(HTMLData: htmlData)
    //the value which i want to parse
    let nPrice = htmlParser.searchWithXPathQuery("//div[@class='round-border']/div[1]/div[2]") as NSArray
    let rPrice = NSMutableString()
    //Appending
    for element in nPrice {
        rPrice.appendString("\n\(element.raw)")
    }
    let raw = String(NSString(string: rPrice))
    //the value without trimming
    let stringPrice = raw.stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil)
    //result
    let trimPrice = stringPrice.stringByReplacingOccurrencesOfString("^\\n*", withString: "", options: .RegularExpressionSearch)
}
Here is my Java code using Jsoup
public class Quote extends Activity {
    TextView price;
    String tmp;
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_quote);
        price = (TextView) findViewById(R.id.textView3);
        try {
            doc = Jsoup.connect("http://example.com/").get();
            Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
        } catch (IOException e) {
            //e.printStackTrace();
        }
    }
}
My problems are as follows:
I get a NetworkOnMainThreadException whenever I try any code.
I'm not sure that using getElementsByTag with this structure is correct.
Please help,
Thanks.
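I get a NetworkOnMainThreadException whenever I try any code.
The exception is thrown because Jsoup.connect(...).get() runs a network request on the main (UI) thread, which Android does not allow. You could use Volley for the download instead; it runs the request off the main thread and is a fast and efficient alternative, and you can then pass the response string to Jsoup.parse. See this answer for some sample code.
I'm not sure that using getElementsByTag with this structure is correct.
Element content = doc.getElementsByTag("//div[@class='round-border']/div[1]/div[2]");
Jsoup doesn't understand XPath. It works with CSS selectors instead.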
The above line of code can be corrected like this:
Elements divs = doc.select("div.round-border > div:nth-child(1) > div:nth-child(2)");
for (Element div : divs) {
    // Process each div here...
}
Usually after using Google to search for a city, there is a part of Wikipedia page on the right with an image and a map. Can anyone tell me how I could access this image? I should know how to download it.
Actually, the main image (the one that goes with the map image on the right) is very rarely from Wikipedia, so you can't use the Wikipedia API to get it. If you want to access the actual main image, you can use this:
private static void GetGoogleImage(string word)
{
    // make an HTTP Get request
    var request = (HttpWebRequest)WebRequest.Create("https://www.google.com.pg/search?q=" + word);
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36";
    using (var webResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var reader = new StreamReader(webResponse.GetResponseStream()))
        {
            // get all images with base64 string
            var matches = Regex.Matches(reader.ReadToEnd(), @"'data:image/jpeg;base64,([^,']*)'");
            if (matches.Count > 0)
            {
                // get the image with the max height
                var bytes = matches.Cast<Match>()
                    .Select(x => Convert.FromBase64String(x.Groups[1].Value.Replace("\\75", "=").Replace("\\075", "=")))
                    .OrderBy(x => Image.FromStream(new MemoryStream(x, false)).Height).Last();
                // save the image as 'image.jpg'
                using (var imageFile = new FileStream("image.jpg", FileMode.Create))
                {
                    imageFile.Write(bytes, 0, bytes.Length);
                    imageFile.Flush();
                }
            }
        }
    }
}
This works for me and always returns the actual main image (if one exists). For example, GetGoogleImage("New York") gives me ....
I rely on the fact that, of all the base64-encoded images in the response, the main one has the greatest height, so it is enough to order them by height and select the last one. If required, you can also check for a minimum image height here. Replacing \075 with = is needed to restore the base64 padding.
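To show the padding fix in isolation, here is a small sketch of the extraction and decoding step, written in Python rather than C#. The regular expression is the one from the code above; decode_inline_jpegs and the sample string are made up for the example. Picking the tallest image is left out, since that needs an imaging library.

```python
import base64
import re

# Same pattern as in the C# code: inline JPEGs quoted in the page source.
DATA_URI = re.compile(r"'data:image/jpeg;base64,([^,']*)'")


def decode_inline_jpegs(html):
    """Return the decoded bytes of every inline base64 JPEG found in html."""
    images = []
    for payload in DATA_URI.findall(html):
        # The padding '=' arrives escaped as \75 or \075; undo that first.
        payload = payload.replace("\\075", "=").replace("\\75", "=")
        images.append(base64.b64decode(payload))
    return images


# Made-up sample: the bytes b"hello!!" with their "==" padding escaped as \075.
sample = "x = 'data:image/jpeg;base64,aGVsbG8hIQ\\075\\075';"
print(decode_inline_jpegs(sample))  # [b'hello!!']
```

Without the replacement the payload would still contain the literal escape sequences instead of the = padding, so it would not decode to the original bytes.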
If you want the main image of the Wikipedia article, you have to use the Wikipedia API.
Update:
You can use jsoup (Java HTML Parser, org.jsoup:jsoup:1.8.3), which returns the list of images inside the page.
String stringResponse = getHtmlContent(url);
Document doc = Jsoup.parse(stringResponse);
Element content = doc.getElementById("content");
// Get all elements with the img tag
Elements img = content.getElementsByTag("img");
for (Element el : img) {
    // for each element get the src image url
    String src = el.attr("src");
    Log.d(TAG, "src attribute is : " + src);
    String alt = el.attr("alt");
    // do some stuff
}
Update:
Wikipedia provides an API that returns the HTML content.
I am new to Box2D and have a situation where I have two bodies. One is static and the other is dynamic. I want my dynamic body to go downwards, then come back and hit the other body along the same line. After some initial study, I thought of using a prismatic joint. I have looked at some examples and written a piece of code in onLoadScene(), but nothing is moving. Here is the code:
@Override
public Scene onLoadScene()
{
    .....
    PrismaticJointDef prismaticJointDef = new PrismaticJointDef();
    prismaticJointDef.initialize(bdy_holder, bdy_spring, bdy_holder.getWorldCenter(), new Vector2(1.0f, 0.0f));
    prismaticJointDef.lowerTranslation = -5.0f;
    prismaticJointDef.upperTranslation = 2.5f;
    prismaticJointDef.enableLimit = true;
    prismaticJointDef.maxMotorForce = 200.0f;
    prismaticJointDef.motorSpeed = 10.0f;
    prismaticJointDef.enableMotor = true;
    prismaticJointDef.collideConnected = true;
    prismatic_Joint = (PrismaticJoint)this.mPhysicsWorld.createJoint(prismaticJointDef);
}
I think the bodies should be moving when I run the application, but they are not. I am totally new to this and can't figure out the exact problem. Kindly guide me to the problem, the solution and a proper example of using this. Thanks.
Try:
prismaticJointDef.collideConnected = false;