How to remove white space from bottom of PDF itext

I am trying to make one giant PDF that holds all of the information on a single page, because there can be no breaks between the pieces of information in the document. It probably won't ever be printed, so the size of the PDF is not an issue. Using iText, the only way I have found is to create a page that is 14400px long (about 5 m of A4 pages), but this leaves trailing white space if the document is shorter than expected (I don't ever see the document being longer than 14400px). This is my code so far:
private void pdfSave() {
    float pageWidth = 200f;
    float pageHeight = 1440f;
    Rectangle pageSize = new Rectangle(pageWidth, pageHeight);
    Document mDoc = new Document(pageSize);
    String mFileName = new SimpleDateFormat("ddMMyyyy_HHmmss",
            Locale.getDefault()).format(System.currentTimeMillis());
    String mFilePath = Environment.getExternalStorageDirectory() + "/" + "pdf_viewer" + "/" + mFileName + ".pdf";
    File dir = new File(mFilePath);
    if (!dir.exists()) {
        dir.getParentFile().mkdir();
    }
    try {
        PdfWriter.getInstance(mDoc, new FileOutputStream(mFilePath));
        mDoc.setMargins(10, 10, 10, 10);
        mDoc.open();
        String mText = mTextEt.getText().toString();
        mDoc.add(new Paragraph(mText, FontFactory.getFont(FontFactory.HELVETICA, 4, Font.BOLDITALIC)));
        mDoc.close();
    } catch (Exception e) {
        // Log file or PDF errors instead of leaving the try block unterminated
        e.printStackTrace();
    }
}
Edit: I have tried using a crop box and a second pass, as suggested in a comment, but my app crashes on this line when I debug it:
Rectangle rect = getOutputPageSize(pageSize, reader, i);
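One possible way to size the crop box for that second pass, sketched in plain Java. It assumes the y position of the end of the content is known after the first pass (as far as I know, iText 5 reports this through `PdfWriter.getVerticalPosition(boolean)`, but treat that as an assumption). The class and method names below are hypothetical and not part of iText; applying the rectangle to the page is still left to the second pass.

```java
// Hypothetical helper, not part of iText: computes a crop rectangle that
// drops the unused space below the last line of content.
class CropBoxCalculator {

    /**
     * PDF coordinates start at the bottom-left corner, so content that is
     * added from the top of the page ends at some y value above 0.
     *
     * @param pageWidth    full page width in points
     * @param pageHeight   full page height in points
     * @param lastY        y position where the content ends (assumed to be
     *                     read from the PDF writer after the last element)
     * @param bottomMargin space to keep below the content
     * @return {llx, lly, urx, ury} of the cropped page
     */
    static float[] cropBox(float pageWidth, float pageHeight, float lastY, float bottomMargin) {
        // Never let the lower edge fall below the real page.
        float lowerY = Math.max(0f, lastY - bottomMargin);
        return new float[] {0f, lowerY, pageWidth, pageHeight};
    }
}
```

With a 200 x 14400 page whose content ends at y = 13000 and a margin of 10, the lower edge of the crop box moves up to 12990, so only the top 1410 points of the page remain visible.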

Related

How to implement a PDF viewer that loads pages asynchronously

We need to allow users of our mobile app to browse a magazine with an experience that is fast, fluid and feels native to the platform (similar to iBooks/Google Books).
Some features we need are showing thumbnails of the whole magazine and searching for specific text.
The problem is that our magazines are over 140 pages long and we can’t force our users to have to fully download the whole ebook/PDF beforehand. We need pages to be loaded asynchronously, that is, to let users start reading without having to fully download the content.
I studied PDFKit for iOS however I didn’t find any mention in the documentation about downloading a PDF asynchronously.
Are there any solutions/libraries to implement this functionality on iOS and Android?

What you're looking for is called linearization. According to this answer:
The first object immediately after the %PDF-1.x header line shall
contain a dictionary key indicating the /Linearized property of the
file.
This overall structure allows a conforming reader to learn the
complete list of object addresses very quickly, without needing to
download the complete file from beginning to end:
The viewer can display the first page(s) very fast, before the
complete file is downloaded.
The user can click on a thumbnail page preview (or a link in the ToC
of the file) in order to jump to, say, page 445, immediately after the
first page(s) have been displayed, and the viewer can then request all
the objects required for page 445 by asking the remote server via byte
range requests to deliver these "out of order" so the viewer can
display this page faster. (While the user reads pages out of order,
the downloading of the complete document will still go on in the
background...)
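To make the quoted description concrete, here is a small heuristic sketch in plain Java. It assumes the client has already fetched the first kilobyte or so of the file (for example with an HTTP `Range: bytes=0-1023` request) and only checks whether a `/Linearized` key shows up there. The class and method names are hypothetical, and a real reader would parse the first object properly rather than search for a substring.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a rough check on the first bytes of a PDF file.
class LinearizationCheck {

    /**
     * Returns true if the given leading bytes start with a PDF header and
     * contain a /Linearized key, which the first object of a linearized
     * file is expected to carry.
     */
    static boolean looksLinearized(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // in the header area cannot break the decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        return text.startsWith("%PDF-") && text.contains("/Linearized");
    }
}
```

If the check fails, the client would have to fall back to downloading the whole file before displaying anything.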
You can use this native library to linearize a PDF.
However, I wouldn't recommend it, as rendering the PDFs won't be fast, fluid or feel native. For those reasons, as far as I know, there is no native mobile app that does linearization. Moreover, you would have to create your own rendering engine for the PDF, as most PDF viewing libraries do not support linearization. What you should do instead is convert each individual page of the PDF to HTML on the server end and have the client load pages only when required, then cache them. We will also save the PDF's plain text separately in order to enable search. This way everything will be smooth, as the resources will be lazy loaded. In order to achieve this you can do the following.
Firstly, on the server end, whenever you publish a PDF, the pages of the PDF should be split into HTML files as explained above. Page thumbnails should also be generated from those pages. Assuming that your server is running on Python with the Flask microframework, this is what you do.
from flask import Flask, request, send_file, jsonify
from werkzeug.utils import secure_filename
import os
from pyPdf import PdfFileWriter, PdfFileReader
import imgkit
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
import io
import sqlite3
from PIL import Image

app = Flask(__name__)

@app.route('/publish', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        f = request.files['file']
        filePath = "pdfs/" + secure_filename(f.filename)
        f.save(filePath)
        savePdfText(filePath)
        inputpdf = PdfFileReader(open(filePath, "rb"))
        for i in range(inputpdf.numPages):
            # Write page i to its own single-page PDF.
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            with open("document-page%s.pdf" % i, "wb") as outputStream:
                output.write(outputStream)
            # Convert the page to HTML, render the HTML to an image, then make the thumbnail.
            os.system("pdf2htmlEX --zoom 1.3 document-page%s.pdf" % i)
            imgkit.from_file("document-page%s.html" % i, "document-page%s.jpg" % i)
            saveThum("document-page%s.jpg" % i)
    return "published"

def saveThum(infile):
    size = 124, 124
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size, Image.ANTIALIAS)
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for '%s'" % infile)

def savePdfText(path):
    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    db = sqlite3.connect("pdfText.db")
    cursor = db.cursor()
    cursor.execute('create table if not exists pagesTextTables(id INTEGER PRIMARY KEY,pageNum TEXT,pageText TEXT)')
    db.commit()
    pageNum = 1
    # Process each page contained in the document.
    with open(path, 'rb') as fp:
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
            pageText = retstr.getvalue()
            # Reset the buffer so the next page's text is stored on its own.
            retstr.seek(0)
            retstr.truncate(0)
            cursor.execute('INSERT INTO pagesTextTables(pageNum,pageText) values(?,?) ', (str(pageNum), pageText))
            db.commit()
            pageNum = pageNum + 1
    device.close()
    db.close()

@app.route('/page', methods=['GET', 'POST'])
def getPage():
    if request.method == 'GET':
        # GET parameters arrive in the query string, not in request.files.
        page_num = int(request.args['page_num'])
        return send_file("document-page%s.html" % page_num, as_attachment=True)

@app.route('/thumb', methods=['GET', 'POST'])
def getThum():
    if request.method == 'GET':
        page_num = int(request.args['page_num'])
        return send_file("document-page%s.thumbnail" % page_num, as_attachment=True)

@app.route('/search', methods=['GET', 'POST'])
def search():
    if request.method == 'GET':
        query = request.args['query']
        db = sqlite3.connect("pdfText.db")
        cursor = db.cursor()
        # Bind the search term as a parameter instead of concatenating it into the SQL.
        cursor.execute("SELECT * from pagesTextTables Where pageText LIKE ?", ('%' + query + '%',))
        result = cursor.fetchone()
        return jsonify(queryResults=result)
Here is an explanation of what the Flask app is doing.
The /publish route is responsible for publishing your magazine: turning every page into HTML, saving the PDF's text to an SQLite db and generating thumbnails for those pages. I've used pyPdf for splitting the PDF into individual pages, pdf2htmlEX to convert the pages to HTML, imgkit to render that HTML to images and PIL to generate thumbnails from those images. Also, a simple SQLite db saves the pages' text.
The /page, /thumb and /search routes are self-explanatory. They simply return the HTML, the thumbnail or the search query results.
Secondly, on the client end you simply download the HTML page whenever the user scrolls to it. Let me give you an example for Android. First, you'd want to create some utils to handle the GET requests:
public static byte[] GetPage(int mPageNum) throws IOException {
    return CallServer("page", "page_num", Integer.toString(mPageNum));
}
public static byte[] GetThum(int mPageNum) throws IOException {
    return CallServer("thumb", "page_num", Integer.toString(mPageNum));
}
private static byte[] CallServer(String route, String requestName, String requestValue) throws IOException {
    OkHttpClient client = new OkHttpClient.Builder()
            .connectTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .build();
    // The Flask routes only answer GET, so pass the value as a query parameter.
    Request request = new Request.Builder()
            .url("yourUrl/" + route + "?" + requestName + "=" + requestValue)
            .get()
            .build();
    Response response = client.newCall(request).execute();
    return response.body().bytes();
}
The helper utils above simply handle the queries to the server for you; they should be self-explanatory.
Next, you simply create a RecyclerView with a WebView ViewHolder, or better yet an AdvancedWebView, as it will give you more room for customization.
public static class ViewHolder extends RecyclerView.ViewHolder {
    private AdvancedWebView mWebView;
    public ViewHolder(View itemView) {
        super(itemView);
        mWebView = (AdvancedWebView) itemView;
    }
}
private class ContentAdapter extends RecyclerView.Adapter<YourFragment.ViewHolder> {
    @Override
    public ViewHolder onCreateViewHolder(ViewGroup container, int viewType) {
        return new ViewHolder(new AdvancedWebView(container.getContext()));
    }
    @Override
    public int getItemViewType(int position) {
        return 0;
    }
    @Override
    public void onBindViewHolder(ViewHolder holder, int position) {
        handlePageDownload(holder.mWebView);
    }
    private void handlePageDownload(AdvancedWebView mWebView){....}
    @Override
    public int getItemCount() {
        return numberOfPages;
    }
}
That should be about it.
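The answer says the client should cache pages once they have been loaded but does not show how. A minimal sketch of one option, a small in-memory LRU cache built on `LinkedHashMap`, is below. The class name `PageCache` and the idea of keying by page number are assumptions for illustration; the cache is not thread-safe, and an app might prefer a disk cache instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory cache: keeps the most recently used pages and
// evicts the least recently used one when the limit is exceeded.
class PageCache extends LinkedHashMap<Integer, byte[]> {
    private final int maxPages;

    PageCache(int maxPages) {
        // accessOrder = true makes get() move an entry to the "newest" end.
        super(16, 0.75f, true);
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
        // Called after every put(); returning true drops the oldest entry.
        return size() > maxPages;
    }
}
```

A page download routine could then check `cache.get(pageNum)` first and only call the server when that returns null, storing the downloaded bytes with `cache.put(pageNum, bytes)`.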

I am sorry to say that, as far as I know, there is no library or SDK that provides asynchronous page loading. It is next to impossible on a mobile device to open a PDF file without downloading the full file.
Solution:
I have already done R&D on this and met the same requirement in a project. I am not sure whether iBooks and Google Books use the mechanism below, but it works well for your requirements.
Divide your PDF into n parts (e.g. if the PDF has 150 pages, each part contains 15 pages; this takes some effort on the web end).
Once the first part has downloaded successfully, display it to the user while the other parts download asynchronously.
After all parts of the PDF have downloaded, use the code below to merge them.
How to merge PDF files
UIGraphicsBeginPDFContextToFile(oldFile, paperSize, nil);
//Get document access of the pdf from which you want the pages
CGPDFDocumentRef newDocument = CGPDFDocumentCreateWithURL((CFURLRef) newUrl);
for (pageNumber = 1; pageNumber <= count; pageNumber++)
{
    UIGraphicsBeginPDFPageWithInfo(paperSize, nil);
    //Get graphics context to draw the page
    CGContextRef currentContext = UIGraphicsGetCurrentContext();
    //Flip and scale context to draw the pdf correctly
    CGContextTranslateCTM(currentContext, 0, paperSize.size.height);
    CGContextScaleCTM(currentContext, 1.0, -1.0);
    //Get the page you want
    CGPDFPageRef newPage = CGPDFDocumentGetPage(newDocument, pageNumber);
    //Drawing the page
    CGContextDrawPDFPage(currentContext, newPage);
}
//Clean up once, after the last page, so the URL is still valid inside the loop
CGPDFDocumentRelease(newDocument);
newDocument = nil;
newUrl = nil;
UIGraphicsEndPDFContext();
Reference: How to merge PDF file.
Update:
The main advantage of this mechanism is that the logic remains the same on both Android and iOS devices.
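The splitting step in this answer comes down to simple arithmetic, which the client needs in order to know which part to fetch when the user jumps to a page. A small sketch in plain Java, with hypothetical names and the 15-pages-per-part figure taken from the example above:

```java
// Hypothetical helper for the "split the PDF into parts" scheme.
class PdfPartPlanner {

    /** Number of parts needed to hold totalPages (rounds up). */
    static int partCount(int totalPages, int pagesPerPart) {
        return (totalPages + pagesPerPart - 1) / pagesPerPart;
    }

    /** Zero-based index of the part that contains a one-based page number. */
    static int partForPage(int pageNumber, int pagesPerPart) {
        return (pageNumber - 1) / pagesPerPart;
    }
}
```

For a 150-page magazine split into 15-page parts this gives 10 parts, and page 16 is the first page of the second part (index 1), so that part can be requested first if the user jumps there.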

What is the equivalent of Android's "html-textview" in Objective-C

I get HTML text from the back end and show it in a UILabel with dynamic height, using Auto Layout in a UITableView. I am stripping the HTML tags, which hurts scrolling performance and makes the app take longer to open.
I then tried a UIWebView with this code:
dynamicWebview.delegate = self;
dynamicWebview.scrollView.contentInset = UIEdgeInsetsZero;
dynamicWebview.scrollView.scrollEnabled = false;
heightConstraint.constant = dynamicWebview.scrollView.contentSize.height;
[dynamicWebview loadHTMLString:[syntable valueForKey:@"extraDesc"] baseURL:nil];
and in webViewDidFinishLoad:
- (void)webViewDidFinishLoad:(UIWebView *)aWebView {
    CGRect frame = aWebView.frame;
    frame.size.height = 1;
    aWebView.frame = frame;
    CGSize fittingSize = [aWebView sizeThatFits:CGSizeZero];
    frame.size = fittingSize;
    aWebView.frame = frame;
}
But with this I am facing a dynamic height problem for the UIWebView and for the row height too: the content overflows the row, and only fits after the cell is reloaded. After trying this, I think a UILabel is the best way to achieve what I am trying to do.
I tried RTLabel and TTTAttributedLabel. RTLabel supports HTML tags but cannot be used in IB; TTTAttributedLabel supports IB but not HTML tags.
On Android, a library called html-textview is used for this; it adjusts its height dynamically and handles the HTML tags too.
Is there an Objective-C way to do that? If not, what should I use?
Edit: I have tried NSAttributedString too, as below, but this also hurts performance a lot.
NSString *extraDesc = [syntable valueForKey:@"extraDesc"];
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[extraDesc dataUsingEncoding:NSUnicodeStringEncoding] options:@{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType } documentAttributes:nil error:nil];
splittext.attributedText = attributedString;

Try this:
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUnicodeStringEncoding] options:@{ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType } documentAttributes:nil error:nil];
self.htmlLabel.attributedText = attributedString;

How to get the Image from first page when search in Google?

Usually after using Google to search for a city, there is a part of Wikipedia page on the right with an image and a map. Can anyone tell me how I could access this image? I should know how to download it.

Actually, the main image (the one that goes with the map image on the right) is very rarely from Wikipedia, so you can't use the Wikipedia API to get it. If you want to access the actual main image, you can use this:
private static void GetGoogleImage(string word)
{
    // make an HTTP Get request
    var request = (HttpWebRequest)WebRequest.Create("https://www.google.com.pg/search?q=" + word);
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36";
    using (var webResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var reader = new StreamReader(webResponse.GetResponseStream()))
        {
            // get all images with base64 string
            var matches = Regex.Matches(reader.ReadToEnd(), @"'data:image/jpeg;base64,([^,']*)'");
            if (matches.Count > 0)
            {
                // get the image with the max height
                var bytes = matches.Cast<Match>()
                    .Select(x => Convert.FromBase64String(x.Groups[1].Value.Replace("\\75", "=").Replace("\\075", "=")))
                    .OrderBy(x => Image.FromStream(new MemoryStream(x, false)).Height).Last();
                // save the image as 'image.jpg'
                using (var imageFile = new FileStream("image.jpg", FileMode.Create))
                {
                    imageFile.Write(bytes, 0, bytes.Length);
                    imageFile.Flush();
                }
            }
        }
    }
}
This works for me and always returns the actual main image (if one exists). For example, GetGoogleImage("New York") gives me data:image/jpeg;base64,/9j/4AAQSkZJRg....
I rely on the fact that, of all the base64-encoded images in the response, the main one has the greatest height, so it is enough to order them by height and select the last one. If required, you can also check for a minimum image height here. Replacing \075 with = is needed to restore the base64 padding.
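The padding fix can be shown on its own. The following plain-Java sketch mirrors the two `Replace` calls from the answer: it turns the escaped forms `\075` and `\75` back into `=` before decoding. The class name is hypothetical, and whether a given page escapes padding this way is an assumption carried over from the answer.

```java
import java.util.Base64;

// Hypothetical helper: decodes base64 text whose '=' padding was escaped
// as the octal sequences \075 or \75 inside a script block.
class EscapedBase64 {

    static byte[] decode(String escaped) {
        // Restore the padding characters before handing the text to the decoder.
        String clean = escaped.replace("\\075", "=").replace("\\75", "=");
        return Base64.getDecoder().decode(clean);
    }
}
```

Without the replacement step the decoder would reject the input, because a backslash is not a valid base64 character.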

If you want the Wikipedia article's main image, you have to use the Wikipedia API.
Update:
You can use jsoup (a Java HTML parser, org.jsoup:jsoup:1.8.3), which returns the list of images inside a page.
String stringResponse = getHtmlContent(url);
Document doc = Jsoup.parse(stringResponse);
Element content = doc.getElementById("content");
// Get all elements with the img tag
Elements img = content.getElementsByTag("img");
for (Element el : img) {
    // for each element get the src image url
    String src = el.attr("src");
    Log.d(TAG, "src attribute is : " + src);
    String alt = el.attr("alt");
    // do some stuff
}
Update:
Wikipedia provides an API that returns HTML content.

Append line of text to Azure Block Blob from Android

I want to append a line of text to an existing Azure cloud block blob from an Android device.
In VB.Net I would AcquireLease, getBlockBlobReference, DownloadToFile, add the line on the local files system, UploadToFile, ReleaseLease . Simple and secure, if a bit long-winded.
In Android, it looks a little more tricky. At the moment, my best solution is this:
CloudBlockBlob blob1=container.getBlockBlobReference(chosenOne+".txt");
String proposedLeaseId1 = UUID.randomUUID().toString();
OperationContext operationContext1 = new OperationContext();
blob1.acquireLease(15, proposedLeaseId1, null /*access condition*/,null/* BlobRequestOptions */, operationContext1);
AccessCondition condition = new AccessCondition();
condition.setLeaseID(proposedLeaseId1);
BlobInputStream blobIn = blob1.openInputStream();
blob1.downloadAttributes();
long blobLengthToUse = blob1.getProperties().getLength();
byte[] result = new byte[(int) blobLengthToUse];
blob1.downloadToByteArray(result,0);
blobIn.close();
blob1.deleteIfExists(DeleteSnapshotsOption.NONE,condition, null, operationContext1);
BlobOutputStream blobOut = blob1.openOutputStream();
//this is a byte by byte write ...
//which is fine ... but no use if you want to replace ...
/*int next = blobIn.read();
while (next != -1) {
blobOut.write(next);
next = blobIn.read();
}*/
blobOut.write(result);
String strTemp="This is just a test string";
blobOut.write(strTemp.getBytes());
blobOut.close();
Apart from being extremely long-winded, I am concerned that as soon as I delete the blob, the lease will go and that I may hit integrity issues. I would appreciate any help in making this code simpler and more secure. I know that Microsoft are planning to introduce append blobs in 3Q 2015, but I want to implement this now.

You can call PutBlock to upload the appended content (the maximum size of each block is 4MB, so please split the appended content into blocks if required), and then call PutBlockList on this blob, passing in the previously committed blocks plus the newly appended blocks.
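The splitting step mentioned above can be sketched in plain Java without any Azure classes. The helper below cuts the content into chunks no larger than a given size and builds block IDs that are base64 strings of equal length, which is what the block blob API expects of IDs within one blob. The class and method names are hypothetical; in a real append, the index would have to continue after the blocks already committed, and the upload and commit calls themselves are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: prepares data for block-by-block upload.
class BlockSplitter {

    /** Splits data into consecutive chunks of at most maxBlockSize bytes. */
    static List<byte[]> split(byte[] data, int maxBlockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < data.length; start += maxBlockSize) {
            int end = Math.min(data.length, start + maxBlockSize);
            blocks.add(Arrays.copyOfRange(data, start, end));
        }
        return blocks;
    }

    /** Builds a fixed-length, base64-encoded block ID from a block index. */
    static String blockId(int index) {
        // Zero-padding keeps every ID the same length before encoding.
        String padded = String.format(Locale.ROOT, "%08d", index);
        return Base64.getEncoder().encodeToString(padded.getBytes(StandardCharsets.US_ASCII));
    }
}
```

For the single line of text in the question one block is enough, so `split` would return a single chunk and only one new ID would be added to the committed list.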

How can I load an image from a url into an android remote view for a widget

I have an image URL that I parse from JSON and want to load into an Android widget on the home screen. Right now I am trying to do it this way, but it's wrong:
ImageDownloadTask imageD = new ImageDownloadTask(image);
views.setImageViewBitmap(R.id.image, imageD.execute(image));
image is a string holding a url to an image that needs to be downloaded and I am trying to set it to R.id.image
I found another stack question and tried this as a result:
views.setBitmap(R.id.image, "setImageBitmap",BitmapFactory.decodeStream(new URL(image).openStream()));
And when I use that nothing in the app loads at all, none of the text views get set.
My third try was this:
//get beer data
JSONObject o = new JSONObject(result);
String name = getName(o);
String image = getImage(o);
String abv = getABV(o);
String ibu = getIBU(o);
String glass = getGlass(o);
String beerBreweryName = getBreweryName(o);
String beerBreweryStyle = getBreweryStyle(o);
String beerDescription = getDescription(o);
InputStream in = new java.net.URL(image).openStream();
Bitmap bitmap = BitmapFactory.decodeStream(in);
views.setTextViewText(R.id.beerTitle, name);
views.setTextViewText(R.id.beerBreweryName, beerBreweryName);
views.setTextViewText(R.id.beerStyleName, beerBreweryStyle);
views.setImageViewBitmap(R.id.image, bitmap);
This gave the same result as the last attempt, it would not even set any text views....
Just tried another attempt after one of the answers posted below:
RemoteViews views = new RemoteViews(c.getPackageName(), R.layout.widget_test);
//get beer data
JSONObject o = new JSONObject(result);
String name = getName(o);
String imageURL = getImage(o);
String abv = getABV(o);
String ibu = getIBU(o);
String glass = getGlass(o);
String beerBreweryName = getBreweryName(o);
String beerBreweryStyle = getBreweryStyle(o);
String beerDescription = getDescription(o);
Log.d("widgetImage" , imageURL);
views.setImageViewUri(R.id.image, Uri.parse(imageURL));
views.setTextViewText(R.id.beerTitle, name);
views.setTextViewText(R.id.beerBreweryName, beerBreweryName);
views.setTextViewText(R.id.beerStyleName, beerBreweryStyle);
mgr.updateAppWidget(appWidgetIds, views);
This attempt lets all the text views load, but no image ever shows up.

The way to do this reliably is to use setImageViewUri on the remote ImageView. The trick is that the URI you give it is a content:// URI, which then points back to a content provider that you export from your application. In your content provider you can do anything you need to do to supply the image bytes.
For example, in your manifest:
<provider android:name=".ImageContentProvider" android:authorities="com.example.test" android:exported="true" />
And your provider:
public class ImageContentProvider extends ContentProvider {
    // (Removed overrides that do nothing)
    @Override
    public ParcelFileDescriptor openFile(Uri uri, String mode) throws FileNotFoundException {
        List<String> segs = uri.getPathSegments();
        // Download the image content here, get the info you need from segs
        return ParcelFileDescriptor.open(new File(path), ParcelFileDescriptor.MODE_READ_ONLY);
    }
}
And then your URL is something like:
content://com.example.test/something-you-can-define/here
This is necessary because your remote image view is not running in your process. You are much more limited in what you can do because everything must be serialized across the process boundary. The URI can serialize just fine but if you try to send a megabyte of image data with setImageViewBitmap, it's probably going to fail (depending on available device memory).

I got a lot of help from multiple sources for this question. The big problem was that several of the attempts listed above locked up the widget and loaded nothing, because I cannot download the image and set it on the UI thread.
To fix this I had to move everything into the doInBackground of my AsyncTask instead of onPostExecute.
