Everyone at some point needs to convert an HTML page into a PDF for some reason or another. For this I've always used a component which was cheap at the time (and is far from it now). This component has worked well for the last 8(ish) years, with most of the "ish" coming in the last while because it doesn't deal well with HTTPS sites. When I contacted the vendor they said "hey, buy our latest version", which can be expected I guess because it's 8 years down the line, but then came the price: ALL OF THE $. The last issue is that the component I used doesn't work in Azure. That led me to some investigation and ultimately to this post.
What's the problem?
We have 3 problems:
- We need to convert HTML to PDF
- We need it not to break the bank, as there potentially aren't high returns on its usage
- The specific component I used doesn't work in Azure web apps because of the way permissions work there
Easy to solve, right? Well, if you're reading this because you needed this component, then yes. But if you were me, Googling with Bing for 40 minutes and then writing some code that makes me feel all kinds of dirty, then that yes becomes a "ja, kind of". It works though, so here it is.
What's the primary component?
The component being used for all the magic is wkhtmltopdf, which is an open source project licensed under the LGPLv3. It is a command line tool that renders HTML into PDF using the Qt WebKit rendering engine. It runs entirely "headless" and does not require a display or display service.
With little work you can, for example, generate a PDF of Google's home page by running the executable with the page's URL and the name of the output file.
That will go off and think for a couple of seconds and return you a PDF.
Super simple, right? There are also a bunch of arguments that you can pass into the exe, which you can find on the project's site, from basics like changing the orientation to passing in a username and password for authenticated pages that you want to generate PDFs for.
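To make the arguments a bit more concrete, here is a minimal sketch of how they could be assembled from C#. WkArgs and Build are hypothetical names of mine, not part of the original sample. The --quiet, --orientation, --username and --password switches are wkhtmltopdf options, but check the project's documentation for the full list and for how your version behaves. The quoting is deliberately naive.

```csharp
using System;
using System.Collections.Generic;

public static class WkArgs
{
    // Hypothetical helper: builds the argument string that gets handed to wkhtmltopdf.
    // Options come first, then the page URL, then the output file.
    public static string Build(
        string url,
        string outputFile,
        bool landscape = false,
        string username = null,
        string password = null)
    {
        var args = new List<string> { "--quiet" };

        if (landscape)
        {
            args.Add("--orientation");
            args.Add("Landscape");
        }

        // Credentials for pages that sit behind HTTP authentication.
        if (username != null)
        {
            args.Add("--username");
            args.Add(Quote(username));
        }
        if (password != null)
        {
            args.Add("--password");
            args.Add(Quote(password));
        }

        args.Add(Quote(url));
        args.Add(Quote(outputFile));
        return string.Join(" ", args);
    }

    // Naive quoting: wraps the value in double quotes and escapes embedded quotes.
    private static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\\\"") + "\"";
    }
}
```

Calling WkArgs.Build with a URL, an output file name and landscape set to true gives you a single string that can be passed as the process arguments.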
How are we using it?
As mentioned above, this is a command line utility, so we are basically just wrapping it in a standard process execution from C#, waiting for the PDF to generate, and returning it to the caller.
And you call this code like the example below.
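What follows is a minimal sketch of that kind of wrapper, not the exact code from the sample project. HtmlToPdf, Convert, the exePath default and the 60 second timeout are all assumptions of mine for illustration. It writes the PDF to a temp file, waits for the process to exit, and hands the bytes back. It checks for the output file rather than the exit code, since wkhtmltopdf can report a non-zero exit code for a page that still rendered.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class HtmlToPdf
{
    // Hypothetical wrapper: runs wkhtmltopdf for one URL and returns the PDF as bytes.
    // exePath is an assumption; point it at wherever the executable lives for you.
    public static byte[] Convert(string url, string exePath = "wkhtmltopdf", int timeoutMs = 60000)
    {
        // Unique temp file so concurrent conversions don't overwrite each other.
        string outputFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".pdf");

        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = "--quiet \"" + url + "\" \"" + outputFile + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };

        try
        {
            using (Process process = Process.Start(startInfo))
            {
                if (process == null)
                    throw new InvalidOperationException("Could not start " + exePath);

                // Wait for the PDF to be generated, but not forever.
                if (!process.WaitForExit(timeoutMs))
                {
                    process.Kill();
                    throw new TimeoutException("wkhtmltopdf did not finish in time for " + url);
                }
            }

            if (!File.Exists(outputFile))
                throw new InvalidOperationException("wkhtmltopdf produced no output for " + url);

            return File.ReadAllBytes(outputFile);
        }
        finally
        {
            // Clean up the temp file whether or not the conversion worked.
            if (File.Exists(outputFile))
                File.Delete(outputFile);
        }
    }
}

// Example call (hypothetical names):
//   byte[] pdf = HtmlToPdf.Convert("https://www.google.com");
//   File.WriteAllBytes("google.pdf", pdf);
```

Returning a byte array keeps the caller free to stream the PDF to a browser, save it to disk or push it into storage.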
I'm not sure how stable this code is to run in production, but in the basic testing I've done it gets the job done, without any issues so far.
So we've solved all of our problems:
- We are converting HTML to PDF
- It's free, and you can't get cheaper than that (unless you want to pay me for it)
- And although not shown in the blog (because I could have got the screenshots from anywhere), this code works in Azure
Anything is possible. The interesting thing about this solution is that you get to see the guts of what looks like a messy solution. What you may not realize is that, under the covers, a 3rd party component is probably doing something similar (or worse). The key takeaway is that it works.
The code used for this sample is on GitHub if you want to download it and see it working for yourself.
Do you know of any cool conversion components? Why not share them below in the comments?