polyglot.printpdf (class)

class polyglot.printpdf(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

PDF printer
Key Arguments:
- log – logger
- settings – the settings dictionary
- url – the webpage url
- title – title of pdf
- folderpath – path at which to save pdf
- append – append this at the end of the file name (not title)
- readability – clean text with Mercury Parser
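The usage examples in this section pass in `log` and `settings` objects without showing where they come from. The following is a minimal sketch of one way to create them, assuming that a standard library `logging.Logger` is acceptable as `log`. The helper name `make_logger` is hypothetical, and the keys that `settings` must contain are not documented here, so the empty dictionary is only a placeholder.

```python
import logging


def make_logger(name="printpdf-example"):
    """Return a console logger (hypothetical helper, not part of polyglot)."""
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not log.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        log.addHandler(handler)
    return log


log = make_logger()

# Placeholder only: the required settings keys are not specified in this
# section, so fill this in from your own polyglot configuration.
settings = {}
```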
Usage:
To print a webpage to PDF without cleaning the content, using the title of the webpage as the filename:
    from polyglot import printpdf
    pdf = printpdf(
        log=log,
        settings=settings,
        url="https://en.wikipedia.org/wiki/Volkswagen",
        folderpath="/path/to/output",
        readability=False
    ).get()
To give the PDF an alternative title, use:

    from polyglot import printpdf
    pdf = printpdf(
        log=log,
        settings=settings,
        url="https://en.wikipedia.org/wiki/Volkswagen",
        folderpath="/path/to/output",
        title="Cars",
        readability=False
    ).get()
Or, to append a string to the end of the filename, before the .pdf extension (useful for indexing or adding a creation date):
    from datetime import datetime
    # Timestamp such as 20240102t030405, used as a filename suffix
    now = datetime.now().strftime("%Y%m%dt%H%M%S")

    from polyglot import printpdf
    pdf = printpdf(
        log=log,
        settings=settings,
        url="https://en.wikipedia.org/wiki/Volkswagen",
        folderpath="/path/to/output",
        append="_" + now,
        readability=False
    ).get()
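To make the effect of `append` concrete, here is a small sketch of how a timestamp suffix could end up in the final filename. Both helpers, `timestamp_suffix` and `illustrate_filename`, are hypothetical and only mirror the description above (suffix placed before the .pdf extension). They are not polyglot's internal implementation, which may sanitise or alter the title in other ways.

```python
from datetime import datetime


def timestamp_suffix(moment=None):
    """Build a suffix such as '_20240102t030405' (hypothetical helper)."""
    if moment is None:
        moment = datetime.now()
    # The lowercase 't' is a literal separator between date and time.
    return "_" + moment.strftime("%Y%m%dt%H%M%S")


def illustrate_filename(title, append=""):
    """Illustrative only: place the suffix before the .pdf extension."""
    return f"{title}{append}.pdf"


fixed_moment = datetime(2024, 1, 2, 3, 4, 5)
example = illustrate_filename("Volkswagen", timestamp_suffix(fixed_moment))
print(example)  # Volkswagen_20240102t030405.pdf
```

Generating the suffix once and reusing it keeps a batch of PDFs created in the same run grouped under a single timestamp.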
To clean the content using the Mercury Parser and apply some simple styling and pretty fonts:
    from polyglot import printpdf
    pdf = printpdf(
        log=log,
        settings=settings,
        url="https://en.wikipedia.org/wiki/Volkswagen",
        folderpath=pathToOutputDir,
        readability=True
    ).get()

__init__(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

Methods
- __init__(log[, settings, url, title, ...])
- get() – get the PDF