polyglot.printpdf (class)

class polyglot.printpdf(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

PDF printer

Key Arguments:

log – logger
settings – the settings dictionary
url – the webpage url
title – title of the PDF
folderpath – directory in which to save the PDF
append – string to append to the end of the filename (not the title)
readability – clean the text with Mercury Parser
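
The usage examples below pass `log` and `settings` without showing where they come from. A minimal sketch of one way to prepare them, assuming a standard-library logger is acceptable for `log` and that `settings` is a plain dictionary. The logger name and the empty dictionary are placeholders, since the keys polyglot reads from `settings` are not listed here:

```python
import logging

# Assumption: any standard-library logger can be passed as `log`.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("printpdf-example")  # hypothetical logger name

# Assumption: placeholder only; the keys polyglot expects in the
# settings dictionary are not documented in this section.
settings = {}
```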

Usage:

To print a webpage to PDF without cleaning the content, using the title of the webpage as the filename:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    readability=False
).get()

To give the PDF an alternative title, use:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    title="Cars",
    readability=False
).get()

Or, to append a string to the end of the filename, before the .pdf extension (useful for indexing, adding the creation date, etc.):

from datetime import datetime
# build a timestamp string to use as the filename suffix
now = datetime.now()
now = now.strftime("%Y%m%dt%H%M%S")

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    append="_"+now,
    readability=False
).get()
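
To see what the appended suffix looks like without running a print job, the timestamp formatting can be tried on its own. A small sketch; `timestamp_suffix` is a hypothetical helper and not part of polyglot:

```python
from datetime import datetime


def timestamp_suffix(moment=None):
    """Return a filename suffix such as '_20240131t090507'.

    Hypothetical helper: uses the same strftime pattern as the
    example above, with a leading underscore as a separator.
    """
    if moment is None:
        moment = datetime.now()
    return "_" + moment.strftime("%Y%m%dt%H%M%S")


# A fixed datetime makes the result reproducible.
print(timestamp_suffix(datetime(2024, 1, 31, 9, 5, 7)))  # _20240131t090507
```

The returned string could then be passed as `append=timestamp_suffix()`.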

To clean the content using the Mercury Parser and apply some simple styling and pretty fonts:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    readability=True
).get()

__init__(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

Methods

`__init__`(log[, settings, url, title, ...])
`get`() – get the PDF