polyglot.printpdf (class)

class polyglot.printpdf(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

PDF printer

Key Arguments:
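  • log – logger
  • settings – the settings dictionary
  • url – the URL of the webpage to print
  • title – title of the PDF; if omitted, the title of the webpage is used
  • folderpath – path to the folder in which to save the PDF
  • append – string to append to the end of the filename (not the title), before the .pdf extension
  • readability – clean the content with Mercury Parser before printing (default True)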
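
The usage examples on this page assume that a `log` object already exists. As a minimal sketch, assuming a standard-library `logging.Logger` is acceptable as the `log` argument (the page only says "logger"), one could be created as follows. The helper name `make_logger` and the logger name `polyglot-example` are hypothetical, chosen only for illustration:

```python
import logging


def make_logger(name="polyglot-example", level=logging.INFO):
    """Return a console logger (hypothetical helper, not part of polyglot)."""
    log = logging.getLogger(name)
    log.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output
    if not log.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        log.addHandler(handler)
    return log


log = make_logger()
log.info("logger ready")
```

The contents of the `settings` dictionary are not described here, so no attempt is made to sketch it.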

Usage:

To print a webpage to PDF without cleaning the content, using the title of the webpage as the filename:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    readability=False
).get()

To give the PDF an alternative title use:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    title="Cars",
    readability=False
).get()

Or, to append a string to the end of the filename, before the .pdf extension (useful for indexing, adding the creation date, etc.):

from datetime import datetime
now = datetime.now()
now = now.strftime("%Y%m%dt%H%M%S")

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    append="_"+now,
    readability=False
).get()
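
To make the effect of `append` concrete, here is a minimal sketch of how the resulting filename might be composed. It assumes the filename is simply the title, followed by the appended string, followed by `.pdf`, as the description above suggests. The helpers `timestamp_suffix` and `build_pdf_filename` are hypothetical and are not part of polyglot:

```python
from datetime import datetime


def timestamp_suffix(moment):
    """Format a datetime as the suffix used in the example, e.g. _20240305t140709."""
    return "_" + moment.strftime("%Y%m%dt%H%M%S")


def build_pdf_filename(title, append=""):
    """Assumed naming scheme: title, then the appended string, then .pdf."""
    return f"{title}{append}.pdf"


# A fixed datetime keeps the example reproducible
example = datetime(2024, 3, 5, 14, 7, 9)
print(build_pdf_filename("Volkswagen", timestamp_suffix(example)))
# prints: Volkswagen_20240305t140709.pdf
```

The lowercase `t` in the format string is a literal character separating the date from the time, not a format code.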

To clean the content using the Mercury Parser and apply some simple styling and pretty fonts:

from polyglot import printpdf
pdf = printpdf(
    log=log,
    settings=settings,
    url="https://en.wikipedia.org/wiki/Volkswagen",
    folderpath="/path/to/output",
    readability=True
).get()

__init__(log, settings=False, url=False, title=False, folderpath=False, append=False, readability=True)

Methods

__init__(log[, settings, url, title, ...])
get() – get the PDF