PDF Replace

About replacing text and links on PDF files.

See pdf_replace/print_links.py

PDF - Print text from all pages

See pdf_replace/print_text.py

The steps below replace text by editing the PDF as a plain-text file. Source: https://gist.github.com/Nezteb/e761bb85ced6ce965e37d54ceb04635d

  1. Uncompress the PDF file
nix-shell -p qpdf
qpdf --qdf --object-streams=disable input.pdf uncompressed.pdf

or

nix-shell -p pdftk
pdftk input.pdf output uncompressed.pdf uncompress
  2. Edit the PDF as a "plain text" file

Works:

  • With LC_ALL=C sed
    • LC_ALL=C sed -e 's|some text||g' uncompressed.pdf > uncompressed-output.pdf
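
The sed call above treats the file as raw bytes rather than as text in some locale. As a rough Python equivalent, here is a minimal sketch; the function names `replace_bytes` and `replace_in_pdf` are hypothetical and not part of this repository. It assumes the search string appears literally in the uncompressed file, which is not guaranteed, since a PDF may split a phrase across several text operators or use a custom font encoding.

```python
from pathlib import Path


def replace_bytes(data: bytes, old: bytes, new: bytes = b"") -> tuple[bytes, int]:
    """Replace every occurrence of `old` in `data`; return the result and the hit count."""
    if not old:
        raise ValueError("search string must not be empty")
    # bytes.replace and bytes.count both work on non-overlapping matches.
    return data.replace(old, new), data.count(old)


def replace_in_pdf(src: str, dst: str, old: str, new: str = "") -> int:
    """Byte-wise search and replace on an uncompressed PDF (hypothetical helper)."""
    # latin-1 maps every character to exactly one byte, so the binary
    # parts of the file are never re-encoded.
    raw = Path(src).read_bytes()
    patched, hits = replace_bytes(raw, old.encode("latin-1"), new.encode("latin-1"))
    Path(dst).write_bytes(patched)
    return hits
```

For example, `replace_in_pdf("uncompressed.pdf", "uncompressed-output.pdf", "some text")` would remove the phrase and return the number of matches. A return value of 0 suggests the text is not stored literally in the file. A replacement of a different length shifts the byte offsets recorded in the file; qpdf generally tries to rebuild them when recompressing, possibly with warnings.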

Does not work:

  • With nano by pressing Alt+R (search and replace). The resulting PDF is distorted: some text is missing or misaligned.

Untested:

  • With another text editor using "search and replace". Warning: on large files this might be laggy.
  3. Compress the PDF
qpdf uncompressed-output.pdf output.pdf

or

pdftk uncompressed-output.pdf output output.pdf compress
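
The three steps can also be chained from Python. The following is only a sketch under stated assumptions: qpdf is on the PATH, the helper names (`uncompress_cmd`, `compress_cmd`, `run_qpdf`, `replace_text`) are hypothetical, and the intermediate file names are the ones used in the steps above. It also assumes qpdf signals "warnings only" with exit code 3, which can happen after a manual edit, and tolerates that code.

```python
import subprocess
from pathlib import Path


def uncompress_cmd(src: str, dst: str) -> list[str]:
    # Same qpdf invocation as in the uncompress step.
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def compress_cmd(src: str, dst: str) -> list[str]:
    # Same qpdf invocation as in the compress step.
    return ["qpdf", src, dst]


def run_qpdf(cmd: list[str]) -> None:
    # Assumption: exit code 3 means qpdf wrote the output but printed warnings.
    result = subprocess.run(cmd)
    if result.returncode not in (0, 3):
        raise RuntimeError(f"{cmd[0]} failed with exit code {result.returncode}")


def replace_text(src: str, dst: str, old: bytes, new: bytes = b"") -> None:
    tmp_in, tmp_out = "uncompressed.pdf", "uncompressed-output.pdf"
    run_qpdf(uncompress_cmd(src, tmp_in))
    # Byte-wise replacement, so binary data in the file is left untouched.
    Path(tmp_out).write_bytes(Path(tmp_in).read_bytes().replace(old, new))
    run_qpdf(compress_cmd(tmp_out, dst))
```

A call such as `replace_text("input.pdf", "output.pdf", b"some text")` would mirror the qpdf and sed commands shown above.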