I have a set of images that my code outputs, and I want to remove all of the excess whitespace in each image so that it is cropped down to just the text it contains.
This is the pertinent code:
from PIL import Image, ImageFont, ImageDraw, ImageChops
from docx import Document
import textwrap
import re

doc = Document('Patents.docx')
docText = ''.join(paragraph.text for paragraph in doc.paragraphs)

def trim(im, color):
    bg = Image.new(im.mode, im.size, color)
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff)
    bbox = diff.getbbox()
    if bbox:
        return im.crop(bbox)

for match in find_matches(text=docText, keywords=("responsive", "detecting", "providing")):
    W, H = 300, 300
    body = Image.new('RGB', (W, H), (255, 255, 255))
    border = Image.new('RGB', (W + 4, H + 4), (0, 0, 0))
    border.save('border.png')
    body.save('body.png')
    patent = Image.open('border.png')
    patent.paste(body, (2, 2))
    draw = ImageDraw.Draw(patent)
    font = ImageFont.load_default()

    current_h, pad = 100, 20

    for key in textwrap.wrap(match, width=45):
        line = key.encode('utf-8')
        # (width, height) = font.getsize(line)
        # patent.resize((width, height), resample=0, box=None)
        w, h = draw.textsize(line, font=font)
        draw.text(((W - w) / 2, current_h), line, (0, 0, 0), font=font)
        current_h += h + pad

    for count, matches in enumerate(match):
        patent.save(f'{match}.png')
        patentCrop = trim(patent, 255)
        patentCrop.save(f'{match}_new.png')
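For reference, the bounding-box check that trim relies on can be run by itself on synthetic images. This is a minimal sketch, not part of the script above: the helper `content_bbox`, the 304x304 canvas and the black rectangle standing in for the text are all made up for illustration, and the background colour is passed as an explicit RGB tuple. It prints the box that is found with and without a 2-pixel frame like the one in `border.png`, and what `Image.new` produces when given the colour `255` in `'RGB'` mode.

```python
from PIL import Image, ImageChops, ImageDraw

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def content_bbox(im, color=WHITE):
    """Hypothetical helper: bounding box of pixels that differ from `color`, or None."""
    bg = Image.new(im.mode, im.size, color)
    return ImageChops.difference(im, bg).getbbox()


# Synthetic 304x304 white canvas; a black rectangle stands in for the text.
plain = Image.new('RGB', (304, 304), WHITE)
ImageDraw.Draw(plain).rectangle((100, 120, 199, 139), fill=BLACK)

# The same canvas with a 2-pixel black frame along every edge, like border.png.
framed = plain.copy()
ImageDraw.Draw(framed).rectangle((0, 0, 303, 303), outline=BLACK, width=2)

print(content_bbox(plain))   # (100, 120, 200, 140): tight box around the "text"
print(content_bbox(framed))  # (0, 0, 304, 304): the frame reaches all four edges

# A bare int on a 3-band image is not read as white: it fills with red.
print(Image.new('RGB', (1, 1), 255).getpixel((0, 0)))  # (255, 0, 0)
```

Since getbbox() covers every pixel that differs from the background, any frame that touches the image edges makes the box the whole image.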
Here are 2 of the 4 outputs from my code (each box is a separate output image):
I would like to keep the border, although I could always leave it off, crop the image, and then add the border afterwards. Either way, I need help removing the whitespace. As shown in my code, I'm using a trim function, but it doesn't seem to work. Any solution would be appreciated, whether it's a fix to my function or an entirely different method. The following is what I'm trying to accomplish (again, each box is its own output):
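The crop-first, border-last alternative mentioned above might look something like the sketch below. It is an illustration under assumptions, not a tested drop-in: `render_box`, the 800x800 working canvas and the padding values are hypothetical; the lines are laid out with `multiline_text(..., align='center')` instead of measuring each one with `draw.textsize` (which Pillow 10 removed); white is passed as an RGB tuple; and the `ImageChops.add(diff, diff)` step is dropped, because doubling the difference does not change which pixels are non-zero.

```python
import textwrap

from PIL import Image, ImageChops, ImageDraw, ImageFont, ImageOps

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def trim(im, color=WHITE):
    """Crop `im` to the pixels that differ from `color`; return `im` if none do."""
    bg = Image.new(im.mode, im.size, color)
    bbox = ImageChops.difference(im, bg).getbbox()
    return im.crop(bbox) if bbox else im


def render_box(text, width=45, canvas=(800, 800), pad=10, border=2):
    """Hypothetical helper: wrapped, centred text, trimmed, then framed."""
    # Borderless white canvas; it only has to be big enough that no text is clipped.
    body = Image.new('RGB', canvas, WHITE)
    draw = ImageDraw.Draw(body)
    font = ImageFont.load_default()
    wrapped = '\n'.join(textwrap.wrap(text, width=width))
    # multiline_text centres the lines relative to each other, so no per-line measuring.
    draw.multiline_text((pad, pad), wrapped, fill=BLACK, font=font,
                        spacing=pad, align='center')
    # Trim while the text is the only thing that differs from white...
    cropped = trim(body, WHITE)
    # ...then add a white margin and, last of all, the black border.
    margin = ImageOps.expand(cropped, border=pad, fill=WHITE)
    return ImageOps.expand(margin, border=border, fill=BLACK)
```

Inside the existing loop, each box could then be produced with `render_box(match)` and saved as before. Because the working canvas is trimmed anyway, its size only matters in that it must be large enough to hold the text.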