Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


How to remove whitespace from an image in Python?

I have a set of images generated by a script, and I want to remove all of the excess whitespace so that each image is cropped down to just the text it contains. This is the pertinent code:

from PIL import Image, ImageFont, ImageDraw, ImageChops
from docx import Document
import textwrap
import re

doc = Document('Patents.docx')
docText = ''.join(paragraph.text for paragraph in doc.paragraphs)


def trim(im, color):
    bg = Image.new(im.mode, im.size, color)
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff)
    bbox = diff.getbbox()
    if bbox:
        return im.crop(bbox)


for match in find_matches(text=docText, keywords=("responsive", "detecting", "providing")):
    W, H = 300, 300
    body = Image.new('RGB', (W, H), (255, 255, 255))
    border = Image.new('RGB', (W + 4, H + 4), (0, 0, 0))
    border.save('border.png')
    body.save('body.png')
    patent = Image.open('border.png')
    patent.paste(body, (2, 2))
    draw = ImageDraw.Draw(patent)
    font = ImageFont.load_default()

    current_h, pad = 100, 20

    for key in textwrap.wrap(match, width=45):
        line = key.encode('utf-8')
        # (width, height) = font.getsize(line)
        # patent.resize((width, height), resample=0, box=None)
        w, h = draw.textsize(line, font=font)
        draw.text(((W - w) / 2, current_h), line, (0, 0, 0), font=font)
        current_h += h + pad
    for count, matches in enumerate(match):
        patent.save(f'{match}.png')
        patentCrop = trim(patent, 255)
        patentCrop.save(f'{match}_new.png')

Here are 2 of the 4 outputs from the code that I've constructed (each box is its own output):

[Images: two of the sample outputs]

I would like to keep the border, although I could always drop it, crop the image, and then add the border back afterwards. Either way, I need help removing the whitespace. As my code shows, I'm using a trim function, but it doesn't seem to be working. Any solution, whether a fix to my function or an entirely different method, would be much appreciated. The following is what I'm trying to accomplish, each box again being its own output:

[Image: sample of the desired output]



1 Answer


I think this is what you want - a kind of double-matted surround:

#!/usr/bin/env python3

from PIL import Image, ImageDraw, ImageOps

# Open input image; convert to RGB so the flood fill and invert see 3-channel pixels
im = Image.open('zHZB9.png').convert('RGB')

# Get rid of existing black border by flood-filling with white from top-left corner
ImageDraw.floodfill(im,xy=(0,0),value=(255,255,255),thresh=10)

# Get bounding box of text and trim to it
bbox = ImageOps.invert(im).getbbox()
trimmed = im.crop(bbox)

# Add new white border, then new black, then new white border
res = ImageOps.expand(trimmed, border=10, fill=(255,255,255))
res = ImageOps.expand(res, border=5, fill=(0,0,0))
res = ImageOps.expand(res, border=5, fill=(255,255,255))
res.save('result.png')
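
As for why the trim function in the question has no effect: the black border touches every edge of the image, so the difference against a plain background is non-zero all the way around and getbbox() returns the full image. I also suspect that passing the integer 255 as the colour of an RGB image does not produce white, so a tuple like (255, 255, 255) is safer. Here is a minimal sketch of a fixed trim that keeps a border. The function name trim_keep_border and its parameters are my own, and it assumes the existing frame is 2 pixels wide, as in the question's code. The black block in the demo is only a stand-in for the text.

```python
from PIL import Image, ImageChops, ImageOps


def trim_keep_border(im, border=2, margin=10,
                     bg=(255, 255, 255), frame=(0, 0, 0)):
    """Crop `im` to its non-background content, then re-add margin and frame.

    Assumes the input already has a frame `border` pixels wide on every side.
    """
    im = im.convert("RGB")
    w, h = im.size
    # Drop the existing frame so it is not mistaken for content.
    inner = im.crop((border, border, w - border, h - border))
    # Pixels equal to the background are black (zero) in the difference image.
    diff = ImageChops.difference(inner, Image.new("RGB", inner.size, bg))
    bbox = diff.getbbox()
    if bbox is None:
        return im  # only background found: return the image unchanged
    content = inner.crop(bbox)
    padded = ImageOps.expand(content, border=margin, fill=bg)
    return ImageOps.expand(padded, border=border, fill=frame)


# Demo canvas built like the question's: 2 px black frame around a white body.
canvas = Image.new("RGB", (304, 304), (0, 0, 0))
canvas.paste(Image.new("RGB", (300, 300), (255, 255, 255)), (2, 2))
# A 100x20 black block stands in for the rendered text.
canvas.paste(Image.new("RGB", (100, 20), (0, 0, 0)), (100, 120))

result = trim_keep_border(canvas)
result.save("trimmed.png")
```

The result is the content plus a 10 pixel white margin and a 2 pixel black frame on each side, so the 100x20 block above comes out as a 124x44 image.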

[Image: resulting output]

