How to remove whitespace from an image in Python?

Question

Welcome To Ask or Share your Answers For Others

How to remove whitespace from an image in Python?

asked Jan 27, 2021 in Technique[技术] by 深蓝 (71.8m points)

How to remove whitespace from an image in Python?

I have a set of images that are output from a code, and I want to be able to remove all of the excess whitespace within the image so that it reduces the image to only the text within the image. This is the pertinent code:

from PIL import Image, ImageFont, ImageDraw, ImageChops
from docx import Document
import textwrap
import re

doc = Document('Patents.docx')
docText = ''.join(paragraph.text for paragraph in doc.paragraphs)


def trim(im, color):
    bg = Image.new(im.mode, im.size, color)
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff)
    bbox = diff.getbbox()
    if bbox:
        return im.crop(bbox)


for match in find_matches(text=docText, keywords=("responsive", "detecting", "providing")):
    W, H = 300, 300
    body = Image.new('RGB', (W, H), (255, 255, 255))
    border = Image.new('RGB', (W + 4, H + 4), (0, 0, 0))
    border.save('border.png')
    body.save('body.png')
    patent = Image.open('border.png')
    patent.paste(body, (2, 2))
    draw = ImageDraw.Draw(patent)
    font = ImageFont.load_default()

    current_h, pad = 100, 20

    for key in textwrap.wrap(match, width=45):
        line = key.encode('utf-8')
        # (width, height) = font.getsize(line)
        # patent.resize((width, height), resample=0, box=None)
        w, h = draw.textsize(line, font=font)
        draw.text(((W - w) / 2, current_h), line, (0, 0, 0), font=font)
        current_h += h + pad
    for count, matches in enumerate(match):
        patent.save(f'{match}.png')
        patentCrop = trim(patent, 255)
        patentCrop.save(f'{match}_new.png')

Here are the 2 of the 4 outputs from the code that I've constructed (each box is its own output):

I would like to keep the border, but obviously I can always not use the border and then crop the image and then add the border, but at any rate, I need help removing the whitespace. As shown in my code, I'm using a trim function, however it doesn't seem to be working for whatever reason. If there's any solution, be it a fix to my function or an entirely different method, I'd really appreciate the help. The following is what I'm trying to accomplish, of course, each box being its own output:

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-01-27T04:32:31+0000

I think this is what you want - a kind of double-matted surround:

#!/usr/bin/env python3

from PIL import Image, ImageDraw, ImageOps

# Open input image
im = Image.open('zHZB9.png')

# Get rid of existing black border by flood-filling with white from top-left corner
ImageDraw.floodfill(im,xy=(0,0),value=(255,255,255),thresh=10)

# Get bounding box of text and trim to it
bbox = ImageOps.invert(im).getbbox()
trimmed = im.crop(bbox)

# Add new white border, then new black, then new white border
res = ImageOps.expand(trimmed, border=10, fill=(255,255,255))
res = ImageOps.expand(res, border=5, fill=(0,0,0))
res = ImageOps.expand(res, border=5, fill=(255,255,255))
res.save('result.png')

Categories

How to remove whitespace from an image in Python?

How to remove whitespace from an image in Python?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags