@S4tyendra
Last active March 31, 2025 05:21
CaptureTextScreen: Screen OCR and QR Code Tool - quickly select an area of your screen, extract any text using Optical Character Recognition (OCR), and decode any QR codes found within that area.

CaptureTextScreen: Screen OCR and QR Code Tool

This tool allows you to quickly select an area of your screen, extract any text using Optical Character Recognition (OCR), and decode any QR codes found within that area.

Inspired by the convenience of features like Android's "Circle to Search", this script provides a fast way to grab information visually from your screen without needing to manually retype or handle QR codes separately.

It saves the captured image and the extracted text/QR data to a history folder for later reference.

Features

  • Select any rectangular region on your screen.
  • Performs OCR (Text Recognition) on the selected image using Tesseract.
  • Detects and decodes QR codes within the selected image using ZBar.
  • Displays the captured screenshot, OCR text, and QR code data in a simple GUI.
  • Saves captures and results to a history folder (~/.cache/capturetextscreen).
  • Browse past captures via a history sidebar in the GUI.

Installation (Linux - Debian/Ubuntu based)

Follow these steps to install the necessary dependencies and the script.

1. System Prerequisites:

You need several system packages installed first. Open a terminal and run:

sudo apt update
sudo apt install python3 python3-pip git # Basic Python and Pip
sudo apt install gnome-screenshot        # For taking the screenshot
sudo apt install tesseract-ocr           # The OCR engine
sudo apt install libzbar0                # Library needed for QR code scanning
sudo apt install python3-tk              # Tkinter, the GUI library CustomTkinter is built on
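
To confirm the command-line tools are in place before going further, a quick check along these lines may help. This is only a sketch: the function name find_missing_tools is made up for illustration, and the executable names are assumed from the packages above (tesseract-ocr provides the tesseract binary). libzbar0 and python3-tk are libraries rather than executables, so they are not covered here.

```python
import shutil

# Executable names assumed from the apt packages listed above.
REQUIRED_TOOLS = ["python3", "gnome-screenshot", "tesseract"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```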

2. Get the Script:

Copy the provided Python code and save it to a file on your system. A common place is in your home directory or a dedicated scripts folder.

Example: Save the code as ~/screen_ocr.py

Make the script executable:

chmod +x ~/screen_ocr.py

3. Install Python Libraries:

The script requires several Python libraries. Install them using pip:

pip install Pillow pyzbar pytesseract customtkinter

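(Note: If pip defaults to Python 2 on your system, use pip3 instead. On recent Debian/Ubuntu releases, pip may refuse a system-wide install with an "externally-managed-environment" error; in that case, install the libraries into a virtual environment created with python3 -m venv and run the script with that environment's Python.)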
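
A small check like the following can report which of the libraries are still missing without importing them. It is a sketch, not part of the script: the function name find_missing_packages is hypothetical, and the mapping pairs each import name with the pip package name from the command above (Pillow is imported as PIL).

```python
import importlib.util

# Import name -> pip package name, taken from the pip command above.
REQUIRED_MODULES = {
    "PIL": "Pillow",
    "pyzbar": "pyzbar",
    "pytesseract": "pytesseract",
    "customtkinter": "customtkinter",
}


def find_missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of modules that Python cannot locate."""
    return [
        package
        for import_name, package in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required Python libraries were found.")
```

Note that this only checks whether each module can be located; pyzbar can still fail at import time if the libzbar0 system library is absent.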

Usage

1. Running Directly:

You can run the script directly from your terminal:

python3 ~/screen_ocr.py

(Adjust the path ~/screen_ocr.py if you saved the script elsewhere).

This will immediately trigger the gnome-screenshot area selection tool. Select the desired area on your screen. Once selected, the script will process the image and open a window showing the screenshot, the extracted OCR text, and any detected QR codes.
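
The capture step described above can be sketched roughly as follows. The gnome-screenshot flags (-a for area selection, -f for the output file) are the ones the script uses; the function name capture_area and its run parameter are assumptions added here so the logic can be exercised without a display.

```python
import os
import subprocess


def capture_area(output_path, run=subprocess.run):
    """Ask gnome-screenshot for an area selection.

    Returns True only if the tool exited cleanly and wrote a non-empty file.
    `run` is a hypothetical hook for substituting subprocess.run in tests.
    """
    # -a: select an area interactively, -f: write the image to output_path.
    result = run(["gnome-screenshot", "-a", "-f", output_path])
    if result.returncode != 0:
        return False  # Treated as cancelled or failed.
    return os.path.exists(output_path) and os.path.getsize(output_path) > 0
```

With the default run, a missing gnome-screenshot binary raises FileNotFoundError, which the caller would need to handle.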

2. Setting up a Keyboard Shortcut (Recommended):

For maximum convenience, assign a keyboard shortcut to run the script. The exact steps depend on your Desktop Environment (GNOME, KDE Plasma, XFCE, etc.), but generally involve:

  • Opening your system's Keyboard Settings or Custom Shortcuts panel.
  • Adding a new custom shortcut.
  • Giving it a name (e.g., "Capture Screen Text").
  • Setting the Command to execute the script using its full path and the correct Python interpreter.
  • Assigning your desired key combination (e.g., Ctrl+Shift+C, Super+C, etc.).

Important: The Command needs the full path to your Python 3 executable and the full path to your script.

  • Find your Python 3 path: which python3 (e.g., /usr/bin/python3)
  • Find your script's full path (e.g., /home/your_username/screen_ocr.py)

Example Command for the shortcut:

/usr/bin/python3 /home/your_username/screen_ocr.py

(Remember to replace /home/your_username/screen_ocr.py with the actual full path to where you saved the script!)

Now, whenever you press your chosen keyboard shortcut, the screen area selection will start, followed by the results window.
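
If you would rather not assemble the shortcut command by hand, a helper along these lines can print it. This is a sketch under assumptions: the function name shortcut_command is hypothetical, and it defaults to whichever interpreter runs the snippet (sys.executable), so run it with the Python that has the libraries installed.

```python
import os
import shlex
import sys


def shortcut_command(script_path, python_path=sys.executable):
    """Build an absolute 'python script' command line for a custom shortcut."""
    # Expand ~ and make the path absolute, since shortcuts do not run in your shell.
    script = os.path.abspath(os.path.expanduser(script_path))
    # Quote both parts so paths containing spaces stay intact.
    return f"{shlex.quote(python_path)} {shlex.quote(script)}"


if __name__ == "__main__":
    print(shortcut_command("~/screen_ocr.py"))
```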

3. Using the Results Window:

  • The captured screenshot is shown at the top (resized to fit).
  • Extracted text is shown in the "OCR Text" tab.
  • Detected QR code data is shown in the "QR Codes" tab.
  • The sidebar on the left shows previous captures (History). Click on an entry to load its results.
  • Click "Close" or close the window when finished.
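
The "resized to fit" behaviour comes down to a single aspect-ratio calculation. The sketch below mirrors it with the script's display limits of 450 by 400 pixels; the function name fit_within is made up for illustration.

```python
MAX_W, MAX_H = 450, 400  # Same values as the script's display limits.


def fit_within(width, height, max_w=MAX_W, max_h=MAX_H):
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio."""
    ratio = min(max_w / width, max_h / height)
    return int(width * ratio), int(height * ratio)


if __name__ == "__main__":
    print(fit_within(1920, 1080))  # (450, 253)
    print(fit_within(100, 100))    # (400, 400)
```

Because the ratio is not capped at 1, captures smaller than the limits are enlarged for display rather than shown at their original size.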

History:

Captured images and their corresponding text/QR results are stored in: ~/.cache/capturetextscreen/
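
Each capture is stored as a pair of files named after its timestamp, for example 20250331_052100.png and 20250331_052100.txt. If you want to work with the history outside the GUI, something like the following sketch could list the entries; the function name list_history is hypothetical and the directory is the one given above.

```python
import os
from datetime import datetime

HISTORY_DIR = os.path.join(os.path.expanduser("~"), ".cache", "capturetextscreen")


def list_history(history_dir=HISTORY_DIR):
    """Return (datetime, png_path, txt_path) tuples, newest first."""
    entries = []
    if not os.path.isdir(history_dir):
        return entries
    for name in os.listdir(history_dir):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".png":
            continue
        try:
            taken = datetime.strptime(base, "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # Not a capture written by the script.
        txt_path = os.path.join(history_dir, base + ".txt")
        if os.path.exists(txt_path):
            entries.append((taken, os.path.join(history_dir, name), txt_path))
    entries.sort(reverse=True)
    return entries


if __name__ == "__main__":
    for taken, png_path, txt_path in list_history():
        print(taken.strftime("%Y-%m-%d %H:%M:%S"), png_path)
```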


Enjoy 🥳

#!/usr/bin/env python3
import subprocess
import os
import tempfile
from datetime import datetime
import sys
import shutil
import math

APP_NAME = "CaptureTextScreen"
HISTORY_DIR = os.path.join(os.path.expanduser("~"), ".cache", APP_NAME.lower())
MAX_IMAGE_DISPLAY_WIDTH = 450
MAX_IMAGE_DISPLAY_HEIGHT = 400

# --- Dependency checks: fail early with an install hint for each library ---
try:
    from PIL import Image, ImageTk
except ImportError:
    print("Error: Pillow (PIL) library not found. Please install it: pip install Pillow")
    sys.exit(1)

try:
    import pyzbar.pyzbar as pyzbar
except ImportError:
    print("Error: pyzbar library not found. Please install it: pip install pyzbar")
    print("System library needed: sudo apt-get install libzbar0 (Debian/Ubuntu)")
    sys.exit(1)

try:
    import pytesseract
    pytesseract.get_tesseract_version()
except ImportError:
    print("Error: pytesseract library not found. Please install it: pip install pytesseract")
    sys.exit(1)
except pytesseract.TesseractNotFoundError:
    print("Error: Tesseract OCR executable not found.")
    print("Please install Tesseract OCR (e.g., sudo apt-get install tesseract-ocr)")
    sys.exit(1)
except Exception as e:
    print(f"Warning: Could not verify tesseract version: {e}")

try:
    import customtkinter as ctk
    ctk.set_appearance_mode("Dark")
    ctk.set_default_color_theme("blue")
except ImportError:
    print("Error: customtkinter library not found (needed for modern GUI).")
    print("Install it using: pip install customtkinter")
    try:
        import tkinter
    except ImportError:
        print("\nError: Base Tkinter library also not found.")
        print("Install it (e.g., sudo apt-get install python3-tk on Debian/Ubuntu)")
    # The GUI cannot run without customtkinter, so exit in either case.
    sys.exit(1)


def show_error(title, message):
    """Prints an error and, if a display is available, shows it in a message box."""
    print(f"ERROR: {title} - {message}")  # Always print to console
    try:
        # customtkinter ships no message box of its own, so use tkinter's.
        import tkinter
        from tkinter import messagebox
        root = tkinter.Tk()
        root.withdraw()
        messagebox.showerror(title, message, parent=root)
        root.destroy()
    except Exception:
        pass


def show_warning(title, message):
    """Prints a warning and, if a display is available, shows it in a message box."""
    print(f"WARNING: {title} - {message}")
    try:
        import tkinter
        from tkinter import messagebox
        root = tkinter.Tk()
        root.withdraw()
        messagebox.showwarning(title, message, parent=root)
        root.destroy()
    except Exception:
        pass


def setup_history_dir():
    """Creates the history directory if it doesn't exist."""
    try:
        os.makedirs(HISTORY_DIR, exist_ok=True)
        print(f"Using history directory: {HISTORY_DIR}")
    except OSError as e:
        show_error("History Error", f"Could not create history directory:\n{HISTORY_DIR}\n\nError: {e}\n\nHistory feature will be disabled.")
        return False
    return True


def take_screenshot():
    """Takes screenshot, saves to history, returns history image path or None."""
    try:
        fd, temp_screenshot_path = tempfile.mkstemp(suffix=".png", prefix="cts_temp_")
        os.close(fd)
    except Exception as e:
        show_error("File Error", f"Could not create temporary file for screenshot:\n{e}")
        return None

    print("Taking screenshot (select an area)...")
    try:
        process = subprocess.Popen(["gnome-screenshot", "-a", "-f", temp_screenshot_path])
        process.wait()
        if process.returncode != 0:
            print(f"gnome-screenshot cancelled or failed (exit code: {process.returncode}).")
            return None
        if not os.path.exists(temp_screenshot_path) or os.path.getsize(temp_screenshot_path) == 0:
            show_warning("Screenshot Failed", "Screenshot file was not created or is empty.")
            return None

        # Name the history copy after the capture time, e.g. 20250331_052100.png
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        history_img_filename = f"{timestamp}.png"
        history_img_path = os.path.join(HISTORY_DIR, history_img_filename)
        try:
            shutil.copy2(temp_screenshot_path, history_img_path)
            print(f"Screenshot saved to history: {history_img_path}")
            return history_img_path
        except Exception as e:
            show_error("History Error", f"Could not copy screenshot to history:\n{e}")
            return None
    except FileNotFoundError:
        show_error("Dependency Error", "'gnome-screenshot' not found. Please install it.")
        return None
    except Exception as e:
        show_error("Screenshot Error", f"An unexpected error occurred:\n{e}")
        return None
    finally:
        # Runs on every exit path above, so the temp file is always cleaned up.
        if os.path.exists(temp_screenshot_path):
            try:
                os.remove(temp_screenshot_path)
            except OSError as e:
                print(f"Warning: Could not remove temp file {temp_screenshot_path}: {e}")


def scan_qr_codes(image_path):
    """Scan for QR codes in the image"""
    qr_results = []
    try:
        image = Image.open(image_path)
        decoded_objects = pyzbar.decode(image)
        for obj in decoded_objects:
            try:
                qr_data = obj.data.decode('utf-8')
                qr_results.append(f"QR Code: {qr_data}")
            except UnicodeDecodeError:
                qr_results.append(f"QR Code (non-utf8): {obj.data}")
    except FileNotFoundError:
        print(f"Error: Image file not found for QR scan: {image_path}")
        qr_results.append("Error: Image file missing for QR scan.")
    except Exception as e:
        print(f"Error scanning QR codes: {e}")
        qr_results.append(f"Error during QR scan: {e}")
    return qr_results


def perform_ocr(image_path):
    """Extract text using Tesseract OCR"""
    try:
        image = Image.open(image_path)
        text = pytesseract.image_to_string(image)
        return text
    except FileNotFoundError:
        print(f"Error: Image file not found for OCR: {image_path}")
        return "--- OCR FAILED: Image file missing ---"
    except pytesseract.TesseractError as ocr_error:
        print(f"Tesseract OCR error: {ocr_error}")
        return f"--- OCR FAILED ---\n{ocr_error}\n------------------"
    except Exception as e:
        print(f"Error performing OCR: {e}")
        return f"--- OCR FAILED: An unexpected error occurred ---\n{e}\n------------------"


def save_results_to_history(history_img_path, qr_results, ocr_text):
    """Saves the processed text results to a .txt file matching the image."""
    if not history_img_path:
        print("Warning: Cannot save results, invalid history image path provided.")
        return None
    base_filename = os.path.splitext(os.path.basename(history_img_path))[0]
    result_txt_path = os.path.join(HISTORY_DIR, f"{base_filename}.txt")
    try:
        with open(result_txt_path, 'w', encoding='utf-8') as f:
            if qr_results:
                f.write("=== QR CODES DETECTED ===\n")
                for result in qr_results:
                    f.write(f"{result}\n")
                f.write("\n\n")
            else:
                f.write("=== NO QR CODES DETECTED ===\n\n")
            f.write("=== OCR TEXT ===\n")
            f.write(ocr_text)
        print(f"Results saved to history: {result_txt_path}")
        return result_txt_path
    except Exception as e:
        show_error("History Error", f"Could not save results text to history:\n{e}")
        return None


def load_history_list():
    """Loads a sorted list of history entry timestamps (basenames without extension)."""
    history_items = []
    if not os.path.isdir(HISTORY_DIR):
        return history_items
    try:
        for filename in os.listdir(HISTORY_DIR):
            if filename.lower().endswith(".txt"):
                basename = os.path.splitext(filename)[0]
                # Accept only names shaped like YYYYMMDD_HHMMSS
                if len(basename) == 15 and basename.replace('_', '').isdigit():
                    img_path = os.path.join(HISTORY_DIR, basename + ".png")
                    if os.path.exists(img_path):
                        history_items.append(basename)
                    else:
                        print(f"Warning: Found text '{filename}' without matching '.png'. Skipping.")
        history_items.sort(reverse=True)  # Newest first
        return history_items
    except Exception as e:
        print(f"Error loading history list from '{HISTORY_DIR}': {e}")
        return []


def load_history_item_text(timestamp_basename):
    """Loads the text content for a given history timestamp basename."""
    txt_path = os.path.join(HISTORY_DIR, f"{timestamp_basename}.txt")
    qr_section = []
    ocr_section = ""
    try:
        with open(txt_path, 'r', encoding='utf-8') as f:
            content = f.read()
        qr_header = "=== QR CODES DETECTED ==="
        no_qr_header = "=== NO QR CODES DETECTED ==="
        ocr_header = "=== OCR TEXT ==="
        if ocr_header in content:
            # Everything before the OCR header is the QR section
            parts = content.split(ocr_header, 1)
            qr_part = parts[0].strip()
            ocr_section = parts[1].strip() if len(parts) > 1 else ""
            if qr_part.startswith(qr_header):
                qr_lines = qr_part.replace(qr_header, "").strip().split('\n')
                qr_section = [line for line in qr_lines if line.strip()]
            elif qr_part.startswith(no_qr_header):
                qr_section = []
            else:
                qr_section = [qr_part] if qr_part else []
        else:
            ocr_section = content  # Fallback
        return qr_section, ocr_section
    except FileNotFoundError:
        return ["Error: Text file not found."], "Error: Text file not found."
    except Exception as e:
        return [f"Error loading text: {e}"], f"Error loading text: {e}"


class ResultsApp(ctk.CTk):  # Inherit from CTk
    def __init__(self, initial_qr_results, initial_ocr_text, initial_image_path, history_available):
        super().__init__()  # Initialize CTk
        self.history_available = history_available
        self.current_image_path = initial_image_path  # Store path of initially displayed image
        self.history_items = []  # Store basenames
        self.history_buttons = {}  # Store refs to history buttons {basename: button_widget}

        self.title(f"{APP_NAME} - Results")
        self.geometry("950x700")
        self.minsize(700, 500)
        self.grid_columnconfigure(1, weight=1)
        self.grid_rowconfigure(0, weight=1)

        # Sidebar with the history list
        self.sidebar_frame = ctk.CTkFrame(self, width=200, corner_radius=0)
        self.sidebar_frame.grid(row=0, column=0, rowspan=2, sticky="nsew")
        self.sidebar_frame.grid_rowconfigure(1, weight=1)
        history_label = ctk.CTkLabel(self.sidebar_frame, text="History", font=ctk.CTkFont(size=16, weight="bold"))
        history_label.grid(row=0, column=0, padx=20, pady=(20, 10))

        # Scrollable Frame for History Buttons
        self.history_scrollable_frame = ctk.CTkScrollableFrame(self.sidebar_frame, label_text="")
        self.history_scrollable_frame.grid(row=1, column=0, padx=10, pady=10, sticky="nsew")
        self.history_scrollable_frame.grid_columnconfigure(0, weight=1)

        # Main content area: image on top, text results below
        self.content_frame = ctk.CTkFrame(self, corner_radius=5)
        self.content_frame.grid(row=0, column=1, padx=10, pady=10, sticky="nsew")
        self.content_frame.grid_columnconfigure(0, weight=1)  # Column for image/text
        self.content_frame.grid_rowconfigure(0, weight=0)  # Image row (fixed height initially)
        self.content_frame.grid_rowconfigure(1, weight=1)  # Text results row (expands)

        # Image Display Label (placeholder)
        self.image_label = ctk.CTkLabel(self.content_frame, text="Screenshot will appear here", corner_radius=5)
        # Using compound="top" might allow text+image, but let's keep it simple image-only for now
        self.image_label.grid(row=0, column=0, padx=10, pady=10, sticky="nsew")

        # Text Results Tabs (using CTkTabview)
        self.tab_view = ctk.CTkTabview(self.content_frame, corner_radius=5)
        self.tab_view.grid(row=1, column=0, padx=10, pady=(0, 10), sticky="nsew")
        self.tab_view.add("OCR Text")
        self.tab_view.add("QR Codes")

        self.ocr_textbox = ctk.CTkTextbox(self.tab_view.tab("OCR Text"), wrap="word", corner_radius=5, activate_scrollbars=True)
        self.ocr_textbox.pack(expand=True, fill="both", padx=5, pady=5)
        self.ocr_textbox.configure(state="disabled")  # Read-only

        self.qr_textbox = ctk.CTkTextbox(self.tab_view.tab("QR Codes"), wrap="word", corner_radius=5, activate_scrollbars=True)
        self.qr_textbox.pack(expand=True, fill="both", padx=5, pady=5)
        self.qr_textbox.configure(state="disabled")  # Read-only

        self.bottom_frame = ctk.CTkFrame(self, height=30, corner_radius=0)
        self.bottom_frame.grid(row=1, column=1, sticky="nsew", padx=10, pady=(0, 10))
        close_button = ctk.CTkButton(self.bottom_frame, text="Close", command=self.destroy)
        close_button.pack(side="right", padx=10)

        if self.history_available:
            self.populate_history_list()
        else:
            history_label.configure(text="History (Disabled)")
            disabled_label = ctk.CTkLabel(self.history_scrollable_frame, text="History dir error", text_color="gray")
            disabled_label.pack(pady=5)

        # Display initial results (must be done after widgets are created)
        self.display_item(initial_qr_results, initial_ocr_text, initial_image_path)
        self.highlight_history_item(os.path.splitext(os.path.basename(initial_image_path))[0] if initial_image_path else None)

    def _update_text_areas(self, qr_results, ocr_text):
        """Helper to update text areas."""
        # Textboxes are read-only, so enable them while replacing their content
        self.ocr_textbox.configure(state="normal")
        self.qr_textbox.configure(state="normal")
        self.ocr_textbox.delete("1.0", "end")
        self.qr_textbox.delete("1.0", "end")
        self.ocr_textbox.insert("1.0", ocr_text if ocr_text else "--- NO OCR TEXT DETECTED ---")
        if qr_results:
            for result in qr_results:
                self.qr_textbox.insert("end", f"{result}\n")
        else:
            self.qr_textbox.insert("1.0", "--- NO QR CODES DETECTED ---")
        self.ocr_textbox.configure(state="disabled")
        self.qr_textbox.configure(state="disabled")

    def _load_and_display_image(self, image_path):
        """Loads, resizes, and displays the image at the given path."""
        if not image_path or not os.path.exists(image_path):
            self.image_label.configure(text="Image not found", image=None)
            self.current_image_path = None
            return
        try:
            pil_image = Image.open(image_path)
            # Calculate scaled size maintaining aspect ratio
            img_w, img_h = pil_image.size
            ratio = min(MAX_IMAGE_DISPLAY_WIDTH / img_w, MAX_IMAGE_DISPLAY_HEIGHT / img_h)
            new_w = int(img_w * ratio)
            new_h = int(img_h * ratio)
            # Resize using LANCZOS for better quality
            resized_image = pil_image.resize((new_w, new_h), Image.Resampling.LANCZOS)
            ctk_image = ctk.CTkImage(light_image=resized_image,
                                     dark_image=resized_image,  # Use same image for dark/light mode
                                     size=(new_w, new_h))
            # Configure the label to show the image
            self.image_label.configure(image=ctk_image, text="")  # Remove placeholder text
            self.image_label.image = ctk_image  # Keep reference! Crucial for CTkImage
            self.current_image_path = image_path
        except Exception as e:
            print(f"Error loading/displaying image {image_path}: {e}")
            self.image_label.configure(text=f"Error loading image:\n{os.path.basename(image_path)}", image=None)
            self.current_image_path = None

    def display_item(self, qr_results, ocr_text, image_path):
        """Displays a specific item (initial or history)."""
        self._update_text_areas(qr_results, ocr_text)
        self._load_and_display_image(image_path)
        if image_path:
            base = os.path.splitext(os.path.basename(image_path))[0]
            try:
                dt_obj = datetime.strptime(base, "%Y%m%d_%H%M%S")
                display_str = dt_obj.strftime("%Y-%m-%d %H:%M:%S")
                self.title(f"{APP_NAME} - {display_str}")
            except ValueError:
                self.title(f"{APP_NAME} - {base}")  # Fallback title
        else:
            self.title(f"{APP_NAME} - Current")

    def populate_history_list(self):
        """Loads history items and creates buttons in the scrollable frame."""
        if not self.history_available:
            return
        # Clear previous buttons first
        for widget in self.history_scrollable_frame.winfo_children():
            widget.destroy()
        self.history_items = []
        self.history_buttons = {}
        self.history_items = load_history_list()
        if self.history_items:
            for item_base in self.history_items:
                try:
                    dt_obj = datetime.strptime(item_base, "%Y%m%d_%H%M%S")
                    display_str = dt_obj.strftime("%Y-%m-%d\n%H:%M:%S")  # Multi-line display
                except ValueError:
                    display_str = item_base
                # Create a button for each history item
                # Use lambda to capture the correct item_base for the command
                button = ctk.CTkButton(self.history_scrollable_frame,
                                       text=display_str,
                                       anchor="w",  # Align text left
                                       command=lambda base=item_base: self.on_history_select(base))
                button.grid(sticky="ew", padx=5, pady=(0, 5))  # Use grid within scrollable frame
                self.history_buttons[item_base] = button
        else:
            no_hist_label = ctk.CTkLabel(self.history_scrollable_frame, text="(No history found)", text_color="gray")
            no_hist_label.grid(pady=5)

    def highlight_history_item(self, basename_to_highlight):
        """Changes the appearance of the selected history button."""
        for base, button in self.history_buttons.items():
            if base == basename_to_highlight:
                button.configure(fg_color=ctk.ThemeManager.theme["CTkButton"]["hover_color"])  # Use hover color for highlight
            else:
                button.configure(fg_color=ctk.ThemeManager.theme["CTkButton"]["fg_color"])  # Reset to default

    def on_history_select(self, selected_basename):
        """Handles selection of a history item button."""
        if not self.history_available:
            return
        print(f"Loading history item: {selected_basename}")
        qr_section, ocr_section = load_history_item_text(selected_basename)
        image_path = os.path.join(HISTORY_DIR, f"{selected_basename}.png")
        self.display_item(qr_section, ocr_section, image_path)
        self.highlight_history_item(selected_basename)


def main():
    # 1. Check dependencies early (gnome-screenshot specifically) # linux only
    try:
        subprocess.run(["gnome-screenshot", "--version"], check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Try to show a message box even before mainloop starts
        show_error("Dependency Error", "'gnome-screenshot' command not found or failed.\nThis tool is required.\nPlease install it (e.g., 'sudo apt-get install gnome-screenshot').")
        sys.exit(1)

    # 2. Setup History Directory
    history_available = setup_history_dir()

    # 3. Take Screenshot (gets path in history dir)
    history_screenshot_path = take_screenshot()
    if not history_screenshot_path:
        print("Screenshot was cancelled or failed. Exiting.")
        sys.exit(0)  # Exit gracefully, no GUI needed

    # 4. Process the screenshot (QR + OCR)
    print(f"Processing screenshot: {history_screenshot_path}")
    qr_results = scan_qr_codes(history_screenshot_path)
    ocr_text = perform_ocr(history_screenshot_path)

    # 5. Save text results to history
    save_results_to_history(history_screenshot_path, qr_results, ocr_text)

    # 6. Launch GUI
    # Pass initial data to the app constructor
    app = ResultsApp(initial_qr_results=qr_results,
                     initial_ocr_text=ocr_text,
                     initial_image_path=history_screenshot_path,
                     history_available=history_available)
    app.mainloop()
    # Yay! we're done, the app is closed.


if __name__ == "__main__":
    main()