@WomB0ComB0
Created April 8, 2025 14:20
linkedin.py and related files - with AI-generated descriptions
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# pylint: disable=all
"""
LinkedIn Profile Scraper
This script automates the process of scraping LinkedIn profiles using the linkedin_scraper library.
It handles authentication, navigates to a specified profile, and extracts structured information
including personal details, work experiences, education, interests, and accomplishments.
The script includes several patches to handle common exceptions that occur due to LinkedIn's
changing structure and anti-scraping measures.
Requirements:
- Python 3.7+
- linkedin_scraper library
- Selenium WebDriver
- Chrome WebDriver executable
- Valid LinkedIn credentials stored in a .env file
Environment Variables:
- LINKEDIN_USER: Your LinkedIn username/email
- LINKEDIN_PASSWORD: Your LinkedIn password
Usage:
$ python linkedin.py
Output:
A dictionary containing structured profile data printed to stdout.
"""
import os
from linkedin_scraper import Person, actions
from linkedin_scraper.objects import Experience
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from dotenv import load_dotenv
from pathlib import Path
import asyncio
from selenium.common.exceptions import (
    NoAlertPresentException,
    StaleElementReferenceException,
    TimeoutException,
)
# Load environment variables from .env file in the same directory as this script
load_dotenv(Path(__file__).parent / ".env")
# Initialize Chrome WebDriver
service = Service(executable_path="/usr/bin/chromedriver")
driver = webdriver.Chrome(service=service)
# Get LinkedIn credentials from environment variables
email = os.getenv("LINKEDIN_USER")
password = os.getenv("LINKEDIN_PASSWORD")
if not email or not password:
    # os.getenv() never raises; the actual failure mode is a missing variable
    print("Error: LINKEDIN_USER and LINKEDIN_PASSWORD must be set in the .env file")
    exit(1)
async def main() -> None:
    """
    Main function that handles the LinkedIn profile scraping process.

    This function:
    1. Logs into LinkedIn using the provided credentials
    2. Applies patches to handle common exceptions
    3. Scrapes a specific LinkedIn profile
    4. Formats and prints the extracted data
    5. Ensures the browser is closed properly

    Returns:
        None
    """
    # Log in to LinkedIn
    try:
        actions.login(driver, email, password)
    except Exception as e:
        print(f"Error logging in: {e}")
        exit(1)

    # Patch the focus method to handle NoAlertPresentException
    original_focus = Person.focus

    def patched_focus(self):
        """
        Patched version of Person.focus that handles NoAlertPresentException.

        This patch prevents the script from crashing when LinkedIn doesn't show
        an expected alert dialog.
        """
        try:
            original_focus(self)
        except NoAlertPresentException:
            pass

    Person.focus = patched_focus
    # Patch the get_experiences method to handle the "too many values to unpack" error
    original_get_experiences = Person.get_experiences

    def patched_get_experiences(self):
        """
        Patched version of Person.get_experiences that handles common exceptions.

        This patch addresses:
        1. "Too many values to unpack" errors caused by LinkedIn structure changes
        2. StaleElementReferenceException when elements are no longer attached to the DOM
        3. TimeoutException when elements take too long to load

        Returns:
            list: List of Experience objects, or a fallback placeholder if extraction fails
        """
        try:
            return original_get_experiences(self)
        except ValueError as e:
            if "too many values to unpack" in str(e):
                print(
                    "LinkedIn structure has changed. Using fallback method for experiences."
                )
                # Simplified fallback implementation
                self.add_experience(
                    Experience(
                        institution_name="Unable to parse due to LinkedIn changes",
                        position_title="See profile for details",
                        from_date="",
                        to_date="",
                        duration="",
                        location="",
                        description="",
                    )
                )
                return self.experiences
            raise
        except (StaleElementReferenceException, TimeoutException):
            print(
                "Encountered stale element or timeout. Using fallback for experiences."
            )
            return self.experiences

    Person.get_experiences = patched_get_experiences
    # Scrape the LinkedIn profile
    try:
        # Replace <...> with the actual LinkedIn username to scrape
        person = Person("https://www.linkedin.com/in/<...>", driver=driver)

        # Format and print the extracted profile data
        print(
            {
                "name": person.name,  # Full name of the person
                "location": person.location,  # Geographic location
                "about": person.about,  # About/summary section
                "job_title": person.job_title,  # Current job title
                "company": person.company,  # Current company
                "open_to_work": person.open_to_work,  # Whether they're open to work
                "experiences": [  # List of work experiences
                    {
                        "title": exp.position_title,  # Job title
                        "company": exp.institution_name,  # Company name
                        "from_date": exp.from_date,  # Start date
                        "to_date": exp.to_date,  # End date
                        "duration": exp.duration,  # Duration at position
                        "location": exp.location,  # Job location
                        "description": exp.description,  # Job description
                    }
                    for exp in person.experiences
                ],
                "educations": [  # List of education entries
                    {
                        "institution": edu.institution_name,  # School/university name
                        "degree": edu.degree,  # Degree obtained
                        "from_date": edu.from_date,  # Start date
                        "to_date": edu.to_date,  # End date
                        "description": edu.description,  # Education description
                    }
                    for edu in person.educations
                ],
                "interests": [interest.name for interest in person.interests],  # List of interests
                "accomplishments": [  # List of accomplishments
                    {"category": acc.category, "title": acc.title}
                    for acc in person.accomplishments
                ],
            }
        )
    except (
        NoAlertPresentException,
        StaleElementReferenceException,
        TimeoutException,
    ) as e:
        print(f"Error scraping profile: {e}")
    finally:
        # Ensure the browser is closed properly
        driver.quit()


if __name__ == "__main__":
    asyncio.run(main())
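For completeness, the `.env` file the script loads from its own directory would contain the two variables named in the docstring (placeholder values shown, not real credentials):

```
# .env — placed in the same directory as linkedin.py
LINKEDIN_USER=you@example.com
LINKEDIN_PASSWORD=your-password
```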

linkedin.py Description

File Type: py

Generated Description:

linkedin.py: LinkedIn Profile Scraper Analysis

1. Summary

This Python script scrapes data from a specified LinkedIn profile using the linkedin_scraper library and Selenium. It handles authentication, navigates to the profile, and extracts structured information such as personal details, work experience, education, interests, and accomplishments. The script is designed to be robust, including patches to address common exceptions arising from LinkedIn's dynamic website structure and anti-scraping measures.

2. Key Components and Functions

  • Environment Variable Loading: The script uses dotenv to load LinkedIn credentials (username and password) from a .env file, promoting security by avoiding hardcoding sensitive information.

  • WebDriver Initialization: It initializes a Chrome WebDriver using selenium, providing the path to the chromedriver executable.

  • main() Async Function: This is the core function, responsible for:

    • Login: Uses linkedin_scraper.actions.login() to authenticate with LinkedIn.
    • Patching: Applies two crucial patches to the Person class from linkedin_scraper:
      • patched_focus(): Handles NoAlertPresentException during profile focusing, preventing crashes.
      • patched_get_experiences(): Deals with ValueError ("too many values to unpack"), StaleElementReferenceException, and TimeoutException during experience data extraction. It uses fallback mechanisms to provide partial data even if scraping fails.
    • Scraping: Creates a Person object using a provided LinkedIn profile URL, initiating the scraping process.
    • Output: Prints the extracted profile data as a dictionary to the console.
    • Cleanup: A finally block guarantees that driver.quit() runs, closing the browser even when scraping fails partway through.
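The monkey-patching approach used for Person.focus and Person.get_experiences follows a general pattern that can be sketched in isolation (the Scraper class below is illustrative, not part of linkedin_scraper):

```python
# Generic monkey-patching pattern: wrap an existing method with exception
# handling without modifying the original class's source code.

class Scraper:
    def fetch(self):
        # Stand-in for a flaky library method.
        raise TimeoutError("page took too long to load")

# Keep a reference to the original method before replacing it,
# so the patched version can still delegate to it.
_original_fetch = Scraper.fetch

def patched_fetch(self):
    """Fall back to a placeholder result when the original method fails."""
    try:
        return _original_fetch(self)
    except TimeoutError:
        return "fallback result"

# Rebind the patched function on the class; all instances pick it up.
Scraper.fetch = patched_fetch

print(Scraper().fetch())  # -> fallback result
```

Saving the original method in a closure variable before rebinding is what lets the patch act as a transparent wrapper rather than a full rewrite.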

3. Notable Patterns and Techniques

  • Exception Handling: The script extensively uses try-except blocks to catch and handle various exceptions related to network issues, website structure changes, and anti-scraping mechanisms. This makes the scraper more resilient.

  • Patching: Instead of modifying the linkedin_scraper library directly, it uses function overriding to patch existing methods. This is a clean way to adapt to changes in the LinkedIn website without altering the original library code.

  • Fallback Mechanisms: When exceptions occur during data extraction (especially for work experiences), fallback mechanisms are implemented to provide at least partial data instead of completely failing.

  • Asynchronous Programming: The script declares main() as async and runs it with asyncio.run(), yet no operation inside main() is actually awaited. This suggests an intention to later perform I/O-bound work (page loads, multiple profiles) concurrently; as written, the async wrapper adds no performance benefit.
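As a hedged sketch of what that future improvement might look like, blocking calls can be pushed onto worker threads with asyncio.to_thread and gathered concurrently. The scrape_profile function below is a stand-in, not part of the script, and note that a single Selenium WebDriver is not thread-safe — real concurrency would need one driver per worker:

```python
import asyncio
import time

def scrape_profile(url: str) -> str:
    """Stand-in for a blocking Selenium call (illustrative only)."""
    time.sleep(0.1)  # simulate browser/network latency
    return f"profile data for {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # Run the blocking calls concurrently on worker threads;
    # gather() preserves the input order in its results.
    return await asyncio.gather(
        *(asyncio.to_thread(scrape_profile, u) for u in urls)
    )

results = asyncio.run(
    scrape_all(["https://www.linkedin.com/in/a", "https://www.linkedin.com/in/b"])
)
print(results)
```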

4. Potential Use Cases

  • Recruiting: Recruiters could use this script to gather information on potential candidates quickly and efficiently.

  • Market Research: Researchers could use it to collect data on professionals in specific industries or roles to understand trends and demographics.

  • Sales Lead Generation: Sales professionals could leverage the gathered information to personalize outreach and improve conversion rates.

  • Network Analysis: The data could be used to analyze professional networks and connections.

Important Note: Web scraping should be done responsibly and ethically. Always respect the robots.txt file of the target website and be mindful of the website's terms of service. Excessive scraping can overload servers and lead to your IP being blocked. Consider adding delays and randomisation to avoid detection as a bot. The use of this script for any illegal or unethical purpose is strongly discouraged.
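The suggested delays with randomization could be added between profile visits with a small helper like this (a sketch, not part of the original script; the default timings are arbitrary):

```python
import random
import time

def polite_sleep(base: float = 3.0, jitter: float = 2.0) -> float:
    """Sleep for `base` seconds plus a random jitter, returning the delay used.

    Randomizing the interval makes request timing less bot-like than a
    fixed sleep between page loads.
    """
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# Usage between profile visits:
# for url in profile_urls:
#     person = Person(url, driver=driver)
#     polite_sleep()
```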

Description generated on 4/8/2025, 10:20:00 AM
