
@WomB0ComB0
Last active April 19, 2025 03:55
linkedin-title-cleaner.ts and related files - with AI-generated descriptions
if (require.main === module) {
  try {
    (async () => {
      const { argv } = await import('bun')
      const fs = await import('node:fs')
      const { Logger, LogLevel } = await import('./logger')

      const logger = Logger.getLogger('LinkedinTitleCleaner', {
        minLevel: LogLevel.INFO,
        includeTimestamp: true
      })
      logger.info('Starting LinkedIn title cleaner')

      const args = argv.slice(2)
      if (!args.every((arg) => typeof arg === 'string')) {
        logger.error('Invalid arguments')
        throw new Error('Invalid arguments')
      }
      const [argCompany, matchCase] = [args[0], args[1]]
      logger.info(`Processing with company: ${argCompany}${matchCase ? `, match case: ${matchCase}` : ''}`)

      // Minimal CSV line parser: splits on commas while keeping commas that
      // appear inside double-quoted fields.
      const parseCsvLine = (line: string): string[] => {
        const result: string[] = []
        let current = ''
        let inQuotes = false
        for (let i = 0; i < line.length; i++) {
          const char = line[i]
          if (char === '"') {
            inQuotes = !inQuotes
          } else if (char === ',' && !inQuotes) {
            result.push(current)
            current = ''
          } else {
            current += char
          }
        }
        result.push(current) // add the last field
        return result
      }

      // Normalizes a string: lowercase, strip diacritics, drop punctuation,
      // remove whitespace/underscores/hashes, dedupe dots, strip emoji ranges.
      const cleaner = (s: string) => s.toLowerCase()
        .trim()
        .normalize('NFD')
        .replace(/\p{Diacritic}/gu, '')
        .replace(/[^.\p{L}\p{N}\p{Zs}\p{Emoji}]+/gu, '')
        .replace(/[\s_#]+/g, '')
        .replace(/^-+/, '')
        .replace(/\.{2,}/g, '.')
        .replace(/^\.+/, '')
        .replace(
          /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g,
          '',
        )

      // Each operation keeps only the part of a title before one delimiter.
      const OPERATIONS: Map<string, (s: string) => string> = new Map([
        ['|', (s) => s.split('|')[0].trim()],
        ['-', (s) => s.split('-')[0].trim()],
        [',', (s) => s.split(',')[0].trim()],
        ['/', (s) => s.split('/')[0].trim()],
        ['@', (s) => s.split('@')[0].trim()],
      ])

      try {
        logger.debug('Reading LinkedIn recruiters file')
        const file = fs.readFileSync('./linkedin-recruiters.csv', 'utf-8')
        const lines = file.split('\n')
        logger.info(`Found ${lines.length} entries to process`)

        const information: {
          company: string
          title: string
          link: string
        }[] = []

        logger.debug('Processing entries')
        for (let i = 1; i < lines.length; i++) { // start at 1 to skip the header row
          const line = lines[i].trim()
          if (!line) continue

          const parts = parseCsvLine(line)
          if (parts.length < 4) {
            logger.warn(`Skipping line with insufficient fields: ${line}`)
            continue
          }
          const [company, , title, link] = parts

          // Apply every delimiter-trimming operation in sequence, then normalize.
          const filteredTitle = Array.from(OPERATIONS.values()).reduce((acc, fn) => fn(acc), title)
          const cleanedTitle = cleaner(filteredTitle)

          information.push({
            company: cleaner(company),
            title: cleanedTitle,
            link: link.trim()
          })

          // Debug the first few entries to verify parsing
          if (i < 5) {
            logger.debug(`Parsed: Company="${company}", Title="${title}", Link="${link}"`)
            logger.debug(`Cleaned: Company="${cleaner(company)}", Title="${cleanedTitle}"`)
          }
        }
        logger.info(`Processed ${information.length} entries`)

        logger.debug('Filtering results based on criteria')
        const companies = argCompany.split('|').map((comp) => cleaner(comp.trim()))
        const titlePattern = matchCase ? new RegExp(matchCase, 'i') : null
        const output: string[] = []
        for (const { company, title, link } of information) {
          if (!companies.includes(company)) continue
          if (titlePattern && !titlePattern.test(title)) continue
          output.push(link)
        }
        logger.info(`Found ${output.length} matching entries`)

        logger.debug('Writing results to output.txt')
        fs.writeFileSync('./output.txt', output.join('\n'))
        logger.info('Successfully wrote results to output.txt')
      } catch (error) {
        logger.error('Error processing file', { error })
        console.error(error)
      } finally {
        logger.info('[Inner Try]: Done')
        console.log('[Inner Try]: Done')
      }
    })()
  } catch (error) {
    console.error(error)
  } finally {
    console.log('[Outer Try]: Done')
  }
}

linkedin-title-cleaner.ts Description

File Type: ts

Generated Description:

linkedin-title-cleaner.ts Analysis

This TypeScript file (linkedin-title-cleaner.ts) is a command-line tool designed to process a CSV file containing LinkedIn recruiter information, clean and filter the data based on specified criteria, and output the results to a text file. It leverages asynchronous operations and robust error handling.

1. Summary

The script reads a CSV file (linkedin-recruiters.csv) whose rows contain, among other fields, a company, a title, and a LinkedIn profile link. It cleans the company and title data with a chain of string-manipulation steps: lowercasing, removing diacritics, stripping special characters and whitespace, and trimming the title at common delimiters. It then filters the entries by a company name supplied as a command-line argument (multiple companies may be separated by |) and, optionally, by a regex pattern matched case-insensitively against the cleaned title. Finally, it writes the LinkedIn profile links of the matching entries to an output file (output.txt).
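
The summary above implies a two-argument command line. A hypothetical invocation (the argument values here are invented for illustration; only the use of Bun's argv comes from the script) would map onto the script's parameters like this:

```typescript
// Hypothetical invocation:
//   bun linkedin-title-cleaner.ts "acme|acme inc" "recruiter|talent"
// Bun.argv is [bun executable, script path, ...user args], so the script
// slices off the first two entries before reading its parameters.
const argv = ['bun', 'linkedin-title-cleaner.ts', 'acme|acme inc', 'recruiter|talent']
const args = argv.slice(2)
const [argCompany, matchCase] = [args[0], args[1]]
console.log(argCompany) // 'acme|acme inc'
console.log(matchCase)  // 'recruiter|talent'
```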

2. Key Components and Functions

  • Command-line Argument Parsing: Uses Bun's argv to retrieve command-line arguments: the target company name (several companies may be given, separated by |) and an optional regex pattern that is matched case-insensitively against the cleaned title. Input validation ensures arguments are strings.
  • Custom Logger: Uses a custom logger (./logger) for structured logging with timestamp and configurable log levels (INFO, DEBUG, WARN, ERROR). This improves debugging and monitoring.
  • CSV Parsing: Implements a custom CSV parser (parseCsvLine) to handle potential commas within fields by correctly managing quoted fields. This is crucial for robustness.
  • String Cleaning Function (cleaner): A comprehensive function to clean strings, converting to lowercase, removing diacritics, non-alphanumeric characters, extra whitespace, and emojis.
  • Title Filtering Operations (OPERATIONS): A Map that defines functions to handle various title delimiters (|, -, comma, /, @) by taking only the part of the string before the delimiter.
  • Data Filtering: Filters the processed data by cleaned company name and, when a pattern is supplied, by a regex match against the cleaned title; the pattern is compiled with the i flag, so title matching is always case-insensitive.
  • File I/O: Uses node:fs for reading the input CSV file and writing the output text file. readFileSync and writeFileSync are used for simplicity, but for very large files, streaming approaches would be more efficient.
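
The parsing and cleaning components above can be sketched in isolation. The two functions below are copied from the script (the cleaner is abridged: the leading-dash, dot-deduplication, and emoji-range steps are omitted), and the sample row is invented for illustration:

```typescript
// Minimal CSV parser from the script: commas inside double quotes do not split.
const parseCsvLine = (line: string): string[] => {
  const result: string[] = []
  let current = ''
  let inQuotes = false
  for (const char of line) {
    if (char === '"') inQuotes = !inQuotes
    else if (char === ',' && !inQuotes) { result.push(current); current = '' }
    else current += char
  }
  result.push(current) // add the last field
  return result
}

// Abridged cleaner: lowercase, strip diacritics, drop punctuation, remove whitespace.
const cleaner = (s: string) => s.toLowerCase()
  .trim()
  .normalize('NFD')
  .replace(/\p{Diacritic}/gu, '')
  .replace(/[^.\p{L}\p{N}\p{Zs}\p{Emoji}]+/gu, '')
  .replace(/[\s_#]+/g, '')

// Invented sample row: note the quoted first field containing a comma.
const row = '"Acme, Inc.",US,Señor Técnico Recruiter,https://linkedin.com/in/example'
const [company, , title, link] = parseCsvLine(row)
console.log(company)        // 'Acme, Inc.'
console.log(cleaner(title)) // 'senortecnicorecruiter'
console.log(link)
```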

3. Notable Patterns and Techniques

  • Asynchronous Operations: Uses async/await for dynamic module imports inside an async IIFE. The file I/O itself is synchronous (readFileSync/writeFileSync), so it blocks until each operation completes.
  • Error Handling: Includes comprehensive try...catch...finally blocks at multiple levels for robust error handling and logging. Errors are logged using the custom logger and also printed to the console for better visibility.
  • Functional Programming Elements: Uses reduce to chain multiple title cleaning operations defined in the OPERATIONS map. This improves code readability and maintainability.
  • Regular Expressions: Employs regular expressions for flexible title matching based on the user-provided pattern.
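
The reduce-based chaining noted above can be sketched on its own. The OPERATIONS map is copied from the script; the sample title is invented:

```typescript
// Each function keeps only the text before one delimiter; reduce applies
// them in insertion order, so the earliest delimiter hit wins cumulatively.
const OPERATIONS = new Map<string, (s: string) => string>([
  ['|', (s) => s.split('|')[0].trim()],
  ['-', (s) => s.split('-')[0].trim()],
  [',', (s) => s.split(',')[0].trim()],
  ['/', (s) => s.split('/')[0].trim()],
  ['@', (s) => s.split('@')[0].trim()],
])

const trimTitle = (title: string) =>
  Array.from(OPERATIONS.values()).reduce((acc, fn) => fn(acc), title)

console.log(trimTitle('Senior Recruiter | Tech Hiring @ Acme')) // 'Senior Recruiter'
```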

4. Potential Use Cases

  • LinkedIn Recruiter Data Analysis: This script could be used to extract specific recruiter information based on company and title keywords, for example, to identify recruiters working at a particular company and specializing in a certain area.
  • Lead Generation: The cleaned and filtered data can be used for targeted outreach or lead generation efforts.
  • Data Cleansing: The string cleaning functions can be reused for cleaning data in other contexts where similar formatting issues exist.
  • Automation: This script could be integrated into a larger data processing pipeline for automated LinkedIn data extraction and analysis.

The script is well-structured and handles potential errors effectively; the custom logger and layered error handling enhance its maintainability and usability. For extremely large input files, however, streaming file I/O would improve performance over the current readFileSync/writeFileSync approach.

Description generated on 4/18/2025, 11:42:18 PM
