Warning
Using this will send the data of your (public URL) to jina ai, and the license for readerlm-v2 is cc-by-nc
Here's the improved version of your smart_curl
function that better handles filename extraction and works in Zsh:
function smart_curl() {
local raw_url="$1"
local header="${2:-'x-engine: readerlm-v2'}"
local jina_prefix="r.jina.ai/"
local url="https://${jina_prefix}${raw_url#*://}"
# Clean and parse URL components
local clean_url="${raw_url%%[?#]*}" # Remove query/fragment
clean_url="${clean_url#*://}" # Remove protocol
local domain="${clean_url%%/*}" # Extract domain
local path_part="/${clean_url#*/}" # Extract path
# Split path into meaningful segments
local -a path_segments
path_segments=(${(s:/:)path_part})
path_segments=(${path_segments:#}) # Remove empty elements
# Determine base filename
local output_file=""
case ${#path_segments} in
0) # No path - use domain
output_file="${domain//./_}"
output_file="${output_file#www_}" # Remove www prefix
;;
1) # Single path segment
output_file="${path_segments[-1]}"
;;
*) # Multiple path segments
output_file="${path_segments[-2]}_${path_segments[-1]}"
;;
esac
# Fallback if all above failed
if [[ -z "$output_file" ]]; then
output_file="${domain//./_}_$(date +%s)"
fi
# Clean problematic characters and ensure proper extension
output_file="${output_file//[^a-zA-Z0-9._-]/_}"
output_file="${${output_file//_+/_}#_}"
[[ "$output_file" == *_ ]] && output_file="${output_file%_}"
[[ "$output_file" != *.* ]] && output_file+=".md"
# Download with improved curl command
echo "Downloading to: $output_file"
curl -sSL -H "$header" "$url" -o "$output_file"
echo "Downloaded to: ${output_file:a}" # Show absolute path
}
Key improvements:
- Better URL Parsing: Uses parameter expansion instead of regex for more reliable component extraction
- Path Segment Handling: Properly splits and analyzes URL path components
- Domain Cleaning: Removes
www.
prefix and converts dots to underscores - Filename Sanitization: More robust cleaning of special characters and underscores
- Error Handling: Fallback to timestamp if all extraction methods fail
- Zsh Compatibility: Uses Zsh-specific features like array slicing and parameter expansion flags
- Curl Improvements: Uses
-o
flag for output and-sSL
for better behavior
Example usage with your test case:
smart_curl "https://minecraft.fandom.com/wiki/Music_Disc?so=search#Discs"
# Output filename: wiki_Music_Disc.md
Other cases:
https://news.ycombinator.com
→ycombinator_com.md
https://example.com/about
→about.md
https://www.blog.google/features/new
→features_new.md