"Chat with any website"
jailbreak pplx to summaries any website.
- Copy the website you want to summaries.
- prepend this url eonist/gistify/?url=your-url-here
- enter this prompt into pplx*ai:
"What are the main points here: eonist/gistify/?url=your-url-here"
- we will host javascript and html on github pages
- javascript on a html page gets value from the browser url, inserts into https://archive.today/?run=1&url=%s where $s is the url.
- we should get the html content of this url, and render the html content on the website
- example: github.com/eonist/gistify/?url=https://www.wsj.com/sports/basketball/nba-trades-luka-doncic-jimmy-butler-63879018
- This should fetch the html content and present it.
Automated Web Archival Proxy Service via GitHub Pages
This report outlines the technical implementation of a web archival proxy service hosted on GitHub Pages. The system leverages modern web APIs including URLSearchParams and Fetch API to dynamically retrieve and display archived webpage content from archive.today. By analyzing the provided search results and technical requirements, we develop a comprehensive solution that addresses URL parameter handling, content retrieval, and security considerations.
URL Parameter Processing and Validation
Extracting Query Parameters with URLSearchParams
Modern web applications require robust handling of URL query parameters to ensure proper functionality. The URLSearchParams API provides a standardized interface for working with URL query strings without manual string parsing[1][3][7].
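A minimal sketch of parameter extraction; in the browser the input would be `location.search`, shown here as a literal string for illustration:

```javascript
// Parse the target URL out of the page's query string, e.g.
// https://eonist.github.io/gistify/?url=https%3A%2F%2Fexample.com%2Farticle
const search = "?url=https%3A%2F%2Fexample.com%2Farticle"; // stands in for location.search
const params = new URLSearchParams(search);
const targetUrl = params.get("url"); // automatically percent-decoded
console.log(targetUrl); // → https://example.com/article
```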
This approach eliminates common pitfalls associated with manual parameter parsing, such as improper encoding/decoding of special characters or mishandling of duplicate parameters[5][7]. The API automatically decodes percent-encoded characters and provides type-safe access to parameter values[3][5].
Input Validation and Sanitization
Secure handling of user-supplied URLs requires multiple validation layers:
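A sketch of such a validation layer, using the `URL` constructor plus a protocol allowlist (the function name `validateUrl` is illustrative):

```javascript
// Validate a user-supplied URL before use. The URL constructor throws
// on malformed input and normalizes the rest; the allowlist then
// rejects dangerous schemes such as javascript: or file:.
function validateUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a syntactically valid absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return null; // unsupported or dangerous scheme
  }
  return parsed.href; // normalized form
}
```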
This validation pattern ensures proper URL structure while restricting dangerous protocols like file: or javascript:[3][7]. The URL constructor performs automatic normalization and syntax checking[3][7].
Archive Service Integration
Constructing Archive API Requests
The archival proxy service integrates with archive.today's submission interface through parameterized URL construction:
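For example (the helper name `buildArchiveUrl` is illustrative):

```javascript
// Build the archive.today submission URL from the validated target.
// encodeURIComponent percent-encodes reserved characters so the
// target URL survives intact as a single query-parameter value.
function buildArchiveUrl(targetUrl) {
  return "https://archive.today/?run=1&url=" + encodeURIComponent(targetUrl);
}
```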
The encodeURIComponent function ensures proper URL encoding of special characters, maintaining compliance with RFC 3986 standards[3][7]. This encoding is crucial when handling user-supplied URLs that may contain spaces, Unicode characters, or query parameters of their own[1][3].
Content Retrieval Mechanism
The Fetch API provides modern asynchronous resource loading capabilities[2][4][6]:
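A sketch of the retrieval step (the helper name `fetchArchivedHtml` is illustrative); whether the cross-origin request actually succeeds depends on the CORS considerations discussed later in this report:

```javascript
// Retrieve the archived page as text, with two safety checks:
// an HTTP status check and a Content-Type check before the body is used.
async function fetchArchivedHtml(archiveUrl) {
  const response = await fetch(archiveUrl);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} ${response.statusText}`);
  }
  const contentType = response.headers.get("Content-Type") || "";
  if (!contentType.includes("text/html")) {
    throw new Error(`Unexpected Content-Type: ${contentType}`);
  }
  return response.text();
}
```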
This implementation includes essential security checks:
1. HTTP status validation via response.ok
2. Content-Type verification before the body is parsed
Content Rendering and Security
Safe HTML Injection
While innerHTML provides convenient content insertion, it requires careful security handling:
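One possible sanitization pass, using DOMParser to build an inert document before injection. This is browser-only code, and a vetted library such as DOMPurify is a stronger choice in practice:

```javascript
// Strip active content from fetched HTML before injecting it with
// innerHTML. DOMParser builds an inert document (scripts inside it
// never execute), so risky nodes can be deleted safely and the
// remainder serialized back out.
function sanitizeHtml(rawHtml) {
  const doc = new DOMParser().parseFromString(rawHtml, "text/html");
  // Remove executable code, embedded frames, and external stylesheets.
  doc
    .querySelectorAll("script, link[rel='stylesheet'], iframe, object, embed")
    .forEach((node) => node.remove());
  // Remove inline event handlers such as onclick/onerror.
  doc.querySelectorAll("*").forEach((el) => {
    for (const attr of [...el.attributes]) {
      if (attr.name.startsWith("on")) el.removeAttribute(attr.name);
    }
  });
  return doc.body.innerHTML;
}
```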
This sanitization process removes executable code while preserving document structure[6][8]. The implementation addresses common XSS vectors by stripping script tags and external stylesheets[6][8].
Alternative Sandboxed Rendering
For enhanced security, consider using sandboxed iframes:
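A sketch of sandboxed rendering via the iframe `srcdoc` attribute (browser-only; the helper name is illustrative):

```javascript
// Render fetched HTML inside a sandboxed iframe instead of the main
// document. An empty sandbox attribute is the most restrictive mode:
// scripts, forms, and top-level navigation are all disabled.
function renderInSandbox(html, container) {
  const frame = document.createElement("iframe");
  frame.setAttribute("sandbox", ""); // no scripts, no forms, no navigation
  frame.style.width = "100%";
  frame.style.minHeight = "80vh";
  frame.srcdoc = html; // load the markup without a network round-trip
  container.appendChild(frame);
  return frame;
}
```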
The sandbox attribute enables fine-grained control over frame capabilities, balancing functionality with security[8]. This approach isolates third-party content from the parent page's execution context[8].
Complete System Implementation
HTML Structure
The core page template establishes the rendering container:
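A minimal page shell might look like the following; the element ids and script filename are illustrative:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Gistify – archived page viewer</title>
</head>
<body>
  <div id="status">Loading…</div>
  <div id="content"></div>
  <script src="app.js"></script>
</body>
</html>
```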
JavaScript Application Logic
The complete client-side implementation integrates all components:
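A condensed sketch of the whole flow, assuming container elements with the ids `status` and `content` (all names illustrative):

```javascript
// app.js - end-to-end flow: read ?url=, validate it, build the
// archive.today URL, fetch the archived HTML, sanitize it, render it.
async function main() {
  const status = document.getElementById("status");
  const content = document.getElementById("content");
  try {
    const target = new URLSearchParams(location.search).get("url");
    if (!target) throw new Error("No ?url= parameter supplied");

    const parsed = new URL(target); // throws on malformed input
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error("Only http(s) URLs are supported");
    }

    const archiveUrl =
      "https://archive.today/?run=1&url=" + encodeURIComponent(parsed.href);
    status.textContent = "Fetching archived copy…";

    const response = await fetch(archiveUrl);
    if (!response.ok) throw new Error(`HTTP ${response.status}`);

    // Inert parse, then strip active content before injection.
    const doc = new DOMParser().parseFromString(
      await response.text(),
      "text/html"
    );
    doc
      .querySelectorAll("script, iframe, object, embed")
      .forEach((node) => node.remove());

    content.innerHTML = doc.body.innerHTML;
    status.textContent = "";
  } catch (err) {
    status.textContent = `Error: ${err.message}`;
  }
}

// Guard so the file can also be loaded outside a browser (e.g. in tests).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", main);
}
```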
This implementation follows modern asynchronous programming patterns using async/await for improved readability over promise chains[2][4]. Error handling propagates through all stages of the workflow, providing user-facing feedback for various failure scenarios[2][6].
Security Considerations
Cross-Origin Resource Sharing (CORS)
The same-origin policy presents challenges when fetching resources from archive.today, which will not necessarily send permissive CORS headers. Possible mitigation strategies include routing requests through a CORS-enabled proxy, intercepting them with a service worker, or rendering the content in an iframe instead of fetching it.
A hybrid approach using service workers enables advanced caching and transformation:
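A sketch of such a worker. One important caveat: a service worker cannot bypass the browser's CORS checks on responses it cannot read, so this header re-wrapping only helps when archive.today (or an intermediate proxy) already permits cross-origin reads:

```javascript
// sw.js - service-worker sketch: intercept requests to archive.today
// and re-wrap the response with a permissive CORS header.
function isArchiveRequest(url) {
  return new URL(url).hostname === "archive.today";
}

// `self` is the worker's global scope; the guard lets this file also
// load outside a worker context (e.g. in tests).
if (typeof self !== "undefined" && "addEventListener" in self) {
  self.addEventListener("fetch", (event) => {
    if (!isArchiveRequest(event.request.url)) return;
    event.respondWith(
      fetch(event.request).then(async (upstream) => {
        const headers = new Headers(upstream.headers);
        headers.set("Access-Control-Allow-Origin", "*");
        return new Response(await upstream.blob(), {
          status: upstream.status,
          statusText: upstream.statusText,
          headers,
        });
      })
    );
  });
}
```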
This service worker implementation adds CORS headers to archive.today responses, enabling cross-origin access from the proxy page[2][6].
Content Security Policy
A strict CSP header mitigates injection attacks:
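Since GitHub Pages does not allow custom response headers, the policy can be delivered via a meta tag instead; the exact directives below are illustrative:

```html
<meta http-equiv="Content-Security-Policy"
      content="default-src 'self';
               script-src 'self';
               style-src 'self' 'unsafe-inline';
               img-src * data:;
               connect-src 'self' https://archive.today">
```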
This policy restricts script execution to same-origin sources while allowing inline styles required by some archived pages[8].
Performance Optimization
Caching Strategies
Implement a caching layer using the Cache API and integrate it with the main fetching logic:
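A sketch of both pieces, assuming browser Cache API availability (cache name and helper names illustrative):

```javascript
// Cache-first lookup using the browser Cache API: serve a stored
// response when available, otherwise hit the network and keep a copy.
const CACHE_NAME = "gistify-archive-v1"; // illustrative cache name

async function cachedFetch(url) {
  const cache = await caches.open(CACHE_NAME);
  const hit = await cache.match(url);
  if (hit) return hit; // serve the stored response immediately
  const response = await fetch(url);
  if (response.ok) {
    await cache.put(url, response.clone()); // store a copy for reuse
  }
  return response;
}

// Integration point: the only change relative to a plain fetch is
// calling cachedFetch instead of fetch directly.
async function loadArchivedPage(archiveUrl) {
  const response = await cachedFetch(archiveUrl);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.text();
}
```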
This cache-first strategy improves load times for repeat visits while maintaining freshness through standard HTTP caching headers[2][4].
Error Handling and User Feedback
Comprehensive Error Reporting
Implement structured error handling across all system components:
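One possible shape for this, with illustrative error categories:

```javascript
// Map internal failures to user-facing messages while preserving the
// technical detail (the original low-level error) for diagnostics.
class GistifyError extends Error {
  constructor(userMessage, cause) {
    super(userMessage);
    this.name = "GistifyError";
    this.cause = cause; // original low-level error, if any
  }
}

function toUserMessage(err) {
  if (err instanceof GistifyError) return err.message; // already user-facing
  if (err instanceof TypeError) return "Network error. Please try again.";
  return "Something went wrong.";
}
```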
This approach provides consistent user feedback while maintaining separation between technical errors and user-facing messages[2][6].
Deployment Configuration for GitHub Pages
Repository Structure
Organize project files following GitHub Pages conventions:
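One reasonable layout (file names illustrative); GitHub Pages serves index.html from the root of the published branch:

```text
gistify/
├── index.html       # page shell with the rendering container
├── app.js           # client-side application logic
├── sw.js            # optional service worker
└── .github/
    └── workflows/
        └── deploy.yml   # GitHub Actions deployment
```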
GitHub Actions Deployment
Automate deployment with GitHub Actions:
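Because the site is static, no build step is needed; a minimal workflow can upload the repository root directly using GitHub's official Pages actions:

```yaml
name: Deploy to GitHub Pages
on:
  push:
    branches: [main]
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/upload-pages-artifact@v3
        with:
          path: .
      - id: deployment
        uses: actions/deploy-pages@v4
```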
This workflow automates build and deployment processes, ensuring the latest version remains available on GitHub Pages[1][4].
Example Implementation
Functional Demonstration
A complete working example implements all discussed components:
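A condensed single-file variant, subject to the same CORS caveat discussed above (all names illustrative):

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>Gistify demo</title></head>
<body>
  <div id="content">Loading…</div>
  <script>
    (async () => {
      const out = document.getElementById("content");
      try {
        const target = new URLSearchParams(location.search).get("url");
        const parsed = new URL(target); // validates and normalizes
        const archiveUrl = "https://archive.today/?run=1&url=" +
          encodeURIComponent(parsed.href);
        const res = await fetch(archiveUrl);
        if (!res.ok) throw new Error("HTTP " + res.status);
        const doc = new DOMParser()
          .parseFromString(await res.text(), "text/html");
        doc.querySelectorAll("script").forEach((n) => n.remove());
        out.innerHTML = doc.body.innerHTML;
      } catch (e) {
        out.textContent = "Error: " + e.message;
      }
    })();
  </script>
</body>
</html>
```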
Usage Example
Access the service through a URL parameter, for example:
https://eonist.github.io/gistify/?url=https://www.wsj.com/sports/basketball/nba-trades-luka-doncic-jimmy-butler-63879018
The system will:
1. Extract and validate the url query parameter
2. Construct the archive.today submission URL
3. Fetch the archived HTML content
4. Sanitize it and render it in the content container
Conclusion
This technical implementation provides a robust solution for web archival proxying through GitHub Pages. By leveraging modern browser APIs and following security best practices, the system achieves:
- Dynamic content retrieval on a purely static host
- Defense in depth against XSS through sanitization, sandboxed rendering, and a Content Security Policy
- Faster repeat visits through client-side caching
- Automated, reproducible deployment via GitHub Actions
Future enhancements could incorporate service workers for offline functionality, additional archival services as fallback sources, and expanded content transformation capabilities. The architecture demonstrates how static hosting platforms like GitHub Pages can support dynamic content applications through client-side JavaScript and modern web APIs.
Citations:
[1] https://googlechrome.github.io/samples/urlsearchparams/index.html
[2] https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Scripting/Network_requests
[3] https://developer.chrome.com/blog/urlsearchparams
[4] https://mahekunnisa.hashnode.dev/fetch-api-in-javascript
[5] https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams/get
[6] https://stackoverflow.com/questions/36631762/returning-html-with-fetch
[7] https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams
[8] https://www.reddit.com/r/firefox/comments/jzivo5/how_to_save_rendered_page_as_htmlonly_like/
[9] vercel/next.js#57408
[10] https://greyboyle.hashnode.dev/javascript-fetch-api-a-complete-overview
[11] https://www.pluralsight.com/resources/blog/guides/using-fetch-with-github-api-v3
[12] https://www.screamingfrog.co.uk/how-to-fetch-render-any-site/
[13] https://stackoverflow.com/questions/72666698/github-pages-how-to-fetch-file-in-js-from-repo
[14] https://developer.chrome.com/docs/devtools/javascript/snippets
[15] https://gist.github.com/justsml/529d0b1ddc5249095ff4b890aad5e801
[16] https://developers.google.com/search/blog/2014/05/rendering-pages-with-fetch-as-google
[17] bigskysoftware/htmx#2330
[18] https://www.sitepoint.com/community/t/history-api-how-to-reload-content-on-back-and-forward-browser-clicks/303219
[19] https://www.reddit.com/r/learnjavascript/comments/rhcj00/fetching_from_a_http_site_in_github_pages/
Answer from Perplexity: pplx.ai/share