@benjaminsehl
Last active May 23, 2023 20:43
import { CacheLong } from '@shopify/hydrogen';
import { redirect } from '@shopify/remix-oxygen'; // `redirect` is called in the loader below
interface Config {
  cacheControl: string;
  removeNoIndex: boolean;
  updateCanonical: boolean;
  ignoreRedirects: boolean;
}
const config: Config = {
  cacheControl: 'public, max-age=3600, stale-while-revalidate=86400', // Cache-Control value for the proxied page; both durations are in seconds
  removeNoIndex: true, // Set to false if you want to respect robots noindex tags
  updateCanonical: true, // Set to false if you want to respect canonical meta tags
  ignoreRedirects: true, // Set to false if you aren't redirecting to Hydrogen in your theme
};
/**
 * Remove the noindex meta tag from the input data.
 * @param data - The HTML data to process.
 * @returns The processed HTML data without the noindex meta tag.
 */
export function removeNoIndexMetaTag(data: string): string {
  // [^>]* keeps the match inside a single tag; a greedy .* can swallow everything up to the last ">" on the line.
  return data.replace(/<meta[^>]*name="robots"[^>]*content="noindex[^"]*"[^>]*>/gi, '');
}
/**
 * Update the canonical tag in the input data.
 * @param data - The HTML data to process.
 * @param origin - The origin of the request.
 * @param url - The primary domain URL.
 * @returns The processed HTML data with the updated canonical tag.
 */
export function updateCanonicalTag(data: string, origin: string, url: string): string {
  // [^>]* and [^"]* keep the match inside a single <link> tag.
  return data.replace(/<link[^>]*rel="canonical"[^>]*href="[^"]*"[^>]*>/gi, (match) => {
    return match.replace(url, origin);
  });
}
/**
 * Replace the monorailRegion value in the input data.
 * @param data - The HTML data to process.
 * @returns The processed HTML data with the updated monorailRegion value.
 */
export function replaceMonorailRegionValue(data: string): string {
  return data.replace(/"monorailRegion":"shop_domain"/gi, '"monorailRegion":"global"');
}
/**
 * Remove window.location.replace calls from the input data.
 * @param data - The HTML data to process.
 * @returns The processed HTML data without window.location.replace calls.
 */
export function removeWindowLocationReplaceCalls(data: string): string {
  return data.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, (match) => {
    return match.replace(/window\.location\.replace\([^)]*\);?/g, '');
  });
}
/**
 * Process the HTML data by updating meta tags, canonical tags, monorailRegion, and removing window.location.replace calls.
 * @param data - The HTML data to process.
 * @param origin - The origin of the request.
 * @param url - The primary domain URL.
 * @param config - The configuration object.
 * @returns The processed HTML data.
 */
export function processHtmlData(data: string, origin: string, url: string, config: Config): string {
  let processedData = data;
  if (config.removeNoIndex) {
    processedData = removeNoIndexMetaTag(processedData);
  }
  if (config.updateCanonical) {
    processedData = updateCanonicalTag(processedData, origin, url);
  }
  processedData = replaceMonorailRegionValue(processedData);
  if (config.ignoreRedirects) {
    processedData = removeWindowLocationReplaceCalls(processedData);
  }
  // Replace every remaining occurrence of the primary domain, canonical links included.
  // split/join treats `url` as a literal string; new RegExp(url) would read its dots as wildcards.
  processedData = processedData.split(url).join(origin);
  return processedData;
}
export async function loader({ request, context }: { request: Request; context: any }) {
  try {
    const {
      shop: {
        primaryDomain: { url },
      },
    } = await context.storefront.query(
      `#graphql
        query {
          shop {
            primaryDomain {
              url
            }
          }
        }
      `,
      {
        // storefront.query takes a `cache` strategy rather than a Cache-Control string.
        cache: CacheLong(),
      },
    );
    const { origin, pathname, search } = new URL(request.url);
    const customHeaders = new Headers({
      'X-Shopify-Client-IP': request.headers.get('X-Shopify-Client-IP') || '',
      'X-Shopify-Client-IP-Sig': request.headers.get('X-Shopify-Client-IP-Sig') || '',
      'User-Agent': 'Hydrogen',
    });
    const response = await fetch(url + pathname + search, {
      headers: customHeaders,
      // fetch follows redirects by default, so a 301 is only observable in manual mode.
      redirect: 'manual',
    });
    if (response.status === 301) {
      // Pass 301 explicitly; redirect() defaults to 302.
      return redirect(response.headers.get('Location') || '', 301);
    }
    const data = await response.text();
    const processedData = processHtmlData(data, origin, url, config);
    const status = /<title>(.|\n)*404 Not Found(.|\n)*<\/title>/i.test(data) ? 404 : response.status;
    const headers = new Headers(response.headers);
    headers.set('content-type', 'text/html');
    headers.delete('content-encoding');
    headers.set('Cache-Control', config.cacheControl);
    return new Response(processedData, { status, headers });
  } catch (error) {
    console.error('Error in loader function:', error);
    return new Response('An error occurred while processing the request.', { status: 500 });
  }
}
@benjaminsehl
Author

Tests:

import { expect, test } from 'vitest';
import {
  processHtmlData,
  removeNoIndexMetaTag,
  updateCanonicalTag,
  replaceMonorailRegionValue,
  removeWindowLocationReplaceCalls,
} from './loader'; // Update the import path to match your loader file location

const origin = 'https://hydrogen.shop';
const url = 'https://checkout.hydrogen.shop';

// Test removeNoIndexMetaTag function
test('removeNoIndexMetaTag: removes noindex meta tag', () => {
  const data = '<meta name="robots" content="noindex, follow">';
  const result = removeNoIndexMetaTag(data);
  expect(result).not.toContain(data);
});

// Test updateCanonicalTag function
test('updateCanonicalTag: updates canonical tag', () => {
  const data = `<link rel="canonical" href="${url}/some-page">`;
  const result = updateCanonicalTag(data, origin, url);
  expect(result).toContain(`<link rel="canonical" href="${origin}/some-page">`);
});

// Test replaceMonorailRegionValue function
test('replaceMonorailRegionValue: replaces monorailRegion value', () => {
  const data = '"monorailRegion":"shop_domain"';
  const result = replaceMonorailRegionValue(data);
  expect(result).toContain('"monorailRegion":"global"');
});

// Test removeWindowLocationReplaceCalls function
test('removeWindowLocationReplaceCalls: removes window.location.replace calls', () => {
  const data = '<script>window.location.replace("https://shop.com/some-page");</script>';
  const result = removeWindowLocationReplaceCalls(data);
  expect(result).not.toContain('window.location.replace');
});

// Test processHtmlData function with different configurations
test('processHtmlData: processes data with default configuration', () => {
  const data = `
    <meta name="robots" content="noindex, follow">
    <link rel="canonical" href="${url}/some-page">
    "monorailRegion":"shop_domain"
    <script>window.location.replace("https://shop.com/some-page");</script>
  `;
  const result = processHtmlData(data, origin, url, {
    cacheControl: 'public, max-age=3600, stale-while-revalidate=86400', // required by the Config type
    removeNoIndex: true,
    updateCanonical: true,
    ignoreRedirects: true,
  });
  expect(result).not.toContain('<meta name="robots" content="noindex, follow">');
  expect(result).toContain(`<link rel="canonical" href="${origin}/some-page">`);
  expect(result).toContain('"monorailRegion":"global"');
  expect(result).not.toContain('window.location.replace');
});

test('processHtmlData: processes data with custom configuration', () => {
  const data = `
    <meta name="robots" content="noindex, follow">
    <link rel="canonical" href="${url}/some-page">
    "monorailRegion":"shop_domain"
    <script>window.location.replace("https://shop.com/some-page");</script>
  `;
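  const result = processHtmlData(data, origin, url, {
    cacheControl: 'public, max-age=3600, stale-while-revalidate=86400', // required by the Config type
    removeNoIndex: false,
    updateCanonical: false,
    ignoreRedirects: false,
  });
  expect(result).toContain('<meta name="robots" content="noindex, follow">');
  // processHtmlData's final pass swaps every occurrence of `url` for `origin`, canonical included.
  expect(result).toContain(`<link rel="canonical" href="${origin}/some-page">`);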
  expect(result).toContain('"monorailRegion":"global"');
  expect(result).toContain('window.location.replace');
});

// Test processHtmlData function with complex input data
test('processHtmlData: processes complex data', () => {
  const data = `
    <html>
      <head>
        <meta name="robots" content="noindex, follow">
        <link rel="canonical" href="${url}/some-page">
        <meta name="description" content="A sample page">
        <link rel="stylesheet" href="${url}/styles.css">
      </head>
      <body>
        "monorailRegion":"shop_domain"
        <script>window.location.replace("https://shop.com/some-page");</script>
        <script>
          if (condition) {
            window.location.replace("https://shop.com/another-page");
          }
        </script>
      </body>
    </html>
  `;
  const result = processHtmlData(data, origin, url, {
    cacheControl: 'public, max-age=3600, stale-while-revalidate=86400', // required by the Config type
    removeNoIndex: true,
    updateCanonical: true,
    ignoreRedirects: true,
  });
  expect(result).not.toContain('<meta name="robots" content="noindex, follow">');
  expect(result).toContain(`<link rel="canonical" href="${origin}/some-page">`);
  expect(result).toContain('"monorailRegion":"global"');
  expect(result).not.toContain('window.location.replace');
  expect(result).toContain('<meta name="description" content="A sample page">');
  expect(result).toContain(`<link rel="stylesheet" href="${origin}/styles.css">`);
});

@juanpprieto

juanpprieto commented May 17, 2023

L122

if (response.status === 301) {

It would be good to add a response.ok check before L122. If it's not ok, maybe redirect to the homepage or throw an error.
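
A minimal sketch of what such a check could look like, kept as a pure function so it can be exercised without a network call. The names `decideUpstreamAction` and `UpstreamAction`, and the choice to fall back to the homepage instead of throwing, are assumptions for illustration and not part of the gist. The `ok` range mirrors how `Response.ok` is defined (status 200 to 299).

```typescript
// Hypothetical helper: decides what the loader should do with the upstream response.
type UpstreamAction =
  | { kind: 'redirect'; location: string }
  | { kind: 'proxy' };

function decideUpstreamAction(status: number, location: string | null): UpstreamAction {
  // A permanent redirect from the online store is forwarded to its Location.
  if (status === 301 && location) {
    return { kind: 'redirect', location };
  }
  // Same range that Response.ok uses.
  const ok = status >= 200 && status <= 299;
  if (!ok) {
    // Assumption: send the visitor to the homepage rather than proxying an error body.
    return { kind: 'redirect', location: '/' };
  }
  return { kind: 'proxy' };
}
```

In the loader this would presumably run right after `fetch`, for example as `decideUpstreamAction(response.status, response.headers.get('Location'))`, before the body is read.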

L128

const status = /<title>(.|\n)*404 Not Found(.|\n)*<\/title>/i.test(data) ? 404 : response.status;

Where does this <title>404 Not Found</title> response come from? Hydrogen? If yes, I'm a bit worried that we are relying on this text to be present. If this is a Hydrogen error page response, I would consider returning a header from the 404, like Hydrogen-Error-Page, and reading that header to validate whether it's indeed a 404.
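
One way to read that suggestion, as a sketch: prefer an explicit header over sniffing the title. The header name `Hydrogen-Error-Page` comes from the comment above. Treating its mere presence as the 404 signal, and the names `HeaderReader` and `resolveStatus`, are assumptions. Nothing in the gist confirms that any upstream actually sends such a header.

```typescript
// Minimal structural type so the function accepts a fetch `Headers` object or a test stub.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical header name, taken from the suggestion in the comment.
const ERROR_PAGE_HEADER = 'Hydrogen-Error-Page';

function resolveStatus(headers: HeaderReader, upstreamStatus: number): number {
  // Assumption: the header is only present on 404 error pages.
  return headers.get(ERROR_PAGE_HEADER) !== null ? 404 : upstreamStatus;
}
```

In the loader, `resolveStatus(response.headers, response.status)` could then stand in for the title regex, at the cost of requiring the upstream to set the header.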
