Skip to content

Instantly share code, notes, and snippets.

@tanaikech
Last active November 7, 2024 06:28
Show Gist options
  • Save tanaikech/e6251657d425d2827fee6dd3daf47976 to your computer and use it in GitHub Desktop.
Save tanaikech/e6251657d425d2827fee6dd3daf47976 to your computer and use it in GitHub Desktop.
Workaround for Retrieving Direct Links of All Sheets from URL of 2PACX-### of Web Published Google Spreadsheet

Workaround for Retrieving Direct Links of All Sheets from URL of 2PACX-### of Web Published Google Spreadsheet

This is a sample script for retrieving the direct links of all sheets from the URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml of the web published Google Spreadsheet. This sample script can be used for the following situation.

  1. The Spreadsheet is published to Web and the URL like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml is known
  2. You are not the owner of Google Spreadsheet.
  3. You don't know the Spreadsheet ID and Sheet IDs.

Under above situation, unfortunately, the direct links of each sheet cannot be directly retrieved. I think that this is the specification of Google side. So in this post, I would like to introduce a workaround for retrieving the direct links of each sheet under above situation.

Flow

The flow of this workaround is as follows.

  1. Download the Google Spreadsheet as a XLSX data from the URL of https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml.
  2. Convert the XLSX data to Google Spreadsheet.
  3. Publish the converted Google Spreadsheet to Web.
  4. Retrieve the URLs of each sheet.

Sample script 1

Please copy and paste the following script (Google Apps Script) to the script editor. And please enable Google Drive API at Advanced Google services. This script is for the Web Apps.

function myFunction() {
  const inputUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml";

  const prop = PropertiesService.getScriptProperties();
  const ssId = prop.getProperty("ssId");
  if (ssId) {
    DriveApp.getFileById(ssId).setTrashed(true);
    prop.deleteProperty("ssId");
  }
  const re = new RegExp(
    "(https?:\\/\\/docs\\.google\\.com\\/spreadsheets\\/d\\/e\\/2PACX-.+?\\/)"
  );
  if (!re.test(inputUrl)) throw new Error("Wrong URL.");
  const url = `${inputUrl.match(re)[1]}pub?output=xlsx`;
  const blob = UrlFetchApp.fetch(url).getBlob();
  const id = Drive.Files.insert(
    { mimeType: MimeType.GOOGLE_SHEETS, title: "temp" },
    blob
  ).id;
  prop.setProperty("ssId", id);
  Drive.Revisions.update(
    { published: true, publishedOutsideDomain: true, publishAuto: true },
    id,
    1
  );
  const sheets = SpreadsheetApp.openById(id).getSheets();
  const pubUrls = sheets.map((s) => ({
    [s.getSheetName()]: `https://docs.google.com/spreadsheets/d/${id}/pubhtml?gid=${s.getSheetId()}`,
  }));
  console.log(pubUrls);
}
  • When this script is run, the Google Spreadsheet is created to the root folder and the direct links of each sheet of the web published Spreadsheet are returned.

  • In this script, when the script is run, the Google Spreadsheet is downloaded as a XLSX data, and the XLSX data is converted to Google Spreadsheet. Then, the converted Spreadsheet is published to the web. By this, the direct links of each sheet can be retrieved.

    • Also, in this script, it supposes that the original Spreadsheet is changed. So if you run the script again, the existing Spreadsheet is deleted and new Spreadsheet is created by downloading from the original Spreadsheet. In this case, the URLs are updated.

    • So if the Spreadsheet is not changed, you can continue to use the retrieved URLs. Of course, you can also directly use the downloaded and converted Spreadsheet.

Sample script 2

Please copy and paste the following script (Google Apps Script) to the script editor. And please enable Google Drive API v2 at Advanced Google services. This script is for the Web Apps.

As another workaround, when the original Spreadsheet is often changed, and the number of sheet is constant in the original Spreadsheet, and then, you want to retrieve only values, you can also use the following script. In this script, the URL is not changed even when the script is run again. So you can continue to use the URL. And, when this function is run using the time-driven trigger, you can also synchronize with the original Spreadsheet.

function myFunction() {
  const inputUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml";
  
  const re = new RegExp("(https?:\\/\\/docs\\.google\\.com\\/spreadsheets\\/d\\/e\\/2PACX-.+?\\/)");
  if (!re.test(inputUrl)) throw new Error("Wrong URL.");
  const url = `${inputUrl.match(re)[1]}pub?output=xlsx`;
  const blob = UrlFetchApp.fetch(url).getBlob();
  const prop = PropertiesService.getScriptProperties();
  let sheets;
  let ssId = prop.getProperty("ssId");
  if (ssId) {
    const temp = Drive.Files.insert({mimeType: MimeType.GOOGLE_SHEETS, title: "tempSpreadsheet"}, blob).id;
    const tempSheets = SpreadsheetApp.openById(temp).getSheets();
    sheets = SpreadsheetApp.openById(ssId).getSheets();
    tempSheets.forEach((e, i) => {
      const values = e.getDataRange().getValues();
      sheets[i].getRange(1, 1, values.length, values[0].length).setValues(values);
    });
    DriveApp.getFileById(temp).setTrashed(true);
  } else {
    ssId = Drive.Files.insert({mimeType: MimeType.GOOGLE_SHEETS, title: "copiedSpreadsheet"}, blob).id;
    Drive.Revisions.update({published: true, publishedOutsideDomain: true, publishAuto: true}, ssId, 1);
    prop.setProperty("ssId", ssId);
    sheets = SpreadsheetApp.openById(ssId).getSheets();
  }
  const pubUrls = sheets.map(s => ({[s.getSheetName()]: `https://docs.google.com/spreadsheets/d/${ssId}/pubhtml?gid=${s.getSheetId()}`}));
  console.log(pubUrls);  // You can see the URLs for each sheet at the log.
}

Testing

  • On September 12, 2024, confirmed that the script works. In the current stage, when Drive API is enabled at Advanced Google services, v3 is automatically used as the default version. So, when you use this script, please enable Drive API v2. When you use Drive API v3, please modify Drive.Files.insert({mimeType: MimeType.GOOGLE_SHEETS, title: "###"}, blob) to Drive.Files.create({mimeType: MimeType.GOOGLE_SHEETS, name: "###"}, blob).

References

@Nevidebla
Copy link

how to download xlsx in "pubhtml" link?

To download it, as any allowed format (xslx, pdf, csv...), just replace "/pubhtml" with "/export?format=[Yourformat]". Ex: Original: https://docs.google.com/spreadsheets/d/[SpreadSheetID]/pubhtml Downloadable version: https://docs.google.com/spreadsheets/d/[SpreadSheetID]/export?format=xlsx

Thx, but it is no use to this:
https://docs.google.com/spreadsheets/d/e/2PACX-1vSTWLlj1luPBQBGNpzs_npdN7oM-0OFwmfdduufbXSOjxQDD3bkPdeo23xE0r6rHFwX1SWmYM0j9xJW/pubhtml#

@bidaumm
Copy link

bidaumm commented Nov 9, 2021

@TuVieja007
Copy link

BRO THANK YOU

@DictoDictov
Copy link

Doesn;t work with https://docs.google.com/spreadsheets/d/e/2PACX-1vQvfslN3Xa7nMYeC2fhPTEPIyjsbTzi_8F9pX-4zpqwjXLab5qXhiFhA_JvZT-Si6fF67mE-WlWesbL/pubhtml
Tried with export?format=xlsx, export?output=xlsx, pub?format=xlsx, pub?output=xlsx, pubhtml?format=xlsx and pubhtml?output=xlsx.

@Guise-Leon
Copy link

Also doesn't work with :https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vTl1soDsbFySVbaPg1jsk2y8taNAbmb9ny1IWHhBed6rxjUXFzHm_3nGiOrjqZMyjt70cx99-EdSWMF/pubhtml. Tried the same as @DictoDictov

Doesn;t work with https://docs.google.com/spreadsheets/d/e/2PACX-1vQvfslN3Xa7nMYeC2fhPTEPIyjsbTzi_8F9pX-4zpqwjXLab5qXhiFhA_JvZT-Si6fF67mE-WlWesbL/pubhtml Tried with export?format=xlsx, export?output=xlsx, pub?format=xlsx, pub?output=xlsx, pubhtml?format=xlsx and pubhtml?output=xlsx.

@DictoDictov
Copy link

DictoDictov commented Nov 6, 2024

Well, it seems that the trick with "output=xlsx" works with some tables and doesn't work with the others.
These are OK: example 1 (pubhtml link), example 2 (pubhtml link);
these are not: example 1 (pubhtml link), example 2 (pubhtml link).
I can not see any fundamental differences between them: each has the name on the top, then all sheets (if more than one), "Published by" at the bottom etc. Could some research what's happening?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment