
@Stanback
Last active February 4, 2022 18:05
Example Nginx configuration for serving pre-rendered HTML from Javascript pages/apps using the Prerender Service (https://github.com/collectiveip/prerender). Instead of using try_files (which can cause unnecessary overhead on busy servers), you could check $uri for specific file extensions and set $prerender appropriately.
# Note (November 2016):
# This config is rather outdated and left here for historical reasons, please refer to prerender.io for the latest setup information
# Serving static html to Googlebot is now considered bad practice as you should be using the escaped fragment crawling protocol
server {
    listen 80;
    listen [::]:80;
    server_name yourserver.com;
    root /path/to/your/htdocs;

    error_page 404 /404.html;
    index index.html;

    # deny access to dotfiles
    location ~ /\. {
        deny all;
    }

    # serve the file if it exists, otherwise fall through to the prerender check
    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        #proxy_set_header X-Prerender-Token YOUR_TOKEN;

        set $prerender 0;

        # flag known crawlers for prerendering
        if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
            set $prerender 1;
        }

        # honor explicit requests for the prerendered version
        if ($args ~ "_escaped_fragment_|prerender=1") {
            set $prerender 1;
        }

        # never loop requests coming from the prerender service back into it
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }

        if ($prerender = 1) {
            # pass the full URL to the prerender service as the request path
            rewrite .* /$scheme://$host$request_uri? break;
            #proxy_pass http://localhost:3000;
            proxy_pass http://service.prerender.io;
        }

        if ($prerender = 0) {
            # regular visitors get the single-page app shell
            rewrite .* /index.html break;
        }
    }
}
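
As a side note, the $uri extension check mentioned in the gist description could be sketched roughly like this, dropped in next to the other checks in the @prerender block (untested; the extension list is only an example):

    # static assets never need to go through the prerender service
    if ($uri ~* "\.(js|css|png|jpg|jpeg|gif|ico|svg|xml|txt|woff2?|ttf)$") {
        set $prerender 0;
    }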
@Stanback
Author

Stanback commented Jan 7, 2014

@saintberry Good question; you can try getting rid of the following lines:

if ($args ~ "_escaped_fragment_|prerender=1") {
    set $prerender 1;
}

Since it's already checking for the user agent, I don't think there will be a problem with omitting that additional check. An alternate option that you could experiment with would be to set $prerender to 0 if the $http_referer matches Facebook.
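
If you experiment with the referer approach, it might look something like this, placed after the user-agent check so it can override it (untested sketch; the pattern is only an example):

    # skip prerendering when the visitor arrived via a Facebook link
    if ($http_referer ~* "facebook\.com") {
        set $prerender 0;
    }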

@Stanback
Author

FYI:
Googlebot is starting to render Javascript, so you may want to remove it from the user-agent list (after testing in Webmaster Tools).

See:
http://googlewebmastercentral.blogspot.com/2014/05/rendering-pages-with-fetch-as-google.html
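
If you do drop it, the user-agent check simply loses the googlebot alternative, something along these lines (sketch only, verify in Webmaster Tools first):

    # Googlebot removed: it can render the JS itself, the others still get prerendered HTML
    if ($http_user_agent ~* "yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
        set $prerender 1;
    }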

@dfmcphee

Could this be used with another proxy pass to a node.js application using an upstream? I can't seem to get it to work.

@mrchilds

mrchilds commented Feb 9, 2015

@ldijkstra

As I am not familiar with all this, I'd like to have some general explanation and guidance. Forgive me for being ignorant. Given the docs I have read I am assuming this:

2: listen on port 80
3: I have no clue what this means
4: the server name
6: the root path where the documents are
8: error page
9: the index page
11-13: location, obviously with a regex; probably this is saying that we are not serving yourserver.com/. ??
15-17: first we try serving the URI that came in; if we cannot, we fall back to the named location 'prerender' ??

Now, I have a case where I want to serve any incoming request directly, except for Facebot, Twitterbot and maybe some other bots. But Googlebot for instance just processes the JS fine.

BTW: I am not using <meta name="fragment" content="!" />

For Facebot I want to serve very plain HTML with some og meta tags. This is the first thing I will be focusing on.
I would prefer that the regular website not suffer from the overhead of processing conditional logic, but it seems that is impossible, right?

What confuses me is the try_files. This is obviously an AJAX / Angular scenario. So, we have no static pages, only index.html. That is why try_files will always hit the fallback @prerender, am I right?

With regards to the prerender section, I would probably have something like:

    if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
34      rewrite .* /$scheme://$host$request_uri? break;
35      #proxy_pass http://localhost:3000;
36      proxy_pass http://service.prerender.io;
    }

I think the above means that the URI is changed first [line 34], but it seems to change it to exactly what it was? And then it is passed on to the server at http://service.prerender.io [line 36].

If prerendering does not apply, you serve index.html [line 39] (to the browser).

Finally, I have the following questions:

  • I had a discussion with the developer; he said he could check for the http_user_agent in code (in this case node.js).
    • What are the pros and cons of doing it in nginx like this?
  • I have also seen an alternative method outlined here: http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
    • Especially the map section seems a pretty elegant solution for checking multiple distinct cases; I would like to have your comment on this as well.

Last but not least, I have been trying and experimenting in order to learn something. That will never hurt.
So I had this scenario in mind:
IF origin = Facebot
use proxy_pass to relay to server that serves what Facebook needs
ELSE IF origin = Twitter
use proxy_pass to relay to server that serves what Twitter needs
...
ELSE
serve index.

I would really have liked to do some conditional processing using the location directive.

        # route to server for prerendered stuff
        location  ?????  {
            proxy_pass http://localhost:8081;
        }

But that does not seem possible. So we require IF or MAP or something similar.
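
For what it's worth, the map approach from that serverfault answer could cover the Facebot/Twitterbot scenario above without a chain of ifs. A rough, untested sketch (the upstream names and ports are made up):

    # at http level: map the crawler user agent to a named upstream; empty means normal visitor
    map $http_user_agent $bot_backend {
        default        "";
        ~*Facebot      facebook_render;
        ~*Twitterbot   twitter_render;
    }

    upstream facebook_render { server 127.0.0.1:8081; }
    upstream twitter_render  { server 127.0.0.1:8082; }

    server {
        # ...
        location / {
            # proxy_pass with a variable first looks the name up among the
            # defined upstream blocks, so no resolver is needed here
            if ($bot_backend != "") {
                proxy_pass http://$bot_backend;
            }
            try_files $uri /index.html;
        }
    }

Because proxy_pass has no URI part here, the original request URI is forwarded unchanged to whichever render server is selected.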

@ldijkstra

Finally, I tried something like this:

        location / { 
            if ($ua_redirect != '') {
                rewrite .* /$scheme://$host$request_uri? break;
                proxy_pass $scheme://localhost:$ua_redirect;
            }
        }

It seems proxy_pass does not like dynamic stuff? So I would need to add a resolver:

resolver 127.0.0.1;

And deploy a DNS server on my host machine. Am I right?
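
Partly, I think. The resolver is only needed when nginx has to turn a hostname into an address at request time, and localhost counts as a hostname. If I remember right, you can sidestep it by using a literal IP, or by pointing the variable at the name of an upstream {} block, which is looked up before DNS. For example (untested):

    # literal IP: nothing to resolve, so no resolver directive should be required
    proxy_pass http://127.0.0.1:$ua_redirect;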

@Jrubensson

@dfmcphee Did you get that to work? I encountered the same problem just now. I've got one proxy_pass for my node servers and the real clients requesting the web page, and I want to redirect the rest to another proxy_pass. Has anyone got a solution for this?

@manikmi

manikmi commented Jul 31, 2017

Hi Brian,
I am using an AngularJS-based website with nginx as the server, without Node.js.
I have used the nginx settings as mentioned here: https://gist.github.com/thoop/8165802
I have configured a two server approach:

Website is running on a server: www.mywebsite.com. Running on SSL port 443
Prerender is running on another server: www.myprerenderservice.com. Running on port 80
The prerender service is running with PM2 as the process manager

In google crawler, I have used ?escaped_fragment= as well

The Google crawler is sometimes able to return the results, and sometimes not. When it fails, I try killing the Prerender PM2 service, starting it again, and clearing memory on the server, and then try again. It starts working again once, but then it fails again. I don't know what is happening. Can you help?


@FamousMai

I want to limit prerendering to only some of the pages. Do you have better settings?

# Except for the home page and the info page, other pages are not forwarded.
if ($document_uri !~ "/index.html|/info.html") {
   set $prerender 0;
}
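
One possible refinement (untested): anchor the regex so URLs that merely contain those names don't slip through, and keep this check after the user-agent check so it can override it:

    # only the home page and the info page are ever sent to the prerender service
    if ($uri !~ "^/(index\.html|info\.html)?$") {
        set $prerender 0;
    }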

@cojocaru3

Is there a way to avoid using prerender.io at all?
Let's say I have generated static files with Rendertron and I want to store them in a sub-folder.
How can I ask nginx, "if this is googlebot|otherbot, please load files from this directory"?
Why would I need a Rendertron server running all the time, or prerender.io, if the end result is more or less the same as with static files...?
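
That should be doable with plain nginx. A rough sketch, assuming the Rendertron output lives under a prerendered/ folder inside the web root and mirrors the site's URL structure (all names here are made up):

    # at http level: decide whether the client is a bot we want to serve static HTML to
    map $http_user_agent $is_bot {
        default 0;
        ~*(googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot) 1;
    }

    server {
        # ...
        location / {
            # bots are internally rewritten into the pre-rendered copy of the site
            if ($is_bot) {
                rewrite ^(.*)$ /prerendered$1 last;
            }
            try_files $uri /index.html;
        }

        location /prerendered/ {
            internal;  # not reachable directly from outside
            try_files $uri $uri/index.html $uri.html =404;
        }
    }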
