spiderable: Let apps configure timeout for phantomjs

Apps can now set `Spiderable.requestTimeout` in server code. Its value
is the number of milliseconds spiderable waits before giving up on
phantomjs.

This is motivated by the frontpage sometimes hitting the 15-second
timeout (due to some other problem we have). Regardless, slow page loads
are better than non-crawlable ones.
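
A minimal sketch of how an app might raise the timeout, assuming the code runs on the server at startup. The file name and the 30-second value are hypothetical, and the fallback object exists only so the snippet also runs outside Meteor, where the `Spiderable` global is not defined.

```javascript
// Hypothetical server-only file in an app, e.g. server/spiderable.js.
// Inside Meteor, `Spiderable` is the global exported by the spiderable package.
// The fallback object is only a stand-in so this snippet runs outside Meteor.
var spiderable = (typeof Spiderable !== 'undefined')
  ? Spiderable
  : { requestTimeout: 15 * 1000 }; // default value from this commit

// Give phantomjs up to 30 seconds per request (value is in milliseconds).
spiderable.requestTimeout = 30 * 1000;
```

Because the request handler reads `Spiderable.requestTimeout` each time it launches phantomjs, a value assigned at startup applies to every later crawler request.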
Avital Oliver
2015-05-13 19:28:14 -07:00
parent e666e12210
commit 2c343a0788
2 changed files with 7 additions and 3 deletions

@@ -125,6 +125,9 @@
package on `EmailInternals.NpmModules`. Allow specifying a `MailComposer`
object to `Email.send` instead of individual options. #4209
+* Expose `Spiderable.requestTimeout` from `spiderable` package to
+  allow apps to set the timeout for running phantomjs.
### Other bug fixes and improvements

@@ -15,8 +15,9 @@ var urlParser = Npm.require('url');
Spiderable.userAgentRegExps = [
/^facebookexternalhit/i, /^linkedinbot/i, /^twitterbot/i];
-// how long to let phantomjs run before we kill it
-var REQUEST_TIMEOUT = 15*1000;
+// how long to let phantomjs run before we kill it (and send down the
+// regular page instead). Users may modify this number.
+Spiderable.requestTimeout = 15*1000;
// maximum size of result HTML. node's default is 200k which is too
// small for our docs.
var MAX_BUFFER = 5*1024*1024; // 5MB
@@ -107,7 +108,7 @@ WebApp.connectHandlers.use(function (req, res, next) {
['-c',
("exec phantomjs " + phantomJsArgs + " /dev/stdin <<'END'\n" +
phantomScript + "END\n")],
-{timeout: REQUEST_TIMEOUT, maxBuffer: MAX_BUFFER},
+{timeout: Spiderable.requestTimeout, maxBuffer: MAX_BUFFER},
function (error, stdout, stderr) {
if (!error && /<html/i.test(stdout)) {
res.writeHead(200, {'Content-Type': 'text/html; charset=UTF-8'});
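
The options object in the hunk above has the shape accepted by Node's `child_process` functions, although the call itself is not visible in this excerpt. As a rough illustration of what the `timeout` option does, here is a standalone sketch that assumes that API. It uses `spawnSync` rather than the callback style of the real code, a Node child process as a stand-in for phantomjs, and a hypothetical 200 ms limit.

```javascript
var childProcess = require('child_process');

// Stand-in for Spiderable.requestTimeout, kept short for the demo.
var requestTimeout = 200;

// Stand-in for phantomjs: a Node child that would otherwise idle for 5 seconds.
var result = childProcess.spawnSync(
  process.execPath,
  ['-e', 'setTimeout(function () {}, 5000)'],
  {timeout: requestTimeout, maxBuffer: 5 * 1024 * 1024});

// When the limit is exceeded, Node kills the child and reports an error.
var timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');

if (timedOut) {
  // The real handler would fall through and send down the regular page here.
  console.log('child exceeded ' + requestTimeout + ' ms and was killed');
}
```

A larger timeout gives a slow page more time to render for the crawler, at the cost of keeping the child process alive longer per request.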