This will let us get a sense of how much work is actually done. I'm looking
at splitting the CommentOrderer update out into a separate queue and need
to understand how many writes actually happen.
This warning is no longer true: any missing scores are automatically
calculated and updated.
We actually have the opposite issue--the CommentTree must be updated
before writing scores because the QA score reads it.
Score updates are processed through commentstree_q. When a new comment
is created, an automatic initial vote (by the comment's author) is also
created. This results in two messages in commentstree_q: one from the vote
and one from queries.new_comments. Don't create the message from the vote
because it is redundant. This reduces the volume of messages in
commentstree_q, which is currently very high.
Instead of checking _featurestate_cache for a key's existence and then
retrieving it, just get it and then check for a miss. The two-step
process can raise a KeyError if _featurestate_cache is cleared between
the existence check and the retrieval.
The "email", "authorize", "hc", and "traffic" databases aren't used that
often, so we may be able to reduce the number of connections to pg-05 by
waiting to establish each connection until it's actually needed.
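
One way to defer a connection until first use, sketched under assumptions:
the LazyConnection class and its connect factory are hypothetical names,
and the real database layer may do this differently.

```python
import threading


class LazyConnection(object):
    """Defers calling connect() until the connection is first used."""

    def __init__(self, connect):
        # connect is a zero-argument factory (hypothetical) that opens
        # and returns a real connection.
        self._connect = connect
        self._conn = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked under a lock so that concurrent first uses
        # only open one connection.
        if self._conn is None:
            with self._lock:
                if self._conn is None:
                    self._conn = self._connect()
        return self._conn
```

A process that never touches a given database never calls get(), so it
never opens a connection to it.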
The fastlane processor was meant to handle votes on both Links and Comments,
but it can't do that easily anymore because the vote processing has been
split. It's not a big deal: Link vote processing is much faster since the
query updating was separated and sharded. The Comment vote consumer/queue
was getting some benefit from the fastlane, and it can be resurrected if we
run into problems.
Change the error from a 500 to a line in the error log with more info about what failed and why.
This seems to be the result of someone's crappy votebot.
The queue can be sharded by domain to minimize lock contention
and the consumer will batch updates to the same links (e.g. several
votes for the same link) and to the same domain (e.g. votes for different
links to a single domain).
The queue can be sharded by subreddit id to minimize lock contention
and the consumer will batch updates to the same links (e.g. several
votes for the same link) and to the same subreddit (e.g. votes for different
links submitted to a single subreddit).
The queue can be sharded by author id to minimize lock contention
and the consumer will batch updates to the same links (e.g. several
votes for the same link) and to the same author (e.g. votes for different
links submitted by a single author).
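
The sharding and batching described above could look roughly like the
following. This is a sketch under assumptions: the shard count, the message
fields ("shard_key", "link_id"), and both function names are hypothetical,
and crc32 is just one stable hash that works for domain strings as well as
numeric ids.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(shard_key, num_shards=NUM_SHARDS):
    """Map a domain, subreddit id, or author id to a shard index."""
    # crc32 is stable across processes, unlike the built-in hash().
    data = str(shard_key).encode("utf-8")
    return (zlib.crc32(data) & 0xFFFFFFFF) % num_shards


def batch_votes(messages):
    """Group messages as {shard_key: {link_id: [messages]}}.

    Each inner list can be handled as one update to a link, and each
    outer group as one update to the domain/subreddit/author.
    """
    batches = defaultdict(lambda: defaultdict(list))
    for msg in messages:
        batches[msg["shard_key"]][msg["link_id"]].append(msg)
    return batches
```

Because every message for a given key lands on the same shard, only one
consumer ever updates that key, which is what keeps lock contention down.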
The raven client inspects the traceback and attempts to figure out which
parts belong to the app and which belong to external libraries. It uses
a whitelist of paths to identify application code. Previously we had been
using the list of repository names, but that was incorrect because the
plugin "liveupdate" is actually called "reddit_liveupdate" in the traceback.
The whitelist was also incomplete because it didn't account for scripts run
with "paster run", which can have a path like /opt/something/script.py.
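
To illustrate why the repository name fails as a whitelist entry, here is a
simple prefix match. It is not raven's implementation; is_app_frame and the
"sqlalchemy.engine" frame are hypothetical, and only "liveupdate",
"reddit_liveupdate", and /opt/something/script.py come from the text above.

```python
def is_app_frame(frame_name, include_prefixes):
    """Return True if frame_name starts with any whitelisted prefix."""
    return any(frame_name.startswith(p) for p in include_prefixes)


# Whitelist built from repository names: misses the plugin's real name.
by_repo_name = ["liveupdate"]

# Whitelist built from the names that actually appear in tracebacks,
# plus a path prefix for scripts run outside the packages.
by_traceback_name = ["reddit_liveupdate", "/opt/something/"]
```

With by_repo_name, a frame named "reddit_liveupdate.controllers" is treated
as external code; with by_traceback_name, both it and
/opt/something/script.py are treated as application code.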
This creates a sys.excepthook handler that reports any exceptions
to Sentry. As a result, errors are reported twice in script mode
because the exception is re-raised and caught by that handler.
We can conduct experiments that impact how pages are rendered across
users, bucketing pages according to the fullname, so that search engines
will crawl and index the same experimental content that users see. We
support subreddit listing pages, comments pages, and comment permalink
pages. We use the link fullname for both comments pages and comment
permalink pages, so that they are bucketed together.
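
A sketch of fullname-based bucketing, under assumptions: the function
names, the page_type strings, the md5-based hash, and the bucket count are
all hypothetical, and using the subreddit's fullname for listing pages is a
guess, since the text only says which fullname comments pages use.

```python
import hashlib


def page_bucketing_key(page_type, subreddit_fullname=None,
                       link_fullname=None):
    """Pick the fullname that a page is bucketed on."""
    if page_type == "listing":
        return subreddit_fullname  # assumption, see lead-in
    if page_type in ("comments", "permalink"):
        # Both use the link, so a permalink and its parent comments
        # page always land in the same bucket.
        return link_fullname
    raise ValueError("unsupported page type: %r" % (page_type,))


def bucket_for(experiment_name, fullname, num_buckets=100):
    """Deterministically map a fullname to a bucket for an experiment."""
    seed = ("%s:%s" % (experiment_name, fullname)).encode("utf-8")
    return int(hashlib.md5(seed).hexdigest(), 16) % num_buckets
```

Because the bucket depends only on the page's fullname and not on the
viewer, a crawler and a logged-in user requesting the same page get the
same variant.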
We don't want to spend crawl budget or rank on what are essentially
duplicate pages. In case we have inbound links to these pages, we don't
want the robots.txt to prevent crawlers from accessing them.