This is a rewrite of much of the code related to voting. Some of the
improvements include:
- Detangling the whole process of creating and processing votes
- Creating an actual Vote class to use instead of dealing with
inconsistent ad-hoc voting data everywhere
- More consistency in naming and other conventions, such as vote
directions (previously True/False/None in some places, 1/-1/0 in
others, etc.); see the sketch after this list
- More flexible methods for determining and applying the effects of votes
- Improvement of modhash generation/validation
- Removing various obsolete/unnecessary code
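As a rough illustration of the direction cleanup (this is a hedged
sketch, not the actual Vote class from this change):

    class Vote(object):
        # illustrative only: names and fields here are assumptions
        DIRECTIONS = (1, 0, -1)  # upvote, no vote, downvote

        def __init__(self, user, thing, direction):
            self.user = user
            self.thing = thing
            self.direction = self.normalize_direction(direction)

        @staticmethod
        def normalize_direction(direction):
            """Map the legacy True/False/None values onto 1/-1/0."""
            if direction is True:
                return 1
            if direction is False:
                return -1
            if direction is None:
                return 0
            if direction in Vote.DIRECTIONS:
                return direction
            raise ValueError("unknown vote direction: %r" % (direction,))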
The write_live_config script often gives really strange results when
trying to display diffs of changes to dict values, since the ordering of
a dict is not defined. So key/value pairs will sometimes be rearranged
between the old and new versions, creating a confusing diff.
This changes the script to use the pprint module to generate string
representations of dicts. Because pprint sorts dicts by key before
outputting them, we get consistent representations that can be compared
more easily.
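Roughly, the idea is something like this (the diff helper here is a
generic difflib-based sketch, not the script's actual code):

    import difflib
    import pprint

    def dict_repr(d):
        # pprint sorts dict keys before formatting, so equal dicts always
        # produce the same string regardless of insertion order
        return pprint.pformat(d)

    def show_dict_diff(old, new):
        diff = difflib.unified_diff(
            dict_repr(old).splitlines(), dict_repr(new).splitlines(),
            fromfile="old", tofile="new", lineterm="")
        for line in diff:
            print(line)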
http://pylons-webframework.readthedocs.org/en/latest/upgrading.html
This requires several code changes:
* pylons `config` option must be explicitly passed during setup
* the pylons global has been renamed from `g` to `app_globals`
* the pylons global has been renamed from `c` to `tmpl_context`
* set pylons.strict_tmpl_context = False (instead of pylons.strict_c)
* redirect_to() has been swapped for redirect()
* must implement `ErrorDocuments` middleware ourselves
pylons 1.0 also required an upgrade of routes from 1.11 to 1.12. This
required the following changes:
* set Mapper.minimization = True (the default value changed)
* set Mapper.explicit = False (the default value changed)
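A rough sketch of what the renames look like at a call site (the exact
changes in r2 are spread across many files):

    # before (pylons 0.9.x):
    #     from pylons import g, c
    #     from pylons.controllers.util import redirect_to
    #     redirect_to("/somewhere")

    # after (pylons 1.0):
    from pylons import app_globals as g, tmpl_context as c
    from pylons.controllers.util import redirect

    def example():
        # importing under the old short names keeps existing code working
        return redirect("/somewhere")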
Turns out that SequenceMatcher is not quite as neat as it appears. When
diffing a string change like:
old: "this is the old string value"
new: "on"
It was displayed as:
- "this is the "
- "ld stri"
- "g value"
The "o" and "n" were kept as a match. This change displays it as a
wholesale key removal/addition instead, unless the strings are at least
50% similar.
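Something along these lines (a hedged sketch; the real diff-display code
has more context around it):

    import difflib

    SIMILARITY_THRESHOLD = 0.5

    def display_string_change(key, old, new):
        matcher = difflib.SequenceMatcher(None, old, new)
        if matcher.ratio() >= SIMILARITY_THRESHOLD:
            # similar enough that a character-level diff is readable
            for op, i1, i2, j1, j2 in matcher.get_opcodes():
                if op != "equal":
                    print("%s: %s %r -> %r" % (key, op, old[i1:i2], new[j1:j2]))
        else:
            # otherwise show a wholesale removal/addition of the value
            print("- %s: %r" % (key, old))
            print("+ %s: %r" % (key, new))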
The script would previously dump out the entire new parsed live
configuration, which (now that we have a lot of fields in it) made it
difficult to find the ones that had actually changed. This fetches the
existing live config, compares it to the new one, and outputs only the
data that has changed for confirmation.
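The comparison is conceptually just this (the helper name is
illustrative; the real script wraps it in its confirmation prompt):

    def changed_values(live_config, new_config):
        """Return only the keys whose values differ between the configs."""
        changes = {}
        for key in set(live_config) | set(new_config):
            old = live_config.get(key)
            new = new_config.get(key)
            if old != new:
                changes[key] = (old, new)
        return changes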
There's no index on SRMember.c.rel_id, so the query is sorted by
SRMember.c.thing2_id (the user's id) instead. Also, the timestamp used
when writing to C* is updated to an integer timestamp corresponding to
when the dual write was deployed.
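In plain SQLAlchemy terms the sort change amounts to something like this
(the actual query goes through r2's tdb_sql layer, so this is only an
approximation):

    import sqlalchemy as sa

    def srmember_batch_query(srmember_table):
        # rel_id has no index, so sorting on it forces a slow sort; sort
        # on thing2_id (the user's id) instead
        return sa.select([srmember_table]).order_by(srmember_table.c.thing2_id)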
We've changed the url structure of image previews a number of times, which
breaks everything uploaded prior to the latest version. This script should
find all preview images that have been uploaded thus far, move them to the
appropriate place, and save an updated and correct storage url in every Link
that uses them.
See http://docs.python-requests.org/en/latest/api/#migrating-to-1-x
for the rationale.
`.json()` also differs from `.json` in that it raises an exception
instead of returning `None` on a decoding error, but that shouldn't
affect us anywhere.
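Concretely, the difference is:

    import requests

    response = requests.get("https://www.reddit.com/.json")

    # requests >= 1.0: .json() is a method and raises on a decoding error
    try:
        data = response.json()
    except ValueError:
        data = None

    # requests < 1.0: .json was a property that returned None on a
    # decoding error instead of raising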
Conflicts:
r2/r2/lib/media.py
If you go to a userpage and sort by top (in either the overview or comments
tabs), and restrict the time range to anything other than "all time", no
comments will be shown.
The data in these listings is built from functions in `lib/db/queries.py`
(specifically from `get_comments()` down). This ends up trying to pull the
query results from permacache (in `CachedResults.fetch_multi()`), defaulting to
an empty list if no cache entry is found.
Now, the cache entry is supposed to be populated periodically by a cronjob that
calls `scripts/compute_time_listings`. This script (and its Python helpers in
`lib/mr_top.py` and `lib/mr_tools/`) generates a dump of data from Postgresql,
then reads through that and builds up entries to insert into the cache. As
with many scripts of this sort, it expects to see some bad data, and so
performs some basic sanity checks.
The problem is that the sanity checks have been throwing out all comments.
With no new comments, there's nothing new to put into the cache!
The root of this was a refactoring in reddit/reddit@3511b08 that combined
several different scripts that were doing similar things. Unfortunately, we
ended up requiring the `url` field on comments, which doesn't exist because,
well, comments aren't links.
Now we have two sets of fields that we expect to get, one for comments and one
for links, and all is good.
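The shape of the fix is roughly this (field and helper names here are
illustrative, not copied from the mr_top code):

    # comments have no url, so they get their own set of required fields
    LINK_FIELDS = ("thing_id", "ups", "downs", "timestamp", "url", "sr_id")
    COMMENT_FIELDS = ("thing_id", "ups", "downs", "timestamp", "sr_id")

    def sanity_check(thing_type, data):
        """Only keep a dumped row if it has every field we expect."""
        expected = COMMENT_FIELDS if thing_type == "comment" else LINK_FIELDS
        return all(data.get(field) is not None for field in expected)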
We also now have a one-line summary of processed/skipped entries printed out,
which will help to make a problem like this more obvious in the future.
This new script attempts to generate some subreddits that are more like
production data. It first pulls down data from reddit.com, then uses
Markov chains to generate new data for insertion into the databases.
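For reference, a toy word-level Markov chain generator of the general
kind the script uses (the real script's model and data plumbing are more
involved):

    import random
    from collections import defaultdict

    def build_chain(corpus, order=2):
        """Map each `order`-word prefix to the words that followed it."""
        chain = defaultdict(list)
        words = corpus.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, order=2, length=50):
        output = list(random.choice(list(chain)))
        for _ in range(length):
            followers = chain.get(tuple(output[-order:]))
            if not followers:
                break
            output.append(random.choice(followers))
        return " ".join(output)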
* configuration now comes from the command line so it's easier to use
for multiple projects.
* the bucket is now specified as an s3 url, allowing a path prefix to be
added to files.
* authentication now comes from boto's credential system which allows
us to use IAM roles.
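Put together, usage looks roughly like this (argument names and the
example upload are made up for illustration):

    import argparse

    import boto

    try:
        from urlparse import urlparse  # Python 2
    except ImportError:
        from urllib.parse import urlparse

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("s3_url", help="e.g. s3://my-bucket/some/prefix")
        args = parser.parse_args()

        parsed = urlparse(args.s3_url)
        bucket_name = parsed.netloc
        key_prefix = parsed.path.strip("/")

        # no explicit keys: boto falls back to its credential chain
        # (environment variables, config files, or the instance's IAM role)
        conn = boto.connect_s3()
        bucket = conn.get_bucket(bucket_name)
        key_name = key_prefix + "/example.txt" if key_prefix else "example.txt"
        bucket.new_key(key_name).set_contents_from_string("hello")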
This takes our current config payload from 4700 bytes to 1700. The goal
is to reduce zookeeper network load during config changes as well as app
restarts during deploys.
This adds two redirects, `event_click` and `event_redirect`:
`event_click` appends a user ID to an event before redirecting, if we
require one, and `event_redirect` services a local evented redirect,
similar to ad clicks.
`event_click` is necessary for tracking clicks from users on embeds, which are
served via redditmedia, and therefore are always anonymous. When a user clicks
through, we want to know who they were and redirect them on their way. Because
of the way we're using nginx to store events as an access log right now, this
means we'll need to use two redirects: one to append the session ID and
another to store the event with the proper session ID.
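As a rough illustration of the two-hop flow (URLs and parameter names
here are assumptions, not the actual routes):

    try:
        from urllib import urlencode  # Python 2
    except ImportError:
        from urllib.parse import urlencode

    def event_click_location(event_url, session_id):
        # first hop: attach the viewer's session id to the event, then
        # bounce to the local evented redirect so nginx can log it
        return "/event_redirect?" + urlencode({"event": event_url,
                                               "session": session_id})

    def event_redirect_location(event_url):
        # second hop: nginx has logged the request (our event store for
        # now), so just send the user on to the destination
        return event_url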
Thanks to Nathanael A. Hoyle for the report! Some of these may have
been exploitable due to pointer arithmetic before reads / writes.
Just bail out if we can't allocate.
Double checking in the click app and in the processing scripts was
difficult. Just trust the click app and assume any request that got
a 302 response is valid.
Some advertisers set their ad's url to an intermediate tracker so
they can independently track clicks. This results in a series of
redirects like this:
reddit tracker > intermediate tracker > final destination
The ad's url is communicated to the reddit tracker through a query
parameter which is urlencoded on reddit.com and then unquoted when
being handled by the reddit tracker. This unquoting causes problems
if there is an intermediate tracker with its own query string
that needs to be urlencoded. This commit adds handling for those query
strings.
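The handling is along these lines (a hedged sketch, not the tracker's
actual code):

    try:
        from urllib import unquote, urlencode  # Python 2
        from urlparse import parse_qsl, urlparse, urlunparse
    except ImportError:
        from urllib.parse import (parse_qsl, unquote, urlencode, urlparse,
                                  urlunparse)

    def destination_url(encoded_url):
        """Unquote the advertiser's url, then re-encode its own query
        string so the intermediate tracker's parameters survive the
        redirect."""
        url = unquote(encoded_url)
        parts = urlparse(url)
        if not parts.query:
            return url
        query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
        return urlunparse(parts._replace(query=query))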