Commit Graph

116 Commits

Author SHA1 Message Date
xiongchiamiov
0702bc4886 Image previews: fix old urls
We've changed the url structure of image previews a number of times, which
breaks everything uploaded prior to the latest version.  This script should
find all preview images that have been uploaded thus far, move them to the
appropriate place, and save an updated and correct storage url in every Link
that uses them.
2015-05-26 11:47:56 -07:00
Jordan Milne
644f0988d7 Add a shim for requests' response.json to ease upgrading to 2.x
See http://docs.python-requests.org/en/latest/api/#migrating-to-1-x
for the rationale.

`.json()` also differs from `.json` in that it `raise`s instead of
returning `None` on a decoding error, but that shouldn't affect us
anywhere.

Conflicts:
	r2/r2/lib/media.py
2015-05-15 13:28:44 -07:00
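A shim of this kind might look roughly like the sketch below. It is not the code from the commit: `response_json` is a hypothetical helper name, and returning `None` on a decoding error is an assumption made here to mirror the old `.json` property behaviour described above.

```python
def response_json(response):
    """Return the decoded JSON body of a requests-style response.

    Hypothetical helper: copes with `response.json` being either a
    property (old requests) or a method (requests 1.x and later).
    """
    attr = getattr(response, "json", None)
    if callable(attr):
        try:
            # Newer requests: .json() raises a ValueError subclass on bad input.
            return attr()
        except ValueError:
            # Assumption: keep the old "None on decoding error" behaviour.
            return None
    # Older requests: .json was a property that was already decoded (or None).
    return attr
```

Callers would use `response_json(resp)` everywhere instead of touching `resp.json` directly, so the installed requests version stops mattering.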
Brian Simpson
91699b7acd Add backfill script for CommentScoresByLink. 2015-05-12 08:44:11 -04:00
umbrae
03bc77b203 Beta mode: Add preference and subreddit callouts 2015-05-07 10:57:47 -07:00
MelissaCole
f05603ad98 Add backfill script for num_gildings
This script will update Account.num_gildings for all gildings in the gold_table
(that is, rows in the gold table where trans_id is like 'X%').
2015-05-06 11:10:39 -07:00
xiongchiamiov
6ef290a1cf Userpage: fix top listings for comments
If you go to a userpage and sort by top (in either the overview or comments
tabs), and restrict the time range to anything other than "all time", no
comments will be shown.

The data in these listings is built from functions in `lib/db/queries.py`
(specifically from `get_comments()` down).  This ends up trying to pull the
query results from permacache (in `CachedResults.fetch_multi()`), defaulting to
an empty list if no cache entry is found.

Now, the cache entry is supposed to be populated periodically by a cronjob that
calls `scripts/compute_time_listings`.  This script (and its Python helpers in
`lib/mr_top.py` and `lib/mr_tools/`) generates a dump of data from PostgreSQL,
then reads through that and builds up entries to insert into the cache.  As
with many scripts of this sort, it expects to get in some bad data, and so
performs some basic sanity checks.

The problem is that the sanity checks have been throwing out all comments.
With no new comments, there's nothing new to put into the cache!

The root of this was a refactoring in reddit/reddit@3511b08 that combined
several different scripts that were doing similar things.  Unfortunately, we
ended up requiring the `url` field on comments, which doesn't exist because,
well, comments aren't links.

Now we have two sets of fields that we expect to get, one for comments and one
for links, and all is good.

We also now have a one-line summary of processed/skipped entries printed out,
which will help to make a problem like this more obvious in the future.
2015-04-30 15:53:33 -07:00
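The fix described above (separate required-field sets per thing type, plus a one-line summary) can be sketched as follows. The field names and the `filter_things` function are illustrative assumptions, not the names used in `lib/mr_top.py`.

```python
# Hypothetical required fields; the real script's field names may differ.
REQUIRED_FIELDS = {
    "link": {"author_id", "sr_id", "url"},
    "comment": {"author_id", "sr_id"},  # comments have no `url`
}


def filter_things(kind, things):
    """Keep only things that carry every field required for their kind.

    Returns the kept things and a one-line processed/skipped summary.
    """
    required = REQUIRED_FIELDS[kind]
    kept = []
    skipped = 0
    for thing in things:
        if required.issubset(thing):
            kept.append(thing)
        else:
            skipped += 1
    summary = "%s: processed %d, skipped %d" % (kind, len(kept), skipped)
    return kept, summary
```

With a single shared field set that included `url`, every comment would land in the skipped count, which is the failure mode this commit describes; the summary line makes that visible at a glance.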
zeantsoi
7f1aa2a520 Command line script to add subreddit to collection 2015-04-15 17:04:46 -07:00
Chad Birch
0fbea80d45 Integrate AutoModerator into the site 2015-03-31 14:56:19 -06:00
xiongchiamiov
fb507e59a6 TryLater: use more generic parameter name
`mature_items` made sense in the original context, but now that I'm stealing it
for other uses it's really just some set of some sort of data.
2015-03-03 15:50:04 -08:00
Jordan Milne
28a913c242 Add backfill script for deleted user accounts 2015-03-03 14:26:22 -08:00
Neil Williams
274a8d7008 Overhaul populatedb script.
This new script attempts to generate some subreddits that are more like
production data.  It first pulls down data from reddit.com, then uses
markov chains to generate new data for insertion into the databases.
2015-03-02 14:44:57 -08:00
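For readers unfamiliar with the technique, a word-level Markov chain text generator can be sketched in a few lines. This is a generic illustration, not the populatedb implementation; `build_chain` and `generate` are hypothetical names.

```python
import random
from collections import defaultdict


def build_chain(texts):
    """Map each word to the list of words seen immediately after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words, rng=random):
    """Walk the chain from `start`, picking a random successor each step."""
    words = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)
```

Feeding it titles pulled from a real site yields new titles whose word-to-word transitions all occurred somewhere in the source data.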
Neil Williams
1605b48eda upload_static_files_to_s3: Improve logging clarity.
The goal is to make it seem less like the build is "hanging" during the
upload step.
2015-03-02 10:41:07 -08:00
Neil Williams
30e2256fdd upload_static_files_to_s3: Remove vestigial support for gzipped statics.
This feature was never really used and the core support for it was torn
out in 68857e1a7d.
2015-03-02 10:41:07 -08:00
Neil Williams
64a469e64e upload_static_files_to_s3: Rework to use command line arguments.
* configuration now comes from the command line so it's easier to use
  for multiple projects.
* the bucket is now an s3 url allowing a path prefix to be added to
  files.
* authentication now comes from boto's credential system which allows
  us to use IAM roles.
2015-03-02 10:41:07 -08:00
Neil Williams
4090acd8d8 Remove unused static file cleaner.
It never really worked right and is getting in the way now.
2015-03-02 10:41:07 -08:00
Neil Williams
f501106d9c zookeeper: gzip live config payload.
This takes our current config payload from 4700 bytes to 1700. The goal
is to reduce zookeeper network load during config changes as well as app
restarts during deploys.
2015-03-02 10:41:07 -08:00
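The idea can be sketched as below, assuming the live config is serialized as JSON before being written to a ZooKeeper node. The serialization format and the `pack_config` / `unpack_config` names are assumptions made for illustration.

```python
import gzip
import json


def pack_config(config):
    """Serialize a config dict and gzip it before storing it in a node."""
    raw = json.dumps(config, sort_keys=True).encode("utf-8")
    return gzip.compress(raw)


def unpack_config(payload):
    """Reverse pack_config: gunzip the node data and parse the JSON."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Config payloads tend to be repetitive key/value text, which is why they compress well; every reader has to be switched to the decompressing path at the same time as the writer.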
Neil Williams
f00e90f0ed zookeeper: Remove obsolete LiveDict class.
This is no longer used since the relevant consumers have switched to
Cassandra as a backing store.
2015-03-02 10:41:07 -08:00
umbrae
0850bd3044 Tracker: URL decode session cookie 2015-02-20 23:58:51 -08:00
umbrae
ea5aa9c538 Tracker: add domain prefix to redirect domain 2015-02-20 21:49:46 -08:00
umbrae
76fb41a3f8 Tracker: add session tracking redirector
This adds two redirects, `event_click` and `event_redirect`: `event_click`
appends a user ID to an event before redirecting, if we require one,
and `event_redirect` services a local evented redirect, similar to ad clicks.

`event_click` is necessary for tracking clicks from users on embeds, which are
served via redditmedia, and therefore are always anonymous. When a user clicks
through, we want to know who they were and redirect them on their way. Because
of the way we're using nginx to store events as an access log right now, this
means we'll need to use two redirects: one to append the session ID and
another to store the event with the proper session ID.
2015-02-20 21:23:09 -08:00
John-William Trenholm
9e64e56960 tracker.py: use env var for configuration file
Upstart needs to use an environment variable to determine the
configuration file.
2015-02-12 16:00:04 -08:00
Jordan Milne
b1dbd3071d Handle allocation failures in C code gracefully
Thanks to Nathanael A. Hoyle for the report! Some of these may have
been exploitable due to pointer arithmetic before reads / writes.
Just bail out if we can't allocate.
2015-02-02 16:07:13 -08:00
Neil Williams
af09fa8dee Update license headers to 2015.
The highlight of each year for me.
2015-01-08 13:35:03 -08:00
Brian Simpson
5863fb6b8f pixel: Use user id36 rather than user name. 2015-01-06 04:01:12 -05:00
Brian Simpson
00e03edbfd Traffic processing: Validate clicks by checking response code.
Double checking in the click app and in the processing scripts was
difficult. Just trust the click app and assume any request that got
a 302 response is valid.
2014-12-12 17:00:29 -05:00
Brian Simpson
af04006baf click: unquote destination before unmangling query string. 2014-12-10 13:09:19 -08:00
Brian Simpson
966bb14675 Click redirect: fix encoding of destination url.
Some advertisers set their ad's url to an intermediate tracker so
they can independently track clicks. This results in a series of
redirects like this:

reddit tracker > intermediate tracker > final destination

The ad's url is communicated to the reddit tracker through a query
parameter which is urlencoded on reddit.com and then unquoted when
being handled by the reddit tracker. This unquoting causes problems
if there is an intermediate tracker with its own query string
that needs to be urlencoded. This commit adds handling for those query
strings.
2014-12-10 13:09:18 -08:00
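The encoding problem can be reproduced with the standard library alone. The sketch below uses Python 3's `urllib.parse` and made-up example URLs; the parameter names `url` and `dest` and both helper functions are assumptions, and this shows the failure mode rather than the fix applied in the commit.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit


def build_click_url(tracker_base, destination):
    """Embed the destination as a single urlencoded query parameter."""
    return tracker_base + "?" + urlencode({"url": destination})


def extract_destination(click_url):
    """parse_qs percent-decodes each value exactly once."""
    return parse_qs(urlsplit(click_url).query)["url"][0]


final = "https://example.com/landing?a=1&b=2"
# The intermediate tracker carries the final destination in its own query.
intermediate = "https://tracker.example/click?" + urlencode({"dest": final})
click_url = build_click_url("https://reddit-tracker.example/click", intermediate)

destination = extract_destination(click_url)  # still correctly encoded
over_decoded = unquote(destination)           # one unquote too many

# In over_decoded the inner "&b=2" has escaped into the intermediate
# tracker's query string, so `dest` is truncated to ".../landing?a=1".
```

Decoding exactly once per hop keeps the inner query string intact; any extra unquote moves parameters from the final destination up into the intermediate tracker's URL.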
Brian Simpson
a2e41ed4a6 click: Don't unquote destination url.
The url is already unquoted correctly and double unquoting can cause
problems with unicode characters.
2014-11-04 09:21:45 -05:00
Brian Simpson
78631fa746 Properly encode arguments for click tracker. 2014-10-09 05:55:50 -04:00
Brian Simpson
4bd2eb6ab8 Remove support for old click and impression hashes. 2014-10-09 05:55:50 -04:00
Brian Simpson
0a67287684 tracker: Delete unused adtracker_url. 2014-10-07 16:21:59 -04:00
Brian Simpson
4a9d7457bd tracker: Use constant_time_compare for hash check. 2014-10-07 16:21:54 -04:00
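A constant-time hash check avoids leaking, through response timing, how many leading characters of a supplied hash were correct. The sketch below uses the standard library's `hmac.compare_digest` as a stand-in for the `constant_time_compare` helper named in the commit; the HMAC-SHA1 construction and function names are assumptions, not the tracker's actual hash scheme.

```python
import hashlib
import hmac


def make_hash(secret, payload):
    """Hypothetical hash: hex HMAC-SHA1 of the payload (both bytes)."""
    return hmac.new(secret, payload, hashlib.sha1).hexdigest()


def verify_hash(secret, payload, supplied):
    """Compare in constant time instead of with `==`.

    `supplied` must be an ASCII str, as compare_digest requires.
    """
    expected = make_hash(secret, payload)
    return hmac.compare_digest(expected, supplied)
```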
Brian Simpson
5a012fb789 Support new click and impression hashes that don't include IP.
Support new and old style hashes in verify.c and the click redirect app,
but only generate old style hashes.
2014-10-07 16:21:26 -04:00
umbrae
ee5ea8ca3c Inbox_counts: corrections on deletes, spams, edits
Conflicts:

	r2/r2/controllers/listingcontroller.py
2014-10-03 11:46:48 -07:00
umbrae
aad94d3f80 Inbox counts: fix typo in backfill script 2014-10-03 11:46:45 -07:00
umbrae
09c98d6dde Inbox counts: Add dark unread counts badge, start writing to inbox_count 2014-10-03 11:46:34 -07:00
Brian Simpson
eb9f0ae0e3 PromotionWeights: speed up queries by using distinct.
The queries are used to find the ids of PromoCampaign or Link objects
and we don't need the many (one per campaign per subreddit target per day)
PromotionWeights objects.
2014-10-01 02:44:16 -04:00
umbrae
9361596d68 Register hooks on app load rather than inline 2014-08-23 00:09:05 -07:00
umbrae
c0bff7498b Support 'all' in compute_time_listings 2014-07-17 13:03:13 -07:00
Brian Simpson
c18dcac467 Add subreddit gildings backfill script. 2014-06-11 14:39:06 -04:00
Roger Ostrander
dc68b16776 Trylater: Enable temporary subreddit bans 2014-06-05 14:45:28 -07:00
Brian Simpson
20f57a17eb Add GeoIP service. 2014-05-28 12:57:10 -07:00
Neil Williams
90cfcaaecc Update license headers to 2014.
Ok, now I'm getting some angst in my commit messages like my
predecessors had.  I understand now.  It's a terrible burden.  Why must
the calendar progress?  Why must numbers increment?  The world is
forever turning.

The future is here.

It is 2014.
2014-05-02 16:26:31 -04:00
Brian Simpson
17fcb723fc fetch_trackers: allow up to 100 ids. 2014-04-01 21:41:39 -04:00
Neil Williams
d2ccc40733 Automatically delete password hashes of deleted accounts.
The password hash is no longer necessary once an account is deleted (and
after a period of time for safety in case it needs to be restored).
2014-02-26 12:45:55 -08:00
Roger Ostrander
89762c93f0 Add TryLater: a system for scheduling events. 2014-02-26 12:45:55 -08:00
Chad Birch
7b24dacd77 compute_time_listings MINID query: order by date 2014-02-26 11:44:08 -08:00
Neil Williams
3511b08110 Combine and generalize the time listing precomputer scripts.
Previously, the subreddit/domain and account precomputers were separate.
This merges the two and improves their portability in the process.
Because of the increased portability, the precomputer can now be added
to the install script by default.
2014-02-13 13:50:52 -08:00
Neil Williams
5e249f4773 Make all moderators have a modmsgtime attribute.
This attribute can serve as a handy indicator that a user is a moderator
somewhere and can therefore replace the more costly modship lookup in
reddit_base.
2014-01-12 10:08:07 -05:00
bsimpson63
2a05f17161 No intermediate storage step in mr_process_hour.pig.
Made possible by upgrading pig to 0.10.
2013-12-05 04:04:35 -05:00