In the Q&A sort type, we now collapse everything except:
* top-level posts
* OP posts
* the comments preceding OP posts
to help readers find OP interactions.
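As a rough illustration, the collapse rule amounts to a predicate like the one below. The names are hypothetical, and "the comments preceding OP posts" is read here as the direct parents of OP's replies; this is a sketch, not the actual implementation.
```
from collections import namedtuple

Comment = namedtuple("Comment", "id parent_id author_id")

def should_collapse(comment, op_id, op_parent_ids):
    # Keep top-level comments, OP's own comments, and the comments OP
    # replied to; collapse everything else under the Q&A sort.
    is_top_level = comment.parent_id is None
    is_by_op = comment.author_id == op_id
    is_parent_of_op_reply = comment.id in op_parent_ids
    return not (is_top_level or is_by_op or is_parent_of_op_reply)

# A reply to a non-OP comment, with no OP involvement, gets collapsed.
print(should_collapse(Comment("c3", "c2", "guest"), "op", {"c1"}))  # True
```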
prequeued_votes is meant to hide the fact that votes are processed
asynchronously when a user refreshes a page right after voting (i.e. the
vote arrow will remain colored even if the vote hasn't been fully
processed yet on the backend).
Now that LastModified for votes is written in-request rather than in the
queue, we can take advantage of this to reduce the number of items we
have to look up prequeued_votes for.
A new configurable, vote_queue_grace_period, specifies how much queue
lag we're willing to paper over. If the user hasn't voted within that
grace period, we just assume that the vote has been processed, skip
prequeued_votes lookups for it and go straight to Cassandra.
Additionally, if we are going to do lookups, we'll skip lookups for
items that were created since the last time the user voted, like we were
already doing at the DenormalizedRelation layer.
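A minimal sketch of the lookup decision, assuming hypothetical names for the user's last vote time, the item's creation time, and a grace period read from live config:
```
from datetime import datetime, timedelta

def needs_prequeued_lookup(last_vote_time, item_created, grace_seconds):
    # If the user hasn't voted within the grace period, assume any queued
    # vote has already been processed and read straight from Cassandra.
    now = datetime.utcnow()
    if last_vote_time is None or now - last_vote_time > timedelta(seconds=grace_seconds):
        return False
    # Items created after the user's last vote can't have a pending vote
    # from them (the shortcut the DenormalizedRelation layer already used).
    if item_created > last_vote_time:
        return False
    return True
```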
This will likely shift some reads to Cassandra so we should keep an eye
on that.
Even if it's opt-in, we want people to be able to easily unsubscribe from
notification emails.
Using an HMAC instead of a generated token means we don't have to store
anything extra; we just perform a calculation at email-send time and
again in the unsubscribe responder.
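For illustration, the stateless scheme boils down to recomputing the same MAC on both sides. The key name and message here are made up; the real secret would live in config.
```
import hashlib
import hmac

UNSUBSCRIBE_SECRET = b"unsubscribe-secret"  # hypothetical; real key lives in config

def unsubscribe_token(user_id):
    # Computed at email-send time and embedded in the unsubscribe link.
    return hmac.new(UNSUBSCRIBE_SECRET, str(user_id).encode(),
                    hashlib.sha256).hexdigest()

def verify_unsubscribe_token(user_id, token):
    # Recomputed in the unsubscribe responder; nothing needs to be stored.
    return hmac.compare_digest(unsubscribe_token(user_id), token)
```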
Despite our best efforts, we're probably still going to appear a bit spammy
with our notification emails. To help prevent this from affecting everything
else, we can send these from an alternate domain.
This allows setting (via live config) minimum age and karma requirements
to be able to create a subreddit. The age requirement and at least one
of the karma requirements must be met. A hook was added as well for
potential private-code use.
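Roughly, the check looks like the following sketch; the parameter names stand in for the live-config knobs and are not the real ones.
```
def can_create_subreddit(account_age_days, link_karma, comment_karma,
                         min_age_days, min_link_karma, min_comment_karma):
    # The age requirement and at least one of the karma requirements
    # (link or comment) must be met; a hook could override this decision
    # in private code.
    old_enough = account_age_days >= min_age_days
    enough_karma = (link_karma >= min_link_karma or
                    comment_karma >= min_comment_karma)
    return old_enough and enough_karma

# Old enough, and comment karma clears the bar even with no link karma.
print(can_create_subreddit(45, 0, 150, 30, 100, 100))  # True
```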
This allows the minimum amount of karma needed to be exempt from the
captcha to be modified via live config. In addition, it adds the ability
to set a comment karma minimum; previously, only link karma counted
toward the exemption.
A hook has also been added to the function for private-code purposes.
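The exemption check then becomes something like this (placeholder names for the live-config values):
```
def captcha_exempt(link_karma, comment_karma, min_link_karma, min_comment_karma):
    # Either karma type clearing its configured minimum now grants the
    # exemption; previously only link karma was considered.
    return link_karma >= min_link_karma or comment_karma >= min_comment_karma
```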
While in the process of rolling out comment embeds, we'd like to restrict
our beta a bit, because by their nature, once embeds are out, we lose control
over them, making it extremely difficult to make changes. So we're restricting
the embed generation modal to a certain subset of users (for now), but a savvy
user could simply modify an existing public embed to plug in another comment
id, which would defeat the point of restricting it. Enter HMAC.
We now generate a unique token for each comment, and only by using the
appropriate token will your embed work. This will be transparent to users, as
it's just another piece of the HTML that they copy and paste onto their website.
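This is the same HMAC pattern as above, keyed per comment id; that binding is what makes swapping the comment id in a copied embed fail. Key and names below are illustrative only.
```
import hashlib
import hmac

EMBED_SECRET = b"embed-secret"  # hypothetical; the real key lives in config

def embed_token(comment_id):
    # One token per embeddable comment, baked into the embed HTML.
    return hmac.new(EMBED_SECRET, comment_id.encode(), hashlib.sha256).hexdigest()

# A token copied from one comment's embed doesn't validate for another id.
token = embed_token("t1_abc123")
print(hmac.compare_digest(token, embed_token("t1_abc123")))  # True
print(hmac.compare_digest(token, embed_token("t1_def456")))  # False
```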
Performance-wise, we're generating tokens for every comment that can be
embedded. However, for now that's a limited set, and the operation is pretty
fast (roughly 5ms for 1000 tokens on my dev VM); if that becomes a problem, we
can easily take this code out after we no longer need the restriction.
I forgot how insanely brittle this bit of code is since I last tried
messing with it last year. At some point we might want to look into
deprecating everything but oauth., the lang subdomains, and www. To
hell with `www.ssl.circlejerk.json.reddit.com`.
This is the final step in the saga of the pg vote rel destruction. We've
been dual-writing to PG and C* while gaining confidence in the pure-C*
model being able to survive full load. This kills the pgvote databases
and moves forward in a pure-cassandra world for votes which should save
us considerable operational headaches. After rolling this out, we
cannot switch back without considerable effort.
When he reached the New World, Cortez burned his relational databases.
As a result his queue processors were well motivated.
The logic of this code contained a couple subtle errors that could cause
strange behavior. In reddit's current state of having two "automatic
subreddits" (which are always included in the front page set, and not
counted towards the limit), the fact that the automatic_ids list could
have an item removed while being iterated over meant that unsubscribing
from the first automatic subreddit (/r/blog) made it so that it was
effectively impossible to unsubscribe from the second one
(/r/announcements). If you unsubscribed, it would still be present in
your front page regardless, and if you stayed subscribed it would
actually be present twice.
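In isolation, the underlying Python pitfall looks like this; it's not the actual reddit code, just the failure mode and its usual fix.
```
def drop_ids(ids, to_drop):
    # Broken: removing from the list while iterating over it shifts the
    # next element into the removed slot, so the iterator skips it.
    for sr_id in ids:
        if sr_id in to_drop:
            ids.remove(sr_id)
    return ids

def drop_ids_fixed(ids, to_drop):
    # Fix: build a new list (or iterate over a copy) instead of mutating
    # the list being traversed.
    return [sr_id for sr_id in ids if sr_id not in to_drop]

print(drop_ids(["blog", "announcements"], {"blog", "announcements"}))
# ['announcements'] -- the second automatic subreddit was never examined
print(drop_ids_fixed(["blog", "announcements"], {"blog", "announcements"}))
# []
```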
The goal of a login ratelimit system is to prevent brute force attacks
on passwords.
The current login ratelimit system is based on VDelay, which uses
exponential backoff keyed on IP address after failed login attempts.
This is not ideal: corporate proxies and large-scale NAT (LSN) cause a
very high rate of false positives, resulting in users getting the
dreaded "you've been doing that too much".
This new system uses a factored out version of the core ratelimiting
system which uses fixed ratelimits per period (allowing some burstiness)
and is per-account. To help mitigate the effects of a denial of service
attack on a specific user, different ratelimit buckets are used
depending on whether or not the user has used the IP the login request
is coming from before.
As an escape hatch, successfully resetting an account's password adds
the current IP to that account's recent IPs allowing it into the safer
ratelimit bucket.
The ratelimit never applies if you are currently logged in as the user,
allowing account deletion to happen regardless of ongoing brute force /
denial of service attacks.
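A sketch of the bucket selection, with made-up names for the keys and account fields:
```
def login_ratelimit_bucket(account_name, recent_ips, request_ip, logged_in_as):
    # No ratelimit when already logged in as this account, so account
    # deletion works even during a brute-force / denial-of-service attack.
    if logged_in_as == account_name:
        return None
    # IPs the account has used before (including one added by a successful
    # password reset) land in the looser, "safer" bucket.
    if request_ip in recent_ips:
        return "login-known-ip:%s" % account_name
    return "login-unknown-ip:%s" % account_name
```
Each bucket then gets a fixed limit per period (allowing some burstiness) rather than exponential backoff.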
Since we have an HTTPS-capable CDN in front of our S3 static domains
now, it's far faster for clients to use the CDN on HTTPS as well rather
than going straight to (high-latency) S3.
This patch makes it so that we continue to store URLs with explicit HTTP
schemes but instead of conditionally converting to HTTPS, we render
protocol-relative URLs. This should be safe for systems using the
filesystem media provider as we've installed an SSL cert there all
along.
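The render-time conversion is essentially just scheme-stripping, sketched here with a hypothetical helper name:
```
def protocol_relative(stored_url):
    # URLs are still stored with an explicit http:// (or https://) scheme;
    # at render time we drop the scheme so the browser reuses the page's.
    for scheme in ("http:", "https:"):
        if stored_url.startswith(scheme + "//"):
            return stored_url[len(scheme):]
    return stored_url

print(protocol_relative("http://static.example.com/thumb.png"))
# //static.example.com/thumb.png
```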
Since the introduction of the media providers and the default
installation of the filesystem media provider, it's no longer necessary
for local / non-AWS installs to use dynamically served stylesheets.
This patch removes that option to reduce complexity in the stylesheet
flows.
reddit uses Google Analytics[0] as a tool to track events on the reddit.com
website, which allows for gathering page load and user event data while
keeping users anonymized. However, with the high volume[1] of traffic
that reddit receives, the data collection limit[2] is often surpassed by
a large margin, even with a premium account.
Wikipedia states[3] "... sampling is concerned with the selection of a
subset of individuals from within a statistical population to estimate
characteristics of the whole population." We can, using this principle,
send a small portion of user events to Google Analytics collection
endpoints rather than sending the entire data set, and achieve a
reasonable approximation of global user behavior without exceeding the
data usage limits defined by Google Analytics.
In order to achieve this, the Google Analytics javascript library
provides a method to set a sampling rate[4], a percentage from 1-100.
By calling:
```
_gaq.push(['_setSampleRate', '80']);
```
one can set the sample rate so that 80% of users are tracked. In
reddit's case, I suggest a default sampling rate of 50%. Here, I have
added the `_setSampleRate` setting to the `_gaq` object created within
`utils.html`. It gets its value from the config, which allows for easy
value changes and avoids using a 'magic value' set in multiple places
in the code.
[0] - https://www.reddit.com/help/privacypolicy#p_22
[1] - https://www.reddit.com/r/AskReddit/about/traffic
[2] - https://support.google.com/analytics/answer/1070983?hl=en
[3] - http://en.wikipedia.org/wiki/Sampling_(statistics)
[4] - https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
For some app pools that are selected based on the incoming request
source, such as whoalane, we may want to apply the ratelimit to ALL
kinds of requests to ensure that resources are being used fairly. This
adds a strict enforcement mode which can be enabled in the config. OAuth
will continue to be enforced per-client ID, but all other requests will
get the sitewide ratelimit.
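For illustration, the key selection under strict mode might look like the sketch below; the flag and key names are made up, and the non-strict branch is an assumption.
```
def ratelimit_key(oauth_client_id, strict_mode):
    # OAuth requests keep their per-client-ID limit; with strict mode on,
    # everything else shares the sitewide bucket for the app pool.
    if oauth_client_id is not None:
        return "client:%s" % oauth_client_id
    if strict_mode:
        return "sitewide"
    return None  # assumption: non-strict pools don't limit other requests here
```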
Right now we only give HSTS grants when the user is on g.domain
so we can easily revoke the grant. We also track changes to the
forced HTTPS pref across sessions and modify the user's session
cookies as needed.