queries: Move /r/all/comments to new query cache (try 3).

The first time we tried to move /r/all/comments to the new query cache,
the row quickly grew to be massive because tombstones were piling up in
the row cache. The row started taking seconds to retrieve after only 12
hours.

We reverted.

We took gc_grace_seconds down to 30 minutes, which is relatively safe in
the query cache (prunes can be re-executed without issue, and lost
deletes of non-pruned things will be covered by keep_fns).
Additionally, we switched the relevant column families to the leveled
compaction strategy.
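
As a rough sketch (not part of this diff), that kind of schema change
can be made with pycassa's SystemManager; the server, keyspace, and
column family names below are placeholders rather than the real ones:

    from pycassa.system_manager import SystemManager

    sys_mgr = SystemManager('localhost:9160')
    # placeholder keyspace/CF names, illustrative values only
    sys_mgr.alter_column_family(
        'reddit', 'SubredditQueryCache',
        gc_grace_seconds=30 * 60,  # 30 minutes, down from the 10-day default
        compaction_strategy='LeveledCompactionStrategy',
    )
    sys_mgr.close()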

Then we tried this again. This time, things ran fine for three days
before we started seeing out-of-memory issues on the nodes responsible
for this key.  The row size was rather large again.

We reverted.

Now, we're trying again with three more changes, working on the
hypothesis that runaway growth of a hot query can happen because prunes
start failing after a small bad spike.

* tdb_cassandra.max_column_count has been drastically reduced in favor
  of xget for the models that actually need to fetch hugely wide rows.
  This reduces memory pressure from materialized thrift buffers in
  general, and especially when the row grows large for whatever reason
  (see the first sketch after this list).

* The pruning behaviour has been tweaked to only prune a portion of the
  extraneous columns when there are a large number of them (see the
  second sketch after this list).  This should reduce the likelihood
  that prunes will fail after a row has grown too large.

* This query is now in its own column family, which is designed to have
  its row cache disabled.
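
To illustrate the first change above, here is a hypothetical contrast
between a bounded get (capped by max_column_count) and pycassa's
streaming xget; cf, row_key, and handle() are stand-ins, not code from
this commit:

    # bounded fetch: materializes up to column_count thrift columns at once
    columns = cf.get(row_key, column_count=100)

    # streaming fetch: iterates the row in buffer_size chunks, so a very
    # wide row never has to be held in memory all at once
    for name, value in cf.xget(row_key, buffer_size=1000):
        handle(name, value)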
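
And a minimal sketch of the partial-pruning idea from the second change;
the cap and helper name are made up for illustration:

    MAX_PRUNE = 1000  # hypothetical per-pass cap

    def prune_extraneous(mutator, query, extraneous_columns):
        # delete only a bounded slice per pass so a single prune never
        # has to tombstone a huge number of columns after a spike
        doomed = extraneous_columns[:MAX_PRUNE]
        if doomed:
            mutator.delete(query, doomed)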

Why bother shoehorning this query into this data model, you say? It's a
canary for extreme scaling of other queries.  If we can't fix this
problem for this query, we should re-evaluate the whole data model.
Neil Williams
2012-10-10 16:53:59 -07:00
parent 2d2a8d887b
commit 13497f6f08
2 changed files with 24 additions and 8 deletions


@@ -33,7 +33,8 @@ from r2.models.promo import PROMOTE_STATUS, get_promote_srid
 from r2.models.query_cache import (cached_query, merged_cached_query,
                                    CachedQuery, CachedQueryMutator,
                                    MergedCachedQuery)
-from r2.models.query_cache import UserQueryCache, SubredditQueryCache
+from r2.models.query_cache import (UserQueryCache, SubredditQueryCache,
+                                   HotQueryCache)
 from r2.models.query_cache import ThingTupleComparator
 from r2.models.last_modified import LastModified
 from r2.lib.utils import SimpleSillyStub
@@ -469,10 +470,10 @@ def user_query(kind, user_id, sort, time):
     q._filter(db_times[time])
     return make_results(q)
 
+@cached_query(HotQueryCache)
 def get_all_comments():
     """the master /comments page"""
-    q = Comment._query(sort = desc('_date'))
-    return make_results(q)
+    return Comment._query(sort=desc('_date'))
 
 def get_sr_comments(sr):
     return _get_sr_comments(sr._id)
@@ -839,7 +840,7 @@ def new_comment(comment, inbox_rels):
         if comment._deleted:
             job_key = "delete_items"
             job.append(get_sr_comments(sr))
-            job.append(get_all_comments())
+            m.delete(get_all_comments(), [comment])
         else:
             job_key = "insert_items"
             if comment._spam:
@@ -1163,6 +1164,7 @@ def _common_del_ban(things):
 def unban(things, insert=True):
+    query_cache_inserts = []
     query_cache_deletes = []
 
     by_srid, srs = _by_srid(things)
@@ -1200,8 +1202,8 @@ def unban(things, insert=True):
             query_cache_deletes.append([get_spam_links(sr), links])
         if insert and comments:
-            add_queries([get_all_comments(), get_sr_comments(sr)],
-                        insert_items=comments)
+            query_cache_inserts.append((get_all_comments(), comments))
+            add_queries([get_sr_comments(sr)], insert_items=comments)
             query_cache_deletes.append([get_spam_comments(sr), comments])
         if links:
@@ -1212,6 +1214,9 @@ def unban(things, insert=True):
             query_cache_deletes.append([get_spam_filtered_comments(sr), comments])
 
     with CachedQueryMutator() as m:
+        for q, inserts in query_cache_inserts:
+            m.insert(q, inserts)
+
         for q, deletes in query_cache_deletes:
             m.delete(q, deletes)
@@ -1327,8 +1332,8 @@ def run_new_comments(limit=1000):
         fnames = [msg.body for msg in msgs]
         comments = Comment._by_fullname(fnames, data=True, return_dict=False)
 
-        add_queries([get_all_comments()],
-                    insert_items=comments)
+        with CachedQueryMutator() as m:
+            m.insert(get_all_comments(), comments)
 
         bysrid = _by_srid(comments, False)
         for srid, sr_comments in bysrid.iteritems():


@@ -554,3 +554,14 @@ class UserQueryCache(_BaseQueryCache):
 class SubredditQueryCache(_BaseQueryCache):
     """A query cache column family for subreddit-keyed queries."""
     _use_db = True
+
+
+class HotQueryCache(_BaseQueryCache):
+    """A query cache for very hot single-key queries.
+
+    Some queries such as all_comments appear to cause rowcache related issues
+    due to the high volume of writes happening to the single row. This column
+    family is intended to house such queries. The row cache should be disabled
+    here to prevent these issues.
+    """
+    _use_db = True