feat: move status scaling research from hackmd

This commit is contained in:
rymnc
2023-03-21 13:14:12 +05:30
parent ad620bb647
commit b2f4d72859
5 changed files with 830 additions and 0 deletions


@@ -0,0 +1,55 @@
# Community DoS protection with Semaphore
## Background
This proposal takes inspiration from [Alvaro's DoS protection scheme, i.e. splitting app-level and network-level validation](https://github.com/vacp2p/research/issues/164#issuecomment-1418967178).

waku-rln-relay[^1][^2], in its current state, is not production-ready. There are open problems being worked on, including, but not limited to, robustness[^3], circuit security[^4], and performance[^5].
After these open problems are addressed, we may look at using RLN for community DoS protection.

RLN is based on Semaphore[^6], with the added functionality of slashing. However, slashing is rarely required in the event of spamming from a *trusted* set of peers.

> *trusted* implies that the peer has been verified (out of band) before being added to a set of trusted peers.

Semaphore's latest version[^7] has been audited and can be considered safe to use in production.
## Assumptions
- There is no need to slash a community member spamming messages
- We simply route the messages if the member belongs to the Semaphore group, and drop them if not.
## Optional
An optional prerequisite for this solution is to use the Waku Message UID (MUID)[^8], and to update how the `COMMUNITY_DESCRIPTION` message is disseminated.
- Currently, the `COMMUNITY_DESCRIPTION` message is sent at fixed intervals *and* whenever the metadata of the community has changed
- When a new member is added to the community, their public key is appended to a list of members' public keys, and broadcasted with the `COMMUNITY_DESCRIPTION` message
This change proposes that the `COMMUNITY_DESCRIPTION` broadcast at fixed intervals merely carries a reference to the change in the community metadata that was broadcast when the change happened, i.e. the MUID of the message that includes the change.
Sending the MUID reduces message sizes, and can leverage Waku Store lookups for old messages.
## Working
This method requires all members to keep a local copy of a sparse Merkle tree containing the identity commitments (sent out of band to the community owner) of the members of the community. Having the MUID of the latest tree state broadcast by the community owner is very useful here.
Whenever a member wishes to send a message, they attach a Semaphore proof to it (increasing the message size by *TODO*).
Messages with no proof attached, or with an invalid proof, can be dropped, thereby reducing their propagation through the network.
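The relay-side gate described above can be sketched as follows. The message fields and the verifier stub are assumptions, not the actual Status/Waku payload format; a real deployment would run the actual Semaphore zk-SNARK verifier in place of the placeholder check:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical message shape; field names here are illustrative only.
@dataclass
class Message:
    payload: bytes
    proof: Optional[bytes] = None         # serialized Semaphore proof, if attached
    claimed_root: Optional[bytes] = None  # tree root the proof was generated against

def verify_semaphore_proof(proof: bytes, claimed_root: bytes,
                           local_root: bytes) -> bool:
    # Placeholder: a real implementation would run the Semaphore verifier.
    # Here we only check the proof is non-empty and was generated against
    # our local copy of the membership tree root.
    return len(proof) > 0 and claimed_root == local_root

def should_relay(msg: Message, local_root: bytes) -> bool:
    """Drop messages without a proof, or whose proof fails verification."""
    if msg.proof is None or msg.claimed_root is None:
        return False
    return verify_semaphore_proof(msg.proof, msg.claimed_root, local_root)

root = hashlib.sha256(b"member-commitments").digest()
assert should_relay(Message(b"gm", b"\x01" * 128, root), root)
assert not should_relay(Message(b"spam"), root)
```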
## Conclusion
RLN- and Semaphore-based DoS protection can be used in tandem: RLN for channels with slow mode enabled, and Semaphore for all other channels.
## References
[^1]: https://rfc.vac.dev/spec/17/
[^2]: https://rfc.vac.dev/spec/32/
[^3]: https://github.com/waku-org/nwaku/issues/1501
[^4]: https://github.com/Rate-Limiting-Nullifier/rln-circuits/pull/7
[^5]: https://github.com/waku-org/nwaku/issues/1501
[^6]: https://semaphore.appliedzkp.org/
[^7]: https://github.com/semaphore-protocol/semaphore/releases/tag/v3.0.0
[^8]: https://github.com/waku-org/pm/issues/9


@@ -0,0 +1,48 @@
# Napkin math for Community description message sizes
1. Number of chats/channels: 25
2. Strings: 32 bytes (ens_name, display_name, magnet_uri, etc)
3. All keys: 32 bytes
4. Each member has a profile image
5. Each member has socials
6. Each member is granted access to all chats
Fixed overhead:
organization_id + clock + name + description + magnet_uri + permissions + chats + organization_identity + encryption_key + (ChatIdentity fields) ~ 728 bytes
Variable overhead:
number_of_members * ((1 grant * number_of_chats) + 1 image + 1 social link)
# Calculations
- 100 members
    - variable overhead: 104000
    - total size: 104728 bytes = 104.73 kb
- 1000 members
    - variable overhead: 1040000
    - total size: 1040728 bytes = 1.04 mb
- 10000 members
    - variable overhead: 10400000
    - total size: 10400728 bytes = 10.4 mb
After examining telemetry:
- 139 members (math)
    - variable overhead: 144560
    - total size: 145288 bytes = 145.29 kb
- 139 members (actual)
    - variable overhead: ?
    - total size: 346049 bytes = 346 kb
At approximately 401 members, we will cross the configured maximum message size (1mb). Based on the observed per-member size, the napkin math for 1000 and 10000 members respectively is:
- 1000 members
    - total size: 2.489 mb
- 10000 members
    - total size: 24.89 mb
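The pre-telemetry arithmetic above can be reproduced with a short script. The 1040 bytes-per-member constant is derived from the variable-overhead figures (25 chat grants + 1 image + 1 social link per member, per the assumptions listed above):

```python
# Reproduces the napkin math above.
FIXED_OVERHEAD = 728   # organization_id + clock + name + ... (bytes)
PER_MEMBER = 1040      # variable overhead per member, from the figures above

def community_description_size(members: int) -> int:
    return FIXED_OVERHEAD + members * PER_MEMBER

for n in (100, 1000, 10000):
    print(n, community_description_size(n), "bytes")
```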


@@ -0,0 +1,81 @@
# Optimizing the `CommunityDescription` dissemination
## Context
This document describes a solution for using Sparse Merkle Trees (SMT) on IPFS to distribute members' public keys in an organization. The solution allows organizations/communities to efficiently manage and verify the membership of their members in a trustless manner.
This is done to reduce the network overhead of broadcasting the `CommunityDescription` message, which grows with the number of members in an organization/community.
## Method
The proposed solution is to use Sparse Merkle Trees (SMT) on IPFS to distribute members' public keys.
The SMT is constructed with a set of leaf nodes, where each leaf node represents a public key of a member. The SMT can be updated by adding or removing leaf nodes as members are added or removed from the organization. The SMT is then recalculated to generate a new root hash, which is used to identify the SMT on the IPFS network.
The SMT can be stored on IPFS by adding the root hash to the IPFS network. The root hash can then be shared with the members of the organization, so they can retrieve the SMT from IPFS.
When a member wants to verify the membership of another member, they can use the SMT's proof mechanism to verify the presence of the member's public key in the SMT.
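This proof mechanism can be sketched with a plain Merkle proof; this is a simplification of the sparse Merkle tree described above, and the helper names are illustrative:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_membership(leaf: bytes, proof, root: bytes) -> bool:
    """Walk a Merkle proof from a member's public key up to the root.

    `proof` is a list of (sibling_hash, sibling_is_left) pairs. This is a
    plain Merkle proof sketch, not a full sparse-Merkle-tree implementation.
    """
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Tiny two-member tree: root = H(H(alice_pk) || H(bob_pk))
alice_pk, bob_pk = b"alice_pk", b"bob_pk"
root = h(h(alice_pk) + h(bob_pk))
assert verify_membership(alice_pk, [(h(bob_pk), False)], root)
assert not verify_membership(b"mallory_pk", [(h(bob_pk), False)], root)
```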
Therefore, the `CommunityDescription` protobuf changes from -
```protobuf
message CommunityDescription {
  uint64 clock = 1;
  repeated bytes members = 2;
  OrganisationPermissions permissions = 3;
  ChatMessageIdentity identity = 5;
  repeated OrganisationChat chats = 6;
  // ... other fields
}
```
to -
```diff
message CommunityDescription {
  uint64 clock = 1;
-  repeated bytes members = 2;
+  bytes members = 2; // Note: we should be able to change repeated bytes to bytes, as the wire type of both is the same (type 2: length-delimited) (I may be wrong here)
  OrganisationPermissions permissions = 3;
  ChatMessageIdentity identity = 5;
  repeated OrganisationChat chats = 6;
}
```
> Note: I have yet to explore the viability of this solution for the other `repeated` field which may hold large amounts of data (chats)
## Napkin Math
- 100 members
    - 100 leaf nodes
    - 7 levels
    - Max 128 nodes
    - Storage required: 100 * 32 = 3,200 bytes
- 1000 members
    - 1000 leaf nodes
    - 10 levels
    - Max 1024 nodes
    - Storage required: 1000 * 32 = 32,000 bytes = 32 kb
- 10,000 members
    - 10,000 leaf nodes
    - 14 levels
    - Max 16384 nodes
    - Storage required: 10000 * 32 = 320,000 bytes = 320 kb
- 100,000 members
    - 100,000 leaf nodes
    - 17 levels
    - Max 131072 nodes
    - Storage required: 100000 * 32 = 3,200,000 bytes = 3.2 mb
The storage required is relatively small, and membership can be verified easily by the nodes.
The size of the `CommunityDescription` then remains constant regardless of the number of members in the community.
> I have not verified the integrity of this math, please help!
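As a partial check of the math above, a short script reproducing the level and storage figures, assuming a binary tree with one 32-byte commitment per leaf:

```python
import math

def tree_levels(members: int) -> int:
    # depth needed so that 2^levels >= members
    return math.ceil(math.log2(members))

def leaf_storage_bytes(members: int) -> int:
    return members * 32  # one 32-byte commitment per leaf

for n in (100, 1000, 10000, 100000):
    print(f"{n} members: {tree_levels(n)} levels, "
          f"max {2 ** tree_levels(n)} nodes, {leaf_storage_bytes(n)} bytes")
```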
## Security considerations
- Anyone can update the tree, but the owner distributes the `CommunityDescription`; hence, the owner becomes a single point of failure. If the owner node is compromised, an arbitrary CID can be distributed, leading community members to believe that there is a different set of members
- This can be mitigated by a few members keeping the member tree in memory and computing the root hash themselves. If the locally computed CID matches the CID distributed by the owner, the members can verify that the computation was done correctly.
## Future Work
- The storage can be done on a variety of platforms which have support for content addressable storage (Codex?)


@@ -0,0 +1,94 @@
# Waku pubsub topic sharding
## Context
The following document provides an overview of the Waku pubsub topic sharding method, which is based on the [35/WAKU2-NOISE](https://rfc.vac.dev/spec/35/) and [23/WAKU2-TOPICS](https://rfc.vac.dev/spec/23/#23waku2-topics) RFCs.
## Method
The Waku pubsub topic sharding method is based on the use of a shared secret key derived from a Diffie-Hellman key exchange, and a deterministic hash function.
The method is described as follows:
1. The two parties, Alice and Bob, establish a shared secret key using a Diffie-Hellman key exchange.
2. The shared secret key is used as an input to a deterministic hash function, such as SHA256, to generate a new topic for the next message.
$$
pubsub\_topic = sha256(shared\_sk \bmod privacy\_parameter)
$$
3. For each subsequent message, the shared secret key (which has been recomputed) is used as the input to the hash function again to generate a new topic.
4. (optional) To ensure the topic is unique, a nonce is concatenated to the shared secret key before hashing.
$$
pubsub\_topic = sha256((shared\_sk \bmod privacy\_parameter) \frown nonce)
$$
5. (optional) To ensure the topic is different for each message, a counter is concatenated to the shared secret key before hashing.
$$
pubsub\_topic = sha256((shared\_sk \frown ctr) \bmod privacy\_parameter)
$$
6. (optional) To reduce computational overhead of calculating a new pubsub topic for each message, a time window can be agreed upon between Alice and Bob.
For example, a 24hr window can be negotiated if it is within the constraints that both Alice and Bob agree to.
Therefore, the pubsub topic is calculated in the following manner
$$
pubsub\_topic = sha256((shared\_sk \frown timestamp) \bmod privacy\_parameter)
$$
Note that this approach would require both Alice and Bob to keep track of the shared secret key used at the start of the negotiation to derive the next pubsub topic.
This cache can be erased when the negotiation process restarts at the end of the window.
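A minimal sketch of the derivation in step 5; the byte encodings (big-endian integers, an 8-byte counter) are assumptions that Alice and Bob would need to agree on:

```python
import hashlib

def derive_pubsub_topic(shared_sk: bytes, privacy_parameter: int, ctr: int) -> str:
    """Compute sha256((shared_sk || ctr) mod privacy_parameter).

    The key bytes are interpreted as a big-endian integer for the modular
    reduction; these encodings are illustrative assumptions.
    """
    combined = int.from_bytes(shared_sk + ctr.to_bytes(8, "big"), "big")
    reduced = combined % privacy_parameter
    return hashlib.sha256(reduced.to_bytes(32, "big")).hexdigest()

# Both parties derive the same topic from the same shared secret and counter
sk = hashlib.sha256(b"ephemeral-dh-output").digest()
assert derive_pubsub_topic(sk, 2**16, 1) == derive_pubsub_topic(sk, 2**16, 1)
```

A smaller `privacy_parameter` shrinks the space of possible `reduced` values, so more conversations collide onto the same topic, which is exactly the k-anonymity trade-off discussed below.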
The `privacy_parameter` can be set by the peers, and must be agreed on beforehand between Alice and Bob.
The lower the `privacy_parameter`, the smaller the set of possible values for the pubsub topic, which offers k-anonymity-based privacy benefits.
The higher the `privacy_parameter`, the larger the set of possible values for the pubsub topic; performance increases at the cost of privacy.
It's important to note that the shared secret key changes with each message sent, as per Noise processing rules.
This allows updates to the shared secret key by hashing the result of an ephemeral-ephemeral Diffie-Hellman exchange every 1-RTT communication. Therefore, even if privacy guarantees are lost with one message, they can be regained with the next, if Alice signals to Bob (or vice versa) to change the `privacy_parameter`.
One can argue that this approach may lead to failure in forming a mesh, but it rests on the assumptions that:
- the number of community-run nodes is high, and they participate in meshing
- peers are incentivised to relay messages on a pubsub topic (incentivisation is yet to be solved)

If either of these assumptions is invalid, the peers can resort to using a lower `privacy_parameter`, which would result in a commonly used pubsub topic, and piggyback on other peers that have a vested interest in that topic.
## Security Considerations
The Waku pubsub topic sharding method has several potential security considerations that must be taken into account, including:
1. The security of the shared secret key depends on the security of the Diffie-Hellman key exchange, which can be vulnerable to various attacks if not implemented correctly.
2. The privacy benefits of k-anonymity depend on the value of the privacy parameter. A low value of the privacy parameter may provide stronger privacy guarantees, but at the cost of lower performance. A high value of the privacy parameter may provide better performance, but at the cost of weaker privacy guarantees.
3. An attacker who is able to obtain the shared secret key or the privacy parameter may be able to determine the topic of a message and potentially intercept it.
4. The technique of updating the shared secret key with each message can be useful in protecting against eavesdropping, but it also requires additional computation and communication overhead.
5. The technique of updating the shared secret key with each message also requires a secure signaling mechanism to signal the privacy parameter between Alice and Bob.
Overall, it is important to carefully consider the trade-offs between performance and privacy when implementing the Waku pubsub topic sharding method, and to ensure that the key exchange and signaling mechanisms are implemented securely.
## Future research
- Peer incentivisation
- Check if this model can apply to [5/SECURE-TRANSPORT](https://specs.status.im/spec/5)'s cryptography
## Appendix A - Multicast group chats
In the context of Status Communities, where single ratchet encryption is used, the nonce used could be the channel name, thereby sharding communities at the channel level.
$$
pubsub\_topic = sha256((ratchet\_key \frown channel\_name) \bmod privacy\_parameter)
$$
One benefit of this method is that all community participants are aware of it, and can derive the set of pubsub_topics every time there is a key change (when a member is added to or removed from the community).
## Appendix B - Message reconciliation and ordering
A method for reconciling and ordering messages across different pubsub topics while preserving privacy is as follows:
The sender can include a unique, monotonically increasing sequence number for each message.
This allows the receiver to order the messages based on the sequence number, regardless of which pubsub topic the messages were sent on.
The fault-tolerant characteristic of the underlying store protocol is out of scope for this spec.
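A minimal sketch of this reconciliation, with illustrative field names:

```python
# Order received messages by the sender's monotonically increasing sequence
# number, regardless of the pubsub topic they arrived on.
received = [
    {"seq": 2, "topic": "shard-b", "body": "world"},
    {"seq": 1, "topic": "shard-a", "body": "hello"},
    {"seq": 3, "topic": "shard-a", "body": "!"},
]

ordered = sorted(received, key=lambda m: m["seq"])
assert [m["body"] for m in ordered] == ["hello", "world", "!"]
```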


@@ -0,0 +1,552 @@
# Status Telemetry Analysis
> All the following data comes from Status desktop clients running version `0.10.0rc`
> There are ~30 members of the status community running this version
## Methodology
The [Status superset](https://superset.infra.status.im/superset/dashboard/13/?native_filters_key=IaOrPqizciMolxZRbRhk0vAFWWeVt8jxeuTzyiawNKIXpuJRqn5E9KvSnvOoUgGa) runs upon the pg database that has all the telemetry data.
These queries were run directly on the database, with the following index added -
```sql
CREATE INDEX idx_receivedmessages_messagetype_messagesize_chatid
ON receivedmessages (messagetype, messagesize, chatid);
```
which speeds up queries on the sizes of each `messagetype` being broadcast
## Main query
`messagetypes` ordered by size, descending -
```sql
SELECT
messagetype,
AVG(messagesize)
FROM
receivedmessages
WHERE
messagesize > 0
GROUP BY
messagetype
ORDER BY
AVG(messagesize) DESC
LIMIT
30;
```
which yields
```
messagetype | avg
------------------------------------+-----------------------
MEMBERSHIP_UPDATE_MESSAGE | 67950.887645733805
COMMUNITY_DESCRIPTION | 48434.633391221031
UNKNOWN | 32869.262119978472
BACKUP | 29259.738297689599
COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 24488.108695652174
SYNC_PROFILE_PICTURE | 19550.000000000000
CONTACT_CODE_ADVERTISEMENT | 7421.5858845617061659
CHAT_IDENTITY | 6109.9454022988505747
CHAT_MESSAGE | 4670.8508814547473626
SYNC_INSTALLATION_COMMUNITY | 3597.6666666666666667
CONTACT_UPDATE | 1981.9735152487961477
EDIT_MESSAGE | 1616.5792079207920792
ACCEPT_CONTACT_REQUEST | 1337.4227272727272727
SYNC_CHAT_MESSAGES_READ | 1114.2974738675958188
SYNC_INSTALLATION_CONTACT | 1094.3000000000000000
RETRACT_CONTACT_REQUEST | 945.1025641025641026
SYNC_ACTIVITY_CENTER_READ | 842.8750000000000000
REQUEST_CONTACT_VERIFICATION | 773.0000000000000000
SYNC_CHAT_REMOVED | 754.0000000000000000
SYNC_CONTACT_REQUEST_DECISION | 746.5000000000000000
DELETE_MESSAGE | 735.8549450549450549
COMMUNITY_ARCHIVE_MAGNETLINK | 636.1341371514694800
EMOJI_REACTION | 606.1040987716219604
PIN_MESSAGE | 483.2560000000000000
PAIR_INSTALLATION | 459.7571428571428571
STATUS_UPDATE | 338.4547629848038104
PUSH_NOTIFICATION_QUERY | 137.0000000000000000
COMMUNITY_REQUEST_TO_JOIN | 125.1176470588235294
COMMUNITY_CANCEL_REQUEST_TO_JOIN | 122.0000000000000000
COMMUNITY_REQUEST_TO_LEAVE | 112.0000000000000000
```
Ignoring `UNKNOWN`, we will run queries for the top 5, i.e
- MEMBERSHIP_UPDATE_MESSAGE
- COMMUNITY_DESCRIPTION
- BACKUP
- COMMUNITY_REQUEST_TO_JOIN_RESPONSE
- SYNC_PROFILE_PICTURE
## Queries
### 1. `MEMBERSHIP_UPDATE_MESSAGE`
1. Window of `MEMBERSHIP_UPDATE_MESSAGE` sizes sent by different groups(?) -
```sql
SELECT
left(chatid, 20) as trunc_chat_id,
messagetype,
messagesize,
sentat
FROM
receivedmessages
WHERE
messagetype = 'MEMBERSHIP_UPDATE_MESSAGE'
AND messagesize > 0
ORDER BY sentat DESC -- to get the latest sizes
LIMIT
10;
```
which yields
```
trunc_chat_id | messagetype | messagesize | sentat
----------------------+---------------------------+-------------+------------
contact-discovery-13 | MEMBERSHIP_UPDATE_MESSAGE | 15769 | 1677850449
contact-discovery-13 | MEMBERSHIP_UPDATE_MESSAGE | 15502 | 1677850436
0x045e098c95c9639719 | MEMBERSHIP_UPDATE_MESSAGE | 45282 | 1677850426
contact-discovery-85 | MEMBERSHIP_UPDATE_MESSAGE | 45320 | 1677850397
0x0413cd82a2df9c6c4b | MEMBERSHIP_UPDATE_MESSAGE | 22780 | 1677850357
contact-discovery-13 | MEMBERSHIP_UPDATE_MESSAGE | 67856 | 1677850357
contact-discovery-38 | MEMBERSHIP_UPDATE_MESSAGE | 67856 | 1677850357
0x0413cd82a2df9c6c4b | MEMBERSHIP_UPDATE_MESSAGE | 22780 | 1677850356
contact-discovery-85 | MEMBERSHIP_UPDATE_MESSAGE | 45320 | 1677850356
contact-discovery-48 | MEMBERSHIP_UPDATE_MESSAGE | 67856 | 1677850356
```
2. Average `MEMBERSHIP_UPDATE_MESSAGE` size -
```sql
SELECT
AVG(messagesize)
FROM
receivedmessages
WHERE
messagetype = 'MEMBERSHIP_UPDATE_MESSAGE'
AND messagesize > 0;
```
which yields
```
avg
--------------------
67937.954692116552
```
~ 67kb
3. Checking broadcast frequency is irrelevant; this message is sent ad hoc, afaik
4. Number of `MEMBERSHIP_UPDATE_MESSAGE` in 1 day -
```sql
SELECT
COUNT(*)
FROM
receivedmessages
WHERE
messagetype = 'MEMBERSHIP_UPDATE_MESSAGE'
AND messagesize > 0
AND sentat BETWEEN 1677764270
AND 1677850670;
```
which yields
```
count
-------
2577
```
which amounts to 2577 * 67kb ≈ 172mb per day. Can someone from the Status team explain this message type's usage?
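As a sanity check on this figure, using the unrounded average from query 2:

```python
# Back-of-the-envelope daily bandwidth for MEMBERSHIP_UPDATE_MESSAGE,
# from the query results above.
avg_size_bytes = 67937    # average size, query 2
messages_per_day = 2577   # daily count, query 4

daily_bytes = avg_size_bytes * messages_per_day
print(round(daily_bytes / 1e6), "MB per day")  # ~175 MB with the unrounded average
```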
### 2. `COMMUNITY_DESCRIPTION`
1. Window of `COMMUNITY_DESCRIPTION` sizes sent by different communities -
```sql
SELECT
left(chatid, 20) as trunc_chat_id,
messagetype,
messagesize,
sentat
FROM
receivedmessages
WHERE
messagetype = 'COMMUNITY_DESCRIPTION'
AND messagesize >= 70000
ORDER BY sentat DESC -- to get the latest sizes
LIMIT
10;
```
which yields
```
trunc_chat_id | messagetype | messagesize | sentat
----------------------+-----------------------+-------------+------------
0x0269b18891d3b42ebd | COMMUNITY_DESCRIPTION | 87531 | 1676369752
0x0269b18891d3b42ebd | COMMUNITY_DESCRIPTION | 87531 | 1676442625
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676317357
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676313458
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676325158
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676321259
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676309857
0x03dcc6838078722b8c | COMMUNITY_DESCRIPTION | 318522 | 1676495674
0x0269b18891d3b42ebd | COMMUNITY_DESCRIPTION | 87531 | 1676446526
0x03c6552a70bc9d9407 | COMMUNITY_DESCRIPTION | 247176 | 1676496141
```
2. Average `COMMUNITY_DESCRIPTION` size sent by the Status Community
```sql
SELECT
AVG(messagesize)
FROM
receivedmessages
WHERE
messagetype = 'COMMUNITY_DESCRIPTION'
AND messagesize >= 70000 AND chatid = '0x03073514d4c14a7d10ae9fc9b0f05abc904d84166a6ac80add58bf6a3542a4e50a';
```
> note: the chatid can be derived using `{"method":"wakuext_joinedCommunities"}` in the node management tab. Thanks @rramos.eth!
which yields
```
avg
---------------------
346049.563314711359
```
~346kb, which is off from the [estimation](https://hackmd.io/Dru3ULQSS2-II2WwkWcosg?both) by ~200kb. This leads to a worse scenario, in which ~401 members result in a 1mb message size.
3. Median time difference between each broadcast of `COMMUNITY_DESCRIPTION` -
```sql
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP(
ORDER BY
diff
) as median
FROM
(
SELECT
sentat,
sentat - lag(sentat) over (
order by
sentat
) as diff
FROM
receivedmessages
WHERE
chatid = '0x03073514d4c14a7d10ae9fc9b0f05abc904d84166a6ac80add58bf6a3542a4e50a'
AND messagetype = 'COMMUNITY_DESCRIPTION'
ORDER BY
sentat DESC
LIMIT
1000
) q
WHERE
diff > 0;
```
which yields
```
median
--------
3632
```
The message is broadcasted ~ every hour
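The ~401-member crossover quoted above can be sanity-checked against these telemetry numbers; note that this linear extrapolation folds the fixed overhead into the per-member cost, so it is only an approximation:

```python
# Extrapolate the observed COMMUNITY_DESCRIPTION size linearly per member
# until the configured 1 MB message-size limit is hit.
observed_size = 346049   # bytes, for 139 members (from query 2)
members = 139

bytes_per_member = observed_size / members
limit = 1_000_000        # configured max message size
crossover = limit / bytes_per_member
print(int(crossover), "members before the limit is crossed")
```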
### 3. `BACKUP`
1. Window of `BACKUP` sizes sent by different peers
```sql
SELECT
left(chatid, 20) as trunc_chat_id,
messagetype,
messagesize,
sentat
FROM
receivedmessages
WHERE
messagetype = 'BACKUP'
AND messagesize > 0 ORDER BY sentat desc
LIMIT
10;
```
which yields
```
trunc_chat_id | messagetype | messagesize | sentat
----------------------+-------------+-------------+------------
contact-discovery-04 | BACKUP | 2264 | 1677848347
contact-discovery-04 | BACKUP | 608 | 1677848347
contact-discovery-04 | BACKUP | 119 | 1677848347
contact-discovery-04 | BACKUP | 738 | 1677848347
contact-discovery-04 | BACKUP | 109 | 1677848347
contact-discovery-04 | BACKUP | 109 | 1677848347
contact-discovery-04 | BACKUP | 109 | 1677848347
contact-discovery-04 | BACKUP | 112 | 1677848347
contact-discovery-04 | BACKUP | 612 | 1677848347
contact-discovery-04 | BACKUP | 9391 | 1677848347
```
2. Average size of the `BACKUP` message -
```sql
SELECT
messagetype,
AVG(messagesize)
FROM
(
SELECT
messagetype,
messagesize
FROM
receivedmessages
WHERE
messagetype = 'BACKUP'
AND messagesize > 0 -- desktop clients using non-0.10.0rc set this to 0
LIMIT
1000
) AS subq
GROUP BY
messagetype;
```
which yields
```
messagetype | avg
-------------+--------------------
BACKUP | 33749.652000000000
```
~ 33kb. With this result, the backup protocol seems fine for now without IPFS pinning or another backup mechanism. However, as the number of communities each person belongs to grows, this will not scale and will require changes.
3. Average time difference between each broadcast of `BACKUP`
> note: this query relies on a random chatid being used for backup; it should probably be refactored
> note: for some reason, some backup messages are broadcast at very short intervals to the same receiverkeyuid. This is assumed to be due to different types of backup messages being sent.
```sql
SELECT
AVG(diff)
FROM
(
SELECT
sentat,
sentat - lag(sentat) over (
order by
sentat
) as diff
FROM
receivedmessages
WHERE
chatid = 'contact-discovery-04f2e7b3394dcbf0d03bba954dbedc0bd561951a8c31a00320c6b99a40145e655da6e689d19ced6a314c6dde31ce96feb0166f19472c2a1edbeeb939e3282bce14'
AND messagetype = 'BACKUP' AND receiverkeyuid='0x45ab6ed9f2461720737f0f3095a40d3b0c2475fe5ea4c57bc8b9fe293afcffbc'
ORDER BY
sentat DESC
LIMIT
1000000
) q
WHERE
diff > 0;
```
which yields
```
avg
-----------------------
9205.8181818181818182
```
~ 2.5 hours
### 4. `COMMUNITY_REQUEST_TO_JOIN_RESPONSE`
1. Window of `COMMUNITY_REQUEST_TO_JOIN_RESPONSE` sizes -
```sql
SELECT
left(chatid, 20) as trunc_chat_id,
messagetype,
messagesize,
sentat
FROM
receivedmessages
WHERE
messagetype = 'COMMUNITY_REQUEST_TO_JOIN_RESPONSE'
AND messagesize > 0 ORDER BY sentat desc
LIMIT
10;
```
which yields
```
trunc_chat_id | messagetype | messagesize | sentat
----------------------+------------------------------------+-------------+------------
contact-discovery-35 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 828 | 1677185497
contact-discovery-28 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 36636 | 1677145598
contact-discovery-35 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 1281 | 1676917897
contact-discovery-38 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 677 | 1676892082
contact-discovery-38 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 958 | 1676887818
contact-discovery-38 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 677 | 1676887804
contact-discovery-28 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 36636 | 1676877610
contact-discovery-45 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 1110 | 1676877544
contact-discovery-28 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 36496 | 1676655568
contact-discovery-45 | COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 1110 | 1676655552
```
2. Average size of the `COMMUNITY_REQUEST_TO_JOIN_RESPONSE` message -
```sql
SELECT
messagetype,
AVG(messagesize)
FROM
(
SELECT
messagetype,
messagesize
FROM
receivedmessages
WHERE
messagetype = 'COMMUNITY_REQUEST_TO_JOIN_RESPONSE'
AND messagesize > 0 -- desktop clients using non-0.10.0rc set this to 0
LIMIT
1000
) AS subq
GROUP BY
messagetype;
```
which yields
```
messagetype | avg
------------------------------------+--------------------
COMMUNITY_REQUEST_TO_JOIN_RESPONSE | 24488.108695652174
```
~24kb
3. Checking broadcast frequency is irrelevant; this message is sent ad hoc, afaik
4. Number of `COMMUNITY_REQUEST_TO_JOIN_RESPONSE` in 1 day -
```sql
SELECT
COUNT(*)
FROM
receivedmessages
WHERE
messagetype = 'COMMUNITY_REQUEST_TO_JOIN_RESPONSE'
AND messagesize > 0
AND sentat BETWEEN 1677099097
AND 1677185497;
```
which yields
```
count
-------
2
```
### 5. `SYNC_PROFILE_PICTURE`
1. Window of `SYNC_PROFILE_PICTURE` sizes -
```sql
SELECT
left(chatid, 20) as trunc_chat_id,
messagetype,
messagesize,
sentat
FROM
receivedmessages
WHERE
messagetype = 'SYNC_PROFILE_PICTURE'
AND messagesize > 0 ORDER BY sentat desc
LIMIT
10;
```
which yields
```
trunc_chat_id | messagetype | messagesize | sentat
----------------------+----------------------+-------------+------------
0x0461f576da67dc0bca | SYNC_PROFILE_PICTURE | 19550 | 1677162569
0x0461f576da67dc0bca | SYNC_PROFILE_PICTURE | 19550 | 1677162327
0x0461f576da67dc0bca | SYNC_PROFILE_PICTURE | 19550 | 1677162099
0x0461f576da67dc0bca | SYNC_PROFILE_PICTURE | 19550 | 1677162038
0x0461f576da67dc0bca | SYNC_PROFILE_PICTURE | 19550 | 1677162005
```
2. Average size of the `SYNC_PROFILE_PICTURE` message -
```sql
SELECT
messagetype,
AVG(messagesize)
FROM
(
SELECT
messagetype,
messagesize
FROM
receivedmessages
WHERE
messagetype = 'SYNC_PROFILE_PICTURE'
AND messagesize > 0 -- desktop clients using non-0.10.0rc set this to 0
LIMIT
1000
) AS subq
GROUP BY
messagetype;
```
which yields
```
messagetype | avg
----------------------+--------------------
SYNC_PROFILE_PICTURE | 19550.000000000000
```
3. Checking broadcast frequency is irrelevant; this message is sent ad hoc, afaik
4. Number of `SYNC_PROFILE_PICTURE` in 1 day -
```sql
SELECT
COUNT(*)
FROM
receivedmessages
WHERE
messagetype = 'SYNC_PROFILE_PICTURE'
AND messagesize > 0
AND sentat BETWEEN 1677076169
AND 1677162569;
```
which yields
```
count
-------
5
```
## Conclusion
It is assumed that `UNKNOWN` messages originate from 1:1 chats.
The major bandwidth usage comes from `MEMBERSHIP_UPDATE_MESSAGE` and `COMMUNITY_DESCRIPTION`.
It is recommended to optimize these payloads to address the scaling problems described above.