Backported from main: b25738c416
When a packet contains binary elements, the built-in parser does not modify them and simply sends them in their own WebSocket frame.
Example: `socket.emit("some event", Buffer.of(1,2,3))`
is encoded and transferred as:
- 1st frame: `51-["some event",{"_placeholder":true,"num":0}]`
- 2nd frame: `<buffer 01 02 03>`
where:
- `5` is the type of the packet (binary message)
- `1` is the number of binary attachments
- `-` is the separator
- `["some event",{"_placeholder":true,"num":0}]` is the payload (including the placeholder)
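The breakdown above can be sketched as a small parsing function. This is only an illustration, not the parser's actual code: `parseBinaryHeader` is a hypothetical helper, and it assumes the frame carries no namespace or acknowledgement id between the separator and the payload.

```js
// Hypothetical helper (not part of socket.io-parser): splits the first frame
// of a binary packet into its type, attachment count and JSON payload.
function parseBinaryHeader(frame) {
  // The first character is the packet type (5 for a binary message).
  const type = Number(frame.charAt(0));

  // Everything between the type and the first "-" is the attachment count.
  const separator = frame.indexOf("-");
  if (separator === -1) {
    throw new Error("missing attachment separator");
  }
  const countText = frame.slice(1, separator);
  if (!/^\d+$/.test(countText)) {
    throw new Error("illegal attachment count");
  }

  // The rest of the frame is the payload, placeholders included.
  return {
    type,
    attachments: Number(countText),
    payload: JSON.parse(frame.slice(separator + 1)),
  };
}

const header = parseBinaryHeader(
  '51-["some event",{"_placeholder":true,"num":0}]'
);
console.log(header.type, header.attachments, header.payload[0]);
// prints: 5 1 some event
```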
On the receiving end, the parser reads the number of attachments and buffers them until they are all received.
Before this change, the built-in parser accepted any number of binary attachments, which could be exploited to make the server run out of memory.
The number of attachments is now limited to 10, which should be sufficient for most use cases.
The limit can be increased with a custom `parser`:
```js
import { Server } from "socket.io";
import { Encoder, Decoder } from "socket.io-parser";

const io = new Server({
  parser: {
    Encoder,
    // Subclass the built-in decoder to raise the attachment limit
    Decoder: class extends Decoder {
      constructor() {
        super({
          maxAttachments: 20
        });
      }
    }
  }
});
```
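To illustrate why the limit matters, here is a minimal sketch of the buffering step with an upper bound on the announced attachment count. `AttachmentCollector` and its methods are hypothetical names used for illustration only; the real decoder is structured differently.

```js
// Hypothetical sketch, not the socket.io-parser implementation.
class AttachmentCollector {
  constructor(maxAttachments = 10) {
    this.maxAttachments = maxAttachments;
    this.pending = null;
  }

  // Called with the attachment count announced in the first frame.
  begin(expected, onComplete) {
    const isValid =
      Number.isInteger(expected) &&
      expected >= 0 &&
      expected <= this.maxAttachments;
    if (!isValid) {
      // Rejecting early means nothing is buffered for an oversized packet.
      throw new Error("too many attachments");
    }
    if (expected === 0) {
      onComplete([]);
      return;
    }
    this.pending = { expected, buffers: [], onComplete };
  }

  // Called once for each binary frame that follows.
  addBinary(buffer) {
    if (this.pending === null) {
      throw new Error("unexpected binary frame");
    }
    this.pending.buffers.push(buffer);
    if (this.pending.buffers.length === this.pending.expected) {
      const { buffers, onComplete } = this.pending;
      this.pending = null;
      onComplete(buffers);
    }
  }
}

const collector = new AttachmentCollector();
collector.begin(2, (buffers) => console.log(buffers.length));
collector.addBinary(Buffer.of(1));
collector.addBinary(Buffer.of(2)); // the callback runs here and logs 2
```

Without the check in `begin`, a single small text frame announcing a huge count would commit the receiver to holding every following binary frame in memory.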
A packet like `2[{"toString":"foo"}]` was decoded as:
{
  type: EVENT,
  data: [ { "toString": "foo" } ]
}
which would then throw an error when passed to the EventEmitter class:
> TypeError: Cannot convert object to primitive value
> at Socket.emit (node:events:507:25)
> at .../node_modules/socket.io/lib/socket.js:531:14
Backported from 3b78117bf6
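A check along the following lines would reject the `toString` packet described above before it reaches the EventEmitter. This is a sketch under the assumption that only strings and numbers are acceptable event names; `hasValidEventName` is a hypothetical helper, not a function exported by the parser.

```js
// Hypothetical validation helper: the first element of an EVENT payload is
// the event name, so it must be a primitive that is safe to use as a key.
function hasValidEventName(data) {
  return (
    Array.isArray(data) &&
    data.length > 0 &&
    (typeof data[0] === "string" || typeof data[0] === "number")
  );
}

console.log(hasValidEventName(JSON.parse('[{"toString":"foo"}]'))); // false
console.log(hasValidEventName(["hello", 1])); // true
```

The object is rejected because its own `toString` property is a string, not a function, so it cannot be converted to a primitive when used as an event name.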
A specially crafted packet could be incorrectly decoded.
Example:
```js
import { Decoder } from "socket.io-parser";

const decoder = new Decoder();

decoder.on("decoded", (packet) => {
  // before the fix, prints [ 'hello', [Function: splice] ]
  console.log(packet.data);
});

// "num" should be an attachment index, not a property name
decoder.add('51-["hello",{"_placeholder":true,"num":"splice"}]');
decoder.add(Buffer.from("world"));
```
As usual, please remember not to trust user input.
Backported from b5d0cb7dc5
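The `splice` example above works because looking up `buffers["splice"]` on an array returns `Array.prototype.splice`. A minimal sketch of a reconstruction step that validates each index first is shown below; `reconstruct` is a hypothetical function written for illustration and may differ from the parser's own code.

```js
// Hypothetical sketch: replaces each placeholder in a decoded payload with
// the matching binary attachment, after validating the index.
function reconstruct(data, buffers) {
  if (Array.isArray(data)) {
    return data.map((item) => reconstruct(item, buffers));
  }
  if (data !== null && typeof data === "object") {
    if (data._placeholder === true) {
      const index = data.num;
      const isValidIndex =
        Number.isInteger(index) && index >= 0 && index < buffers.length;
      if (!isValidIndex) {
        // Rejects "splice", "length", negative numbers, and so on.
        throw new Error("illegal attachments");
      }
      return buffers[index];
    }
    // Object.fromEntries defines own properties, so a "__proto__" key
    // in the payload cannot change the prototype of the result.
    return Object.fromEntries(
      Object.entries(data).map(([key, value]) => [
        key,
        reconstruct(value, buffers),
      ])
    );
  }
  return data;
}

const attachments = [Buffer.from("world")];
const rebuilt = reconstruct(
  ["hello", { _placeholder: true, num: 0 }],
  attachments
);
console.log(rebuilt[1].toString()); // prints: world
```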
Separated the encoding and decoding into two public-facing objects,
Encoder and Decoder.
Both objects take nothing on construction. Encoder has a single method,
encode, that mimics the previous version's function encode (takes a
packet object and a callback). Decoder has a single method too, add, that
takes any object (packet string or binary data). Decoder emits a 'decoded'
event when it has received all of the parts of a packet. The only
parameter for the decoded event is the reconstructed packet.
I am hesitant about the Encoder.encode vs Decoder.add thing. Should it be
more consistent, or should it stay like this where the function names are
more descriptive?
Also, rewrote the test helper functions to deal with new event-based
decoding. Wrote a new test in test/arraybuffer.js that tests for memory
leaks in Decoder as well.
This is a squash of a few commits, summarized below.
Results: the build size of socket.io-client went from ~250K to ~215K.
The tests I was running here
(https://github.com/kevin-roark/socketio-binaryexample/tree/speed-testing)
take about 1/4 to 1/5 as long with this commit as they did with msgpack.
The first commit was the initial rewrite of the encoding, which removes
msgpack and instead uses a sequence of engine.write calls for a binary
event. The first write is the packet metadata, with placeholders in the
JSON for any binary data. The following writes are the raw binary data
that fill in the placeholders.
The second commit was bug fixes that made the tests pass.
The third commit removed unnecessary packages from package.json.
The fourth commit added comments, and the fifth commit merged
upstream.
The remaining commits involved merging with the actual socket.io-parser,
rather than the protocol repository. Oops.
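The placeholder-based encoding introduced by the first commit can be sketched roughly as follows. The function names are hypothetical, the frame layout is assumed to match the `51-[...]` example earlier in this document, and only JSON-compatible values and binary data are handled.

```js
// Hypothetical sketch: walks the payload, moves every binary value into
// `buffers`, and leaves a numbered placeholder in its place.
function deconstruct(data, buffers) {
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    buffers.push(data);
    return { _placeholder: true, num: buffers.length - 1 };
  }
  if (Array.isArray(data)) {
    return data.map((item) => deconstruct(item, buffers));
  }
  if (data !== null && typeof data === "object") {
    return Object.fromEntries(
      Object.entries(data).map(([key, value]) => [
        key,
        deconstruct(value, buffers),
      ])
    );
  }
  return data;
}

// Returns the frames to write: one text frame, then the raw binary data.
function encodeBinaryEvent(args) {
  const buffers = [];
  const payload = deconstruct(args, buffers);
  const header = `5${buffers.length}-${JSON.stringify(payload)}`;
  return [header, ...buffers];
}

const frames = encodeBinaryEvent(["some event", Buffer.of(1, 2, 3)]);
console.log(frames[0]);
// prints: 51-["some event",{"_placeholder":true,"num":0}]
console.log(frames.length); // prints: 2
```

Sending the binary data as separate frames means it never has to be serialized into the text payload, which is consistent with the speed-up over msgpack reported above.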