Backported from main: b25738c416
When a packet contains binary elements, the built-in parser does not modify them and simply sends them in their own WebSocket frame.
Example: `socket.emit("some event", Buffer.of(1,2,3))`
is encoded and transferred as:
- 1st frame: 51-["some event",{"_placeholder":true,"num":0}]
- 2nd frame: <buffer 01 02 03>
where:
- `5` is the type of the packet (binary message)
- `1` is the number of binary attachments
- `-` is the separator
- `["some event",{"_placeholder":true,"num":0}]` is the payload (including the placeholder)
On the receiving end, the parser reads the number of attachments and buffers them until they are all received.
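The header layout described above can be split with a few string operations. This is a minimal sketch, not the parser's actual code: `parseBinaryHeader` is a hypothetical helper, and it only handles the `<type><attachments>-<payload>` form from the example (no namespace or acknowledgement id).

```js
// Hypothetical helper: splits a binary packet header such as
// 51-["some event",{"_placeholder":true,"num":0}] into its parts.
function parseBinaryHeader(frame) {
  const separator = frame.indexOf("-");
  if (separator === -1) {
    throw new Error("missing separator");
  }
  // The first character is the packet type.
  const type = Number(frame.charAt(0));
  // Everything between the type and the separator is the attachment count.
  const countText = frame.substring(1, separator);
  if (!/^\d+$/.test(countText)) {
    throw new Error("invalid attachment count");
  }
  const attachments = Number(countText);
  // The rest is the JSON payload, placeholders included.
  const payload = JSON.parse(frame.substring(separator + 1));
  return { type, attachments, payload };
}

const header = parseBinaryHeader(
  '51-["some event",{"_placeholder":true,"num":0}]'
);
console.log(header.type, header.attachments, header.payload[0]);
```

The receiver would then wait for `header.attachments` binary frames before handing the packet over.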
Before this change, the built-in parser accepted any number of binary attachments, which could be exploited to make the server run out of memory.
The number of attachments is now limited to 10, which should be sufficient for most use cases.
The limit can be increased with a custom `parser`:
```js
import { Server } from "socket.io";
import { Encoder, Decoder } from "socket.io-parser";

const io = new Server({
  parser: {
    Encoder,
    Decoder: class extends Decoder {
      constructor() {
        // raise the limit from the default of 10 attachments
        super({
          maxAttachments: 20
        });
      }
    }
  }
});
```
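For illustration, the limit check can be pictured as follows. This is a sketch under assumptions, not the parser's implementation: `AttachmentCollector` is a hypothetical class, and only the default of 10 comes from the description above.

```js
// Hypothetical collector: buffers the binary attachments of one packet and
// rejects packets that announce more attachments than allowed.
class AttachmentCollector {
  constructor(expected, maxAttachments = 10) {
    const valid =
      Number.isInteger(expected) && expected >= 0 && expected <= maxAttachments;
    if (!valid) {
      throw new Error("too many attachments");
    }
    this.expected = expected;
    this.buffers = [];
  }

  // Returns the full list once every attachment has arrived, otherwise null.
  add(buffer) {
    this.buffers.push(buffer);
    return this.buffers.length === this.expected ? this.buffers : null;
  }
}

const collector = new AttachmentCollector(2);
console.log(collector.add(Buffer.of(1))); // null
console.log(collector.add(Buffer.of(2)).length); // 2
```

Rejecting the packet as soon as the header is read means no memory is reserved for attachments that will never be accepted.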
A packet like '2[{"toString":"foo"}]' was decoded as:
{
type: EVENT,
data: [ { "toString": "foo" } ]
}
This would then throw an error when passed to the EventEmitter class, because the event name is used as a property key and cannot be converted to a primitive:
> TypeError: Cannot convert object to primitive value
> at Socket.emit (node:events:507:25)
> at .../node_modules/socket.io/lib/socket.js:531:14
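One way to guard against this is to check the event name before the packet is emitted. The following is a minimal sketch; `isValidEventPayload` is a hypothetical name, and the rule it applies (the first element must be a string or a number) is an assumption for illustration.

```js
// Hypothetical check: an EVENT payload must be a non-empty array whose first
// element (the event name) is a string or a number.
function isValidEventPayload(payload) {
  if (!Array.isArray(payload) || payload.length === 0) {
    return false;
  }
  const name = payload[0];
  return typeof name === "string" || typeof name === "number";
}

console.log(isValidEventPayload(JSON.parse('[{"toString":"foo"}]'))); // false
console.log(isValidEventPayload(["hello", 1])); // true
```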
Backported from 3b78117bf6
A specially crafted packet could be incorrectly decoded.
Example:
```js
import { Decoder } from "socket.io-parser";

const decoder = new Decoder();

decoder.on("decoded", (packet) => {
  console.log(packet.data); // prints [ 'hello', [Function: splice] ]
});

decoder.add('51-["hello",{"_placeholder":true,"num":"splice"}]');
decoder.add(Buffer.from("world"));
```
As usual, please remember not to trust user input.
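The example above works because a lookup like `buffers["splice"]` resolves to `Array.prototype.splice`. A minimal sketch of a stricter reconstruction step follows; the function is hypothetical, written for illustration, and assumes `data` comes from `JSON.parse`.

```js
// Hypothetical reconstruction step: swaps placeholders for the received
// buffers and rejects any placeholder whose "num" is not a valid array index.
function reconstruct(data, buffers) {
  if (data === null || typeof data !== "object") {
    return data;
  }
  if (data._placeholder === true) {
    const isValidIndex =
      Number.isInteger(data.num) && data.num >= 0 && data.num < buffers.length;
    if (!isValidIndex) {
      throw new Error("illegal attachments");
    }
    return buffers[data.num];
  }
  if (Array.isArray(data)) {
    for (let i = 0; i < data.length; i++) {
      data[i] = reconstruct(data[i], buffers);
    }
    return data;
  }
  for (const key of Object.keys(data)) {
    data[key] = reconstruct(data[key], buffers);
  }
  return data;
}

const received = [Buffer.from("world")];
const args = reconstruct(["hello", { _placeholder: true, num: 0 }], received);
console.log(args[1].toString()); // world
```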
Backported from b5d0cb7dc5
When maxHttpBufferSize is large (1e8 bytes), a payload of length 100MB
can be sent like so:
99999991:422222222222222222222222222222222222222222222...
This massive packet can cause an out-of-memory (OOM) crash: the attachment
count is built by concatenating one character at a time, which creates
99999989 `ConsOneByteString` objects, and the resulting massive integer
string is then converted to a `Number`.
The performance can be improved to avoid this by using `substring`
rather than building the string via concatenation.
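A minimal sketch of the `substring` approach, assuming the count starts right after the one-character packet type; `readAttachmentCount` is a hypothetical helper and the validation rules are illustrative.

```js
// Hypothetical helper: reads the attachment count that follows the packet
// type, up to the "-" separator, without concatenating character by character.
function readAttachmentCount(str) {
  const start = 1; // position 0 holds the packet type
  let end = start;
  // Only an index moves forward here, so no intermediate strings are created.
  while (end < str.length && str.charAt(end) !== "-") {
    end++;
  }
  if (end === str.length) {
    throw new Error("Illegal attachments");
  }
  // A single substring call extracts the digits in one step.
  const text = str.substring(start, end);
  const count = Number(text);
  if (!/^\d+$/.test(text) || !Number.isSafeInteger(count)) {
    throw new Error("Illegal attachments");
  }
  return count;
}

console.log(readAttachmentCount('51-["some event"]')); // 1
```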
Below I tried one payload of length 7e7 as the 1e8 payload took so
long to process that it timed out before running out of memory.
```
==== JS stack trace =========================================
0: ExitFrame [pc: 0x13c5b79]
Security context: 0x152fe7b808d1 <JSObject>
1: decodeString [0x2dd385fb5d1] [/node_modules/socket.io-parser/index.js:~276] [pc=0xf59746881be](this=0x175d34c42b69 <JSGlobal Object>,0x14eccff10fe1 <Very long string[69999990]>)
2: add [0x31fc2693da29] [/node_modules/socket.io-parser/index.js:242] [bytecode=0xa7ed6554889 offset=11](this=0x0a2881be5069 <Decoder map = 0x3ceaa8bf48c9>,0x14eccff10fe1 <Very...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xa09830 node::Abort() [node]
2: 0xa09c55 node::OnFatalError(char const*, char const*) [node]
3: 0xb7d71e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xb7da99 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xd2a1f5 [node]
6: 0xd2a886 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
7: 0xd37105 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
8: 0xd37fb5 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
9: 0xd3965f v8::internal::Heap::HandleGCRequest() [node]
10: 0xce8395 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x1042cb6 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x13c5b79 [node]
```
Backported from master: dcb942d24d
Switch from depending on a tarball URL to the published
component-emitter package at its latest version.
Change all references to emitter module to the new
component-emitter name.
Remove blobs has to iterate over a JavaScript object and asynchronously
remove the blobs / files. It does this by iterating over arrays and
objects in the larger object recursively.
The problem was in the check for an object to iterate over: it did not
check whether that object was itself binary data. So it was working, but
really slowly, by iterating over every byte in a Buffer and checking it
for blobs.
Much faster now :)
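The idea can be sketched as a traversal that tests for binary data before recursing. This is an illustration only: `countBinaryLeaves` is a hypothetical function that counts binary values rather than removing blobs asynchronously.

```js
// Hypothetical traversal: visits nested arrays and plain objects, but treats
// binary data as a leaf instead of iterating over every byte.
function countBinaryLeaves(value) {
  // Check for binary data first, so a Buffer is never walked byte by byte.
  if (value instanceof ArrayBuffer || ArrayBuffer.isView(value)) {
    return 1;
  }
  if (Array.isArray(value)) {
    return value.reduce((sum, item) => sum + countBinaryLeaves(item), 0);
  }
  if (value !== null && typeof value === "object") {
    return Object.values(value).reduce(
      (sum, item) => sum + countBinaryLeaves(item),
      0
    );
  }
  return 0;
}

const packet = { image: Buffer.alloc(1024), tags: ["a", new Uint8Array(4)] };
console.log(countBinaryLeaves(packet)); // 2
```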
Separated the encoding and decoding into two public-facing objects,
Encoder and Decoder.
Both objects take nothing on construction. Encoder has a single method,
encode, that mimics the previous version's function encode (takes a
packet object and a callback). Decoder has a single method too, add, that
takes any object (packet string or binary data). Decoder emits a 'decoded'
event when it has received all of the parts of a packet. The only
parameter for the decoded event is the reconstructed packet.
I am hesitant about the Encoder.encode vs Decoder.add thing. Should it be
more consistent, or should it stay like this where the function names are
more descriptive?
Also, rewrote the test helper functions to deal with new event-based
decoding. Wrote a new test in test/arraybuffer.js that tests for memory
leaks in Decoder as well.
This is a squash of a few commits. Below is a small summary of commits.
Results: before this change, the build size of socket.io-client was ~250K.
Now it is ~215K.
Tests I was doing here
(https://github.com/kevin-roark/socketio-binaryexample/tree/speed-testing)
take about 1/4 - 1/5 as long with this commit compared to msgpack.
The first was the initial rewrite of the encoding, which removes msgpack
and instead uses a sequence of engine.write calls for a binary event. The
first write is the packet metadata with placeholders in the JSON for
any binary data. The following writes are the raw binary data that
fill in the placeholders.
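A minimal sketch of that encoding, assuming packet type `5` for a binary event and the `{"_placeholder":true,"num":N}` placeholder shape; `deconstruct` and `encodeBinaryEvent` are hypothetical names, not the module's API.

```js
// Hypothetical helper: replaces every binary value with a numbered
// placeholder and collects the binary values in order.
function deconstruct(data, buffers) {
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    const placeholder = { _placeholder: true, num: buffers.length };
    buffers.push(data);
    return placeholder;
  }
  if (Array.isArray(data)) {
    return data.map((item) => deconstruct(item, buffers));
  }
  if (data !== null && typeof data === "object") {
    const copy = {};
    for (const [key, value] of Object.entries(data)) {
      copy[key] = deconstruct(value, buffers);
    }
    return copy;
  }
  return data;
}

// Returns the sequence of writes: the metadata string first, then the raw
// binary data in placeholder order.
function encodeBinaryEvent(args) {
  const buffers = [];
  const payload = deconstruct(args, buffers);
  return [`5${buffers.length}-${JSON.stringify(payload)}`, ...buffers];
}

const frames = encodeBinaryEvent(["some event", Buffer.of(1, 2, 3)]);
console.log(frames[0]); // 51-["some event",{"_placeholder":true,"num":0}]
console.log(frames.length); // 2
```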
The second commit was bug fixes that made the tests pass.
The third commit was removing unnecessary packages from package.json.
Fourth commit was adding nice comments, and 5th commit was merging
upstream.
The remaining commits involved merging with actual socket.io-parser,
rather than the protocol repository. Oops.