Agenda for the June 17 video call of WebAssembly's Community Group

  • Where: Virtual meeting
  • When: June 17, 16:00-17:00 UTC (9am-10am PDT, 18:00-19:00 CEST)
  • Location: link on W3C calendar or Google Calendar invitation

Registration

No registration is required for VC meetings. The meeting is open to CG members only.

Agenda items

  1. Opening
  2. Proposals and discussions
    1. Discussion: JS interop design (Thomas Lively, 45 minutes)
  3. Closure

Agenda items for future meetings

Further discussion of "problem 2" from the presentation

Meeting Notes

Attendees

  • Derek Schuff
  • Thomas Lively
  • Sébastien Doeraene
  • Chris Woods
  • Yury Delendik
  • Robin Freyler
  • Guy Bedford
  • Andreas Rossberg
  • Alex Crichton
  • Jeff Charles
  • Francis McCabe
  • Emanuel Ziegler
  • Deepti Gandluri
  • Ben Visness
  • Ryan Hunt
  • Heejin Ahn
  • Sam Clegg
  • Julien Pages
  • Yuri Iozzelli
  • Andrew Brown
  • Luke Wagner
  • Nick Fitzgerald
  • Mattias Liedtke
  • Paul Peny
  • David Degazio
  • Johnnie Birch
  • Jakob Kummerow
  • Ricky Vetter
  • Daniel Lehmann
  • Bailey Hayes

Proposals and discussions

Discussion: JS interop design

TL presenting slides

AR: Can you give some background on the scenario that this experiment is supposed to model? Do you really often have tens of thousands of methods to call from JS? I would think it would more often be a selected set of methods to export.

TL: Agreed, but it’s not up to us; the users want what they want. One example is a large library with a very expansive interface where you don’t know which bits users will actually need, so you have to export the whole public API. A user might only use a small piece and you’d hope to tree-shake it. But if that doesn’t work, or in debug mode etc., then you could have a lot. The other scenario is that a particular language/toolchain might just decide to export everything by default. E.g. from talking to Zalim a few weeks ago, the Kotlin/JS toolchain might just export everything by default, and so the Kotlin/Wasm toolchain would do the same. That would be unfortunate and we’d discourage that, but in the end we can’t control what users want.

AR: I’d still question the wisdom of that, but ok 🙂

TL: Presenting Option 1A

AR: I guess in this picture the dotted line should actually go to the proto?

TL: right.

More questions about this design?

AR: I like it

TL: presenting Option 1B

AR: This solution already seems quite overfitted for JS. I can’t imagine any other use case for this, and if there were, how do we know that they want exactly one thing, vs, say 3 extra operands? So that worries me a bit about this one.

TL: Not too concerned about multiple things. Even for JS, one of the things about WA.FeatureOptions is that it could be extensible, e.g. to configure own properties, so you could pack more configuration into it. So I think attaching a single externref is pretty general. As far as being language specific… maybe? We are obviously focused on JS interop in V8, but in principle other languages might want to configure things, and they’d need some kind of hook to configure it. I haven’t talked to other embedders, and haven’t heard from them, but I suspect that’s just because they aren’t as far along as JS in maturity. Hard to say.

LW (chat): Python meta classes…

AR: I'm skeptical, but OK.

TL: Ok to be skeptical, just exploring the design space here.

Presenting Option 1C

It’s like Option 1B but with a builtin instead. But the problem is that all of these things want to be in globals, so you’d have to store the descriptor structs in immutable globals for optimization, but you can’t call an initializer function in a global initializer. So we’d want some sort of “early import” stage during instantiation to call the initializers. I don’t have a lot of appetite for adding that, but it has been suggested, so I’m adding this for completeness.

Option 1D

Fairly simple approach but we’re concerned about the memory overhead. It might be about 10 pointers per empty prototype.

Option 1E

This is the most direct; it corresponds to “direct” on the opening slide. But it requires a backup plan for when a descriptor isn’t in an immutable global and needs to be allocated at runtime. So it’s not a catch-all but it’s very efficient.

AR: IIUC, 1A and 1D don’t require any random addition to the core language, correct?

TL: right.

JK (chat): 1C would also be novel in that it'd be a "dynamic-type" call instruction: you'd need to inspect the identity of the callee to figure out what the call does to the value stack

(editor's note: I didn't see the above comment during the meeting, but I don't think it's correct. The function import would have a normal function type used normally during validation, but at instantiation we would have to check that the parameters make sense for the result type - TL)

AR: 1A seems nice because it’s so simple. Only drawback is that we have 2x the pointer in the descriptor? Is that really an issue in practice? This is only for descriptors, I would think that for every descriptor there are many times more instances, so the space overhead would be dominated by the instances. Is this a relevant problem to have those 2 slots in there? It seems so much simpler than the other options.

TL: Agree that the space overhead is not expected to be a problem. One of the unfortunate things here is that because it doesn’t add anything to the language, it relies very heavily on the import and export sections as they are today. So it spends a lot of binary size on import/export. If you look at the results, the nice simple option A corresponds to the modular experiment. You can see the extra bytes are largely from the imports/exports and they do add up. But yes, you are right about it being simple and not needing language extensions.

AR: So the extra bytes are the import/export names? We discussed storing those more efficiently.

TL: These were done with small generated names, e.g. numbers. So not totally minimized but not super long either. So the baseline is also exporting all the functions, so the difference is from importing all the prototypes and the custom section for populating them.

LW: Is "modular indexed" where the functions are exported but they are referred to by index?

TL: Yes. Named is polyfillable once you add custom descriptors. Indexed uses the export index, which is available in the core embedding API, but there’s no way to get an ordered list of exports in today’s JS API.

LW: IIRC direct involves punching into the core Wasm in a way that we can’t do now. Could we use ref.func as a parameter to an import?

TL: These experiments are not just about problem 1 but also problem 2, which is configuring the contents of the prototypes. So that suggestion of passing the funcs as ref.func is sort of in the realm of problem 2.

LW: would that be closer to direct in terms of overhead? Or still more bloaty?

TL: you’d be calling imports in the start function, so we’re concerned it would be more bloaty. You’re saving the JS code, but replacing it with a bunch of Wasm in the start function which won’t be tiered up. So unclear how much better that is.

AR: As a clarification, the diff between named and indexed, is just on the JS side?

TL: It’s from not putting the export names in the custom section that specifies how exported functions get attached to prototypes.
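The difference between the "named" and "indexed" variants can be sketched roughly as follows. This is only an illustration under stated assumptions: the table shapes (`NamedEntry`, `IndexedEntry`), the helpers (`asMethod`, `attach`, `installNamed`, `installIndexed`) and the stand-in export list are hypothetical, and do not reflect the actual custom section format or any existing JS API. Plain functions stand in for exported Wasm functions that take the receiver as their first argument.

```typescript
// Hypothetical stand-in for an exported Wasm function: the receiver
// object is passed as the first parameter.
type Exported = (self: any, ...args: any[]) => any;

// "Named" variant: every table entry repeats the export name.
interface NamedEntry { exportName: string; proto: number; method: string }
// "Indexed" variant: entries refer to exports by position, so no names are stored.
interface IndexedEntry { exportIndex: number; proto: number; method: string }

// Wrap an export so that `this` is forwarded as its first argument.
function asMethod(fn: Exported) {
  return function (this: any, ...args: any[]): any {
    return fn(this, ...args);
  };
}

// Define a non-enumerable method on a prototype object.
function attach(proto: object, name: string, fn: Exported): void {
  Object.defineProperty(proto, name, {
    value: asMethod(fn),
    writable: true,
    configurable: true,
    enumerable: false,
  });
}

function installNamed(
  exportsByName: Record<string, Exported>,
  protos: object[],
  table: NamedEntry[],
): void {
  for (const e of table) attach(protos[e.proto], e.method, exportsByName[e.exportName]);
}

// Assumes an ordered export list is available, which (as noted above)
// today's JS API does not provide; here it is simply a hand-written array.
function installIndexed(
  orderedExports: Exported[],
  protos: object[],
  table: IndexedEntry[],
): void {
  for (const e of table) attach(protos[e.proto], e.method, orderedExports[e.exportIndex]);
}

// Usage: both variants install the same method, only the lookup key differs.
const getX: Exported = (self) => self.x;
const protos: object[] = [{}];
installNamed({ e0: getX }, protos, [{ exportName: "e0", proto: 0, method: "getX" }]);
installIndexed([getX], protos, [{ exportIndex: 0, proto: 0, method: "getXAgain" }]);
const obj = Object.create(protos[0]);
obj.x = 5;
console.log(obj.getX(), obj.getXAgain());
```

In this sketch the only size difference between the two tables is the `exportName` string carried by each named entry, which mirrors the point that the byte savings come from leaving export names out of the table.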

AR: still confused about how the problem 1 options correlate to the experiments. It seems like they are all in the imperative or modular space except for the last one?

TL: 1A is the modular approach, 1B is not prototyped, but would be similar to modular because you’d be importing all of the prototypes, and exporting the functions and configuring them. 1C, you’d need function calls in global initializers, we don’t have any experiments for that because it would need new machinery. 1D, you’re not importing anything so that’s nice. You still export all the functions, so fairly similar to modular but in between. And 1E is the “direct” approach we tried.

JK (chat): the experimental results are, in fact, dominated by the solution to Problem 2, because there are more methods than prototypes.

AR: So all but 1E are sort of similar.

RH: My preference is somewhere between 1B and 1C for this. In defense of 1B, where we add a new instruction or take an extra parameter for configuring: I would contest how random this is. The problem for JS, where we have GC structs and we want the host to interop, is the same if we had Wasm in Python or .NET and wanted them to look like native objects; it’s not that strange of an issue. It’s not just that we want one little configuration point, but several. So having a general host extension point seems reasonable. And the cost is just 1 instruction. Looking at this with exact types, descriptors etc., this is so much smaller than the rest of the proposal. For 1C, this would be importing a builtin that could construct a descriptor with extra args; the concern there was infrastructure for early globals. The simplest version of that is a global section that happens before. It's a very small variation on repeated sections: the first one can call imports. Same with the type system extensions, it’s pretty small overall and has other uses as well. Not necessarily stuck on one of those but wouldn’t discount them just because of that. 1A is also possibly fine. From an aesthetic point of view, having this extra thing that’s manually inspected is a bit surprising.

LW: For 1B, a different framing that makes it more regular and less bolted on: we have this concept of a descriptor field in every object, and it points to something. You can relax that restriction and let it point to an extern instead of just a GC object, which is a strict relaxation of the current rules. Then you can just take the externref on creation. So you say there’s a field for the thing that describes me, and all these things are just descriptors. We are just reusing the field that we are already adding, so it’s more of a relaxation than an extra operand.

TL: I don’t think you can see it as such a nice relaxation because the descriptors have such an important structure. The fact that they point to the RTT for casts is important, and you can’t be sure that general objects have that structure.

AR: Also unsound to take an arbitrary structure, it has to have the right type.

TL: So by analogy, we have to have it be engine managed.

LW: So we need to add a little complexity where the type has to be described, so we have a reference to a thing that describes another type, so on creation it has to be the right type.

TL: We could add a definition for a descriptor type that is not a struct but is a subtype of extern but is guaranteed to have the proper structure.

AR: That would imply that you import the entire descriptor, you can't construct it in wasm. That would make more sense to me. The downside is that you can’t use it for other purposes internally. The other problem is that it would be hard to make the typing work

TL: Another problem is that to construct this in JS you have to say what type it describes but you haven’t instantiated yet. So you need another module with a type section or type reflection API.

LW: for JS these are just random JS objects so you don’t have to know about the type you’re describing so the JS API would accept any object.

TL: You'd have to wrap the object in a descriptor.

AR: I could imagine if we had a kind of descriptor type, the JS API could have a coercion that wraps a JS object into a descriptor in toWebAssemblyValue. But you still have an initialization problem.

TL: and other fields are important because we want to use them for vtables.

AR: And you want them to be constant.

LW: You’d have a vtable in the proto chain before them and its descriptor would be the same as extern, it would be tweaked as a descriptor but just a JS object.

AR: but it needs to be sound at every level, so even the descriptor’s descriptor has to have the right type

TL: But you could do the implicit wrapping in the ToWebAssembly implementation, you wouldn’t need extra fields.

AR: I see. Then the prototype is one level up and it has to be copied down?

TL: There’s a distinction between the shape/map, and you need that already, it could be the descriptor.

AR: In general we want fields in this, we want to use it in wasm too. An extra indirection would be weird.

TL: I think this implicit wrapping could work but at that point why not just take the descriptor field, it seems simpler.

LW: This does show up: if you put WasmGC in a native VM you get this again. This is a pretty magical hack for JS but you have the same problem later.

AR: Not sure. If you have e.g. Java you don’t have this same meta descriptor machinery…

LW: java.lang.object?

AR: for java you get away with a canonical descriptor, and you just have the vtable.

JK (chat): one variant of 1B: we could split "(type $foo (descriptor $foo.desc) ...)" into two options:

(type $foo (wasm-only-descriptor $foo.desc) ...) or (type $bar (wasm+host-descriptor $bar.desc) ...)

where "wasm+host-descriptor" is basically an opt-in to "option 1D", i.e. $bar.desc will get a prototype. (Setting up that prototype and its parent will happen separately; see Problem 2.)

LW: the value prop for WasmGC is that it uses the host GC.

AR: You can pass the reference around but that doesn’t mean it behaves exactly like a JS object. We don’t necessarily need to go so far out of our way to do that, it crosses a line into being too JS specific.

TL: This just lets you attach type associated data from the host, it doesn’t change the layout of the descriptor, it's just an associated chunk of data, it's not enough to make it appear like a java lang object on the outside. So it’s kind of specific to prototype-based inheritance.

LW: If we have the GC objects and the host knows about Wasm, it’s going to access them natively from its JIT, like JS. We know it’s a WasmGC object, but I’ll follow the descriptor chain up to a native object; if I find a terminal externref, it will tell me how the object should look in the embedder. Otherwise you can just return something generic. You can at least have e.g. interfaces even if it's not a full java lang object.

TL: Why is it better to have it in the descriptor chain rather than in a field? because the host can get to it either way.

AR: Would question whether it's possible in general to have just random stuff on the descriptor chain, because in JS these have to be maps.

TL: To be clear it would have to be a field of a descriptor somewhere on the chain, whether it's visible or not, or the first vs the last… why are any of those nicer than a user visible field on the first descriptor?

LW: if we are allowing multiple chains, is this any better?

TL: I think so. If you put the prototypes as the first fields in the descriptor, and you do that on every level of the chain, then each one can be exposed to JS nicely. So I don’t know of a language that would want that, but if there were one, I would say making it an explicit field would be beneficial.

LW: Is this like a duck type, where if I find a particular object with the value, then it gets reflected?

TL: That’s one option, the other is to have this DescriptorOptions wrapper, which is still like a duck type but it’s marked intentionally as the thing that has the descriptor. So do we want to pay the cost of the wrapper object, or have the random field on the random object.

AR: Is that allowed, to have a prototype that’s an arbitrary JS value?

TL: yes, any JS object can be a prototype, so not a string or Number but any object.
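As a small illustration of this point in plain JavaScript semantics (not anything specific to the proposal): `Object.create` accepts any object, or `null`, as the prototype and throws a `TypeError` for primitives such as strings and numbers. The helper name `canBePrototype` is made up for this sketch.

```typescript
// Any object (including a function) can serve as a prototype.
const proto = { greet() { return "hi"; } };
const obj = Object.create(proto);
console.log(obj.greet());

// Primitives cannot: Object.create throws a TypeError for them.
// `null` is accepted and yields an object with no prototype.
function canBePrototype(value: unknown): boolean {
  try {
    Object.create(value as object | null);
    return true;
  } catch {
    return false;
  }
}

console.log(canBePrototype({}), canBePrototype(null), canBePrototype("str"), canBePrototype(42));
```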

AR: If we don’t have DescriptorOptions then either it traps if you pass something not an object, or it’s just ignored?

TL: Yes. At instantiation time you look at the first field and if it’s not the right kind, you get a null prototype or whatever.

AR: So it’s just more explicit. I really want to have to opt into this. You can also imagine that it’s not a wrapper but a specific class of objects, maybe an instance of something in particular.
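The two behaviors being contrasted here could look roughly like this on the JS side. This is a sketch only: the `DescriptorOptions` class below is a local stand-in with a made-up `proto` field, not an existing WebAssembly JS API, and the function names are hypothetical. It assumes the "ignore" behavior, where a field of the wrong kind results in a null prototype rather than a trap.

```typescript
// Hypothetical marker class used to opt in explicitly; not a real API.
class DescriptorOptions {
  constructor(readonly proto: object | null) {}
}

function isObject(v: unknown): v is object {
  return (typeof v === "object" && v !== null) || typeof v === "function";
}

// Without a wrapper ("duck typed"): any object found in the first field
// is used as the prototype; anything else yields a null prototype.
function prototypeFromField(firstField: unknown): object | null {
  return isObject(firstField) ? firstField : null;
}

// With a wrapper (explicit opt-in): only a DescriptorOptions instance
// configures a prototype; other objects in the first field are ignored.
function prototypeFromOptions(firstField: unknown): object | null {
  return firstField instanceof DescriptorOptions ? firstField.proto : null;
}

const someObject = { kind: "unrelated data" };
console.log(prototypeFromField(someObject) === someObject);   // picked up implicitly
console.log(prototypeFromOptions(someObject) === null);       // ignored without opt-in
console.log(prototypeFromOptions(new DescriptorOptions(someObject)) === someObject);
```

The wrapper variant costs one extra object per descriptor, which is the trade-off raised above against having an unmarked field interpreted implicitly.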

TL: yeah you could stamp the prototype object with a hidden property. Was surprised to find out though that it’s more expensive than just a wrapper object.

I would like a straw poll because there are a lot of people who haven’t said anything but might have an opinion. Totally nonbinding, just a sense of how people feel.

Would like to know how many are in favor, neutral, or against (F/N/A). Let’s lump the “prototype in the descriptor chain” options together as 1A: any variant where there's an explicit field is 1A, and variants where there’s a hidden field are the other one.

Just type whatever you have an opinion on, e.g. 1AF, 1BN etc

| Option | F | N | A |
|--------|---|---|---|
| 1A: Prototype as Descriptor Field | 4 | 5 | 0 |
| 1B: Prototype as Operand | 4 | 1 | 2 |
| 1C: Prototype via Imported Functions | 2 | 3 | 3 |
| 1D: Ubiquitous Prototypes | 2 | 2 | 1 |
| 1E: Prototype via Direct Association | 2 | 1 | 4 |

Next time we’ll talk about problem 2.

Closure