Conversation
Again, I could not test this, but dae41d2 is how I would organize it. Notes: we're cutting a dense, no-bias logits layer off the end and swapping `cross_entropy` for `l1_loss`. This is a price I would pay for being able to run on non-token data.
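A rough sketch of that loss swap, with made-up shapes and variable names (not the repo's actual code), assuming the model now emits continuous features instead of vocab logits:

```python
import torch
import torch.nn.functional as F

batch, seq, dim, vocab = 2, 16, 64, 256  # hypothetical sizes

# Token objective (what the dense, no-bias logits layer fed into):
logits = torch.randn(batch, seq, vocab)
targets = torch.randint(0, vocab, (batch, seq))
token_loss = F.cross_entropy(logits.transpose(1, 2), targets)

# Continuous objective after the swap: compare predicted features
# against target features directly with an L1 loss.
pred_feats = torch.randn(batch, seq, dim)
target_feats = torch.randn(batch, seq, dim)
continuous_loss = F.l1_loss(pred_feats, target_feats)
```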
@jordandekraker hey Jordan, thanks for the pull request! any chance you could disentangle this so it can support both language modeling and your use-case (with a few tests)? are you seeing something with this architecture on continuous data?
@jordandekraker it may be faster if i just build it for you, but you'll have to share with me what you are seeing. just reach out over Signal |
The README code worked for me. The changes to get away from tokens MAY be a bit deeper than I thought: I'm not sure whether additional classes and modules will also need to be updated to have a features dimension. Even though I cannot run it all locally, I think I prefer to chat via GitHub, but I can move if that is prohibitive for you.
This is a pretty simple PR: it removes the embedder so people can feed in other (flattened) data types, such as embedded video frames, audio, or other continuous inputs. We also remove the softmax and logits. From the sampler, we remove `min_p_filter` and `gumbel_sample`. For tokens, it is recommended to do embedding and logits outside `mac_transformer.py`. I wasn't able to run `train_mac.py` due to incompatible dependencies, so I left it alone, but it should be simple to add an embedder and logits there. `min_p_filter` and `gumbel_sample` could possibly be added back in somewhere else (utils?).
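A minimal sketch of doing embedding and logits outside the model, as recommended above. The `core` module stands in for the modified transformer (continuous features in, continuous features out); the wrapper class and its arguments are placeholders, not the repo's actual API:

```python
import torch
from torch import nn

class TokenWrapper(nn.Module):
    """Wraps a continuous-in / continuous-out transformer with an external
    embedder and logits head, so token data still works after this PR."""

    def __init__(self, core: nn.Module, num_tokens: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)                # replaces the removed embedder
        self.core = core                                          # e.g. the modified mac_transformer module
        self.to_logits = nn.Linear(dim, num_tokens, bias=False)   # replaces the removed logits layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        feats = self.embed(token_ids)   # (batch, seq) -> (batch, seq, dim)
        feats = self.core(feats)        # continuous features in, continuous features out
        return self.to_logits(feats)    # back to vocab logits for cross-entropy / sampling
```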