Discussion about this post

Michael C

Your piecewise smoothing embedding may not work as expected on language models. Have you tried sending some text input to your model and checking whether the output makes any sense?

Since your model is fundamentally different from current transformers, I wonder whether it is even valid to compare the two models' training errors. It's not an apples-to-apples comparison.
