typo
#90 by jvelja - opened
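In the Activation Memory section, the calculation for $$m_{act}$$ is the following:

$$m_{act} = 34 \cdot L \cdot seq \cdot bs \cdot h + 5 \cdot L \cdot bs \cdot n_{heads} \cdot seq^2$$

The text states that this scales linearly with $$seq$$ and $$bs$$. It does scale linearly with $$bs$$, but it actually scales quadratically with $$seq$$: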
- The first term, $$34 \cdot L \cdot seq \cdot bs \cdot h$$, scales linearly with $$seq$$.
- The second term, $$5 \cdot L \cdot bs \cdot n_{heads} \cdot seq^2$$, scales quadratically with $$seq$$.
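To make the point concrete, here is a minimal sketch that evaluates the two terms separately for a few sequence lengths. The function name `activation_memory_terms` and the hyperparameter values (`L=32`, `h=4096`, `n_heads=32`, `bs=1`) are my own illustrative assumptions, not values taken from the section; only the two expressions come from the formula quoted above.

```python
def activation_memory_terms(L, seq, bs, h, n_heads):
    """Return the two terms of the activation-memory formula separately.

    Hypothetical helper for illustration: the expressions are the two
    terms quoted in this discussion; the name and signature are assumed.
    """
    linear_term = 34 * L * seq * bs * h                # linear in seq
    quadratic_term = 5 * L * bs * n_heads * seq ** 2   # quadratic in seq
    return linear_term, quadratic_term


if __name__ == "__main__":
    # Illustrative (assumed) model shape, not taken from the section.
    L, h, n_heads, bs = 32, 4096, 32, 1
    for seq in (2048, 4096, 8192, 16384):
        lin, quad = activation_memory_terms(L, seq, bs, h, n_heads)
        print(f"seq={seq:>6}  linear={lin:.3e}  "
              f"quadratic={quad:.3e}  quad/linear={quad / lin:.2f}")
```

Each time `seq` doubles, the first term doubles while the second quadruples, so the ratio between them grows in proportion to `seq` and the quadratic term dominates at long sequence lengths.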
eliebak changed discussion status to closed