import streamlit as st
from st_pages import add_indentation
add_indentation()
st.title('Loss functions')
st.markdown('In order to align textual and visual features, multiple loss functions are employed. '
            'The most notable one, the SDM loss, was proposed in [arXiv: Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval](https://arxiv.org/abs/2303.12501), '
            'which also makes use of the IRR (Implicit Relation Reasoning) loss.')
with st.expander('SDM Loss'):
    st.markdown('''
The similarity distribution matching (SDM) loss is the KL divergence from the predicted
image-to-text and text-to-image matching distributions to the true label distribution.
We define $f^v$ and $f^t$ to be the global representations of the visual and textual features, respectively.
The cosine similarity $sim(u, v) = \\frac{u \\cdot v}{|u||v|}$ is used to compute the matching probabilities.
We define $y_{i, j}=1$ if the visual feature $f^v_i$ matches the textual feature $f^t_j$, else $y_{i, j}=0$.
For a batch of $N$ image-text pairs, the predicted matching distribution of image $i$ over all texts
is a temperature-scaled softmax of the cosine similarities:''')
    st.latex(r'''
    p_{i, j} = \frac{\exp(sim(f^v_i, f^t_j) / \tau)}{\sum_{k=1}^{N} \exp(sim(f^v_i, f^t_k) / \tau)}
    ''')
    st.markdown('''
We can define the image-to-text loss as
''')
    st.latex(r'''
    \mathcal{L}_{i2t} = KL(\mathbf{p_i} || \mathbf{q_i})
    ''')
    st.markdown('where $\\mathbf{q_i}$, the true probability distribution, is defined as')
    st.latex(r'''
    q_{i, j} = \frac{y_{i, j}}{\sum_{k=1}^{N} y_{i, k}}
    ''')
    st.markdown('This normalization is needed because an image can have more than one correct label: '
                'if image $i$ matches texts $j=1$ and $j=3$, for example, then $q_{i, 1} = q_{i, 3} = \\frac{1}{2}$ and all other $q_{i, k} = 0$.')
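    # Hypothetical illustration (not taken from the paper's code): how q could be built from
    # person-identity labels `pid` in PyTorch; the variable names here are assumptions.
    st.markdown('As a rough sketch, assuming a tensor `pid` holding the identity label of each of the $N$ pairs, '
                '$q$ could be built as:')
    st.code('''
labels = (pid.view(-1, 1) == pid.view(1, -1)).float()  # y_{i,j}: 1 if identities match
q = labels / labels.sum(dim=1, keepdim=True)           # q_{i,j}: normalized over the matches
''', language='python')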
    st.markdown('The SDM loss can then be formulated as')
    st.latex(r'''
    \mathcal{L}_{sdm} = \mathcal{L}_{i2t} + \mathcal{L}_{t2i}
    ''')
    st.markdown('where $\\mathcal{L}_{t2i}$ is the text-to-image loss, obtained symmetrically by swapping the roles of the visual and textual features.')
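    # Hypothetical illustration (not the authors' implementation): a compact PyTorch-style
    # rendering of the formulas above; the `tau` and `eps` defaults are assumptions.
    st.markdown('Putting the pieces together, a minimal sketch of the SDM loss could look like this '
                '(the temperature `tau` and the stabilizing `eps` are illustrative values):')
    st.code('''
import torch
import torch.nn.functional as F

def sdm_loss(f_v, f_t, pid, tau=0.02, eps=1e-8):
    """Sketch of the SDM loss for a batch of N image-text pairs."""
    f_v = F.normalize(f_v, dim=-1)                     # (N, D) global visual features
    f_t = F.normalize(f_t, dim=-1)                     # (N, D) global textual features
    sim = f_v @ f_t.t() / tau                          # cosine similarities / temperature

    labels = (pid.view(-1, 1) == pid.view(1, -1)).float()
    q = labels / labels.sum(dim=1, keepdim=True)       # true matching distribution q_{i,j}

    p_i2t = F.softmax(sim, dim=1)                      # image-to-text distribution p_{i,j}
    p_t2i = F.softmax(sim.t(), dim=1)                  # text-to-image distribution

    loss_i2t = (p_i2t * (torch.log(p_i2t + eps) - torch.log(q + eps))).sum(dim=1).mean()
    loss_t2i = (p_t2i * (torch.log(p_t2i + eps) - torch.log(q + eps))).sum(dim=1).mean()
    return loss_i2t + loss_t2i
''', language='python')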
with st.expander('IRR (MLM) Loss'):
    ...
with st.expander('ID Loss'):
    ...