import streamlit as st
from st_pages import add_indentation

add_indentation()

st.title('Loss functions')

st.subheader('SDM Loss')
st.markdown('''
The similarity distribution matching (SDM) loss is the KL divergence between the predicted
image-to-text (and text-to-image) similarity distributions and the true label distribution.
We define $f^v$ and $f^t$ to be the global representations of the visual and textual features
respectively. The cosine similarity $sim(u, v) = \\frac{u \\cdot v}{|u||v|}$ is used to compute
the matching probabilities. We define $y_{i, j}=1$ if the visual feature $f^v_i$ matches the
textual feature $f^t_j$, else $y_{i, j}=0$. The predicted label distribution can be formulated
as''')
st.latex(r'''
p_{i} = \sigma(sim(f^v_i, f^t))
''')
st.markdown('''
where $\\sigma$ denotes the softmax over the similarities to all $N$ textual features in the
batch. We can then define the image-to-text loss as
''')
st.latex(r'''
\mathcal{L}_{i2t} = KL(\mathbf{p_i} || \mathbf{q_i})
''')
st.markdown('where $\\mathbf{q_i}$, the true probability distribution, is defined as')
st.latex(r'''
q_{i, j} = \frac{y_{i, j}}{\sum_{k=1}^{N} y_{i, k}}
''')
st.markdown('''
This normalization is needed because a single image can match several texts in the batch,
i.e. there may be multiple correct labels per row. The text-to-image loss $\\mathcal{L}_{t2i}$
is defined symmetrically, and the SDM loss sums the two directions. A minimal reference
sketch of this computation is given at the end of this page.
''')

st.subheader('IRR (MLM) Loss')

st.subheader('ID Loss')
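
st.subheader('Reference sketch: SDM loss')
st.markdown('''
A minimal PyTorch-style sketch of the image-to-text SDM loss described above. The function
name, argument names, and the use of an explicit match matrix `y` are illustrative
assumptions, not a verbatim implementation.
''')
st.code('''
import torch
import torch.nn.functional as F

def sdm_i2t_loss(image_feats, text_feats, y, eps=1e-8):
    # image_feats, text_feats: (N, D) global features f^v and f^t
    # y: (N, N) binary match matrix with y[i, j] = 1 if f^v_i matches f^t_j
    # (names and the eps smoothing term are illustrative assumptions)
    y = y.float()

    # q_{i,j}: normalize each row so the true distribution sums to 1
    q = y / y.sum(dim=1, keepdim=True)

    # cosine similarities sim(f^v_i, f^t_j)
    sim = F.normalize(image_feats, dim=1) @ F.normalize(text_feats, dim=1).t()

    # p_i = softmax over all texts in the batch
    # (implementations often divide sim by a temperature before the softmax)
    log_p = F.log_softmax(sim, dim=1)
    p = log_p.exp()

    # KL(p_i || q_i) = sum_j p_{i,j} * (log p_{i,j} - log q_{i,j}), averaged over the batch
    return (p * (log_p - torch.log(q + eps))).sum(dim=1).mean()
''', language='python')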