import streamlit as st from src.configs import Languages def write(*args): # ==== HOW IT WORKS ==== # with st.beta_container(): st.markdown("") st.markdown("") st.markdown( """ Wordify makes it easy to identify words that discriminate categories in textual data. Let's explain Wordify with an example. Imagine you are thinking about having a glass of wine :wine_glass: with your friends :man-man-girl-girl: and you have to buy a bottle. You know you like `bold`, `woody` wine but are unsure which one to choose. You wonder whether there are some words that describe each type of wine. Since you are a researcher :female-scientist: :male-scientist:, you decide to approach the problem scientifically :microscope:. That's where Wordify comes to the rescue! """ ) st.markdown("") st.markdown("") st.header("Steps") st.subheader("Step 1 - Prepare your data") st.markdown( """ Create an Excel or CSV file with two columns for each row: - a column with the name or the label identifying a specific object or class (e.g., in our wine example above it would be the type of wine or the name of a specific brand). It is common practice naming this column `label` - a column with the text describing that specific object or class (e.g., in the wine example above it could be the description that you find on the rear of the bottle label). It is common practice naming this column `text` To have reliable results, we suggest providing at least 2000 labelled texts. If you provide less we will still wordify your file, but the results should then be taken with a grain of salt. Consider that we also support multi-language texts, therefore you'll be able to automatically discriminate between international wines, even if your preferred Italian producer does not provide you with a description written in English! """ ) st.subheader("Step 2 - Upload your file and Wordify!") st.markdown( """ Once you have prepared your Excel or CSV file, click the "Browse File" button. Browse for your file. Choose the language of your texts (select multi-language if your file contains text in different languages). Push the "Wordify|" button, set back, and wait for wordify to do its tricks. Depending on the size of your data, the process can take from 1 minute to 5 minutes """ ) # ==== FAQ ==== # with st.beta_container(): st.markdown("") st.markdown("") st.header(":question:Frequently Asked Questions") with st.beta_expander("What is Wordify?"): st.markdown( """ Wordify is a way to find out which terms are most indicative for each of your dependent variable values. """ ) with st.beta_expander("What happens to my data?"): st.markdown( """ Nothing. We never store the data you upload on disk: it is only kept in memory for the duration of the modeling, and then deleted. We do not retain any copies or traces of your data. """ ) with st.beta_expander("What input formats do you support?"): st.markdown( """ The file you upload should be .xlsx, with two columns: the first should be labeled 'text' and contain all your documents (e.g., tweets, reviews, patents, etc.), one per line. The second column should be labeled 'label', and contain the dependent variable label associated with each text (e.g., rating, author gender, company, etc.). """ ) with st.beta_expander("How does it work?"): st.markdown( """ It uses a variant of the Stability Selection algorithm [(Meinshausen and Bühlmann, 2010)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2010.00740.x) to fit hundreds of logistic regression models on random subsets of the data, using different L1 penalties to drive as many of the term coefficients to 0. Any terms that receive a non-zero coefficient in at least 30% of all model runs can be seen as stable indicators. """ ) with st.beta_expander("How much data do I need?"): st.markdown( """ We recommend at least 2000 instances, the more, the better. With fewer instances, the results are less replicable and reliable. """ ) with st.beta_expander("Is there a paper I can cite?"): st.markdown( """ Yes please! Reference coming soon... """ ) with st.beta_expander("What languages are supported?"): st.markdown( f""" Currently we support: {", ".join([i.name for i in Languages])}. """ )