# installed pip packages
# pip install streamlit
# pip install beautifulsoup4
# pip install docx2txt
# pip install pypdf2
# pip install pdfplumber

import streamlit as st

# File Processing pkgs
from PIL import Image
import requests
from bs4 import BeautifulSoup
import json
import docx2txt
# import textract
from PyPDF2 import PdfFileReader
import pdfplumber


# ---- LOAD ASSETS ----
img_page_icon = Image.open("images/web_icon.jpeg")

# Find more emojis here: https://www.webfx.com/tools/emoji-cheat-sheet/
st.set_page_config(page_title="OdiaGenAI ", page_icon=img_page_icon, layout="wide")


# Load CSS file
def load_css(file_path):
    # Inject the stylesheet contents into the page inside a <style> tag
    with open(file_path) as f:
        st.markdown(f"<style>{f.read()}</style>", unsafe_allow_html=True)


# Load CSS file
load_css('styles.css')

# ---- HEADER SECTION ----
with st.container():
    st.subheader("Hi, username :wave:")
    st.write("##")
    st.markdown("
                # <p> tags within the div_tag
                p_tags = body.find_all('p')

                # Extract the text content from each <p> tag
                paragraphs = [p.get_text(strip=True) for p in p_tags]
                paragraphs = '\n'.join(paragraphs)

                news = news + (headline + '\n\n' + paragraphs)

                # Display the extracted text content from url
                st.text_area("Extracted Text", value=news, height=200)
            else:
                st.error("Error: Unable to fetch content from the provided URL.")
        except requests.exceptions.RequestException as e:
            st.error("Error: An exception occurred while fetching content from the URL.")

    # Check if the user has provided a document.
    # With accept_multiple_files=True the uploader returns a list (empty when
    # nothing is uploaded), so test truthiness rather than `is not None`.
    elif documents:
        for document in documents:
            document_details = {
                "filename": document.name,
                "filetype": document.type,
                "filesize": document.size,
            }
            st.write(document_details)

            # Extract content from the txt file
            if document.type == "text/plain":
                # Read as bytes and decode as UTF-8
                news += str(document.read(), "utf-8")

            # Extract content from the pdf file
            elif document.type == "application/pdf":
                # using PyPDF2
                # news += read_pdf(document)

                # using pdfplumber
                try:
                    with pdfplumber.open(document) as pdf:
                        all_text = ""
                        for page in pdf.pages:
                            # extract_text() may return None for a page with no
                            # extractable text, depending on the pdfplumber version
                            text = page.extract_text() or ""
                            all_text += text + "\n"
                        news += all_text
                except Exception:
                    st.error(f"Error: Unable to extract text from {document.name}.")

            # Extract content from the docx file
            elif document.name.lower().endswith(".docx"):
                news += docx2txt.process(document)

            # The uploader also accepts png/jpg/jpeg, which have no text extractor here
            else:
                st.warning(f"Skipped {document.name}: no text extractor for this file type.")

        # Display the extracted text content from file
        st.text_area("Extracted Text", value=news, height=200)
    else:
        st.error("Error: An error occurred while fetching content.")


col1, col2, col3 = st.columns([0.6, 0.2, 0.2])

with col1:
    url = st.text_input(label='', placeholder="Enter URL")

with col2:
    documents = st.file_uploader("", type=["png", "jpg", "jpeg", "pdf", "txt", "docx"], accept_multiple_files=True)

with col3:
    b = st.button("Enter")

if b:
    run_function(url, documents)