AIDXteam committed on
Commit 150fb85 · verified · 1 Parent(s): c792e3f

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -58,3 +58,61 @@ pipeline_tag: text-generation
58
  </code></pre>
59
 
60

61
+
62
+
63
+ ---
64
+
65
+ ## ❶ Model Description
66
+
67
+ **Model Name and Key Features**:
68
+ KTDSbaseLM v0.11 is built on OpenChat 3.5, which is itself a fine-tune of Mistral 7B, and was further trained with supervised fine-tuning (SFT).
69
+ It is designed to understand Korean and various cultural contexts, utilizing data from 135 domains in Korean society.
70
+ The model supports tasks such as text generation, conversation inference, document summarization,
71
+ question answering, sentiment analysis, and other NLP tasks.
72
+ Its applications span fields like law, finance, science, education, business, and cultural research.
73
+
74
+ **Model Architecture**:
75
+ KTDSbaseLM v0.11 is a 7-billion-parameter language model built on the Mistral 7B architecture.
76
+ It uses OpenChat 3.5 as its foundation and is fine-tuned with SFT to specialize in Korean language and culture.
77
+ The compact Mistral 7B architecture is designed for fast, memory-efficient inference
78
+ across NLP tasks such as text generation, question answering, document summarization, and sentiment analysis.
79
+
80
+ ---
81
+
82
+ ## ❷ Training Data
83
+
84
+ KTDSbaseLM v0.11 was trained on 3.6 GB of data comprising 2.33 million Q&A instances.
85
+ This includes 1.33 million multiple-choice questions across 135 domains such as history,
86
+ finance, law, tax, and science, which were trained with the Chain-of-Thought (CoT) method. Additionally,
87
+ 1.3 million short-answer questions cover 100 domains including history, finance, and law.
88
+
89
+ **Training Instruction Dataset Format**:
90
+ `{"prompt": "prompt text", "completion": "ideal generated text"}`
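As an illustration of the record format above, the following is a minimal Python sketch that parses one record and joins its two fields into a single training string. The helper names `parse_record` and `to_training_text`, the newline separator, and the one-record-per-line (JSON Lines) layout are assumptions made for this example; the README does not describe how the actual training pipeline reads or templates the data.

```python
import json

REQUIRED_KEYS = ("prompt", "completion")


def parse_record(line: str) -> dict:
    """Parse one JSON record and check it has the two documented fields.

    Hypothetical helper: the name and the validation rules are assumptions.
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return record


def to_training_text(record: dict, separator: str = "\n") -> str:
    """Join prompt and completion into one string.

    The separator is an assumption; the real chat template may differ.
    """
    return record["prompt"] + separator + record["completion"]


# Sample record copied from the format shown in this README.
sample = '{"prompt": "prompt text", "completion": "ideal generated text"}'
record = parse_record(sample)
text = to_training_text(record)
```

Validating the two keys up front makes malformed records fail early instead of surfacing later as a `KeyError` during training.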
91
+
92
+ ---
93
+
94
+ ## ❸ Use Cases
95
+
96
+ KTDSbaseLM v0.11 can be used across multiple fields, such as:
97
+
98
+ - **Education**: Answering questions and generating explanations for subjects like history, math, and science.
99
+ - **Business**: Providing responses and summaries for legal, financial, and tax-related queries.
100
+ - **Research and Culture**: Performing NLP tasks, sentiment analysis, document generation, and translation.
101
+ - **Customer Service**: Generating conversations and personalized responses for users.
102
+
103
+ These use cases all draw on the same core capabilities: text generation, question answering, summarization, and sentiment analysis.
104
+
105
+ ---
106
+
107
+ ## ❹ Limitations
108
+
109
+ KTDSbaseLM v0.11 is specialized in Korean language and culture.
110
+ As a result, it may be less accurate on topics outside that scope,
111
+ such as international subjects or highly specialized domains.
112
+ It may also show limited reasoning ability on complex logical problems and
113
+ may produce biased responses where its training data is biased.
114
+
115
+ ---
116
+
117
+ ## ❺ Usage Instructions
118
+