---
library_name: transformers
license: mit
base_model:
- XeTute/Phantasor_V0.1-137M
tags:
- llama-factory
- full
- generated_from_trainer
- story
- tiny
- chinese
- english
datasets:
- Chamoda/atlas-storyteller-1000
- jaydenccc/AI_Storyteller_Dataset
- zxbsmk/webnovel_cn
- XeTute/Pakistan-China-Alpaca
language:
- zh
- en
pipeline_tag: text-generation
---

> [!TIP]
> This model is still in its testing phase. We don't recommend it for high-end production environments; it is purely a story-generation model.
> It was trained using LLaMA-Factory by Asadullah Hamzah at XeTute Technologies.

# Phantasor V0.2
We introduce Phantasor V0.2, the continuation of [Phantasor V0.1](https://huggingface.co/XeTute/Phantasor_V0.1-137M). It has been trained on top of V0.1 using a new dataset (more details below) together with the old datasets.
Licensed under MIT, feel free to use it in your personal projects, both commercially and privately. Since this is V0.2, we're open to feedback to improve our project(s).
*The Chat-Template used is Alpaca. For correct usage, insert your prompt as a **system** prompt. The model can also be used without any template to continue a sequence of text.*
[You can find the GGUF version here.](https://huggingface.co/XeTute/Phantasor_V0.2-137M-GGUF)
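
For a quick start, here is a minimal generation sketch using 🤗 Transformers. Only the model ID comes from this card; the exact Alpaca template wording and the sampling settings are assumptions for illustration:

```python
# Hedged usage sketch, not an official example: the model ID is from this card,
# but the Alpaca template wording and sampling settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XeTute/Phantasor_V0.2-137M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Alpaca-style prompt: the story request goes into the instruction slot.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short story about a lighthouse keeper.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```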

## Example =)
```txt
Coming later
```

```txt
Coming later
```

## Training
This model was trained for exactly 4.0 epochs, with all parameters unfrozen, on every sample and token in (a loading sketch follows the list):
- [Chamoda/atlas-storyteller-1000](https://huggingface.co/datasets/Chamoda/atlas-storyteller-1000)
- [jaydenccc/AI_Storyteller_Dataset](https://huggingface.co/datasets/jaydenccc/AI_Storyteller_Dataset)
- [zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)
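
As a rough sketch, this mix can be pulled with the 🤗 `datasets` library; the splits and columns are whatever each dataset ships with, and nothing below reproduces the exact preprocessing of this run:

```python
# Sketch only: fetches the three datasets listed above with their default
# splits; the actual run's preprocessing is not reproduced here.
from datasets import load_dataset

story_sets = [
    "Chamoda/atlas-storyteller-1000",
    "jaydenccc/AI_Storyteller_Dataset",
    "zxbsmk/webnovel_cn",
]

for repo_id in story_sets:
    ds = load_dataset(repo_id)  # downloads and caches the dataset
    print(repo_id, {name: len(split) for name, split in ds.items()})
```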

Following is the loss curve, updated with each training step over all four epochs.
![training_loss.png](https://huggingface.co/XeTute/Phantasor_V0.2-137M/resolve/main/training_loss.png)
Instead of AdamW, which is often used for large GPTs, we used **SGD**. This helped the model generalize better, which can be seen when prompting it outside the training data.
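
To make that optimizer choice concrete, here is how the swap looks in plain PyTorch; the module and the learning rates are placeholders, not the values used for this run:

```python
# Placeholder module and learning rates; only the optimizer choice itself
# reflects what this card describes.
import torch

model = torch.nn.Linear(768, 768)  # stand-in for the full 137M-parameter network

# The common default for large GPTs: adaptive per-parameter moments
# plus decoupled weight decay.
adamw = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# What this run used instead: plain SGD applies the raw gradient to every
# trainable parameter, with no per-parameter moment estimates.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3)
```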

## Finished Model
- ~137M Parameters, all of which are trainable
- 1024-token (1k) context length, all of which was used during training
- A loss of ~1.2 on all samples (see Files => train_results.json)

This is very good performance for a V0.2.
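
Both figures can be checked straight from the checkpoint. The sketch below assumes a GPT-2-style config that exposes the context length as `max_position_embeddings`:

```python
# Quick sanity check of the parameter count and context length listed above;
# assumes the config stores the context length as max_position_embeddings.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "XeTute/Phantasor_V0.2-137M"
config = AutoConfig.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(f"parameters: {model.num_parameters():,}")            # ~137M
print(f"context length: {config.max_position_embeddings}")  # 1024
```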

# Our platforms
## Socials
[BlueSky](https://bsky.app/profile/xetute.bsky.social) | [YouTube](https://www.youtube.com/@XeTuteTechnologies) | [HuggingFace 🤗](https://huggingface.co/XeTute) | [Ko-Fi / Financially Support Us](https://ko-fi.com/XeTute)

## Our Platforms
[Our Webpage](https://xetute.com) | [PhantasiaAI](https://xetute.com/PhantasiaAI)

Have a great day!