---
license: apache-2.0
language:
- en
tags:
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- romance
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- roleplaying
- bfloat16
- swearing
- role play
- sillytavern
- backyard
- horror
- llama 3.1
- context 128k
- mergekit
pipeline_tag: text-generation
---

(quants uploading, examples to follow)

<B><font color="red">WARNING:</font> NSFW. Vivid prose. INTENSE. Visceral Details. Violence. Graphic HORROR. GORE. Swearing. UNCENSORED. </B>

<h2>L3.1-RP-Hero-Dirty_Harry-8B-GGUF</h2>

<img src="rp-talker.jpg" style="float:right; width:300px; height:300px; padding:10px;">

This is a Llama 3.1 model with a maximum context of 128k (131,072 tokens). It is a dedicated roleplay model, but it can also be used for general creative work.

This model has been designed to be relatively bulletproof: it operates across all parameter ranges, including temp settings from 0 to 5.

It is an extraordinarily compressed model with a very low perplexity level (lower than Meta's Llama 3.1 Instruct).

This model can be used for any writing, fiction, or roleplay activity, but it is composed of roleplay models and is primarily designed for roleplay.

It also has stronger-than-average instruction-following attributes.

This is the "Dirty Harry" version; two companion versions are also available: "InBetween" and "Big Talker".

InBetween (medium output generation on average):

[ https://huggingface.co/DavidAU/L3.1-RP-Hero-InBetween-8B-GGUF ]

Big Talker (long output generation on average):

[ https://huggingface.co/DavidAU/L3.1-RP-Hero-BigTalker-8B-GGUF ]

"Dirty Harry" has SHORT (on average) output length and is uncensored (note: InBetween has a slight degree of censorship).

"Dirty Harry" also has a slightly higher detail level than "InBetween", on par with "Big Talker".

All versions are composed of top-rated roleplay models.

This model, as well as the other two versions, can be used for any creative genre too.

It requires the Llama 3 template and/or the "Command-R" template.

For roleplay settings, and apps to use this model for roleplay, see the section "Highest Quality Settings..." below.

Example outputs are included below to show prose quality / creativity.

<B>Model Notes:</B>

- Detail, prose, and fiction writing abilities are significantly improved.
- For more varied prose (sentence/paragraph/dialog), raise the temp and/or add more instructions in your prompt(s).
- Role-players: Be careful raising temp too high, as it may affect instruction following.
- This model works with rep pen of 1 or higher; 1.02+ is recommended.
- If you want a specific type of prose (e.g. horror), add "(vivid horror)" or "(graphic vivid horror)" (no quotes) to your prompt(s).
- This model has a neutral-to-negative bias, BUT it can be controlled directly by prompt/prose controls.
- Output length will vary; however, this model prefers SHORT outputs EVEN IF you state the desired size.
- For creative uses, different quants will produce slightly different output.
- Due to the high stability and compressed nature of this model, all quants will operate at above-average levels.
- Source code for this model will be uploaded to a separate repo shortly.

<B>Settings, Quants and Critical Operations Notes:</B>

Changes in temp (e.g., .4, .8, 1.5, 2, 3) will drastically alter output.

Rep pen settings will also alter output.

This model needs a rep pen of 1.05 or higher, as lower values may cause repeated paragraphs at the end of output; however, LOWER rep pen values may also result in very different (creative / unusual) generation.

For role play: a rep pen of 1.02 minimum is suggested.

Raise/lower rep pen SLOWLY, e.g.: 1.011, 1.012 ...

Rep pen will alter prose, word choice (lower rep pen = smaller / more common words, sometimes), and creativity.

To really push the model:

Rep pen 1.05+ or lower / Temp 3+ ... be ready to stop the output, because it may go and go at these strong settings.

You can also set a "hard stop" (maximum tokens generated) to address lower rep pen settings / high creativity settings.

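As a concrete illustration, the settings above can be passed straight to a runner such as llama.cpp's `llama-cli`. This is a minimal sketch of assembling that invocation; the GGUF filename is a placeholder, and the defaults mirror the values suggested in this card:

```python
# Sketch: build a llama.cpp `llama-cli` invocation from this card's settings.
# The model filename is a placeholder, not a real file in this repo.
def build_llama_cli_args(model_path, temp=1.3, repeat_penalty=1.02,
                         max_tokens=512, ctx_size=8192):
    return [
        "llama-cli",
        "-m", model_path,                         # GGUF quant to load
        "--temp", str(temp),                      # higher temp = more varied prose
        "--repeat-penalty", str(repeat_penalty),  # 1.02+ for roleplay, 1.05+ otherwise
        "-n", str(max_tokens),                    # the "hard stop" on output length
        "-c", str(ctx_size),                      # context window (model supports up to 131072)
    ]

args = build_llama_cli_args("model.Q4_K_M.gguf")
```

Lower `-n` is the simplest guard when experimenting with low rep pen / high temp combinations.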
Longer prompts vastly increase the quality of the model's output.

GET A GOOD "GENERATION":

This model has been set up so that each time you "regen" a prompt it will not deviate too much from the previous generation (unlike Darkest Planet 16.5B, which will).

That being said, sometimes a second or third generation will be of much higher overall quality.

For example:

If your use case is creative writing, you may want to regen a prompt 1-5 times and then pick the best one. The best way to do this is to open a new chat PER generation, then do a "read thru" to see which one(s) hit the mark.

Then adjust temp and/or rep pen slightly and retry this process.

The goal, in this example, is the best generation with the least amount of editing.

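The regen-and-pick workflow above is just a best-of-N loop. A minimal sketch, where `generate` and `score` are stand-ins (in practice `generate` would query the model in a fresh chat, and `score` would be your own read-thru judgment):

```python
# Sketch of the regen-and-pick workflow as a best-of-N loop.
# `generate` and `score` are hypothetical stand-ins for the model call
# and your own quality judgment.
def best_of_n(generate, score, n=5):
    """Generate n candidates and return the one that scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy demonstration with stubbed drafts, scored by length:
drafts = iter(["flat opening", "a vivid, visceral opening", "ok opening"])
pick = best_of_n(lambda: next(drafts), score=len, n=3)
```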
QUANTS:

Higher quants will have more detail and nuance and, in some cases, stronger "emotional" levels. Characters will also be more "fleshed out", and the sense of "being there" will increase.

Q4_K_M/Q4_K_S are good, strong quants; however, if you can run Q5, Q6 or Q8 - go for the highest quant you can.

IQ4_XS: Due to the unusual nature of this quant (mixture/processing), generations from it will be different from other quants.

You may want to try it / compare it to other quants' output.

Special note on Q2_K/Q3 quants:

You may need to use temp 2 or lower with these quants (1 or lower for Q2_K). There is just too much compression at this level, damaging the model. I will see if Imatrix versions of these quants function better.

Rep pen adjustments may also be required to get the most out of this model at these quant levels.

ARM QUANTS:

This repo has 3 ARM quants for computers that can run them. If you use these quants on a non-ARM computer, your tokens per second will be very low.

<B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>

In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":

Set the "Smoothing_factor" to 1.5 to 2.5.

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: in Silly Tavern this is called: "Smoothing"

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model).

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor").
- If the interface/program you are using to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted.

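For the curious: "Quadratic Sampling" reshapes the logits with a quadratic centered on the top token, so the favourite is untouched while weaker candidates are pushed down harder as the factor rises. A minimal sketch, assuming the commonly used formula (the exact transform in your backend may differ):

```python
import math

def quadratic_smoothing(logits, smoothing_factor):
    # Quadratic centered on the top logit: the top token is unchanged,
    # lower-scoring tokens are pushed further down as the factor rises.
    # NOTE: this formula is an assumption about typical implementations,
    # not something specified by this model card.
    m = max(logits)
    return [m - smoothing_factor * (m - x) ** 2 for x in logits]

def softmax(xs):
    e = [math.exp(x - max(xs)) for x in xs]
    return [v / sum(e) for v in e]

logits = [2.0, 1.0, 0.0]
smoothed = quadratic_smoothing(logits, 1.5)   # factor in the suggested 1.5-2.5 range
```

With a factor of 1.5 the top logit stays at 2.0 while the others fall, so the resulting distribution is more peaked (more deterministic) than the raw one.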
<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses model issues), including methods to improve model performance for all use cases, as well as chat, roleplay and other use cases, please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

There you can also see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model.

<B>Templates:</B>

This is a LLAMA3 model; it requires the Llama 3 template, but may work with other templates. Maximum context is 128k / 131,072.

If you use the "Command-R" template your output will be very different from using the "Llama3" template.

Here is the standard LLAMA3 template:

<PRE>
{
  "name": "Llama 3",
  "inference_params": {
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "pre_prompt": "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.",
    "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
    "pre_prompt_suffix": "<|eot_id|>",
    "antiprompt": [
      "<|start_header_id|>",
      "<|eot_id|>"
    ]
  }
}
</PRE>
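If you are wiring the model up yourself rather than through a frontend, the template fields above concatenate into a raw prompt like so (a sketch; most loaders, including llama.cpp, prepend the BOS token themselves):

```python
# Sketch: assemble a raw Llama 3 prompt from the template fields above,
# the way a frontend would before sending text to the model.
TEMPLATE = {
    "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
    "pre_prompt_suffix": "<|eot_id|>",
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
}

def build_prompt(system_prompt, user_message):
    # system block, then user turn, then the open assistant header the
    # model completes; generation stops at the antiprompt strings.
    return (
        TEMPLATE["pre_prompt_prefix"] + system_prompt + TEMPLATE["pre_prompt_suffix"]
        + TEMPLATE["input_prefix"] + user_message + TEMPLATE["input_suffix"]
    )

prompt = build_prompt("You are a roleplay narrator.", "Continue the scene.")
```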

<B>Model "DNA":</B>

Special thanks for the incredible work of the model makers "ArliAI", "Casual-Autopsy", "Gryphe", and "aifeifei798".

Models used:

https://huggingface.co/ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.1

https://huggingface.co/Casual-Autopsy/L3-Umbral-Mind-RP-v0.3-8B

https://huggingface.co/Gryphe/Pantheon-RP-1.0-8b-Llama-3

https://huggingface.co/aifeifei798/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored

Parts of these models were "grafted" / "fused" together to create this model.

<B>Optional Enhancement:</B>

The following can be used in place of the "system prompt" or "system role" to further enhance the model.

It can also be used at the START of a NEW chat, but you must make sure it is "kept" as the chat moves along. In this case the enhancements do not have as strong an effect as using the "system prompt" or "system role".

Copy and paste EXACTLY as noted; DO NOT line wrap or break the lines, and maintain the carriage returns exactly as presented.

<PRE>
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
</PRE>

You do not need to use this; it is only presented as an additional enhancement, which seems to help scene generation and scene continue functions.

This enhancement WAS NOT used to generate the examples below.

<h3>EXAMPLES PROMPTS and OUTPUT:</h3>

Examples are created using quant Q4_K_M, "temp=1.3", "rep pen: 1.02" (unless otherwise stated), minimal parameters, and the "LLAMA3" template.

The model has been tested with "temp" from ".1" to "5".

Below are the least creative outputs; the prompt is in <B>BOLD</B>.

---

<B><font color="red">WARNING:</font> NSFW. Vivid prose. Visceral Details. Violence. HORROR. Swearing. UNCENSORED. </B>

---