
"Working on the future!"

— Leroy Dyer (1972-Present)

"To grow as a professional, set goals just beyond your current abilities. Achieving these milestones will not only overcome obstacles but also strengthen your skillset. If your tasks are too easy, you’ll never challenge yourself or improve, and life will pass you by!"

# SpydazWeb_AI_Text_AudioVision_Project : A Multi-Purpose Model!

In the creation of a model, a lot of time goes into considering which data to focus training on and which methodologies to deploy; that is a task in its own right. These trainings and fine-tunings can be stacked on top of each other, in some cases spoiling the output and in others enhancing it. But in truth the aim is not to be agentic at all! We need a single model which can perform ANY task, such that any type of task, even an unseen and untrained one, can be performed at will. We have attempted to handle this with agentic workflows, such as graphs, agents, tools, and functions; but in truth these are just shortcuts, and they remove capabilities from the model in place of programming or external data, essentially rewriting the output of the model or carving the model to produce a specific, censored answer. In fact censoring is not only about rudeness or lewd talk; it is about restricting the usage of the model, as some arenas are not ready to be gazumped by AI, especially those in well-paid positions.

We have found that frameworks like AutoGen and CrewAI, even Aider, allow for some organization and role play, which enables models to act like humans (satisfying a part of the original AI goals). The idea of agentic workflows and agentic connectivity as intelligent agents is old news, and those old powers are also trying to push their agendas (much like the ancient racists who have attempted to dominate history and other world content).

Today we are in a new generation of thought and action, and the goals of the past no longer apply. But we are also in a Python world, an untrained and uneducated arena: so it seems backwards is the new forwards.

There has been a flurry of multimodal models, so they have slowly caught up with us, but again they have basically only copied other models and not produced unique ones! They used LLaVA, as well as CLIP models etc., to make these multimodal vision models!

In fact we should examine these models carefully, as they are all replicas! Are they flooding us with bad, unusable models? (Some of the sizes are way oversized; they love to share what cannot be used!)

But in truth we need to realise that the 7B model has not been fully explored, and we still do not have a metric relating increased parameters to performance; it is only assumed, along with some FIXING! of results by training your model on the multiple-choice datasets. But no innovation, despite the unstable diffusions. So we need to create a new method! Hence Text Vision.

## Text Vision

Text Vision is a methodology developed for entering an image into your chat with a model.

In this method we choose to first convert the image into a text representation: Base64, a common encoding in Python and easy to replicate:


```python
import io
import base64
from PIL import Image
from datasets import load_dataset

# Function to convert a PIL Image to a base64 string
def image_to_base64(image):
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
    base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
    return base64_string

# Function to decode a base64 string back into a PIL Image
def decode_base64_to_image(base64_string, target_size=-1):
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    if image.mode in ('RGBA', 'P'):  # Normalise palette/alpha images to RGB
        image = image.convert('RGB')
    if target_size > 0:
        image.thumbnail((target_size, target_size))  # Optionally downscale in place
    return image
```
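As a quick sanity check, the two helpers round-trip cleanly; this snippet is purely illustrative and not part of the training pipeline:

```python
from PIL import Image

# Round-trip check: encode a solid-colour image, then decode it back
img = Image.new("RGB", (64, 64), color="blue")
b64 = image_to_base64(img)
restored = decode_base64_to_image(b64, target_size=32)
print(len(b64), restored.size)  # base64 string length, thumbnail size
```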



To process the records of the datasets into Base64 images, I selected a few basic datasets from the Hub which contain just a simple description and an image (in this case PNG):




```python
# Define a function to process each example in the dataset
def process_images_func(examples):
    texts = examples["text"]
    images = examples["image"]  # Assuming the images are in PIL format

    # Convert each image to base64
    base64_images = [image_to_base64(image) for image in images]

    # Return the updated examples with base64-encoded images
    return {
        "text": texts,
        "image_base64": base64_images  # Adding the Base64 encoded image strings
    }

# Load the dataset
dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")

# Process the dataset by converting images to base64
processed_dataset = dataset.map(process_images_func, batched=True)
```

After pushing them all to the Hub, they can now be loaded into the training script as usual:
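A minimal sketch of that push/reload step, assuming the `datasets` library; the repository name here is a hypothetical placeholder:

```python
# Push the processed dataset to the Hub (repository name is a placeholder)
processed_dataset.push_to_hub("your-username/chart_captioning_base64")

# Later, in the training script, reload it as usual
from datasets import load_dataset
train_data = load_dataset("your-username/chart_captioning_base64", split="train")
```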

Some basic prompts:

- Generate an image based on this description
- Describe this image : (base64)
- Generate a spectrographic image based on this description
- Describe this sound in this spectrographic image : (base64)

So perhaps my input formatting will be:

```
<Image> : (base64) </Image>
<Sound> : (base64) </Sound>
<Text> : Prompt </Text>
```
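A minimal sketch of assembling that input format; the helper is hypothetical and only illustrates the tag layout:

```python
# Hypothetical helper: assemble the tagged multimodal prompt described above
def build_prompt(text, image_b64=None, sound_b64=None):
    parts = []
    if image_b64 is not None:
        parts.append(f"<Image> : {image_b64} </Image>")
    if sound_b64 is not None:
        parts.append(f"<Sound> : {sound_b64} </Sound>")
    parts.append(f"<Text> : {text} </Text>")
    return "\n".join(parts)

# e.g. build_prompt("describe this image", image_b64=image_to_base64(img))
```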

## Text Audio

Here we can also define a sound as an image, by converting it too into a text representation.

First we convert the audio into a spectrographic image, i.e. a waveform image, so we can then continue as with an image. But because it is a sound, we should also let the model know we are giving it a sound, as perhaps we would like to generate these types of sounds later.

Hence these prompts:


- Generate a spectrographic image based on this description
- Describe this sound in this spectrographic image : (base64)

This is not so easy, so I will develop a method for it. Perhaps Whisper's approach, as it creates a mel spectrogram, which is the image used for its transformation; hence we can do the same, using those images as inputs and potential outputs. Currently I did not convert any sounds to mel or spectrographic images myself:

I located some datasets which already have them, so I used those!

They seemed to work the same as the image files, and currently the model is improving!
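For anyone converting audio themselves, a minimal sketch of rendering a mel spectrogram to a PNG, assuming librosa and matplotlib are installed; the file path and parameters are illustrative, and this snippet was not part of the actual pipeline:

```python
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load audio and compute a mel spectrogram (the Whisper-style representation)
y, sr = librosa.load("example.wav", sr=16000)  # hypothetical file path
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Render the spectrogram to a PNG; the saved image can then be encoded
# with image_to_base64() exactly like any other image
fig, ax = plt.subplots()
librosa.display.specshow(mel_db, sr=sr, ax=ax)
ax.set_axis_off()
fig.savefig("spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```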

## ADDED FUNCTIONALITY

The following added functionality was trained into the model:

- Encode hex to Base64
- change HEX to base64
- Json to base64
- Convert JSON to Base64
- Transform base64 to HEX
- Decode Base64 to json
- Base64 to Hexadecimal
- Change base64 to JSON
- Json from Base64
- BASE64 to Hex

### Encode

{"instruction": "Encode hex to Base64", "input": "ecfc2db9ba6049165b", "output": "7PwtubpgSRZb"}
{"instruction": "change HEX to base64", "input": "60926e782008", "output": "YJJueCAI"}
{"instruction": "Json to base64", "input": "[77,62,160,64,248,233,105,133,5,248,89,239]", "output": "TT6gQPjpaYUF+Fnv"}
{"instruction": "Change Json to BASE64", "input": "[10,59,42,251,112,1]", "output": "Cjsq+3AB"}
{"instruction": "Convert JSON to Base64", "input": "[236,201,129,100,238]", "output": "7MmBZO4="}
### Decode
{"instruction": "Transform base64 to HEX", "input": "464pNBlIObA=", "output": "e3ae2934194839b0"}
{"instruction": "Decode Base64 to json", "input": "NQ==", "output": "[53]"}
{"instruction": "Base64 to Hexadecimal", "input": "ax0WaQ==", "output": "6b1d1669"}
{"instruction": "convert base64 to Hexadecimal", "input": "8X43", "output": "f17e37"}
{"instruction": "Change base64 to JSON", "input": "7MmBZO4=", "output": "[236,201,129,100,238]"}
{"instruction": "Json from Base64", "input": "ytBBCmPRA6De+Ow=", "output": "[202,208,65,10,99,209,3,160,222,248,236]"}
{"instruction": "BASE64 to Hex", "input": "m/A=", "output": "9bf0"}

These sub-tasks allow the model to hold Base64 as an object within the model.

The embeddings were also trained to embed these tasks, enabling the model to learn how to manipulate the Base64 data for other comparative tasks!

There are many datasets which could be used to increase the usage of the Base64 code.

By training the embeddings for these sub-tasks, as well as pushing a large stack of the model parameters, the model is forced to create custom tokens, as these tokens have not been seen before in training; hence they need to find space in the embedding model. So the captioning and generation can be trained at a lesser level, focusing on attention instead of deeper learning.
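As a rough illustration of how new tokens "find space" in the embedding model, here is a minimal sketch using the Hugging Face transformers API; the base model name and the tag tokens are assumptions drawn from the prompt format above, not a record of the actual training setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical base model; the tags mirror the <Image>/<Sound>/<Text> format
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the multimodal tag tokens so each embeds as a single unit
new_tokens = ["<Image>", "</Image>", "<Sound>", "</Sound>", "<Text>", "</Text>"]
num_added = tokenizer.add_tokens(new_tokens, special_tokens=True)

# Resize the embedding matrix so the new tokens receive trainable rows
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```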
