John6666's activity

replied to erinys's post 1 day ago

Hi.
I'm ashamed to say my English isn't good enough for conversation; I rely on DeepL. I can just about manage listening, but the words don't come out quickly.😅
I'm a bored PC guy at a small company, so I don't have much experience: just a little OSS development long ago, probably before GitHub even existed, and many years on the internet.
I don't know whether I'm sensible myself, but I'm confident I'm an average, ordinary person. Not that I represent the average person, though.

I've been working to make it easier for folks in the healthcare sector (without coding experience) to use HF tools by either building base projects for them to use in clinic (see https://github.com/huggingface/chat-macOS

Oh, I like this kind of attempt. I don't think everything needs to be no-code, but people don't know what they might be able to do with the technology until they try it first, and until then they can't give feedback. Without feedback from end users, coders have trouble fixing bugs and coming up with appropriate new features. Too much feedback is a problem, but too little means you have to play the role of a user and test everything yourself.

That was fine at the scale of free software that we created for our own actual use, where we were the primary users; it worked in the old days, partly because many of the target users were PC geeks like the developers. More recently I like VSCode, although it's a bit different from free software.

In the case of generative AI, I think its true value will only show once the target users extend to non-geeks and non-coders, so it's important to first work on gathering people and lowering the threshold for feedback and further participation. I have the impression that's what multimodalart does on a daily basis, for example.
Also, lllyasviel is a major GUI maintainer, though not on the HF staff.

I'm not doing this out of noble aspirations at the moment; I'm basically just playing with whatever I want to play with, so the only thing I'm good at is talking.

replied to singhsidhukuldeep's post 2 days ago

https://github.com/sayakpaul/diffusers-torchao

We provide end-to-end inference and experimental training recipes to use torchao with diffusers in this repo. We demonstrate 53.88% speedup on Flux.1-Dev* and 27.33% speedup on CogVideoX-5b when comparing compiled quantized models against their standard bf16 counterparts**.

Each quantization method seems to have its own suitability for saving model files, but torchao looks promising as a runtime quantization method.
It would be easier if diffusers and transformers supported it as a format at load time. Currently it is just one or two lines, but it would be even easier if it could be done with just torch_dtype= or the like.
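
For reference, the "one or two lines" currently look roughly like this, as a minimal sketch assuming torchao's quantize_ API together with diffusers (module names vary a bit between torchao versions):

import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, int8_weight_only

# Load the pipeline in bf16 as usual.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Runtime quantization: replace the transformer's linear weights with int8 in place.
quantize_(pipe.transformer, int8_weight_only())

# Optional: compile to get the kind of speedups the repo above reports.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)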

replied to erinys's post 2 days ago

Thank you.

would you be open to chatting sometime about potential approaches?

That's good. If I feel something is wrong, I try to speak up before I forget, because that kind of feeling is something we lose once we get used to the environment.

I think it's good to talk with as many people as possible, but by the time I created my account, it already seemed like only researchers, engineers, and coders were left speaking up in the community...
I'm not an ML engineer either, but I'm just barely a coder, so I'm not bothered by this situation myself.
But if the only people left are those who aren't bothered and don't see it as a problem, what can I say...😅

But I'm new to HF, or rather to generative AI, having arrived only about 6 months ago, and it just seems strange compared to other sites and past internet communities.
However, I'm not an expert in community features, or even a good communicator, so I can't offer any concrete suggestions for improvement.

Still, there are many actual account holders who are not coders, so there is a chance we could hear from them.
They just don't have the opportunity or motivation to speak up and interact, because usually everyone is talking about code or technology. And that isn't a wrong use of HF.
In fact, the replies to victor's post above include several non-coders I know.
I also see occasional posts from novice coders and non-coders on the forum.

Anyway, I think it would be easier to get good collaboration and feedback if we could easily gather as many diverse people as possible: researchers, library developers, model developers, model tuners, demo developers, demo users, and local tool users. That's what an ecosystem is. If the current situation is what HF is aiming for, that's fine, but I suspect it isn't.

replied to erinys's post 3 days ago

I read the article with great interest.

Git outta here! I’m not changing my workflow.
ML engineers and data scientists aren’t always software engineers

I believe this is an issue that HF currently faces as well.
Maybe the situation is a bit better in English-speaking countries, but in my country HF gets used because people are shown how on external forums, blogs, social networking sites, and how-to sites; few people manage to use HF on their own from the information HF and its community provide.
Of course, there are many researchers who know a lot about coding, especially in the NLP field, but what if we expand the use cases to medicine, art, or music? Coding isn't part of those jobs, so few people there know much about it.

The current HF is good for initial invention and experimentation by researchers, developers, and their teams, but it is not as good as it could be at maintaining the ecosystem afterwards. The more a field moves into practical territory, the more its people move away from HF.
I don't necessarily think that's a bad thing, and it's important to use different sites for different use cases.
However, I think HF could show a little more ingenuity in surveying the AI resources of the whole net, and of society as a whole, and in facilitating the mutual transfer of information and resources.

Simply put: could there be more technical support, both manned and unmanned, for non-coders like painters, musicians, and doctors, and more no-code, low-code, or graphical ways to use HF? I think so.
Currently, even the official GUI tools are broken in places. People who can code don't notice, because it's not a problem for them; neither do I. We should consider how serious it is that this actually goes unnoticed for so long. A geek-only service can easily fall apart.
There are so many questions on the forum and not enough respondents, yet there should be so many capable people out there...
HF is firmly established as infrastructure for the entire generative AI world, but as a tool and service provider and as storage, not as a community. At best it is a temporary hub for inventors, and I think that trend advances daily.
HF is now a hub for the things coders and researchers make, but it is not, and is not on track to become, a hub for information, people, and culture.

I hope what you have learned will be put to good use at HF. OSS and related projects have long tended to start smoothly because there is no leader, and to end dumbly because there is no leader. It is difficult to run a project on the wisdom of the group without relying on a superior individual, but I suppose we have no choice but to try.

HF is collecting user requests, so if anyone has an opinion, please post it in the post below.
https://huggingface.co/posts/victor/964839563451127

replied to Wauplin's post 4 days ago

https://huggingface.co/docs/huggingface_hub/v0.25.1/package_reference/hf_api
I think there are many features there that are very useful but not well known to HF users.
Veteran features such as super_squash_history should be supported on the GUI side.
Also, it would be nice to have a flag in the API or GUI that calls restart_space when get_space_runtime is not RUNNING, BUILDING, or PAUSED (a sketch follows below).
As long as HF has a countermeasure against infinite restart loops, it should be fine.
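
In the meantime, that flag can be approximated client-side; a minimal sketch with HfApi (I'm treating the exact set of stage names as an assumption, and the Space name is hypothetical):

from huggingface_hub import HfApi

api = HfApi(token="hf_****")  # needs a write token
repo_id = "your-username/your-space"  # hypothetical Space

runtime = api.get_space_runtime(repo_id)
# runtime.stage is a string such as "RUNNING", "BUILDING", "PAUSED", "RUNTIME_ERROR", ...
if runtime.stage not in ("RUNNING", "BUILDING", "PAUSED"):
    api.restart_space(repo_id)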

replied to Desgait's post 5 days ago

There are copies of the spaces if you look for them. If you can't find one, you can trace it via my Spaces.

I thought I get the 5x usage quota from ZeroGPU

My understanding is that, for now, you only get the 5x quota when you sign in with a Pro account in a space that has the sign-in feature implemented. It's easy to add the feature to your own space (see the sketch below), but you can't put it on someone else's space.
https://huggingface.co/docs/hub/oauth
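
For reference, adding the sign-in feature to your own Gradio space is roughly this, following the OAuth docs above (a sketch; I'm assuming the documented hf_oauth metadata and LoginButton flow). In the Space's README.md metadata:

---
title: My Space
sdk: gradio
hf_oauth: true
---

And in app.py:

import gradio as gr

# Gradio injects the visitor's OAuth profile (or None) into functions
# that declare a gr.OAuthProfile parameter.
def greet(profile: gr.OAuthProfile | None) -> str:
    if profile is None:
        return "Please sign in with your Hugging Face account."
    return f"Hello, {profile.username}!"

with gr.Blocks() as demo:
    gr.LoginButton()  # "Sign in with Hugging Face" button
    out = gr.Textbox(label="status")
    demo.load(greet, inputs=None, outputs=out)

demo.launch()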

Then what ARE my benefits to my subscription?

https://huggingface.co/pricing

Inference API: Get higher rate limits for serverless inference.

Specifically, I believe it was about 3x as much. Also, I see a couple of models that are only available to Pro users, especially among the larger LLMs.

And if you're going to do FLUX inference, you can definitely do more in a Zero GPU space, quota and all; you still benefit from having 40GB of VRAM at your disposal.
And since you don't have to use the Serverless Inference API there, there's virtually no need to ask for permission, setting aside the question of whether that is ethical.

replied to alielfilali01's post 5 days ago

For now, I use this if I just want to duplicate a model or dataset.
https://huggingface.co/spaces/huggingface-projects/repo_duplicator
It would be much easier if it were built into the official GUI. In general, there are a lot of useful features not yet integrated into the GUI... I know it's hard to lay out the GUI parts, deploy them, and make sure they work...

replied to Desgait's post 5 days ago

https://huggingface.co/black-forest-labs/FLUX.1-dev
dev is a gated model. You have to get permission on this page to use it. You can use other copied repos, but the official ones are cached on HF's servers, so they are faster for Serverless Inference API use.
By the way, you don't have to be a Pro subscriber to use it, as long as you give the server a token for an account that has permission.

replied to victor's post 5 days ago

Auto-recovery/auto-reboot for spaces.

I think this needs to be realized as a priority. For example, the space below crashes often because people use it too much, not because of any flaw in the code. This time it apparently ran out of disk space.
The individual author could set up a cron job that kicks the space back up via the HF API, but that would be roundabout, wouldn't address the root cause, and would probably end up overloading the server.

Out of consideration for server load, why not simply prohibit frequent reboots in the auto-reboot feature?
If a space does not reach the Running state after 3 auto-reboots, disable auto-reboot for 24 hours or so, without affecting manual operation (a sketch of this retry logic follows the link below).
https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer
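
The retry limit I have in mind would look something like this client-side (a sketch under the same HfApi assumptions as above; the Space name and polling interval are arbitrary):

import time
from huggingface_hub import HfApi

api = HfApi(token="hf_****")
repo_id = "some-user/some-space"  # hypothetical

failures = 0
while failures < 3:  # stop auto-rebooting after 3 consecutive failures
    stage = api.get_space_runtime(repo_id).stage
    if stage in ("RUNNING", "BUILDING", "PAUSED"):
        failures = 0
    else:
        api.restart_space(repo_id)
        failures += 1
    time.sleep(600)  # poll every 10 minutes
# after 3 failures: leave auto-reboot disabled for 24 hours; manual restarts unaffected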

replied to sayakpaul's post 7 days ago

It looks like we could convert large model differences into low-rank LoRAs for fast switching.

replied to victor's post 8 days ago

Post:

  • The Forum has a log search function, but I don't think Posts do. At least it isn't available on the screen.

Notifications:

  • I would like Notifications to arrive when a Forum message (mail) is received.
  • I would like a filtering feature like the one on the model and dataset search screens. Even a plain word search would be fine.
  • I would like Notifications to be subdividable and structured semi-automatically. I don't need as much as Gmail or Thunderbird, so something simple. It might be smart to augment the tabs that are already in place.
  • I would like more variation in the color of the icon when Notifications arrive. Currently it is only blue and yellow. I can't even tell the difference between an emergency notification and the Parquet Bot, which is a nice guy, but that's not the point.

Collections:

  • I would like to be able to put Collections inside Collections.
  • It would be better if other HF resources could be added to Collections, e.g. the URLs of Posts, Forum threads, and Discussions.
  • Structured or hierarchical Collections would be nice.

Forum:

  • We're probably in a situation where forum trolling is spiking, and HF had better get ready to hire more response personnel. There's little point in figuring out why it's increasing; at any rate, this type of activity has increased dramatically in many areas over the last year or so, and many communities have been irreparably damaged. Forum vandalism won't stay a forum problem either, as it has already made its way into the Discussions section.
  • The invitation link from the forum to the HF Discord seems to have expired a long time ago and never been renewed, so virtually no new people can join. I don't have a Discord account myself, so I don't really care.
  • There is a critical shortage of respondents on the forum, but there are far more decent and well-informed question posts than one might expect. They could be a real resource if only there were enough respondents.
  • About half of the error-related questions on the forum could probably be resolved with a proper search on the error message, but that seems harder than expected for beginners. How about having a chatbot with a search function handle the initial response? To avoid confusion, it should be separate from regular posting.
  • There are often reports of library glitches, poor specifications, repo glitches, and so on. Wouldn't the initial response be easier if a chatbot summarized these and delivered them to the appropriate parties, where each person could review them at their discretion?

Spaces:

  • There seem to be a few things that cannot be installed when a Space starts up, even with pre-requirements.txt or packages.txt, or by calling subprocess directly from Python. Some VMs have an environment variable that elevates permissions only at startup, and HF seems to have one as well, but using it does not seem to yield satisfactory results.

QOL:

  • I think there are some Organizations that are supposed to be official but have no maintainers. Specifically, there is no response to commits made to the Diffusers community org; I could mention multimodalart or sayakpaul, but that would be a bad situation in the long run.
  • The fact that HF staff themselves don't use the built-in community features very much may be an important clue to improving them.
  • I don't know whether the concept of being in charge of something exists among HF staff, but there is no list of who handles each issue or section, so it's impossible even to send a mention, except to the few people I happen to know. I can try tracing people through Organizations, but there are too many of them and too many non-staff members.
  • It would be better to have a prominent information board about what counts as official infrastructure, such as the Spaces belonging to Organizations and the utilities. I don't think anyone but the heavy users who have been around since the beginning can figure out where everything is.
  • I propose an inspection of the official infrastructure. To borrow an urban analogy: tourists would be horrified to see a dilapidated train station or an unmanned government office, even though in real life everyone has found their own loopholes and lives without problems.
  • Regarding library management: as with free software in the old days, it's fine to leave developers alone when it's software or features they build for themselves as users, or work to support a new AI model, or simple bug fixes. Beyond those cases, though, without some guidelines the developer usually cannot understand users' needs and tends to keep making changes that miss the mark and benefit no one. This is not limited to libraries.
    Many OSS developers don't like being told what to do, and neither do I, but can't HF put someone or some mechanism in place to loosely steer the direction of library development? The problem is easy to grasp if you imagine an Apple without Jobs, or an unwanted new feature in Windows.
  • It's good to use opinion polls like this to set guidelines, but it's better to use them as a help in understanding the needs of the world's many developers and consumers, not to adopt the opinions of a noisy minority like me. If HF itself then surveys the situation and devises features that would actually improve it, better results should come more easily. In general it's out of the question not to listen to customers, but it's also no good just taking them at their word.
  • The same could be said of HF's overall offering as of library development.

replied to victor's post 10 days ago

I'm going to write a crappy poem because it just popped into my head. I'm not a forum troll, although I do post too much; I'm just a guy with time on his hands.
Let me say in advance that I really love the OSS community for its worldliness, its looseness, its nice people, and its pace.
But that doesn't mean I don't see the problem.

I came up with an analogy that makes HF's current problems easy to understand for people who like video games.
In a nutshell, the current HF is like a "dried squid game", or a "kusoge" (a bad game), or an incomplete Minecraft.
Dried squid is a mainly Japanese food that is hard and messy, but the more you chew it, the more flavor seeps out and the tastier it gets. It has a certain number of devotees.

Think of all the games that have been popular over the past 40 years. They were mostly good at tutorials, level design, visuals, music, and above all at constraining the player comfortably. Or they were lucky enough to get a lot of users at the start.
What attracts consumers is not how free you make the game, but how you create a stress-free yet constrained situation. Why do we need to attract consumers? Every model author wants feedback, and for that we need a population. There will be exceptions; some people don't like the noise, and neither do I, but the absence of an audience is the bigger problem.
Even open-world games with a high degree of freedom have a tutorial and give you starting equipment that is weak but easy to understand. The first enemies look weak, and even the battle music sets a weak mood. When unknown enemies appear, hints are usually provided beforehand. The game is designed to keep you hooked.
There's not much you can do in stage 1 of Super Mario, right? That is exactly what matters.

The current HF is the exact opposite. It is designed to let you do as much as possible and to avoid limiting use cases as much as possible. The result, to continue the open-world analogy, is that you are given no starting equipment and have to find it yourself. You are not even told what the win conditions are. Or you have to gather strategy information on an outside forum and then come here. Tutorials are either non-existent or hard even to locate. The enemies you see wandering around (Spaces, Models, Datasets) are unrecognizable at first glance; you can't tell the mooks from the demon kings. Whom should you engage in combat?
One of the worst game designs is one where you can do everything but don't know what to do.
But if we reduce what can be done, HF itself loses its meaning.

If HF has a marketing person (someone who thinks about and improves user demand and experience, not sales pitches), it might be good for them to learn the basics of game design, even if only from YouTube. In Japanese, the series by Sakurai, the creator of Smash Bros., is excellent.
If only people understood that an HF UI designed to do just about anything is, for an outsider, synonymous with not being able to do anything.
Game balance can be adjusted later.

Being able to do anything is the flip side of not being able to do anything, except for those with hackerish personalities.
Simple demos are more popular, right?

That said: give me a $20 personal Zero GPU space plan so I can build community tools. I can build a converter in a high-performance space without GPU access, but 10 spaces is too few for permanent installations anyway, and the longer I stay at HF, the more inconvenient it becomes. How funny is that?
I don't mind if you strengthen the Enterprise plan, but the custom of one person calling himself an organization does not exist in Japan, and it feels very uncomfortable. Is it a common practice in other countries?

Thanks.

replied to victor's post 12 days ago
replied to bartowski's post 12 days ago

His point that there is a lot of waste is not wrong, though.
The number of types that could actually be filled with zeros is limited.

Well, fp32 only survives because it's needed for training models and for computing on CPUs.
I don't even hear the name "double" these days; that's mostly down to what GPUs favor.

(Image: FP8-scheme.png)

posted an update 13 days ago
@victor @not-lain There has been a sudden and unusual outbreak of spam posts on the HF Forum that seem to be aimed at embedding online videos and commenting on them. It also spans multiple languages for some reason. I've flagged it too, but I'm not sure the staff will be able to keep up with manual countermeasures going forward.
replied to victor's post 15 days ago

https://hf.co/playground

I see an error...? "403 Need to be a member of Hugging Face"

Within the last two months or so, there have been a few reports of people being unable to view certain spaces without an HF account, even though those spaces are supposedly configured not to require a login. Is this the same kind of error...?

replied to m-ric's post 16 days ago

Up to 2.0, Qwen's Japanese performance was not very good, but with 2.5 it suddenly took a leap forward.
As far as I have tested the 7B and 14B, I think it is at a level that can compete with Nemo. Even the 3B, while its vocabulary is small, produces output that does not break down, making it comparable to the upper tier of the current 4B class.

replied to victor's post 16 days ago

Well, if you want to make money from images, the destination is probably the same everywhere: pornography, animal images, shock images, or memes at best.
I can't count how many times I've watched a community's monetization end with it flooded with copies, the originals buried, and the users exhausted. As an aside, until recently I was in a community of people who hated that phenomenon (though the initial will to fight was long since lost to the unimaginable devastation of reality, or they simply got bored).

That said, the nice thing about Civitai is that many services can be completed with no or low code on the hub, as we would say at HF.
This is not to slight what multimodalart and others are promoting on HF, but it would be interesting to know where HF as a whole, not just individuals, is heading.

In the current official HF repos, the GUI alone can only create Diffusers files for SD1.5 (the SDXL model converter often errors out because the fp32 files are too large...).
Anyway, I think there are extremely few things that can be done solely on the HF hub, and not only in image-generation AI. Or has it shrunk over the years? Well, for LLMs there is the Pro-only HuggingChat...

I think many people would cooperate if they could see the direction: whether HF wants to remain, as it almost is now, just a place to put files that each person makes in a local environment; whether it wants to evolve the hub into a more convenient workplace, GUI or CUI; or whether it wants to kick out the non-coders. If it's anything other than getting rid of the non-coders, I'd be willing to help too, if only in a small way.

replied to zhabotorabi's post 16 days ago

Sorry, I don't know much about it; I've never used Endpoints. I think this might work locally, but...
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Tell me about AI"
prompt_template = f'''<s>[INST] {prompt} [/INST]
'''

# The model card example continues roughly like this (tokenize, generate, decode):
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids=input_ids, do_sample=True, temperature=0.7, top_p=0.95, max_new_tokens=512)
print(tokenizer.decode(output[0]))
replied to zhabotorabi's post 16 days ago

but my email has been associated with the Pro account in the settings. Where should I apply for the scaling, or is there something else I need to do?

Gated models are not directly related to HF's Pro subscription or other paid services; the restriction is set individually by the model's author or company, so only the author knows the criteria.
https://huggingface.co/docs/hub/models-gated

I had requested access previously but can no longer use it for free.

Yeah... how can access stop working halfway through...
Anyway, so you want to use it from transformers. I'm sure you can use these GPTQ-quantized files; GGUF support is still incomplete, but GPTQ and BNB should work.
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ

replied to zhabotorabi's post 17 days ago

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
It's a gated model, so it can be used for free from your own program once you agree to the terms and provide an email address; a Pro subscription doesn't help if you don't provide one...

If you mean using it in HuggingChat or the like, you may be right, as I think that was Pro-only.
https://huggingface.co/chat/

If you want to use a GGUF version in the Pro-only Zero GPU spaces, you can use the spaces below, or duplicate one and modify it for your own use.
https://huggingface.co/spaces/CaioXapelaum/GGUF-Playground
https://huggingface.co/spaces/John6666/text2tag-llm

replied to nroggendorff's post 18 days ago

Are you okay? Are you sure you haven't been hacked?😰

replied to inflatebot's post 19 days ago

I don't know what it is, but this model knows a lot about Japanese anime.

replied to victor's post 19 days ago

So all you want is to get anything you don't like out of the way?
I hate that kind of approach, so I'm sure we'll remain on parallel lines wherever this goes.
Are you policemen or something?

It's pathetic that the two of us are fighting here, so why don't I, as a coder, create a space where these can be loaded and used in the future?
I can make them, except for the Flux save converter; that one has a slight accuracy bottleneck in the calculation process.

Also, I apologize for the lack of explanation, but ComfyUI is superior to WebUI and Diffusers for complex modifications of models, porn-related or not, especially special modifications of unknown models. I don't use it myself, though; my GPU is too weak.

In the meantime, I just repaired this space (not yet merged). Now GGUF should no longer be debris on HF. Is this kind of approach wrong for a coder?
https://huggingface.co/spaces/CaioXapelaum/GGUF-Playground

replied to victor's post 20 days ago

Was there some unavoidable reason why it existed in the past and then disappeared?
If it is held again, I would like to participate, even though I can only do duct-tape coding.
Also, someone in the Posts said he was bored, so I'm sure people like that would be happy with such an event.

replied to victor's post 20 days ago

https://huggingface.co/docs/diffusers/v0.30.2/api/loaders/single_file
I disagree, because with very few exceptions (broken files, single UNets cut out, etc.), ComfyUI and WebUI files can be loaded from Diffusers (sketched below), and in fact many people upload them there for that purpose.

Even coders don't use only ComfyUI or only Diffusers. Anyone who obsesses over such distinctions is just a fanatic. It's like the mouse-vs-keyboard controversy or the GUI/CUI controversy, though maybe a little different from Mac vs Windows.
Incidentally, coders can convert and use the single files that make up the small exception, and some do.
https://huggingface.co/spaces/nyanko7/flux1-dev-nf4
https://github.com/huggingface/diffusers/issues/9165#issue-2462431761
However, I would like to see a solution to the problem of files being scattered around and hard to find, as well as server-side verification of whether a repo works with a single from_pretrained call.
Now that the Inference API has become virtually irrelevant to individual users, I would like the HF file format to benefit more than just researchers and corporate (Endpoint API) users.
If it works, it would be even better if the server could generate some safe sample images.
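
For context, the single-file loading mentioned above is roughly this (a sketch; the checkpoint path is illustrative):

import torch
from diffusers import StableDiffusionXLPipeline

# Load a WebUI/ComfyUI-style single .safetensors checkpoint directly into Diffusers.
pipe = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/some-user/some-model/blob/main/model.safetensors",  # hypothetical repo
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a cat in a hat").images[0]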

replied to alielfilali01's post 21 days ago

I often encounter them around noon. It happened just a little while ago.

replied to kingabzpro's post 21 days ago

The prize money is important, but perhaps it is more important that we are given several themes to work on together.
Sometimes it works out better when everyone looks for a theme in a scattered, self-directed way, and sometimes it doesn't. I have yet to see how HF fares when a theme is given.

replied to kingabzpro's post 22 days ago

Try replying to this post. Or it might be faster to send a mention to @ victor?
https://huggingface.co/posts/victor/964839563451127

I have only known HF for the last 6 months or so, and not all of it, but in some places it looks like ruins, or broken and smoking.
Something must be missing. It would be great if more people who know the past would speak up.

replied to nyuuzyou's post 22 days ago

I'm not a data analyst, let alone a researcher, but there is data I want:

  • Danbooru tag / plain English corresponding table
  • E621 tag / plain English corresponding table

These would probably enable a lighter and more precise tagger for building image-generation models that interpret natural English. Whether I would build it myself is beside the point.
With the advent of FLUX, natural-language prompting has become practically recommended, so I think there is demand from some people.

Danbooru dataset

https://huggingface.co/datasets/isek-ai/danbooru-wiki-2024

Danbooru / JA

https://huggingface.co/datasets/p1atdev/danbooru-ja-tag-pair-20240715

E621 dataset

?

replied to victor's post 22 days ago

QoL:

  • I would like an X (Twitter)-style translation function. Mainly for the Forum, Posts, and Discussions, but it would be better if it could also be applied to README.md, the HF UI in general, Spaces, etc. Translation, especially of the UI, need not be very precise. The actual processing could be done by HF itself or via links to an external service. Since we can use markdown, quoting and folding the original text would avoid fatal misunderstandings.
    I don't know about people in other countries, but Japanese people have a terrible allergy to foreign-language sites; they tend to run away the moment they see one.
    Installing a browser extension such as Google Translate would help, but in Japan smartphones are more mainstream than PCs, and they lack a wide range of extensions. Furthermore, there is almost no habit of deliberately reading foreign-language sites except among programmers, scholars, stockbrokers, and consumers of pornographic videos and piracy. There are probably other such countries, and I think it is a lost opportunity.
    The prerequisite for this feature is a setting for each person's native language. If it can be set public or private, shy people will be at ease. If you don't know what language to translate into, there is nothing you can do; the browser's locale could be used, but it may not be accurate.
  • I heard that some HF spaces require login to view even though they are not NSFW. I have never encountered this problem myself, and I don't know whether it is a bug or intended behavior.
  • As I've written before, I would like a permanent space for soliciting opinions. It would be nice to have both an open and a closed one, and to explicitly allow submissions in one's native language to lower the barrier. One-way communication would be fine with translation. The challenge is preventing pranks. Simple categories such as bug reports, complaints, and consultations would make it easier to use.

Proprietary format:

  • HF is generally very easy to use as long as the HF-native formats (README.md, config.json, and unquantized safetensors files) are available. If they are not, it is generally terrible.
    Transformers' handling of GGUF without config.json was buggy and nearly unusable when I poked at it just now, and Diffusers seems too busy to deal with GGUF in the first place.
    This wouldn't matter if HF were intentionally pushing its own format as a lock-in strategy, but I suspect they either lack the capacity to deal with it or are simply unaware of how fast the surrounding environment is changing.

Serverless Inference API:

  • Come to think of it, why can't we use some features that Diffusers save_pretrained normally supports, such as StableCascade and the officially adopted community pipelines? I used to assume server load, but I don't see how they could be heavier than Flux.
  • If it's not too difficult spec-wise, it would be useful to be able to specify particular files in the repo in the README.md YAML, to specify LoRA strengths other than 1.0, and to have more licenses to choose from. It's about time HF supported the Fair AI and Flux dev licenses as standard (see the metadata sketch after this list).
    In addition, the README.md editing features could be expanded.
  • It would be useful to be able to specify the scheduler and sampler for image generation, the VAE, embeddings, the chat template for text generation, and the various VLM parameters. (I know this is easy locally and a lot of work server-side, but it should be useful.)
  • I would like runtime overrides of the parameters specified in README.md. For example, being able to override base-model parameters would make LoRA easier to use.
  • If detailed metadata is not written into generated images, I recommend writing it, as the Animagine space does, for example.
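
To illustrate the kind of README.md metadata involved, here is a sketch (base_model, license_name, and license_link are existing model-card keys; the LoRA repo itself is hypothetical):

---
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
tags:
  - lora
---

Being able to declare, say, a LoRA strength or a specific weight filename next to base_model is the kind of extension I mean.
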
replied to Shreyas094's post 24 days ago

Hmm. I'm glad you found it helpful. These spaces are usually MIT or Apache 2.0 licensed, so you can copy and paste from them.

Open-source LLMs for coding are getting stronger every day, and the 70B class seems to be producing some pretty good ones here and there.
I'm just watching from the sidelines, though.

replied to Shreyas094's post 24 days ago

I haven't tried letting an AI write source code yet, but I wonder if we could, for example, have an AI read source code and explain it to us?
I thought the quickest route would be to refer to the relevant parts of similar apps in the spaces below.
https://huggingface.co/spaces?sort=trending&search=pdf

replied to Shreyas094's post 24 days ago

Hi.
PDFs... I've never dealt with them in AI.
I'd better call in someone who knows more about it.
If you simply wait, someone may answer, but a mention with @ specific_username (actually, remove the space) would be the surer way.
If you search Spaces for "pdf", there are quite a few LLM spaces that interpret PDFs on their own, which might be helpful. Also, I'll leave you some recent forum posts.
https://discuss.huggingface.co/t/generate-dataset-for-fine-tuning-on-pdf-s/104902/
https://discuss.huggingface.co/t/fine-tune-llms-on-pdf-documents/71374

replied to Shreyas094's post 24 days ago

I think changing this would change the search results somewhat, but there don't seem to be many options to choose from.
I can give you more concrete advice if I know how you want to enhance it.

https://huggingface.co/spaces/Shreyas094/SearchGPT/blob/main/app.py

def get_web_search_results(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    try:
        results = list(DDGS().text(query, max_results=max_results))

https://pypi.org/project/duckduckgo-search/#2-text---text-search-by-duckduckgocom

replied to nroggendorff's post 25 days ago

So there's a flag like that... you got auto-banned by mistake.🤢

replied to nroggendorff's post 25 days ago

It's a mystery; I don't think nroggendorff is a user who is bad for HF. Maybe he was hacked, or got locked out by some kind of authentication?

replied to bartowski's post 26 days ago

If I deliberately avoid technicalities, here's what I would say.
If inexhaustible hardware resources were available free of charge for training models, it would be better to train with 64 or 128 bits rather than 32, for higher accuracy.

However, if you just use the finished product, 50% of people won't be bothered by a well-crafted 4-bit, 90% won't be bothered by a well-crafted 8-bit or by plain 16-bit, and bfloat16 (put simply, a super 16-bit for GeForce) is almost as good as 32-bit; nobody will complain.

Above all, a shrunken model loads twice as fast! Many people would be happier generating content with many models quickly, even if accuracy drops slightly.

Even when training LoRAs, some circles say that much accuracy isn't necessary; it is easier on the planet and everyone's wallets to treat 32-bit and above as a mode dedicated to training the base model itself.

The only problem with HF switching is that the people in charge of the Inference API and Diffusers may be overworked.😎

replied to victor's post 26 days ago

Miscellaneous:

I have been at HF for a few months now, so I am writing up what I have noticed and what troubled me or felt weird.

  • Many spaces cannot be updated due to library compatibility issues, either because Gradio frequently breaks backward compatibility in version upgrades and crashes spaces with syntax errors, or because the libraries a space needs are extremely old (some even from 2020).
    • As for Gradio, simply ignoring version-derived syntax errors in future versions would greatly improve the situation. I don't think compatibility with every past feature is necessary, but right now it is a bit too much. https://github.com/gradio-app/gradio/issues/6339
    • This is not normally HF's job, but since some libraries that are often used in Spaces have stopped being updated, it would be more productive if HF staff went around fixing just the outdated-dependency parts, or forked the libraries as HF-dedicated versions. Libraries related to sound and video are particularly outdated; text and images have transformers and diffusers, so those problems rarely occur.
  • The Diffusers file format has countless entrances but practically only three Python scripts as exits, so a model trained on HF can't get off the island.
  • In many cases, even when a script exists, no GUI space is built around it and it goes unnoticed (and not only in text-to-image). Or it doesn't run in a CPU space for lack of performance. SD1.5 is fast enough and SDXL is manageable, but what about the future?
  • In general, HF seems to set a high hurdle for people who can't write code, and few of its people see that as a problem.
  • HF in general is easy to use once you know it, but if you don't know where things are, who is there, and what it can be used for, you really don't know what to do.
  • The company building is large and has an unexpected variety of things to explore, but there is no receptionist and no information desk, so exploring it is an adventure in itself.
  • It's in my nature too, but in any case everyone here does what they want at their own pace, so people are generally indifferent to contact with the outside world, and know-how sharing isn't progressing. As a result, HF relies on outside communities probably more than it should. Just a little introduction elsewhere can spike a repo's downloads; usually there is simply no way to find things.
  • HF appears to be somewhat commander-less. Then again, I think it would be depressing if there were one...
  • I enjoy communicating in English and the Tower of Babel doesn't bother me, but shouldn't there be an optional profile field where each person can list the languages they speak?
  • Could we have a mention alias at HF so that one mention reaches all staff and someone responds? The one big problem is trolling...
  • Well, I'm having fun.
replied to their post 29 days ago

Zero GPU issue Fixed!

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/104#66db15e136aa505569dd6d0d

a major issue on ZeroGPU has just been fixed.
The issue could make Spaces stuck forever (thus leading to "GPU task aborted"), as well as prevent Gradio modals and progress to display (which seems to be linked to your original message)

To benefit from the fix, all you need to do is push a change to your Space in order to trigger an update of the spaces package (note that "Restart" or "Factory rebuild" won't trigger the update)

replied to their post 29 days ago

Fixed!😸

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/104#66db15e136aa505569dd6d0d

a major issue on ZeroGPU has just been fixed.
The issue could make Spaces stuck forever (thus leading to "GPU task aborted"), as well as prevent Gradio modals and progress to display (which seems to be linked to your original message)

To benefit from the fix, all you need to do is push a change to your Space in order to trigger an update of the spaces package (note that "Restart" or "Factory rebuild" won't trigger the update)

replied to Shamurangaiah's post 29 days ago

https://discuss.huggingface.co/t/space-evicted-storage-limit-exceeded/36876/6
Isn't that impossible in Spaces?
Or maybe accelerate or something can handle it?
I've never tried to use a model that big, so I have no idea.😓
https://huggingface.co/docs/accelerate/usage_guides/big_modeling

Wait, you're on a Pro subscription, so can't you make good use of a Zero GPU space?
If you're using a CPU space now, switch to a Zero GPU space anyway. You'll probably get an error, but I can show you how to fix it.

replied to Shamurangaiah's post 29 days ago

Ah, I guess the 70B model was too big.
You don't have to download it if you just want to run inference.
Something like:

from huggingface_hub import InferenceClient
hf_token = "hf_****"
client = InferenceClient("meta-llama/Meta-Llama-3.1-70B-Instruct", api_key=hf_token)

system_message = "You are a helpful assistant. Try your best to give the best response possible to the user."
user_message = "test"
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message}
]

response = client.chat_completion(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    max_tokens=1024,
    temperature=0.7,
    top_p=0.95,
    messages=messages,
)
print(response.choices[0].message.content)  # print the assistant's reply

replied to Shamurangaiah's post 29 days ago

How about this?

hf_token = "hf_***"

from transformers import AutoConfig, AutoModelForCausalLM
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-70B", revision="main", token=hf_token)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-70B", config=config, token=hf_token)

And
https://github.com/meta-llama/llama3/issues/299

Check this:
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/15
You need to upgrade transformers. I solved it with transformers==4.43.1

replied to their post 29 days ago

I did it right away. I was able to work around it successfully, but somehow it seems the problem had already been fixed on the server side without my having to.
The 4.x spaces are fixed, but the 3.x spaces still need verification. I guess you're right that it's a separate issue.

By the way, is there any plan for future Gradio 4.x versions not to error out on obsolete parameters when migrating from 3.x, but simply to ignore the deprecated parameters with a warning?
Doing that with critical parameters would be a source of bugs, but I see too many non-programmers stuck on 3.x, unable to migrate to 4.x because they trip over .style() and the like.
Often the original maintainers have left.
I don't much care for my own use...

replied to bartowski's post 29 days ago

This research suggests that if inference on HF's servers loaded torch.bfloat16 or torch.float16 by default instead of torch.float32, we would get 99%+ identical results with half the VRAM...
Not only for LLMs: in the Stable Diffusion world there is widespread know-how that fp16 poses no accuracy problems except when training extremely large models (such as Kohaku and Animagine).

When you drop down to torch.float8_e4m3fn you may notice the difference, but in practice it is not much of a problem for anime-style pictures, so some people seem to use it.
With a quantized 8-bit model, the output is hard to distinguish from fp16, especially with GGUF.
The slightly more complex bitsandbytes NF4 quantization also produces very good output for 4 bits. I honestly think NF4 beats torch.float8_e4m3fn in output quality.

Currently fp32, fp16, and bf16 work fine serverless (when enabled) as long as the uploaded filenames follow the usual conventions, but fp8 and quantized files do not.
As model sizes keep growing, HF should make bf16 or fp16 the default in VRAM (is it already?), and I think it should support serverless use of quantized files. In many cases, VRAM capacity is worth more than a little latency.
Above all, it would be easier for me to upload.😎

P.S.
Well, I suppose fp32 can produce more accurate output, with fewer rounding errors during computation. In practice, few people can tell fp16 from fp32 computation on images; how about on LLMs?
Anyway, fp32 is like 320kbps MP3: for 99% of people, 128kbps is fine.
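
For reference, the default-dtype change discussed above is just the usual loading argument (a sketch with transformers; the model name is illustrative):

import torch
from transformers import AutoModelForCausalLM

# fp32 costs ~4 bytes per parameter in VRAM; bf16/fp16 cost ~2, i.e. half the memory.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",  # illustrative
    torch_dtype=torch.bfloat16,
    device_map="auto",
)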

replied to their post 29 days ago

Setting fastapi==0.112.2 in requirements has helped me as a temporary fix.

Nice workaround!

replied to their post 29 days ago

https://huggingface.co/spaces/Yntec/ToyWorld
The spaces that have never been rebooted are still running, he says. But there are spaces that are not working.

Still, if it's a known issue, that's a relief. 😀
It means someone will probably fix it someday.
And unless the hardware is broken, it's probably just a misconfiguration.

replied to Shamurangaiah's post 29 days ago
replied to their post 29 days ago

You are right and you are wrong.😎
I was about to write here, but the 4.x spaces (and maybe all the inference) have been wiped out since this morning!😭
At the time of my initial report, 4.x was working.
https://discuss.huggingface.co/t/huggingface-space-failed-after-working-initially/105514/
https://discuss.huggingface.co/t/run-time-error-for-huggingface-space/15090/
https://discuss.huggingface.co/t/new-gradio-space-connection-errored-out/105509/

Well, this isn't about Gradio anymore; it's not even about the GUI or the virtual machines. It's a server problem!

P.S.
@victor The core function has stopped. When the same problem is reported simultaneously on the forum, it is often happening across the whole of HF.

P.S.
Spaces that are not using InferenceClient or requests are still working properly.
If I just download a model and run it on the VM, it seems to work fine.

replied to Shamurangaiah's post 29 days ago

Maybe. It's faster to send a mention, like @ victor (please remove the space after @). I don't know if it's a good idea to ping only victor, but I don't know who is in charge of what...

replied to Shamurangaiah's post 29 days ago

https://huggingface.co/settings/tokens
First, create a read token and add it to your space's Secrets as HF_TOKEN. If you put a token in an ordinary (non-secret) variable or write it directly in your code, it will be exposed to the whole world.
Any name will do for the HF_TOKEN part.

import os
hf_token = os.getenv("HF_TOKEN")

from transformers import AutoConfig, AutoModelForCausalLM  # AutoModelForCausalLM was missing from the import
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-70B", revision="main", token=hf_token)
config.rope_scaling = {"type": "llama3", "factor": 8.0}
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-70B", config=config, token=hf_token)  # token= replaces the deprecated use_auth_token

I'm pretty sure this model was only for people with a Pro subscription, but since you're posting, you must be subscribed, so I'm sure you'll be fine.

replied to their post 30 days ago

I am reporting one thing I noticed while reading the source code.

When performing inference, the Gradio 3.x series seems to use Python's requests library for requests to the server, while the 4.x series uses HF's InferenceClient.
The 3.x series used JSON to communicate with the server.
Is there a clue to the cause in this, or in Gradio 3.x's external.py?

P.S.
I modified it myself and tried the same access method as 4.x, but from 3.x it doesn't succeed... is it being rejected based on the user agent...?
The Transformers pipeline, as well as Diffusers, doesn't seem to work properly.😇

replied to fptisthebest's post about 1 month ago

Other than the HF staff, probably no one knows for sure... I am also interested.
I have a Pro subscription and so far have never incurred any additional fees.
I don't think there is any limit on the total number of calls, but there are often cases where many requests in a short period trigger a limit (or quota).

Pro subscribers occupy different positions at different times, but in your case I think we are talking about the API:

  1. As a Space publisher
  2. As a Space user
  3. As an API user

In your case, I suspect you were caught by a rate limit from requesting within a short period, but I don't know for sure.
I usually only use Spaces (and often use APIs inside a space), so my information may be inaccurate.

Incidentally, as a Spaces publisher, $10/month is too cheap; but as a mere Space user of generative AI, the benefit is so far almost nonexistent, and the quota is tight. I'm not saying HF should be like PixAI, but I wonder if something more could be done...
I don't dislike the free-user-friendly way of doing things, and I think it's one of the most important things for a development community. It's just a little... how can I put it... it's not mundane, not worldly.

Spec of Zero GPU Space

https://discuss.huggingface.co/t/does-a-pro-subscription-add-memory-to-hf-spaces/103927/3

replied to their post about 1 month ago

Thanks for the reply. Good, so it's a bug 😅 (not a spec change).

posted an update about 1 month ago
@victor Sorry for the repetitiveness.

I'm not sure if Posts are the right place to report such an error, but it seems to be a server error unrelated to the Zero GPU space error the other day, so I don't know where else to report it.

Since this morning, I have been getting a strange error when running inference from spaces on Gradio 3.x.
Yntec (https://huggingface.co/Yntec) discovered it, but he doesn't have a Pro subscription, so I am reporting it on his behalf.

The error message is as follows. Note that "1girl" and other common prompts will show cached output, so experiment with unusual prompts.

Thank you in advance.

John6666/blitz_diffusion_error
John6666/GPU-stresser-t2i-error
ValueError: Could not complete request to HuggingFace API, Status Code: 500, Error: unknown error, Warnings: ['CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 14.75 GiB total capacity; 1.90 GiB already allocated; 3.06 MiB free; 1.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF', 'There was an inference error: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 14.75 GiB total capacity; 1.90 GiB already allocated; 3.06 MiB free; 1.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF']

replied to victor's post about 1 month ago