Stable Diffusion: ROCm vs DirectML (compiled Reddit comments)

The first GPU with truly useful ML acceleration (for ML training) was the V100, which implements fp16 computation with fp32 accumulate via its HMMA instruction.

ControlNet works, the models from CivitAI work, all LoRAs work — it even connects just fine to Photoshop.

I've tried your Docker image, but when PLMS sampling starts, the GPU temperature goes from ~42°C to over 100°C (reaching 115°C) in just a few seconds, which leads to a shutdown of my system ("kernel: amdgpu 0000:2d:00.0: amdgpu: ERROR: GPU over temperature range (SW CTF) detected!"). I've also tried running the thing with the fan manually set to 100%. — Unfortunately, it is not so easy to say why this happens; you need to investigate it step by step. Unplug all power cables and plug them in again, just in case; it might also be a problem with the wired connections (double-check your power lines). Then check the RAM using memtest, and do the same for the GPU.

Yet a couple of weeks ago I decided to really try to use it for Stable Diffusion.

This release allows accelerated machine learning training for PyTorch on any DirectX 12 GPU and in WSL, unlocking new potential in computing with mixed reality.

I run A1111 or SD.Next on Linux these days because of better ROCm support. SD.Next on Windows, however, somehow does not use the GPU when forcing ROCm with the command-line argument (--use-rocm); add --use-directml instead.

Creating model from config: E:\stable-diffusion-webui-directml\repositories\generative-models\configs\inference\sd_xl_refiner.yaml — and then it follows with a ton of size mismatches. In the meantime, there is a workaround.

Stable Diffusion models can run on AMD GPUs as long as ROCm and its compatible packages are properly installed. But the bottom line is correct: currently Linux is the way for AMD SD, until PyTorch makes use of ROCm on Windows. I have an RX 6800.

Great stuff — this should really let the new Navi31 GPUs flex their AI accelerators and VRAM. Neat and all, but again it seems like they are only doing it for the 7xxx-series cards. Either support the whole line of cards you are still supporting, or drop the farce of long-term support.

I installed everything using the AMD quick-start and ComfyUI guides on Ubuntu 22 without any issues. (On Windows, ComfyUI runs on DirectML when launched with python main.py --directml.)

The best I am able to get is 512x512 before getting out-of-memory errors.

I know the 4070 is faster at image generation and in general a better option for Stable Diffusion, but now with SDXL, LoRA/model/embedding creation, and also several movie options like mov2mov and AnimateDiff, it made me doubt. The 4070 has 12 GB and the 7800 has 16.

CPU mode is more compatible with the libraries and easier to make work. If you must use Windows, you can use either SHARK or the Auto1111 DirectML fork.

Hey everyone! I'm happy to announce the release of InvokeAI 2.0 - A Stable Diffusion Toolkit, a project that aims to provide enthusiasts and professionals both a suite of robust image creation tools. Optimized for efficiency, InvokeAI needs only ~3.5 GB of VRAM to generate a 512x768 image (and less for smaller images). Its image composition capabilities allow you to assign different prompts and weights, even using different models, to specific areas of an image, and you can even overlap regions to ensure they blend together properly.

Then you get around 15-17 it/s depending on the ROCm version, FYI.

No token limit for prompts (the original Stable Diffusion lets you use up to 75 tokens); DeepDanbooru integration, which creates Danbooru-style tags for anime prompts; xformers, a major speed increase for select cards (add --xformers to the command-line args).

I also have an RX 6750.

Sep 12, 2022 · Run Stable Diffusion on AMD GPUs.

A safe test could be activating WSL and running a Stable Diffusion Docker image to see if you get any small bump between the Windows environment and the WSL side. It may be relatively small because of the black magic that is WSL, but even in my experience I saw a decent 4-5% increase in speed, and oddly the backend spoke to the frontend much more.

You'll learn a LOT about how computers work by trying to wrangle Linux, and it's a super great journey to go down. No, I freshly installed Ubuntu in dual-boot mode.

For things not working with ONNX, you probably answered your own question in this post: you're on Windows 8.1.

If you installed Automatic1111 from the main branch, delete it. ROCm 5.5 silently enables 7xxx cards, but they are not supported officially. Last time I checked, in order to get a 7900 XTX working I still needed to compile PyTorch manually (it was the ROCm 5.5 beta). The actual news is PyTorch coming out of nightly, which happened with 5.5 — ROCm 5.5 launched today, so it should be rebuilt using final code soon. The --force-reinstall at the end is needed to force the ROCm version to install over the original PyTorch that Automatic is going to install.
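The pip invocation that sentence refers to isn't quoted anywhere on the page; a minimal sketch of what such a reinstall usually looks like, assuming the ROCm 5.6 wheel index (pick the index matching your ROCm install):

    cd stable-diffusion-webui
    source venv/bin/activate
    pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6 --force-reinstall

Without --force-reinstall, pip would keep whichever torch build the webui installer already pulled in.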
Stable Diffusion is a bigger priority for me. Largely depends on practical performance (the previous DirectML iterations were slow as shit no matter the hardware — better than using a CPU, but not by that much) and actual compatibility (supporting PyTorch is good, but does it support all of PyTorch, or will it break half the time, like the other times AMD DirectML/OpenCL has been "supporting" something and just wasn't compatible?). And it/s has no inherent universal meaning, because your iteration speed depends on a huge number of factors, including your model, base resolution, etc.

In the case of Stable Diffusion with the Olive pipeline, AMD has released driver support for a metacommand implementation.

Then I installed stable-diffusion-webui (Arch Linux). I had to install python3.10 from the AUR to get it working, plus all the ROCm packages I could find. In any case, I used an AUR helper, paru, to build python-torchvision-rocm.

Apr 21, 2021 · We found that DirectML from 1.3 to 1.8 contained a perf bug affecting op creation on certain hardware and drivers. The next release of DirectML will contain the fix.

Pretty similar steps to yours, minus a few missing utilities I had to install that weren't mentioned by either guide (probably because I selected the minimal install for Ubuntu). Stable Diffusion recommends a GPU with 16 GB of VRAM.

Each time you want to start Stable Diffusion, you'll enter these commands (adjusted to what works for you):

    cd stable-diffusion-webui
    python -m venv venv
    source venv/bin/activate
    python launch.py

Stable Diffusion should be running!

But when I used it back under Windows (10 Pro), A1111 ran perfectly fine.

Neat, but IMHO one of the chief historical problems. HW support: auto1111 only supports CUDA, ROCm, M1, and CPU by default, while Vlad supports CUDA, ROCm, M1, DirectML, Intel, and CPU. Dev process: auto1111 recently switched to using a dev branch instead of releasing directly to main; Vlad still releases directly to main, with some branches for feature work.

I don't have much experience, but first I tried DirectML in Windows 11 and it was running very slow. People say SHARK SD is fast for AMD GPUs, but I could not run it 9 times out of 10. Otherwise, I have downloaded and begun learning Linux this past week, and messing around with Python getting Stable Diffusion SHARK by Nod.ai going.

I have two SD builds running on Windows 10 with a 9th-gen Intel Core i5, 32 GB of RAM, and an AMD RX 580 with 8 GB of VRAM. The first is the NMKD Stable Diffusion GUI running ONNX DirectML with AMD GPU drivers, along with several CKPT models converted to ONNX diffusers.

May 9, 2023 · The torch-directml package supports only PyTorch 1.13.

It's not ROCm news as such, but an overlapping circle of interest — plenty of people use ROCm on Linux for Stable Diffusion speed (i.e., not cabbage-nailed-to-the-floor speeds on Windows with DirectML). You might have to do some additional things to actually get DirectML going (it's not part of Windows by default until a certain point in Windows 10).

When comparing the 7900 XTX to the 4080 under DirectML, AMD's high-end graphics card has like 10% of the performance of the Nvidia equivalent. A 7900 XTX gets around 4.5 it/s on Windows with DirectML, around 17-18 it/s on Linux with Auto1111, and ~20 it/s in Comfy.

They were supported since ROCm 5.5, and yes, that includes the 7900 XT.

The ROCm team had the good idea to release an Ubuntu image with the whole SDK and runtime pre-installed. Ideally, they'd release images bundled with some of the most popular FLOSS ML tools ready to use and the latest stable ROCm version.

Here is an example of Python code for a Stable Diffusion pipeline using Hugging Face diffusers.
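The example itself did not survive the page; a minimal sketch under the usual assumptions (SD 1.5 model id chosen for illustration; on ROCm builds of PyTorch the GPU is still addressed as "cuda"):

    import torch
    from diffusers import StableDiffusionPipeline

    # fp16 halves VRAM use; drop torch_dtype for a --precision full style run
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # ROCm devices are exposed as "cuda" in PyTorch

    image = pipe("a photo of an astronaut riding a horse on mars").images[0]
    image.save("astronaut.png")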
Stable Diffusion runs like a pig that's been shot multiple times and is still trying to zig-zag its way out of the line of fire. It refuses to even touch the GPU, other than 1 GB of its RAM. I used Automatic1111 Stable Diffusion; launch command in Konsole: python launch.py --precision full --no-half --skip-torch-cuda-test. It used 80% of RAM with nothing running. Stable Diffusion doesn't work with my RX 7800 XT — I get "RuntimeError: Torch is not able to use GPU" when I launch the webui (tried numerous things to fix it; it still doesn't work). Any help would be appreciated.

If you're up for the challenge though, I hear AMD GPUs perform much better on Linux. And if you get hooked on generating stuff with SD and don't want to wait for stable ROCm support on Windows, consider installing Linux on a second drive as a dual boot.

ComfyUI only supports AMD GPUs on Linux, so the process for installing it is a lot more involved. First, install the PyTorch dependencies by running: conda install numpy pandas tensorboard matplotlib tqdm pyyaml -y, then pip install opencv-python. You can give PyTorch w/ ROCm a try if you're on one of the ROCm-supported Linux distros, like Ubuntu.

Nov 30, 2023 · Now we are happy to share that with the 'Automatic1111 DirectML extension' preview from Microsoft, you can run Stable Diffusion 1.5 with base Automatic1111, with similar upside across the AMD GPUs mentioned in our previous post. This preview extension offers DirectML support for the compute-heavy UNet models in Stable Diffusion, similar to Automatic1111's sample TensorRT extension and NVIDIA's TensorRT extension; the extension uses ONNX Runtime and DirectML to run inference against these models. Microsoft has optimized DirectML to accelerate the transformer and diffusion models used in Stable Diffusion, so that they run even better across the Windows hardware ecosystem, and has provided a path in DirectML for vendors like AMD to enable optimizations called 'metacommands'. Check out tomorrow's Build breakout session to see Stable Diffusion in action: "Deliver AI-powered experiences across cloud and edge, with Windows."

But that's simply not enough to conquer the market and gain trust. This might be the reason AI start-ups are starting to buy up consumer GPUs again, just like with the last crypto wave.

Stable Diffusion 1.5 also works with Torch 2.0, meaning you can use SDP attention and don't have to envy Nvidia users their xformers anymore, for example.

One inference of Stable Diffusion under default settings takes over 30 s.

Sep 8, 2023 · Here is how to generate a Microsoft Olive optimized Stable Diffusion model and run it using the Automatic1111 WebUI: open an Anaconda/Miniconda terminal, then run conda create --name Automatic1111_olive python=3.10 followed by conda activate Automatic1111_olive. (May 23, 2023 · Stable Diffusion models with different checkpoints and/or weights, but the same architecture and layers as these models, will work well with Olive.)

The DirectML fork is your best bet with Windows and A1111. While the GP10x GPUs actually do have IDP4A and IDP2A instructions for inference, using int8/int4 for Stable Diffusion would require model changes.

I used Garuda myself. It's got all the bells and whistles preinstalled and comes mostly configured. I really don't want to break my ROCm, so I'm not updating it.

I've been using a 7900 XTX with DirectML on Windows and ROCm 5.5 on Linux for ~2 months now (using the leaked RC before the official 5.5 release). So I've been keeping an eye on ROCm 5.6 progress and release notes, in hopes that it may bring Windows compatibility for PyTorch.

Would anyone know why running a prompt for the first time on ComfyUI takes a long time on ROCm but is normal on DirectML?

Sharing my experience of using DirectML, for the new users. The DirectML fork of Stable Diffusion (SD in short from now on) works pretty well with AMD's only-APU systems — this refers to the use of iGPUs (example: Ryzen 5 5600G). No graphics card, only an APU. So, to people who also use only an APU for SD: did you also encounter this strange behaviour, where SD hogs a lot of RAM from your system? After about 2 months of being an SD DirectML power user and an active person in the discussions here, I finally made up my mind to compile the knowledge I've gathered in all that time. Then I started the webui with export HSA_OVERRIDE_GFX_VERSION=9.0.0.
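Put together, the APU launch described above looks roughly like this (9.0.0 matches gfx900-class Vega iGPUs; RDNA2 cards typically need 10.3.0 instead — an assumption to adjust per card):

    export HSA_OVERRIDE_GFX_VERSION=9.0.0
    cd stable-diffusion-webui
    source venv/bin/activate
    python launch.py --precision full --no-half --skip-torch-cuda-test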
If 512x512 is true, then even my ancient RX 480 can almost render at that speed.

ROCm does not guarantee backward or forward compatibility, which means it's very hard to write code that would run on all current and future hardware without having to maintain it, and AMD often drops support for older hardware (and sometimes that hardware isn't even that old) completely from their ROCm releases.

So I decided to document my process of going from a fresh install of Ubuntu 20.04 to a working Stable Diffusion: 1 - install Ubuntu 20.04; 2 - find and install the AMD GPU drivers. Please note that you'll need 15-50 GiB of space on your Linux partition.

If you are willing to try Linux: I have ROCm 5.3 working with Automatic1111 on actual Ubuntu 22.04 with an AMD RX 6750 XT GPU by following these two guides. This guide should help you as much as it did for me.

I tried it with just the --medvram argument. For me, ROCm is about twice as fast — I don't understand where the 10x figure comes from; I also tried tensorflow+directml, and it's comparable to this.

"Once complete, you are ready to start using Stable Diffusion." I've done this, and it seems to have validated the credentials.

Tried it on my 7900 XT: much slower than the normal setup under Windows. I used my AMD 6800 XT with Auto1111 in Windows, but images at 512 took forever.

No, AMD is the only one responsible for ROCm.

A few months back there was no ROCm support for RDNA3 yet, so I just up and ordered a second 13700K, with an RTX 4090. ROCm is a real beast that pulls in all sorts of dependencies. This only runs on Linux.

May 2, 2023 · Someone got Stable Diffusion working a couple of weeks ago on ROCm 5.5. Stable Diffusion on an AMD Radeon RX 6900 XT.

What were your settings? Because if that's for a 512x512 example image, it's suspiciously slow, and that could hint at wrong or missing launch arguments.

Stable Diffusion does not run too shabby in the first place, so personally I've not tried this, so as to maintain overall compatibility with all available Stable Diffusion rendering packages and extensions.

Setting the environment variable TF_DIRECTML_KERNEL_CACHE_SIZE above the default 1024 (1300 works for my case) should prevent the bug.
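That workaround is just an environment variable that has to be set before the process starts — for example, in a Windows command prompt (the script name is a placeholder):

    set TF_DIRECTML_KERNEL_CACHE_SIZE=1300
    python your_tf_directml_script.py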
Optimizing custom models with Olive? So recently AMD released a guide showing how to optimize Stable Diffusion 1.5 for their GPUs on Windows ([How-To] Running Optimized Automatic1111 Stable Diffusion - AMD Community), but it only pulls from SD 1.5 — the updated GitHub versions do also have a separate SDXL script.

Stable Diffusion txt2img on AMD GPUs: DirectML benchmarks on Windows 11 with an AMD Radeon 6650 XT. The code snippets used in this blog were tested with ROCm 5.6, Ubuntu 20.04, Python 3.8, and PyTorch 2.0.

I've been running SDXL and old SD using a 7900 XTX for a few months now. Good news would be having it on Windows at this point.

So I've tried out the Ishqqytiger DirectML version of Stable Diffusion, and it works just fine. You need this fork in order to properly use your AMD GPU on Windows; this repository, which uses DirectML for the Automatic1111 WebUI, has been working pretty well. More info can be found in the readme on their GitHub page, under the "DirectML (AMD Cards on Windows)" section. The optimization arguments in the launch file are important!!

To install it: create a folder to store Stable Diffusion related files — open File Explorer, navigate to your preferred storage location, create a new folder named "Stable Diffusion", and open it. In the navigation bar of File Explorer, highlight the folder path, type cmd, and press Enter. Enter the following commands in the terminal, followed by the Enter key, to install the Automatic1111 WebUI. Download the k-diffusion and stablediffusion folders (click the green "Code" button and download as ZIP), go to the folder you installed in step 1, browse to repositories, and extract the two folders there; rename them to k-diffusion and stable-diffusion-stability-ai (if you already have these folders, delete them first). Run once (let DirectML install), then close down the window. The request to add the --use-directml argument is in the instructions but easily missed: now change your new webui-user batch file to the lines below.
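The batch lines themselves are scattered around the page ("@echo off" was mangled into "u/echo off" by Reddit's username auto-linking); reassembled, and with the fork's address filled in as an assumption (lshqqytiger/stable-diffusion-webui-directml), the whole thing amounts to roughly:

    git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
    cd stable-diffusion-webui-directml

    rem webui-user.bat:
    @echo off
    call webui --use-directml --reinstall

Presumably --reinstall only needs to stay there for the first launch, while the DirectML build of torch is being set up.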
Nov 15, 2020 · It may sound unlikely, but in the past, applications have relied on bugs in Windows components that prevented the component from fixing those bugs (see point 1 above).

It will only use maybe 2 CPU cores in total, and then it maxes out my regular RAM for brief moments; doing a 1-4 batch of 1024x1024 txt2img takes almost 3 hours. Mostly because of the VRAM.

Looks like ROCm support is getting better and better, allowing for blazing-fast image generation on Radeon GPUs, provided they have good amounts of VRAM. But after this, I'm not able to figure out how to get started.

Here is an example of Python code for the ONNX Stable Diffusion pipeline using Hugging Face diffusers (the page originally cut the snippet off mid-string; completed here with the standard generate-and-save calls):

    from diffusers import StableDiffusionOnnxPipeline

    # load a model previously converted to ONNX, on the DirectML execution provider
    pipe = StableDiffusionOnnxPipeline.from_pretrained(
        "./stable_diffusion_onnx", provider="DmlExecutionProvider"
    )
    prompt = "a photo of an astronaut riding a horse on mars"
    image = pipe(prompt).images[0]
    image.save("astronaut.png")

And what would actually help us much more is releasing ROCm for Windows — PyTorch on DirectML is 20-50% slower, I think.

First proper Stable Diffusion generation on a Steam Deck. Install an Arch Linux distro.

ROCm is the best performer in terms of speed. ROCm still performs way better than the SHARK implementation (I have a 6800 XT, and I get 3.8 it/s on Windows with SHARK, 8.76 it/s on Linux with ROCm, and 0.8 it/s on Windows with ONNX). In short, no, there is no workaround: to get decent speed you need to use DirectML or ROCm/Linux (only certain cards), and both are extremely feature-limited.

Dec 15, 2023 · AMD's RX 7000-series GPUs all liked 3x8 batches, while the RX 6000-series did best with 6x4 on Navi 21, 8x3 on Navi 22, and 12x2 on Navi 23. Intel's Arc GPUs all worked well doing 6x4. The WebUI here doesn't look nearly as full-featured as what you'd get with Automatic1111 + Nvidia, but it should be good enough for casual users.

I have tried multiple options for getting SD to run on Windows 11 and use my AMD graphics card, with no success. There is also a ROCm extension for GPU acceleration using OpenCL, made by AMD.

ROCm on Linux is very viable BTW, for Stable Diffusion and for any of today's LLM chat models, if you want to experiment with booting into Linux. Previously, on my Nvidia GPU, it worked flawlessly. I had to install an Ubuntu LTS to have ROCm support (and a feeling of slow reactions from the GUI). Not sure whether the set-up experience has improved with later ROCm releases. It worked after a few hours of trials, and I got around 16 it/s.

And in case anyone is interested, I thought I'd link the recently released SD UI for DirectML.

The latest release of Torch-DirectML follows a plugin model, meaning you have two packages to install: pip install torch-directml pulls the plugin in next to PyTorch. The number at the end of the device argument refers to the slot the adapter is in.
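In code, that device argument looks like this (a minimal sketch against the torch-directml API):

    import torch
    import torch_directml

    dml = torch_directml.device(0)  # 0 = the first DirectX 12 adapter ("slot")
    x = torch.randn(2, 3, device=dml)
    print((x * 2).cpu())  # ops run on the adapter; .cpu() copies the result back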
It took about 1 minute to load the model (some 2 GB photorealistic one) and another minute to transfer it to VRAM (apparently). Currently ROCm is just a little bit faster than CPU on SDXL, but it will save you more RAM, especially with the --lowvram flag. OC brings the card to 16.x it/s, which is the limit at the moment, at least in my testing.

Earlier this week ZLUDA was released to the AMD world, and across this same week the SD.Next team have beavered away implementing it into their Stable Diffusion fork. So native ROCm on Windows is days away at this point for Stable Diffusion.

AMD is enabling the next wave of hardware-accelerated AI programs using DirectML, as seen in the pre-release of Olive. TensorRT/Olive/DirectML requires some adjustments to the diffusion pipelines maintained by the diffusion gurus to offer complete support. They are working on it, but they keep kicking it down the road.

Stable Diffusion versions 1.5, 2.0, and 2.1 are supported. As long as you have a 6000- or 7000-series AMD GPU, you'll be fine.

TL;DR: stick to Nvidia. b) For your GPU, you should get NVIDIA cards to save yourself a LOT of headache — AMD's ROCm is not matured, and it is unsupported on Windows. Probably some RDNA3 optimizations, but then it's an unfair comparison.

Fig 1: up to 12x faster inference on AMD Radeon RX 7900 XTX GPUs compared to the non-ONNX-runtime default Automatic1111 path. I hope you figure something out.

AMD support for Microsoft® DirectML optimization of Stable Diffusion: in addition to RDNA3 support, ROCm 5.5 should also support the as-of-yet-unreleased Navi32 and Navi33 GPUs, and of course the new W7900 and W7800 cards.

On Linux you have decent-to-good performance, but installation is not as easy. DirectML is great, but slower than ROCm on Linux. ROCm is a solution with good performance under Linux (nearly as good as the 4080), but the driver is very unstable.

I am using Fedora, so the process is slightly different. Here are the changes I made: install Python 3.10 by running sudo dnf install python3.10.

Long-term dependencies on "preview" builds of DML are dangerous, so we try to avoid that entirely.

OpenCL, OpenGL compute, DirectX compute, DirectML, Vulkan, cuDNN, ROCm, oneAPI — and maybe something with non-CUDA shaders, but those have not been ML-related.

LoRA training on AMD (ROCm) with kohya_ss starts here ↓↓↓
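The steps themselves didn't survive the page; the usual starting point is roughly the following, under the same ROCm-wheel assumption as above (kohya-ss/sd-scripts is the trainer behind kohya_ss):

    git clone https://github.com/kohya-ss/sd-scripts
    cd sd-scripts
    python -m venv venv && source venv/bin/activate
    pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
    pip install -r requirements.txt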
In the txt2img tab, if I expand the Hires. fix tab, set the settings to upscale 1.5, latent upscaler, 10 steps, 0.7 denoise, and then generate the image, it will just generate the image at its base resolution.

So that person compared SHARK to the ONNX/DirectML implementation, which is extremely slow compared to the ROCm one on Linux. While it's true that it runs way, way faster, most of the models I used to work with under basic Automatic1111 send me a variety of errors or just straight up run out of memory (I'm using a 10 GB RX 6700). Using the ONNX runtime really is faster than not using it (~20x faster), but it seems to break a lot of features, including Hires. fix. So you may want to specify your generation settings.

The Microsoft Windows AI team has announced the first preview of DirectML as a backend to PyTorch for training ML models.

There have been no command-line switches needed so far. The results for SD 1.5 are in line.

Yes, we're pretty much using the same thing with the same arguments, but I think the first commenter isn't wrong at all: I've seen a comparison video of AMD on Windows (it was using ONNX, but the test had the same generation time as me on the same GPU) vs Linux.

It would appear from comfyanonymous' write-up that a 7600 isn't officially supported, but there is a possible launch option to help.

Never tried ROCm on Windows myself, but everything I've read and googled tells me that ROCm will NOT work under WSL or any other VM under Windows.

Install and run with ./webui.sh {your_arguments*}. *For many AMD GPUs, you must add the --precision full --no-half or --upcast-sampling arguments to avoid NaN errors or crashing. If --upcast-sampling works as a fix with your card, you should have 2x speed (fp16) compared to running in full precision.

May 23, 2023 · Stable Diffusion is a text-to-image model that transforms natural language into stunning images. SD 1.5 is slower than SDXL at 1024 pixels, and in general it is better to use SDXL.

Not sure how Intel fares with AI, but the ecosystem is so NVidia-biased that it's a pain to get anything running on a non-NVidia card as soon as you step outside of basic Stable Diffusion needs.

With DirectML, I definitely needed the medvram and all the so-called AMD workaround options, even at 512x512.

Aug 28, 2023 · Conclusion: this comprehensive guide has covered the installation and setup process of Stable Diffusion on AMD GPUs using ROCm. By following the step-by-step instructions and exploring the various parameters and optimizations, users can unlock the full potential of Stable Diffusion for generating high-resolution images. Once ROCm is vetted out on Windows, it'll be comparable to ROCm on Linux.

Automatic1111 is great, but the one that impressed me, by doing things that Automatic1111 can't, is ComfyUI.

For convenience, you can directly pull and run the Docker image on your Linux system with the following code:
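The command itself is missing from the page; the standard invocation for AMD's prebuilt image (rocm/pytorch on Docker Hub) looks like:

    docker pull rocm/pytorch
    docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch

The two --device flags pass the ROCm kernel driver and the GPU render nodes into the container.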
AMD GPU with ROCm in Linux/Ubuntu → do it.

The current DirectML library for GPU is more than 2x slower than the TensorFlow CPU build. I regret purchasing an AMD 7900 XT instead of a 4070 Ti earlier this year. For the 7900 XTX you need to install the nightly torch build with ROCm 5.6 to get it to work.

It can run your 6800 XT on par with a 3080, except for a few things: Nvidia GPUs have dedicated ML cores which support mixed precision, so that both 16-bit and 32-bit floating point can be used simultaneously during training. CUDA is way more mature and will bring an insane boost to your inference performance; try to get at least an 8 GB VRAM card, and definitely avoid the low-end models (no GTX 1030-1060s, no GTX 1630-1660s).

Here is my first 45 days of wanting to make an AI influencer and Fanvue/OF model, with no prior Stable Diffusion experience. Background: about a month and a half ago, I read an article about AI influencers raking in $3-10k on Instagram and Fanvue. (Marked as NSFW because I talk about bj's and such.)

With AMD on Windows you have either terrible performance using DirectML, or limited features and overhead (compile time and used HDD space) with SHARK.

I did it this way using Windows. This is where stuff gets kinda tricky — I expected there to just be a package to install and be done with it; not quite.

Not at home right now; I've got to check my command-line args in webui.bat later. I am interested in playing with Stable Diffusion recently.

DirectML is Microsoft's machine learning API for Windows, and this allows TensorFlow to leverage that API for GPU acceleration on Windows. This gives AMD users a way to GPU-accelerate TensorFlow, and it also gives people an alternative to CUDA.
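A minimal sketch of what that looks like in practice (tensorflow-directml is Microsoft's DirectML fork of TensorFlow 1.15; install with pip install tensorflow-directml):

    import tensorflow as tf

    # the fork registers a DirectML device; eager mode is opt-in on TF 1.15
    tf.compat.v1.enable_eager_execution()
    print(tf.add([1.0, 2.0], [3.0, 4.0]))  # runs on the DML device when one is present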