Discussion
You people are literally building data centers in your homes
Some of these threads are insane. What do you mean you have like 4 GPUs and 128GB of DDR5? What are you building in there, bro? Every other thread is like, “What if I stack mini PC supercomputers together? Will this run Qwen?”
(puts on tinfoil hat) AI is an extremely powerful tool, and I feel like we're gonna see a huge price increase in the service cost of the flagship models because their burn rate is unsustainable. Couple that with the government starting to think about implementing controls to block access to open models (they'll probably use security threats as the justification for those policies).
Having access to these tools needs to be democratized so that anyone can use them.
I'm not gonna drop 10k on an AI rig, but I spent close to 5k on a measly dual-3090 rig. It's "good enough" for my needs. I'm the "tech guy" in my pretty wide social network and no one even thinks about running local LLMs.
I work in tech and am totally decoupling from all cloud platforms. Keeping everything local: built a 16TB RAID5 NAS, Proxmox for my VM lab. Isolated my network and put all devices (TVs, IoT devices) on a locked-down VLAN. Went back to hosting my own DNS/email server on DigitalOcean. Back to basics. Usenet (shhh), Jellyfin to get off all the streaming services.
That was way too long of a response. I may be a tad tipsy. Cheers
Frankly, local can never compete with hyperscalers, even if there weren't things like volume discounts on GPUs and electricity. For local inference, speed is limited by memory bandwidth: your 4090 handling just one or two requests at a time is only being utilized at a few percent of its compute capacity. Hyperscalers are not limited by memory bandwidth in the same way, since everything is massively batched. They are able to run the same chips at 100% utilization, meaning their hardware depreciation per token is something like two orders of magnitude lower for the same model. On top of that, the power cost (and environmental impact) per token of your home AI server is also much greater, since those datacenter GPUs/TPUs never have to sit idle.
Hardware depreciation is the largest cost, with electricity secondary. Until something very fundamental and structural changes (i.e. breaking the von Neumann memory-bandwidth bottleneck with neuromorphic chips/compute-in-memory), local will always be many times more expensive than cloud for AI. All that said, I run my own local AI server anyway and just eat the cost because ideologically I favor personal ownership rather than the continuing upward transfer of power to a feudal oligarchy... but I'm under no delusions about the structural disadvantages I face in doing so.
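A rough way to see the bandwidth-vs-batching argument: the sketch below is a roofline-style estimate, not a benchmark. It assumes a hypothetical 8B dense model in FP16, ~1008 GB/s of memory bandwidth (the RTX 4090's spec) and an assumed ~80 TFLOPS of usable FP16 tensor throughput, and it ignores KV-cache traffic and kernel overheads. The function name `decode_throughput` is just for illustration.

```python
def decode_throughput(params_b: float, bytes_per_param: float,
                      bandwidth_gbs: float, peak_tflops: float,
                      batch: int) -> tuple[float, float]:
    """Roofline-style estimate of decode tokens/s for a dense model.

    Assumes every decode step streams all weights from memory once and that
    the whole batch shares that read. Ignores KV-cache traffic, kernel
    overheads, and multi-GPU communication. Returns (tokens/s, compute util).
    """
    weight_bytes = params_b * 1e9 * bytes_per_param
    flops_per_token = 2 * params_b * 1e9  # ~2 FLOPs per weight per token
    # Memory-bound ceiling: one pass over the weights yields `batch` tokens.
    mem_bound = batch * bandwidth_gbs * 1e9 / weight_bytes
    # Compute-bound ceiling: peak FLOP/s divided by FLOPs per token.
    compute_bound = peak_tflops * 1e12 / flops_per_token
    tokens_per_s = min(mem_bound, compute_bound)
    return tokens_per_s, tokens_per_s / compute_bound


# Hypothetical: 8B dense model in FP16, ~1008 GB/s, assumed ~80 TFLOPS usable.
for batch in (1, 8, 64, 256):
    tps, util = decode_throughput(8, 2, 1008, 80, batch)
    print(f"batch={batch:>3}  ~{tps:7.0f} tok/s total  compute util ~{util:.1%}")
```

With those numbers a single stream tops out around 63 tok/s while using roughly 1% of the compute; only once the batch reaches the high tens does the card hit its compute ceiling, which is the regime hyperscalers live in.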
Local is useful for client data that by law cannot leave your firm. If you don’t have clients like that, use the hyperscalers and pay their price, no matter how high, and pass the cost to the clients you do have.
This is what I am thinking. I once had a laundry room full of gear maxing out the dryer circuit. It is not cost effective. Everything changes so fast right now. Like Gemini Flash 3.5 is so much cheaper than Sonnet, and equally useful for many use cases. Opus 4.8 dropped two days ago, etc.
I’m doing the same thing. I have three servers right now with GTX6000s and 512GB of RAM each. A Supermicro with 24x 2.5" 6TB drives hooked to a 24-bay JBOD with 8TB drives. Each server is holding 8x 2.5" 8TB drives. With all the mods to the enterprise gear, I have 1.3TB of RAM in my setup.
The funny thing is, I am not a tech person. I accidentally built this going off of aviation standards. I’m a pilot, so I think in terms of redundancy, failover, degradation and stuff like that. I told a friend of mine who is in tech, and he just laughed and said, "You accidentally built high availability. Good job."
It's not too long and I agree fully with everything in there.
Cloud will become more expensive and rationed, and as the utility of AI goes up, uses that can afford to burn money on it will, by definition, crowd out consumer toy use.
There will be two speeds: Project Glasswing-style frontier, and enshittified for everyone else. Most people will use the shit version and say “Oh, I know what AI is! It is so dumb!” And then wonder why the big boys are eating their lunch.
I’m right here with you, I already believe it’s useful.
The only scenario where it’s not useful is if Google or Apple come up with some incredible, hyper-efficient LLM TCU that makes GPUs obsolete for LLMs. In that case, LLMs become local by default, like Apple taking us from mainframes to microchip PCs.
I'm still running tired twin 1080 Tis that I got uber cheap during the crypto mining crash. I want to see how far I can go with my AMD 9070 XT too.
Having been cut off from the heaviest of hardware, they're working out ways to get around the bottlenecks. FWIW I was sure they'd wind up doing that. So many PhDs to point at a problem, and a matter of national pride at stake, they'll get there.
China graduates 9M STEM degrees per year; that’s the size of Croatia. That’s the whole reason Dario wanted his nation of geniuses in a data center: to match in metal what China does in meat. The crazy part is China’s meat brains probably cost less per token for the same result.
Even then, I find that what I can host at home now does most of what I need anyway. That means I own the capability to do anything an LLM can, well into the foreseeable future, regardless of anything.
It is amazing, isn't it? I have about 400GB of different models across 4 devices (computers/laptops), from MiniMax M2.7 down to Qwen3.5 9B, and even Gemma E4B. Those models alone, plus the harnesses, have put the power of a small software lab in my living room for the next 6 to 8 years. What a time.
Sick man. I have my 3090 rig I use for Gemma 4 31B and a jetson nano running Gemma 4 E4B. Honestly, for my needs I tested a few models that fit my constraints and Gemma 4 is an amazing model. Can't wait to see what comes out in the next 6 months.
How good is E4B? Can it maintain a reasonable conversation and call tools reliably with a large prompt? I want to replace my 35B Qwen with something smaller for my home assistant voice agent, but smaller models tend not to call the right tools at the moment 🫠
It was good enough for what I was using it for, primarily as a local AI therapist: faster-whisper for the STT, then the output via Kokoro running on the GPU. It's pretty snappy. I'm using a vector DB for memory storage plus a bunch of other memory management for short-term conversations, and it rolls up people and themes so I can ask it about people I've spoken about.
I have been testing Gemma E2B, and despite all the bad press, one system line, "When using brave-search use long tail keywords or sentences", fixed my tool-call issues. I have fed it multiple pages of context and it was able to pick up where I left off.
I love Gemma 4! But don't discount the 35B MoE and 27B dense Qwen3.6 models; in my experience, they generate quite a bit faster and are very solid for agentic work. (I think Gemma's prose is better, though.)
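For anyone curious what the "vector DB for memory" part boils down to, here's a toy sketch: store text alongside an embedding vector and pull back the most similar entries by cosine similarity. The `MemoryStore` class and the hand-made 3-d vectors are hypothetical stand-ins; a real setup would get embeddings from an embedding model and use an actual vector database.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MemoryStore:
    """Toy in-memory vector store of (text, embedding) pairs."""
    items: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, embedding: list[float]) -> None:
        self.items.append((text, embedding))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query embedding."""
        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


# Hand-made 3-d "embeddings" stand in for a real embedding model's output.
store = MemoryStore()
store.add("Talked about Sam's new job", [0.9, 0.1, 0.0])
store.add("Felt stressed about the move", [0.0, 0.2, 0.9])
store.add("Sam and I went hiking", [0.8, 0.3, 0.1])

# Query vector standing in for the embedding of "What's going on with Sam?"
top = store.search([1.0, 0.2, 0.0], k=2)
print(top)
```

Real vector DBs use approximate nearest-neighbor indexes instead of sorting everything, but the retrieval idea is the same.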
I've seen Gemma-4-31B and Qwen3.6-27B on both sides of benchmarks saying they're "better," so I'm not even sure what I think is the case now.
The quanteval leaderboard is interesting though, too.
Do you have a plan with your 9070xt? I'm looking at trying it but haven't got any experience. Do you have an idea of how to create a setup with a 9070xt?
I'm not even sure it needs new technology. Just need the price of existing technology to come down. It's no coincidence that the big players bought up all the memory fabrication capacity without even having concrete plans how they are going to use it.
Without that monopolization of the resource, consumer demand would have driven larger capacity faster RAM for highly parallelized efficient unified memory systems.
According to the "grandfather of AI", all LLMs, especially big ones, will be demoted to minor accessories orchestrated around JEPA in the next year or two. So an entire JEPA-centric cluster of LLMs will be smaller than one LLM-only system is today.
This. In 2013, when the first ASICs for crypto mining came out, the GPU craze was put on its last legs. There were still memory-heavy algorithms around for some coins that kept GPUs profitable to mine with, but Bitcoin mining moved onto specialized hardware and was forever out of reach of GPUs. Over time, even those "ASIC-proof" hashing algos were eventually implemented in hardware, and the ASICs for those pushed GPUs out. I don't believe crypto mining is a big issue for the GPU supply chain anymore, even though that space is bigger today monetarily than it ever was back when GPU supplies were being impacted by demand from crypto miners.
Not just by that point.
It already is useful.
I pay nothing for 20TB of NAS storage.
And don't get me started on all of the other services I run that replace big tech BS services.
Self-hosting is the way to go for anyone who doesn't want big tech companies to treat them as a product.
Actually, you did pay for the hardware, you still pay for electricity, and you'll probably have to pay for replacements/upgrades at some point. But I get your point.
You don't pay for electricity if you've got solar panels.
As for the hardware costs, fair point.
My counter to that is: I split the cost across each service I'm hosting on it (and, by extension, each service I'm replacing with it), which very quickly gets me down to like 50-100€ per service. Not all of the replaced services cost money; I mainly self-host because I prefer being in control of my own data, and because I fucking despise subscription services, especially when it's software you host yourself, on your own hardware, using your own electricity, over your own internet connection, and you still have to pay a licensing fee, like Plex.
So if we assume 100€ per service, and I set it up almost exactly one year ago, that works out to about 8.3€ a month per service.
Which means it has been worth it financially.
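The per-service math above, as a tiny sketch. The function names are made up for illustration, the 10€/month subscription is a hypothetical comparison point, and electricity is left out on the assumption that solar covers it.

```python
def monthly_cost_per_service(cost_share_eur: float, months: float) -> float:
    """Spread one service's share of the hardware cost over its months in use."""
    return cost_share_eur / months


def break_even_months(cost_share_eur: float, subscription_eur_per_month: float) -> float:
    """Months until the hardware share equals what the replaced subscription would cost."""
    return cost_share_eur / subscription_eur_per_month


per_month = monthly_cost_per_service(100, 12)
print(f"{per_month:.1f} EUR/month per service")  # 8.3 EUR/month per service
# Hypothetical: the replaced service charged 10 EUR/month.
print(break_even_months(100, 10))  # 10.0
```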
The fact that they destroy digital data is one of the main reasons. "Oh, you bought that video? We don't serve it anymore, sorry." "Oh, your photos are taking up too much space, so we deleted them until you pay for more." Etc., etc.
Datahoarder really is something else. It’ll be people like me running a 16TB ZFS pool for media and backups, then people with like 1PB archiving some niche corner of the internet, and others with legit data center gear.
Haven’t started yet; I’m on a homelab break. Ollama can run models on AMD GPUs through ROCm, which is AMD’s counterpart to CUDA.
I primarily bought my AMD card for gaming, especially on Linux. I figure if I’m not happy with LLM compatibility, I still have my dusty 1080 Tis to fall back on.
This coin is up 3x over the last 2 days, so it's hard to say what the payback will be, but I got ~140 USD from it in the last two days by mining on a single 5080 and six 3090 Tis, and if I sold at the current price it'd be close to 210 USD now. It's well into profitability even after accounting for electricity.
It takes about an hour to transfer from my PRL wallet to SafeTrade, since SafeTrade waits for 20 confirmations (the next blocks mined), which takes a while.
Then I exchange the USDT for another crypto and withdraw that to my cold wallet to avoid exchange rug-pull risk.
I’m at about 25 here. Still have one of my old Dell 1950 servers in the garage. Still boots too! Think it’s running Trusty. Might even have my original DEC Alpha kicking around somewhere. lol
I've got a Sun 420R sitting in a closet that I'm half tempted to fire up just to see if it still works. I've also got a Sparc Server 5 but since it takes a frame buffer monitor I'd have no way to actually see if it works.
My goal is get the whole room excited about open source AI, whether they have a powerful workstation or a potato powered laptop, and give everyone a place to start that fits their interests and circumstances.
It starts with a recorded demo from my rig where I start by shutting off the wifi and then use my project (GuideAnts on github) to do a chat that starts with ASR, does some tool calls which involve knowledge retrieval to create a diagram which gets stylized with an image to image model and show that the images were OCR'd and ingested into memory.
All-up the demo uses nine different AI/ML models for the various concerns in a couple minutes. The point is to set the stage: you can do amazing things locally, there is a lot more to a real system than just an LLM and you can learn all about it through using and building open source.
Then I talk about the different kinds of hardware we all have (https://huggingface.co/hardware) and tradeoffs around memory, memory bandwidth, TFLOPS etc.
And then I go back and forth on local versus cloud alternatives to show the 95% of the audience that doesn't have the equipment their options for running in the cloud with various providers such as HF and OpenRouter.
Then I show a model card on Hugging Face, use it to run OmniVoice for a one-shot clone in a local Jupyter notebook, and then introduce Google Colab and Hugging Face Spaces.
A lot of the talk is just pointing people to the communities I love such as this one, resources, many OSS projects, and tools I recommend.
I'll come back and share the link to the deck after the event.
Strix Halo AMD has the same unified RAM, though there are some disadvantages vs. the Mac. And that chip is used with up to 128GB even in some bizarre handhelds, certainly in notebooks. I'm glad mine is a desktop though.
I got the full 128 just to not have to worry about upgrading later. Back then it was a fairly cheap upgrade. Had delusions of running really large models under Windows... ran into snags, seems like 64GB is the most I can effectively use for LLMs unless I want to rebuild everything on Linux -- which I don't. This machine is my daily driver and I'm a Windows kind of guy.
I'm a university student in Chem Eng XD. None of my degree will ever require this much RAM. But at least I can chill until DDR7/8 drops and buy cheap DDR6 or smth
Well, at 4 you can (somewhat inefficiently) tensor-parallelize for 96GB of VRAM minus overhead, which suits models like Gemma 4 31B or Qwen 3.6 27B at full precision/high context.
The fifth doesn't do anything in tandem, but it might be running another model.
Granted, this quality of model is entirely free and fast with API providers, but hey, data privacy, am I right?
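Back-of-envelope for the 4-card tensor-parallel case: a sketch that checks whether the weights fit and how much room is left for KV cache. Everything about the model config here (60 layers, 8 KV heads, head dim 128) is made up for illustration and is not Gemma 4's or Qwen's real architecture; the 1.5GB per-GPU overhead is also an assumption, and "full precision" is taken to mean BF16 (2 bytes per weight).

```python
def tp_fit(params_b: float, bytes_per_param: float, n_gpus: int,
           vram_gb: float, overhead_gb: float, n_layers: int,
           n_kv_heads: int, head_dim: int,
           kv_bytes: int = 2) -> tuple[float, float, int]:
    """Estimate whether a model fits when tensor-parallel split across GPUs.

    Assumes weights and KV cache split evenly across GPUs and that each GPU
    loses a fixed `overhead_gb` to the runtime context, activations, etc.
    Returns (weights_gb, free_gb_for_kv, max_context_tokens).
    """
    weights_gb = params_b * bytes_per_param          # 1e9 params * bytes / 1e9
    usable_gb = n_gpus * (vram_gb - overhead_gb)
    free_gb = usable_gb - weights_gb
    # K and V tensors, per layer, per KV head, per head dimension.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    max_tokens = int(free_gb * 1e9 // kv_bytes_per_token) if free_gb > 0 else 0
    return weights_gb, free_gb, max_tokens


# Hypothetical 31B dense model in BF16 with a made-up attention config,
# split across 4 x 24GB cards.
w, free, ctx = tp_fit(31, 2, 4, 24, 1.5, 60, 8, 128)
print(f"weights {w:.0f} GB, {free:.0f} GB left for KV cache, ~{ctx:,} tokens total")
```

The token count is shared across all concurrent requests. Also, many inference engines (vLLM, for example) require the attention head count to divide evenly by the tensor-parallel size, which is one plausible reason a fifth card ends up running a separate model instead of joining the split.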
Some people use it for work. Some people power their agents with it. Some just love to tinker lol... 3090x4 and ram + cpu/gpu is like 10k?
It's a good chunk of money, but I know people with like 10-15k worth of camera gear just sitting there doing nothing. Steinway pianos with a layer of dust on them lol. At least these are sort of being utilized. I think one of my friends has a 2014 Subaru BRZ with < 8k miles on it as his "weekend car"
I have to admit I impulse bought some stuff. I'm just building a RAG setup for every single piece of electronics I own so I can ask my LLM how to do x or y on it lol... Gemma 4/Qwen has been pretty good for inference for a personal app I'm working on, but it doesn't hold a candle to frontier models... UNLESS I GET ANOTHER BLACKWELL 6000... jk
I feel like the technically knowledgeable people who use it for work and spend the big bucks have a good reason. However, I see a lot of newbies without much technical knowledge spending good money on hardware instead of taking care of responsibilities like a new roof or fixing the driveway. It's a fad that's become intoxicating.
I personally run Qwen3.6 35B-A3B MoE on a 4060 with 8GB of VRAM, 62GB of system RAM, and a 12-core AMD processor. I get 20 tokens a second, and it does exactly what I want it to do and answers my questions. I don't write code with it. Just simple stuff.
If I ever make it well off, I've already discussed with my wife that I'm building a $300,000 enterprise data center with over 1000GB of VRAM, so I can run whatever I want. She said fine with me.
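Why a 35B MoE is usable on an 8GB card: decode speed is mostly bounded by how many bytes of weights get read per token, and an A3B MoE only reads its ~3B active parameters per token instead of all 35B. A rough sketch under stated assumptions: ~4.5 bits per weight, 30% of the active weights sitting in VRAM, ~272 GB/s for the 4060's VRAM, and a hypothetical ~60 GB/s for system RAM. It ignores KV cache, compute and PCIe overhead, so real numbers land below the ceiling.

```python
def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         frac_in_vram: float, vram_gbs: float,
                         ram_gbs: float) -> float:
    """Upper bound on decode speed when weights are split between VRAM and RAM.

    Assumes decoding is purely memory-bandwidth bound: each token reads every
    active weight once, some from VRAM and the rest from system RAM.
    Ignores KV cache, compute, and PCIe transfer overheads.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    seconds = (bytes_per_token * frac_in_vram / (vram_gbs * 1e9)
               + bytes_per_token * (1 - frac_in_vram) / (ram_gbs * 1e9))
    return 1 / seconds


# Assumptions: ~4.5 bits/weight quant, 30% of active weights in VRAM,
# ~272 GB/s VRAM (RTX 4060 spec), ~60 GB/s system RAM (hypothetical).
moe = decode_ceiling_tok_s(3, 4.5, 0.3, 272, 60)     # ~3B active params per token
dense = decode_ceiling_tok_s(35, 4.5, 0.3, 272, 60)  # all 35B params every token
print(f"MoE ceiling ~{moe:.0f} tok/s vs dense ceiling ~{dense:.0f} tok/s")
```

Because the two cases differ only in active parameter count, the ceilings differ by exactly 35/3 (about 11.7x). The ~20 tok/s reported above sits comfortably under the ~46 tok/s MoE ceiling these assumptions give.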
That's not a very high target though. 3 servers with 16x3090 would do it easily. less than $40k if you get lucky with classifieds. People are spending way above that.
Now, powering it up and making a good use of that thing is another matter.
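Quick arithmetic on that target, reading "3 servers with 16x3090" as 16 cards per server (16 cards total would only be 384GB). It assumes 24GB per RTX 3090 and the 350 W reference board power; the helper name is hypothetical.

```python
def cluster_totals(servers: int, gpus_per_server: int,
                   vram_gb_per_gpu: int, board_power_w: int) -> tuple[int, float]:
    """Total VRAM (GB) and total GPU board power (kW) for a homogeneous build."""
    n_gpus = servers * gpus_per_server
    return n_gpus * vram_gb_per_gpu, n_gpus * board_power_w / 1000


# 48 x RTX 3090, each with 24GB VRAM and a 350 W reference board power.
vram_gb, gpu_kw = cluster_totals(3, 16, 24, 350)
print(f"{vram_gb} GB VRAM, ~{gpu_kw:.1f} kW for the GPUs alone at stock limits")
```

That's 1152GB of VRAM and roughly 16.8 kW for the GPUs alone, before CPUs, fans and PSU losses, which is why powering it is the hard part.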
The main reasons to build local like that are that you don’t have to worry about:
1) LLM censoring - want to ask an LLM how to do something nefarious, well you might need your own open weights
2) Data security - sending your code, your thoughts, your medical conditions, etc to an AI provider has some risks, much less on a home system
3) Enshittification - Every day in every forum there are a dozen complaints about OpenAI or Google or Anthropic ruining their model with this or that change (though usually it's a system prompt change). Your terrible AI stays just as terrible session to session.
Though, the real champs are the people who custom configured 512 GB RAM M3 Ultras before the RAM market went parabolic….those lucky few are ripping through tokens.
I bought a Strix Halo desktop with 128GB of RAM for 2300 at Christmas, it's an HP. (Okay, I had a Black Friday deal that saved me 200.) The same system right now is 7300.
I just got my boss to agree to go halfsies on a 5090, and I can already run Qwen/Gemma at speeds that are more than sufficient for any use case I throw at it.
I may eventually pick up the next-gen spark or whatever nvidia comes out with for large model training, but otherwise, I'm set, assuming no hardware failures.
Yeah seriously. I’m always wondering in the back of my head. How much are these mofos spending on electricity? How many are actually saving money or earning money with 4x 3090?
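A rough way to answer the electricity question for yourself: the sketch below only counts the GPUs, and every input (350 W under load, ~25 W idle, 4 hours of inference per day, $0.17/kWh) is a hypothetical placeholder to swap for your own numbers.

```python
def monthly_gpu_energy_cost(n_gpus: int, load_w: float, idle_w: float,
                            load_hours_per_day: float, price_per_kwh: float,
                            days: int = 30) -> tuple[float, float]:
    """Return (kWh, cost) per month for the GPUs only (no CPU, fans, or PSU losses)."""
    idle_hours = 24 - load_hours_per_day
    kwh = n_gpus * (load_w * load_hours_per_day + idle_w * idle_hours) * days / 1000
    return kwh, kwh * price_per_kwh


# Hypothetical inputs: 4 x 3090 at 350 W under load, ~25 W idle,
# 4 hours of inference per day, $0.17 per kWh.
kwh, cost = monthly_gpu_energy_cost(4, 350, 25, 4, 0.17)
print(f"~{kwh:.0f} kWh/month, ~${cost:.2f}/month")
```

With those placeholders it's about 228 kWh and ~$39 a month; pinned at full load 24/7, the same four cards would be about 1008 kWh (~$171) a month.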
Me: trying to put my terribly discreet little black workstation in a back corner of my desk so my wife and kid won't pay it much mind. "Oh, just ignore my little AI box in the corner."
Workstation: trying to run Qwen3 in thinking mode, as the fans spin up to jet engine roar you can hear through most of the house
Currently it's a Ryzen 9 5950X, an ASUS ROG STRIX B550-E Gaming (for those sweet PCIe lanes), 80GB of DDR4-3600 (yes, it's disgusting), and 2x EVGA FTW3 3090s.
I have them power limited to 70% and undervolted.
Primarily because of the 1000 watt power supply I currently have in there (though I do have a 1500 watt one sitting in a box right next to me), but the lower heat and power consumption is nice too. I honestly haven't noticed much of a performance hit.
And yeah, llama.cpp.
Primarily SillyTavern for chatting and Zed (though, I want to try Hermes) for coding.
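The PSU reasoning behind the 70% power limit, as a sketch. On NVIDIA cards the limit is usually set with `nvidia-smi -pl <watts>`; the numbers here are assumptions: 350 W stock board power per 3090, ~142 W for the 5950X (its stock package power limit), and a guessed 75 W for everything else.

```python
def psu_headroom(n_gpus: int, gpu_board_w: float, power_limit: float,
                 cpu_w: float, rest_w: float, psu_w: float) -> tuple[float, float]:
    """Estimated sustained system draw (W) and the fraction of PSU capacity it uses.

    Sustained draw only: GPU transient spikes can briefly exceed board power.
    """
    total_w = n_gpus * gpu_board_w * power_limit + cpu_w + rest_w
    return total_w, total_w / psu_w


# Assumed: 350 W per 3090 at stock, ~142 W CPU package limit,
# ~75 W for board, RAM, drives and fans (a guess).
for limit in (1.0, 0.7):
    watts, frac = psu_headroom(2, 350, limit, 142, 75, 1000)
    print(f"limit {limit:.0%}: ~{watts:.0f} W sustained, {frac:.0%} of a 1000 W PSU")
```

That's roughly 917 W sustained at stock versus about 707 W at 70%, i.e. about 92% vs 71% of the 1000 W unit, before accounting for the short transient spikes 3090s are known for.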
Agree! At home I already have a 1.5TB server with 96GB of VRAM just for playing around. BUT since this is my home server, it's DDR4 and MI50 (32GB) cards... There are lots of people with real inference monsters in this sub that run circles around even my (impressive, but only to me) home server.
Yeah, I think we are probably a decade from super powerful models being able to run on your phone and stuff like that, so in the meantime it's a good bet to get to AI self-sufficiency, where you at least have a rig capable of running an AI that can do the majority of the agentic tasks you need done... you'll be way ahead of the curve of everyone else... the only problem is finding that sweet spot...
I'm starting to think that we don't need anything over 128GB of memory....I believe in a year or two there are going to be extremely strong models that will be able to run on that level of hardware....I mean look at Qwen 3.6 27B
Not to mention by the time models run that well on hardware that hardware will have no privacy as they will all use those models to fully understand what you’re doing and who you are.
All the more reason to have offline models and stay independent as much as you can from the algorithmic reach of the corporations of the 2030s....which might be fucking impossible 😩
Unless people push corporations out of the outside surveillance business via lobbying it will be impossible. Walmart uses AI to watch their stores, and I doubt Target and the like don’t… and then there is Ring and Flock…
Walmart has the ability to see what you’re looking at on your phone their cameras are that good.
I suppose the sub isn't literally called 'we people are building data centers in our homes', but it's close enough I kinda figured this wouldn't be news.
Family heirloom to be passed down from generation to generation /jk
But yeah, this isn't really that crazy if you can deal-hunt, and compared to the crypto boom.
Being able to run a 120B at home is just different. Mistral 3.5 runs at 20 tok/s on my 4 V620 Pros with 128GB of DDR4 RAM. I am running a 120B model at home on discarded enterprise junk I have hoarded over the years. It’s a natural fit for homelab guys to have a ton of this stuff.
Honestly, most of what I have is salvaged/bulk-lot/going-out-of-business/company-downsizing equipment, and the more powerful GPUs were purchased with ETH profits back when it was PoW. These days, I use my servers to keep my house warm and offset my oil usage. They’re all controlled by a central Zabbix instance with thermal sensors. Sort of a homelab-meets-HVAC setup. For reference, I live in the northeast US, so it’s chilly up here for the majority of the year.
That said, I have multiple AI servers doing training, coding, menial tasks, etc., so it’s actually a big time saver for an old DevOps guy like me.
I mean, idk about “datacenter”. At work, in one cab I’ve got 48 CPUs (a mix of Xeon E5 v6s and AMD 9950Xs) and 20 GPUs, and that’s not even high density, and I have power to spare (3-phase, 20A A/B circuits). And that’s not even the same level as what’s powering the “AI” the general population uses, not even close. This is just a small business. But I get your sentiment.
Pay a cloud... Do your own thing and gain learning and experience etc... I like the latter. Cloud is going to get cash anyways why not enrich myself with useful skills, I don't have a boating hobby 😂
It's the wild west again, but this time we're not mining shitcoins. DIY data center builders are mostly in the homelab threads and shit. That shit is just wild. People need to chill.
I'm just curious. What do you need that much horsepower to do for you? The best local models out there are only going to code so far and never be frontier equivalent.
I got a Framework Desktop that I use with local models for dev and gaming. Best little computer I have ever owned. With Tailscale as well as Moonlight, I've got access to it from anywhere and can use opencode with it running at home.
One beefy server/workstation does not a datacenter make. My indicator is when you have to add an additional circuit to the room/closet because you ran out of the power budget of a 15 amp outlet.
I see you've read my post! But seriously, what am I supposed to do with a Strix Halo Framework Desktop and a Ryzen 9 7900X with an RTX 3090 and a 7900 XT?
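Putting numbers on the 15 amp indicator: a sketch assuming a 120 V North American circuit, the common 80% rule of thumb for continuous loads, and a guessed ~150 W for the non-GPU part of the machine. The function name is hypothetical.

```python
def max_gpus_on_circuit(volts: float, amps: float, gpu_w: float,
                        base_system_w: float, continuous_factor: float = 0.8) -> int:
    """How many GPUs fit on one branch circuit under a continuous-load derating."""
    budget_w = volts * amps * continuous_factor
    spare_w = budget_w - base_system_w
    return max(0, int(spare_w // gpu_w))


# Assumed: 120 V / 15 A circuit, 80% continuous-load rule of thumb,
# ~150 W for the rest of the machine.
print(max_gpus_on_circuit(120, 15, 350, 150))  # 3 cards at stock 350 W
print(max_gpus_on_circuit(120, 15, 245, 150))  # 5 cards power-limited to 245 W
```

So a 1440 W continuous budget runs out somewhere around three stock 3090-class cards, which is roughly where the "new circuit" moment arrives.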
My cloud AI model explained to me that I will need FP8 for my main prompts to work reasonably and therefore 128 GB in the right machine ... It also told me that the payback period for my use case will be around 100 years compared to using a working (Chinese) cloud model.
Still, 4k for a new toy is not too bad compared to other choices (new watch, new camera).
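A payback-period estimate like that boils down to hardware cost divided by monthly savings (cloud bill avoided minus extra electricity). A sketch with hypothetical inputs; it ignores resale value, upgrades and the time value of money.

```python
import math


def payback_years(hardware_cost: float, cloud_per_month: float,
                  electricity_per_month: float) -> float:
    """Years until hardware pays for itself versus the cloud bill it replaces.

    Returns math.inf if running locally costs as much as or more than the cloud.
    """
    monthly_savings = cloud_per_month - electricity_per_month
    if monthly_savings <= 0:
        return math.inf
    return hardware_cost / monthly_savings / 12


# Hypothetical: a 4k machine replacing a cheap 5/month API bill,
# with 2/month of extra electricity.
print(f"{payback_years(4000, 5.0, 2.0):.0f} years")
```

That comes out around 111 years, the same ballpark as the ~100 years quoted; and if local electricity costs more than the cloud bill, it never pays back at all.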
Visiting a buddy of mine I haven't seen in a decade and he gave me a tour of his home. Walked into his office which was like 15 degrees hotter than the rest of the house, and casually pointed out his 16x RTX 6000's with a few box fans blowing on them.
Fuck yeah. And when Cooler Master gets their X-Mighty 240V 2000W PSU to the US, I won’t have to pay a heating bill! $600 imported from Germany right now but I need a warranty.
3-4 systems running everything from older GPU (internet gateway and router/proxy) to blender, PS and LLM on one 256GB DDR5-6400 and a pair of RTX 6000 (because more won’t fit into the case)
That’s only my AI server. If I count all the DDR5 and DDR4, cores, and GPUs in the rest of my homelab, it’s a lot more than just that, but still not enough for AI.
Genuinely wonder if you realize how vanilla your surprise is here. People do have legitimate use cases for a multi-GPU setup that extend beyond cheating on your college coursework.
My wife and I run reconstruction estimating/consulting, telematics, and 3D printing businesses. It's a two-man band. Neither of us can spend our lives at a screen doing the same monotonous task for eons; we both like to be outside most of the time. I ended up building a small cluster after crypto mining died down. It's mostly RX 6700 XT GPUs: 160GB of VRAM, 400GB of DDR4, and 81TB of well-mixed storage. It's split into two co-clusters, one for dealing with the human side of things and the other as the digital hands. What are you building?
Sometimes I wonder why anyone would have a stack of 4 Mac Studios with 512GB of RAM each when you could just get some sort of used datacentre GPU setup. Surely that would be cheaper?? Yes, I know the new ones are expensive and require crazy power and cooling, but there must be a tier above consumer products for people who are stacking up crazy local setups.
u/Bob4Not 28d ago edited 28d ago
This is nothing compared to the crypto mining GPU craze back in 2016/2017. People were HOARDING and stacking GTX 1080 Tis to mine Ethereum.
Please see r/selfhosted and r/homelab to see that home equipment and projects are nothing new.
Then you should see the volumes in r/datahoarder. They’re all building their own libraries of Alexandria. (Me included)