Thoughts on DeepSeek?

There seems to be a lot of buzz around DeepSeek at the moment, with some saying it’s a ChatGPT killer. The most remarkable thing (if the claims are to be believed) is that they built and run it at a fraction of the cost, yet it is said to perform better!

Here’s an article about it:

Good explanation:

Lots of chatter about it on social media, curious to hear what everyone here thinks - anyone used it for Elixir or Erlang? What’s it like compared to others?

I’ve been thinking for a while that “open” models are going to win in the end, and therefore “closed” providers are massively overrated at the moment.

In my mind, it’s like open-source databases vs. proprietary databases. Providers of proprietary databases make good money, but in the end most things are built with open-source databases. So I believe it’s going to be similar for LLMs as building blocks of applications. I’m not sure about the chat apps, although open models should be cheaper.

I think part of the reason is that closed-model companies lost their advantage of sitting on huge amounts of data, because as far as I know synthetic data is becoming more important for training LLMs. And now it looks like they have also lost their advantage of having huge amounts of compute available, as smaller companies innovate in that area.

For me it feels like these models are getting more similar to each other anyway. In the end users won’t care whether they score 87 or 83 on some benchmark, and they all have roughly the same capabilities, so why not pick the cheapest one?

On the other hand, I don’t know what’s going on inside OpenAI, so there might be a surprising release of something that invalidates everything I just wrote.

2 Likes

Listening to Sam Altman speak, I don’t think he has strong beliefs about their models having to be closed-source or open-source. He wants to make money and develop AI. What I think will happen is that they will be pushed to make them open source. Moreover, if they open up development of these models, that could be a huge benefit for them in terms of cost reduction. They could definitely save millions or billions on the DeepSeek optimizations alone.

1 Like

Well it made the headlines in the UK… but not only that, they made it a live story on the BBC homepage and they usually only do that for big events, here’s their summary:

Summary

  • US President Donald Trump warns Chinese startup DeepSeek is a wake-up call for America’s technology industry

  • The emergence of DeepSeek’s low-cost AI chatbot caused shockwaves on Wall Street, with Nvidia losing more than $500bn in market value

  • Shares in other major technology firms fell steeply in value when markets opened on Monday. That day DeepSeek said it was hit by a “large-scale malicious attack”

  • Tech-focused shares in Japan fell on Tuesday, while stock markets in China, South Korea and Taiwan are closed for the Lunar New Year holiday

  • The DeepSeek app, which was launched last week, has overtaken rivals including ChatGPT to become the most downloaded free app in the US

  • DeepSeek was reportedly developed for a fraction of the cost of its rivals, raising questions about the future of America’s AI dominance and the scale of investments US firms are planning


I think many people agree (and actually want that), Joel. People are understandably sceptical and wary of this type of technology being closed source and controlled by a small number of people or companies that many do not trust or like - and this is a big advantage for companies like DeepSeek.

1 Like

there’s always better and cheaper code

1 Like

I haven’t read any of his views but perhaps actions speak louder than words in this case, which does make it appear he has a preference for closed source. Many people think they should even drop ‘open’ from the name as it’s misleading.

Agree completely! If anything, this feels like the real birth of AI, and it will be exciting to see what happens next. I hope Apple either allow us to use DeepSeek with our Apple products (instead of ChatGPT, as they’ve been saying) or use DeepSeek’s findings to create their own LLM for us to use.

Imagine if they apply the techniques used by DeepSeek to create on-device AI tools…

I think there are many things at play here, including shareholder and partner preferences (hi, MSFT). Making a model open source is also a headache, from the legal POV and also just logistically. They didn’t have to do it either; there was no pressure from competition. Llama was always many steps behind, and it’s also “kinda” open source, but not really free software.

DeepSeek is both open-source and free software, with a permissive license with very few use restrictions.

I’m not saying Altman is enthusiastic about open-source models, but he probably won’t be against them either; it’s something they may look into as a response to the rise of other open-source and free-software models.

1 Like

I haven’t really followed it, but wasn’t that why he and Musk fell out? I’m sure there were some stories on DT about it… let me find one… here’s the quote I was thinking of:

Musk wrote “OpenAI was created as an open source… non-profit company to serve as a counterweight to Google, but now it has become a closed source, maximum-profit company effectively controlled by Microsoft. Not what I intended at all.”

From: OpenAI Isn't So 'Open' Anymore, According to Elon Musk - dot.LA

If that’s the route they take, I imagine it will be because they have little other choice. But for many it will be too late.

I think this is why so many people are excited - it’s brought AI to the masses :smiley:

I’m going to try setting it up on my Mac later. I tried Janus Pro WebGPU - a Hugging Face Space by webml-community, but that’s not working for me (clicking on the generate-image prompt does nothing) :confused:

This is really cool:

They’re running the full 671 billion parameter model on a cluster of 8 Mac Minis, each with 64GB of RAM.

1 Like

On the subject of Musk and his grievances: I had a brief look into it, but that’s up to the lawyers to decide, and I don’t know how that’s going.

Generally speaking, there is a way in American law to convert a non-profit org into a for-profit org, and it usually involves the sale of all assets, which I believe happened; this is why Altman is not the owner, just the CEO. Whether the procedure was followed properly is another question that I am unable to answer.

I also suspect that if there had been more of the “open” in OpenAI’s actions after transitioning to a for-profit org, i.e. if they had continued releasing their models as free and open-source software, there would not be this beef between Musk and Altman. Maybe this whole DeepSeek thing resolves that dispute by forcing OpenAI to open up again.

1 Like

This is probably the best explanation/breakdown I’ve seen so far:

If anyone is interested we’ve just started a DeepSeek portal over on Devtalk. Think it’s going to be a huge topic this year!

4 Likes

While DeepSeek is interesting, when it tells me “Too busy; try again later.”, that makes it easier to go back to ChatGPT, where I very rarely have to try again. Of course, if I’m using DeepSeek-R1 locally with Ollama, then I don’t have the busy issue. But I am working with a much smaller LLM in that case.
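For anyone who hasn’t tried the local route: after `ollama pull deepseek-r1:7b`, the server exposes a simple HTTP API on port 11434. Here’s a minimal Python sketch of calling Ollama’s `/api/generate` endpoint; the endpoint and payload shape follow Ollama’s documented API, and the model name assumes you’ve already pulled it.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model, prompt, num_ctx=None):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx is not None:
        # Optional: raise the context window beyond Ollama's default
        body["options"] = {"num_ctx": num_ctx}
    return body


def ask(model, prompt):
    """Send a single prompt to a locally running Ollama server."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Requires a running `ollama serve` and a pulled model:
# print(ask("deepseek-r1:7b", "Explain supervision trees in one sentence."))
```

With `stream: False` you get one JSON object back instead of newline-delimited chunks, which keeps the client trivially simple.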

Speaking of AI in general, I do find it interesting that, if you can be extremely specific about what is required, AI can do a somewhat decent job of creating a starter project for you. If you’re not sure you’ve been specific enough, you can have it review your instructions and suggest options for clarification. But you still need to have thought through a decent starting set of features.

Even then, there will still be cases where you have to correct the AI’s hallucinations. For example, when fixing resulting errors, it often suggests a fix that, while correcting one thing, removes other required functionality. Or it may get into a cyclic loop of deleting A and adding B, only to later delete B and add A back, reproducing the original error all over again.

This is partly related to limited context lengths: it cannot always hold sufficient scope simultaneously to make correct suggestions. So you still need to keep on top of it and know enough to write code the old-fashioned way, with your own mind and your own fingers on the keyboard.

1 Like

I’ve been getting the same since their servers were attacked. Hopefully it won’t be long before it’s back to normal.

It’s worth trying it on your local machine btw, I posted a macOS guide here - it’s surprisingly quick, though of course a much smaller model.

I agree - I think right now it’s great for grunt work, getting you up to speed on things, or simpler tasks. But I see DeepSeek as the real/meaningful birth of AI - it’s open source and pretty powerful, and it is only going to get better from here… imagine where we’ll be 5 years from now. It’s an exciting time for the tech world, particularly because it is truly open AI.

You could try https://kimi.moonshot.cn/ as an alternative.

1 Like

Thanks Charles! I’ve been meaning to try Kimi!

It’s very fast! And it’s really nice how it integrates web search automatically! However, it requires a phone number to create an account… and I’m not sure I want to give it my number :lol: (DeepSeek and Qwen let you register with an email.)

I have it running on macOS already (M2 Max/32GB). Along with other models, I use the deepseek-r1:7b model with Ollama. Local DeepSeek is interesting in that the different versions have different bases: for example, the 7b version has a Qwen base, while the 8b version has a Llama base. I pulled both, but have not yet done much comparison.

As a test project, I wrote a React.js/Rust/Tauri desktop GUI that allows a SQLite-stored chat conversation with the Ollama API (a micro version of ChatGPT run locally). It lets me select and use whichever LLM I have loaded locally and revisit those chat sessions later. I’m now working on a version of the app using Flutter, to see if I can point a mobile version at a local Ollama API URL and have similar chats while selecting from the same loaded models. It might be cool to try the same thing with Scenic/Elixir as a desktop GUI app. Elixir/Phoenix could do it too, though that forces a web app for a local API, which didn’t seem practical. In theory, I could also use Elixir + Tauri, but I haven’t taken the time to wrap my head around that yet.
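The core idea of that app (persist the conversation in SQLite, then replay the whole history to Ollama’s `/api/chat` endpoint on each turn) can be sketched in a few lines. This is a hypothetical minimal Python version of the same pattern, not the Tauri app itself; the table schema and session naming are mine, while the `{"model", "messages", "stream"}` payload shape is Ollama’s documented chat format.

```python
import sqlite3


def init_db(conn):
    """Create a minimal table for storing chat turns per session."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               session TEXT NOT NULL,
               role    TEXT NOT NULL,   -- 'user' or 'assistant'
               content TEXT NOT NULL)"""
    )


def add_message(conn, session, role, content):
    """Persist one chat turn."""
    conn.execute(
        "INSERT INTO messages (session, role, content) VALUES (?, ?, ?)",
        (session, role, content),
    )


def chat_payload(conn, session, model):
    """Rebuild the full history as a body for Ollama's /api/chat endpoint."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session = ? ORDER BY id",
        (session,),
    ).fetchall()
    return {
        "model": model,
        "messages": [{"role": r, "content": c} for r, c in rows],
        "stream": False,
    }


conn = sqlite3.connect(":memory:")  # a real app would use a file path
init_db(conn)
add_message(conn, "s1", "user", "Hello!")
add_message(conn, "s1", "assistant", "Hi, how can I help?")
payload = chat_payload(conn, "s1", "deepseek-r1:7b")
```

Because the model field is just a string in the payload, switching between locally loaded models (the “select whichever LLM” part) is a one-line change per request.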

It will become much more interesting when the AI can begin to ask us the questions we usually ask clients or product owners, having the AI ask the developer those clarifying questions: “Given the specified set of features, it seems you are going in the direction of blah, blah, blah. Have you considered whether it should do X or Y also?” It should also be capable of verifying the build configurations and build results itself, instead of just responding with suggestions to fix our build-error prompts. But I suspect it will need quite a bit more context capacity than is currently available before those types of things become possible. Imagine the context capacity required for it to hold an entire decent-sized project and perform that type of analysis. Right now, even a few larger files can exceed that capacity, not to mention the additional complexity of the links between those and yet more files. From conversations on the Ollama Discord, I think num_ctx is only around 2k or 4k. That’s not a lot of space, though it is likely to keep growing over time. (Of course, those are the local context sizes I’m working with, not the larger ones possible on big servers.)
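To get a feel for how tight a 2k–4k `num_ctx` window really is, here’s a back-of-the-envelope check. The ~4 characters-per-token figure is a rough rule of thumb for English-ish text and code (an assumption, not an Ollama constant), and the reply reserve is an arbitrary budget I’ve picked for illustration.

```python
def fits_in_context(files, num_ctx, chars_per_token=4.0, reserve=512):
    """Rough check: estimate tokens at ~4 chars/token and see whether
    the given file contents fit in num_ctx alongside a reserved budget
    for the model's reply. Heuristic only -- real tokenizers vary."""
    est_tokens = sum(len(text) for text in files) / chars_per_token
    return est_tokens + reserve <= num_ctx


# A 2k-token window can't even hold one 20 KB source file:
# fits_in_context(["x" * 20_000], 2048)  ->  False (~5,000 tokens + reserve)
```

By this estimate a single mid-sized source file already overflows a 2k window, which matches the experience above of the model “forgetting” functionality while fixing errors.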

3 Likes

I’ve only used the Qwen models, but I did notice a difference: I felt the 7b performed better than the 32b, even though the latter took longer! I posted the results here - if you’re on DT I’d be curious what kind of results you get with the same prompt.

That’d be pretty cool! In terms of design, they could show you initial concepts and ask which you’d prefer them to pursue.


I think the next meaningful progression will be when we can run the current big models locally; with MoE (mixture-of-experts) architectures, that can’t be that far off, I imagine…

1 Like

I think your guide is universal. I’m trying it out on Windows. I’ve got an RTX 4090, and I’m downloading DeepSeek-R1-GGUF with a warning label of “Likely too large for this machine” :joy: I’m curious what will happen. :bug:

1 Like

Haha! Let us know how you get on, but I think you’ll need enough memory for the model to load into :upside_down_face:

Thanks… I’ve updated the title :003:

1 Like

I’m on DT also. I’m on a lot of things, too many to keep up with. :slight_smile:
I used the same prompt you had used and tested these (results posted in your linked thread):

  • local Ollama API DeepSeek-R1:7b (qwen family)
  • local Ollama API DeepSeek-R1:8b (llama family)
  • Online chat.deepseek.com
  • Online Gemini (Flutter AI Toolkit)
  • ChatGPT-4o
  • ChatGPT-o1
2 Likes