$比亚迪(SZ002594)$ $特斯拉(TSLA)$ Sora burst onto the scene and drew another round of worldwide astonishment. While back in my hometown I found some time to dig into its key technologies:
1. OpenAI continues to build on the technology and experience it accumulated with ChatGPT: Sora's video-generation diffusion model again adopts the Transformer architecture, whereas most other mainstream video-generation diffusion models use a U-Net architecture;
2. To overcome the Transformer's "dense computation" drawback in video generation, OpenAI creatively introduced the concept of latents, reducing the dimensionality of (i.e., compressing) the video information, with the aim of reconstructing the video from fewer key signals. For this, OpenAI built a video compression network that first maps the video down into a latent space (see the sketch after this list).
3. Building on the output of that compression network, OpenAI's other most creative move was to mimic the "tokens" of the ChatGPT algorithm by introducing "patches." If a token is the word-like unit of text, a patch might be rendered as an "image block" — the spacetime unit used to train the Sora video model.
4. The user's prompt is not handed to Sora directly. OpenAI leverages GPT: when a user submits a prompt to Sora, GPT first expands it into a precise, detailed description, and only the expanded prompt is passed to Sora, which helps Sora follow the prompt and generate a more accurate video (a sketch of this step also follows below).
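To make points 2–3 concrete, here is a minimal sketch of the idea in PyTorch: a toy 3D-conv encoder standing in for the video compression network, followed by a cut into spacetime patches. All module names, kernel sizes, and shapes are illustrative assumptions, not OpenAI's published architecture:

```python
import torch
import torch.nn as nn

class VideoCompressor(nn.Module):
    """Toy stand-in for Sora's video compression network: a single 3D
    convolution that downsamples raw video (B, C, T, H, W) into a
    lower-dimensional latent volume."""
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Conv3d(3, latent_dim, kernel_size=4, stride=4)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        return self.encoder(video)  # (B, latent_dim, T/4, H/4, W/4)

def to_spacetime_patches(latent: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Cut the latent volume into p x p x p spacetime patches -- the
    video counterpart of an LLM's tokens -- and flatten each patch."""
    b, c, t, h, w = latent.shape
    latent = latent.reshape(b, c, t // p, p, h // p, p, w // p, p)
    latent = latent.permute(0, 2, 4, 6, 1, 3, 5, 7)   # gather patch dims
    return latent.reshape(b, -1, c * p ** 3)          # (B, n_patches, dim)

video = torch.randn(1, 3, 16, 64, 64)   # 16 frames of 64x64 RGB
latent = VideoCompressor()(video)       # (1, 16, 4, 16, 16)
tokens = to_spacetime_patches(latent)   # (1, 128, 128): 128 patch "tokens"
```

A Transformer backbone then operates on `tokens` exactly the way a language model operates on text tokens.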
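Point 4 can be sketched just as simply. The snippet below uses the OpenAI chat API purely as an illustration; the system prompt and model choice are my own guesses, since OpenAI has not published its actual rewriting step:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def expand_prompt(user_prompt: str) -> str:
    """Have GPT rewrite a short user prompt into a detailed scene
    description before it is handed to the video model."""
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Expand the user's prompt into a precise, richly "
                        "detailed video scene description."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return resp.choices[0].message.content

detailed = expand_prompt("a cat chasing a butterfly in a garden")
# `detailed`, not the raw prompt, is what the video model would receive.
```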
Summary:
It can generate complex, polished videos of industry-leading length, which shows a substantial leap in AI's ability to model the real world.
But this leap still rests on massive training, not on any understanding of the world or its "physical laws" by the AI itself. Sora's handling of video still has many limitations, including some very basic factual errors.
So while Sora feels stunning, it does not yet deserve the "world model" label that has been hotly debated these past few days, and it is still a long way from AGI. It is essentially the same as Tesla's end-to-end FSD algorithm: only by being fed enough data can it approach an "AGI"-like effect.

All comments

This confirms my judgment: Sora rests on massive training, not on the AI's own understanding of the world and its "physical laws"; there is no technical breakthrough in the direction of AGI, so let's not deify it. Elon Musk's claim that "Tesla has been able to generate real-world videos with precise physics for about a year" is also open to doubt. As I concluded in the original post, Sora and end-to-end FSD really are alike: both must be fed massive data to even theoretically approach an AGI-like effect: $特斯拉(TSLA)$ $比亚迪(SZ002594)$
Sora's blooper videos are all quite entertaining, and the industry has been arguing these past few days over whether Sora can really understand real-world physics. Zhou Hongyi and Jim Fan (a well-known NVIDIA researcher) both lean toward OpenAI having trained for it, yet OpenAI's own blog post explicitly states that Sora does not understand physics, which makes it all the more puzzling. In this blooper video, for instance, you can clearly see the chair bouncing much like the bug-ridden scenes people used to encounter back when game engines were immature. Jim Fan even thinks OpenAI could use the latest Unreal Engine to supply Sora with training material, creating a self-feeding loop of synthetic data.
Video: [link]

Compress the video into patches a Transformer can process — the counterpart of the LLM's tokens. Both the training material and the inference output are patches; a diffusion process finally restores them into individual frames. $比亚迪(SZ002594)$ $特斯拉(TSLA)$
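A hedged sketch of that back half: iteratively denoise noise-initialized patch tokens with a Transformer, then project back toward pixels. The update rule below is a deliberate caricature of DDPM sampling, and every module is a placeholder:

```python
import torch
import torch.nn as nn

# Placeholder denoiser: a Transformer over patch tokens, DiT-style.
denoiser = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
    num_layers=2,
)
# Placeholder decoder: patch vector -> flattened 2x2x2 RGB spacetime block.
decoder = nn.Linear(128, 3 * 2 * 2 * 2)

steps = 50
x = torch.randn(1, 128, 128)          # start from pure-noise patch tokens
for _ in range(steps):
    with torch.no_grad():
        pred_noise = denoiser(x)      # model predicts the noise component
    x = x - pred_noise / steps        # crude denoising step (not real DDPM math)
frames = decoder(x)                   # clean patches mapped back toward pixels
```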

02-28 17:44

How Ouyang Minggao sees Sora and FSD — consistent with my view: $特斯拉(TSLA)$ $比亚迪(SZ002594)$
FSD and OpenAI's Sora follow the same philosophy and underlying logic: use a video-generation model as a world simulator to predict the vehicle's movement trend and, from that, generate autonomous-driving commands.
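A toy sketch of that idea, with heavy hedging: neither Tesla nor OpenAI has published such code, and both modules below are placeholders. The point is only the shape of the loop — predict the future with a video model, then read a control command off the prediction:

```python
import torch
import torch.nn as nn

class WorldSimulator(nn.Module):
    """Placeholder video-prediction model: past frames in, predicted future out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(3, 3, kernel_size=3, padding=1)

    def forward(self, past: torch.Tensor) -> torch.Tensor:
        return self.net(past)  # (B, 3, T, H, W) "future" clip

class Controller(nn.Module):
    """Placeholder policy head: predicted future -> [steer, throttle]."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3 * 8 * 32 * 32, 2)

    def forward(self, future: torch.Tensor) -> torch.Tensor:
        return self.head(future.flatten(1))

past = torch.randn(1, 3, 8, 32, 32)   # 8 low-res frames of driving video
future = WorldSimulator()(past)       # simulate what happens next
command = Controller()(future)        # derive the driving command from it
```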

02-17 11:46

"latent" should be a fairly common concept…

03-01 09:58

[Large language models will change programming… but only a little] — there is plenty of similar hype. To gauge AI's impact on autonomous driving, you have to weigh Elon Musk's and Wang Chuanfu's views together: $比亚迪(SZ002594)$ $特斯拉(TSLA)$ [link]
- Large language models (LLMs) can indeed mimic human communication, but many problems remain, such as bias and lack of intelligence. The author has long been skeptical and does not believe LLMs will produce dramatic change.
- In programming, LLMs help reduce complexity: the open-source world already holds a massive collection of solutions to programming-language and API problems, and LLMs can search for and surface those ready-made solutions more effectively. But LLMs themselves know nothing about code quality, and the code they supply may be flawed.
- The quality of LLM-generated code is uneven; developers still have to verify, integrate, and refactor it themselves. At best, an LLM is a fallible brainstorming tool.
- LLMs do nothing for program comprehension, nor do they help developers make better design decisions. Relying on LLM-generated code can backfire, making programs harder to understand.
- For learners, LLMs also have downsides: they encourage skipping the hard parts of learning to program, preventing deep understanding, and they can become a new shortcut for searching and copying code, trapping learners in unproductive guess-and-check loops.
- Overall, the impact of LLMs on programming is overstated. They will merely turn one mess into another, not bring a revolution. The author predicts their impact will be relatively limited.
I do not like hype. Perhaps it’s the skeptic in me, always seeking to question a claim, to look for evidence, to examine closely before making a judgement. This stance on the world means that I’m often last to accept dramatic change and first to question it. There is nothing more blasphemous in computer science, which has often put me on its margins.
And so over the past decade, as I’ve watched large language models (LLMs) mimic human communication through sheer probabilistic scale, I’ve watched with doubt. Doubt that they would ever overcome bias, doubt that they would achieve any form of intelligence, and doubt that they would be applied for anything other than profit. There was only one thing I was sure of: that once they emerged from my small scholarly corner of the world into the mainstream, hype would be the primary lens through which they were seen. And that’s basically what’s happened.
As LLMs made their way to programming through systems like GitHub CoPilot and ChatGPT, however, I began to see LLMs with more nuance. Programming is, for those that don’t know, kind of a mess. Languages are strange historical notations we live with for decades and the most popular ones are often the most haphazardly designed. APIs are often poorly designed and poorly documented. Nearly every effort to create something simpler only adds more complexity, as it tries to interoperate with a complex world. And so while I still think of LLMs as a threat to society at large, in the world of programming, they hold some promise of helping people sift through the complexity.
But as I’ve thought more about it, and played with them, my skepticism has returned. And so I’ve come to the following predictions.
First, LLMs will reduce complexity. And for somewhat obvious reasons: the world of open source shares a massive collection of solutions to common messy problems in programming languages and API usage, and so for the past two decades of the internet, the problem hasn’t been so much about solving those problems, but finding the solutions that someone has already written down. This was what enabled the explosive popularity of Stack Overflow and the decades of research on code search. LLM-driven code synthesis tools will be, and already are to an extent, better search engines for finding those solutions (albeit without giving anyone credit). This future will be a slightly better one for programmers, just like better documentation and better search has always been. It will make it easier to find the useful design patterns amidst an infinite space of wrong answers.
Of course, it will not do it correctly. Because most of that code that it’s trained on? It’s bad. It’s full of security defects, design flaws, usability problems, accessibility problems. These models know nothing of these flaws, because they do not know anything about design or software engineering. And the popularity of code, as anyone knows from upvoted wrong answers on Stack Overflow, is not a reliable indicator of its correctness. And so the best these engines can do is offer a guess that helps a developer get a jump start on a problem or think of a new direction. But developers will still have to do everything they already have to do to ensure correctness: verify, integrate, refactor, redesign, rearchitect, etc., as the world changes. These models will be, at best, a helpful but fallible brainstorming tool for short segments of code.
Of course, that is the best case scenario. Some developers are going to use these tools and trust the code they get. They’re going to put it into production code. And there will be many stories of that code causing problems. But you probably won’t hear them, because drawing the line between a failure and some code is a hard thing to do, and society still hasn’t decided to hold developers accountable for the code they write. And so most people will never even notice that bad code is leaking into the world, regurgitated by these stochastic parrots. They will just experience the usual failures that software always has, just for slightly different reasons, and at a slightly higher pace.
And then there are learners, including the students I teach, the youth who are first encountering code, the teachers who are learning to code to teach it. One might imagine that (currently) free and easy program synthesis would be a great boon to students and teachers who are stuck, allowing them to create things they couldn’t before, and overcome challenges with greater ease. But that would be the hype talking. The reality is that while writing code is hard, the harder part for students (and really anyone) is understanding how it executes and then making decisions about what it should do differently. Program comprehension is what makes APIs hard to use (because they intentionally hide what they do, capturing behavior only through poorly written natural language). It’s what makes programming languages hard to use (because debugging tools are so poor at slowing execution down enough to teach). It’s what makes large software systems hard to change and evolve (because of the sheer amount of code to understand). LLMs do nothing to make this comprehension or decision making easier.
And in an ironic way, having something else write code for you only makes program comprehension harder. Every developer already knows this: if you write code yourself, you’re far more likely to understand its behavior than if someone else wrote it. This is even true if you wrote it, but a long time ago. And so getting some code that no one understands, because it was extruded from a probabilistic machine, may generate some of the hardest to understand code, with little of the human interaction that occurs on sites like Stack Overflow to provide some degree of rationale or context. (And, of course, if people stop writing content for sites like Stack Overflow, LLMs will have nothing to train on and stop being useful).
But comprehension won’t be the only new burden on learners. Some learners will also see LLM-driven program synthesis as yet another shortcut to avoid the hard task of learning, just as they do now with Stack Overflow. This isn’t because they’re lazy, it’s because learning is hard, and we all do everything we can to avoid it when there are easier ways to solve a problem. Because LLMs provide a path of such low resistance, I believe they will lead, and probably already are leading, to highly unproductive cycles of guess and check that avoid the hard tasks of planning and verification. Stack Overflow at least has so many dead ends that learners eventually realize it’s not a great resource for most problems and get back to reasoning and planning instead of searching. But LLMs will likely just aggravate the comprehension problems above by creating the illusion of right answers. I fear that the disincentive this creates to learn will ultimately result in more struggle, more drop out, and ultimately fewer people who will be willing to comprehend the mess.
Should LLMs exist? I think I feel the same about LLM-driven program synthesis as I do about any developer tool: we made this mess, so we should probably clean it up. Maybe LLMs can help. But it feels like a tool that just takes one mess and turns it into another mess in the endless pursuit of productivity. It does not feel like a revolution. And in the worst case, that new mess may be even harder to make sense of.
Could I be wrong? Of course! My curmudgeonly take above comes from twenty years of studying programming and programmers, but all of that research and experience could be pointing me in the wrong direction. Maybe LLMs do fundamentally change programming and I just can’t see it. If that’s the case, I’m sure I’ll be the last to notice :)

02-17 14:00

OpenAI has released the Sora model: one sentence can generate a one-minute video, and crucially the results are so rich they can pass for real, indistinguishable from actual footage. The film industry is the first to tremble! This information throughput, this production efficiency, this cost — it exceeds the industrial revolution's replacement of human labor by machines. Think of the enormous shock the industrial revolution brought to society, and you can imagine the impact this wave of AI will have!!! AI is timeless, infinitely complex, god-like; it is the singularity, the commanding height, the highest point. If human intelligence is the peak of primary intelligence, AI is about to enter higher intelligence. Intelligence is the core of life; consciousness is merely an advanced model library of intelligence!

The biggest uncertainty facing AI is still theory and algorithms. Today's AI race of endlessly grinding through data is a bit like ancient mathematicians competing over pi: vast amounts of manpower, money, and time poured into massive calculations, each grinding pi out to a few more decimal places. Then Newton came along and simply produced a formula; plug it in and you can compute as many digits as you like, and all the earlier competition instantly became meaningless (a toy illustration follows below).
That is the significance of theoretical and algorithmic breakthroughs. AI is in a similar situation. During the internet bubble, at least everyone eventually figured out how the thing was built and could see what kind of tool the internet actually was.
With AI, we haven't even seen what it is yet, nor have we really figured out how it is built; it still relies mainly on grinding data and stacking compute.
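As a toy illustration of the analogy (my own, not from the comment — and Newton's actual method differed in detail), here is the series pi = 6 · arcsin(1/2) in Python. Each extra term buys roughly 0.6 more decimal digits, so "as many digits as you like" is just a matter of summing more terms rather than brute-force measurement:

```python
from decimal import Decimal, getcontext

def pi_digits(terms: int = 100, digits: int = 50) -> Decimal:
    """Sum the arcsin series at x = 1/2 and return 6 * arcsin(1/2)."""
    getcontext().prec = digits + 10                # guard digits
    x2 = Decimal(1) / 4                            # x^2 for x = 1/2
    term = Decimal(1) / 2                          # n = 0 term: x itself
    total = term
    for n in range(1, terms):
        # Ratio between consecutive terms of the arcsin series:
        # a_n / a_{n-1} = (2n-1)^2 / (2n * (2n+1)) * x^2
        term *= Decimal((2 * n - 1) ** 2) / Decimal(2 * n * (2 * n + 1)) * x2
        total += term
    return 6 * total

print(pi_digits())  # 3.14159265358979323846...
```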

02-17 12:41

Here to learn; a question: Zhou Hongyi speaks of Sora understanding the laws of the world, even imagining and extrapolating from them, yet from the information Andy has collected and analyzed, there seems to be no evidence for that? GPT expands the core prompt, and patch-based learning and extension are generated by combining Transformer and Diffusion. That still feels like rule-based, learned simulation, not an understanding and imagination that rises to the level of laws. #新能源汽车# #比亚迪# $比亚迪(SZ002594)$

02-17 11:30

Large-scale use of synthetic data is a key step toward AGI. Reportedly, one source of Sora's big capability leap may be massive synthetic training data generated with UE5, Unity, NeRF, and the like.

02-17 10:56

Repost: on AI progress