Sorry, but what does anything you've said there have to do with the Othello paper?

The point of that paper was that the AI was given nothing but sequences of move locations, and it nonetheless intuited the "world model" necessary to explain those locations. That is, it figured out that it needed to allocate 64 binary values and swap some of them after each move. The paper demonstrated that the AI was not just doing applied statistics on character strings - it had constructed a model to explain what the strings represented.
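
To make that concrete, here is a rough sketch (my own, in Python, with illustrative names rather than anything from the paper) of the kind of board-tracking "world model" being described: a 64-cell grid where placing a disc flips the opposing discs it brackets.

    # Illustrative sketch only -- names and structure are not from the paper.
    EMPTY, BLACK, WHITE = 0, 1, -1
    DIRECTIONS = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]

    def apply_move(board, row, col, player):
        """Place a disc and flip every opponent disc it brackets."""
        board[row][col] = player
        for dr, dc in DIRECTIONS:
            r, c = row + dr, col + dc
            run = []                      # opponent discs seen in this direction
            while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
                run.append((r, c))
                r, c = r + dr, c + dc
            if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
                for rr, cc in run:        # bracketed by our own disc: flip the run
                    board[rr][cc] = player

    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    apply_move(board, 2, 3, BLACK)        # a legal opening move: flips the white disc at (3, 3)

The paper's claim, as described above, is that a network trained only on move sequences ends up representing something equivalent to `board` internally, without ever having been shown one.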

"Strategy", meanwhile, has nothing to do with anything. The AI wasn't trained on competitive matches - it had no way of knowing that Othello has scoring, or even a win condition. It was simply trained to predict which moves are legal, not to strategize about anything.



> The point of that paper was that the AI was given nothing but sequences of move locations, and it nonetheless intuited the "world model" necessary to explain those locations

Yes...

> That is, it figured out that it needed to allocate 64 binary values and swap some of them after each move.

Yes, but "figured out" is misleading.

It didn't invent or "figure out" the model. It discovered it, just like any other pattern it discovers.

The pattern was already present in the example game. It was the "negative space" that the moves existed in.

> "Strategy", meanwhile, has nothing to do with anything. The AI wasn't trained on competitive matches - it had no way of knowing that Othello has scoring, or even a win condition. It was simply trained to predict which moves are legal, not to strategize about anything.

Yes, and that is critically important knowledge; yet dozens, if not hundreds, of comments here are missing that point.

It found a model. That doesn't mean it can use the model. It can only repeat examples of the "uses" it has already seen. This is also the nature of the model itself: it was found by looking at the structural patterns of the example game. It was not magically constructed.

> predict which moves are legal

That looks like strategy, but it's still missing the point. We are the ones categorizing GPT's results as "legal". GPT never uses the word. It doesn't make that judgement anywhere. It just generates the continuation we told it to.

What GPT was trained to do is emulate strategy. It modeled the example set of valid chronological game states. It can use that model to extrapolate any arbitrary valid game state into a hallucinated set of chronological game states. The model is so accurate that the hallucinated games usually follow the rules. Provided enough examples of edge cases, it could likely hallucinate a correct game every time; but that would still not be anything like a person playing the game intentionally.
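
To be concrete about what that rollout looks like, here is a toy sketch (my own; `next_move_distribution` and `is_legal` are hypothetical stand-ins) in which the model simply emits its most probable continuation, and legality is judged entirely by an outside checker that the model never consults:

    # Toy sketch: the model proposes continuations; a separate rules checker
    # (applied by us, from outside) decides what counts as "legal".
    def rollout(next_move_distribution, is_legal, history, n_moves):
        violations = 0
        for _ in range(n_moves):
            probs = next_move_distribution(history)   # hypothetical: move -> probability
            move = max(probs, key=probs.get)          # the most "familiar" continuation
            if not is_legal(history, move):           # our label, not the model's
                violations += 1
            history = history + [move]
        return history, violations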

The more complete and exhaustive the example games are, the more "correctly" GPT's model will match the game rules. But even having a good model is not enough to generate novel strategy: GPT will repeat the moves it feels to be most familiar to a given game state.

GPT does not play games, it plays plays.


> It found a model. That doesn't mean it can use the model.

It used the model in the only way that was investigated. The researchers tested whether the AI would invent a (known) model and use it to predict valid moves, and the AI did exactly that. They didn't try to make the AI strategize, or invent other models, or any of the things you're bringing up.
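
For reference, the shape of that test is a probing experiment: fit a simple classifier on the network's internal activations and check whether the board state can be read back out. A rough sketch (mine, not the paper's code; the actual probes and data pipeline differ in detail):

    # Rough shape of a probing experiment, using scikit-learn for brevity.
    from sklearn.linear_model import LogisticRegression

    def probe_square(activations, square_states, train_frac=0.8):
        """activations: (n_positions, hidden_dim) hidden states from the model.
        square_states: (n_positions,) label for one square (empty/black/white).
        Returns held-out accuracy of the probe for that square."""
        split = int(train_frac * len(square_states))
        probe = LogisticRegression(max_iter=1000)
        probe.fit(activations[:split], square_states[:split])
        return probe.score(activations[split:], square_states[split:])

The interesting result is that such probes succeed, i.e. the board state is recoverable from activations that were trained only to predict moves.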

If you want to claim that AIs can't do something, you should present a case where someone tried unsuccessfully to make an AI do whatever it is you have in mind. The Othello paper isn't that.


"GPT will repeat the moves it feels to be most familiar to a given game state"

That's where temperature comes in. AI that parrots the highest-probability output every time tends to be very boring and stilted. When we instead sample randomly from all possible responses, weighted by their probability, we get more interesting behavior.
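
Concretely, temperature rescales the model's raw scores before sampling; something like this sketch (my own, in Python):

    import math, random

    def sample_with_temperature(logits, temperature=1.0):
        """logits: dict of token -> raw score. Low temperature approaches
        greedy argmax; high temperature flattens the distribution."""
        scaled = {t: score / temperature for t, score in logits.items()}
        peak = max(scaled.values())                       # subtract max for numerical stability
        weights = {t: math.exp(s - peak) for t, s in scaled.items()}
        tokens = list(weights)
        return random.choices(tokens, weights=[weights[t] for t in tokens])[0]

    # e.g. sample_with_temperature({"e3": 2.1, "d3": 2.0, "c4": 0.3}, temperature=0.7)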

GPT also doesn't only respond based on examples it has already seen - that would be a Markov chain. It turns out that even with trillions of words in a dataset, once you have 10 or so words in a row you will usually already be in a region that doesn't appear in the dataset at all. Instead, the whole reason we have an AI here is so that it learns to actually predict a response to this novel input, based on higher-level rules it has discovered.
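
A toy illustration of that sparsity point, using a small order-3 Markov chain (my own sketch):

    from collections import defaultdict

    def build_markov(tokens, order):
        """Map each n-gram context to the tokens that literally followed it."""
        table = defaultdict(list)
        for i in range(len(tokens) - order):
            table[tuple(tokens[i:i + order])].append(tokens[i + order])
        return table

    corpus = "the cat sat on the mat and the cat sat on the rug".split()
    table = build_markov(corpus, order=3)
    print(table[("the", "cat", "sat")])   # ['on', 'on'] -- seen context, can continue
    print(table[("a", "small", "dog")])   # []           -- unseen context, nothing to emit

With contexts of ten or more tokens, almost every real prompt falls into the second case, which is exactly where the learned higher-level rules have to take over.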

I don't know how this relates to the discussion you were having, but I felt like this is useful and interesting info.


> GPT also doesn't only respond based on examples it has already seen - that would be a Markov chain

The difference between GPT and a Markov chain is that GPT is finding more interesting patterns to repeat. It's still only working with "examples it has seen": the difference is that it is "seeing" more perspectives than a Markov chain could.

It still can only repeat the content it has seen. A unique prompt will have GPT construct that repetition in a way that follows less obvious patterns: something a Markov chain cannot accomplish.

The less obvious patterns are your "higher level rules". GPT doesn't see them as "rules", though. It just sees another pattern of tokens.

I was being very specific when I said, "GPT will repeat the moves it feels to be most familiar to a given game state."

The familiarity I'm talking about here is between the game state modeled in the prompt and the game states (and progressions) in GPT's model. Familiarity is defined implicitly by every pattern GPT can see.

GPT folds the prompt into its context and models it the same way it modeled its training corpus. By doing so, it finds a "place" (semantically) in its model where the prompt "belongs". It then follows the most familiar pattern of game-state progression starting from that position in the model.

Because there are complex patterns that GPT has implicitly modeled, the path GPT takes through its model can be just as complex. GPT is still doing no more than blindly following a pattern, but the complexity of the pattern itself "emerges" as "behavior".

Anything else that is done to seed divergent behavior (like the temperature alteration you mentioned) is also a source of "emergent behavior". This is still not part of the behavior of GPT itself: it's the behavior of humans making more interesting input for GPT to model.


What is the closest approach we know of today that plays games, not plays? The dialogue above is compelling, and makes me wonder whether the same critique can be leveled at most prior art in machine learning applied to games. E.g. would you say the same things about AlphaZero?


> It didn't invent or "figure out" the model. It discovered it, just like any other pattern it discovers.

Sure, and why isn't discovering patterns "figuring it out"?


What can be done with "it" after "figuring out" is different for a person than for an LLM.

A person can use a model to do any arbitrary thing they want to do.

An LLM can use a model to follow the patterns that are already present in that model. It doesn't choose the pattern, either: it will start at whatever location in the model that the prompt is modeled into, and then follow whatever pattern is most obvious to follow from that position.


> An LLM can use a model to follow the patterns that are already present in that model.

If that were true then it would not be effective at zero-shot learning.

> It doesn't choose the pattern, either: it will start at whatever location in the model that the prompt is modeled into, and then follow whatever pattern is most obvious to follow from that position.

Hmm, sounds like logical deduction...



