There is something deep in this observation. When I reflect on how I write code, sometimes it’s backwards. Sometimes I start with the data and work back through to the outer functions, unnesting as I go. Sometimes I start with the final return and work back to the inputs. I notice sometimes LLMs should work this way, but can’t. So they end up rewriting from the start.
Makes me wonder if future LLMs will be composing nonlinear things and be able to work in non-token-order spaces temporarily, or will have a way to map their output back to linear token order. I know nonlinear thinking is common while writing code, though. Current LLMs might be hiding a deficit by having a large and perfect context window.
Right, but that smoothly(ish) resolves all at the same time. That might be sufficient, but it isn't actually replicating the thought process described above. That non-linear thinking is different from diffuse thinking. Resolving in a web around a foundation seems like it would be useful for coding (and for other structured thinking in general).
With enough resolution and appropriately chosen transformation steps, it is equivalent. E.g., the diffusion could focus on one region and then later focus on another, and it's allowed to undo the effort it did in one region. Nothing architecturally prohibits that solution style from emerging.
The choice of transformation steps to facilitate this specific diffuse approach seems like a non-trivial problem. It doesn't follow such an organic solution would emerge at all, now, does it?
The pattern ", now, " is indicative of a sort of patronization I don't normally engage with, but, yes, you're correct.
In some measure of agreeing with you: for other classes of models we know for a fact that there exist problems which can be solved by those architectures but which can't be trained using current techniques. It doesn't feel like a huge stretch that such training-resistant data might exist for diffusion models.
That said, I still see three problems. Notably, the current ancestral chain of inquiry seems to care about the model and not the training process, so the point is moot. Secondarily, in other similar domains (like soft circuits) those organic solutions do seem to emerge, suggesting (but not proving) that the training process _is_ up to par. Lastly, in other related domains, when such a solution doesn't emerge, it ordinarily happens because some simpler methodology achieves better results; so even with individual data points suggesting that diffusion solutions don't model that sort of linearity, you still need to work a little bit to prove that such an observation actually matters.
The process of developing software involves this kind of non-linear code editing. When you learn to do something (and the same should go for code, even if sometimes people don't get this critical level of instruction), you don't just look at the final result: you watch people construct the result. The process of constructing code involves a temporally linear sequence of operations on a text file, but your cursor is bouncing around as you issue commands that move it through the file. We don't have the same kind of copious training data for it, but what we really need to do is to train models not on code, but on all of the input that goes into a text editor. (If we concentrate on software developers who are used to doing work entirely in a terminal, this can be a bit easier, as we can then essentially just train the model on all of the keystrokes they press.)
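To make that concrete, here is a minimal sketch of what such training data could look like: editor events (cursor moves, insertions, deletions) flattened into one token stream. The event format and token names are invented for illustration; no existing dataset or editor protocol is implied.

```python
# Hypothetical edit-event log: (kind, argument) pairs. This format is an
# assumption for illustration only.
events = [
    ("move", 120),             # jump the cursor to byte offset 120
    ("insert", "return x\n"),  # type text at the cursor
    ("move", 40),
    ("delete", 7),             # delete 7 characters at the cursor
]

def linearize(events):
    """Flatten edit events into a single token stream a model could train on."""
    tokens = []
    for kind, arg in events:
        tokens.append(f"<{kind.upper()}>")
        tokens.append(repr(arg))
    return tokens

print(linearize(events))
# ['<MOVE>', '120', '<INSERT>', "'return x\\n'", '<MOVE>', '40', '<DELETE>', '7']
```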
There's a fair amount of experimental work happening trying different parsing and resolution procedures so that the training data reflects an AST and/or predicts nodes in an AST as an in-filling capability.
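As a toy illustration of that in-filling setup (my own sketch, not any particular project's pipeline): pick an AST node, mask its source span, and use the node's source text as the prediction target.

```python
# Toy AST-node in-filling example using Python's own ast module.
# The <HOLE> token and the single-line assumption are illustrative only.
import ast

src = "def f(x):\n    return x * 2 + 1\n"
tree = ast.parse(src)
node = tree.body[0].body[0].value              # the expression `x * 2 + 1`
line = src.splitlines()[node.lineno - 1]       # assumes the node spans one line
masked = line[:node.col_offset] + "<HOLE>" + line[node.end_col_offset:]
target = line[node.col_offset:node.end_col_offset]
print(masked, "->", target)                    # "    return <HOLE> -> x * 2 + 1"
```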
LLMs don't have memory, so they can't build anything. Insofar as they produce correct results, they have implicit structures corresponding to ASTs built into their networks during training time.
> Sometimes I start with the final return and work back to the inputs.
Shouldn't be hard to train a coding LLM to do this too by doubling the training time: train the LLM both forwards and backwards across the training data.
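A back-of-the-envelope sketch of that augmentation (the direction-marker token is my own invented convention): emit each training sequence twice, once reversed, so the model sees both orders.

```python
# Sketch: double the corpus by adding token-reversed copies of each sequence.
def augment_bidirectional(sequences):
    for seq in sequences:
        yield seq
        yield ["<REV>"] + list(reversed(seq))  # marker tells the model the direction

corpus = [["def", "f", "(", "x", ")", ":", "return", "x"]]
for seq in augment_bidirectional(corpus):
    print(seq)
```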
GP is talking about the nonlinear way that software engineers think, reason, and write down code. Simply doing the same thing but backwards provides no benefit.
I was under the impression these offices closed during the pandemic and the return-to-office order is bringing people back into those places. If that's the case, I don't think this is some sort of planned disruption, but rather poor planning. Incompetence vs. malice, right?
The automation should be setting flags on videos. Users should have preferences for opting in or out of flags, with reasonable defaults. If there is a jurisdictional requirement in a user's location, YouTube sets the preference to disabled according to the law and shows a link to the regional law so users understand.
Hence abuse is a local thing too. One can be getting flagged in one region but not in another. 'Abuse' amounts to getting certain flags auto-applied in some locations or whatever. It should not affect the account itself, though.
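A toy sketch of how that resolution could work (the flag names, law table, and link are all invented for illustration): the user's preference wins unless a regional rule pins the flag, in which case the law's setting plus a link explaining it are returned.

```python
# Invented example data: which flags a region's law pins, and to what.
REGIONAL_LAW = {
    "DE": {"flag_x": ("disabled", "https://example.org/de-law")},
}

def effective_setting(user_prefs, region, flag):
    """User preference applies unless regional law overrides the flag."""
    law = REGIONAL_LAW.get(region, {}).get(flag)
    if law:
        setting, law_link = law
        return setting, law_link       # law wins; surface the link to the user
    return user_prefs.get(flag, "default-on"), None

print(effective_setting({"flag_x": "enabled"}, "DE", "flag_x"))
# ('disabled', 'https://example.org/de-law') -- overridden, with explanation
```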
I have a planned trip to work around. I want to make sure those are booked, and allocate the rest optimally. I suppose this is the same problem as having extra days to allocate.
I used to do something like this all the time with C/C++ compiler tests. I tried lots of fancy tools, but I kept going back to: expand all macros and retokenize to one token per line (I made a custom build of the preprocessor that had this built in). Then have a shell script randomly remove lines, and use another script to check that the resulting test case behaves consistently with the failure. It would run for a few hours (or days, for Boost problems), and usually you'd get a really minimal test case that shows the problem. Often I would use this to find regressions: just have the shell script check that one build is good and the other has the problem. The resulting output would then usually point exactly at the regressed feature, and make an amazing unit test.
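A minimal sketch of that loop, assuming the preprocessed one-token-per-line input is `testcase.txt` and `./check.sh` (an invented oracle script) exits 0 iff the bug still reproduces:

```python
# Randomly delete chunks of lines; keep any smaller file that still fails.
import random
import subprocess

def still_fails(lines):
    with open("candidate.txt", "w") as f:
        f.writelines(lines)
    return subprocess.run(["./check.sh", "candidate.txt"]).returncode == 0

lines = open("testcase.txt").readlines()
for _ in range(100_000):                  # or: run for a few hours/days
    if len(lines) <= 1:
        break
    n = random.randint(1, max(1, len(lines) // 8))   # chunk to remove
    start = random.randrange(len(lines) - n + 1)
    candidate = lines[:start] + lines[start + n:]
    if candidate and still_fails(candidate):
        lines = candidate                 # keep the smaller reproducer

with open("reduced.txt", "w") as f:
    f.writelines(lines)
```

For the regression hunt described above, `check.sh` would run both builds and succeed only when one passes and the other fails.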
Before leaving the compiler team, I wanted to find a way to do this at the AST level (so parens always go in pairs, etc.), but that could also complect with the bug.
I wonder if LLMs could accelerate this by more intelligently removing stuff first, iteratively?
Sophisticated reducers like C-Reduce do know things like that parens go in pairs. C-Reduce has many transformation operations, and while some are syntax agnostic (delete a few characters or tokens), others use Clang to try to parse the input as C++ and transform the AST.
Perses isn't language-agnostic; it just knows the syntax of a lot of languages, because there are ANTLR grammars for most commonly used languages.
Really there's no such thing as a language-agnostic test-case reducer. shrink ray is much closer than most, but all this means is that it's got some heuristics that work well for a wide variety of common languages (e.g. the bracket balancing thing). It's also got a bunch of language-specific passes.
This is sort of inherent to the problem: in order to get good results and good performance, a test-case reducer has to have a strong idea of what sort of transformations are likely to work, which in turn means it has to have a strong idea of what sort of languages it's likely to be run on.
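For instance, the bracket-balancing heuristic mentioned above might look roughly like this (my own sketch, not shrink ray's actual code): only propose deletions that leave brackets balanced, so the expensive test oracle wastes less time on obviously broken candidates.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}

def balanced(text):
    """True iff all brackets in `text` nest correctly."""
    stack = []
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in PAIRS.values():
            if not stack or stack.pop() != ch:
                return False
    return not stack

def candidate_deletions(text, size):
    """Yield copies of `text` with `size` chars removed, brackets kept balanced."""
    for start in range(len(text) - size + 1):
        cand = text[:start] + text[start + size:]
        if balanced(cand):
            yield cand  # still has to be checked against the real oracle

print(next(candidate_deletions("f(a, (b))", 5), None))  # -> "f(a)"
```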
I am just shooting in the dark here, so excuse me if my comment is too ignorant: have you considered rolling your own reducer and using TreeSitter grammars for it?
Env vars over-share, and files depend on local permissions. We should have a capabilities-like way to send secrets between processes: e.g., decrypt and expose the secret on a Unix socket with a SHA-hash filename that can only be read from once and then gets torn down. Share the filename; the target can read it, and immediately afterward the secret is back at rest, encrypted.
Encryption based on a config containing a whitelist of SSH public keys and what they can access, sort of like age.
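A rough sketch of the one-shot handoff from the previous paragraph (the secret bytes and socket directory are placeholders, and this ignores the age-style key whitelist): bind an unguessably named Unix socket, serve the secret to exactly one reader, then tear it down.

```python
import hashlib
import os
import socket

secret = b"s3cr3t"                                  # decrypted just-in-time
name = hashlib.sha256(os.urandom(32)).hexdigest()   # unguessable socket name
path = f"/tmp/{name}.sock"                          # placeholder directory

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(path)
os.chmod(path, 0o600)                               # local permissions still gate access
srv.listen(1)

# Share `path` with the target process out of band, then serve exactly one read.
conn, _ = srv.accept()
conn.sendall(secret)
conn.close()
srv.close()
os.unlink(path)                                     # one-time read: socket is gone
```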
I like the idea of autolayout, like flex, for routing wires. There need to be things like buses, etc.; it feels like there is something there that could be flexed. The problem is the multiple dimensions of connections, so maybe something inspired by grid layout and grid template areas?
Exactly! I'm hoping to expose the autorouter/autolayout to userland (e.g. "drop your function here") and see what clever people come up with. I have some ideas, but there are so many fun ways to do it that I'm cautious about sinking too much time in. But it's a super important problem, and I do think a new multi-layer flexbox/cssgrid could make wiring work really well.