This study really highlighted a statistical issue that would occur with any imaging technique that has noise (which is unavoidable). If you measure enough things, you'll inevitably find some false positives. The solution is to use procedures such as Bonferroni or FDR correction to account for the multiple tests, now a standard part of such imaging experiments. It's a valid critique, but it's worth highlighting that it's not specific to fMRI, nor evidence of shaky science unless you skip those steps (other, separate factors may indicate shakiness though).
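To make that concrete, here's a minimal toy sketch (my own example, not from the paper; the voxel/scan counts are made up): simulate thousands of pure-noise "voxels", t-test each one against zero, and compare how many "activations" survive with no correction, with Bonferroni, and with Benjamini-Hochberg FDR.

```python
# Toy multiple-comparisons demo: every voxel is pure noise, so any hit is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_voxels, n_scans, alpha = 10_000, 20, 0.05

# Null data: no true signal anywhere.
data = rng.normal(size=(n_voxels, n_scans))
pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

uncorrected = np.sum(pvals < alpha)            # expect roughly alpha * n_voxels hits
bonferroni = np.sum(pvals < alpha / n_voxels)  # family-wise error control

# Benjamini-Hochberg FDR: reject the k smallest p-values, where k is the largest
# index with p_(k) <= (k / m) * alpha.
order = np.sort(pvals)
thresholds = alpha * np.arange(1, n_voxels + 1) / n_voxels
passing = np.nonzero(order <= thresholds)[0]
fdr_hits = passing[-1] + 1 if passing.size else 0

print(f"uncorrected: {uncorrected}, Bonferroni: {bonferroni}, BH-FDR: {fdr_hits}")
```

With no true signal, roughly 5% of voxels "activate" uncorrected, while both corrections typically leave zero, which is exactly the point the salmon study was making.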
When we published the salmon paper, approximately 25-35% of published fMRI results used uncorrected statistics. For myself and my co-authors, this was evidence of shaky science. The reader of a research paper could not say with certainty which results were legitimate and which might be false positives.
Hey, I know you got a lot of flak for the article. So, I just wanted to thank you for having the courage to publish it anyway and go through all of that for all of us.
I go back to the study frequently when looking at MRI studies, and it always holds up. It always reminds me to be careful with these things and to try to have others be careful with their results too. Though it's a bit of a lampoon, it has surprisingly been the best reminder for me to be more careful with my work.
So thank you for putting yourself through all that. To me, it was worth it.
Many thanks - appreciate the kind words. Thanks also for always striving to work with care in your science. It makes all the difference.
Among other challenges, when we first submitted the poster to the Human Brain Mapping conference we got kicked out of consideration because the committee thought we were trolling. One person on the review committee said we actually had a good point and brought our poster back in for consideration. The salmon poster ended up being on a highlight slide at the closing session of the conference!
Thank you for publishing that paper, which I think greatly helped address this problem at the time, which you accurately describe. I guess things have to be taken in their historical context, and science is a community project that may not uniformly follow best practices, but work like this can help get everyone in line! It's unfortunate, and no fault of the authors, that the general public has run wild with referencing this work to reject fMRI as an experimental technique. There are plenty of other ways to criticize it today, for sure.
> a statistical issue which would occur with any imaging technique
It sounds like it goes beyond that: if a certain mistake ruins outcomes, and a lot of people are ruining outcomes and not noticing, then there's some much bigger systematic problem going on.
To a large extent, I think this could be solved by labs having more long-term permanent research staff (technicians, data analysts, scientists) and reducing the number of PhD students. Many students would gladly stay on in such positions instead of leaving, so it increases job opportunities. It would also improve the quality of the science, because permanent staff would carry more historical knowledge, in contrast to the current situation where students constantly rotate in and out with somewhat messy hand-offs. The students could then focus more on scholarly work, planning and overseeing research execution with the team. The problem is that the incentives are aligned to allocate all lab tasks to students, not long-term staff. I think we could change this through the requirements and structure of science funding mechanisms, however, since ultimately that's the source of the incentives.
One other feature of CLAUDE.md I’ve found useful is imports: prepending @ to a file name will force it to be imported into context. Otherwise, whether a file is read and loaded into context depends on tool use and planning by the agent (even with explicit instructions like “read file.txt”). Of course this means you have to be judicious with imports.
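For example (file paths here are hypothetical), a CLAUDE.md using imports might look something like this, where the @-prefixed lines are always pulled into context up front while anything merely mentioned in prose is only read if the agent decides to:

```markdown
# Project memory

Run `make test` before committing; conventions are in docs/ if you need them.

# Always loaded via imports (hypothetical paths)
@docs/architecture.md
@docs/style-guide.md
```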
When asking customers how well they were helped by the customer support system (via CSAT score), I've found industry-standard AI support agents generally perform worse than a well-trained human support team. AI agents are fine at handling some simple queries, e.g. basic product and order information, but support issues are often biased towards high complexity, because otherwise they could have been solved in a more automated way. I'm sure it depends on the industry, and on whether the customer's issue is truly novel.
Taking a shot at this, one concrete definition might be that the business model is essentially white labeling, that is, the base LLM is rebranded, but task performance in the problem domain is not functionally improved in some measurable way. As a corollary, it means the user could receive the same value if they had gone straight to the base LLM provider.
I think this might be more narrow than most uses of the term “wrapper” though.
Yes, this feature might be a prime driver of user engagement and retention, and it could even emerge "naturally" if those criteria are included for optimization in RLHF. In the same way that the infinite scrolling feed works in social media, the deranged sycophant might be the addictive hook for chatbots.
I've always hated this saying, and I think the reason applies here too.
If you take up running and it never gets easier, that means you're never managing your pace and you're always going full throttle. That's a straight shot towards injury if not chronic disability. Most aerobic benefits happen at zone 2, where your heart rate is just above 'easy effort'. When you start out, this might just be walking, so it makes sense to run. But once you are able to sprint, you open up the ability to do more than just walk or sprint. You can jog, skip, run at a tempo pace, run at a race pace, etc., and you need to do those to maintain fitness and build up your chronic training load. That's not to say there aren't hard efforts at times, like when you do a sprint workout or hill repeats, but 90% of the time it should be and feel easier than when you started.
You can bring that to programming too. If it never gets easier, that means you're always pushing yourself and seeking challenges. That's not good for you, your coworkers, or your projects... everyone needs some grounding and to perform at a level they excel at. Not only will your velocity be more predictable, you won't burn out as easily. Challenges that increase that comfortable pace can be sought out, but usually they come naturally too.
LeMond's statement is in reference to racing. The race isn't won in zone 2. Same with programming. Nothing wrong with a lot of zone 2 programming, in fact it's quite important to maintain balance like you describe, but the race isn't won with comfortable work.
I’ve seen my share of programmers who showed off all-nighters and productive weekends, only to realize they were compensating for not actually working productively through their entire 40 hours, not even half of them. Programming is not a sprint, it’s a marathon.
My issue with the 40 hours is the constant interruptions with meetings, co-workers asking questions, shifting priorities, etc.
When pulling an all-nighter or weekend, all of that stuff goes away.
It’s been a long time since I pulled an all-nighter or worked a weekend, but I’ve been thinking about it just so I can feel like I finished something. If I could get actual heads-down time during my 40 hours, I’d much prefer to use that time.
I’m too burned out from the 40 hours of BS. When I have a couple weeks off I usually get inspired to start a side project. I get started, then it quickly dies once I start back up at work. I hardly touch my personal computer anymore.
The saying resonates with me. I have different problems programming now compared to when I started. But I still bang my head against a wall until it gives or I leave with bruises. I may not notice the little walls I step over now, and I learnt which walls to respect. The easy stuff I do on the side.