When I took a Waymo in California, it was a nearly perfect experience. The one exception was when a person in front of us stopped and tried to parallel park on the right side of the road. The Waymo wasn't giving them enough space, sort of inching forward as they attempted to back in. I felt bad; as we passed, I could see that the Waymo had stressed them out. I still think Waymos are good overall, but it kind of surprised me that it didn't know how to handle this situation.
Honestly, even though it failed, I'm kind of impressed that the trajectory mostly stays in the lines. If you remove all but two openings, does it work? The drawing you show has more than two openings, some of which are inaccessible from the inside of the maze.
It's ASCII art, so the "trajectory" will always stay within the lines, because you can't have the ● and ║ characters intersect each other.
The only impressive part would be that the trajectory is "continuous", meaning for every ● there is always another ● character in one of the 4 adjacent positions.
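To make that "continuous" criterion concrete, here is a minimal sketch of a checker. It assumes the maze is a plain string with one row per line and that each character (including ● and ║) occupies exactly one column, as in a monospaced rendering. The function name `is_continuous` is hypothetical, not something from the original post.

```python
def is_continuous(art: str, dot: str = "●") -> bool:
    """Return True if every `dot` in the text art has at least one other
    `dot` among its 4 orthogonal neighbours (up, down, left, right).

    Assumes one character per grid column (monospaced rendering).
    """
    rows = art.splitlines()
    # Collect the (row, column) coordinates of every trajectory character.
    positions = {
        (r, c)
        for r, line in enumerate(rows)
        for c, ch in enumerate(line)
        if ch == dot
    }
    for r, c in positions:
        neighbours = {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
        # A dot with no orthogonally adjacent dot breaks continuity.
        if not neighbours & positions:
            return False
    return True


# A path that turns a corner passes; a diagonal-only step fails.
print(is_continuous("║●●║\n║ ●║\n║ ●║"))
print(is_continuous("║● ║\n║ ●║"))
```

Note that this is a purely local test, matching the definition in the comment: it rejects isolated dots and diagonal jumps, but two separate path fragments would each pass on their own, so it does not prove the trajectory is a single connected route through the maze.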
I thought adversarial testing like this was a routine part of software engineering. He's checking to see how flexible it is. Maybe prompting would help, but it would be cool if it were more flexible.
So the idea is what? What does a successful outcome look like for this test, in your mind? What should good software do? Respond and say there are 5 legs? Question what kind of dog this even is? Get confused by a nonsensical picture that doesn't quite match the prompt? Should it understand the concept of a dog and be able to tell you that this isn't a real dog?
You know, last week I interviewed a candidate whose resume was really strong: it was exceptional in many ways, plus his open-source code looked really tight. But at the beginning of every interview, I show the candidate the same silly code example with signed integer overflow undefined behavior baked in. I did the same here and asked him if he saw anything unusual about it, and he failed to detect it. We closed the round immediately and I communicated a no-hire decision.
Does the ability to verbally detect gotchas in short conversations dealing only with text on a screen or whiteboard really map to stronger candidates?
In real work you have documentation, an editor, tooling, and tests, and you're a tad less distracted than in a job interview with all its attendant stress. Isn't the fact that he actually produces quality code in real life a stronger signal of quality?
You're correct; however, midwits who don't fully understand all of this will latch on to one of the early difficult questions that was shown as an example and keep using it over and over without really knowing what they're doing, while the people developing and testing the model are doing far more complex things.
And before they were encoded in law, were they rights?
I feel it makes your claim weaker to go from "should have" to "is a right" if there's any doubt about it.
There are strong "we have a right to [ancillary thing]" arguments you can make that rely on a right, but those rely on that right being a given, not the premise.
When somebody says "X is a right", that does not necessarily mean they think the case is closed and the discussion is over. It can also mean that they are making an assertion, which frames the discussion for the follow-up questions you are now asking.
The target trial emulation specifies "individuals deceased or vaccinated during the 6 month grace period between the index date and the effective start of follow-up" as an exclusion criterion.
These are fair questions. I guess I'd first say that there were studies on reactogenic/immunogenic effects coincident w/ the original rollout. But perhaps more importantly, this study conditions on the time frame, meaning that the condition applies to all participants and thus should not affect the risk ratio. I think knowing the current hazard risk ratio is more scientifically/medically valuable than the previous one (even if I'm skeptical that it has changed significantly for reasons other than noise).