> Instead of trying to make short scales that are really highly internally consistent, we could try to make short scales that are maximally correlated with the full scale's score.
Is this true? If a test has very low reliability, is it possible for that test to have a high correlation with anything?
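The classical-test-theory answer is that it is not: Spearman's attenuation formula bounds the correlation of an observed score with anything by the square root of the test's reliability. A minimal simulation sketch (the noise levels and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True score T, and a criterion Y that is perfectly correlated with T
t = rng.standard_normal(n)
y = t.copy()

# Observed score X = T + measurement error; reliability = var(T) / var(X)
for noise_sd in (0.5, 2.0):
    x = t + noise_sd * rng.standard_normal(n)
    reliability = 1.0 / (1.0 + noise_sd**2)
    r_xy = np.corrcoef(x, y)[0, 1]
    # Attenuation bound: r_xy <= sqrt(reliability), even against a perfect criterion
    print(f"reliability={reliability:.2f}  r_xy={r_xy:.3f}  bound={reliability**0.5:.3f}")
```

With reliability 0.20, the observed correlation cannot exceed about 0.45 no matter how strong the true relationship is, which is why a very unreliable short scale cannot correlate highly with the full scale's score.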
"My reason for bringing this up is that in the current AI frenzy, there's a lot of focus on getting the LLMs to do stuff, but they are generally hacky and suck at math, and tend to just make stuff up. For the purpose of writing some of the code for this article, I had to ask GPT4 about 10 times to fix a function it made me, even after I gave it the right approach. Still, eventually, it did make the right model."
Now assume that you don't in fact know that the function the LLM produced is wrong, and you use it on something critical where it doesn't crash but gives you the wrong answer.
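That failure mode can be made concrete with a hypothetical example (the function and its bug are invented for illustration): code that runs without error on every input but is quietly, statistically wrong.

```python
def sample_variance(xs):
    """Intended to be the unbiased sample variance.

    Bug: divides by n instead of n - 1, so the result is biased low.
    It never crashes, so nothing flags the error downstream.
    """
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # should be len(xs) - 1

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))  # prints 4.0; the unbiased answer is 32/7 ≈ 4.571
```

A function like this passes casual eyeballing and happy-path tests, which is exactly why silent numerical errors are more dangerous than crashes.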
https://ombreolivier.substack.com/p/llm-considered-harmful
Although, to be fair, certain people (e.g. Neil Ferguson at Imperial) used the output of models they knew were flawed, because they were non-deterministic, to lock us all down for "2 weeks to slow the spread".
Well, I can read the code, and I did some testing. In any case, if I write a function myself, there's no guarantee it's right in all edge cases either.