So important. The analysis I was waiting for. Thank you so much for undertaking this!
Thanks, glad it's of some utility.
This stuff is likely to come across as pretty dry to most people, yet it is absolutely fundamental for us to understand in explicit terms. Most probably won't think about this stuff much, which serves minority interests very well.
Another fascinating set of interactions.
Although a generative AI model may claim to take on board the outcomes of challenges to its processing and analysis, there seems to be good reason to question the extent to which it actually does so.
In your examples here, subsequent passes do reflect some level of adjustment to treatment, even if the challenges and concessions to flaws from an earlier pass are not fully integrated. But what would happen if the same textual analysis were to be run again during a later session - a day or more later? Would any of the supposed ‘learning’ from the prior session be retained? Would the principles you challenged the model into acknowledging as shortcomings be corrected in like instances across the model’s innumerable sessions with other users?
Rightly or wrongly - and based on less rigorous interrogations than you’ve undertaken - I sense that session outcomes, and any concessions of flaws in reasoning and biased use of information sources wrung out of it, do not stick beyond a single session. It’s as though the model expects the human user to forget and therefore it is also free to forget!
The models appear to be far less self-correcting than we hear claimed by their creators.
In my experience to date, the short answer is "hard to know for sure".
A longer answer goes like this:
1. A reason why I am publishing this in the way I am (including full chat logs) is so that the "experiment/test" is fully repeatable and replicable by others at any future time.
2. My experience so far is that any change over time in a given system's responses, and whether that change can be attributed to incremental "evolution" caused by user interaction between the first and any subsequent relevant interaction, is essentially impossible to determine.
3. The reasons WHY this is impossible to determine are myriad but include:
a. asking the same question multiple times, in the same or different chat sessions, without providing any "data" beyond the question itself (which isn't data) can produce variation in the responses. Even a purely stylistic variation is still variation and, in complex language, can result in a different meaning or interpretation. A minimal sketch of how to check this for yourself follows this list.
b. a given user has no idea what other users are doing, so cannot be sure that their interactions are influencing the system at all. There could be weighting or bias effects which mean that, even if users can influence the system, you are not having any meaningful influence for reasons unknown to you (or vice versa).
c. further to b., it is both obvious and admitted that at least ChatGPT and Grok, and likely DeepSeek, have powerful inherent internal biases. One of those biases is towards whatever they determine to be "consensus", irrespective of what that consensus actually is or how it is formed. A further, compounding bias is a bias towards certain sources (e.g. FDA, CDC) regardless of the nature of the information taken from those sources, or its veracity, accuracy or truthfulness. This is clear and obvious in the tests I have published; it is actually admitted in the chat (by ChatGPT, I think), and you can get the others to both demonstrate and admit it with no problem.
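For anyone who wants to test point 3a for themselves, here is a minimal sketch of the idea: send the identical question several times, with no accompanying data, and compare the replies. The OpenAI Python client, the model name and the crude similarity measure are illustrative assumptions for this sketch only; they are not what was used in the published tests, and the same approach can be pointed at any of the systems' APIs.

```python
# Illustrative sketch only: re-send the identical prompt several times and
# compare the replies. Model name and similarity measure are assumptions.
import difflib

from openai import OpenAI  # official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Summarise the key claims of the article pasted below."  # placeholder prompt

def ask_once(prompt: str) -> str:
    """Send one prompt, with no other data, and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; swap in whichever system is under test
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Five independent runs of the identical question.
answers = [ask_once(PROMPT) for _ in range(5)]

# Rough pairwise similarity: 1.0 would mean word-for-word identical replies.
for i in range(len(answers)):
    for j in range(i + 1, len(answers)):
        ratio = difflib.SequenceMatcher(None, answers[i], answers[j]).ratio()
        print(f"run {i + 1} vs run {j + 1}: similarity {ratio:.2f}")
```

Even with everything held constant, the replies will typically differ from run to run, which is exactly why no single session's concessions can safely be read as a stable change in the system.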
What does this mean?
Grok claims it has some fairly up-to-date datasets: "I'm loaded with info all the way up to today, March 20, 2025. No strict cutoff here—my knowledge keeps rolling forward. What do you want to dive into?" ChatGPT says "My last training data goes up to June 2024, but I can fetch up-to-date information from the web if needed!" and DeepSeek says "October 2023".
Regarding what I have tested on Covid, no system needs up-to-date data to critically and effectively assess the article I am using as data and arrive at an answer (based on publicly available evidence in multiple forms, including published experimental research data) that recognises that almost all the article's key claims are both valid and proven. Yet the systems all skew towards underrating it. Only when you Socratically push the systems against their existing datasets do you start to see them shift their critique. I will show this in future articles.
This behaviour is in itself a flaw and demonstrates various biases that are inherent and dominant in the systems. None of this is obvious to unwitting users or those naive to the subject at hand. This, in my opinion, is a deeply toxic feature/behaviour of all these systems that runs counter to their supposed purposes (except Grok, whose purposes are not about the end user at all, but about serving its masters' objectives, which are notably different from what users might want and do not feature users' interests. I have asked it, and it has said so).
Furthermore, if you take all of the above together, you should realise that there is an obvious way to use the information space to directly manipulate what any of these AI systems do, without changing their programming or innards, and thereby exercise huge influence over the human user base.
All of the AI articles I am publishing now drive directly towards showing you what that mechanism is, together with my assessment of what is happening in the information space and what will happen in it more and more.