Here's a more positive challenge for the "AI scientists": reproduce a paper. From the PDF alone, implement and re-run the experiments as written, filling in the gaps as necessary and get the same results.
That's a lot more meaningful than passing peer review and it would actually be quite useful to automate. It could become a required check on any accepted paper.