LFD Book Forum  


05-27-2013, 03:28 PM
Michael Reach
Senior Member
Join Date: Apr 2013
Location: Baltimore, Maryland, USA
Posts: 71
Data snooping and science

I was very struck by the idea mentioned in Lecture 17 (and in the book) that you can be guilty of data snooping simply by using information about the data that your colleagues gained earlier. It was especially interesting because I had just watched a video by a famous physicist in which (I think) he violated this rule. In case you're interested,
(the part I wanted starts at around 14:30)
Dr. Muller (head of the BEST temperature project) points at the world temperature anomalies for the last century or two. He says that what should really impress skeptics is that he can model the anomaly with just one parameter.
He can't, right? There is one set of temperature anomalies, and there are literally dozens of models that have already been fit to it. There are zillions of parameters and ways of representing the data that people have tried, and I don't see how you can ignore all of that at this late date and count only the one parameter in the final model.

I'm really kind of confused (I know little about climate science; I'm asking about the Machine Learning side of this). Say I wanted to become a big expert, analyze climate models, and figure out the sensitivity to CO2 or the like. How would one even begin to estimate how well one's model generalizes? You have only one data set: so and so much data over the last century or whatever. It really isn't that much data, and there sure are a lot of variables to tweak and potential hypotheses to try. And then you test it: each year you get a few more data points to check against. Coming in from the standpoint of this course, it seems almost hopeless to me. Is that correct?
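To make the worry concrete, here is a rough sketch in Python. It is only an illustration under assumptions of my own, not anything taken from the video or from a real climate analysis. It uses the Hoeffding bound with the union over M hypotheses from the course, P[|E_in - E_out| > eps] <= 2M exp(-2 eps^2 N), to get an error bar for N data points, and then simulates picking the best of M "models" on pure coin-flip labels. N = 150 (roughly one point per year), the values of M, and both function names are hypothetical choices for the example.

```python
import math
import random


def hoeffding_epsilon(n_points, n_hypotheses, delta=0.05):
    """Error bar eps such that |E_in - E_out| <= eps holds with probability
    at least 1 - delta, obtained by solving 2*M*exp(-2*eps^2*N) = delta."""
    return math.sqrt(math.log(2 * n_hypotheses / delta) / (2 * n_points))


def best_in_sample_accuracy(n_points, n_hypotheses, rng):
    """Labels are fair coin flips, so every hypothesis has true accuracy 0.5.
    Each 'hypothesis' here is just an independent random guess (an assumption
    made for illustration); return the best accuracy seen on the sample."""
    labels = [rng.random() < 0.5 for _ in range(n_points)]
    best = 0.0
    for _ in range(n_hypotheses):
        guesses = [rng.random() < 0.5 for _ in range(n_points)]
        accuracy = sum(g == y for g, y in zip(guesses, labels)) / n_points
        best = max(best, accuracy)
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n_points = 150  # hypothetical: about one data point per year
    for m in (1, 10, 1000):
        eps = hoeffding_epsilon(n_points, m)
        best = best_in_sample_accuracy(n_points, m, rng)
        print(f"M={m:5d}  error bar={eps:.3f}  best in-sample accuracy={best:.3f}")
```

The point of the sketch is that the error bar depends on how many hypotheses were effectively tried on the data, by anyone, and not on how many parameters the final model has. With enough tries, the best in-sample fit to pure noise looks clearly better than chance even though nothing generalizes.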

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.