
Worth noting that there has been a fair bit of good research in causal machine learning in the last year or so, for example "Implicit Causal Models for Genome-wide Association Studies" (https://arxiv.org/pdf/1710.10742.pdf).

The key point of this paper is that neural networks really are very good at "curve fitting" and that this curve fitting in the context of variational inference has advantages for causal reasoning, too.

Neural networks can be embedded in a variety of model structures, and those structures tend to benefit from having powerful trainable non-linear function approximators inside them. In this sense, deep learning will remain a powerful tool despite the limitations of how it's currently used.
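
To make the "curve fitting inside a structure" point concrete, here's a toy sketch of my own (not the method of the linked paper, and the graph, data-generating process, and network are all made up for illustration): if the causal graph is known, an ordinary neural regressor fitted to observational data can answer an interventional query via backdoor adjustment.

    # Known graph: Z -> X, Z -> Y, X -> Y. Fit E[Y | X, Z] by plain curve
    # fitting, then answer do(X = x) by averaging over the marginal of Z.
    import numpy as np
    import torch
    import torch.nn as nn

    rng = np.random.default_rng(0)
    n = 5000
    z = rng.normal(size=(n, 1))                       # confounder Z
    x = 2.0 * z + rng.normal(scale=0.5, size=(n, 1))  # treatment X depends on Z
    y = np.sin(x) + 3.0 * z + rng.normal(scale=0.5, size=(n, 1))  # outcome Y

    xz = torch.tensor(np.hstack([x, z]), dtype=torch.float32)
    yt = torch.tensor(y, dtype=torch.float32)

    # Plain curve fitting: an MLP for E[Y | X, Z]
    f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(f.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        nn.functional.mse_loss(f(xz), yt).backward()
        opt.step()

    # Backdoor adjustment: E[Y | do(X = x)] = average of f(x, Z) over the
    # *marginal* of Z, not over p(Z | X = x)
    def do_x(x_val):
        xs = torch.full((n, 1), float(x_val))
        zs = torch.tensor(z, dtype=torch.float32)
        with torch.no_grad():
            return f(torch.hstack([xs, zs])).mean().item()

    print("interventional mean at do(X=1):", do_x(1.0))       # ~ sin(1) ~ 0.84
    print("naive conditional mean near X=1:",
          float(y[(np.abs(x - 1.0) < 0.1).ravel()].mean()))   # biased upward by Z

The curve fitting is entirely standard; the causal content comes from the structure that tells you what to condition on and what to average over.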

I think Pearl, who has obviously remained very influential for many practitioners of machine learning, knows the value of "curve fitting". But it's hard to have a real conversation about the state of the art of an academic field in a brief interview, and the "Deep Learning is Broken" angle is more attractive.



It's worth considering that anywhere in a graphical model where coefficients of any sort are learned, those coefficients can be augmented by neural networks (as in the last decade of natural language processing, where the SOTA on almost all problems has been successfully neuralized).
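
As a toy illustration of what "neuralizing" one factor of a graphical model looks like (a hypothetical example of mine, not from any particular paper): take an HMM-style tagger's emission p(word | tag) and swap the count-based table for a small neural softmax. The graph and the maximum-likelihood objective stay the same; only the parameterization changes.

    import torch
    import torch.nn as nn

    tags = ["DET", "NOUN", "VERB"]
    vocab = ["the", "a", "dog", "cat", "runs", "sleeps"]
    # (tag, word) training pairs; a real tagger would take these from a corpus
    data = [("DET", "the"), ("DET", "a"), ("NOUN", "dog"), ("NOUN", "cat"),
            ("VERB", "runs"), ("VERB", "sleeps"), ("NOUN", "dog"), ("DET", "the")]
    t_ix = {t: i for i, t in enumerate(tags)}
    w_ix = {w: i for i, w in enumerate(vocab)}

    # 1) Classic parameterization: a conditional probability table from counts
    counts = torch.zeros(len(tags), len(vocab))
    for t, w in data:
        counts[t_ix[t], w_ix[w]] += 1
    cpt = (counts + 1) / (counts + 1).sum(dim=1, keepdim=True)  # add-one smoothing

    # 2) Neural parameterization: softmax over a learned tag embedding
    class NeuralEmission(nn.Module):
        def __init__(self, n_tags, n_words, dim=8):
            super().__init__()
            self.emb = nn.Embedding(n_tags, dim)
            self.out = nn.Linear(dim, n_words)
        def forward(self, tag_ids):
            return self.out(self.emb(tag_ids))  # unnormalized log p(word | tag)

    model = NeuralEmission(len(tags), len(vocab))
    opt = torch.optim.Adam(model.parameters(), lr=0.1)
    tag_ids = torch.tensor([t_ix[t] for t, _ in data])
    word_ids = torch.tensor([w_ix[w] for _, w in data])
    for _ in range(200):  # maximum likelihood, same objective as the CPT
        opt.zero_grad()
        nn.functional.cross_entropy(model(tag_ids), word_ids).backward()
        opt.step()

    with torch.no_grad():
        neural = torch.softmax(model(torch.arange(len(tags))), dim=1)
    print("CPT    p(word | NOUN):", dict(zip(vocab, cpt[t_ix["NOUN"]].tolist())))
    print("Neural p(word | NOUN):", dict(zip(vocab, neural[t_ix["NOUN"]].tolist())))

With this tiny vocabulary the two parameterizations end up nearly identical, but the neural one can share statistical strength across tags and condition on richer context, which is where the NLP gains came from.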

I wonder if Deep Belief Networks and their flavor of generative models, which seem closer in nature to Pearl's PGMs, have a chance to bridge the gap.

Edit, as an aside: Given the enormously high dimensionality of personal genomes and the comparatively tiny sample sizes, I've been unable to put much trust in GWAS for over a decade, and I've found that suspicion supported on a number of occasions, given the reproducibility difficulties that this problem likely causes. Is there any reason to think that improved statistical methods can surmount the fundamental problem of limited sample size and high dimensionality?


Numerous important biomedical findings have resulted from GWAS. Most GWAS today are inherently reproducible, since their hits usually come from multi-stage designs with independent samples. Sample sizes are no longer "incredibly small" either; large GWAS often include hundreds of thousands of participants, and some exceed a million.

I suppose the most important idea is that GWAS aren't really supposed to show causality. "Association" is in the name. GWAS are usually hypothesis generating (e.g., identification of associated variants) and then identified variants can be probed experimentally with all of the tools of molecular biology.

In summary, GWAS have their problems, but I think your statement is a bit too strong.


Mendelian randomization is a good technique for starting to think about causality in epidemiological studies.

This is a good paper that demonstrates the approach: Millard, Louise A. C., et al. "MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization." Scientific Reports 5 (2015): 16645. https://www.nature.com/articles/srep16645
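
For intuition, the core of MR is an instrumental-variable estimate with a genetic variant as the instrument. A rough sketch with made-up numbers (my own illustration, not taken from the paper above): because the variant G influences the exposure X but is effectively randomized at conception, the Wald ratio cov(G, Y) / cov(G, X) recovers the causal effect of X on Y even when an unobserved confounder biases the ordinary regression of Y on X.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    g = rng.binomial(2, 0.3, size=n)            # genotype: 0/1/2 risk alleles
    u = rng.normal(size=n)                      # unobserved confounder
    x = 0.5 * g + 1.0 * u + rng.normal(size=n)  # exposure (e.g. standardized BMI)
    y = 0.3 * x + 1.0 * u + rng.normal(size=n)  # outcome; true causal effect = 0.3

    naive = np.cov(x, y)[0, 1] / np.var(x)          # confounded: well above 0.3
    wald = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]  # MR estimate: close to 0.3
    print(f"naive OLS slope: {naive:.3f}, Wald ratio (MR): {wald:.3f}")

The real work in MR is defending the instrument assumptions (no pleiotropy, no linkage with confounding variants), which is what papers like the one above spend most of their effort on.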


Thousands of samples against millions of dimensions still doesn't strike me as an easy problem, but it makes sense to me that downstream molecular biology can verify putative associations. Thank you for weighing in.




