What makes Python a great language for data science is that so many people are familiar with it and that it is easy to read. If you use a more obscure language like Clojure, Common Lisp, Julia, etc., many people will not be familiar with the language and will be unable to read or review your code. Peer review is fundamental to the scientific endeavor. If you only optimize for the best language for the task, there are clearly better languages than Python. If you optimize for what is best for science, then I think it is hard to argue that Python (and R) are not the best choices. In science, just getting things done is not enough. Other people need to be able to read and understand what you are doing.
BTW, AI is not helping; in fact, it is leading to a generation of scientists who know how to write prompts but neither understand the code those prompts generate nor have the ability to peer review it.
I can't speak for Julia - never used it; never used Common Lisp for analyzing data (I don't think it's very "data-oriented" for the modern age and the shape of data), but Clojure is really not "obscure" - it only looks weird for the first fifteen minutes or so; once you start using it, it is one of the most straightforward and reasonable languages out there - it is in fact simpler than Python and JavaScript. Immutable-by-default makes it far easier to reason about the code. And OMG, it is so much more data-oriented - it's crazy that more people don't use it. Most have never even heard of it.
Common Lisp fan here, but not a data scientist. Why do you say to avoid CL for data analysis? Not trying to flame or anything, just curious about your experience with it.
I don't have much experience using CL for analyzing data, because why would I, if I already have another Lisp that is simply amazing for data?
Clojure's collections, unlike lists in traditional Lisps, are based on a composable, unified abstraction. Its sequence operations are lazy by default, and its data structures have readable literal syntax, so they are far easier to introspect and less "opaque" than in most languages - not just CL, but even Python. They are superb for dealing with heterogeneous data. Clojure's cohesive data manipulation story is something Common Lisp's lists-and-symbols just can't match.
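For readers who only know Python, the "lazy by default" point above can be approximated with generators. This is a minimal sketch of the idea, not Clojure's implementation; the `records` stream and its field names are hypothetical, made up purely for illustration.

```python
from itertools import islice


def records():
    """Hypothetical endless stream of heterogeneous records (illustration only)."""
    n = 0
    while True:
        if n % 2 == 0:
            yield {"id": n, "value": n * 1.5}
        else:
            yield {"id": n, "label": f"row-{n}"}
        n += 1


# Nothing is computed here: generator expressions only describe the pipeline.
with_value = (r for r in records() if "value" in r)
scaled = (r["value"] * 2 for r in with_value)

# Work happens only when items are pulled, so an infinite source is fine.
first_three = list(islice(scaled, 3))
```

In Python the laziness is opt-in (generators, `itertools`), whereas the comment above describes it as the default for Clojure's sequence operations.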
Homework assignments notwithstanding, very few serious Common Lisp programs use lists and symbols as their primary data structures. This has been true since around 1985.
Common Lisp has O(1) vectors, multidimensional arrays, hash-tables (what Clojure calls maps), structs, and objects. It has set operations too, but it doesn't enforce membership uniqueness. It also has bignums, several sizes of floats, arbitrary-precision rationals, and complex numbers. Not to mention characters, strings, and logical operations on individual bits. The main difference from Clojure is that CL data structures are not immutable. But that's an orthogonal issue to the suggestion that CL doesn't contain a rich library of modern data structures.
Common Lisp has never been limited to "List Processing."
I wasn't trying to denigrate Common Lisp; I'm sorry if I hurt your feelings. It does have comprehensive support for all kinds of data structures. I wasn't talking about it being limited to "list processing". SBCL is great for many things, but from many practical standpoints Clojure is actually much better suited for data analysis.
Saying "hash-tables (what Clojure calls maps)" is not only inaccurate; you're also hand-waving Clojure's core design philosophy (immutability, structural sharing, lazy sequences) as orthogonal. But those aren't cosmetic differences - they're the reason why Clojure's data structures are fundamentally better for data analysis. I think you're confusing "having equivalent data types" with "solving the same problem the same way".
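To make "immutability plus structural sharing" concrete, here is a rough Python analogy using an immutable linked list. It is only a sketch: the `Node`, `cons`, and `to_list` names are invented for this example, and Clojure's own vectors and maps rely on tree-based structures rather than a simple list like this.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable list cell: a value plus a reference to the rest of the list."""
    head: object
    tail: Optional["Node"]


def cons(value, lst):
    """Return a NEW list with value in front; lst itself is never modified."""
    return Node(value, lst)


def to_list(lst):
    """Copy the linked cells into a plain Python list for display."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


base = cons(2, cons(3, None))
a = cons(1, base)  # "adds" 1 without copying base
b = cons(0, base)  # a second version built on the same base

# Both versions share the very same tail cells in memory.
shared = a.tail is b.tail
```

Because no version can be mutated, sharing the tail between `a` and `b` is safe, which is what makes "updates" cheap without copying the whole structure.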
It doesn't require any Java, but the docs do at times sort of assume you understand the JVM to some extent - which was a bit frustrating when first learning the language. They'll use terms like "classpath" without explaining what that is. However, nowadays with LLMs these are insignificant speed bumps.
If you want to use Java you also don't really need to know Java beyond "you create instances of classes and call methods on them". I really don't want to learn a dinosaur like Java, but having access to the universe of Java libs has saved me many times. It's super fun and nice to use and poke around mature Java libs interactively with a REPL :)
All that said, I'd have no idea how to write even a hello world in Java.
PS: Agreed on Emacs. I love Emacs.. but it's for turbo nerds. Having to learn Emacs and Clojure in parallel was a crazy barrier. (And no, Emacs is not as easy as people make it out to be.)
None of this is even remotely true. I got into Clojure without knowing jackshit about Java, and almost ten years later, after tons of things successfully built and deployed, I still don't know jackshit about Java. Mia, co-host of the 'Clojure apropos' podcast, was my colleague; we've worked together on multiple teams, and she learned Clojure as her very first PL. Later she tried learning some Java and was shocked at how impossibly weird it looked compared to Clojure. Besides, you can use Clojure without any JVM - e.g., with nbb. I use it for things like browser automation with Playwright.
The tooling story is also very solid - I use Emacs, but many of my friends and colleagues use IntelliJ, Vim, Sublime and VSCode, and some of them migrated to it from Atom.
It might not be a problem for you, but it has been for many. I did start by reading through 3 Clojure books. The REPL and the basic stuff like using lists is all easy of course, but the tooling was pretty poor compared to what I was used to (I like Lisp, but Emacs is a commitment). Also, a lot of tutorials at the time definitely assumed Java familiarity, especially with debugging Java stack traces.
> It might not be a problem for you, but it has been for many
Do you have a habit of referring to yourself in plural, or do you typically like to generalize things based on your personal experiences?
I personally know many Clojurists who never had the problems you're describing - hundreds of people. Sure, that could be a case of survivorship bias; perhaps I just don't befriend people who struggled with getting into Clojure specifically in the way you're describing. But like they say: "Those who are willing to make the effort will find the solutions. Those who aren't will find the excuses."
Clojure undeniably had challenges in the past, and still has some today. But not the things you're talking about. This is literally not an exaggeration - it's as easy as installing the Calva extension for VSCode - that's all one needs to mess around with Clojure.
I've had this discussion here on HN several times over the years. Lots of comments from others have pointed out similar experiences. I'm guessing your experience was more positive and that's great to hear.
I did point out that maybe things had changed a good bit (literally said maybe VSCode made that easier now as it has for other tools) and tried to make it clear that my experience was a bit dated.
As far as excuses go, I don't see how that's relevant. I just pointed out I had issues with a steep learning curve when I was seriously considering it many years ago along with other languages that are hosted on the JVM (Scala, Kotlin) or .NET (F#). Nothing against those languages, but all the tutorials and even many of the books at the time would frequently borrow from the host language in weird ways. Like I'd have to use some random Java library and when it didn't work, had no idea how to troubleshoot why it wasn't there and I didn't want to have to go learn Java first.
I own at least two books on F# and talked with some prominent authors personally and they admitted it was really geared towards intermediate or greater C# users who wanted to move over to functional programming. I could have stuck with it, but decided to stick with other tools.
Clojure certainly is nice and I wanted to take advantage of it...it just ended up not being as ergonomic for my needs as I had hoped.
> What makes Python a great language for data science, is that so many people are familiar with it
While I agree with you in principle, this also leads to what I call the "VB Effect". Back in the day VB was taught at every school as part of the standard curriculum. This made every kid a 'computer whiz'. I have had to fix many a legacy codebase that was started by someone's nephew, the whiz kid.
Peer review is fundamental to scientific endeavor but... in ML fields, reviewers almost never check the code and Python package management is hardly reproducible. So clearly we are not there, Python or not.
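One small step toward the reproducibility problem mentioned above is recording exactly which package versions an analysis ran with. The following is a minimal sketch using only the standard library's `importlib.metadata`; the function name `snapshot_environment` is made up for this example, and pinning versions alone does not guarantee a reproducible environment.

```python
from importlib import metadata


def snapshot_environment():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no usable name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Saving the returned lines next to the results (for example in a text file committed with the paper's code) at least lets a reviewer see what was installed when the numbers were produced.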
That's ok, I don't think anyone knows how to properly write Julia. After using it for a while and following the community (watching talks, checking the forum etc), I don't think it has a concept of code quality. You just throw random code at the wall until it starts working. Which makes sense, considering most of the users are scientists.