RL is a lot more general than that. It's basically a framework in which an agent learns from experience to make decisions that maximize its cumulative reward. So you can do all kinds of things with it besides finetuning LLMs, like controlling a robotic arm or mastering video games. AlphaGo, for example, was also trained with RL.
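
To make the "learn from experience to maximize reward" idea concrete, here's a rough sketch of about the simplest RL setup there is: an epsilon-greedy agent on a multi-armed bandit. Everything in it (the `run_bandit` name, the arm means, the Gaussian noise, the parameter values) is made up for illustration. It isn't how AlphaGo or LLM finetuning works, just the same act / get reward / update loop in miniature.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical setup for illustration)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # agent's running estimate of each arm's reward
    counts = [0] * n_arms        # how many times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a noisy reward; the agent never sees true_means.
        reward = rng.gauss(true_means[arm], 1.0)

        # Learn from experience: incremental update of the sample mean.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward

if __name__ == "__main__":
    estimates, counts, total_reward = run_bandit([0.2, 0.5, 1.5])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("pulls per arm:", counts)
```

Nobody tells the agent which arm is best. It figures that out purely from the rewards it gets back, and ends up pulling the highest-paying arm most of the time. Swap the arms for joint torques or controller inputs (and add states) and you have the same loop behind the robotics and game examples.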