
Are you planning to switch to programming-language-optimized models inside Phind, so that if a user asks for something related to Python, the Python-optimized model gets used?

If so:

The Object Pascal language is completely out of fashion, and the most non-hyped language there is. However, there are hundreds of thousands of active users of Delphi, FreePascal, and Lazarus. And because the language has been stable for over 20 years, there is also a gigantic amount of high-quality code available. As most of it is on neither GitHub nor StackOverflow, Pascal code is dramatically underrepresented in GPT-3.5 and GPT-4 - and therefore also in Phind.

I'd like to finally be able to use AI-assisted programming with Pascal.

In case you are interested in that, I would be willing to internally pay for the work to prepare a good dataset of high quality code with comments/context/prompts.

If you are not interested, is there any chance that you are going to release the code and toolchain used to fine-tune CodeLlama, so I could do it myself?



Yes, this is the direction we're heading towards! We're building a mixture of experts of different coding models that we will deploy for precisely this use case.
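To illustrate the idea, here's a hypothetical sketch in Python (the model names and the naive keyword matching are invented for illustration, not our actual routing):

    # Hypothetical per-language routing; model names are made up.
    LANGUAGE_MODELS = {
        "python": "codellama-34b-python-ft",
        "pascal": "codellama-34b-pascal-ft",  # the fine-tune discussed here
    }
    DEFAULT_MODEL = "codellama-34b-base"

    def pick_model(query: str) -> str:
        # Naive keyword match; a real router would classify the query.
        q = query.lower()
        for lang, model in LANGUAGE_MODELS.items():
            if lang in q:
                return model
        return DEFAULT_MODEL

    print(pick_model("How do I sort a TStringList in Pascal?"))
    # -> codellama-34b-pascal-ft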


Nice!

I suppose that Pascal is not on your planned list of supported languages, right?


Why would it be? Do you know how much it costs to fine-tune one of these models for such a niche language? I'm not just talking about the cost of training, but also the cost of acquiring data, since there's much less data available for niche languages.


96 A100-hours for a fine-tune, according to the article.

The cost of dataset curation for a given language is hard to quantify, as there are many unknowns. However, it seems perfectly crowdsourceable to volunteers.


A project like SETI@home should help with these efforts, I believe?


Maybe the Stable Horde could work it into the project.

https://stablehorde.net/


It certainly costs society much less to train for Pascal once than to make everyone burn CPU cycles running Python!


> Do you know how much it costs to finetune

Between $30 and $3,000, often in the $300 range.


From their numbers: 3 hours with 32 A100 80GB GPUs.

From Lambda Cloud:

3 hours * 4 instances * ~$22/hr per 8x A100 instance ~= $265

So yeah, not too expensive even for a native fine-tune (obviously this ignores all costs other than the GPUs).
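Sanity-checking that in Python (assuming the article's 3 hours on 32 A100 80GBs and Lambda's ~$22/hr rate for an 8x A100 instance):

    hours = 3                      # from the article
    gpus = 32                      # A100 80GB, from the article
    gpus_per_instance = 8          # Lambda's 8x A100 instance
    usd_per_instance_hour = 22.0   # approximate on-demand rate

    instances = gpus // gpus_per_instance        # 4
    cost = hours * instances * usd_per_instance_hour
    print(f"~${cost:.0f}")                       # ~$264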


It's wild to me that the raw compute cost is so low.


Sweet Jesus, that is amazing.


Where are the troves of Pascal code? Also manuals, books, etc. The quality doesn't have to be great. You can label and generate more data once you have enough to bootstrap the model.


> I would be willing to internally pay for the work

What kind of budget do you think this will require?


Not much, I guess. It's basically writing some scripts that take the codebases of some of the available high-quality Pascal projects and then, depending on what's available, extract and merge the documentation - whether it ships as PDF, PasDoc, RTF, .HLP, or method/function source comments.
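Roughly this kind of thing - a minimal Python sketch, where the project path, the comment regex, and the output schema are just placeholder assumptions:

    import json
    import re
    from pathlib import Path

    # Matches { ... }, (* ... *) and // comments in Pascal source.
    # (Naive: this also catches {$...} compiler directives.)
    COMMENT_RE = re.compile(r"\{.*?\}|\(\*.*?\*\)|//[^\n]*", re.DOTALL)

    def extract_units(root):
        for pas in Path(root).rglob("*.pas"):
            source = pas.read_text(encoding="latin-1", errors="replace")
            yield {
                "path": str(pas),
                "code": source,
                "context": "\n".join(COMMENT_RE.findall(source)),
            }

    with open("pascal_dataset.jsonl", "w") as out:
        for record in extract_units("lazarus/"):   # any local checkout
            out.write(json.dumps(record) + "\n")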

I would assume that one of my devs could write the needed scripts in three weeks or so.

So, basically a budget of <$5000.

For me - due to my own lack of expertise - the actual challenge would be getting a sample of how the training data should optimally look (for example, the Python training set), and finding someone to do the actual training. For a newbie, getting up to the required level of competence will surely take more than three weeks.
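For reference, one common shape for such instruction-tuning data is a JSONL file of instruction/response pairs - a hypothetical Python example (I don't know the actual schema of Phind's Python training set):

    import json

    # Hypothetical record shape; the real training format isn't
    # public in this thread.
    record = {
        "instruction": "Write a FreePascal function that reverses a string.",
        "response": (
            "function ReverseStr(const S: string): string;\n"
            "var I: Integer;\n"
            "begin\n"
            "  SetLength(Result, Length(S));\n"
            "  for I := 1 to Length(S) do\n"
            "    Result[I] := S[Length(S) - I + 1];\n"
            "end;"
        ),
    }
    with open("pascal_train.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")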



