Are you planning to switch to programming-language-optimized models inside Phind, so that if a user asks for something related to Python, the Python-optimized model gets used?
If so:
Object Pascal is completely out of fashion, probably the least hyped language there is. However, there are hundreds of thousands of active users of Delphi, Free Pascal, and Lazarus. And because the language has been stable for over 20 years, there is also a gigantic amount of high-quality code available. Since most of it is on neither GitHub nor Stack Overflow, Pascal code is dramatically underrepresented in GPT-3.5 and GPT-4, and therefore also in Phind.
I'd like to finally be able to use AI-assisted programming with Pascal.
If you are interested in that, I would be willing to pay internally for the work to prepare a good dataset of high-quality code with comments/context/prompts.
If you are not interested, is there any chance you will release the code and toolchain used to fine-tune CodeLlama, so I could do it myself?
Yes, this is the direction we're heading towards! We're building a mixture of experts of different coding models that we will deploy for precisely this use case.
Why would it? Do you know how much it costs to fine-tune one of these models for such a niche language? I'm not just talking about the cost of training, but also the cost of acquiring data, because there's much less data about niche languages.
96 A100-hours per fine-tune, according to the article.
The cost of the dataset curation for a given language is hard to quantify, as there are many unknowns. However, it seems perfectly crowdsourceable to volunteers.
Where are the troves of Pascal code? Also manuals, books, etc. The quality doesn't have to be great. You can label and generate more data once you have enough to bootstrap the model.
Not much, I guess. It's basically writing some scripts that take the codebases of some of the available high-quality Pascal projects and then, depending on what is available, extract/merge documentation from PDF, PasDoc, RTF, .HLP, or method/function source-code comments.
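The comment-extraction part of such a script can be sketched in a few lines. This is a naive illustration, not one of the scripts mentioned above: it walks a project tree, pulls out Pascal's three comment forms ({ }, (* *), //), and does not handle comment markers that appear inside string literals.

```python
import re
from pathlib import Path

# Pascal comment syntaxes: { ... }, (* ... *), and // to end of line.
# Naive: comment markers inside string literals are not skipped.
COMMENT_RE = re.compile(
    r"\{[^}]*\}"          # { brace comments }
    r"|\(\*.*?\*\)"       # (* parenthesis-star comments *)
    r"|//[^\n]*",         # // line comments
    re.DOTALL,
)

def extract_comments(source: str) -> list[str]:
    """Return all comment bodies found in a chunk of Pascal source."""
    comments = []
    for match in COMMENT_RE.finditer(source):
        text = match.group(0)
        # Strip the comment delimiters themselves.
        if text.startswith("{"):
            text = text[1:-1]
        elif text.startswith("(*"):
            text = text[2:-2]
        else:
            text = text[2:]
        text = text.strip()
        if text:
            comments.append(text)
    return comments

def harvest(project_dir: str) -> dict[str, list[str]]:
    """Map each .pas file under project_dir to its extracted comments."""
    return {
        str(path): extract_comments(path.read_text(errors="ignore"))
        for path in Path(project_dir).rglob("*.pas")
    }
```

Pairing the comments with the declarations they precede, and merging in PDF/PasDoc/RTF documentation, is the larger share of the work.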
I would assume that one of my devs could write the needed scripts in three weeks or so.
So, basically a budget of <$5000.
For me, due to my lack of expertise, the actual challenge would be getting a sample of how the training data should optimally look (for example, the Python training set), and finding someone to do the actual training. For a newbie, getting up to the required level of competence will surely take more than three weeks.
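For what it's worth, Phind has not published its exact CodeLlama fine-tuning format, but common open code-instruct datasets use instruction/response pairs serialized as JSON Lines (one JSON object per line). A hypothetical record, assuming that shape, might look like this:

```python
import json

# Hypothetical record, modeled on common instruction-tuning datasets.
# The field names and the overall shape are assumptions; Phind's actual
# training-data format is not public.
record = {
    "instruction": "Write a Free Pascal function that reverses a string.",
    "response": (
        "function Reverse(const S: string): string;\n"
        "var\n"
        "  I: Integer;\n"
        "begin\n"
        "  SetLength(Result, Length(S));\n"
        "  for I := 1 to Length(S) do\n"
        "    Result[I] := S[Length(S) - I + 1];\n"
        "end;"
    ),
}

# JSONL convention: one record per line, appended to the dataset file.
line = json.dumps(record)
```

The hard part is producing good instruction text for existing code, which is where the extracted comments and documentation would come in.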