Yo, fellow enthusiasts! I hope you’re ready to talk about the importance of focusing on open source machine learning models.
We all know that time and energy are valuable resources, and we want to make sure that we are not stuck depending on Megacorps. That’s why I believe that open source machine learning models should be at the forefront of our priorities.
Here’s why:
Flexibility: Open source models give us the freedom to adapt and improve the algorithms to fit our specific needs. No more being limited by proprietary systems.
Collaboration: The beauty of open source is that it encourages collaboration and sharing among the community. Imagine having access to the knowledge and expertise of the best data scientists in the world, all working together to improve the model.
Cost-effectiveness: Open source machine learning models are typically free, making them a great choice for those who don’t have the resources for a proprietary solution.
Increased transparency: Proprietary machine learning models can often be difficult to understand, leaving us with limited insight into how they make predictions. With open source models, the code is readily available for examination, leading to greater transparency and accountability.
I hope you liked the joke, but seriously: let’s take advantage of these benefits and make open source models a priority.
We’ve passed the 500-tree mark in the ready_for_export state … 25% of our initial 2k-tree goal is complete. Big thanks to everyone who contributed so far!
Current message stats:
lang | count
-------+------
hu | 5
de | 1064
ja | 30
fr | 484
zh | 76
en | 10917
es | 171
pt-BR | 134
ru | 594
vi | 4
On the way to 50k
lang | count
-------+------
vi | 41
hu | 23
de | 1667
ja | 161
ko | 8
fr | 946
tr | 2
zh | 181
en | 19136
es | 1066
uk-UA | 32
pt-BR | 258
ru | 1789
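For anyone who wants to track progress toward the goal, the second table above can be tallied with a quick script (counts copied straight from the table; the 50k target comes from the heading):

```python
# Per-language message counts, as listed in the table above.
counts = {
    "vi": 41, "hu": 23, "de": 1667, "ja": 161, "ko": 8,
    "fr": 946, "tr": 2, "zh": 181, "en": 19136, "es": 1066,
    "uk-UA": 32, "pt-BR": 258, "ru": 1789,
}

GOAL = 50_000
total = sum(counts.values())
print(f"{total} / {GOAL} messages ({total / GOAL:.1%})")  # → 25310 / 50000 messages (50.6%)
```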
Latest update: we’re moving on to data filtering/training!
Hey @everyone I just wanted to give a short update on the current status of the project.
Most important thing: we’re currently processing the data and IT LOOKS AWESOME, no joke. We have around 7k complete trees to export, that’s over 100k(!) messages, meaning we’ve gotten millions of individual contributions, and that’s all thanks to YOU.
And the quality of the data is beyond what I ever expected. The vast majority of contributions are super high quality, people really try their best, and the filtering systems we have in place also work very well to weed out any crap. So huge respect to everyone and thank you so much. I’m very sure this will already be a huge contribution to the field, no matter what happens next, and I’m very sure no amount of paid crowd-workers from OpenAI can ever match the quality of people who are really putting their soul into something. And we have not only English data, but many, many languages, congratulations to all!
What’s happening now:
we are working on exporting v1 of the dataset: we need to run some cleaning, filter out PII as much as possible, scramble user IDs, etc., and determine the best export format. That doesn’t mean data collection stops, the more the merrier, and we hope we can release many future versions. Plus, now that we train models, we can interleave human and model data and really test how good they are!
as said, we are currently training the initial batch of models. So far it looks pretty good, but as also said, the data isn’t fully processed yet, so I hope those improve as well. Our goal is to soon release v1 of the data, models, and the inference system, although maybe those will come one by one; we’ll see
if you want to help, there are still plenty of GH issues, and of course data contributions are still extremely welcome!
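On the "scramble user IDs" step mentioned above: one common way to do this kind of pseudonymization is keyed hashing, so the same user always maps to the same opaque ID (keeping conversation threads linked) while the raw ID never appears in the export. A minimal sketch, assuming HMAC-SHA256 with a server-side secret (this is an illustration, not the project’s actual anonymization code):

```python
import hmac
import hashlib

# Hypothetical secret held server-side; if it is discarded after export,
# the pseudonyms cannot be linked back to real user IDs.
SECRET_KEY = b"replace-with-a-random-secret"

def scramble_user_id(user_id: str) -> str:
    """Map a raw user ID to a stable pseudonym via HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened pseudonym for readability

# The same input always yields the same pseudonym, so a user's messages
# stay grouped in the dataset, but distinct users get distinct pseudonyms.
```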