Spaces: Running on CPU Upgrade
update gitignore
- .gitignore +1 -0
- app/scripts/notion-importer/.notion-to-md/media/2421384e-bcac-80fb-aa7c-f939fc39269d_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/27877f1c-9c9d-804d-9c82-f7b3905578ff_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-8079-aebf-cfe3ee40f7c5_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-80d6-91bc-cec1904f628f_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8006-9cac-f8b9876a3daa_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8033-84a0-f498edd20d5d_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8070-84b7-d2c55eec7b31_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-807b-b544-e308e74095eb_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80b9-ac9e-e2ed81f6f335_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80df-81c0-fc7920a269f8_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80fc-abcc-ed31b76eb37d_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2951384e-bcac-8087-898b-f7fff54fb54b_media.json +0 -3
- app/scripts/notion-importer/.notion-to-md/media/2951384e-bcac-809b-9bc4-c0f7647080f3_media.json +0 -3
- app/scripts/notion-importer/static/frontmatter.mdx +31 -4
- app/src/content/article.mdx +43 -6
- app/{scripts/notion-importer/.notion-to-md/media/2421384e-bcac-800c-b22c-df0bb34c69f7_media.json → src/content/assets/image/Capture_decran_2025-10-22_a_09_46_36_2941384e-bcac-80cc-af9d-d2ace91e56e8.png} +2 -2
- app/{scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-80c7-aec6-c6ab90d7912a_media.json → src/content/assets/image/Screenshot_2025-10-24_at_12_26_55_2961384e-bcac-80c9-9266-c37d8d5f4a39.png} +2 -2
.gitignore
CHANGED
@@ -20,6 +20,7 @@ node_modules/
 *.log
 *.env
 *.cache
+.notion-to-md
 
 app/scripts/latex-to-mdx/output/
 app/scripts/notion-importer/output/**/*

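The new `.notion-to-md` entry contains no slash, and gitignore treats such a pattern as matching a file or directory name at any depth. Because everything below an ignored directory is ignored too, this covers the whole `app/scripts/notion-importer/.notion-to-md/` cache. A `.gitignore` entry does not untrack files that are already committed, which is consistent with this commit also deleting the tracked `_media.json` files. The sketch below is a simplified, hypothetical helper (`ignored_by_bare_pattern` is not a Git API) that approximates this one rule with Python's `fnmatch`. It does not implement negation, anchored patterns, or `**`, so for real checks use `git check-ignore`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def ignored_by_bare_pattern(path: str, pattern: str) -> bool:
    """Hypothetical helper: approximate gitignore matching for a slash-free pattern.

    A pattern without '/' may match a file or directory name at any depth, and
    everything below a matched directory is ignored as well, so it is enough to
    test each path component. Negation ('!'), anchored patterns and '**' are
    intentionally out of scope.
    """
    if "/" in pattern:
        raise ValueError("this sketch only handles patterns without '/'")
    return any(fnmatchcase(part, pattern) for part in PurePosixPath(path).parts)


# The first two paths come from this commit; 'app/server.log' is hypothetical.
examples = [
    ("app/scripts/notion-importer/.notion-to-md/media/2421384e-bcac-80fb-aa7c-f939fc39269d_media.json", ".notion-to-md"),
    ("app/src/content/article.mdx", ".notion-to-md"),
    ("app/server.log", "*.log"),
]
for path, pattern in examples:
    print(f"{pattern!r} ignores {path}: {ignored_by_bare_pattern(path, pattern)}")
```

Testing path components rather than the full string is what makes `.notion-to-md` apply under `app/scripts/notion-importer/` without a leading `**/`.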
app/scripts/notion-importer/.notion-to-md/media/2421384e-bcac-80fb-aa7c-f939fc39269d_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:caa89ec1b62b1b78b4592c41d3b9997ed79e276d7169064572f88303a400c4e8
-size 61500

app/scripts/notion-importer/.notion-to-md/media/27877f1c-9c9d-804d-9c82-f7b3905578ff_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:573e517dd54f2ec7a8caf2badeea4b54ca90d3eff45b88a47877f38fc39203f0
-size 39793

app/scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-8079-aebf-cfe3ee40f7c5_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:da3d0e8b86bab2816814fa8d103e5f10310b2f401fc33ab5e3730d231989c5ff
-size 35370

app/scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-80d6-91bc-cec1904f628f_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:2898a98abe6fce5c59caeee0c09b440ec4613bfb5eaa3e527b14056a175e76ae
-size 2428

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8006-9cac-f8b9876a3daa_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e95ed06c4df8d63c7b1aa78d022c10195605c6118ea4b7a035d59fc52f104a81
-size 4891

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8033-84a0-f498edd20d5d_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:2c056bf9e20d2f88d8d96ffecc848f3b8447e036277741dcf49e798d35bc7f08
-size 2436

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-8070-84b7-d2c55eec7b31_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:71db30f156ec18a2bc6b1431d7dee2a2a16951a4a5d86887af3854ab024534b4
-size 2535

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-807b-b544-e308e74095eb_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1acc48f0812f0cef1132328d1890b98b20037f806ea73a4d639381473841024f
-size 64334

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80b9-ac9e-e2ed81f6f335_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b3a20c6fb37ce6e4fed6f3def722ffe7947d65c8390b8244c02d276707a566ce
-size 16424

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80df-81c0-fc7920a269f8_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:f7283e798efc4d1f187fa8629264045d032f7066fd58eb0e87d81048eb3e88d3
-size 21352

app/scripts/notion-importer/.notion-to-md/media/2921384e-bcac-80fc-abcc-ed31b76eb37d_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:51c243c47d5731d2589360c467b3122e366297646be8a618e33fbacf85d79c98
-size 64206

app/scripts/notion-importer/.notion-to-md/media/2951384e-bcac-8087-898b-f7fff54fb54b_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:68288a0aec237b15610681a864f5c802e22ef10322d8251fa4bfa773ec248621
-size 2428

app/scripts/notion-importer/.notion-to-md/media/2951384e-bcac-809b-9bc4-c0f7647080f3_media.json
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:c3842402ba6d83e6ab44fac53432190f95b07ed2ac096031608b1d29a178e858
-size 21458

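Each deleted `_media.json` file is only three lines long because Git LFS stored it as a pointer: a `version` line naming the pointer spec, an `oid` line holding the SHA-256 of the real content, and a `size` line giving that content's length in bytes. The sketch below is a minimal, hypothetical parser (`parse_lfs_pointer` and `LfsPointer` are illustrative names, not part of Git LFS) for those `key value` lines. It assumes the v1 spec and a `sha256` oid, which is what every pointer above uses.

```python
from dataclasses import dataclass


@dataclass
class LfsPointer:
    """Fields of a Git LFS v1 pointer file (hypothetical container type)."""
    version: str
    oid: str   # hex SHA-256 digest of the real file contents
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer (illustrative sketch)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("unexpected oid format")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Pointer content of the first deleted _media.json file in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:caa89ec1b62b1b78b4592c41d3b9997ed79e276d7169064572f88303a400c4e8
size 61500
"""
p = parse_lfs_pointer(pointer)
print(p.size)  # 61500
```

The pointer never contains the media data itself, so deleting it removes only the reference; the object behind the `oid` lives in LFS storage.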
app/scripts/notion-importer/static/frontmatter.mdx
CHANGED
@@ -6,18 +6,39 @@ authors:
   - name: "Loubna Ben Allal"
     url: "https://huggingface.co/loubnabnl"
     affiliations: [1]
-  - name: "Leandro von Werra"
-    url: "https://huggingface.co/lvwerra"
-    affiliations: [1]
   - name: "Lewis Tunstall"
     url: "https://huggingface.co/lewtun"
     affiliations: [1]
+  - name: "Nouamane Tazi"
+    url: "https://huggingface.co/nouamanetazi"
+    affiliations: [1]
+  - name: "Elie Bak"
+    url: "https://huggingface.co/eliebak"
+    affiliations: [1]
+  - name: "Ed Beeching"
+    url: "https://huggingface.co/edbeeching"
+    affiliations: [1]
+  - name: "Carlos Muñoz Ferrandis"
+    url: "https://huggingface.co/CarlosMF"
+    affiliations: [1]
   - name: "Clémentine Fourrier"
     url: "https://huggingface.co/clefourrier"
     affiliations: [1]
   - name: "Thibaud Frere"
     url: "https://huggingface.co/tfrere"
     affiliations: [1]
+  - name: "Anton Lozhkov"
+    url: "https://huggingface.co/anton-l"
+    affiliations: [1]
+  - name: "Colin Raffel"
+    url: "https://huggingface.co/craffel"
+    affiliations: [1]
+  - name: "Leandro von Werra"
+    url: "https://huggingface.co/lvwerra"
+    affiliations: [1]
+  - name: "Thomas Wolf"
+    url: "https://huggingface.co/thomwolf"
+    affiliations: [1]
 affiliations:
   - name: "Hugging Face"
     url: "https://huggingface.co"
@@ -31,4 +52,10 @@ tags:
   - template
 tableOfContentsAutoCollapse: true
 pdfProOnly: true
----
++---
+
+
+
+
+
+

app/src/content/article.mdx
CHANGED
@@ -9,14 +9,26 @@ authors:
     url: 'https://huggingface.co/loubnabnl'
     affiliations:
       - 1
-  - name: Leandro von Werra
-    url: 'https://huggingface.co/lvwerra'
-    affiliations:
-      - 1
   - name: Lewis Tunstall
     url: 'https://huggingface.co/lewtun'
     affiliations:
       - 1
+  - name: Nouamane Tazi
+    url: 'https://huggingface.co/nouamanetazi'
+    affiliations:
+      - 1
+  - name: Elie Bak
+    url: 'https://huggingface.co/eliebak'
+    affiliations:
+      - 1
+  - name: Ed Beeching
+    url: 'https://huggingface.co/edbeeching'
+    affiliations:
+      - 1
+  - name: Carlos Muñoz Ferrandis
+    url: 'https://huggingface.co/CarlosMF'
+    affiliations:
+      - 1
   - name: Clémentine Fourrier
     url: 'https://huggingface.co/clefourrier'
     affiliations:
@@ -25,6 +37,22 @@ authors:
     url: 'https://huggingface.co/tfrere'
     affiliations:
       - 1
+  - name: Anton Lozhkov
+    url: 'https://huggingface.co/anton-l'
+    affiliations:
+      - 1
+  - name: Colin Raffel
+    url: 'https://huggingface.co/craffel'
+    affiliations:
+      - 1
+  - name: Leandro von Werra
+    url: 'https://huggingface.co/lvwerra'
+    affiliations:
+      - 1
+  - name: Thomas Wolf
+    url: 'https://huggingface.co/thomwolf'
+    affiliations:
+      - 1
 affiliations:
   - name: Hugging Face
     url: 'https://huggingface.co'
@@ -79,6 +107,7 @@ import Screenshot_2025_09_26_at_22_36_40_27a1384e_bcac_8063_94e0_f1c689e7d9b9 fr
 import Screenshot_2025_10_16_at_21_20_31_28e1384e_bcac_8059_83b0_c6e19a90f49c from './assets/image/Screenshot_2025-10-16_at_21_20_31_28e1384e-bcac-8059-83b0-c6e19a90f49c.png';
 import Screenshot_2025_10_16_at_21_39_34_28e1384e_bcac_8004_9f13_c53e50cd416e from './assets/image/Screenshot_2025-10-16_at_21_39_34_28e1384e-bcac-8004-9f13-c53e50cd416e.png';
 import Screenshot_2025_10_17_at_15_32_35_28f1384e_bcac_8038_aa05_f429bd1bbf0d from './assets/image/Screenshot_2025-10-17_at_15_32_35_28f1384e-bcac-8038-aa05-f429bd1bbf0d.png';
+import Screenshot_2025_10_24_at_12_26_55_2961384e_bcac_80c9_9266_c37d8d5f4a39 from './assets/image/Screenshot_2025-10-24_at_12_26_55_2961384e-bcac-80c9-9266-c37d8d5f4a39.png';
 import Screenshot_2025_10_17_at_14_07_07_28f1384e_bcac_8000_b77f_d6b0627e5f54 from './assets/image/Screenshot_2025-10-17_at_14_07_07_28f1384e-bcac-8000-b77f-d6b0627e5f54.png';
 import Screenshot_2025_10_17_at_15_15_26_28f1384e_bcac_8006_a7de_ec57a503d412 from './assets/image/Screenshot_2025-10-17_at_15_15_26_28f1384e-bcac-8006-a7de-ec57a503d412.png';
 import Screenshot_2025_10_17_at_16_00_20_28f1384e_bcac_80f7_ab1c_e448a7e126b4 from './assets/image/Screenshot_2025-10-17_at_16_00_20_28f1384e-bcac-80f7-ab1c-e448a7e126b4.png';
@@ -4866,6 +4895,9 @@ Packing solves this by concatenating multiple sequences together until a desired
 
 To get a sense of how efficient packing is for training, below we compare the runtimes between packing and no-packing over one epoch of our baseline dataset:
 
+<Image src={Screenshot_2025_10_24_at_12_26_55_2961384e_bcac_80c9_9266_c37d8d5f4a39} alt="Image" />
+
+
 <Image src={Screenshot_2025_10_17_at_14_07_07_28f1384e_bcac_8000_b77f_d6b0627e5f54} alt="Image" />
 
 
@@ -4949,7 +4981,7 @@ Although you can keep scaling SFT with more data, at some point you'll observe d
 
 This is where preference optimisation comes in. Instead of just copying demonstrations, we give the model comparative feedback like "response A is better than response B". These preferences provide a more direct training signal for quality and enable to model performance to scale beyond the limits of SFT alone.
 
-Another benefit of preference optimisation is that you typically need far less data than SFT, since the starting point is already a pretty good model that can follow instructions and has knowledge
+Another benefit of preference optimisation is that you typically need far less data than SFT, since the starting point is already a pretty good model that can follow instructions and has knowledge from previous training stages.<Sidenote>As we'll see below, there are some algorithms like [ORPO](https://arxiv.org/abs/2403.07691) which can be applied directly to base models.</Sidenote> Let's take a look at how these datasets are created.
 
 #### Creating preference datasets
 
@@ -5017,7 +5049,7 @@ We also ran experiments to determine how dataset size influences results, testin
 <Image src={image_2941384e_bcac_807e_8b6e_fc9020752eb0} alt="Image" />
 
 
-The experiments we ran for the ß parameter ranged from 0.01 to 0.99 to explore values that encourage different degrees of alignment to the reference model. As a reminder, lower values of beta encourage staying close to the reference model while higher values allow the PO model to match the preference data more closely. The model performance for ß=0.1 is the highest for both reasoning modes and improves compared to the metrics from the SFT checkpoint. Using a low beta value hurts model performance and results in a worse model than the SFT checkpoint, while performance remains stable
+The experiments we ran for the ß parameter ranged from 0.01 to 0.99 to explore values that encourage different degrees of alignment to the reference model. As a reminder, lower values of beta encourage staying close to the reference model while higher values allow the PO model to match the preference data more closely. The model performance for ß=0.1 is the highest for both reasoning modes and improves compared to the metrics from the SFT checkpoint. Using a low beta value hurts model performance and results in a worse model than the SFT checkpoint, while performance remains stable across multiple ß values without extended thinking.
 
 These results suggest that values greater than 0.1 are preferable for PO, and that aligning the model with the preference data is more beneficial than staying close to the reference model. However, we suggest exploring ß values in the range 0.01 and 0.5. Higher values may erase capabilities from the SFT checkpoint that we might not be capturing in the evals shown on the plot.
 
@@ -5267,6 +5299,11 @@ We hope this blog helps you approach your next training project with clarity and
 
 Now go train something. And when your loss spikes mysteriously at 2am, remember: every great model has debugging stories behind it. May the force of open source and open science always be with you!
 
+#### **Acknowledgments**
+
+
+We thank [Guilherme](https://huggingface.co/guipenedo) and [Hugo](https://huggingface.co/hlarcher) for their valuable feedback, and [Abubakar](https://huggingface.co/abidlabs) for his help with Trackio features.
+
 ## References
 
 

app/{scripts/notion-importer/.notion-to-md/media/2421384e-bcac-800c-b22c-df0bb34c69f7_media.json → src/content/assets/image/Capture_decran_2025-10-22_a_09_46_36_2941384e-bcac-80cc-af9d-d2ace91e56e8.png}
RENAMED
File without changes
app/{scripts/notion-importer/.notion-to-md/media/29177f1c-9c9d-80c7-aec6-c6ab90d7912a_media.json → src/content/assets/image/Screenshot_2025-10-24_at_12_26_55_2961384e-bcac-80c9-9266-c37d8d5f4a39.png}
RENAMED
File without changes