tfrere HF Staff committed on
Commit 5a8da9a · 1 Parent(s): 274a141

add content changes in git repository

Files changed (15)
  1. app/src/content/article.mdx +0 -0
  2. app/src/content/assets/image/{image_2941384e-bcac-80d2-b3ea-ff509ccf857d.png → Capture_decran_2025-10-29_a_10_45_33_29b1384e-bcac-803d-8e1b-e95ec0eb0be8.png} +2 -2
  3. app/src/content/assets/image/{image_2941384e-bcac-800c-88e8-c294c0484b38.png → Capture_decran_2025-10-29_a_14_47_17_2941384e-bcac-803c-8ba2-dbae2c39e8b5.png} +2 -2
  4. app/src/content/assets/image/{image_2941384e-bcac-801f-8178-c6a934bc1509.png → Capture_decran_2025-10-30_a_10_22_32_29c1384e-bcac-8069-9c41-de3cd522de13.png} +2 -2
  5. app/src/content/assets/image/{image_2941384e-bcac-803c-8ba2-dbae2c39e8b5.png → Capture_decran_2025-10-30_a_11_07_49_29c1384e-bcac-80ef-974e-fd08e851ea94.png} +2 -2
  6. app/src/content/assets/image/Screenshot_2025-10-30_at_11_58_25_29c1384e-bcac-804a-b080-d36d452fd1ef.png +3 -0
  7. app/src/content/assets/image/{image_28d1384e-bcac-8095-a79d-d1e3840c2716.png → Screenshot_2025-10-30_at_13_02_36_29c1384e-bcac-80d6-a72d-ff34bc221b60.png} +2 -2
  8. app/src/content/assets/image/Screenshot_2025-10-30_at_15_23_25_2941384e-bcac-80d2-b3ea-ff509ccf857d.png +3 -0
  9. app/src/content/assets/image/Screenshot_2025-10-30_at_15_23_52_2941384e-bcac-801f-8178-c6a934bc1509.png +3 -0
  10. app/src/content/assets/image/Screenshot_2025-10-30_at_15_24_02_2941384e-bcac-800c-88e8-c294c0484b38.png +3 -0
  11. app/src/content/assets/image/image_29c1384e-bcac-805f-8a1c-e7699d1e5b3b.png +3 -0
  12. app/src/content/assets/image/lstopo_2951384e-bcac-808f-a7c5-c244e7ac69db.jpg +0 -3
  13. app/src/content/assets/image/lstopo_29c1384e-bcac-80c9-9715-cbfe9e73d86b.jpg +3 -0
  14. app/src/content/assets/image/thumb.png +0 -3
  15. app/src/content/bibliography.bib +100 -2
app/src/content/article.mdx CHANGED
The diff for this file is too large to render. See raw diff
 
app/src/content/assets/image/{image_2941384e-bcac-80d2-b3ea-ff509ccf857d.png → Capture_decran_2025-10-29_a_10_45_33_29b1384e-bcac-803d-8e1b-e95ec0eb0be8.png} RENAMED
File without changes
app/src/content/assets/image/{image_2941384e-bcac-800c-88e8-c294c0484b38.png → Capture_decran_2025-10-29_a_14_47_17_2941384e-bcac-803c-8ba2-dbae2c39e8b5.png} RENAMED
File without changes
app/src/content/assets/image/{image_2941384e-bcac-801f-8178-c6a934bc1509.png → Capture_decran_2025-10-30_a_10_22_32_29c1384e-bcac-8069-9c41-de3cd522de13.png} RENAMED
File without changes
app/src/content/assets/image/{image_2941384e-bcac-803c-8ba2-dbae2c39e8b5.png → Capture_decran_2025-10-30_a_11_07_49_29c1384e-bcac-80ef-974e-fd08e851ea94.png} RENAMED
File without changes
app/src/content/assets/image/Screenshot_2025-10-30_at_11_58_25_29c1384e-bcac-804a-b080-d36d452fd1ef.png ADDED

Git LFS Details

  • SHA256: 261eab143f90dafac963e8f45910a563f9786c35157ac5a6df1cd9578a848ff8
  • Pointer size: 130 Bytes
  • Size of remote file: 27.6 kB
app/src/content/assets/image/{image_28d1384e-bcac-8095-a79d-d1e3840c2716.png → Screenshot_2025-10-30_at_13_02_36_29c1384e-bcac-80d6-a72d-ff34bc221b60.png} RENAMED
File without changes
app/src/content/assets/image/Screenshot_2025-10-30_at_15_23_25_2941384e-bcac-80d2-b3ea-ff509ccf857d.png ADDED

Git LFS Details

  • SHA256: 05268dda2b4a492855a9aec4b408982da0420373607ab446aae4af4d02352468
  • Pointer size: 131 Bytes
  • Size of remote file: 158 kB
app/src/content/assets/image/Screenshot_2025-10-30_at_15_23_52_2941384e-bcac-801f-8178-c6a934bc1509.png ADDED

Git LFS Details

  • SHA256: be0fe3458adf7f33b307a7858a190f3caff13672d4080d405922a649e7de7809
  • Pointer size: 131 Bytes
  • Size of remote file: 208 kB
app/src/content/assets/image/Screenshot_2025-10-30_at_15_24_02_2941384e-bcac-800c-88e8-c294c0484b38.png ADDED

Git LFS Details

  • SHA256: cccfe5481d5aa75585977f5ce0cf82055b36ba507f0ffbf6ee581cdb70cb02ca
  • Pointer size: 131 Bytes
  • Size of remote file: 151 kB
app/src/content/assets/image/image_29c1384e-bcac-805f-8a1c-e7699d1e5b3b.png ADDED

Git LFS Details

  • SHA256: 68bd4536bed55042e537abec0984eda486d726e6ba1cefd597034496bb164534
  • Pointer size: 130 Bytes
  • Size of remote file: 56.8 kB
app/src/content/assets/image/lstopo_2951384e-bcac-808f-a7c5-c244e7ac69db.jpg DELETED

Git LFS Details

  • SHA256: 6f7cfbba35513295bccd627a556374b2ac051f395411a472cfad30bb8d31d761
  • Pointer size: 132 Bytes
  • Size of remote file: 1.29 MB
app/src/content/assets/image/lstopo_29c1384e-bcac-80c9-9715-cbfe9e73d86b.jpg ADDED

Git LFS Details

  • SHA256: d380655f06e6c821c84ca60b6e03bf9b87c2a727b6e979f9a2bab605c1152f15
  • Pointer size: 132 Bytes
  • Size of remote file: 1.28 MB
app/src/content/assets/image/thumb.png DELETED

Git LFS Details

  • SHA256: ae7bfe85551fa5f70df5341e6c3a5d5d5f0d68553d9a137725fda61d55627ded
  • Pointer size: 131 Bytes
  • Size of remote file: 279 kB
app/src/content/bibliography.bib CHANGED
@@ -99,7 +99,7 @@
  url = {https://arxiv.org/abs/2401.02954}
  }
 
- @misc{hägele2024scalinglawscomputeoptimaltraining,
+ @misc{wsdhagele,
  title = {Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations},
  author = {Alexander Hägele and Elie Bakouch and Atli Kosson and Loubna Ben Allal and Leandro Von Werra and Martin Jaggi},
  year = {2024},
@@ -996,7 +996,15 @@
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2406.08446}
  }
-
+ @misc{olmo2,
+ title={2 OLMo 2 Furious},
+ author={Team OLMo and Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Allyson Ettinger and Michal Guerquin and David Heineman and Hamish Ivison and Pang Wei Koh and Jiacheng Liu and Saumya Malik and William Merrill and Lester James V. Miranda and Jacob Morrison and Tyler Murray and Crystal Nam and Jake Poznanski and Valentina Pyatkin and Aman Rangapur and Michael Schmitz and Sam Skjonsberg and David Wadden and Christopher Wilhelm and Michael Wilson and Luke Zettlemoyer and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
+ year={2025},
+ eprint={2501.00656},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2501.00656},
+ }
  @misc{du2025,
  title = {Understanding Emergent Abilities of Language Models from the Loss Perspective},
  author = {Zhengxiao Du and Aohan Zeng and Yuxiao Dong and Jie Tang},
@@ -1610,4 +1618,94 @@
  author={Child, Rewon and Gray, Scott and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:1904.10509},
  year={2019}
+ }
+
+
+ @misc{dsa,
+ title={{DeepSeek-V3.2-Exp}: Boosting Long-Context Efficiency with {DeepSeek} Sparse Attention},
+ author={{DeepSeek-AI}},
+ year={2025},
+ institution={DeepSeek},
+ url={https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf},
+ note={Technical Report}
+ }
+
+ @misc{nsa,
+ title={Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention},
+ author={Jingyang Yuan and Huazuo Gao and Damai Dai and Junyu Luo and Liang Zhao and Zhengyan Zhang and Zhenda Xie and Y. X. Wei and Lean Wang and Zhiping Xiao and Yuqing Wang and Chong Ruan and Ming Zhang and Wenfeng Liang and Wangding Zeng},
+ year={2025},
+ eprint={2502.11089},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2502.11089},
+ }
+
+ @misc{minicpm4,
+ title={MiniCPM4: Ultra-Efficient LLMs on End Devices},
+ author={MiniCPM Team and Chaojun Xiao and Yuxuan Li and Xu Han and Yuzhuo Bai and Jie Cai and Haotian Chen and Wentong Chen and Xin Cong and Ganqu Cui and Ning Ding and Shengda Fan and Yewei Fang and Zixuan Fu and Wenyu Guan and Yitong Guan and Junshao Guo and Yufeng Han and Bingxiang He and Yuxiang Huang and Baoxi Ji and Cunliang Kong and Qiuzuo Li and Siyuan Li and Wenhao Li and Xin Li and Yanghao Li and Yishan Li and Zhen Li and Dan Liu and Biyuan Lin and Yankai Lin and Xiang Long and Quanyu Lu and Yaxi Lu and Peiyan Luo and Hongya Lyu and Litu Ou and Yinxu Pan and Lushi Pu and Zekai Qu and Qundong Shi and Zijun Song and Jiayuan Su and Zhou Su and Ao Sun and Xianghui Sun and Peijun Tang and Fangzheng Wang and Feng Wang and Shuo Wang and Yudong Wang and Zheng Wang and Yesai Wu and Zhenyu Xiao and Jie Xie and Zihao Xie and Xiaoyue Xu and Yukun Yan and Jiarui Yuan and Jinqian Zhang and Kaihuo Zhang and Lei Zhang and Linyue Zhang and Xueren Zhang and Yudi Zhang and Hengyu Zhao and Weilin Zhao and Weilun Zhao and Yuanqian Zhao and Zhi Zheng and Chuyue Zhou and Ge Zhou and Jie Zhou and Wei Zhou and Yanghao Zhou and Zihan Zhou and Zixuan Zhou and Zhiyuan Liu and Guoyang Zeng and Chao Jia and Dahai Li and Maosong Sun},
+ year={2025},
+ eprint={2506.07900},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2506.07900},
+ }
+
+ @misc{cognitivebehaviours,
+ title={Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs},
+ author={Kanishk Gandhi and Ayush Chakravarthy and Anikait Singh and Nathan Lile and Noah D. Goodman},
+ year={2025},
+ eprint={2503.01307},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2503.01307},
+ }
+ @misc{nrusimha2025flashformerwholemodelkernelsefficient,
+ title={FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference},
+ author={Aniruddha Nrusimha and William Brandon and Mayank Mishra and Yikang Shen and Rameswar Panda and Jonathan Ragan-Kelley and Yoon Kim},
+ year={2025},
+ eprint={2505.22758},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2505.22758},
+ }
+
+ @misc{gkd,
+ title={On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes},
+ author={Rishabh Agarwal and Nino Vieillard and Yongchao Zhou and Piotr Stanczyk and Sabela Ramos and Matthieu Geist and Olivier Bachem},
+ year={2024},
+ eprint={2306.13649},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2306.13649},
+ }
+
+ @misc{onlinedpo,
+ title={Direct Language Model Alignment from Online AI Feedback},
+ author={Shangmin Guo and Biao Zhang and Tianlin Liu and Tianqi Liu and Misha Khalman and Felipe Llinares and Alexandre Rame and Thomas Mesnard and Yao Zhao and Bilal Piot and Johan Ferret and Mathieu Blondel},
+ year={2024},
+ eprint={2402.04792},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2402.04792},
+ }
+
+
+ @misc{mup,
+ title={Feature Learning in Infinite-Width Neural Networks},
+ author={Greg Yang and Edward J. Hu},
+ year={2022},
+ eprint={2011.14522},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2011.14522},
+ }
+
+ @misc{commandacohere,
+ title={Command A: An Enterprise-Ready Large Language Model},
+ author={Team Cohere and : and Aakanksha and Arash Ahmadian and Marwan Ahmed and Jay Alammar and Milad Alizadeh and Yazeed Alnumay and Sophia Althammer and Arkady Arkhangorodsky and Viraat Aryabumi and Dennis Aumiller and Raphaël Avalos and Zahara Aviv and Sammie Bae and Saurabh Baji and Alexandre Barbet and Max Bartolo and Björn Bebensee and Neeral Beladia and Walter Beller-Morales and Alexandre Bérard and Andrew Berneshawi and Anna Bialas and Phil Blunsom and Matt Bobkin and Adi Bongale and Sam Braun and Maxime Brunet and Samuel Cahyawijaya and David Cairuz and Jon Ander Campos and Cassie Cao and Kris Cao and Roman Castagné and Julián Cendrero and Leila Chan Currie and Yash Chandak and Diane Chang and Giannis Chatziveroglou and Hongyu Chen and Claire Cheng and Alexis Chevalier and Justin T. Chiu and Eugene Cho and Eugene Choi and Eujeong Choi and Tim Chung and Volkan Cirik and Ana Cismaru and Pierre Clavier and Henry Conklin and Lucas Crawhall-Stein and Devon Crouse and Andres Felipe Cruz-Salinas and Ben Cyrus and Daniel D'souza and Hugo Dalla-Torre and John Dang and William Darling and Omar Darwiche Domingues and Saurabh Dash and Antoine Debugne and Théo Dehaze and Shaan Desai and Joan Devassy and Rishit Dholakia and Kyle Duffy and Ali Edalati and Ace Eldeib and Abdullah Elkady and Sarah Elsharkawy and Irem Ergün and Beyza Ermis and Marzieh Fadaee and Boyu Fan and Lucas Fayoux and Yannis Flet-Berliac and Nick Frosst and Matthias Gallé and Wojciech Galuba and Utsav Garg and Matthieu Geist and Mohammad Gheshlaghi Azar and Ellen Gilsenan-McMahon and Seraphina Goldfarb-Tarrant and Tomas Goldsack and Aidan Gomez and Victor Machado Gonzaga and Nithya Govindarajan and Manoj Govindassamy and Nathan Grinsztajn and Nikolas Gritsch and Patrick Gu and Shangmin Guo and Kilian Haefeli and Rod Hajjar and Tim Hawes and Jingyi He and Sebastian Hofstätter and Sungjin Hong and Sara Hooker and Tom Hosking and Stephanie Howe and Eric Hu and Renjie Huang and Hemant Jain and Ritika 
Jain and Nick Jakobi and Madeline Jenkins and JJ Jordan and Dhruti Joshi and Jason Jung and Trushant Kalyanpur and Siddhartha Rao Kamalakara and Julia Kedrzycki and Gokce Keskin and Edward Kim and Joon Kim and Wei-Yin Ko and Tom Kocmi and Michael Kozakov and Wojciech Kryściński and Arnav Kumar Jain and Komal Kumar Teru and Sander Land and Michael Lasby and Olivia Lasche and Justin Lee and Patrick Lewis and Jeffrey Li and Jonathan Li and Hangyu Lin and Acyr Locatelli and Kevin Luong and Raymond Ma and Lukáš Mach and Marina Machado and Joanne Magbitang and Brenda Malacara Lopez and Aryan Mann and Kelly Marchisio and Olivia Markham and Alexandre Matton and Alex McKinney and Dominic McLoughlin and Jozef Mokry and Adrien Morisot and Autumn Moulder and Harry Moynehan and Maximilian Mozes and Vivek Muppalla and Lidiya Murakhovska and Hemangani Nagarajan and Alekhya Nandula and Hisham Nasir and Shauna Nehra and Josh Netto-Rosen and Daniel Ohashi and James Owers-Bardsley and Jason Ozuzu and Dennis Padilla and Gloria Park and Sam Passaglia and Jeremy Pekmez and Laura Penstone and Aleksandra Piktus and Case Ploeg and Andrew Poulton and Youran Qi and Shubha Raghvendra and Miguel Ramos and Ekagra Ranjan and Pierre Richemond and Cécile Robert-Michon and Aurélien Rodriguez and Sudip Roy and Sebastian Ruder and Laura Ruis and Louise Rust and Anubhav Sachan and Alejandro Salamanca and Kailash Karthik Saravanakumar and Isha Satyakam and Alice Schoenauer Sebag and Priyanka Sen and Sholeh Sepehri and Preethi Seshadri and Ye Shen and Tom Sherborne and Sylvie Shang Shi and Sanal Shivaprasad and Vladyslav Shmyhlo and Anirudh Shrinivason and Inna Shteinbuk and Amir Shukayev and Mathieu Simard and Ella Snyder and Ava Spataru and Victoria Spooner and Trisha Starostina and Florian Strub and Yixuan Su and Jimin Sun and Dwarak Talupuru and Eugene Tarassov and Elena Tommasone and Jennifer Tracey and Billy Trend and Evren Tumer and Ahmet Üstün and Bharat Venkitesh and David Venuto and Pat Verga 
and Maxime Voisin and Alex Wang and Donglu Wang and Shijian Wang and Edmond Wen and Naomi White and Jesse Willman and Marysia Winkels and Chen Xia and Jessica Xie and Minjie Xu and Bowen Yang and Tan Yi-Chern and Ivan Zhang and Zhenyu Zhao and Zhoujie Zhao},
+ year={2025},
+ eprint={2504.00698},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2504.00698},
  }