Rio de Janeiro's city government model Rio3.5 beats Qwen3.7 in recent benchmarks

VoidWhisperer · 2026-06-14T16:37:03 1781455023

https://github.com/nex-agi/Nex-N2/issues/4

It seems they didn't train a new model; they merged two existing models and then gave it an instruction to say it was 'Rio, trained by Rio AI Labs'.

w4yai · 2026-06-14T16:53:38 1781456018

> The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, followed by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We are sorry for the confusion and apologize profusely.
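For readers wondering what "a merge" means here: the note doesn't say which merge method was used, so the following is only a minimal sketch of the simplest variant, linear interpolation of two checkpoints' weights. It assumes both checkpoints share the same architecture and parameter names; plain Python lists stand in for tensors, and `linear_merge` and `alpha` are hypothetical names, not anything from the Rio or Nex repos.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter.

    Assumes both checkpoints have identical parameter names and shapes,
    e.g. a fine-tune and the base model it was derived from.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged: Weights = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted average of the two parameter tensors.
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Toy usage: two tiny "checkpoints" with one parameter each.
model_a = {"layer0.weight": [1.0, 2.0]}
model_b = {"layer0.weight": [3.0, 4.0]}
print(linear_merge(model_a, model_b, alpha=0.5))  # {'layer0.weight': [2.0, 3.0]}
```

Per the quoted note, the merged weights were then further trained with on-policy distillation, which this sketch does not cover.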

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

daquisu · 2026-06-14T17:53:12 1781459592

It was a recent edit, though. Snapshot from yesterday: https://web.archive.org/web/20260613072958/https://huggingfa...

danieldrehmer · 2026-06-15T03:09:41 1781492981

Can you offer a 4-bit quantized version and name it Zé Pequeno, pretty please?

scotty79 · 2026-06-15T12:35:48 1781526948

I'd love to see people figuring out how to build models from several smaller ones. We could then train small specialized models and deploy setups more optimized for any given task. Modular LLMs should be a thing.

urbnspacecowboy · 2026-06-14T20:30:02 1781469002

See discussion: https://news.ycombinator.com/item?id=48528371

mettamage · 2026-06-14T14:55:42 1781448942

https://xcancel.com/ZenMagnets/status/2065796012820848699

Correct me if I'm wrong, but from reading the comments in the thread, this seems to be post-training/fine-tuning.

oceansky · 2026-06-14T15:13:11 1781449991

Yes. It's post-training on Qwen using the novel SwiReasoning framework.

hedgehog · 2026-06-14T15:49:50 1781452190

I hadn't seen SwiReasoning (https://swireasoning.github.io, paper and code). It looks like it works at generation time without any requirements on the model. It increases token efficiency and accuracy, but at first skim it seems incompatible with multi-token prediction. For large reductions in token budget, it could be worth it.

rafaquintanilha · 2026-06-14T16:31:06 1781454666

Doesn't look like it's incompatible. Someone already released a quantization using MTP: https://huggingface.co/foxipanda/Rio-3.5-Open-397B-GGUF

hedgehog · 2026-06-14T17:16:52 1781457412

As I understand it, the basic premise of all the speculative decoding schemes is that the draft's logits don't need to be exact as long as you mostly sample the same tokens, and because each position is fed the embedding of the previous position's token, you sort of "round away" the error. With SwiReasoning, I think you skip the sampling/rounding step and do something continuous using the whole distribution, so it would seem to rely on the accuracy of those values. MTP still makes sense outside the latent reasoning chunks, though.
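To make the distinction in this comment concrete, here is a minimal sketch (not SwiReasoning's or any inference engine's actual code) of two pieces. The first is the standard speculative-sampling rule: a draft token x ~ q is kept with probability min(1, p(x)/q(x)), and on rejection a replacement is drawn from the normalized residual max(0, p - q), which makes the output follow the target distribution p exactly. The second is a "soft" input that mixes all token embeddings by their probabilities instead of committing to one token, which is roughly what the commenter means by doing "something continuous". The function names and toy numbers are illustrative assumptions.

```python
import random
from typing import List

# All names below are hypothetical, for illustration only.


def accept_draft(p: List[float], q: List[float], token: int, u: float) -> bool:
    """Speculative sampling: keep the draft token with probability min(1, p/q).

    q[token] > 0 is guaranteed because the draft model actually sampled `token`.
    """
    return u < min(1.0, p[token] / q[token])


def residual(p: List[float], q: List[float]) -> List[float]:
    """Distribution to resample from after a rejection: normalize(max(0, p - q))."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)  # > 0 whenever a rejection is possible (i.e. p != q)
    return [d / total for d in diff]


def verify_step(p: List[float], q: List[float], token: int, rng: random.Random) -> int:
    """Verify one draft token against the target distribution p."""
    if accept_draft(p, q, token, rng.random()):
        return token
    r = residual(p, q)
    return rng.choices(range(len(r)), weights=r)[0]


def soft_embedding(probs: List[float], table: List[List[float]]) -> List[float]:
    """Probability-weighted mix of embedding rows, instead of one sampled row."""
    dim = len(table[0])
    return [sum(pr * row[d] for pr, row in zip(probs, table)) for d in range(dim)]


# Toy usage: target p, draft q over a 3-token vocabulary.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.4, 0.1]
print(residual(p, q))  # [1.0, 0.0, 0.0]
print(soft_embedding([0.5, 0.5, 0.0], [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]))  # [0.5, 0.5]
```

The acceptance rule only ever compares probabilities of a discrete token, so small errors in the draft's logits mostly cost speed, not correctness. A soft embedding, by contrast, feeds the whole distribution forward, so any error in it propagates directly, which is the concern raised above.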

Kelteseth · 2026-06-14T14:57:35 1781449055

Thanks. Firefox and uBlock do not let me view any X content (I guess this is a good thing).

drnick1 · 2026-06-14T15:41:09 1781451669

Same thing here, X content and trackers are blocked by my Firefox settings. The occasional inconvenience is a small price to pay not to be profiled by X, Google, FB, Amazon, and countless other Internet parasites.

adrian_b · 2026-06-14T15:00:00 1781449200

> Post-trained from Qwen 3.5 397B

Model Card:

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

arjie · 2026-06-14T15:53:45 1781452425

Benchmaxxing is the new “have a crypto trading strategy”. No one is impressed by it except non practitioners.

Aurornis · 2026-06-14T15:48:12 1781452092

A city government funding a fine-tune of a model is interesting.

As for the benchmarks: if you spend any time playing with fine-tunes of published models, you know that benchmarks are gamed so much that they're a useless indicator of performance for models from small teams. It's too easy to fine-tune a model to perform well on the benchmarks, release it, put a line on your resume saying you released a model that beat the major labs on benchmarks, and then use that to jump into a new job. The temptation is high.

There are a lot of fringe models and fine-tunes that claim better performance on some benchmark. Then you try to use them and find they're often worse at general tasks than the base model.

I would wait and see if these results hold across other benchmarks. It's cool that the city is doing something with AI, but this is a case where extraordinary claims require extraordinary evidence. I doubt a small, previously unknown team has unlocked some secret that the team who made Qwen couldn't figure out. It's more likely the model was fine-tuned for a specific outcome (possibly these benchmarks) and performance in other areas was reduced as a consequence.

marcosdumay · 2026-06-14T16:30:02 1781454602

> A city government funding a fine-tune of a model is interesting.

It looks like it's a government-owned IT services company.

Most likely, they saw a business opportunity in selling it to other cities.

embedding-shape · 2026-06-14T16:27:17 1781454437

Indeed, this is all very true, and I'd say it's true for the larger teams too. The entire ecosystem is so gamed by now that unless you have your own private benchmarks with test cases you haven't shared publicly, it's almost impossible to get a fair picture of how well a model works without actually sitting down and using it.

HeliumHydride · 2026-06-14T14:58:06 1781449086

https://www.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_mod... https://x.com/SemiAnalysis_/status/2065894494935933191

betimsl · 2026-06-14T21:39:43 1781473183

The problem with these is the tool calling. From my experiments qwen agent almost always fails with tool calling and porting the correct config is quite tedious.

Rio3.5 with Qwen compatible tool calling, we need that :)
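On porting tool-calling configs: Qwen2.5 and Qwen3 chat templates have the model emit each call as a JSON object with `name` and `arguments` inside `<tool_call>...</tool_call>` tags (Hermes-style). Whether Rio 3.5, or Qwen3.5 itself, keeps that exact format is an assumption to check against the model's own chat template; some Qwen variants use a different XML-style format. A minimal parser sketch under that assumption, with the hypothetical helper name `parse_tool_calls`:

```python
import json
import re
from typing import Any, Dict, List

# Assumed Hermes-style format; verify against the model's chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract {"name": ..., "arguments": {...}} objects from raw model output.

    Blocks that are not valid JSON or lack a "name" key are skipped rather
    than raising, since models sometimes emit malformed calls.
    """
    calls: List[Dict[str, Any]] = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict) or "name" not in obj:
            continue
        args = obj.get("arguments", {})
        # Some outputs encode arguments as a JSON string; normalize to a dict.
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                continue
        calls.append({"name": obj["name"], "arguments": args})
    return calls


# Toy usage with a hypothetical "get_weather" tool.
sample = (
    "Let me check.\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Rio de Janeiro"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample))
```

Skipping malformed blocks instead of failing keeps an agent loop alive, but in practice you may prefer to feed the parse error back to the model so it can retry.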

mrandish · 2026-06-14T19:03:04 1781463784

> Rio de Janeiro's city government model...

Because... lack of a good open weight LLM is a pressing need high on the municipal priorities list for Rio de Janeiro citizens?

true_religion · 2026-06-14T19:17:11 1781464631

Should governments not take actions that later benefit the academic, scientific, and economic welfare of their constituents?

Or is it that it’s a city doing this?

Now, Brazil does know how to boondoggle its finances on a prestigious cause with little return (e.g. the Olympic Games), but this costs far less, more akin to a city setting up a tech accelerator or running a media campaign about how important STEM is.

senorrib · 2026-06-14T20:36:33 1781469393

It's the municipal IT company, and the dude that did this is a volunteer.

xbar · 2026-06-14T19:02:35 1781463755

Sexy.

pelasaco · 2026-06-14T19:54:50 1781466890

The Taubaté LLM Hoax https://en.wikipedia.org/wiki/Taubat%C3%A9_pregnancy_hoax

cuzezzzbbfofai · 2026-06-14T14:57:55 1781449075

[flagged]

atoav · 2026-06-14T15:15:39 1781450139

A government ideally is a representation of the democratically chosen will of the people. If it is not, work towards making it so. IMO wherever someone says "the government" we should mentally substitute "we all, collectively".

But a specific type of person appears to labour under the illusion that somehow we can get by without we all collectively steering our direction and choosing people who do what needs to be done without commercial interest. Their idea is that instead of choosing people who do it, we just make them compete for who can squeeze the most profit out of dealing with a problem and "somehow" that leads to a better result. When you press them for the details on that part of the mechanism, you will usually get crickets.

cassianoleal · 2026-06-14T15:32:04 1781451124

Thank you, that's also one of my peeves.

Interestingly, the people who try to separate themselves from "the government" also seem to be the kind of people who want to "spread our model of democracy to the rest of the world".

How they can even reconcile being such a great democracy that the world needs to ~copy~ be force-fed with having an adversary government I don't know. The cognitive dissonance is so great that it's hard to fathom.

hgoel · 2026-06-14T15:38:14 1781451494

It's all such a self-defeating ideology, they think the government isn't doing a good enough job, so they lobby to make it impossible for them to do a good job and then pretend that it proves their point.

naasking · 2026-06-14T15:54:44 1781452484

> IMO wherever someone says "the government" we should mentally substitute "we all, collectively".

No, we should substitute "unaccountable bureaucrats". The people who enter and leave power from elections are not the source of the daily frustrations people have with government, it's the rest.

airstrike · 2026-06-14T16:08:28 1781453308

how do you think that alleged amorphous mass of unaccountable bureaucrats got their jobs?

atoav · 2026-06-14T16:05:26 1781453126

If this is in fact an issue where you live, then you should consider no longer electing politicians who allow bureaucrats to be unaccountable. Or stop believing politicians who rave about how bureaucrats are unaccountable while they themselves have the power to shape systems where that would not be the case.

latency-guy2 · 2026-06-14T16:59:47 1781456387

"we all" is wrong, always.

You do not agree with me. You can't claim to have my interests or my will if you are against it.

atoav · 2026-06-14T19:59:50 1781467190

Yes? With a sufficiently pedantic spirit, anything can be argued against, and that is what you're doing. So here is a counter-example: you drive with three friends in a car. You ask them: "Do we all want to go to McDonald's?"

Explain how that is wrong, and why. If it is always wrong, it has to be wrong here too. The answer is that the meaning of "we all" is context-dependent, and a friend who argues that "we all" somehow includes everyone in the whole city is an oddball who doesn't pick up the context in which the words were said.

We can all go around and make each others day worse with deliberate pedantry by ignoring the context of words, but that is basically just a waste of human energy. If you disagree with the fundamental point I made, argue against it based on the merits of the idea instead of arguing semantics.

blahblaher · 2026-06-14T16:10:09 1781453409

yes, let's instead trust a bunch of billionaires, that "for sure" have your and all of our interests at heart. And no, the "invisible hand" does not exist, it's the Epstein class hand, you just don't see it

hmokiguess · 2026-06-14T15:18:46 1781450326

Never let them know your next move

ramon156 · 2026-06-14T14:59:03 1781449143

Every day I'm reminded why I don't spend time on Twitter. What is the point of claiming "X is better than Y in benchmark Z; disagreeing with that means disagreeing with me"?

Information is power, dick measurements are not.

itsthecourier · 2026-06-14T15:40:13 1781451613

my length is a valid data point for the sake of science

reed1234 · 2026-06-14T15:22:45 1781450565

No, I love twitter— and you are wrong.