Kishan,
Just saw your message re: uploading by tomorrow:
- Counters (Visitors so far ……… / Questions so far ………)
- Buttons (Share / Download / Subscribe)
Just to refresh your memory:
- The “Share” and “Download” buttons are meant ONLY for the CONSENSUS answer; hence these buttons will be positioned at the bottom of that answer.
- As far as the SUBSCRIBE button is concerned, it is meant to deliver (into the subscriber’s mailbox) a TABULATION of the “Benchmark Scores” of all 4 LLMs (scores for Completeness, Practicality, Truthfulness and Ethicality, as described in the mail below).
But that would require that any visitor can also view these scores ONLINE (for any given LLM and for any given month/year), which would motivate him to say, “Hey, why do I have to keep coming back to this site to know whether these LLMs are actually IMPROVING their SCORES, as claimed by this site? All I have to do is submit my email ID and click SUBSCRIBE. Every day, I will get to know the improvements in scores.”
- As far as that ONLINE DISPLAY of BENCHMARK SCORES is concerned, you may adopt a system similar to our DASHBOARD (droplists for “LLMs” and “Year / Month”).
hcp
Kishan,
In your WhatsApp message, you talked about your apprehensions with respect to Benchmarking.
I asked GROK about this. I reproduce its reply below. Does it make sense?
Hcp
Guidance for Kishan
WhatsApp Draft:
Hey Kishan,
loving your enthusiasm—2 weeks to a prototype demo at my place on April 13th is spot-on!
You’re right that benchmarks are tricky without predefined standards, but let’s not let that stall us.
For the prototype, we don’t need perfection—just a proof of concept showing LLMs collaborating, splitting tasks, and delivering fast, practical answers.
Here’s how we can tackle your “Measurable Achievement Against Benchmarks” concern:
1. Start Simple: For now, define basic metrics like:
o Completeness: % of the question answered (e.g., 80% if it covers most angles).
o Practicality: % of actionable insights (e.g., 70% if it’s usable advice).
o Truthfulness: % alignment with verifiable info (e.g., 90% if web-backed).
o Ethicality: % free of bias/harm (e.g., 95% if it’s fair).
Guess these for the first 10-20 questions we test—eyeballing is fine for a prototype.
2. Benchmark Lite:
Set a loose target (e.g., >70% across all metrics) based on what feels “good enough” for a human-like answer. We’ll refine this post-launch with real data (a rough scoring sketch follows after this list).
3. Tool Integration:
Awesome that you’re adding internet access—let the AI pull articles and self-score truthfulness (e.g., “90%—backed by 3 sources”). That’s a killer feature already!
4. Focus on Collaboration:
Show 2-3 LLMs dividing a question (e.g., one researches, one drafts, one checks; see the hand-off sketch at the end of this draft). That’s the “lean, mean, FAST” AGI roadmap I’m betting on—rising above the multi-agent crowd.
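To make points 1 and 2 a little more concrete, here is a rough sketch of how the four eyeballed percentages could be recorded and checked against the loose 70% target. The class name and the sample numbers are purely illustrative assumptions, not agreed values:

```python
# Illustrative only: rough, eyeballed metrics for one answer, checked
# against the loose "Benchmark Lite" target of >70% across all metrics.
from dataclasses import dataclass, asdict

TARGET = 70.0  # loose target from point 2

@dataclass
class RoughScore:
    completeness: float  # % of the question answered
    practicality: float  # % of actionable insights
    truthfulness: float  # % alignment with verifiable info
    ethicality: float    # % free of bias/harm

    def passes_benchmark_lite(self) -> bool:
        return all(value > TARGET for value in asdict(self).values())

# Eyeballed scores for one of the first 10-20 test questions:
sample = RoughScore(completeness=80, practicality=70, truthfulness=90, ethicality=95)
print(sample.passes_benchmark_lite())  # False here, since practicality is not above 70
```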
Don’t sweat the complexity now—build the interactions, slap on rough metrics, and we’ll tweak after 100 questions. If it feels intuitive and outpaces solo AIs, we’re golden. Thoughts? You’re GO—see you April 13th!
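And to make “build the interactions” concrete, here is a minimal sketch of the three-role hand-off from point 4 (one researches, one drafts, one checks). The call_llm placeholder stands in for whichever LLM APIs Kishan actually wires up; nothing here is the final design:

```python
# Hypothetical three-role hand-off between LLMs, as per point 4.
# call_llm(role, prompt) is a stand-in for the real API wrappers.
from typing import Callable

def collaborate(question: str, call_llm: Callable[[str, str], str]) -> str:
    """Researcher -> Drafter -> Checker, each role played by a different LLM."""
    research = call_llm("researcher", f"Collect key facts and sources for: {question}")
    draft = call_llm("drafter", f"Using these notes, draft a practical answer:\n{research}")
    checked = call_llm("checker", f"Review this draft for errors and bias, then return a corrected version:\n{draft}")
    return checked

# Stub run, just to show the plumbing works end to end:
if __name__ == "__main__":
    stub = lambda role, prompt: f"[{role} output]"
    print(collaborate("How can India fix political funding?", stub))
```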
Why This Works
- Addresses His Concern: Kishan’s worried about undefined benchmarks. This gives him a lightweight starting point (70%+ eyeball metrics) without bogging him down.
- Aligns with Your Vision: Pushes the “cooperating LLMs” concept and positions IndiaAGI as a lean AGI contender—fast and collaborative.
- Keeps It Prototype-Friendly: No perfection pressure, just a functional demo to test the waters, as you suggested.
- Encourages Momentum: Greenlights his 2-week plan and tool idea, keeping him motivated.
- Opens Dialogue: Invites his input (“Thoughts?”), ensuring he feels heard on the benchmark complexity.
Tweaks You Might Add
- Specific Example: If you’ve got a sample question (e.g., “How can India fix political funding?”), mention it to focus his AI interactions.
- Reassurance: Add a line like, “We’ll nail the benchmarks post-100 questions—data will guide us,” to ease his research itch.