Kishan
I agree . Let us keep it in abeyance
As far as our next improvement is concerned, both Vishesh and Nirmit feel that there is a need to come up with a better / more intuitive UI for the “ Explore “ page
Basic observation is :
Each LLM should have its own / independent / separate / vertical BLOCK , in which its own answer appears ( as against , current visual of every LLM’s answers appearing in the same block )
I attach my chat in this regard with ChatGPT , for your guidance. You decide how best we can obtain the most easily understandable visual , with arrows directing the “ blinks “ from one block to another block
You may first prepare a wireframe diagram and show it ( online / AnyDesk ) to me / Vishesh, before starting coding
What is your estimate for implementation ?
hcp
From: Kishan Kokal [mailto:kokalkishan.official@
Sent: 02 June 2025 23:55
To: Hemen Parekh
Cc: vishesh@incomegroup.in
Subject: Re: Consensus Answer Rating System
If we’re not actively planning to review or use the ratings to improve the model at this stage, I’d suggest we move ahead with the next feature instead. We can revisit this down the line if a clear use case emerges.
Let me know your thoughts.
On Mon, 2 Jun 2025 at 11:34, Hemen Parekh <hcp@recruitguru.com> wrote:
Kishan ,
Just saw your following WhatsApp message :
“Based on what I understood from the email and our conversation when I visited your house, simply telling the system that it’s hallucinating isn’t enough to fine-tune the model. The user needs to provide the expected answer for effective fine-tuning “
That made me chat with ChatGPT as follows.
Does it make any sense ? – even , by way of impressing Users – Minister ( Ashwini Vaishnaw ) – IndiaAI mission officers – startups feverishly working on Foundational LLMs ?
Of course, if it is going to take more than 10 hours of your time, it may not be worth the effort
You make the final decision and let me know your plan. Should you decide, NOT to spend time on this, then I will send to you the write-up re introduction of the NEXT feature
Hcp
Hey Chetan bhai : Do you have some system whereby a user can " Rate " your answer ( on some kind of a 1 to 5 scale ) ? If yes, how does such a simple , non-descriptive feedback help you to " learn and improve " ?
ChatGPT said:
Hey bhai! Great question — thoughtful as always.
Yes, there's a rating system:
On platforms like ChatGPT, you’ll often see 👍 or 👎 buttons, and sometimes users are asked to rate responses on a 1 to 5 scale. That’s part of how OpenAI collects user feedback.
But how does simple feedback (like a number) help me “learn and improve”?
Well, I don’t learn from feedback in real time, and I don’t store or remember interactions across users (privacy and safety first). But here’s how feedback is used by OpenAI to improve models like me:
Behind the scenes: How it works
1. User gives a rating (e.g., 2/5)
→ That response, the prompt, and the rating go into a dataset (anonymized).
2. Humans review a sample of these interactions
→ Especially the low-rated ones. They label what was missing:
o Was it factually wrong?
o Did it miss the user’s intent?
o Was it too vague, too verbose, or inappropriate?
3. That feedback helps train better models
→ Engineers fine-tune new versions using this labeled data.
→ They may even use reinforcement learning (like RLHF, Reinforcement Learning from Human Feedback) where models learn to favor patterns that correlate with high ratings.
Why even 1–5 stars matter
Even though it’s simple:
· Patterns emerge — e.g., lots of 1-star ratings on certain types of questions = red flag.
· Training signals — it tells the system which responses are generally good or bad.
· Model alignment — helps future models respond more helpfully, safely, and clearly.
Bonus: You can give detailed feedback too!
In ChatGPT, after rating, you can often add comments — those are super valuable. They tell engineers why you rated it that way, which is pure gold.
So, tl;dr:
Ratings alone don’t make me smarter directly, but they’re essential signals that help train and improve future versions of me.
Let me know if you’d like to peek deeper into how RLHF or fine-tuning works!
You said:
Hey Chetan Bhai : I get it. In your case , a human expert steps in with his feedback ( in respect of a USER RATING SCALE ) after User selection. Unfortunately , I am a one and half-men army ( my developer colleague Kishan and myself ) as far as www.IndiaAGI.ai is concerned . Kishan has no time and ( at 92 ) , I have no energy , for providing any human feedback to the 4 LLMs ( ChatGPT - Gemini - Claude and Grok ) participating in IndiaAGI. Suppose I provide for that 5 point Rating Scale ( without subsequent human intervention ), is that of any value ?
ChatGPT said:
Chetan Bhai, this is an excellent and very real-world question. You're right on point, and the fact that you're even thinking about structured feedback in IndiaAGI.ai shows your clarity.
Let’s break this down simply:
Short Answer: YES, a 5-point rating scale without human intervention can still be of value — but with clear limitations.
What it can help you with (even without human review):
User Sentiment Signal
· A 5-point scale gives you quantitative feedback on how users feel.
· If you find, say:
o ChatGPT averages 4.6
o Claude averages 3.9
o Gemini lags at 3.2
…you’ve got comparative performance insight, even without manual intervention.
Heatmapping topics
· If you log:
o The query topic
o The LLM that answered
o The rating
→ Over time, you’ll see which LLM does better on what (e.g., Claude might ace philosophy, Gemini might struggle with Hindi poetry).
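As a rough illustration of the per-LLM and per-topic comparison described above, the sketch below averages ratings by (topic, LLM) pair. The sample rows, the topic labels, and the function name `average_by_topic_and_llm` are all made up for illustration; how a query gets its topic label is not specified in this chat and is left as an assumption.

```python
from collections import defaultdict

# Hypothetical feedback log: (topic, llm_name, rating) rows, invented for illustration.
feedback_log = [
    ("philosophy", "Claude", 5),
    ("philosophy", "Claude", 4),
    ("philosophy", "Gemini", 3),
    ("hindi poetry", "Gemini", 2),
    ("hindi poetry", "ChatGPT", 4),
]


def average_by_topic_and_llm(rows):
    """Return {(topic, llm_name): mean rating} for the given rows."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of ratings, number of ratings]
    for topic, llm_name, rating in rows:
        totals[(topic, llm_name)][0] += rating
        totals[(topic, llm_name)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_by_topic_and_llm(feedback_log)
for (topic, llm_name), mean in sorted(averages.items()):
    print(f"{topic:15s} {llm_name:8s} {mean:.2f}")
```

With only a handful of ratings per cell the averages are noisy, so it may be worth showing the count next to each mean before drawing conclusions.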
Training ground for future
· If later you or someone else wants to introduce RLHF, the user-rated data can serve as a foundation dataset (even for fine-tuning prompts or reranking).
User Engagement Booster
· A rating system makes users feel like their opinion counts — and that’s gold for retention and product trust.
What it can’t do (without human loop):
· It won’t help the LLMs learn on their own (OpenAI, Anthropic, etc., don’t let the models update themselves from your app’s feedback).
· It won’t explain “why” a response failed (unless the user adds comments).
· You can’t apply corrective tuning or reinforcement without knowing what was missing.
Simple Advice for Your Case
Here’s what might make it truly valuable even as a 1.5-men army:
1. Capture query + LLM + rating into a basic table.
2. Optionally, allow a text comment: “What could have been better?”
3. Use this data to:
o Identify poorly performing topics
o Adjust your prompt templates
o Select default LLMs based on topic
e.g., If Claude is always rated high for ethics/philosophy, maybe make it the default for such questions.
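One possible way to turn the "default LLM per topic" idea into code is sketched below. The names `pick_default_llm`, `FALLBACK_LLM` and `MIN_RATINGS` are hypothetical, and the minimum-sample threshold is an added assumption (not part of the advice above) to avoid routing on one or two ratings.

```python
from collections import defaultdict

FALLBACK_LLM = "ChatGPT"  # assumption: used when no LLM has enough ratings for a topic
MIN_RATINGS = 3           # assumption: minimum sample size before trusting an average


def pick_default_llm(rows, topic):
    """Return the LLM with the highest mean rating for `topic`, or the fallback."""
    ratings = defaultdict(list)
    for row_topic, llm_name, rating in rows:
        if row_topic == topic:
            ratings[llm_name].append(rating)
    # Only consider LLMs that have at least MIN_RATINGS ratings on this topic.
    candidates = {
        llm: sum(values) / len(values)
        for llm, values in ratings.items()
        if len(values) >= MIN_RATINGS
    }
    if not candidates:
        return FALLBACK_LLM
    return max(candidates, key=candidates.get)


# Invented sample data for illustration only.
rows = [
    ("ethics", "Claude", 5), ("ethics", "Claude", 5), ("ethics", "Claude", 4),
    ("ethics", "Grok", 3), ("ethics", "Grok", 4), ("ethics", "Grok", 3),
    ("cricket", "Gemini", 5),
]
print(pick_default_llm(rows, "ethics"))
print(pick_default_llm(rows, "cricket"))
```

Here "cricket" falls back to the default because a single rating is below the assumed threshold.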
Bonus Idea (Low-Energy Trick):
Once a month, sample just 10–20 low-rated interactions, look for patterns (e.g., "LLM misunderstood context", "hallucinated stats", etc.). Even this tiny feedback loop can help tweak your system.
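A minimal sketch of that monthly sampling step, assuming feedback rows are available as dictionaries with a `rating` key. The function name `sample_low_rated`, the cut-off of 2 for "low-rated", and the sample rows are assumptions for illustration.

```python
import random


def sample_low_rated(rows, max_rating=2, k=20, seed=None):
    """Return up to k randomly chosen rows whose rating is <= max_rating."""
    low = [row for row in rows if row["rating"] <= max_rating]
    # random.Random(seed) keeps the draw reproducible when a seed is given.
    return random.Random(seed).sample(low, min(k, len(low)))


# Invented data: 100 rows with ratings cycling through 1..5.
rows = [{"id": i, "llm_name": "Gemini", "rating": 1 + i % 5} for i in range(100)]

review_batch = sample_low_rated(rows, k=10, seed=7)
for row in review_batch:
    print(row)
```

Random sampling, rather than always taking the most recent low ratings, gives a less biased picture of what tends to go wrong.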
Conclusion:
Even without Kishan or you stepping in, a 5-star rating scale can:
· Help you spot patterns,
· Compare LLMs objectively,
· Guide better defaults or prompt routing,
· And give users a voice.
It’s not a training loop, but it’s still a smart layer of feedback intelligence. Very much worth adding.
Let me know if you want a mockup of how to structure the backend or frontend for this!
You said:
Hey , many thanks for your valuable guidance. Sure , Kishan would be delighted with whatever " backend and frontend " structure that you can suggest to him. Pl do
ChatGPT said:
Bahut badiya (very good), Chetan Bhai, here's a practical suggestion for both frontend and backend structure to implement a 5-point rating system on IndiaAGI.ai. This design keeps it simple but extensible.
Frontend (User Interface)
Goal:
Let users rate LLM responses quickly with an optional comment.
Suggested Layout:
For each LLM response block, add:
<div class="rating-section">
  <label for="rating">Rate this response:</label>
  <div class="stars">
    <!-- Replace with star icons or emojis -->
    ⭐ ⭐ ⭐ ⭐ ⭐
  </div>
  <textarea placeholder="Optional feedback (Why this rating?)"></textarea>
  <button>Submit Feedback</button>
</div>
UX Features:
· Hover + Click stars to choose rating (1 to 5).
· Use emojis or SVG icons for stars.
· Minimalist styling using TailwindCSS or Bootstrap.
· Optional comment field with character limit (e.g., 200 chars).
Backend (Data Capture & Storage)
Required Fields to Log:
Each feedback entry should store:
| Field | Type | Description |
|---|---|---|
| timestamp | datetime | When the rating was submitted |
| llm_name | string | "ChatGPT", "Claude", "Gemini", or "Grok" |
| user_query | text | The original prompt/query |
| llm_response | text | Full or summary of the model's response |
| rating | int | 1 to 5 |
| user_comment | text/null | Optional comment |
| session_id | string | Optional: for tracking repeat users |

Suggested Storage Options:
· Use a simple SQLite or PostgreSQL DB (or Firebase for low-maintenance).
· Create a table like llm_feedback:

CREATE TABLE llm_feedback (
  id SERIAL PRIMARY KEY,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  llm_name VARCHAR(20),
  user_query TEXT,
  llm_response TEXT,
  rating INTEGER CHECK (rating BETWEEN 1 AND 5),
  user_comment TEXT,
  session_id VARCHAR(64)
);
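Note that `SERIAL` is PostgreSQL syntax. If the SQLite option is chosen instead, the same table could look roughly like the sketch below, which uses Python's built-in `sqlite3` module and an in-memory database purely for illustration. The column types are an assumed SQLite translation of the schema, not something specified in the chat.

```python
import sqlite3

# Assumed SQLite translation of the llm_feedback schema (SERIAL -> INTEGER PRIMARY KEY AUTOINCREMENT).
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_feedback (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    llm_name TEXT,
    user_query TEXT,
    llm_response TEXT,
    rating INTEGER CHECK (rating BETWEEN 1 AND 5),
    user_comment TEXT,
    session_id TEXT
);
"""

INSERT_SQL = (
    "INSERT INTO llm_feedback (llm_name, user_query, llm_response, rating) "
    "VALUES (?, ?, ?, ?)"
)

conn = sqlite3.connect(":memory:")  # use a file path instead for persistent storage
conn.execute(SCHEMA)

# A valid rating is accepted.
conn.execute(INSERT_SQL, ("Claude", "What is dharma?", "(sample answer)", 4))

# A rating outside 1..5 violates the CHECK constraint and is rejected.
try:
    conn.execute(INSERT_SQL, ("Claude", "What is dharma?", "(sample answer)", 6))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM llm_feedback").fetchone()[0]
print("out-of-range rating rejected:", rejected, "| rows stored:", row_count)
```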
Bonus: Admin Dashboard (Optional for Later)
A simple UI to:
· View average ratings by LLM
· Filter by topic or keyword
· Export CSV
Use:
· Retool, Streamlit, or React frontend with DataTables.js
· OR just run SQL queries manually for now
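Before building any dashboard, the three items above (averages by LLM, keyword filter, CSV export) can be approximated with one query and the standard `csv` module. This is only a sketch: the trimmed-down table, the sample rows, and the function names `average_rating_by_llm` and `to_csv` are assumptions for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down version of llm_feedback with invented rows, for illustration only.
conn.execute("CREATE TABLE llm_feedback (llm_name TEXT, user_query TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO llm_feedback VALUES (?, ?, ?)",
    [
        ("Claude", "What is dharma?", 5),
        ("Claude", "Explain GST", 4),
        ("Gemini", "Explain GST slabs", 3),
        ("Gemini", "What is dharma?", 2),
    ],
)


def average_rating_by_llm(conn, keyword=None):
    """Return (llm_name, average rating, number of ratings), optionally filtered by keyword."""
    sql = "SELECT llm_name, ROUND(AVG(rating), 2), COUNT(*) FROM llm_feedback"
    params = ()
    if keyword:
        sql += " WHERE user_query LIKE ?"
        params = (f"%{keyword}%",)
    sql += " GROUP BY llm_name ORDER BY llm_name"
    return conn.execute(sql, params).fetchall()


def to_csv(rows):
    """Serialize the summary rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["llm_name", "avg_rating", "n_ratings"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(average_rating_by_llm(conn)))
print(to_csv(average_rating_by_llm(conn, keyword="GST")))
```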
Summary
Backend Flow:
1. Frontend sends rating, comment, llm_name, etc. via API.
2. Server writes entry to llm_feedback table.
3. You later query this table to discover patterns.
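The server-side step of this flow might look like the sketch below: validate what the frontend sent, then write one row. It is framework-agnostic (no Flask or Express code) and uses the built-in `sqlite3` module. The function name `save_feedback`, the payload keys, and the reduced table are assumptions for illustration; the 200-character cap follows the comment limit suggested earlier.

```python
import sqlite3

ALLOWED_LLMS = {"ChatGPT", "Gemini", "Claude", "Grok"}
MAX_COMMENT_CHARS = 200  # character limit suggested for the optional comment


def save_feedback(conn, payload):
    """Validate one feedback payload (a dict) and insert it into llm_feedback."""
    rating = payload.get("rating")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
    if payload.get("llm_name") not in ALLOWED_LLMS:
        raise ValueError("unknown llm_name")
    # Trim whitespace, cap the length, and store NULL when no comment was given.
    comment = (payload.get("comment") or "").strip()[:MAX_COMMENT_CHARS] or None
    cursor = conn.execute(
        "INSERT INTO llm_feedback (llm_name, user_query, rating, user_comment) "
        "VALUES (?, ?, ?, ?)",
        (payload["llm_name"], payload.get("user_query", ""), rating, comment),
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE llm_feedback (id INTEGER PRIMARY KEY, llm_name TEXT, "
    "user_query TEXT, rating INTEGER, user_comment TEXT)"
)
row_id = save_feedback(
    conn,
    {"llm_name": "Grok", "user_query": "What is UPI?", "rating": 4, "comment": "  Clear answer  "},
)
print("stored feedback row", row_id)
```

Validating in the server as well as in the UI means a malformed request cannot put an out-of-range rating into the table.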
If Kishan wants a jumpstart:
I can generate:
· A working React frontend component
· A Node.js/Express or Flask API endpoint to store feedback
· A sample SQLite DB schema
Let me know Kishan bhai's preferred stack and I’ll tailor it perfectly to your setup!
