A faster / cheaper / more accurate alternative to the ANNUAL EMPLOYMENT SURVEY?
---------------------------------------------------------------
A proposal submitted to:
Shri TCA Anant,
Chief Statistician,
Ministry of Statistics and Programme Implementation,
Government of India
------------------------------------------------------------------------------------
Suggested by: Hemen Parekh, Mumbai / (M) 0 - 98,67,55,08,08
27 Oct 2015
-------------------------------------------------------------------------------------
I have a database of over 5 million job advts, downloaded over the past 6 / 7 years from various job portals of India.
Each job advt in the database consists of:
Advt ID
Designation ( being advertised )
Company Name ( Advertiser )
Job Description
Desired Profile
Compensation Offered
Experience ( desired ) – Years
Industry Type
Education Quali ( Min )
Location ( Posting City )
Keywords
Advt Posting Date
Expiry Date
Some years back (when our website, www.World-Wide-Jobs.com, was up and running), we had developed a feature to analyze this database and display the findings visually, in different ways.
We were displaying PIE-CHARTS of:
Industry-wise Jobs
City-wise Jobs
You will observe that, with a much larger database available now, it is possible to analyze / display the “No of Jobs” in many more ways.
Not only that, it should be possible to analyze this huge database to predict the future expected PATTERN of the occurrence of jobs, in many different ways!
At any given time, the number of jobs getting advertised is an important Economic Indicator.
If the economy is booming and company Order Books are getting fatter, then more jobs will get advertised, and vice-versa.
Hence, a time-series of the no of new jobs getting posted on job portals can be expected to show a roughly straight-line relationship with the state of the economy (a high coefficient of correlation).
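Whether the coefficient of correlation is really high can be checked with a few lines of code. What follows is only a minimal sketch, assuming the posting dates have been extracted as Python `date` objects and that some economic indicator (say, a monthly index) is available for the same months. The function names and the toy figures are hypothetical and are not taken from the actual database.

```python
from collections import Counter
from datetime import date
from math import sqrt


def monthly_counts(posting_dates):
    """Count how many advts were posted in each (year, month)."""
    return Counter((d.year, d.month) for d in posting_dates)


def pearson(xs, ys):
    """Pearson coefficient of correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    dates = [date(2015, 1, 5), date(2015, 1, 20), date(2015, 2, 3),
             date(2015, 3, 1), date(2015, 3, 9), date(2015, 3, 30)]
    counts = monthly_counts(dates)
    jobs_per_month = [counts[m] for m in sorted(counts)]  # [2, 1, 3]
    indicator = [101.0, 99.5, 103.2]  # hypothetical index values
    print(round(pearson(jobs_per_month, indicator), 3))
```

A value close to +1 would support the straight-line relationship; a value near 0 would not.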
Apart from that, can Data mining of 5 million jobs answer (even partially) the following questions?
Who (which Companies) are advertising and when?
What jobs / vacancies / positions are being advertised?
What is the frequency with which a particular job gets advertised? By entire industry? By a given Company?
Which regions / cities have max / min no of new jobs?
What are regional disparities due to?
Which Industries are advertising most, i.e. creating most jobs?
What Edu Qualifications are in max demand?
What kind of jobs demand what kind of Edu Qualifications?
What is the level of co-relation between Position and the years of Experience demanded?
For identical positions being advertised, how much do “Job Descriptions / Desired Profiles” differ from company to company?
Are there significant differences in the “No of years of Experience” being demanded for identical positions?
What is the probability of finding the “Keywords” in “Job Description / Desired Profile”?
What is the extent of duplication (redundancy?) between “Job Description” and “Desired Profile”?
What percentage of Advts fail to make any mention of Compensation Offered?
When a company posts an advt for the same / identical position at different points of time, are there any differences in values (fields)?
From an analysis of all the advts posted by a given Company (over the past 7 years), can any conclusion be reached as to the changing nature of that company’s business (by co-relating the “Skills related Keywords”)?
Can the algorithm predict what job a company will advertise next, and when?
Is there any correlation between “Designation / Position” and the “Keywords”?
From analyzing this huge data, can software auto-generate a complete / editable job advt as soon as a Recruiter simply types the “Designation / Position”?
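Two of the questions above (duplication between “Job Description” and “Desired Profile”, and the percentage of advts with no mention of Compensation) lend themselves to a quick sketch. This is only an illustration under assumptions: each advt is taken to be a Python dict, the key name `compensation` is hypothetical, and word-set overlap (the Jaccard index) is just one possible measure of duplication.

```python
import re


def tokens(text):
    """Lower-cased set of word-like tokens found in a text field."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def jaccard_overlap(job_description, desired_profile):
    """Share of distinct words common to both fields (0.0 to 1.0)."""
    a, b = tokens(job_description), tokens(desired_profile)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pct_missing_compensation(adverts):
    """Percentage of advts whose compensation field is absent or blank."""
    if not adverts:
        return 0.0
    missing = sum(
        1 for ad in adverts
        if not str(ad.get("compensation") or "").strip()
    )
    return 100.0 * missing / len(adverts)
```

For example, `jaccard_overlap("java sql", "java python")` shares one word out of three distinct words, so it returns one third.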
I believe that, so far, no one has undertaken such a Data mining project.
If carried out diligently, I am sure the outcome would be of immense benefit to:
HR Managers: for Manpower Planning / Compensation Planning
Recruiting Managers: for framing Man Specifications / Job Description Manuals
Educationists: for deciding what Edu Quali are in demand and tailoring the Courses
Students: to figure out what “Skills” are in demand by Industry and prepare accordingly
Planning Commission (NITI Aayog): for allocating Resources to States / Regions, based on imbalances
HRD Ministry: for long term Macro-Planning in respect of Education
National Skills Development Commission: for chalking out Skills Development Programs in collaboration with Companies / Industries
If undertaken, and executed seriously, this Data mining project has the potential to place the Ministry of Statistics and Programme Implementation on the Centre-Stage of the National Education Planning Scenario.
What can / will such a project yield?
Without exaggerating, it would be safe to assume that this vast database of job advts would contain:
50 million phrases / sentences
500 million words
Obviously, each word / phrase / sentence is a small part of a “Database of Intentions” of the Employer Companies (to borrow from John Battelle’s well-researched book about Google).
Our goal shall be to make this (Data mining Algorithm) a dynamic / continuous “Process”, so that we can measure the changing nature of these “Intentions” over a long, long period.
And we must enable a “Researching Visitor (of web site)” to benefit from these trends / patterns.
Even though 5 million job advts may contain 500 million “words”, these are not Unique.
Most of these are used again and again, hundreds or thousands of times.
Thru data mining, it is not difficult to compute their “Frequency of Usage”.
And then, these frequencies can be graphically plotted against any particular time-period.
Such Graphical Representations can be further broken up by:
City Names
Company Names
Industry Names
Function Names
Designations (Vacancy Names), etc
And such graphical analysis can be done not only for “Keywords” but even for “Key Phrases” and “Sentences”!
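The “Frequency of Usage” computation, broken up by time-period and by any one field, can be sketched as follows. It is a minimal illustration, assuming each advt is a dict with hypothetical keys `posting_date` (a `date`) and `keywords` (a list of strings); the year is used as the time-period only for simplicity.

```python
from collections import Counter, defaultdict
from datetime import date


def keyword_frequencies(adverts, group_field=None):
    """Return {(year, group): Counter of keyword usage}.

    `group` is the advt's value for `group_field` (e.g. "city"),
    or None when no break-up is requested.
    """
    table = defaultdict(Counter)
    for ad in adverts:
        year = ad["posting_date"].year
        group = ad.get(group_field) if group_field else None
        # Normalise case and spacing so "Java" and " java" count together.
        table[(year, group)].update(
            k.strip().lower() for k in ad.get("keywords", [])
        )
    return table


if __name__ == "__main__":
    # Toy advts, made up for illustration.
    sample = [
        {"posting_date": date(2014, 5, 1), "city": "Mumbai", "keywords": ["Java", "SQL"]},
        {"posting_date": date(2014, 8, 1), "city": "Pune", "keywords": ["java"]},
    ]
    print(keyword_frequencies(sample)[(2014, None)].most_common(1))
```

The resulting counts per period are exactly what would be handed to a charting tool for plotting; the same function could be fed key phrases instead of single keywords.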
Take a look at this project paper (NOT ENCLOSED).
It is all about data mining of some 150 million records (location points) and about uncovering “trends / patterns” of physical movements of 300 human volunteers, over a “period of time”.
I quote from an article in Times of India (19 July 2013):
“..the first system of its kind to predict long term human mobility in a unified way , parse the data. "Far Out" does not need to be told exactly what to look for --- it automatically discovered regularities in the data”
“Do you know precisely where you’ll be 285 days from now at 2 pm? Researchers have developed a new tracking software that can tell you exactly where you will be on a precise time and date, years into the future”
What we want to do with the 5 million job advts database is quite similar, viz; predict:
WHO (which Company / Industry) will advertise
WHAT (vacancies / positions / designations), and
WHEN (time)
I am talking about developing an “Expert System”, thru discovery of specific “Co-relations” amongst various Data Fields of 5 million job advts.
Eg: What is the Co-relation between any given “Designation / Vacancy-Name / Advertised Position” and Educational Qualifications?
Here are some examples :
Any designation such as “Production Manager” would call for an “Engineering Degree / Diploma” (but never a CS / CA)
Any designation in “Finance Function” will require B Com / M Com / CA etc, but never a BE (M) / BE (Chem)
Any designation at Manager level will call for a minimum experience of 5 years (but never a Fresh Graduate with NIL experience)
MBA / BBA / MMS etc are the most preferred Edu Qualifications for positions in Marketing
No vacancy in an Automobile Manufacturing Company will call for a degree in Pharmacy
No Electrical Machinery Manufacturing company will ever demand a Medical Degree (MBBS)
To a human mind, these (rules) are so obvious!
But no human mind can write down ALL of such RULES in 2 minutes! That is something your Data mining Software can, and will, do in 5 seconds!
All that you need, after computing “Frequencies of Occurrences”, is to:
Plot the Co-efficients of Co-relations between various Fields (of job advts)
Compute Probabilities for each and create hundreds of Probability Tables
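One such Probability Table (say, Educational Qualification given Designation) could be built along the following lines. This is a minimal sketch, assuming each advt is a dict; the key names `designation` and `education` are hypothetical stand-ins for the corresponding fields of the job advt.

```python
from collections import Counter, defaultdict


def probability_table(adverts, given_field, target_field):
    """Estimate P(target value | given value) from observed frequencies.

    Returns {given_value: {target_value: probability}}.
    Advts missing either field are skipped.
    """
    counts = defaultdict(Counter)
    for ad in adverts:
        given, target = ad.get(given_field), ad.get(target_field)
        if given is None or target is None:
            continue
        counts[given][target] += 1
    return {
        given: {value: n / sum(ctr.values()) for value, n in ctr.items()}
        for given, ctr in counts.items()
    }


if __name__ == "__main__":
    # Toy advts, made up for illustration.
    sample = [
        {"designation": "Production Manager", "education": "BE"},
        {"designation": "Production Manager", "education": "Diploma"},
        {"designation": "Accountant", "education": "B Com"},
    ]
    print(probability_table(sample, "designation", "education"))
```

A qualification that never appears for a designation simply has no entry in that row. That mirrors the “but never” rules above, although a zero in the sample is an observation, not a guarantee.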
And, since a thousand new job advts are getting added to our Job Advt Database daily, the SAMPLE SIZE is perpetually increasing, thereby increasing the Accuracy of your Predictions!
Having done this, imagine the following scenario:
A Recruitment Officer of Wipro comes to our “Post Job” page and, in the field for “Designation”, simply types “Business Analyst”.
And Presto!
The entire Job Advt Form gets auto-filled with the MOST PROBABLE values!
Would not that amaze her?
All that our software has done is analyze the job advts of all “Software Companies” (an Industry), and of WIPRO, for the position of Business Analyst and fill in the most probable values.
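The auto-fill idea can be sketched as picking, for each field, the most frequent value among matching advts. The sketch below is only one possible reading: it looks first at the company's own advts for that designation, then at the same industry, then at all advts for the designation. The key names and the `FIELDS` tuple are hypothetical.

```python
from collections import Counter

# Hypothetical names for the form fields to be auto-filled.
FIELDS = ("education", "experience_years", "location")


def most_probable_values(adverts, designation, company=None,
                         industry=None, fields=FIELDS):
    """Suggest the most frequent value of each field for a designation."""
    same_role = [ad for ad in adverts if ad.get("designation") == designation]
    # Pools are tried in order: own company, same industry, everyone.
    pools = [
        [ad for ad in same_role
         if company is not None and ad.get("company") == company],
        [ad for ad in same_role
         if industry is not None and ad.get("industry") == industry],
        same_role,
    ]
    filled = {}
    for field in fields:
        for pool in pools:
            values = [ad[field] for ad in pool if ad.get(field) is not None]
            if values:
                filled[field] = Counter(values).most_common(1)[0][0]
                break
    return filled
```

A field for which no matching advt has a value is left out of the result, so the Recruiter sees it blank and fills it in by hand.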
This is no rocket science !
We had actually partially attempted it, albeit in a crude way, in our earlier web site.
What surprises me is, how come no one has attempted this so far! Especially Naukri / TimesJobs / MonsterIndia, who have accumulated millions of job advts!
Anyway, the fact that they have, so far, ignored this Line of Examination will work to the advantage of the Ministry of Statistics and Programme Implementation, making YOU the very first person in the entire world to come up with a PREDICTION MODEL in the area of JOBS.
However, without applying some simple data mining tool, it would not be possible to answer the following questions:
Where is the greatest decline of jobs being advertised?
How much is the percentage decline?
In which Industry ?
In which Company ?
In which City ?
In which Region ?
In which Skills ?
For which Positions ?
For which Education Levels? etc
With a data mining tool, such individual graphs could emerge (within a fraction of a second) at the click of a button!
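The figures behind such graphs are simple to compute. As a minimal sketch, assuming each advt is a dict with a hypothetical `posting_date` key (a `date`) plus whichever field is being examined (industry, city, company and so on), the percentage change between two years and the greatest decline could be found like this:

```python
from collections import Counter
from datetime import date


def percentage_change_by(adverts, field, earlier_year, later_year):
    """Percentage change in advt counts per group between two years.

    Groups with no advts in the earlier year are skipped, since a
    percentage change from zero is undefined.
    """
    earlier = Counter(ad.get(field) for ad in adverts
                      if ad["posting_date"].year == earlier_year)
    later = Counter(ad.get(field) for ad in adverts
                    if ad["posting_date"].year == later_year)
    return {group: 100.0 * (later[group] - n) / n
            for group, n in earlier.items()}


def greatest_decline(changes):
    """Return the (group, percentage change) pair with the lowest change."""
    return min(changes.items(), key=lambda item: item[1])
```

Calling `percentage_change_by(adverts, "industry", 2014, 2015)` and then `greatest_decline(...)` on the result answers “In which Industry?”; swapping the field name answers the same question for City, Company or Education Level.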
One could even co-relate these graphs with other publicly available statistical data such as:
IIP ( Index of Industrial Production )
Stock Market Index
Currency Exchange Rate ( eg; declining Rupee )
Decline in GDP / Increasing Fiscal Deficit
CAD ( Current Account Deficit )
Foreign Investments
Primary Bank Rates of RBI, etc
With proper co-relations, one could even predict how much the job market will further shrink, or grow, over the next 6 months!
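The simplest possible version of such a prediction is a straight line fitted by ordinary least squares between one indicator and the number of advts posted. The sketch below is an assumption-laden illustration only: the function names are hypothetical, a single indicator is used, and a serious model would also need lagged indicators, seasonality and validation against held-back data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted advt count for a given indicator value x."""
    return intercept + slope * x
```

Here `xs` would be past values of the indicator and `ys` the advt counts for the same months; feeding a forecast indicator value to `predict` then gives a rough expected advt count.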
Such a “Predictive Model of Job Market” would be of immense interest not only to economists but also to the HRD Ministry / Planning Commission / Educational Institutions and, of course, the students themselves.