SQL Text Mining

One of the projects Jim Gray and I worked on this summer was classifying the types of SQL queries users submit on the SkyServer site ( http://cas.sdss.org/dr5/en/ ). We were surprised that we could not find any existing research describing how to break down SQL for categorization, especially considering how many websites and database workloads keep query logs. Below is a link to the PowerPoint presentation I gave at MSR Mountain View last week, which describes how we analyzed the SQL. Notable features include text processing strategies, clustering algorithms, distance functions, and two example applications (bot detection and query recommendation). We plan to publish our algorithms and results in a technical report in the next month or so, but for now, enjoy the .ppt. As always, comments are more than welcome.

SQL Text Mining Presentation
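
For readers who want a feel for what "text processing" plus a "distance function" can mean for SQL before opening the slides, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the presentation: the helper names `normalize_sql` and `jaccard_distance` are hypothetical, and the choice of token-set Jaccard distance is made here only because it is simple. The idea is to replace literal constants with placeholders so that queries differing only in their constants look identical, then compare the sets of tokens.

```python
import re


def normalize_sql(query: str) -> list[str]:
    """Lowercase a query, mask literals, and split it into tokens.

    Hypothetical helper for illustration; real SQL needs a proper parser.
    """
    q = query.lower()
    q = re.sub(r"'[^']*'", "<str>", q)            # mask string literals
    q = re.sub(r"\b\d+(\.\d+)?\b", "<num>", q)    # mask numeric literals
    # Tokens: placeholders, identifiers (optionally dotted), single punctuation.
    return re.findall(r"<str>|<num>|[a-z_][a-z0-9_.]*|[^\sa-z0-9_]", q)


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the token sets of two queries."""
    sa, sb = set(normalize_sql(a)), set(normalize_sql(b))
    if not sa and not sb:
        return 0.0  # two empty queries are treated as identical
    return 1.0 - len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    q1 = "SELECT ra, dec FROM PhotoObj WHERE ra > 180.5"
    q2 = "select ra, dec from photoobj where ra > 10"
    q3 = "SELECT count(*) FROM SpecObj"
    print(jaccard_distance(q1, q2))  # same template, different constant
    print(jaccard_distance(q1, q3))  # different template
```

A pairwise distance like this could then be fed to any standard clustering algorithm; which one works best on a real query log is exactly the kind of question the presentation addresses.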

