SQL Text Mining

One of the projects Jim Gray and I worked on this summer was classifying the types of SQL users ask on the SkyServer site ( http://cas.sdss.org/dr5/en/ ). We were surprised that we could not find any existing research that could describe methods on how to break down the SQL for categorization – especially considering the number of websites and database workloads that bookkeep query logs. Below is a link to the powerpoint presentation I gave at MSR Mountain View last week which describes how we analyzed the SQL. Notable features include text processing strategies, clustering algorithms, distance functions, and two example applications (Bot detection and Query recommendation). We plan to publish our algorithms and results in a technical report in the next month or so – but for now, enjoy the .ppt. As always, comments are more than welcome.

SQL Text Mining Presentation

Creative Commons License

Advertisement

One thought on “SQL Text Mining

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s