Below is a short class talk I gave on the hierarchical multi-labeling classification framework I outlined in my previous ‘Future of Tagging’ posts. I ran a small experiment classifying tech news articles as Pro- or Anti-Microsoft/Google, along with some other tags such as the tech category and whether the article is a blog or a publication, all based on the text of the piece. The results are very promising: even with such a small corpus of training documents, the classifier performed very well. I do have some ideas on how to further improve accuracy, so when I have some free cycles I’ll add those to the code and rerun it on a bigger and better (in terms of tag structure) training data set. By then I’m hoping the code will look less embarrassing for public release and the results will be more conclusive, but until that day here are the presentation slides:
5 thoughts on “Hierarchical Multi-Labeling Presentation Slides”
Hey, what are ‘tag normalization violations’ (Slide 13)?
Also, a second question: I don’t see how your multi-labelling binary classification yields a hierarchy. As I understand it, if you feed doc.txt into a ‘happy-classifier’ you get back a 0 or a 1. Even if you do this for other classifiers as well, how does this give you any sort of hierarchy?
Refer to Slide 9 which discusses what the tag normalization rules are. Basically if the bitstring (produced by the binary classifiers) associates 1’s to multiple tags in the same tree then there’s a violation.
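As a minimal sketch of that check (not the actual project code): assume each leaf tag is mapped to the name of the tree it belongs to, and the bitstring is a dict from leaf tag to 0/1. The tag names, the tree names, and the `find_violations` helper are all made up for illustration, and only the "multiple 1's in the same tree" rule is covered here.

```python
from collections import defaultdict

# Hypothetical tag trees: each leaf tag maps to the tree it belongs to.
TAG_TREE = {
    "pro-microsoft": "microsoft-stance",
    "anti-microsoft": "microsoft-stance",
    "pro-google": "google-stance",
    "anti-google": "google-stance",
    "blog": "source-type",
    "publication": "source-type",
}


def find_violations(bits, tag_tree=TAG_TREE):
    """Return {tree: [tags]} for every tree with more than one tag set to 1."""
    flagged = defaultdict(list)
    for tag, bit in bits.items():
        if bit == 1:
            flagged[tag_tree[tag]].append(tag)
    # A tree is in violation only when two or more of its leaf tags fired.
    return {tree: sorted(tags) for tree, tags in flagged.items() if len(tags) > 1}


# Example bitstring: both 'blog' and 'publication' fired, which is a violation.
bits = {
    "pro-microsoft": 0, "anti-microsoft": 1,
    "pro-google": 1, "anti-google": 0,
    "blog": 1, "publication": 1,
}
violations = find_violations(bits)
```

Here `violations` contains only the `source-type` tree, since the two stance trees each have a single tag set to 1.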
This also ties into your second question. You design the tag trees (hierarchies) beforehand, and for each leaf tag you create a binary classifier (refer to the Future of Tagging Part II article, which discusses how to build more specific training data sets based on these trees). Pass your new input through each classifier and get back a 0/1 for each, giving a bitstring that associates 1’s or 0’s with the leaf tags. Now use the tag hierarchy to detect errors (according to the rules on Slide 9) and sample solutions that correct them. So basically, we just use the tag hierarchy to enforce a little structure on what the solution should look like. Binary classification does not ‘yield a hierarchy’ per se; it just helps us ballpark an initial solution. One of the goals of this project was to demonstrate how a little tag structure (i.e. the tree or hierarchy) can help boost classification accuracy.
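To make the flow concrete, here is a rough sketch under stated assumptions rather than the code behind the slides. The keyword "classifiers" are toy stand-ins for trained binary classifiers, the tag and tree names are hypothetical, and the correction step simply enumerates every candidate that keeps exactly one flagged tag per violated tree. How to sample or rank among those candidates is left open here.

```python
from itertools import product

# Hypothetical mapping from leaf tag to the tree it belongs to.
tag_tree = {
    "pro-google": "google-stance",
    "anti-google": "google-stance",
    "blog": "source-type",
    "publication": "source-type",
}

# Toy stand-ins for trained binary classifiers, one per leaf tag.
classifiers = {
    "pro-google": lambda text: "google shines" in text.lower(),
    "anti-google": lambda text: "google fails" in text.lower(),
    "blog": lambda text: "i think" in text.lower(),
    "publication": lambda text: "reported" in text.lower(),
}


def to_bitstring(text, classifiers):
    """Run every leaf-tag classifier and collect a 0/1 per tag."""
    return {tag: int(clf(text)) for tag, clf in classifiers.items()}


def candidate_corrections(bits, tag_tree):
    """Enumerate bitstrings in which each violated tree keeps exactly one flagged tag."""
    flagged = {}
    for tag, bit in bits.items():
        if bit == 1:
            flagged.setdefault(tag_tree[tag], []).append(tag)
    violated = {tree: tags for tree, tags in flagged.items() if len(tags) > 1}
    if not violated:
        return [dict(bits)]  # nothing to fix, the initial solution stands
    trees = sorted(violated)
    candidates = []
    # One choice of surviving tag per violated tree gives one candidate.
    for kept in product(*(violated[tree] for tree in trees)):
        fixed = dict(bits)
        for tree, keep in zip(trees, kept):
            for tag in violated[tree]:
                fixed[tag] = 1 if tag == keep else 0
        candidates.append(fixed)
    return candidates


bits = to_bitstring("I think Google shines, as reported yesterday.", classifiers)
candidates = candidate_corrections(bits, tag_tree)
```

In this toy run both `blog` and `publication` fire, so there are two candidate corrections, one per surviving tag, while the `pro-google` bit is left untouched because its tree has no violation.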
Hope this helps. Btw, these slides really rely on the presentation talk, so my apologies if some parts seem incomplete. I’m hoping I can produce something more in-depth when time permits.
Yes, everything is much clearer now. Can you suggest any papers which try to solve this problem by focusing on learning rather than by decoupling learning / error correction?