What would be the best way to grab the lang attribute from an HTML document? For example:
<html lang="en"
I'm guessing that metamap.cfg classes don't pick up on the HTML tag, so
l,0,lang wouldn't work.
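As a sanity check outside the crawler, the value in question can be pulled from a page with a few lines of Python. This is only a sketch using the standard library's html.parser; it is not part of Funnelback, and the names LangParser and extract_lang are made up for illustration.

```python
from html.parser import HTMLParser


class LangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def extract_lang(html_text):
    """Return the lang attribute of the <html> element, or None if absent."""
    parser = LangParser()
    parser.feed(html_text)
    parser.close()
    return parser.lang
```

Calling extract_lang on the page source shows whether the attribute is actually present in the markup that gets fetched, which helps separate a page problem from a configuration problem.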
I've also tried the following settings for the metadata scraper, pointing it at the HTML tag:
[{
"urlRegex": "http://www\.site\.org",
"metadataName": "dc.language",
"elementSelector": "html",
"applyIfNoMatch": false,
"extractionType": "attr",
"attributeName": "lang"
}]
But no luck with this either.
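One thing that may be worth ruling out: in strict JSON, a backslash inside a string must itself be escaped, so "\." is not a valid escape sequence and the regex would need to be written with "\\.". Whether the metadata scraper's config loader is strict about this is an assumption, not something confirmed here, and this may not be the cause at all. A minimal Python sketch of the check (check_scraper_config is a hypothetical helper name):

```python
import json


def check_scraper_config(text):
    """Return (True, parsed) if text is strict JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, str(exc)


# The settings exactly as posted above (raw string keeps the backslashes).
AS_POSTED = r'''[{
"urlRegex": "http://www\.site\.org",
"metadataName": "dc.language",
"elementSelector": "html",
"applyIfNoMatch": false,
"extractionType": "attr",
"attributeName": "lang"
}]'''

# Same settings with each backslash doubled, as strict JSON requires.
ESCAPED = AS_POSTED.replace(r"\.", r"\\.")

posted_ok, posted_result = check_scraper_config(AS_POSTED)
escaped_ok, escaped_result = check_scraper_config(ESCAPED)
```

Python's json module rejects the posted form and accepts the escaped one, where the parsed urlRegex value comes out as the intended regex with single backslashes. A more lenient parser could accept both, so this only narrows things down.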
Created Sep '17