Linux Tips


Lighttpd/Blocking Robots

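To deny access from crawlers in Lighttpd, restrict access based on the User-Agent header, just as you would with Apache. Note that if virtual hosts are configured, these settings have to be made inside each virtual host's configuration, so it is easiest to write them in a separate file and include that file from each one.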
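As a rough sketch of that layout, assume the rules listed on this page are saved in a file called deny-robots.conf next to lighttpd.conf. The file name, the host names, and the document roots are placeholders made up for illustration. url.access-deny is provided by mod_access, so that module has to be loaded.

```
# lighttpd.conf (sketch; file, host, and path names are hypothetical)

# url.access-deny needs mod_access; skip this line if it is already loaded
server.modules += ( "mod_access" )

$HTTP["host"] == "www.example.com" {
        server.document-root = "/var/www/www.example.com"
        # pull the shared User-Agent rules into this virtual host
        include "deny-robots.conf"
}
$HTTP["host"] == "wiki.example.com" {
        server.document-root = "/var/www/wiki.example.com"
        include "deny-robots.conf"
}
```

Before restarting the server, the combined configuration can be syntax-checked with `lighttpd -t -f lighttpd.conf`.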

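# The next two patterns contain a literal "user-agent=" / "User-Agent: " prefix,
# so they match only clients that send that text inside the header value itself.
$HTTP["useragent"] =~  "user-agent=Mozilla\/4.0 \(compatible; MSIE 5.5; Windows 98; DigExt\)" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "User-Agent: Mozilla\/4.0 \(compatible; MSIE 5.5; Windows NT 5.0\)" {
        url.access-deny = ( "" )
}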
$HTTP["useragent"] =~  "dloader\(NaverRobot\)\/1.0" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "NaverBot" {
        url.access-deny = ( "" )
}
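# Anchored with ^ and $: matches only a User-Agent that is exactly "GoogleBot".
# The real Googlebot sends a longer string and is not affected by this rule.
$HTTP["useragent"] =~  "^GoogleBot$" {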
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Mozilla\/3.0 \(compatible; Indy Library\)" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Zeus 2.6" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Mozilla\/4.0 \(compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent\)" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Microsoft Internet Explorer\/4.40.426" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Mozilla\/4.0 \(compatible; MSIE 6.0; Windows 98; TencentTraveler" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Shim-Crawler" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Indy Library" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "Mozilla\/4.76 \[en\] \(Win98; U\)" {
        url.access-deny = ( "" )
}
$HTTP["useragent"] =~  "trafficmagnet\." {
        url.access-deny = ( "" )
}
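
Because the right-hand side of `=~` is a regular expression, several of the shorter patterns could also be folded into a single condition with alternation. The following is a minimal, untested sketch that uses only names already present in the list above. It is an alternative way to write those rules, not an additional rule set.

```
# One condition instead of four; each alternative is matched anywhere in the User-Agent
$HTTP["useragent"] =~ "(NaverBot|Shim-Crawler|Indy Library|trafficmagnet\.)" {
        url.access-deny = ( "" )
}
```

url.access-deny refuses any URL that ends with one of the listed strings. Every URL ends with the empty string, so `( "" )` denies all requests from a matching client.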


 
Last-modified: 2008/07/15 (Tue)