#
# Block web spiders/robots
# This file is used to exclude certain types of user-agents from
# browsing portions of the site which are not to be indexed. Some
# additional exclusions prevent
#

# Disallow these bots
User-agent: dloader
User-agent: EasyWebPromotion
User-agent: Harvest
User-agent: LinkWalker
User-agent: Mozilla
User-agent: MSIE
User-agent: Microsoft URL Control
User-agent: Teleport
User-agent: Webwhacker
User-agent: Webzip
User-agent: Net Attache
User-agent: SiteSnagger
User-agent: HTTrack
User-agent: WebCapture
User-agent: WebSauger
User-agent: Zeus
Disallow: /
# Notes:
# Mozilla included because we don't want bots masquerading as regular
# user-agents

# Known email bots
# Not sure whether any of these will respond to this file.
User-agent: AtSpider
User-agent: AutoEmailSpider
User-agent: CherryPicker
User-agent: Crescent
User-agent: DSurf
User-agent: DTS Agent
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: ExtractorPro
User-agent: Mail Sweeper
User-agent: WhoWhere
User-agent: WX_mail
Disallow: /

# Normal search engines and anyone else who doesn't respond to
# other rules should avoid these files.
User-agent: *
Disallow: /cgi-bin/
Disallow: /cms/
Disallow: /errors/
Disallow: /include/
# Old folders which are now gone...
Disallow: /~
Disallow: /info/
Disallow: /press/releases/
Disallow: /press/news/
Disallow: /athletics/
Disallow: /sports/
Disallow: /bboards/
Disallow: /home.html
# Organizations and departments...
Disallow: /abengrg
Disallow: /acs
Disallow: /ce-enve
Disallow: /cs
Disallow: /geology
Disallow: /govlaw
Disallow: /library
Disallow: /mecheng
Disallow: /music
Disallow: /outreach
Disallow: /publius
Disallow: /security
# Faculty and staff...
Disallow: /allanr
Disallow: /bukicsr
Disallow: /cape
Disallow: /doughera
Disallow: /faccipop
Disallow: /kayserj
Disallow: /mcglonem
Disallow: /millerg
Disallow: /niless
# Lure the window shoppers to request this directory...
Disallow: /ignorant_bots