BitTorrent
Modules to use:
- XML::RSS::Parser::Lite
- LWP::UserAgent
- DBI
- URI
- HTML::Entities
- HTML::TokeParser::Simple
- WWW::Search::Mininova
- Net::BitTorrent
- POSIX qw(setsid)
daemonize
set up SIGINT handler to set killflag
connect to db
set up torrent queue
while not killflag:
    processqueue
    if it's been a second, checktorrentstates
    if it's been a minute, checkfornewtorrents
stop torrent queue
checktorrentstates
destroy torrent queue
quit
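The daemonize step above can be sketched with POSIX's setsid from the module list. This is generic Unix daemonization, not anything BitTorrent-specific; the `$killflag` variable name is my own, standing in for the kill flag the main loop checks.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

our $killflag = 0;   # set by the SIGINT handler, checked by the main loop

sub daemonize {
    chdir '/' or die "chdir /: $!";
    defined(my $pid = fork) or die "fork: $!";
    exit 0 if $pid;                        # parent exits; child carries on
    setsid() or die "setsid: $!";          # detach from controlling terminal
    open STDIN,  '<',  '/dev/null' or die "stdin: $!";
    open STDOUT, '>',  '/dev/null' or die "stdout: $!";
    open STDERR, '>&', \*STDOUT    or die "stderr: $!";
    $SIG{INT} = sub { $killflag = 1 };     # SIGINT just sets the kill flag
}
```

The main loop then runs `while (!$killflag) { ... }` and does its shutdown steps (stop queue, final state check, destroy queue) once the flag is set.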
checkfornewtorrents:
    lock table
    get all torrents with state of torrentretrieved
    foreach new torrent, add to queue and update torrent state
    unlock table
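A sketch of checkfornewtorrents using DBI. The `torrents` table and its columns (`id`, `state`, `dllink`) are assumptions for illustration, as is the `downloading` state the rows move into; a transaction stands in for the explicit table lock, whose exact form depends on the database.

```perl
use strict;
use warnings;
use DBI;

# Pull all torrents in state 'torrentretrieved', push them onto the
# in-memory queue, and mark them as claimed. Schema is hypothetical.
sub check_for_new_torrents {
    my ($dbh, $queue) = @_;
    $dbh->begin_work;   # stands in for "lock table"
    my $rows = $dbh->selectall_arrayref(
        q{SELECT id, dllink FROM torrents WHERE state = 'torrentretrieved'});
    for my $r (@$rows) {
        my ($id, $dllink) = @$r;
        push @$queue, $dllink;   # "add to queue"
        $dbh->do(q{UPDATE torrents SET state = 'downloading' WHERE id = ?},
                 undef, $id);    # assumed next state
    }
    $dbh->commit;       # stands in for "unlock table"
    return scalar @$rows;
}
```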
checktorrentstates:
    lock table
    calculate stats for each torrent in the queue
    check if we were told to pause this torrent; if so, remove that torrent from the queue
    update the table with the latest stats; if necessary, remove from torrent queue:
        if not connected anymore: torrentretrieved/unconnected
        if paused: paused/unconnected
        if failed: torrentretrieved/unconnected
    unlock table
CUT!
So, the idea is that some shows might be easier to get from BitTorrent than to wait for them to show up on television. They are also usually better compressed, higher quality, and usually have the commercials removed entirely. So it would be nice to have an easy way to see what can be gotten via BitTorrent. (It might also be useful to get a list of episodes from places like Wikipedia, with description, original air date, episode number, etc.)
So, I need a way to access The Pirate Bay's database automatically. Recently, they have hinted at an API that will be useful for searching their database, but I can't currently find any details about that. For the time being, I may have to make my own HTML scraping version.
http://thepiratebay.org/search.php?q=<searchterms>&page=<resultpagenum>&orderby=<orderkey>
orderkey can be name, date, size, se, or le; page starts at 0.
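Building that search URL with the URI module from the list above is straightforward; `query_form` handles the escaping. A minimal sketch:

```perl
use strict;
use warnings;
use URI;

# Build a search URL from the parameters described above. The
# parameter names (q, page, orderby) are from the notes; the helper
# name and its defaults are my own.
sub search_url {
    my ($terms, $page, $orderkey) = @_;
    my $uri = URI->new('http://thepiratebay.org/search.php');
    $uri->query_form(
        q       => $terms,
        page    => $page     // 0,
        orderby => $orderkey // 'se',
    );
    return $uri->as_string;
}
```

The result can then be fetched with LWP::UserAgent and fed to the scraper.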
The table we want has an id of searchResult
Thence follows a thead row, and then rows of results.
Each result row is of form (has the following cells with examples):
<category link>: "<a href="CATLINK" title="More from this category">CATEGORY</a>"
<detail link and title>: "<a href="DESCLINK" class="detLink" title="Details for TITLE">TITLE</a>"
<date and time of upload>: "DATE TIME"
<download link>: "<a href="DLLINK" title="Download this torrent"><img src="IMGLINK" class="dl" alt="Download" /></a>"
<size>: "SIZENUM SIZEUNIT"
<seeders>: "SENUM"
<leechers>: "LENUM"
So, from that, we want to extract CATEGORY, TITLE, DATE, TIME, DLLINK, SIZENUM, SIZEUNIT, SENUM, LENUM. Should be a cinch.
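The extraction for one result row can be sketched with HTML::TokeParser::Simple from the module list, keying off the title attributes shown in the cell examples above. The hash key names are my own; a full scraper would first seek the table with id `searchResult` and run this per row.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Pull TITLE, DESCLINK, and DLLINK out of one result row's HTML.
sub parse_row {
    my ($html) = @_;
    my $p = HTML::TokeParser::Simple->new(string => $html);
    my %row;
    while (my $tok = $p->get_token) {
        next unless $tok->is_start_tag('a');
        my $title = $tok->get_attr('title') || '';
        if ($title =~ /^Details for /) {
            $row{desclink} = $tok->get_attr('href');
            $row{title}    = $p->get_text;     # the link text is TITLE
        }
        elsif ($title eq 'Download this torrent') {
            $row{dllink} = $tok->get_attr('href');
        }
    }
    return \%row;
}
```

The remaining cells (category, date/time, size, seeders, leechers) fall out the same way, by matching on each cell's distinguishing attribute or position.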
The last row is empty in the source, but gets magically javascriptally filled in to be the page selector. I wouldn't worry about it for now.
Note that all of this is likely to change as The Pirate Bay changes how they do things to be more AJAXy and APIy. Hopefully when that occurs, this will all fail gracefully, we'll know, and we can fix it to use the new APIs.