Bittorrent

From Fernseher
Revision as of 21:53, 24 September 2009 by Admin

Modules used:

  • XML::RSS::Parser::Lite
  • LWP::UserAgent
  • DBI
  • URI
  • HTML::Entities
  • HTML::TokeParser::Simple
  • WWW::Search::Mininova
  • Net::BitTorrent
  • POSIX qw(setsid)

main:
  daemonize
  set up SIGINT handler to set killflag
  connect to db
  set up torrent queue
  while not killflag
    processqueue
    if it's been a second, checktorrentstates
    if it's been a minute, checkfornewtorrents
  stop torrent queue
  checktorrentstates
  destroy torrent queue
  quit

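
Here is a rough sketch of the main loop above, written in Python rather than Perl since none of the Perl code exists yet. It is a minimal sketch under assumptions: `queue` is a hypothetical object with `process()`, `stop()` and `destroy()` methods standing in for the Net::BitTorrent session, the two check routines are passed in as functions, and daemonizing and the DB connection are left out. `process()` is assumed to block briefly (e.g. waiting on sockets); otherwise the loop would spin. `clock` and `max_iterations` exist only to make it testable.

```python
import signal
import time


def run_daemon(queue, check_torrent_states, check_for_new_torrents,
               clock=time.monotonic, max_iterations=None):
    """Process the queue, check states every second and new torrents every
    minute, until SIGINT sets the kill flag.

    queue is a hypothetical object with process(), stop() and destroy();
    max_iterations exists only so the loop can be exercised in tests.
    """
    kill = {"flag": False}

    def on_sigint(signum, frame):
        # Only set the flag; the loop then shuts the queue down cleanly.
        kill["flag"] = True

    previous_handler = signal.signal(signal.SIGINT, on_sigint)
    try:
        last_state_check = last_new_check = clock()
        iterations = 0
        while not kill["flag"]:
            queue.process()  # assumed to block briefly, e.g. on socket I/O
            now = clock()
            if now - last_state_check >= 1:
                check_torrent_states(queue)
                last_state_check = now
            if now - last_new_check >= 60:
                check_for_new_torrents(queue)
                last_new_check = now
            iterations += 1
            if max_iterations is not None and iterations >= max_iterations:
                break
        queue.stop()
        check_torrent_states(queue)  # record final stats after stopping
        queue.destroy()
    finally:
        # Restore whatever SIGINT handler was installed before.
        if previous_handler is not None:
            signal.signal(signal.SIGINT, previous_handler)
```

Only setting a flag in the signal handler keeps the shutdown sequence (stop, final state check, destroy) in one place instead of doing real work inside the handler.
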
checkfornewtorrents:
  lock table
  get all torrents with state of torrentretrieved
  foreach new torrent
    add to queue and update torrent state
  unlock table
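
A sketch of checkfornewtorrents using Python's sqlite3, assuming a hypothetical `torrents` table with `state` and `connection` columns (mirroring the state/connection pairs in checktorrentstates) and a connection opened with `isolation_level=None` so transactions are explicit. SQLite has no LOCK TABLE, so `BEGIN IMMEDIATE` stands in for "lock table"; with MySQL via DBI it would be LOCK TABLES. Filtering on `connection = 'unconnected'` (so queued torrents aren't re-added) and marking a queued torrent `connected` are my guesses at what "update torrent state" means.

```python
import sqlite3  # conn below is assumed to be an sqlite3.Connection


def check_for_new_torrents(conn, queue):
    """Add newly retrieved torrents to the queue and mark them connected.

    Hypothetical schema: torrents(id, state, connection).
    conn must be opened with isolation_level=None; queue needs an add() method.
    Returns the ids that were queued.
    """
    conn.execute("BEGIN IMMEDIATE")  # "lock table": take the write lock now
    try:
        rows = conn.execute(
            "SELECT id FROM torrents "
            "WHERE state = 'torrentretrieved' AND connection = 'unconnected'"
        ).fetchall()
        for (torrent_id,) in rows:
            queue.add(torrent_id)
            conn.execute(
                "UPDATE torrents SET connection = 'connected' WHERE id = ?",
                (torrent_id,),
            )
        conn.execute("COMMIT")  # "unlock table"
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [torrent_id for (torrent_id,) in rows]
```
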
checktorrentstates:
  lock table
  foreach torrent in the queue
    calculate its stats
    if we were told to pause it, remove it from the queue
    update the table with the latest stats
    if necessary, remove it from the queue and set state/connection:
      not connected anymore: torrentretrieved/unconnected
      paused: paused/unconnected
      failed: torrentretrieved/unconnected
  unlock table
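
And a matching sketch of checktorrentstates, under the same assumptions (hypothetical `torrents` table, here with `downloaded`/`uploaded` columns as well, sqlite3 with `isolation_level=None`). The queue is a plain dict mapping torrent id to a hypothetical `TorrentStatus` record; in the real daemon those numbers would come from the BitTorrent client. "Told to pause" is interpreted as someone having set the row's state to `paused`.

```python
import sqlite3  # conn below is assumed to be an sqlite3.Connection
from dataclasses import dataclass


@dataclass
class TorrentStatus:
    # Hypothetical per-torrent stats the BitTorrent client would report.
    downloaded: int = 0
    uploaded: int = 0
    connected: bool = True
    failed: bool = False


def check_torrent_states(conn, queue):
    """Write each queued torrent's stats to the table and drop torrents that
    were paused, disconnected or failed, following the state table above.

    queue maps torrent id -> TorrentStatus; column names are assumptions.
    """
    conn.execute("BEGIN IMMEDIATE")  # "lock table"
    try:
        for torrent_id, status in list(queue.items()):
            row = conn.execute(
                "SELECT state FROM torrents WHERE id = ?", (torrent_id,)
            ).fetchone()
            paused = row is not None and row[0] == "paused"
            conn.execute(
                "UPDATE torrents SET downloaded = ?, uploaded = ? WHERE id = ?",
                (status.downloaded, status.uploaded, torrent_id),
            )
            if paused:
                new_state = "paused"
            elif status.failed or not status.connected:
                new_state = "torrentretrieved"
            else:
                continue  # still running: leave it in the queue
            del queue[torrent_id]
            conn.execute(
                "UPDATE torrents SET state = ?, connection = 'unconnected' WHERE id = ?",
                (new_state, torrent_id),
            )
        conn.execute("COMMIT")  # "unlock table"
    except Exception:
        conn.execute("ROLLBACK")
        raise
```
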



CUT!


So, the idea was that some shows might just be easier to get from BitTorrent than to wait for them to air on television. BitTorrent copies are also usually better compressed, higher quality, and have the commercials removed entirely. So it would be nice to have an easy way to see what is available via BitTorrent. (It might also be useful to pull episode lists from places like Wikipedia, with descriptions, original air dates, episode numbers, etc.)

So I need a way to query The Pirate Bay's database automatically. They have recently hinted at an API for searching their database, but I can't find any details about it yet. For the time being, I may have to write my own HTML scraper.


http://thepiratebay.org/search.php?q=<searchterms>&page=<resultpagenum>&orderby=<orderkey>

orderkey can be name, date, size, se (seeders), or le (leechers). page starts at 0.
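
Building that URL is just query-string encoding; here is a small Python sketch with `urllib.parse`. The function name and the default of sorting by seeders are my own choices, and it only accepts the orderkey values listed above; whether the site takes anything else (a sort direction, say) is unknown.

```python
from urllib.parse import urlencode

# The orderkey values noted above; anything else is rejected.
ORDER_KEYS = ("name", "date", "size", "se", "le")


def build_search_url(terms, page=0, orderby="se"):
    """Build a search URL following the pattern above (hypothetical helper)."""
    if orderby not in ORDER_KEYS:
        raise ValueError(f"unknown orderkey: {orderby!r}")
    if page < 0:
        raise ValueError("page numbers start at 0")
    # urlencode escapes the search terms (spaces become '+').
    query = urlencode({"q": terms, "page": page, "orderby": orderby})
    return "http://thepiratebay.org/search.php?" + query
```

For example, `build_search_url("foo bar")` gives `http://thepiratebay.org/search.php?q=foo+bar&page=0&orderby=se`.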

The table we want has the id searchResult.

It starts with a thead row, followed by the result rows.

Each result row has the following cells (shown with example markup):

  • <category link>: "<a href="CATLINK" title="More from this category">CATEGORY</a>"
  • <detail link and title>: "<a href="DESCLINK" class="detLink" title="Details for TITLE">TITLE</a>"
  • <date and time of upload>: "DATE&nbsp;TIME"
  • <download link>: "<a href="DLLINK" title="Download this torrent"><img src="IMGLINK" class="dl" alt="Download" /></a>"
  • <size>: "SIZENUM&nbsp;SIZEUNIT"
  • <seeders>: "SENUM"
  • <leechers>: "LENUM"

So, from that, we want to extract CATEGORY, TITLE, DATE, TIME, DLLINK, SIZENUM, SIZEUNIT, SENUM, LENUM. Should be a cinch.
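
The DATE&nbsp;TIME and SIZENUM&nbsp;SIZEUNIT cells each pack two values around a non-breaking space, so they need splitting after entity decoding. A Python sketch using `html.unescape`; the unit table (binary KiB/MiB/GiB multiples) is an assumption, since I haven't checked exactly how the site spells its units.

```python
import html

# Assumed unit labels and sizes; the site's actual spelling is unverified.
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def split_nbsp(cell):
    """Split an 'A&nbsp;B' cell into ('A', 'B'); accepts raw or decoded text."""
    left, sep, right = html.unescape(cell).partition("\xa0")
    if not sep:
        raise ValueError(f"expected a non-breaking space in {cell!r}")
    return left.strip(), right.strip()


def size_in_bytes(cell):
    """Turn a 'SIZENUM&nbsp;SIZEUNIT' cell into an approximate byte count."""
    num, unit = split_nbsp(cell)
    if unit not in UNIT_BYTES:
        raise ValueError(f"unknown size unit {unit!r}")
    return int(float(num) * UNIT_BYTES[unit])
```
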

The last row is empty in the source, but JavaScript fills it in to become the page selector. I wouldn't worry about it for now.
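
Putting it together, here is a minimal scraper sketch for the table using Python's standard `html.parser` (standing in for HTML::TokeParser::Simple). It assumes the markup looks exactly as described above: thead first, no nested tables, seven well-formed td cells per result row. It skips rows with fewer cells (which covers the empty page-selector row) and raises ValueError if table#searchResult is missing, so a layout change shows up as an error rather than an empty result list. The result dict keys are my own naming.

```python
from html.parser import HTMLParser


class SearchResultParser(HTMLParser):
    """Collects the <td> cells of each body row of table#searchResult."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as "\xa0"
        self.found_table = False
        self.in_table = False
        self.in_thead = False
        self.rows = []    # each row is a list of cells
        self.row = None
        self.cell = None  # {"text": [...], "links": [attrs of each <a>]}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if not self.in_table:
            if tag == "table" and attrs.get("id") == "searchResult":
                self.in_table = self.found_table = True
        elif tag == "thead":
            self.in_thead = True
        elif tag == "tr" and not self.in_thead:
            self.row = []
        elif tag == "td" and self.row is not None:
            self.cell = {"text": [], "links": []}
        elif tag == "a" and self.cell is not None:
            self.cell["links"].append(attrs)

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "table":
            self.in_table = False
        elif tag == "thead":
            self.in_thead = False
        elif tag == "td" and self.cell is not None:
            self.row.append(self.cell)
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell["text"].append(data)


def parse_search_results(page_html):
    parser = SearchResultParser()
    parser.feed(page_html)
    parser.close()
    if not parser.found_table:
        # Fail loudly so a page redesign gets noticed.
        raise ValueError("table#searchResult not found; layout may have changed")
    results = []
    for cells in parser.rows:
        if len(cells) < 7:
            continue  # e.g. the empty row that becomes the page selector
        text = ["".join(c["text"]).strip() for c in cells]
        date, _, time_ = text[2].partition("\xa0")
        sizenum, _, sizeunit = text[4].partition("\xa0")
        hrefs = [a["href"] for a in cells[3]["links"] if a.get("href")]
        results.append({
            "category": text[0], "title": text[1],
            "date": date, "time": time_,
            "dllink": hrefs[0] if hrefs else None,
            "sizenum": sizenum, "sizeunit": sizeunit,
            "seeders": int(text[5]), "leechers": int(text[6]),
        })
    return results
```
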

Note that all of this is likely to change as The Pirate Bay changes how they do things to be more AJAXy and APIy. Hopefully when that occurs, this will all fail gracefully, we'll know, and we can fix it to use the new APIs.