FUPS: Forum user-post scraper
Enter settings for your phpBB forum
To retrieve your posts, fill in the settings below (optionally after reading the questions and answers beneath the settings form), then click "Retrieve posts!". A status page will appear and will update progress automatically in a status box. When scraping is complete, links to the results file(s) will be shown.
Fields marked with a single asterisk are always required. Where several fields are marked with a double asterisk, at least one of them is required.
Answers to possible questions
How can I know if a forum is a phpBB forum?
Typically, phpBB forums can be identified by the presence of the text "Powered by phpBB" in the footer of their forum pages. It is possible, however, that these footer texts have been removed by the administrator of the forum. In this case, the only way to know for sure is to contact your forum administrator.
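As a rough illustration of the footer check described above, here is a minimal Python sketch. The function name looks_like_phpbb is hypothetical and is not part of FUPS. It assumes you already have the HTML of a forum page as a string. A result of False does not prove that the forum is not phpBB, since the administrator may have removed the footer text.

```python
import re

def looks_like_phpbb(html: str) -> bool:
    """Return True if the page text contains a "Powered by phpBB" credit."""
    # Replace tags with spaces so that markup between the words, such as
    # a link wrapped around "phpBB", does not hide the phrase.
    text = re.sub(r"<[^>]+>", " ", html)
    # Collapse runs of whitespace, then search case-insensitively.
    text = re.sub(r"\s+", " ", text)
    return re.search(r"powered by phpbb", text, re.IGNORECASE) is not None
```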
Does the script work with forums using a language other than English?
Yes, or at least, it's intended to: if you experience problems, please contact me.
Do I need to supply a login username and password?
Probably not. These are the conditions under which you do:
- You do not supply a value for the Extract User Username setting, and the phpBB board you're retrieving from requires login before it will display member information.
- Your local timezone (configured in your board preferences) is different to the board's default timezone, and you wish for all dates and times displayed against your posts to be in your local timezone.
- You are retrieving posts from a private forum.
Is it safe to supply my login username and password?
You will need to use your judgement here. I have attempted to make it as safe as possible without compromising simplicity. Your username and password, along with all other settings, will be stored in one or two files in a private directory (i.e. not accessible via the web) on my web hosting account for no longer than three days: a scheduled task runs once a day and deletes files more than two days old. In addition, once the script has finished, or if you cancel it, you will be presented with an option to immediately delete all files associated with your request. I will never look inside the temporary files containing your username/password.
If this doesn't satisfy you, you might consider temporarily changing your password for the script, and then changing it back again once the script has finished.
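The three-day upper bound follows from the deletion schedule: a file that turns two days old just after one daily run survives until the next run, roughly a day later. The Python sketch below shows one way such a cleanup task could look. It is an assumption for illustration only; the function name, the use of file modification times, and the directory argument are hypothetical, and the actual FUPS task may be implemented differently.

```python
import os
import time
from typing import List, Optional

MAX_AGE_SECONDS = 2 * 24 * 60 * 60  # "more than two days old"

def delete_stale_files(directory: str, now: Optional[float] = None) -> List[str]:
    """Delete regular files in `directory` last modified more than two days ago.

    Returns the sorted names of the files that were deleted.
    """
    now = time.time() if now is None else now
    # Read the directory listing fully before deleting anything.
    with os.scandir(directory) as it:
        entries = list(it)
    deleted = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > MAX_AGE_SECONDS:
            os.remove(entry.path)
            deleted.append(entry.name)
    return sorted(deleted)
```

Run once a day, a task like this removes every file within at most two days plus one run interval of its last modification.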
Is it safe to retrieve posts from a private forum through this script?
Your username and password are as safe as the previous answer describes. The content of your posts (the output file) is slightly less safe in that this output file is publicly accessible - but only to those who know the 32-character random token associated with it, and only until it is deleted either by you after you have saved it, or by the daily scheduled deletion task. As with usernames and passwords, I will never look inside the temporary file containing your posts' content.
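To give a sense of how hard such a token is to guess, here is a minimal Python sketch that generates a 32-character random token. It assumes a hexadecimal alphabet, which is a guess for illustration; the source does not say which characters FUPS tokens use, and the function name is hypothetical.

```python
import secrets

def new_output_token() -> str:
    """Return a 32-character random token (assumed hexadecimal here)."""
    # 16 random bytes rendered as hexadecimal give exactly 32 characters.
    return secrets.token_hex(16)
```

Under that assumption there are 16^32 = 2^128 possible tokens, so stumbling on a valid output URL by guessing is not practical.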
Which skins are supported?
Both the prosilver and subsilver skins are supported. The script probably won't work with customised skins, but if you desire support for such a skin (the usual symptom is error messages about regular expressions failing), feel free to contact me. A workaround is to set your skin to either prosilver or subsilver in the user control panel of your phpBB forum whilst you are logged in, and then to supply your login credentials in the settings above. After running FUPS, you can revert your skin to whatever it was before in the user control panel.
How long will the process take?
It depends on how many posts are to be retrieved, and how many pages they are spread across. You can expect to wait roughly one hour to extract and output 1,000 posts.
Are images supported?
Yes when scraping based on "Extract User ID"; no when scraping based on "Forum IDs". In the case of the former: if you check "Scrape images" (checked by default), then images are downloaded along with the posts. If not, then all relative image URLs are converted to absolute URLs, so images will display in the HTML output files so long as you are online at the time of viewing those files. Note that if you wish to scrape images which are attached to posts, then you will also need to check "Scrape attachments". Note however that attachments are not supported on all skins: if the version of the phpBB software that your forum is running is old then FUPS might not scrape attachments even if you do check "Scrape attachments".
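The conversion of relative image URLs to absolute ones can be illustrated with Python's standard urljoin function. This is a sketch only: the function name absolutise and the forum.example.com addresses are made up for the example, and FUPS itself may perform the conversion differently.

```python
from urllib.parse import urljoin

def absolutise(page_url: str, src: str) -> str:
    """Resolve an image's src attribute against the URL of the page it is on."""
    # urljoin leaves already-absolute URLs unchanged and resolves
    # relative ones against the page's URL.
    return urljoin(page_url, src)

page = "https://forum.example.com/phpBB3/viewtopic.php?t=1"
print(absolutise(page, "./images/smilies/icon_smile.gif"))
# https://forum.example.com/phpBB3/images/smilies/icon_smile.gif
print(absolutise(page, "https://cdn.example.com/pic.png"))
# https://cdn.example.com/pic.png
```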
Is the downloading of attachments supported?
Yes when scraping based on "Extract User ID"; no when scraping based on "Forum IDs". In the case of the former: if you check "Scrape attachments" (checked by default), then attachments are downloaded along with the posts. Note however that attachments are not supported on all skins: if the version of the phpBB software that your forum is running is old then FUPS might not scrape attachments even if you do check "Scrape attachments".
Why is this script so slow?
So as to avoid hammering other people's web servers, the script pauses for five seconds between page retrievals.
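The effect of that pause on running time can be sketched in Python as follows. This is an illustration under stated assumptions, not the FUPS code: fetch_politely and minimum_pause_time are hypothetical names, and the fetch and sleep arguments are placeholders that the caller supplies.

```python
import time
from typing import Callable, Iterable, List

PAUSE_SECONDS = 5  # the pause between page retrievals described above

def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str],
                   sleep: Callable[[float], None] = time.sleep,
                   pause: float = PAUSE_SECONDS) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive retrievals."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(pause)  # wait before every retrieval except the first
        pages.append(fetch(url))
    return pages

def minimum_pause_time(page_count: int, pause: float = PAUSE_SECONDS) -> float:
    """Total seconds spent pausing when retrieving page_count pages."""
    return max(page_count - 1, 0) * pause
```

For example, minimum_pause_time(100) is 495, so retrieving 100 pages spends over eight minutes in pauses alone, before any download or processing time.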
Does this script have any relationship with the PHPBB-Extract script on GitHub?
No, they are separate projects.
Are there any resource issues of which I should be aware?
Yes. Because this site is hosted on a shared server, I am limited to a fixed and fairly small number of processes, and each run of FUPS requires two of them: one for the background process doing the scraping, and another for the status web page. For most users, too, the number of posts is significant, so the process will run for some time. Please, then, limit yourself to one run of the script at a time, and if you change your mind about wanting to run the script after having clicked "Retrieve posts!", then please click the cancellation link.