Submission Comments
All Flags
These are all the flags that may be used when scraping submission comments.
[-c <submission_url> <n_results>]
[--raw]
Usage
poetry run Urs.py -c <submission_url> <n_results>
Submission metadata will be included in the submission_metadata field and includes the following attributes:
author
created_utc
distinguished
edited
is_original_content
is_self
link_flair_text
locked
nsfw
num_comments
permalink
score
selftext
spoiler
stickied
subreddit
title
upvote_ratio
If the submission contains a gallery, the attributes gallery_data and media_metadata will be included.
Comments are written to the comments field. They are sorted by "Best", which is the default sorting option when you visit a submission.
PRAW returns submission comments in level order, which means scrape time grows with the submission's popularity: the more comments a submission has, the longer the scrape takes.
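As a rough illustration of the two fields described above, the Python sketch below summarizes an object that holds submission_metadata and comments. The summarize helper and the sample values are hypothetical, and the sketch assumes both fields sit side by side in one object with comments stored as a list; the exact nesting inside an exported file is not described here.

```python
from typing import Any


def summarize(scrape: dict[str, Any]) -> str:
    """Return a one-line summary of a scrape object.

    Assumes `scrape` holds the `submission_metadata` and `comments`
    fields side by side. This layout is an assumption for illustration.
    """
    metadata = scrape["submission_metadata"]
    comments = scrape["comments"]
    return (
        f"{metadata['title']} by {metadata['author']} "
        f"in r/{metadata['subreddit']}: "
        f"{len(comments)} comment entries exported, "
        f"{metadata['num_comments']} reported by Reddit"
    )


# Hypothetical sample data using a few of the attributes listed above.
sample = {
    "submission_metadata": {
        "author": "example_user",
        "title": "Example post",
        "subreddit": "learnpython",
        "num_comments": 3,
    },
    "comments": [{"body": "first"}, {"body": "second"}],
}

print(summarize(sample))
```

To try this on a real export, load the file with json.load and pass in whichever object contains the two fields.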
File Naming Conventions
The file names will generally follow this format:
[POST_TITLE]-[N_RESULTS]-result(s).json
Scrape data is exported to the comments directory.
Number of Comments Returned
You can scrape all comments from a submission by passing in 0 for <n_results>. In that case, [N_RESULTS]-result(s) in the file name will be replaced with all.
Otherwise, specify the number of results you want returned. The structured export will then return up to <n_results> top-level comments, each with all of its replies.
Structured Comments
This is the default export style. Structured scrapes resemble comment threads on Reddit. This style takes slightly longer to export than the raw format because URS uses depth-first search to create the comment Forest after retrieving all comments from a submission.
If you want to learn more about how it works, refer to The Forest, where I describe how I implemented the Forest, and Speeding up Python With Rust to learn about how I drastically improved the performance of the Forest by rewriting it in Rust.
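To give a feel for the idea, here is a minimal Python sketch of building a comment tree from a flat, level-ordered list by using depth-first search to locate each comment's parent. It is not the actual URS implementation: CommentNode, find_node, build_forest, and the id, parent_id, and body keys are hypothetical names, and the IDs are simplified compared to real Reddit IDs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommentNode:
    """A node in the comment tree (hypothetical shape)."""

    id: str
    body: str = ""
    replies: list["CommentNode"] = field(default_factory=list)


def find_node(root: CommentNode, target_id: str) -> Optional[CommentNode]:
    """Iterative depth-first search for the node with `target_id`."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.id == target_id:
            return node
        # Push replies in reverse so the first reply is visited first.
        stack.extend(reversed(node.replies))
    return None


def build_forest(submission_id: str, flat_comments: list[dict]) -> CommentNode:
    """Attach each comment under its parent, located via depth-first search.

    Assumes `flat_comments` is in level order, so a parent always
    appears before its replies.
    """
    root = CommentNode(id=submission_id)
    for comment in flat_comments:
        parent = find_node(root, comment["parent_id"])
        if parent is None:
            raise ValueError(f"parent {comment['parent_id']} not found")
        parent.replies.append(CommentNode(id=comment["id"], body=comment["body"]))
    return root


# Hypothetical level-ordered input: top-level comments first, then replies.
flat = [
    {"id": "a", "parent_id": "post", "body": "top-level 1"},
    {"id": "b", "parent_id": "post", "body": "top-level 2"},
    {"id": "c", "parent_id": "a", "body": "reply to a"},
    {"id": "d", "parent_id": "c", "body": "reply to c"},
]
forest = build_forest("post", flat)
```

Each insertion searches the tree built so far, which is one plausible reason a structured export costs more than simply writing out the flat list.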
Raw Comments
Raw scrapes do not resemble comment threads. Instead, they return all comments on a submission in level order: all top-level comments are listed first, followed by all second-level comments, then third-level comments, and so on.
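Level order is the same as a breadth-first traversal of the comment tree. The short Python sketch below shows the ordering on a small made-up thread; the level_order helper and the body and replies keys are assumptions for illustration, not the URS export schema.

```python
from collections import deque


def level_order(top_level: list[dict]) -> list[str]:
    """Return comment bodies in level order (breadth-first)."""
    ordered = []
    queue = deque(top_level)
    while queue:
        comment = queue.popleft()
        ordered.append(comment["body"])
        # Replies go to the back of the queue, so a whole level is
        # emitted before any comment on the next level.
        queue.extend(comment.get("replies", []))
    return ordered


# Hypothetical thread: two top-level comments with nested replies.
thread = [
    {"body": "A", "replies": [{"body": "A1", "replies": [{"body": "A1a"}]}]},
    {"body": "B", "replies": [{"body": "B1"}]},
]

print(level_order(thread))  # ['A', 'B', 'A1', 'B1', 'A1a']
```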
You can export to raw format by including the --raw flag. -raw will also be appended to the end of the file name.
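The naming rules described in this document can be combined into a small Python sketch. The export_filename function is hypothetical, and two details are assumptions rather than documented behavior: that "result(s)" means "result" for a single result and "results" otherwise, and that -raw is placed before the .json extension. Any cleanup URS applies to the post title is not modeled.

```python
def export_filename(post_title: str, n_results: int, raw: bool = False) -> str:
    """Build an export file name following the documented pattern.

    Assumptions (not confirmed by the documentation):
    - "result(s)" is singular for 1 and plural otherwise
    - "-raw" goes before the ".json" extension
    """
    if n_results == 0:
        # Passing 0 scrapes every comment, so the count becomes "all".
        count = "all"
    else:
        noun = "result" if n_results == 1 else "results"
        count = f"{n_results}-{noun}"
    suffix = "-raw" if raw else ""
    return f"{post_title}-{count}{suffix}.json"


print(export_filename("Example post", 0))
print(export_filename("Example post", 25, raw=True))
```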