May 4, 2011

Extracting complete material from ready-made sources | CyberSEO Pro | Support Forum

Extracting complete material from ready-made sources
August 9, 2024
10:35 pm
Jessy
Member
Forum Posts: 4
Member Since: August 6, 2024

Hi, I use the source:

https://news.bitcoin.com/feed/


Importing the short articles works fine, but I can't pull the full text; I get the error "Operation failed. Unable to retrieve full-text content from....".
The "Extract full text articles" setting is selected.
How can I pull the full article?

[09-08-24 08:29:06] Processing a new post: Login to see this link
[09-08-24 08:29:06] Apply post filtering
[09-08-24 08:29:06] Done
[09-08-24 08:29:06] Checking for duplicate by link
[09-08-24 08:29:06] Trying to extract full text article with Full-Text RSS script
[09-08-24 08:29:07] Operation failed. Unable to retrieve full-text content from Login to see this link
[09-08-24 08:29:07] The post will not be added

[09-08-24 08:29:07] Processing a new post: Login to see this link
[09-08-24 08:29:07] Apply post filtering
[09-08-24 08:29:07] Done
[09-08-24 08:29:07] Checking for duplicate by link
[09-08-24 08:29:07] Trying to extract full text article with Full-Text RSS script
[09-08-24 08:29:08] Operation failed. Unable to retrieve full-text content from Login to see this link
[09-08-24 08:29:08] The post will not be added

[09-08-24 08:29:08] 0 posts were added.

August 9, 2024
10:57 pm
CyberSEO
Admin
Forum Posts: 3913
Member Since: July 2, 2009

The article you're trying to extract is protected by anti-scraping (JavaScript rendering) measures. These measures may block or obfuscate the content when extraction tools attempt to access it, which is why you get the "Operation failed. Unable to retrieve full-text content" error.

This limitation is typical of many sites that try to protect their content from automated extraction.

This article cannot be accessed or read in a browser without JavaScript enabled. The site relies on JavaScript to dynamically load and display content, so extraction tools that don't run JavaScript will have difficulty retrieving the full text. If JavaScript is disabled, the page content won't load properly, further preventing access to the article.
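
One rough way to check whether a page falls into this category is to measure how much readable text its raw HTML contains before any script has run. The sketch below is only an illustration: `static_text`, `looks_js_rendered`, and the 200-character threshold are hypothetical names and values chosen for this example, not part of CyberSEO Pro or the Full-Text RSS script.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text a non-JavaScript client would see in the markup."""

    # Content of these elements is never shown as article text.
    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Return the text present in the HTML without executing any script."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_js_rendered(html: str, min_chars: int = 200) -> bool:
    """Heuristic (assumed threshold): very little static text suggests
    that the article body is filled in later by JavaScript."""
    return len(static_text(html)) < min_chars
```

If `looks_js_rendered` returns `True` for the HTML of an article page, an extractor that does not run JavaScript will most likely find nothing to extract there. This is a heuristic only; it cannot tell a JavaScript-rendered page from one that is simply short.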

August 9, 2024
11:03 pm
Jessy

Won't this solution help either? Login to see this link
If it doesn't, do I need to build a custom solution that runs the JavaScript and generates my own RSS feed, something like a parser in Python?
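
For illustration, the feed-generating half of such a custom solution could look roughly like the sketch below. It assumes the articles have already been extracted by some other means; `build_rss` and the `title`, `link`, `html`, and `published` keys are hypothetical names chosen for this example, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title: str, channel_link: str, items: list) -> bytes:
    """Build a minimal RSS 2.0 document from already-extracted articles.

    Each item is a dict with the (hypothetical) keys
    'title', 'link', 'html' and 'published' (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title

    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # ElementTree escapes the markup, so the HTML travels as text.
        ET.SubElement(node, "description").text = item["html"]
        # RSS 2.0 expects RFC 822 style dates.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])

    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    demo = build_rss(
        "Demo feed",
        "https://example.com/",
        [{
            "title": "Example article",
            "link": "https://example.com/a",
            "html": "<p>Full text goes here.</p>",
            "published": datetime(2024, 8, 9, tzinfo=timezone.utc),
        }],
    )
    print(demo.decode("utf-8"))
```

The resulting file would be served from your own server and its URL used as the feed source. Whether the full text placed in `description` is picked up as-is depends on the plugin settings, which this sketch does not cover.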

August 9, 2024
11:13 pm
CyberSEO

CyberSEO Pro, like any other WordPress plugin, is written in PHP and cannot execute JavaScript by itself. If you need to extract content that requires JavaScript execution, you have a few options:

  1. Use a third-party service that can handle this for you. For example, services like Login to see this link offer JavaScript rendering as part of their features.
  2. Set up a custom solution on your server by installing something like Login to see this link to run JavaScript in a headless browser environment and extract the content you need.
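
As a rough sketch of option 2: the tool linked there is not visible in this copy of the thread, so Playwright for Python is used below purely as an assumed stand-in (it needs `pip install playwright` and `playwright install chromium`). The helper names `render_with_headless_browser` and `fetch_html` are hypothetical.

```python
from typing import Callable


def render_with_headless_browser(url: str, timeout_ms: int = 30_000) -> str:
    """Return the page HTML after JavaScript has run.

    Assumption: Playwright is installed. It is imported lazily so the rest
    of this module can be used without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles so scripted content is loaded.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def fetch_html(
    url: str,
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    is_complete: Callable[[str], bool],
) -> str:
    """Try the cheap static fetch first; fall back to rendering only
    when the static HTML does not contain the article."""
    html = fetch_static(url)
    if is_complete(html):
        return html
    return fetch_rendered(url)
```

In practice `fetch_rendered` would be `render_with_headless_browser`, and `is_complete` any check you trust for that site, for example looking for the article container. Rendering every page in a browser is slow, which is why the static attempt comes first.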
August 15, 2024
12:35 pm
Jessy

Is JavaScript support needed here too?

https://coinedition.com/news/

Log

[15-08-24 10:27:35] Feed URL: Login to see this link
[15-08-24 10:27:40] Processing a new post: Login to see this link
[15-08-24 10:27:40] Apply post filtering
[15-08-24 10:27:40] Done
[15-08-24 10:27:40] Checking for duplicate by link
[15-08-24 10:27:40] The post already exists
[15-08-24 10:27:40] Skipping

[15-08-24 10:27:40] 0 posts were added.

SiteConf 

body: //h1//ya-tr-span
body: //div[contains(concat(' ',normalize-space(@class),' '),' ce-single-post-featured-img-block ')]//img
body: //ul[contains(concat(' ',normalize-space(@class),' '),' wp-block-list ')]
body: //div[contains(concat(' ',normalize-space(@class),' '),' ce-single-post-content-block ')]
test_url: Login to see this link
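
For readers unfamiliar with the `contains(concat(' ',normalize-space(@class),' '),' name ')` pattern used in these rules: it matches a whole class token rather than a substring of the `class` attribute. The Python sketch below mirrors that logic outside XPath; `normalize_space` and `has_class_token` are hypothetical helper names written only to illustrate it.

```python
import re

# XPath 1.0 whitespace: space, tab, carriage return, line feed.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Mimic XPath normalize-space(): collapse whitespace runs, trim the ends."""
    return _XPATH_WS.sub(" ", value).strip(" ")


def has_class_token(class_attr: str, token: str) -> bool:
    """Equivalent of
    contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    return f" {token} " in f" {normalize_space(class_attr)} "


if __name__ == "__main__":
    wanted = "ce-single-post-content-block"
    # Exact token among several classes: matches.
    print(has_class_token("post  ce-single-post-content-block", wanted))
    # Longer class name that merely starts with the token: does not match.
    print(has_class_token("ce-single-post-content-block-wide", wanted))
```

A plain `contains(@class, 'name')` would also match the longer class name in the second case, which is why the padded form is used. Note that these rules are evaluated against the HTML the extractor receives. An element such as `ya-tr-span` looks like markup added by an in-browser translator rather than by the site itself (an assumption, not verified here), so a rule that depends on it may match nothing outside that browser.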

WP G???bbe? copes with this site without third-party solutions. I'm trying to figure out how to get CyberSEO to handle it as well.

Can I configure CyberSEO to pick up posts from drafts on a schedule, for example checking the drafts every 15 minutes, and then rewrite them through AI automatically?

August 15, 2024
6:50 pm
CyberSEO

Login to see this link

To understand what RSS feeds are and how they differ from regular HTML Web pages, please read this: Login to see this link
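
As a quick illustration of that difference: a feed is an XML document whose root element is `rss` (RSS 2.0), `feed` in the Atom namespace, or `RDF` (RSS 1.0), whereas a news listing URL normally returns an HTML page. The sketch below checks which of the two a downloaded document is; `is_feed` is a hypothetical helper name and the function uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def is_feed(document: bytes) -> bool:
    """Return True if the document parses as XML with a feed root element."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Typical HTML is not well-formed XML, so it ends up here.
        return False
    return root.tag in ("rss", ATOM_FEED, RDF_ROOT)


if __name__ == "__main__":
    rss = b'<?xml version="1.0"?><rss version="2.0"><channel/></rss>'
    page = b"<!DOCTYPE html><html><body><p>News<br></p></body></html>"
    print(is_feed(rss), is_feed(page))
```

A URL for which this returns `False` is a regular web page and cannot be used directly as a feed source.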

August 15, 2024
7:40 pm
Jessy

RSS items are displayed automatically; can they be displayed as HTML, the way a user sees them?

And how can I set up this way of working?

Login to see the quote

August 15, 2024
7:45 pm
CyberSEO

Please be more specific about your first question; it's unclear.

Regarding your second question: the plugin doesn't rewrite articles that have already been generated. Why not set it to rewrite them while they are being imported?

If you want to save them as drafts, select the appropriate option in the Login to see this link

Forum Timezone: Europe/Amsterdam
