10:56 am
July 27, 2023
The script does not guarantee that the full-text article will be extracted from every web page. Some pages have a complicated layout that cannot be parsed automatically. In that case, it is recommended to use the container tag for article extraction, as described in this article: [link]
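To illustrate what extraction by container tag means, here is a rough Python sketch. It is not the plugin's actual code: the `ContainerExtractor` class and the `extract_container` helper are hypothetical names, and the rule of treating `class` as a set of space-separated tokens is an assumption made for the example.

```python
from html.parser import HTMLParser


class ContainerExtractor(HTMLParser):
    """Collect the inner HTML of the first element matching a tag and attributes."""

    def __init__(self, tag, attrs):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.wanted = attrs
        self.depth = 0       # nesting depth of `tag` inside the matched container
        self.found = False   # True once the container's opening tag was seen
        self.done = False    # True once the container's closing tag was seen
        self.parts = []

    def _matches(self, attrs):
        present = dict(attrs)
        for name, value in self.wanted.items():
            if name == "class":
                # Assumption: every wanted class token must be present.
                have = set((present.get("class") or "").split())
                if not set(value.split()) <= have:
                    return False
            elif present.get(name) != value:
                return False
        return True

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if tag == self.tag and self._matches(attrs):
                self.depth = 1
                self.found = True
            return
        if tag == self.tag:
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_container(html, tag, attrs):
    """Return the container's inner HTML, or None if no element matched."""
    parser = ContainerExtractor(tag, attrs)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None
```

For example, `extract_container(page, "div", {"id": "block-system-main"})` would return everything between that div's opening tag and its matching closing tag, or `None` when no such div exists on the page.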
1:04 pm
July 27, 2023
I'm trying to get the content of the page [link] by specifying the attributes of the div container {"id": "block-system-main", "class": " block block-system"}, but I'm getting this error:
[31-07-23 11:01:33] Processing a new post: [link]
[31-07-23 11:01:33] Checking for duplicate by link
[31-07-23 11:01:33] Trying to extract full text article
[31-07-23 11:01:33] Tag specified: <div>
[31-07-23 11:01:33] Attributes specified: {"id": "block-system-main", "class": " block block-system"}
[31-07-23 11:01:34] Operation failed. Unable to retrieve full-text content from [link]
[31-07-23 11:01:34] The post will not be added.
[31-07-23 11:01:34] 0 posts were added.
What am I doing wrong?
1:50 pm
July 27, 2023
Thank you, now I can extract the text of the article.
But the original formatting is preserved in the article.
I'm trying to remove the corresponding container by specifying div with the attributes {"class": "with-sidebar-first col-12 col-sm-12 col-md-12 col-lg-9 col-xl-9"} in the "Remove outer HTML elements" parameter, but after that the encoding of the article breaks.
[link]
The page code looks like this: [link]
There is no such class as "with-sidebar-first col-12 col-sm-12 col-md-12 col-lg-9 col-xl-9". These are 6 different classes. Also, what do you mean by "formatting"? These are just classes; by themselves they do not format the article in any way. The CSS styles attached to them do. If you don't import the CSS file, they have no effect.
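To make the point about the six classes concrete, here is a tiny Python illustration (the helper names `class_tokens` and `has_class` are made up for this example). An HTML `class` attribute is a whitespace-separated list, so the value splits into individual class names:

```python
def class_tokens(class_value):
    """Split a class attribute value into its individual class names."""
    return class_value.split()


def has_class(class_value, name):
    """True if `name` is one of the classes listed in the attribute value."""
    return name in class_tokens(class_value)


value = "with-sidebar-first col-12 col-sm-12 col-md-12 col-lg-9 col-xl-9"
tokens = class_tokens(value)
```

Here `tokens` holds six separate names, and `has_class(value, "with-sidebar-first")` is true, while the whole string is not the name of any single class.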
If you want to remove part of the HTML code (for example, class="with-sidebar-first col-12 col-sm-12 col-md-12 col-lg-9 col-xl-9"), you can do it with custom PHP code, as described in this article: [link]
[code]
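The original snippet is not visible here, so the following is only a sketch of the idea in Python, not the code from the post. The function name `strip_class_attributes` is hypothetical, and the regular expression assumes the attribute values are quoted. The same pattern should carry over to `preg_replace_callback` in custom PHP code.

```python
import re

# Matches a quoted class attribute inside a tag, including the leading whitespace.
CLASS_ATTR = re.compile(r'\s+class\s*=\s*(?:"[^"]*"|\'[^\']*\')', re.IGNORECASE)

# Matches one whole tag, so that text outside of tags is never touched.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_class_attributes(html):
    """Remove class="..." from every tag while leaving the tags themselves in place."""
    return ANY_TAG.sub(lambda m: CLASS_ATTR.sub("", m.group(0)), html)
```

This only drops the attribute. The `<div>` itself and everything inside it stay in the post.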
You can also remove any tags such as <div>, <p>, <strong>, etc.: [link]
If you want to remove an entire container with all its contents, you should use this tool: [link]
2:32 pm
July 27, 2023
Any part of the article, including the part described above in the second message, can be removed using the methods described in my previous post. I don't see any other problems. If something is wrong with the formatting of the article in your theme, I suggest modifying its CSS styles or using an alternative theme. For example, your post looks absolutely correct in all standard WordPress themes. The plugin does not render the posts in the browser. Your theme does.
Also keep in mind that some web pages may have errors in their HTML structure (e.g. a missing </div>). Such posts may look fine with some HTML layouts and be displayed incorrectly in others. I would suggest trying [link]. If that doesn't help, just remove all <div> elements from the syndicated posts.
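As a rough illustration of removing all <div> elements while keeping what is inside them, here is a Python sketch (the function name `strip_div_tags` is hypothetical, and this is not the plugin's own code). Because it deletes opening and closing tags independently, it also works when a closing </div> is missing. The same pattern should carry over to `preg_replace` in custom PHP code.

```python
import re

# Matches any opening or closing div tag, with or without attributes.
DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)


def strip_div_tags(html):
    """Delete every <div ...> and </div> tag but keep the content between them."""
    return DIV_TAG.sub("", html)
```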
3:01 pm
July 27, 2023
OK, you wrote: "If you want to remove an entire container with all its contents, you should use this tool: [link]"
I'm trying to remove the corresponding container by specifying div with the attributes {"class": "with-sidebar-first"} in the "Remove outer HTML elements" parameter. This is the container I want to remove.
As a result, the container was not removed and the encoding of the article broke.
[link]
[link]
What am I doing wrong?
As I mentioned above, if the imported HTML page is broken and is missing a closing tag, you won't be able to do anything with it using the standard tools. The only way to handle it is to write a regular expression for your particular case, like this:
[code]
Here is a good manual on regular expressions: [link]
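Since the regular expression from the post is not visible, here is a generic Python sketch of the approach, not the original code. It removes everything from the opening <div> that carries a given class up to and including an end marker that you pick from the page source. Both the `remove_block` helper and the marker are hypothetical, and the pattern assumes a double-quoted class attribute.

```python
import re


def remove_block(html, class_name, end_marker):
    """Remove the first div with `class_name` and everything up to `end_marker`.

    No matching closing </div> is needed, so this also works on HTML
    whose tags are not balanced.
    """
    pattern = re.compile(
        r'<div\b[^>]*\sclass\s*=\s*"(?:[^"]*\s)?'
        + re.escape(class_name)
        + r'(?:\s[^"]*)?"[^>]*>'
        + r".*?"
        + re.escape(end_marker),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html, count=1)
```

The end marker has to be some text that reliably follows the unwanted block on that particular site, which is why a regular expression like this must be written for each case.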