Text Purify API

Text Purify API extracts clean text from web pages by removing ads and irrelevant content, facilitating automated reading and processing.

About the API:  

Text Purify API is a cloud-based service that extracts the core content of web articles, returning clean text without ads, menus, sidebars and other non-essential elements. It suits applications that collect and analyse content from news sites, blogs, research publications and similar sources.

The API uses natural language processing (NLP) and machine learning techniques to identify the main body text of a page. It handles a wide variety of page formats and layout styles and works with content in different languages. The interface is simple and documented, so it can be integrated into existing applications and workflows, and responses are fast enough for real-time applications and large-scale data analysis.

 

What does this API receive and what does it provide (input/output)?

The Text Purify API receives a URL and optional settings, and provides clean text of the article, excluding ads, along with metadata such as title and author.

 

What are the most common use cases of this API?

  1. Extract the main text of articles from multiple news sources and present it on a unified platform, improving the user experience by avoiding ads and irrelevant content.

  2. Collect information from academic and research articles, so that researchers can analyse and review the essential content without the distraction of advertising.

  3. Build applications that generate concise summaries of web articles from only the main, relevant content, giving users more digestible versions of long texts.

  4. Let content curators extract and present only the most relevant text from articles and publications, so their audiences receive high-quality information without distracting elements.

  5. Extract relevant content from online reviews and articles for sentiment analysis, helping companies better understand public perception of their products or services.


Are there any limitations to your plans?

Basic Plan: 50 requests per minute.

Pro Plan: 100 requests per minute.

Pro Plus Plan: 240 requests per minute.

Premium Plan: 360 requests per minute.
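
The per-minute limits above translate into a minimum spacing between calls. The sketch below shows one way a client might pace itself. The PLAN_LIMITS table copies the figures listed here; min_interval and Throttle are hypothetical helper names, not part of the API, and even spacing is only an assumption about how the limit is enforced.

```python
import time

# Requests per minute for each plan, as listed above.
PLAN_LIMITS = {"Basic": 50, "Pro": 100, "Pro Plus": 240, "Premium": 360}


def min_interval(plan: str) -> float:
    """Seconds between calls so that a steady stream stays under the limit."""
    return 60.0 / PLAN_LIMITS[plan]


class Throttle:
    """Hypothetical helper that spaces calls evenly for a given plan."""

    def __init__(self, plan: str, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(plan)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the time slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following call one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling Throttle("Basic").wait() before each request would keep a single-threaded client at or below 50 requests per minute.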

API Documentation

Endpoints


To use this endpoint, provide the URL of the article to extract its main content, cleaning out advertisements and non-relevant elements.

 

word_per_minute (optional): the reading speed used to calculate the "time to read" value. The default is 300 words per minute; adjust it to match the reading speed you want to estimate for.

desc_truncate_len (optional): the maximum length of the generated description. The default is 210 characters; a longer description is truncated to this length.

desc_len_min (optional): the minimum character count for the description. The default is 180 characters; if the extracted description is shorter, the API returns "null".

content_len_min (optional): the minimum character count for the extracted content. The default is 200 characters; if the content is shorter, the API returns "null".
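
As a minimal sketch of how these parameters combine into a request URL, the helper below percent-encodes the article URL and appends only the options the caller overrides. The function name build_url and the OPTIONAL_PARAMS table are illustrative assumptions; the endpoint address and default values are the ones given in this documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://zylalabs.com/api/4949/text+purify+api/6229/article+extract"

# Optional parameters and their documented defaults (for reference and validation).
OPTIONAL_PARAMS = {
    "word_per_minute": 300,
    "desc_truncate_len": 210,
    "desc_len_min": 180,
    "content_len_min": 200,
}


def build_url(article_url: str, **overrides: int) -> str:
    """Return the request URL; options left out fall back to the API defaults."""
    unknown = set(overrides) - set(OPTIONAL_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # urlencode percent-encodes the article URL so its own "?" and "&"
    # characters cannot be confused with the API's query string.
    params = {"url": article_url, **overrides}
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, build_url("https://css-tricks.com/empathetic-animation/", word_per_minute=250) overrides only the reading speed.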



                                                                            
GET https://zylalabs.com/api/4949/text+purify+api/6229/article+extract
                                                                            
                                                                        

Article Extract - Endpoint Features

Object Description
url [Required]
word_per_minute [Optional]
desc_truncate_len [Optional]
desc_len_min [Optional]
content_len_min [Optional]

API EXAMPLE RESPONSE

       
                                                                                                        
                                                                                                                                                                                                                            {"error":0,"message":"Article extraction success","data":{"url":"https://ellzey.house.gov/2024/10/congressman-jake-ellzey-s-statement-on-fema-aid","title":"Congressman Jake Ellzey's Statement on FEMA Aid","description":"The Department of Homeland Security, under Secretary Mayorkas, has taken actions that make illegal immigration more attractive by reallocating funds that should be prioritized for disaster relief efforts. At...","links":["https://ellzey.house.gov/2024/10/congressman-jake-ellzey-s-statement-on-fema-aid"],"image":"https://ellzey.house.gov/vendor/_accounts/jakeellzey/_skins/062422/images/social_card.png","content":"<div>\n<article>\n<a></a>\n<div><p>The Department of Homeland Security, under Secretary Mayorkas, has taken actions that make illegal immigration more attractive by reallocating funds that should be prioritized for disaster relief efforts. At a time when FEMA is warning that they do not have enough funding to cover the rest of the hurricane season, money has been funneled into programs that provide aid to noncitizen migrants.</p>\r\n<p>Over $1 billion has been directed to programs like the Shelter and Services Program (SSP) and the Emergency Food and Shelter Program, which have been repurposed to support illegal immigrants. 
With 150,000 households already relying on FEMA aid after devastating hurricanes, this is a gross misallocation of resources.</p>\r\n<p>The current Administration needs to stop diverting taxpayer money to initiatives that encourage illegal immigration and instead focus on supporting the American people and their immediate needs during natural disasters.</p>\r\n<p>Here is what we know: </p>\r\n<ul>\r\n<li>Homeland Security Secretary Alejandro Mayorkas said Federal Emergency Management Agency (FEMA) can meet immediate needs but does not have enough funds for the rest of Hurricane season.</li>\r\n<ul>\r\n<li>Congress recently granted $20 Billion for FEMA’s disaster relief fund as part of the September continuing resolution.</li>\r\n<li>The Biden Administration has granted North Carolina additional aid in the recovery effort with a 100 percent federal cost share for debris removal and emergency protective measures for six months.</li>\r\n<li>150,000 households have registered for FEMA aid.</li>\r\n</ul>\r\n<li>The Shelter and Services Program (SSP) administered by FEMA provides financial support to non-federal agencies to provide humanitarian services to “noncitizen migrants.”</li>\r\n<ul>\r\n<li>FEMA, on their website, said they have funneled at least $1 billion into the program between FY23 and FY24.</li>\r\n<li>New York City’s Department of Homeless Services has given $4,000 in grants to 150 families to help illegal immigrants settle into permanent homes.</li>\r\n<li>The Emergency Food and Shelter Program, also under FEMA, was repurposed into a fund for Illegal immigrants. Many of these funds went to Catholic Charities on the border, totaling $13,937,331 in 2023.</li>\r\n</ul>\r\n</ul>\r\n<ul>\r\n<li>Secretary Mayorkas’ response is that SSP is a separate appropriated account from disaster relief and is not associated with those funding streams.</li>\r\n<ul>\r\n<li>On FEMA’s website, they claim, “No money is being diverted from disaster response needs. 
FEMA’s disaster response efforts and individual assistance are funded through the Disaster Relief Fund, which is a dedicated fund for disaster efforts. Disaster Relief Fund money has not been diverted to other, non-disaster related efforts.”</li>\r\n<li>The December 2022 consolidated funding bill authorizing the split-off program for spending on migrants vaguely described the purpose as for “providing shelter and other services to families and individuals encountered by the Department of Homeland Security.”</li>\r\n</ul>\r\n</ul>\n<p>######</p></div>\n</article>\n</div>","author":"@RepEllzey","favicon":"https://ellzey.house.gov/vendor/_accounts/jakeellzey/_skins/062422/images/favicon.ico","source":"ellzey.house.gov","published":"2024-10-07T04:00:00Z","ttr":86,"type":"article"}}
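
As the example response shows, the "content" field carries HTML markup rather than plain text. If plain text is needed, one option is to strip the tags on the client side. The sketch below uses Python's standard html.parser module; the content_to_text name and the choice of which tags end a line are illustrative assumptions, not behaviour of the API.

```python
from html.parser import HTMLParser

# Closing one of these tags ends a line of output (an illustrative choice).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "section", "article",
              "h1", "h2", "h3", "figcaption"}


class _TextCollector(HTMLParser):
    """Collects text nodes and inserts line breaks after block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")


def content_to_text(content_html: str) -> str:
    """Return the visible text of the "content" field, one block per line."""
    parser = _TextCollector()
    parser.feed(content_html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```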
                                                                                                                                                                                                                    
                                                                                                    

Article Extract - CODE SNIPPETS


curl --location --request GET 'https://zylalabs.com/api/4949/text+purify+api/6229/article+extract?url=https://css-tricks.com/empathetic-animation/&word_per_minute=300&desc_truncate_len=210&desc_len_min=180&content_len_min=200' --header 'Authorization: Bearer YOUR_API_KEY' 
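
For readers who prefer Python, the curl call above could be approximated with the standard library as follows. This is a sketch under stated assumptions: build_request and extract_article are hypothetical names, the 30-second timeout is an arbitrary choice, and only the required url parameter is sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://zylalabs.com/api/4949/text+purify+api/6229/article+extract"


def build_request(article_url: str, api_key: str) -> urllib.request.Request:
    """Prepare the GET request with the bearer token header; nothing is sent yet."""
    query = urlencode({"url": article_url})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_article(article_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_request(article_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without spending a call from the plan quota.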


    

To use this endpoint, provide the URL of the article. Its main content is extracted through a proxy, which makes it possible to extract from sites with access restrictions.

This additional endpoint can be helpful for extracting articles from websites that restrict access based on user geography or session.

When you call this endpoint, the extractor engine will randomly select a proxy agent from our pool, then attempt to load the target webpage through the chosen proxy.

Due to the nature of proxy servers, loading times may vary depending on the selected proxy's location and performance.
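
Because proxy loading times vary, a client may want to try the direct endpoint first and use the proxy endpoint only when that fails. The sketch below illustrates that idea. The fetch callable is supplied by the caller and is hypothetical, and treating a non-zero "error" value as failure is an assumption based on the example responses.

```python
from typing import Callable

API_ROOT = "https://zylalabs.com/api/4949/text+purify+api"
DIRECT = f"{API_ROOT}/6229/article+extract"
PROXY = f"{API_ROOT}/6230/article+proxy+extract"


def extract_with_fallback(article_url: str,
                          fetch: Callable[[str, str], dict]) -> dict:
    """Try the direct endpoint, then the proxy endpoint.

    fetch(endpoint, article_url) must perform the HTTP call and return the
    decoded JSON body. The last response seen is returned if both fail.
    """
    last: dict = {}
    for endpoint in (DIRECT, PROXY):
        try:
            last = fetch(endpoint, article_url)
        except OSError as exc:  # urllib.error.URLError is an OSError subclass
            last = {"error": 1, "message": str(exc)}
            continue
        if last.get("error") == 0:  # assumption: 0 signals success
            return last
    return last
```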

 



                                                                            
GET https://zylalabs.com/api/4949/text+purify+api/6230/article+proxy+extract
                                                                            
                                                                        

Article Proxy Extract - Endpoint Features

Object Description
url [Required]
word_per_minute [Optional]
desc_truncate_len [Optional]
desc_len_min [Optional]
content_len_min [Optional]

API EXAMPLE RESPONSE

       
                                                                                                        
                                                                                                                                                                                                                            {"error":0,"message":"Article extraction success","data":{"url":"https://cryptobriefing.com/fidelity-ethereum-etf-dtcc-listing/","title":"Fidelity's Ethereum spot ETF listed on DTCC under ticker $FETH","description":"Fidelity's spot Ethereum fund is now listed on DTCC under ticker $FETH following SEC's approval of multiple Ethereum ETFs.","links":["https://cryptobriefing.com/fidelity-ethereum-etf-dtcc-listing/"],"image":"https://static.cryptobriefing.com/wp-content/uploads/2024/05/29232455/img-HBnmOBf0yYWOnnbZiut1I8BO-800x457.jpg","content":"<div>\n            <section>\n            <h2>SEC's approval process for Ethereum ETFs underway, trading awaits S-1 filings.</h2>\n        </section>\n            <section>\n            <picture>\n                <source media=\"(min-width: 850px)\" srcset=\"https://static.cryptobriefing.com/wp-content/uploads/2024/05/29232455/img-HBnmOBf0yYWOnnbZiut1I8BO-800x457.jpg\"></source>\n                <img src=\"https://static.cryptobriefing.com/wp-content/uploads/2024/05/29232455/img-HBnmOBf0yYWOnnbZiut1I8BO-400x228.jpg\" alt=\"Fidelity's spot Ethereum ETF listed on DTCC under ticker $FETH\" title=\"Fidelity’s spot Ethereum ETF listed on DTCC under ticker $FETH\" />\n            </picture>\n        </section>\n    <section>\n        <p>Fidelity’s Ethereum spot ETF has been listed on the Depository Trust and Clearing Corporation (DTCC) under the ticker symbol $FETH. 
This development comes on the heels of the US Securities and Exchange Commission’s (SEC) <a href=\"https://cryptobriefing.com/sec-ethereum-etf-approval/\" target=\"_blank\">approval of spot Ethereum exchange-traded funds</a> (ETFs) on May 23.</p><figure><img src=\"https://static.cryptobriefing.com/wp-content/uploads/2024/05/29225708/Fidelity-Ethereum-ETF-on-DTCC.jpg\" /><figcaption>Fidelity’s Ethereum spot ETF is now listed on <a href=\"https://www.dtcc.com/products/cs/exchange_traded_funds_plain_new.php\" target=\"_blank\">DTCC</a></figcaption></figure><p>BlackRock’s Ethereum fund, iShares Ethereum Trust, is listed on the DTCC <a href=\"https://cryptobriefing.com/blackrock-ethereum-etf-dtcc/\" target=\"_blank\">under ticker $ETHA</a>. VanEck’s Ethereum ETF is listed <a href=\"https://cryptobriefing.com/vaneck-dtcc-ethereum-etf-listing/\" target=\"_blank\">under ticker $ETHV</a> and Franklin Templeton’s <a href=\"https://cryptobriefing.com/franklin-templeton-ethereum-etf-dtcc-listing/\" target=\"_blank\">under ticker $EZET</a>.</p><p>The SEC’s acceptance of the 19b-4 forms for the spot Ethereum ETFs marks a major step, although the commencement of trading awaits the approval of each ETF’s S-1 filing.</p><p>Discussions between the SEC and ETF issuers about the S-1 forms are reportedly <a href=\"https://cryptobriefing.com/sec-engages-ethereum-etf-issuers-s-1-forms/\" target=\"_blank\">underway</a>. However, the timeframe for the trading approval is uncertain, with projections ranging from weeks to months.</p><p>VanEck was among the first to submit an amended S-1 form on May 23, with BlackRock following suit with an <a href=\"https://cryptobriefing.com/blackrock-ethereum-etf-launch/\" target=\"_blank\">updated S-1 filing</a> today. 
The S-1 form serves as an initial registration document that must be filed with the SEC before a security can be offered to the public.</p>\n                                </section>\n    <section>\n                    <a href=\"https://cryptobriefing.com/disclaimer/\" target=\"_blank\">\n                Disclaimer            </a>\n    </section>\n</div>","author":"@crypto_briefing","favicon":"https://static.cryptobriefing.com/wp-content/uploads/2020/02/02093517/ios-144.png","source":"cryptobriefing.com","published":"2024-05-30T17:14:47+00:00","ttr":40,"type":"article"}}
                                                                                                                                                                                                                    
                                                                                                    

Article Proxy Extract - CODE SNIPPETS


curl --location --request GET 'https://zylalabs.com/api/4949/text+purify+api/6230/article+proxy+extract?url=https://cryptobriefing.com/fidelity-ethereum-etf-dtcc-listing/&word_per_minute=300&desc_truncate_len=210&desc_len_min=180&content_len_min=200' --header 'Authorization: Bearer YOUR_API_KEY' 


    

API Access Key & Authentication

After signing up, every developer is assigned a personal API access key, a unique combination of letters and digits used to access our API endpoints. To authenticate with the Text Purify REST API, include your key as a bearer token in the Authorization header.
Headers
Header Description
Authorization [Required] Should be Bearer access_key. See "Your API Access Key" above when you are subscribed.

Simple Transparent Pricing

No long-term commitment. Upgrade, downgrade, or cancel anytime. Free Trial includes up to 50 requests.

🚀 Enterprise

Starts at $10,000/year


  • Custom Volume
  • Specialized Customer Support
  • Real-Time API Monitoring

Customer favorite features

  • ✔︎ Only Pay for Successful Requests
  • ✔︎ Free 7-Day Trial
  • ✔︎ Multi-Language Support
  • ✔︎ One API Key, All APIs.
  • ✔︎ Intuitive Dashboard
  • ✔︎ Comprehensive Error Handling
  • ✔︎ Developer-Friendly Docs
  • ✔︎ Postman Integration
  • ✔︎ Secure HTTPS Connections
  • ✔︎ Reliable Uptime

Text Purify API FAQs

Use the API by providing a URL to extract the main content of the article. Set optional parameters to customise the extraction and formatting.

The Text Purify API cleans and extracts relevant text from web pages, removing ads and unwanted content, providing only the main text of the article.

There are different plans to suit everyone, including a free trial with a small number of requests, but the request rate is limited to prevent abuse of the service.

Zyla provides a wide range of integration methods for almost all programming languages. You can use these code snippets to integrate the API with your project as needed.

The API returns the extracted article as structured data: the main content together with metadata such as the title, description, author, image, source, and publication date.

The GET Article Extract endpoint returns the main content of an article, including the title, description, content, and metadata like the URL and image. The GET Article Proxy Extract endpoint provides similar data but through a proxy for restricted sites.

Key fields in the response include "url" (the article's link), "title" (the article's title), "description" (a brief summary), "content" (the main text), and "image" (a relevant image URL).

The response data is structured in JSON format, with an "error" field indicating success or failure, a "message" field for status updates, and a "data" object containing the extracted article details.
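
The envelope described above could be unpacked along the following lines. This is a sketch, not an official client: ExtractionError and unwrap are hypothetical names, and it assumes that an "error" value of 0 means success, as in the example responses. It also assumes that a description or content rejected by desc_len_min or content_len_min arrives as JSON null, which Python decodes to None.

```python
import json


class ExtractionError(Exception):
    """Raised when the envelope does not report success (hypothetical name)."""


def unwrap(body: str) -> dict:
    """Decode a response body and return selected fields of the "data" object."""
    envelope = json.loads(body)
    # Assumption: 0 means success, as in the example responses.
    if envelope.get("error") != 0:
        raise ExtractionError(envelope.get("message", "unknown error"))
    data = envelope.get("data") or {}
    return {
        "title": data.get("title"),
        "author": data.get("author"),
        "published": data.get("published"),
        "description": data.get("description"),  # may be None (see desc_len_min)
        "content": data.get("content"),          # may be None (see content_len_min)
        "ttr": data.get("ttr"),
    }
```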

Parameters include "word_per_minute" for reading speed, "desc_truncate_len" for maximum description length, "desc_len_min" for minimum description length, and "content_len_min" for minimum content length.

Users can customize requests by adjusting optional parameters to control reading speed, description length, and content length, allowing for tailored output based on specific needs.

Each endpoint provides the main article text, title, description, image, and links, enabling users to access comprehensive content without ads or irrelevant elements.

Data accuracy is maintained through advanced natural language processing and machine learning techniques that identify and extract relevant content while filtering out ads and non-essential elements.

Typical use cases include content curation, academic research, sentiment analysis, and creating summaries of articles, allowing users to focus on essential information without distractions.

General FAQs

Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.

Prices are listed in USD (United States Dollar), EUR (Euro), CAD (Canadian Dollar), AUD (Australian Dollar), and GBP (British Pound). We accept all major debit and credit cards. Our payment system uses the latest security technology and is powered by Stripe, one of the world’s most reliable payment companies. If you have any trouble paying by card, just contact us at [email protected]

Additionally, if you already have an active subscription in any of these currencies (USD, EUR, CAD, AUD, GBP), that currency will remain for subsequent subscriptions. You can change the currency at any time as long as you don't have any active subscriptions.

The local currency shown on the pricing page is based on the country of your IP address and is provided for reference only. The actual prices are in USD (United States Dollar). When you make a payment, the charge will appear on your card statement in USD, even if you see the equivalent amount in your local currency on our website. This means you cannot pay directly with your local currency.

Occasionally, a bank may decline the charge due to its fraud protection settings. We suggest contacting your bank first to check whether it is blocking our charges. You can also open the Billing Portal and change the card used for the payment. If neither option works and you need further assistance, please contact our team at [email protected]

Prices are determined by a recurring monthly or yearly subscription, depending on the chosen plan.

API calls are deducted from your plan based on successful requests. Each plan comes with a specific number of calls that you can make per month. Only successful calls, indicated by a Status 200 response, will be counted against your total. This ensures that failed or incomplete requests do not impact your monthly quota.

Zyla API Hub works on a recurring monthly subscription system. Your billing cycle starts on the day you purchase one of the paid plans and renews on the same day of the following month. Be sure to cancel your subscription beforehand if you want to avoid future charges.

To upgrade your current subscription plan, simply go to the pricing page of the API and select the plan you want to upgrade to. The upgrade will be instant, allowing you to immediately enjoy the features of the new plan. Please note that any remaining calls from your previous plan will not be carried over to the new plan, so be aware of this when upgrading. You will be charged the full amount of the new plan.

To check how many API calls you have left for the current month, refer to the ‘X-Zyla-API-Calls-Monthly-Remaining’ field in the response header. For example, if your plan allows 1,000 requests per month and you've used 100, this field in the response header will indicate 900 remaining calls.

To see the maximum number of API requests your plan allows, check the ‘X-Zyla-RateLimit-Limit’ response header. For instance, if your plan includes 1,000 requests per month, this header will display 1,000.

The ‘X-Zyla-RateLimit-Reset’ header shows the number of seconds until your rate limit resets. This tells you when your request count will start fresh. For example, if it displays 3,600, it means 3,600 seconds are left until the limit resets.
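
The three response headers described above can be read into a small summary after each call. The sketch below assumes the header values are plain integers, tolerating a thousands separator just in case; read_quota is a hypothetical helper name.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a value such as "900" or "1,000"; None if absent or malformed."""
    if value is None:
        return None
    try:
        return int(value.replace(",", "").strip())
    except ValueError:
        return None


def read_quota(headers: Mapping[str, str]) -> dict:
    """Summarise the quota headers of a response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "remaining": _to_int(lowered.get("x-zyla-api-calls-monthly-remaining")),
        "limit": _to_int(lowered.get("x-zyla-ratelimit-limit")),
        "reset_seconds": _to_int(lowered.get("x-zyla-ratelimit-reset")),
    }
```

A client could, for instance, pause for reset_seconds when remaining reaches zero instead of sending requests that are bound to fail.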

Yes, you can cancel your plan anytime by going to your account and selecting the cancellation option on the Billing page. Please note that upgrades, downgrades, and cancellations take effect immediately. Additionally, upon cancellation, you will no longer have access to the service, even if you have remaining calls left in your quota.

You can contact us through our chat channel to receive immediate assistance. We are always online from 8 am to 5 pm (EST). If you reach us after that time, we will get back to you as soon as possible. Additionally, you can contact us via email at [email protected]

To give you the opportunity to experience our APIs without any commitment, we offer a 7-day free trial that allows you to make up to 50 API calls at no cost. This trial can be used only once, so we recommend applying it to the API that interests you the most. While most of our APIs offer a free trial, some may not. The trial concludes after 7 days or once you've made 50 requests, whichever occurs first. If you reach the 50 request limit during the trial, you will need to "Start Your Paid Plan" to continue making requests. You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab. Alternatively, if you don't cancel your subscription before the 7th day, your free trial will end, and your plan will automatically be billed, granting you access to all the API calls specified in your plan. Please keep this in mind to avoid unwanted charges.

After 7 days, you will be charged the full amount for the plan you were subscribed to during the trial. Therefore, it’s important to cancel before the trial period ends. Refund requests for forgetting to cancel on time are not accepted.

When you subscribe to an API free trial, you can make up to 50 API calls. If you wish to make additional API calls beyond this limit, the API will prompt you to "Start Your Paid Plan." You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab.

Payout Orders are processed between the 20th and the 30th of each month. If you submit your request before the 20th, your payment will be processed within this timeframe.

