The Content Extract API is an advanced tool designed to facilitate the extraction of textual content from web pages in clean and structured formats. This API is especially geared towards users who need to efficiently and accurately obtain and analyze textual data from the web. With a series of specialized endpoints, the API allows the conversion of web content into clean text and markdown formats, adapting to various data processing and analysis needs.
Main Functionalities
Clean Text Extraction: The first API endpoint focuses on providing the clean textual content of a web page. This endpoint removes unwanted elements such as ads, menus and sidebars, leaving only relevant and meaningful text. Clean text extraction is ideal for applications that require clear, unformatted content for analysis or display, such as automatic summaries, search engines or content analysis tools.
Markdown Conversion: The second endpoint converts web content into markdown format. Markdown is a lightweight markup language that allows text to be structured in a simple way, facilitating its integration into applications that use this format for document generation, blog posts or content management.
Support for Different Types of Pages: The Content Extract API is designed to handle a wide variety of web pages, from static sites to dynamic pages generated by JavaScript. This ensures that users can extract content from almost any type of page, regardless of its complexity or structure.
In short, the Content Extract API offers advanced solutions for extracting and converting textual content from web pages. With its specialized clean text and markdown endpoints, it provides users with effective tools for obtaining and managing web data in useful formats adaptable to a variety of applications and needs. Its flexibility and integration capabilities make it a valuable option for any task involving web content manipulation and analysis.
This API receives a web page URL and provides the clean text or markdown format of the content extracted from that page.
Content Generation for Blogs: Convert web content into markdown format for easy integration into blogging platforms or content management systems, facilitating publishing and editing.
Data Collection for Market Research: Extract clean text from various web pages to gather data on market trends, consumer behavior or competitive analysis.
News Brief Automation: Use the text extractor to create automated news summaries by removing non-relevant elements and focusing on the main content.
Technical Documentation Creation: Convert web content into markdown to develop technical documentation or user guides that integrate into collaborative documentation systems.
Data Extraction for SEO Tools: Extract clean text from web pages to analyze content and optimize SEO strategies, identifying relevant keywords and topics.
Besides the number of API calls allowed per month, there are no other limitations.
To use this endpoint, send a request with the URL of the web page and receive the clean text extracted from the content of that page.
Extract Info - Endpoint Features
Object | Description |
---|---|
Request Body | [Required] Json |
{"response":null}
curl --location --request POST 'https://zylalabs.com/api/5081/content+extract+api/6473/extract+info' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data-raw '{
"url": "https://techtalkverse.com/post/software-development/spark-basics/"
}'
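The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the endpoint URL, the Bearer header and the "url" body field come from the curl example above, while the function names, the timeout and the Content-Type header are assumptions added for illustration.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example above.
EXTRACT_INFO_URL = "https://zylalabs.com/api/5081/content+extract+api/6473/extract+info"


def build_extract_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; nothing is sent until urlopen is called."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_INFO_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the body is JSON; the curl example sets no Content-Type.
            "Content-Type": "application/json",
        },
    )


def extract_text(page_url: str, api_key: str, timeout: float = 30.0):
    """Send the request and return the "response" field of the JSON reply."""
    request = build_extract_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.load(reply)
    # The sample reply above is {"response": null}, so this may be None.
    return payload.get("response")
```

Calling extract_text requires a valid API key and network access; build_extract_request can be inspected offline to check the URL, headers and body before anything is sent.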
To use this endpoint, send a request with the URL of the web page and receive that page's content converted to markdown format.
Exc Marktdown - Endpoint Features
Object | Description |
---|---|
Request Body | [Required] Json |
{"response":"---\ntitle: Spark Basics\nurl: https://techtalkverse.com/post/software-development/spark-basics/\nhostname: techtalkverse.com\ndescription: Suppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\nsitename: techtalkverse.com\ndate: 2023-05-01\ncategories: ['post']\n---\n# Spark Basics\n\nSuppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\n\nNow let’s suppose there is heavy compute operation happening in each of the pods. Then there will be certain limit upto which these services can run because unlike horizontal scaling where you can have as many numbers of machines as required, there is limit for vertical scaling because you can’t have unlimited ram and cpu cores for each of the machines in a cluster. **Distributed Computing** removes this limitation of vertical scaling by distributing the processing across cluster of machines.\nNow, a group of machines alone is not powerful, you need a framework to\ncoordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers. The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, Kubernetes, YARN, or Mesos.\n\n## Spark Basics\n\nSpark is distributed data processing engine. Distributed data processing in big data is simply series of map and reduce functions which runs across the cluster machines. 
Given below is python code for calculating the sum of all the even numbers from a given list with the help of map and reduce functions.\n\n```\nfrom functools import reduce\na = [1,2,3,4,5]\nres = reduce(lambda x,y: x+y, (map(lambda x: x if x%2==0 else 0, a)))\n```\n\n\nNow consider, if instead of a simple list, it is a parquet file of size in order of gigabytes. Computation with MapReduce system becomes optimized way of dealing with such problems. In this case spark will load the big parquet file into multiple worker nodes (if the file doesn’t support distributed storage then it will be first loaded into driver node and afterwards, it will get distributed across the worker nodes). Then map function will be executed for each task in each worker node and the final result will fetched with the reduce function.\n\n## Spark timeline\n\nGoogle was first to introduce large scale distributed computing solution with **MapReduce** and its own distributed file system i.e., **Google File System(GFS)**. GFS provided a blueprint for the **Hadoop File System (HDFS)**, including the MapReduce implementation as a framework for distributed computing. **Apache Hadoop** framework was developed consisting of Hadoop Common, MapReduce, HDFS, and Apache Hadoop YARN. There were various limitations with Apache Hadoop like it fell short for combining other workloads such as machine learning, streaming, or interactive SQL-like queries etc. Also the results of the reduce computations were written to a local disk for subsequent stage of operations. Then came the **Spark**. Spark provides in-memory storage for intermediate computations, making it much faster than Hadoop MapReduce. 
It incorporates libraries with composable APIs for\nmachine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).\n\n## Spark Application\n\n**Spark Applications** consist of a driver process and a set of executor processes. The **driver** process runs your main() function, sits on a node in the cluster. The **executors** are responsible for actually carrying out the work that the driver assigns them. The driver and executors are simply processes, which means that they can live on the same machine or different machines.\n\nThere is a **SparkSession** object available to the user, which is the entrance point to running Spark code. When using Spark from Python or R, you don’t write explicit JVM instructions; instead, you write Python and R code that Spark translates into code that it then can run on the executor JVMs.\n**Spark’s language APIs** make it possible for you to run Spark code using various programming languages like Scala, Java, Python, SQL and R.\nSpark has two fundamental sets of APIs: the **low-level “unstructured” APIs** (RDDs), and the **higher-level structured APIs** (Dataframes, Datasets).\n\n## Spark Toolsets\n\nA **DataFrame** is the most common Structured API and simply represents a table of data with rows and columns. To allow every executor to perform work in parallel, Spark breaks up the data into chunks called partitions. A **partition** is a collection of rows that sit on one physical machine in your cluster.\n\nIf a function returns a Dataframe or Dataset or Resilient Distributed Dataset (RDD) then it is a **transformation** and if it doesn’t return anything then it’s an **action**. An action instructs Spark to compute a result from a series of transformations. The simplest action is count.\n\nTransformation are of types narrow and wide. 
**Narrow transformations** are those for which each input partition will contribute to only one output partition. **Wide transformation** will have input partitions contributing to many output partitions.\n\nSparks performs a **lazy evaluation** which means that Spark will wait until the very last moment to execute the graph of computation instructions. This provides immense benefits because Spark can optimize the entire data flow from end to end.\n\n## Spark-submit\n\n## References\n\n- https://spark.apache.org/docs/latest/\n- spark: The Definitive Guide by Bill Chambers and Matei Zaharia"}
curl --location --request POST 'https://zylalabs.com/api/5081/content+extract+api/6474/exc+marktdown' --header 'Authorization: Bearer YOUR_API_KEY'
--data-raw '{
"url": "https://techtalkverse.com/post/software-development/spark-basics/"
}'
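In the sample response above, the markdown begins with a metadata block delimited by `---` lines (title, url, hostname, date and so on), followed by the page body. A minimal sketch of separating the two is shown below; the helper name is hypothetical, and the split is a plain line-based parse that keeps every value as a raw string rather than a full YAML parser.

```python
def split_front_matter(markdown: str):
    """Split '---'-delimited front matter from the markdown body.

    Returns (metadata, body). Values are kept as raw strings.
    """
    lines = markdown.split("\n")
    if lines[0].strip() != "---":
        return {}, markdown  # no front matter at all
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, markdown  # no closing delimiter: treat everything as body
    metadata = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body


# Shortened version of the sample response shown above.
sample = (
    "---\n"
    "title: Spark Basics\n"
    "url: https://techtalkverse.com/post/software-development/spark-basics/\n"
    "date: 2023-05-01\n"
    "---\n"
    "# Spark Basics\n"
)
metadata, body = split_front_matter(sample)
print(metadata["title"])  # Spark Basics
print(metadata["date"])   # 2023-05-01
```

Keeping the metadata separate makes it easier to store the title and date as fields while passing only the body to a blogging platform or documentation system.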
Header | Description |
---|---|
Authorization | [Required] Should be Bearer access_key. See "Your API Access Key" above when you are subscribed. |
No long term commitments. One click upgrade/downgrade or cancellation. No questions asked.
To use this API, send a web page URL to the corresponding endpoint and receive the extracted content in clean text or markdown format.
The Content Extract API extracts and converts web page content into clean text or markdown, facilitating web data analysis and integration.
Different plans are available to suit everyone, including a free trial for a small number of requests, but its rate is limited to prevent abuse of the service.
Zyla provides a wide range of integration methods for almost all programming languages. You can use these code snippets to integrate the API with your project as needed.
The API returns the content extracted from the requested web page, as clean text or markdown, in the "response" field of a JSON object.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.
Prices are listed in USD (United States Dollar), EUR (Euro), CAD (Canadian Dollar), AUD (Australian Dollar), and GBP (British Pound). We accept all major debit and credit cards. Our payment system uses the latest security technology and is powered by Stripe, one of the world's most reliable payment companies. If you have any trouble paying by card, just contact us at [email protected]
Additionally, if you already have an active subscription in any of these currencies (USD, EUR, CAD, AUD, GBP), that currency will remain for subsequent subscriptions. You can change the currency at any time as long as you don't have any active subscriptions.
The local currency shown on the pricing page is based on the country of your IP address and is provided for reference only. The actual prices are in USD (United States Dollar). When you make a payment, the charge will appear on your card statement in USD, even if you see the equivalent amount in your local currency on our website. This means you cannot pay directly with your local currency.
Occasionally, a bank may decline the charge due to its fraud protection settings. We suggest reaching out to your bank first to check whether it is blocking our charges. You can also access the Billing Portal and change the card associated with the payment. If these steps do not work and you need further assistance, please contact our team at [email protected]
Prices are determined by a recurring monthly or yearly subscription, depending on the chosen plan.
API calls are deducted from your plan based on successful requests. Each plan comes with a specific number of calls that you can make per month. Only successful calls, indicated by a Status 200 response, will be counted against your total. This ensures that failed or incomplete requests do not impact your monthly quota.
Zyla API Hub works on a recurring monthly subscription system. Your billing cycle starts the day you purchase one of the paid plans and renews on the same day of the next month. Remember to cancel your subscription beforehand if you want to avoid future charges.
To upgrade your current subscription plan, simply go to the pricing page of the API and select the plan you want to upgrade to. The upgrade will be instant, allowing you to immediately enjoy the features of the new plan. Please note that any remaining calls from your previous plan will not be carried over to the new plan, so be aware of this when upgrading. You will be charged the full amount of the new plan.
To check how many API calls you have left for the current month, look at the "X-Zyla-API-Calls-Monthly-Remaining" header. For example, if your plan allows 1000 requests per month and you've used 100, this header will show 900.
To see the maximum number of API requests your plan allows, check the "X-Zyla-RateLimit-Limit" header. For instance, if your plan includes 1000 requests per month, this header will display 1000.
The "X-Zyla-RateLimit-Reset" header shows the number of seconds until your rate limit resets. This tells you when your request count will start fresh. For example, if it displays 3600, it means 3600 seconds are left until the limit resets.
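The three quota headers can be read together after each call. The sketch below is only illustrative: it assumes they arrive as response headers with plain integer values, as in the examples above, and the helper name and the keys of the returned dictionary are hypothetical.

```python
def read_quota(headers):
    """Return the quota headers as integers (None if absent or not numeric)."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name.lower())
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    return {
        "remaining": as_int("X-Zyla-API-Calls-Monthly-Remaining"),
        "limit": as_int("X-Zyla-RateLimit-Limit"),
        "reset_seconds": as_int("X-Zyla-RateLimit-Reset"),
    }


# Values taken from the examples in the text above.
quota = read_quota({
    "X-Zyla-API-Calls-Monthly-Remaining": "900",
    "X-Zyla-RateLimit-Limit": "1000",
    "X-Zyla-RateLimit-Reset": "3600",
})
print(quota)  # {'remaining': 900, 'limit': 1000, 'reset_seconds': 3600}
```

Returning None for a missing header, rather than raising, lets calling code decide whether an absent value should stop further requests.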
Yes, you can cancel your plan anytime by going to your account and selecting the cancellation option on the Billing page. Please note that upgrades, downgrades, and cancellations take effect immediately. Additionally, upon cancellation, you will no longer have access to the service, even if you have remaining calls left in your quota.
You can contact us through our chat channel to receive immediate assistance. We are online from 8 am to 5 pm (EST). If you reach us outside those hours, we will get back to you as soon as possible. Additionally, you can contact us via email at [email protected]
To let you experience our APIs without any commitment, we offer a 7-day free trial that allows you to make API calls at no cost during this period. Please note that you can only use this trial once, so make sure to use it with the API that interests you the most. Most of our APIs provide a free trial, but some may not support it.
After 7 days, you will be charged the full amount for the plan you were subscribed to during the trial. Therefore, it's important to cancel before the trial period ends. Refund requests for forgetting to cancel on time are not accepted.
When you subscribe to an API trial, you can make only 25% of the calls allowed by that plan. For example, if the API plan offers 1000 calls, you can make only 250 during the trial. To access the full number of calls offered by the plan, you will need to subscribe to the full plan.