The Content Extraction API is an advanced tool designed to make it easy to extract textual content from web pages in clean, structured formats. It is aimed at users who need to obtain and analyze textual data from the web efficiently and accurately. Through a set of specialized endpoints, the API converts web content into clean text and markdown, adapting to a variety of data processing and analysis needs.
Main Features
Clean Text Extraction: The API's first endpoint focuses on returning the clean textual content of a web page. It strips out unwanted elements such as ads, menus, and sidebars, leaving only relevant, meaningful text. Clean text extraction is ideal for applications that need plain, unformatted content for analysis or display, such as automatic summarization, search engines, or content analysis tools.
Markdown Conversion: The second endpoint converts web content into markdown. Markdown is a lightweight markup language that makes it easy to structure text, simplifying integration with applications that use this format for document generation, blog publishing, or content management.
Support for Different Page Types: The Content Extraction API is designed to handle a wide variety of web pages, from static sites to dynamic pages rendered with JavaScript. This ensures users can extract content from almost any kind of page, regardless of its complexity or structure.
In short, the Content Extraction API offers advanced solutions for extracting and converting textual content from web pages. With its specialized clean text and markdown endpoints, it gives users effective tools for obtaining and managing web data in useful formats that adapt to a wide range of applications and needs. Its flexibility and integration capabilities make it a valuable option for any task involving web content manipulation and analysis.
This API receives a web page URL and returns either the clean text or the markdown version of the content extracted from that page.
Blog Content Generation: Convert web content into markdown for easy integration with blogging platforms or content management systems, simplifying publishing and editing.
Data Collection for Market Research: Extract clean text from multiple web pages to gather data on market trends, consumer behavior, or competitive analysis.
Automated News Summaries: Use the text extractor to create automatic news summaries by removing irrelevant elements and focusing on the main content.
Technical Documentation: Convert web content into markdown to build technical documentation or user guides that integrate with collaborative documentation systems.
Data Extraction for SEO Tools: Extract clean text from web pages to analyze content and optimize SEO strategies, identifying relevant keywords and topics.
Aside from the number of API calls allowed per month, there are no other limitations.
To use this endpoint, send a request with the web page URL and receive the clean text extracted from that page's content.
Extract Info - Endpoint Features
Object | Description
---|---
Request Body | [Required] JSON
{"response":"Spark Basics\nSuppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\nNow let’s suppose there is heavy compute operation happening in each of the pods. Then there will be certain limit upto which these services can run because unlike horizontal scaling where you can have as many numbers of machines as required, there is limit for vertical scaling because you can’t have unlimited ram and cpu cores for each of the machines in a cluster. Distributed Computing removes this limitation of vertical scaling by distributing the processing across cluster of machines. Now, a group of machines alone is not powerful, you need a framework to coordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers. The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, Kubernetes, YARN, or Mesos.\nSpark Basics\nSpark is distributed data processing engine. Distributed data processing in big data is simply series of map and reduce functions which runs across the cluster machines. Given below is python code for calculating the sum of all the even numbers from a given list with the help of map and reduce functions.\nfrom functools import reduce\na = [1,2,3,4,5]\nres = reduce(lambda x,y: x+y, (map(lambda x: x if x%2==0 else 0, a)))\nNow consider, if instead of a simple list, it is a parquet file of size in order of gigabytes. Computation with MapReduce system becomes optimized way of dealing with such problems. In this case spark will load the big parquet file into multiple worker nodes (if the file doesn’t support distributed storage then it will be first loaded into driver node and afterwards, it will get distributed across the worker nodes). Then map function will be executed for each task in each worker node and the final result will fetched with the reduce function.\nSpark timeline\nGoogle was first to introduce large scale distributed computing solution with MapReduce and its own distributed file system i.e., Google File System(GFS). GFS provided a blueprint for the Hadoop File System (HDFS), including the MapReduce implementation as a framework for distributed computing. Apache Hadoop framework was developed consisting of Hadoop Common, MapReduce, HDFS, and Apache Hadoop YARN. There were various limitations with Apache Hadoop like it fell short for combining other workloads such as machine learning, streaming, or interactive SQL-like queries etc. Also the results of the reduce computations were written to a local disk for subsequent stage of operations. Then came the Spark. Spark provides in-memory storage for intermediate computations, making it much faster than Hadoop MapReduce. It incorporates libraries with composable APIs for machine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).\nSpark Application\nSpark Applications consist of a driver process and a set of executor processes. The driver process runs your main() function, sits on a node in the cluster. The executors are responsible for actually carrying out the work that the driver assigns them. 
The driver and executors are simply processes, which means that they can live on the same machine or different machines.\nThere is a SparkSession object available to the user, which is the entrance point to running Spark code. When using Spark from Python or R, you don’t write explicit JVM instructions; instead, you write Python and R code that Spark translates into code that it then can run on the executor JVMs.\nSpark’s language APIs make it possible for you to run Spark code using various programming languages like Scala, Java, Python, SQL and R.\nSpark has two fundamental sets of APIs: the low-level “unstructured” APIs (RDDs), and the higher-level structured APIs (Dataframes, Datasets).\nSpark Toolsets\nA DataFrame is the most common Structured API and simply represents a table of data with rows and columns. To allow every executor to perform work in parallel, Spark breaks up the data into chunks called partitions. A partition is a collection of rows that sit on one physical machine in your cluster.\nIf a function returns a Dataframe or Dataset or Resilient Distributed Dataset (RDD) then it is a transformation and if it doesn’t return anything then it’s an action. An action instructs Spark to compute a result from a series of transformations. The simplest action is count.\nTransformation are of types narrow and wide. Narrow transformations are those for which each input partition will contribute to only one output partition. Wide transformation will have input partitions contributing to many output partitions.\nSparks performs a lazy evaluation which means that Spark will wait until the very last moment to execute the graph of computation instructions. This provides immense benefits because Spark can optimize the entire data flow from end to end.\nSpark-submit\nReferences\n- https://spark.apache.org/docs/latest/\n- spark: The Definitive Guide by Bill Chambers and Matei Zaharia"}
curl --location --request POST 'https://zylalabs.com/api/5081/content+extract+api/6473/extract+info' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data-raw '{
    "url": "https://techtalkverse.com/post/software-development/spark-basics/"
}'
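The same call can be made from any HTTP client. Below is a minimal Python sketch (using the third-party requests library) that mirrors the curl example above; the extract_clean_text helper is illustrative, and the extracted text is assumed to live under the "response" key, as in the sample output.

```python
# Minimal sketch for the "Extract Info" endpoint, mirroring the curl example above.
# Assumption: the extracted text is returned under the "response" key.
import requests

API_URL = "https://zylalabs.com/api/5081/content+extract+api/6473/extract+info"
API_KEY = "YOUR_API_KEY"  # replace with your Zyla access key

def extract_clean_text(page_url: str) -> str:
    """POST a page URL and return the clean text from the 'response' field."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": page_url},
        timeout=30,
    )
    resp.raise_for_status()  # only Status 200 responses count against the quota
    return resp.json()["response"]

if __name__ == "__main__":
    text = extract_clean_text(
        "https://techtalkverse.com/post/software-development/spark-basics/"
    )
    print(text[:500])  # preview the first 500 characters
```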
To use this endpoint, send a request with the web page URL and receive that page's content converted to markdown format.
Exc Marktdown - Endpoint Features
Object | Description
---|---
Request Body | [Required] JSON
{"response":"---\ntitle: Spark Basics\nurl: https://techtalkverse.com/post/software-development/spark-basics/\nhostname: techtalkverse.com\ndescription: Suppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\nsitename: techtalkverse.com\ndate: 2023-05-01\ncategories: ['post']\n---\n# Spark Basics\n\nSuppose we have a web application hosted in an application orchestrator like kubernetes. If load in that particular application increases then we can horizontally scale our application simply by increasing the number of pods in our service.\n\nNow let’s suppose there is heavy compute operation happening in each of the pods. Then there will be certain limit upto which these services can run because unlike horizontal scaling where you can have as many numbers of machines as required, there is limit for vertical scaling because you can’t have unlimited ram and cpu cores for each of the machines in a cluster. **Distributed Computing** removes this limitation of vertical scaling by distributing the processing across cluster of machines.\nNow, a group of machines alone is not powerful, you need a framework to\ncoordinate work across them. Spark does just that, managing and coordinating the execution of tasks on data across a cluster of computers. The cluster of machines that Spark will use to execute tasks is managed by a cluster manager like Spark’s standalone cluster manager, Kubernetes, YARN, or Mesos.\n\n## Spark Basics\n\nSpark is distributed data processing engine. Distributed data processing in big data is simply series of map and reduce functions which runs across the cluster machines. Given below is python code for calculating the sum of all the even numbers from a given list with the help of map and reduce functions.\n\n```\nfrom functools import reduce\na = [1,2,3,4,5]\nres = reduce(lambda x,y: x+y, (map(lambda x: x if x%2==0 else 0, a)))\n```\n\n\nNow consider, if instead of a simple list, it is a parquet file of size in order of gigabytes. Computation with MapReduce system becomes optimized way of dealing with such problems. In this case spark will load the big parquet file into multiple worker nodes (if the file doesn’t support distributed storage then it will be first loaded into driver node and afterwards, it will get distributed across the worker nodes). Then map function will be executed for each task in each worker node and the final result will fetched with the reduce function.\n\n## Spark timeline\n\nGoogle was first to introduce large scale distributed computing solution with **MapReduce** and its own distributed file system i.e., **Google File System(GFS)**. GFS provided a blueprint for the **Hadoop File System (HDFS)**, including the MapReduce implementation as a framework for distributed computing. **Apache Hadoop** framework was developed consisting of Hadoop Common, MapReduce, HDFS, and Apache Hadoop YARN. There were various limitations with Apache Hadoop like it fell short for combining other workloads such as machine learning, streaming, or interactive SQL-like queries etc. Also the results of the reduce computations were written to a local disk for subsequent stage of operations. Then came the **Spark**. Spark provides in-memory storage for intermediate computations, making it much faster than Hadoop MapReduce. 
It incorporates libraries with composable APIs for\nmachine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).\n\n## Spark Application\n\n**Spark Applications** consist of a driver process and a set of executor processes. The **driver** process runs your main() function, sits on a node in the cluster. The **executors** are responsible for actually carrying out the work that the driver assigns them. The driver and executors are simply processes, which means that they can live on the same machine or different machines.\n\nThere is a **SparkSession** object available to the user, which is the entrance point to running Spark code. When using Spark from Python or R, you don’t write explicit JVM instructions; instead, you write Python and R code that Spark translates into code that it then can run on the executor JVMs.\n**Spark’s language APIs** make it possible for you to run Spark code using various programming languages like Scala, Java, Python, SQL and R.\nSpark has two fundamental sets of APIs: the **low-level “unstructured” APIs** (RDDs), and the **higher-level structured APIs** (Dataframes, Datasets).\n\n## Spark Toolsets\n\nA **DataFrame** is the most common Structured API and simply represents a table of data with rows and columns. To allow every executor to perform work in parallel, Spark breaks up the data into chunks called partitions. A **partition** is a collection of rows that sit on one physical machine in your cluster.\n\nIf a function returns a Dataframe or Dataset or Resilient Distributed Dataset (RDD) then it is a **transformation** and if it doesn’t return anything then it’s an **action**. An action instructs Spark to compute a result from a series of transformations. The simplest action is count.\n\nTransformation are of types narrow and wide. **Narrow transformations** are those for which each input partition will contribute to only one output partition. **Wide transformation** will have input partitions contributing to many output partitions.\n\nSparks performs a **lazy evaluation** which means that Spark will wait until the very last moment to execute the graph of computation instructions. This provides immense benefits because Spark can optimize the entire data flow from end to end.\n\n## Spark-submit\n\n## References\n\n- https://spark.apache.org/docs/latest/\n- spark: The Definitive Guide by Bill Chambers and Matei Zaharia"}
curl --location --request POST 'https://zylalabs.com/api/5081/content+extract+api/6474/exc+marktdown' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data-raw '{
    "url": "https://techtalkverse.com/post/software-development/spark-basics/"
}'
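As with the previous endpoint, the call can be scripted. The following Python sketch mirrors the curl example above and writes the returned markdown to a local file; the extract_markdown helper and the output file name are illustrative, and the markdown string is assumed to be under the "response" key.

```python
# Minimal sketch for the "Exc Marktdown" endpoint, mirroring the curl example above.
# Assumption: the markdown string is returned under the "response" key.
import requests

API_URL = "https://zylalabs.com/api/5081/content+extract+api/6474/exc+marktdown"
API_KEY = "YOUR_API_KEY"  # replace with your Zyla access key

def extract_markdown(page_url: str) -> str:
    """POST a page URL and return the markdown from the 'response' field."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": page_url},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    markdown = extract_markdown(
        "https://techtalkverse.com/post/software-development/spark-basics/"
    )
    # Save the markdown so it can be dropped into a blog or docs system.
    with open("spark-basics.md", "w", encoding="utf-8") as f:
        f.write(markdown)
```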
Header | Description
---|---
Authorization | [Required] Should be Bearer access_key. Refer to "Your API Access Key" above once you are subscribed.
No long-term commitment. Upgrade, downgrade, or cancel at any time. The Free Trial includes up to 50 requests.
To use this API, send a web page URL to the corresponding endpoints and receive the extracted content in clean text or markdown format.
The Content Extraction API extracts and converts web page content into clean text or markdown, making it easier to analyze and integrate web data.
There are different plans to suit every need, including a free trial for a small number of requests, though it is rate-limited to prevent abuse of the service.
Zyla provides a wide range of integration methods for almost every programming language. You can use these code snippets to integrate the API into your project as needed.
The API returns the textual content extracted from the submitted web page, either as clean text or as markdown, depending on the endpoint used.
El endpoint "Extract Info" devuelve texto limpio extraído de una página web, mientras que el endpoint "Exc Marktdown" proporciona el mismo contenido formateado en markdown. Ambos endpoints se centran en ofrecer contenido estructurado y legible para análisis o integración.
Los datos de respuesta generalmente incluyen el contenido extraído como un único bloque de texto para el punto final "Extraer Info" y una cadena en formato markdown para el punto final "Exc Marktdown". Se puede incluir metadatos adicionales según la implementación.
Los datos de respuesta están estructurados como un objeto JSON, que contiene el contenido extraído como un par clave-valor. Por ejemplo, la clave podría ser "contenido" con el texto limpio o markdown correspondiente como valor.
El parámetro principal para ambos puntos finales es la "url" de la página web de la que se extraerá contenido. Los usuarios pueden personalizar sus solicitudes proporcionando diferentes URL para dirigirse a páginas web específicas.
Cada punto final proporciona contenido textual de páginas web, enfocándose en el cuerpo principal del texto mientras filtra anuncios, menús y otros elementos no esenciales. Esto asegura que los usuarios reciban información relevante para sus necesidades.
Los usuarios pueden integrar el texto limpio o markdown devuelto en aplicaciones para la generación de contenido, análisis o documentación. Por ejemplo, el markdown se puede usar directamente en plataformas de blogs, mientras que el texto limpio se puede analizar para obtener información.
Los casos de uso comunes incluyen la generación de contenido para blogs, la recolección de datos de investigación de mercado, resúmenes de noticias automatizados, la creación de documentación técnica y el análisis SEO. Cada caso de uso aprovecha la capacidad de la API para extraer y formatear contenido web.
La API emplea algoritmos para filtrar contenido irrelevante, asegurando que el texto extraído sea limpio y significativo. Actualizaciones y mejoras continuas en el proceso de extracción ayudan a mantener una alta calidad y relevancia de los datos.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.
Prices are listed in USD (United States Dollar), EUR (Euro), CAD (Canadian Dollar), AUD (Australian Dollar), and GBP (British Pound). We accept all major debit and credit cards. Our payment system uses the latest security technology and is powered by Stripe, one of the world's most reliable payment companies. If you have any trouble paying by card, just contact us at [email protected]
Additionally, if you already have an active subscription in any of these currencies (USD, EUR, CAD, AUD, GBP), that currency will remain for subsequent subscriptions. You can change the currency at any time as long as you don't have any active subscriptions.
The local currency shown on the pricing page is based on the country of your IP address and is provided for reference only. The actual prices are in USD (United States Dollar). When you make a payment, the charge will appear on your card statement in USD, even if you see the equivalent amount in your local currency on our website. This means you cannot pay directly with your local currency.
Occasionally, a bank may decline the charge due to its fraud protection settings. We suggest reaching out to your bank first to check whether they are blocking our charges. You can also access the Billing Portal and change the card used for payment. If this does not work and you need further assistance, please contact our team at [email protected]
Prices are determined by a recurring monthly or yearly subscription, depending on the chosen plan.
API calls are deducted from your plan based on successful requests. Each plan comes with a specific number of calls that you can make per month. Only successful calls, indicated by a Status 200 response, will be counted against your total. This ensures that failed or incomplete requests do not impact your monthly quota.
Zyla API Hub works on a recurring monthly subscription system. Your billing cycle starts the day you purchase one of the paid plans and renews on the same day of the following month. Be sure to cancel your subscription beforehand if you want to avoid future charges.
To upgrade your current subscription plan, simply go to the pricing page of the API and select the plan you want to upgrade to. The upgrade will be instant, allowing you to immediately enjoy the features of the new plan. Please note that any remaining calls from your previous plan will not be carried over to the new plan, so be aware of this when upgrading. You will be charged the full amount of the new plan.
To check how many API calls you have left for the current month, refer to the 'X-Zyla-API-Calls-Monthly-Remaining' field in the response header. For example, if your plan allows 1,000 requests per month and you've used 100, this field in the response header will indicate 900 remaining calls.
To see the maximum number of API requests your plan allows, check the 'X-Zyla-RateLimit-Limit' response header. For instance, if your plan includes 1,000 requests per month, this header will display 1,000.
The 'X-Zyla-RateLimit-Reset' header shows the number of seconds until your rate limit resets. This tells you when your request count will start fresh. For example, if it displays 3,600, it means 3,600 seconds are left until the limit resets.
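These three quota headers can be read from any API response. Below is a short Python sketch, assuming the third-party requests library and reusing the Extract Info endpoint from the examples above:

```python
# Sketch: inspecting the quota headers documented above on an API response.
import requests

resp = requests.post(
    "https://zylalabs.com/api/5081/content+extract+api/6473/extract+info",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"url": "https://techtalkverse.com/post/software-development/spark-basics/"},
    timeout=30,
)

print("Calls remaining this month:", resp.headers.get("X-Zyla-API-Calls-Monthly-Remaining"))
print("Monthly limit:", resp.headers.get("X-Zyla-RateLimit-Limit"))
print("Seconds until reset:", resp.headers.get("X-Zyla-RateLimit-Reset"))
```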
Yes, you can cancel your plan anytime by going to your account and selecting the cancellation option on the Billing page. Please note that upgrades, downgrades, and cancellations take effect immediately. Additionally, upon cancellation, you will no longer have access to the service, even if you have remaining calls left in your quota.
You can contact us through our chat channel to receive immediate assistance. We are always online from 8 am to 5 pm (EST). If you reach us after that time, we will get back to you as soon as possible. Additionally, you can contact us via email at [email protected]
To give you the opportunity to experience our APIs without any commitment, we offer a 7-day free trial that allows you to make up to 50 API calls at no cost. This trial can be used only once, so we recommend applying it to the API that interests you the most. While most of our APIs offer a free trial, some may not. The trial concludes after 7 days or once you've made 50 requests, whichever occurs first. If you reach the 50 request limit during the trial, you will need to "Start Your Paid Plan" to continue making requests. You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab. Alternatively, if you don't cancel your subscription before the 7th day, your free trial will end, and your plan will automatically be billed, granting you access to all the API calls specified in your plan. Please keep this in mind to avoid unwanted charges.
After 7 days, you will be charged the full amount for the plan you were subscribed to during the trial. Therefore, it's important to cancel before the trial period ends. Refund requests for forgetting to cancel on time are not accepted.
When you subscribe to an API free trial, you can make up to 50 API calls. If you wish to make additional API calls beyond this limit, the API will prompt you to start your paid plan. You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab.
Payout Orders are processed between the 20th and the 30th of each month. If you submit your request before the 20th, your payment will be processed within this timeframe.