MarkLogic 6 is an Enterprise NoSQL (“Not Only SQL”) database that has the flexibility and scalability to handle today’s data challenges that SQL-based databases were not designed to handle. It also has enterprise-grade capabilities like search, ACID transactions, failover, replication, and security to run mission-critical applications. MarkLogic combines database functionality, search, and application services in a single system. It provides the functionality enterprises need to deliver value. MarkLogic leverages existing tools, knowledge, and experience while providing a reliable, scalable, and secure platform for mission-critical data.

Companies and organizations across industries including the public sector, media, and financial services have benefited from MarkLogic’s unique architecture. Any environment that faces a combination of data volume, velocity, variety, and complexity (a data challenge known as Big Data) can be enhanced with MarkLogic. Example solutions built on MarkLogic include intelligence analysis, real-time decision support, risk management, digital asset management, digital supply chain, and content delivery.

The benchmark we are utilizing is internally developed by MarkLogic and is used to evaluate both hardware configurations and upcoming MarkLogic software releases. The workload is divided into two distinct parts:

- Ingestion phase, where a large data set is inserted with indexes into the MarkLogic database.
- Query phase, where searches, views, updates, and deletes are applied to the inserted data set.

These queries also use MarkLogic features such as facets, pagination and bookmarks. The corpus used is the publicly available Wikipedia XML collection. For ingestion we use MarkLogic Content Pump (mlcp).

The ingestion phase in particular is I/O intensive. I/O is broken down into three categories:

- Initially, documents are ingested into in-memory stands and the only disk writes are Journal saves.
- In-memory stands quickly overflow and are continually written as on-disk stands.
- As the number of on-disk stands increases, MarkLogic must merge them to reduce query overhead. Merging involves reading multiple on-disk stands, writing back a single merged version and deleting the originals.

To ensure the highest level of accuracy and to force each device into steady-state, we repeat the ingestion and query phases 24 times for flash-based devices. For PCIe Application Accelerators, each interval takes 60 to 120 minutes to complete, putting the total test time in the range of 24 to 48 hours. For devices with lower I/O throughput, the total test time can span days. Our focus in this test is to look at overall latency from each storage solution across four areas of interest: journal writes (J-lat), save writes (S-lat), as well as merge read (MR-lat) and merge write latency (MW-lat).

In the diagram above we see the I/O paths and latencies in MarkLogic:

- Journal writes record the deltas to the database. When an update request runs, all the changes it made to the state of the database are recorded in the journal. Those changes can be applied again from the journal, without running the request again. Updates can be additions, replacements or deletions of documents. The journal protects against outages; it is guaranteed to survive a system crash. The latency of Journal writes is captured in the J-lat metric.
- After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. The latency of Save writes is captured in S-lat.
- As the total number of on-disk stands grows, an efficiency issue threatens to emerge. To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands to a manageable level, MarkLogic runs merges in the background. A merge reads some of the stands on disk (Merge Read) and creates a new single stand out of them (Merge Write), coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments. The latency of Merge reads is captured in MR-lat and the latency of Merge writes in MW-lat.

During ingestion, MarkLogic also indexes all of the documents, creates term lists, etc. This activity requires CPU cycles, which makes the benchmark a good balance between high I/O and high CPU utilization.

The Wikipedia data was also chosen because it contains non-English and non-ASCII text, of which we use the following languages: Arabic, Dutch, French, German, Italian, Japanese, Korean, Persian, Portuguese, Russian, Spanish, Simplified Chinese and Traditional Chinese. These options stress the multilingual features of MarkLogic.
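The relationship between journal writes, saves, and merges described in this article can be made concrete with a toy model. The Python sketch below is only an illustration and is not MarkLogic's implementation or API: the class name `ToyForest`, its parameters, and its counters are hypothetical. It also assumes each document ID is inserted at most once, and it runs merges synchronously, whereas MarkLogic runs them in the background.

```python
class ToyForest:
    """Toy model of journal writes, stand saves, and merges (illustrative only)."""

    def __init__(self, in_memory_limit=2, max_disk_stands=2):
        # Both limits are hypothetical knobs, not MarkLogic settings.
        self.in_memory_limit = in_memory_limit
        self.max_disk_stands = max_disk_stands
        self.journal = []        # ordered record of every update
        self.in_memory = {}      # in-memory stand: doc_id -> set of terms
        self.disk_stands = []    # on-disk stands, each a dict like in_memory
        self.deleted = set()     # doc_ids deleted but not yet merged away
        self.io = {"journal_writes": 0, "save_writes": 0,
                   "merge_reads": 0, "merge_writes": 0}

    def _journal(self, record):
        # Every update is journaled first (the J-lat path).
        self.journal.append(record)
        self.io["journal_writes"] += 1

    def insert(self, doc_id, text):
        self._journal(("insert", doc_id, text))
        self.in_memory[doc_id] = set(text.lower().split())
        if len(self.in_memory) >= self.in_memory_limit:
            self._save()

    def delete(self, doc_id):
        self._journal(("delete", doc_id))
        self.deleted.add(doc_id)

    def _save(self):
        # Flush the full in-memory stand as a new on-disk stand (the S-lat path).
        self.disk_stands.append(self.in_memory)
        self.in_memory = {}
        self.io["save_writes"] += 1
        if len(self.disk_stands) > self.max_disk_stands:
            self._merge()

    def _merge(self):
        # Read every on-disk stand (MR-lat path) and write one merged stand
        # (MW-lat path), dropping fragments that were deleted earlier.
        merged = {}
        for stand in self.disk_stands:
            self.io["merge_reads"] += 1
            for doc_id, terms in stand.items():
                if doc_id not in self.deleted:
                    merged[doc_id] = terms
        self.disk_stands = [merged]
        self.io["merge_writes"] += 1
        # Only deletions that still target the in-memory stand stay pending.
        self.deleted = {d for d in self.deleted if d in self.in_memory}

    def term_list(self, term):
        # A term lookup must consult every stand and unify the results.
        result = set()
        for stand in [self.in_memory, *self.disk_stands]:
            result |= {d for d, terms in stand.items() if term in terms}
        return result - self.deleted


if __name__ == "__main__":
    forest = ToyForest()
    forest.insert("d1", "alpha stand")
    forest.insert("d2", "beta stand")
    forest.insert("d3", "gamma merge")
    forest.insert("d4", "delta stand")
    forest.delete("d1")
    forest.insert("d5", "epsilon merge")
    forest.insert("d6", "zeta stand")
    print(sorted(forest.term_list("stand")))
    print(forest.io)
```

In this model the cost of `term_list` grows with the number of stands, which is why a merge that collapses several stands into one reduces query overhead at the price of extra read and write I/O.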