Haystack 2.4.0
Highlights
๐ Local LLMs and custom generation parameters in evaluation
The new api_params init parameter added to LLM-based evaluators such as ContextRelevanceEvaluator and FaithfulnessEvaluator can be used to pass in supported
OpenAIGenerator parameters, allowing for custom generation parameters (via generation_kwargs) and local LLM support (via api_base_url).
๐ New Joiner
New
AnswerJoiner component to combine multiple lists of Answers.
โฌ๏ธ Upgrade Notes
- The
ContextRelevanceEvaluatornow returns a list of relevant sentences for each context, instead of all the sentences in a context. Also, a score of 1 is now returned if a relevant sentence is found, and 0 otherwise. - Removed the deprecated
DynamicPromptBuilderandDynamicChatPromptBuildercomponents. UsePromptBuilderandChatPromptBuilderinstead. OutputAdapterandConditionalRoutercan’t return users inputs anymore.Multiplexeris removed and users should switch toBranchJoinerinstead.- Removed deprecated init parameters
extractor_typeandtry_othersfromHTMLToDocument. SentenceWindowRetrievalcomponent has been renamed toSenetenceWindowRetriever.- The
serialize_callback_handleranddeserialize_callback_handlerutility functions have been removed. Useserialize_callableanddeserialize_callableinstead. For more information onserialize_callableanddeserialize_callable, see the API reference: https://docs.haystack.deepset.ai/reference/utils-api#module-callable_serialization
๐ New Features
- LLM based evaluators can pass in supported
OpenAIGeneratorparameters viaapi_params. This allows for custom generation_kwargs, changing the api_base_url (for local evaluation), and all other supported parameters as described in the OpenAIGenerator docs. - Introduced a new
AnswerJoinercomponent that allows joining multiple lists of Answers into a single list using the Concatenate join mode. - Add
truncate_dimparameter to Sentence Transformers Embedders, which allows truncating embeddings. Especially useful for models trained with Matryoshka Representation Learning. - Add
precisionparameter to Sentence Transformers Embedders, which allows quantized embeddings. Especially useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
โก๏ธ Enhancement Notes
- Adds model_kwargs and tokenizer_kwargs to the components TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder. This allows passing things like model_max_length or torch_dtype for better management of model inference.
- Added
unicode_normalizationparameter to the DocumentCleaner, allowing to normalize the text to NFC, NFD, NFKC, or NFKD. - Added
ascii_onlyparameter to the DocumentCleaner, transforming letters with diacritics to their ASCII equivalent and removing other non-ASCII characters. - Improved error messages for deserialization errors.
TikaDocumentConverternow returns page breaks (“f”) in the output. This only works for PDF files.- Enhanced filter application logic to support merging of filters. It facilitates more precise retrieval filtering, allowing for both init and runtime complex filter combinations with logical operators. For more details see https://docs.haystack.deepset.ai/docs/metadata-filtering
- The
streaming_callbackparameter can be passed to OpenAIGenerator and OpenAIChatGenerator during pipeline run. This prevents the need to recreate pipelines for streaming callbacks. - Add
max_retriesandtimeoutparameters to the AzureOpenAIChatGenerator initializations. - Document Python 3.11 and 3.12 support in project configuration.
- Refactor DocumentJoiner to use enum pattern for the ‘join_mode’ parameter instead of bare string.
- Add
max_retries,timeoutparameters to theAzureOpenAIDocumentEmbedderinitialization. - Add
max_retriesandtimeoutparameters to the AzureOpenAITextEmbedder initializations. - Introduce an utility function to deserialize a generic Document Store from the init_parameters of a serialized component.
โ ๏ธ Deprecation Notes
- Haystack 1.x legacy filters are deprecated and will be removed in a future release. Please use the new filter style as described in the documentation - https://docs.haystack.deepset.ai/docs/metadata-filtering
- Deprecate the method
to_openai_formatof theChatMessagedataclass. This method was never intended to be public and was only used internally. Now, each Chat Generator will know internally how to convert the messages to the format of their specific provider. - Deprecate the unused
debugparameter in thePipeline.runmethod. SentenceWindowRetrievalis deprecated and will be removed in future. UseSentenceWindowRetrieverinstead.
Security Notes
-
Fix issue that could lead to remote code execution when using insecure Jinja template in the following Components:
PromptBuilderhatPromptBuilderOutputAdapteronditionalRouter
The same issue has been fixed in the
PipelineTemplateclass too.
๐ Bug Fixes
- Fix
ChatPromptBuilderfrom_dict method when template value is None. - Fix the
DocumentCleanerremoving theftag from content preventing from counting page number (by Splitter for example). - The DocumentSplitter was incorrectly calculating the
split_start_idxandsplit_overlapinformation due to slight miscalculations of appropriate indices. This fixes those so thesplit_start_idxandsplit_overlapinformation is correct. - Fix bug in Pipeline.run() executing Components in a wrong and unexpected order
- Encoding of HTML files in LinkContentFetcher
- Fix Output Adapter from_dict method when custom_filters value is None.
- Prevent
Pipeline.from_dictfrom modifying the dictionary parameter passed to it. - Fix a bug in
Pipeline.run()that would cause it to get stuck in an infinite loop and never return. This was caused by Components waiting forever for their inputs when parts of the Pipeline graph are skipped cause of a “decision” Component not returning outputs for that side of the Pipeline. - This updates the components, TransformersSimilarityRanker, SentenceTransformersDiversityRanker, SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder and LocalWhisperTranscriber from_dict methods to work when loading with init_parameters only containing required parameters.
- Pins structlog to <= 24.2.0 to avoid some unit test failures. This is a temporary fix until we can upgrade tests to a newer versions of structlog.
- Correctly expose
PPTXToDocumentcomponent inhaystacknamespace. - Fix
TransformersZeroShotTextRouterandTransformersTextRouterfrom_dictmethods to work wheninit_parametersonly contain required variables. - For components that support multiple Document Stores, prioritize using the specific
from_dictclass method for deserialization when available. Otherwise, fall back to the genericdefault_from_dictmethod. This impacts the following generic components:acheChecker,DocumentWriter,FilterRetriever, andSentenceWindowRetriever.