aboutsummaryrefslogtreecommitdiffstats
path: root/drivers
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2019-02-07 10:34:39 -0800
committerDavid S. Miller <davem@davemloft.net>2019-02-07 10:34:39 -0800
commit0739d24d0c7b79a944b366dae4aaea7e7f452cd5 (patch)
tree89580a11f3e6d759f21f4abb661349e46c733fc1 /drivers
parent8f289805616e81f7c1690931aa8a586c76f4fa88 (diff)
parentdb2ab7a08f06aa4dd784b87bafeb3169ee7f0d95 (diff)
downloadkernel_replicant_linux-0739d24d0c7b79a944b366dae4aaea7e7f452cd5.tar.gz
kernel_replicant_linux-0739d24d0c7b79a944b366dae4aaea7e7f452cd5.tar.bz2
kernel_replicant_linux-0739d24d0c7b79a944b366dae4aaea7e7f452cd5.zip
Merge branch 'devlink-health'
Eran Ben Elisha says: ==================== Devlink health reporting and recovery system The health mechanism is targeted for Real Time Alerting, in order to know when something bad had happened to a PCI device - Provide alert debug information - Self healing - If problem needs vendor support, provide a way to gather all needed debugging information. The main idea is to unify and centralize driver health reports in the generic devlink instance and allow the user to set different attributes of the health reporting and recovery procedures. The devlink health reporter: Device driver creates a "health reporter" per each error/health type. Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) or unknown (driver specific). For each registered health reporter a driver can issue error/health reports asynchronously. All health reports handling is done by devlink. Device driver can provide specific callbacks for each "health reporter", e.g. - Recovery procedures - Diagnostics and object dump procedures - OOB initial attributes Different parts of the driver can register different types of health reporters with different handlers. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and saved at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: - Auto-recovery configuration - Grace period vs. time passed since last recover The user interface: User can access/change each reporter attributes and driver specific callbacks via devlink, e.g per error type (per health reporter) - Configure reporter's generic attributes (like: Disable/enable auto recovery) - Invoke recovery procedure - Run diagnostics - Object dump The devlink health interface (via netlink): DEVLINK_CMD_HEALTH_REPORTER_GET Retrieves status and configuration info per DEV and reporter. DEVLINK_CMD_HEALTH_REPORTER_SET Allows reporter-related configuration setting. DEVLINK_CMD_HEALTH_REPORTER_RECOVER Triggers a reporter's recovery procedure. DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE Retrieves diagnostics data from a reporter on a device. DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET Retrieves the last stored dump. Devlink health saves a single dump. If an dump is not already stored by the devlink for this reporter, devlink generates a new dump. dump output is defined by the reporter. DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR Clears the last saved dump file for the specified reporter. netlink +--------------------------+ | | | + | | | | +--------------------------+ |request for ops |(diagnose, mlx5_core devlink |recover, |dump) +--------+ +--------------------------+ | | | reporter| | | | | +---------v----------+ | | | ops execution | | | | | <----------------------------------+ | | | | | | | | | | | + ^------------------+ | | | | | request for ops | | | | | (recover, dump) | | | | | | | | | +-+------------------+ | | | health report | | health handler | | | +-------------------------------> | | | | | +--------------------+ | | | health reporter create | | | +----------------------------> | +--------+ +--------------------------+ In this patchset, mlx5e TX reporter is implemented. Cmdline format: devlink health show [DEV reporter REPORTE_NAME] devlink health recover DEV reporter REPORTER_NAME devlink health diagnose DEV reporter REPORTER_NAME devlink health dump show DEV reporter REPORTER_NAME devlink health dump clear DEV reporter REPORTER_NAME devlink health set DEV reporter REPORTER_NAME NAME VALUE Cmdline examples: $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 0 last_dump_ts N/A parameters: grace_period 500 auto_recover false $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false $devlink health recover pci/0000:00:09 reporter tx $devlink health set pci/0000:00:09.0 reporter tx grace_period 3500 $devlink health set pci/0000:00:09.0 reporter tx auto_recover false Changelog: v4: - Rebase on latest net-next - Remove trace_devlink_health signature exposure in case CONFIG_NET_DEVLINK is not defined as it shall only be used from devlink. v3: - Redesign of devlink <-> driver fmsg API - Various bug fixes v2: - Remove FW* reporters to decrease the amount of patches in the patchset ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers')
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/Makefile2
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/en.h18
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h15
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c297
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/en_main.c189
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/en_tx.c5
6 files changed, 361 insertions, 165 deletions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9de9abacf7f6..6bb2a860b15b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
#
mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
- en_selftest.o en/port.o en/monitor_stats.o
+ en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
#
# Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6dd74ef69389..66e510b16243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -387,10 +387,7 @@ struct mlx5e_txqsq {
struct mlx5e_channel *channel;
int txq_ix;
u32 rate_limit;
- struct mlx5e_txqsq_recover {
- struct work_struct recover_work;
- u64 last_recover;
- } recover;
+ struct work_struct recover_work;
} ____cacheline_aligned_in_smp;
struct mlx5e_dma_info {
@@ -683,6 +680,13 @@ struct mlx5e_rss_params {
u8 hfunc;
};
+struct mlx5e_modify_sq_param {
+ int curr_state;
+ int next_state;
+ int rl_update;
+ int rl_index;
+};
+
struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -738,6 +742,7 @@ struct mlx5e_priv {
#ifdef CONFIG_MLX5_EN_TLS
struct mlx5e_tls *tls;
#endif
+ struct devlink_health_reporter *tx_reporter;
};
struct mlx5e_profile {
@@ -868,6 +873,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params);
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+ struct mlx5e_modify_sq_param *p);
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+
static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
{
return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
new file mode 100644
index 000000000000..e78e92753d73
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
new file mode 100644
index 000000000000..0aebfb377cf0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -0,0 +1,297 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#include <net/devlink.h>
+#include "reporter.h"
+#include "lib/eq.h"
+
+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
+
+struct mlx5e_tx_err_ctx {
+ int (*recover)(struct mlx5e_txqsq *sq);
+ struct mlx5e_txqsq *sq;
+};
+
+static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+{
+ unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+ while (time_before(jiffies, exp_time)) {
+ if (sq->cc == sq->pc)
+ return 0;
+
+ msleep(20);
+ }
+
+ netdev_err(sq->channel->netdev,
+ "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
+ sq->sqn, sq->cc, sq->pc);
+
+ return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
+{
+ WARN_ONCE(sq->cc != sq->pc,
+ "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+ sq->sqn, sq->cc, sq->pc);
+ sq->cc = 0;
+ sq->dma_fifo_cc = 0;
+ sq->pc = 0;
+}
+
+static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+{
+ struct mlx5_core_dev *mdev = sq->channel->mdev;
+ struct net_device *dev = sq->channel->netdev;
+ struct mlx5e_modify_sq_param msp = {0};
+ int err;
+
+ msp.curr_state = curr_state;
+ msp.next_state = MLX5_SQC_STATE_RST;
+
+ err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+ if (err) {
+ netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
+ return err;
+ }
+
+ memset(&msp, 0, sizeof(msp));
+ msp.curr_state = MLX5_SQC_STATE_RST;
+ msp.next_state = MLX5_SQC_STATE_RDY;
+
+ err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+ if (err) {
+ netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
+ return err;
+ }
+
+ return 0;
+}
+
+static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+{
+ struct mlx5_core_dev *mdev = sq->channel->mdev;
+ struct net_device *dev = sq->channel->netdev;
+ u8 state;
+ int err;
+
+ if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+ return 0;
+
+ err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
+ if (err) {
+ netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
+ sq->sqn, err);
+ return err;
+ }
+
+ if (state != MLX5_SQC_STATE_ERR) {
+ netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
+ return -EINVAL;
+ }
+
+ mlx5e_tx_disable_queue(sq->txq);
+
+ err = mlx5e_wait_for_sq_flush(sq);
+ if (err)
+ return err;
+
+ /* At this point, no new packets will arrive from the stack as TXQ is
+ * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
+ * pending WQEs. SQ can safely reset the SQ.
+ */
+
+ err = mlx5e_sq_to_ready(sq, state);
+ if (err)
+ return err;
+
+ mlx5e_reset_txqsq_cc_pc(sq);
+ sq->stats->recover++;
+ mlx5e_activate_txqsq(sq);
+
+ return 0;
+}
+
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+{
+ char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+ struct mlx5e_tx_err_ctx err_ctx = {0};
+
+ err_ctx.sq = sq;
+ err_ctx.recover = mlx5e_tx_reporter_err_cqe_recover;
+ sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
+
+ devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+ &err_ctx);
+}
+
+static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+{
+ struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
+ u32 eqe_count;
+ int ret;
+
+ netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+ eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+ eqe_count = mlx5_eq_poll_irq_disabled(eq);
+ ret = eqe_count ? true : false;
+ if (!eqe_count) {
+ clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+ return ret;
+ }
+
+ netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
+ eqe_count, eq->core.eqn);
+ sq->channel->stats->eq_rearm++;
+ return ret;
+}
+
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+{
+ char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+ struct mlx5e_tx_err_ctx err_ctx;
+
+ err_ctx.sq = sq;
+ err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
+ sprintf(err_str,
+ "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+ sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+ jiffies_to_usecs(jiffies - sq->txq->trans_start));
+
+ return devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+ &err_ctx);
+}
+
+/* state lock cannot be grabbed within this function.
+ * It can cause a dead lock or a read-after-free.
+ */
+int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+{
+ return err_ctx->recover(err_ctx->sq);
+}
+
+static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
+{
+ int err;
+
+ rtnl_lock();
+ mutex_lock(&priv->state_lock);
+ mlx5e_close_locked(priv->netdev);
+ err = mlx5e_open_locked(priv->netdev);
+ mutex_unlock(&priv->state_lock);
+ rtnl_unlock();
+
+ return err;
+}
+
+static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
+ void *context)
+{
+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+ struct mlx5e_tx_err_ctx *err_ctx = context;
+
+ return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
+ mlx5e_tx_reporter_recover_all(priv);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
+ u32 sqn, u8 state, bool stopped)
+{
+ int err;
+
+ err = devlink_fmsg_obj_nest_start(fmsg);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sqn);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u8_pair_put(fmsg, "HW state", state);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_bool_pair_put(fmsg, "stopped", stopped);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_obj_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg)
+{
+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+ int i, err = 0;
+
+ mutex_lock(&priv->state_lock);
+
+ if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+ goto unlock;
+
+ err = devlink_fmsg_arr_pair_nest_start(fmsg, "SQs");
+ if (err)
+ goto unlock;
+
+ for (i = 0; i < priv->channels.num * priv->channels.params.num_tc;
+ i++) {
+ struct mlx5e_txqsq *sq = priv->txq2sq[i];
+ u8 state;
+
+ err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+ if (err)
+ break;
+
+ err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq->sqn,
+ state,
+ netif_xmit_stopped(sq->txq));
+ if (err)
+ break;
+ }
+ err = devlink_fmsg_arr_pair_nest_end(fmsg);
+ if (err)
+ goto unlock;
+
+unlock:
+ mutex_unlock(&priv->state_lock);
+ return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
+ .name = "tx",
+ .recover = mlx5e_tx_reporter_recover,
+ .diagnose = mlx5e_tx_reporter_diagnose,
+};
+
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct devlink *devlink = priv_to_devlink(mdev);
+
+ priv->tx_reporter =
+ devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
+ MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+ true, priv);
+ if (IS_ERR_OR_NULL(priv->tx_reporter))
+ netdev_warn(priv->netdev,
+ "Failed to create tx reporter, err = %ld\n",
+ PTR_ERR(priv->tx_reporter));
+ return PTR_ERR_OR_ZERO(priv->tx_reporter);
+}
+
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+{
+ if (IS_ERR_OR_NULL(priv->tx_reporter))
+ return;
+
+ devlink_health_reporter_destroy(priv->tx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 099d307e6f25..d81ebba7c04e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -51,6 +51,7 @@
#include "en/xdp.h"
#include "lib/eq.h"
#include "en/monitor_stats.h"
+#include "en/reporter.h"
struct mlx5e_rq_param {
u32 rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1159,7 +1160,7 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
return 0;
}
-static void mlx5e_sq_recover(struct work_struct *work);
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
int txq_ix,
struct mlx5e_params *params,
@@ -1181,7 +1182,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
sq->uar_map = mdev->mlx5e_res.bfreg.map;
sq->min_inline_mode = params->tx_min_inline_mode;
sq->stats = &c->priv->channel_stats[c->ix].sq[tc];
- INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
+ INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
if (MLX5_IPSEC_DEV(c->priv->mdev))
set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
if (mlx5_accel_is_tls_device(c->priv->mdev))
@@ -1269,15 +1270,8 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
return err;
}
-struct mlx5e_modify_sq_param {
- int curr_state;
- int next_state;
- bool rl_update;
- int rl_index;
-};
-
-static int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
- struct mlx5e_modify_sq_param *p)
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+ struct mlx5e_modify_sq_param *p)
{
void *in;
void *sqc;
@@ -1375,17 +1369,7 @@ err_free_txqsq:
return err;
}
-static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
-{
- WARN_ONCE(sq->cc != sq->pc,
- "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
- sq->sqn, sq->cc, sq->pc);
- sq->cc = 0;
- sq->dma_fifo_cc = 0;
- sq->pc = 0;
-}
-
-static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
{
sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
@@ -1394,7 +1378,7 @@ static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
netif_tx_start_queue(sq->txq);
}
-static inline void netif_tx_disable_queue(struct netdev_queue *txq)
+void mlx5e_tx_disable_queue(struct netdev_queue *txq)
{
__netif_tx_lock_bh(txq);
netif_tx_stop_queue(txq);
@@ -1410,7 +1394,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
/* prevent netif_tx_wake_queue */
napi_synchronize(&c->napi);
- netif_tx_disable_queue(sq->txq);
+ mlx5e_tx_disable_queue(sq->txq);
/* last doorbell out, godspeed .. */
if (mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1)) {
@@ -1430,6 +1414,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
struct mlx5_rate_limit rl = {0};
cancel_work_sync(&sq->dim.work);
+ cancel_work_sync(&sq->recover_work);
mlx5e_destroy_sq(mdev, sq->sqn);
if (sq->rate_limit) {
rl.rate = sq->rate_limit;
@@ -1439,105 +1424,12 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
mlx5e_free_txqsq(sq);
}
-static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
-{
- unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
-
- while (time_before(jiffies, exp_time)) {
- if (sq->cc == sq->pc)
- return 0;
-
- msleep(20);
- }
-
- netdev_err(sq->channel->netdev,
- "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
- sq->sqn, sq->cc, sq->pc);
-
- return -ETIMEDOUT;
-}
-
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
-{
- struct mlx5_core_dev *mdev = sq->channel->mdev;
- struct net_device *dev = sq->channel->netdev;
- struct mlx5e_modify_sq_param msp = {0};
- int err;
-
- msp.curr_state = curr_state;
- msp.next_state = MLX5_SQC_STATE_RST;
-
- err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
- if (err) {
- netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
- return err;
- }
-
- memset(&msp, 0, sizeof(msp));
- msp.curr_state = MLX5_SQC_STATE_RST;
- msp.next_state = MLX5_SQC_STATE_RDY;
-
- err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
- if (err) {
- netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
- return err;
- }
-
- return 0;
-}
-
-static void mlx5e_sq_recover(struct work_struct *work)
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
{
- struct mlx5e_txqsq_recover *recover =
- container_of(work, struct mlx5e_txqsq_recover,
- recover_work);
- struct mlx5e_txqsq *sq = container_of(recover, struct mlx5e_txqsq,
- recover);
- struct mlx5_core_dev *mdev = sq->channel->mdev;
- struct net_device *dev = sq->channel->netdev;
- u8 state;
- int err;
-
- err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
- if (err) {
- netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
- sq->sqn, err);
- return;
- }
-
- if (state != MLX5_RQC_STATE_ERR) {
- netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
- return;
- }
-
- netif_tx_disable_queue(sq->txq);
-
- if (mlx5e_wait_for_sq_flush(sq))
- return;
-
- /* If the interval between two consecutive recovers per SQ is too
- * short, don't recover to avoid infinite loop of ERR_CQE -> recover.
- * If we reached this state, there is probably a bug that needs to be
- * fixed. let's keep the queue close and let tx timeout cleanup.
- */
- if (jiffies_to_msecs(jiffies - recover->last_recover) <
- MLX5E_SQ_RECOVER_MIN_INTERVAL) {
- netdev_err(dev, "Recover SQ 0x%x canceled, too many error CQEs\n",
- sq->sqn);
- return;
- }
-
- /* At this point, no new packets will arrive from the stack as TXQ is
- * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
- * pending WQEs. SQ can safely reset the SQ.
- */
- if (mlx5e_sq_to_ready(sq, state))
- return;
+ struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
+ recover_work);
- mlx5e_reset_txqsq_cc_pc(sq);
- sq->stats->recover++;
- recover->last_recover = jiffies;
- mlx5e_activate_txqsq(sq);
+ mlx5e_tx_reporter_err_cqe(sq);
}
static int mlx5e_open_icosq(struct mlx5e_channel *c,
@@ -3236,6 +3128,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
{
int tc;
+ mlx5e_tx_reporter_destroy(priv);
for (tc = 0; tc < priv->profile->max_tc; tc++)
mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
}
@@ -4223,31 +4116,13 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
return features;
}
-static bool mlx5e_tx_timeout_eq_recover(struct net_device *dev,
- struct mlx5e_txqsq *sq)
-{
- struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
- u32 eqe_count;
-
- netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
- eq->core.eqn, eq->core.cons_index, eq->core.irqn);
-
- eqe_count = mlx5_eq_poll_irq_disabled(eq);
- if (!eqe_count)
- return false;
-
- netdev_err(dev, "Recover %d eqes on EQ 0x%x\n", eqe_count, eq->core.eqn);
- sq->channel->stats->eq_rearm++;
- return true;
-}
-
static void mlx5e_tx_timeout_work(struct work_struct *work)
{
struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
tx_timeout_work);
- struct net_device *dev = priv->netdev;
- bool reopen_channels = false;
- int i, err;
+ bool report_failed = false;
+ int err;
+ int i;
rtnl_lock();
mutex_lock(&priv->state_lock);
@@ -4256,31 +4131,22 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
goto unlock;
for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
- struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
+ struct netdev_queue *dev_queue =
+ netdev_get_tx_queue(priv->netdev, i);
struct mlx5e_txqsq *sq = priv->txq2sq[i];
if (!netif_xmit_stopped(dev_queue))
continue;
- netdev_err(dev,
- "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
- i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
- jiffies_to_usecs(jiffies - dev_queue->trans_start));
-
- /* If we recover a lost interrupt, most likely TX timeout will
- * be resolved, skip reopening channels
- */
- if (!mlx5e_tx_timeout_eq_recover(dev, sq)) {
- clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
- reopen_channels = true;
- }
+ if (mlx5e_tx_reporter_timeout(sq))
+ report_failed = true;
}
- if (!reopen_channels)
+ if (!report_failed)
goto unlock;
- mlx5e_close_locked(dev);
- err = mlx5e_open_locked(dev);
+ mlx5e_close_locked(priv->netdev);
+ err = mlx5e_open_locked(priv->netdev);
if (err)
netdev_err(priv->netdev,
"mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
@@ -4296,6 +4162,12 @@ static void mlx5e_tx_timeout(struct net_device *dev)
struct mlx5e_priv *priv = netdev_priv(dev);
netdev_err(dev, "TX timeout detected\n");
+
+ if (IS_ERR_OR_NULL(priv->tx_reporter)) {
+ netdev_err_once(priv->netdev, "tx timeout will not be handled, no valid tx reporter\n");
+ return;
+ }
+
queue_work(priv->wq, &priv->tx_timeout_work);
}
@@ -4953,6 +4825,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
#ifdef CONFIG_MLX5_CORE_EN_DCB
mlx5e_dcbnl_initialize(priv);
#endif
+ mlx5e_tx_reporter_create(priv);
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..189211295e48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -513,8 +513,9 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
&sq->state)) {
mlx5e_dump_error_cqe(sq,
(struct mlx5_err_cqe *)cqe);
- queue_work(cq->channel->priv->wq,
- &sq->recover.recover_work);
+ if (!IS_ERR_OR_NULL(cq->channel->priv->tx_reporter))
+ queue_work(cq->channel->priv->wq,
+ &sq->recover_work);
}
stats->cqe_err++;
}